  1. Mar 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript presents an interesting exploration of the potential activation mechanisms of DLK following axonal injury. While the experiments are beautifully conducted and the data are solid, I feel that there is insufficient evidence to fully support the conclusions made by the authors.

      In this manuscript, the authors exclusively use the puc-lacZ reporter to determine the activation of DLK. This reporter has been shown to be induced when DLK is activated.

      However, there is insufficient evidence to confirm that the absence of reporter activation necessarily indicates that DLK is inactive. As with many MAP kinase pathways, the DLK pathway can be locally or globally activated in neurons, and the level of DLK activation may depend on the strength of the stimulation. This reporter might only reflect strong DLK activation and may not be turned on if DLK is weakly activated. The results presented in this manuscript support this interpretation. Strong stimulation, such as axotomy of all synaptic branches, caused robust DLK activation, as indicated by puc-lacZ expression. In contrast, weak stimulation, such as axotomy of some synaptic branches, resulted in weaker DLK activation, which did not induce the puc-lacZ reporter. This suggests that the strength of DLK activation depends on the severity of the injury rather than the presence of intact synapses. Given that this is a central conclusion of the study, it may be worthwhile to confirm this further. Alternatively, the authors may consider refining their conclusion to better align with the evidence presented.

      In Figure 1E we have replotted the puc-lacZ data to show comparisons between different injuries that leave different numbers of spared (or lost) boutons and branches. We observed no differences among uninjured neurons, injuries that remove only a small fraction of boutons (injury location (a)), and injuries that remove nearly all of them (injury locations (b) and (c)) (Figure 1E). These observations argue against the interpretation that the strength of DLK activation (at least within the cell body) depends on the severity of injury. Rather, puc-lacZ induction appears to be bimodal: it is either induced (in various injuries that remove all synaptic boutons) or not induced, including in injuries that spare only a small fraction of the total boutons. We therefore think that the presence of a remaining synaptic connection, rather than the extent of the injury per se, is a major determinant of whether the cell body component of Wnd signaling can be activated.

      The reviewer (and others) fairly point out that our current study focuses on puc-lacZ as a reporter of Wnd signaling in the cell body. We consider this to be a downstream integration of events in axons that are more challenging to detect. It is striking that this integration appears strongly sensitized to the presence of spared synaptic boutons. Examination of Wnd’s activation in axons and synapses is a goal for our future work.

      As noted by the authors, DLK has been implicated in both axon regeneration and degeneration. Following axotomy, DLK activation can lead to the degeneration of distal axons, where synapses are located. This raises an important question: how is DLK activated in distal axons? The authors might consider discussing the significance of this "synapse connection-dependent" DLK activation in the broader context of DLK function and activation mechanisms.

      While it has been noted that inhibition of DLK can mildly delay Wallerian degeneration (Miller et al., 2009), this does not appear to be the case for retinal ganglion cell axons following optic nerve crush (Fernandes et al., 2014). It is also not the case for Drosophila motoneurons and NMJ terminals following peripheral nerve injury (Xiong et al., 2012; Xiong and Collins, 2012). Instead, overexpression of Wnd or activation of Wnd by a conditioning injury leads to an opposite phenotype: an increase in resiliency to Wallerian degeneration for axons that have been previously injured (Xiong et al., 2012; Xiong and Collins, 2012). The downstream outcome of Wnd activation is highly dependent on the context; it may be an integration of the outcomes of local Wnd/DLK activation in axons with downstream consequences of nuclear/cell body signaling. The current study suggests some rules for the cell body signaling; however, how Wnd is regulated at synapses and why it promotes degeneration in some circumstances but not others are important future questions.

      Regarding the reviewer’s suggestion, it is interesting to consider DLK’s potential contributions to the loss of NMJ synapses in a mouse model of ALS (Le Pichon et al., 2017; Wlaschin et al., 2023). Our findings suggest that the synaptic terminal is an important locus of DLK regulation, while dysfunction of NMJ terminals is an important feature of the ‘dying back’ hypothesis of disease etiology (Dadon-Nachum et al., 2011; Verma et al., 2022). We propose that the regulation of DLK at synaptic terminals is an important area for future study, and may reveal how DLK might be modulated to curtail disease progression. Of note, DLK inhibitors are in clinical trials (Katz et al., 2022; Le et al., 2023; Siu et al., 2018), but at least some have been paused due to safety concerns (Katz et al., 2022). Further understanding of the mechanisms that regulate DLK is needed to determine whether and how DLK and its downstream signaling can be tuned for therapeutic benefit.

      Reviewer #2 (Public review):

      Summary:

      The authors study a panel of sparsely labeled neuronal lines in Drosophila that each form multiple synapses. Critically, each axonal branch can be injured without affecting the others, allowing the authors to differentiate between injuries that affect all axonal branches versus those that do not, creating spared branches. Axonal injuries are known to cause Wnd (mammalian DLK)-dependent retrograde signals to the cell body, culminating in a transcriptional response. This work identifies a fascinating new phenomenon that this injury response is not all-or-none. If even a single branch remains uninjured, the injury signal is not activated in the cell body. The authors rule out that this could be due to changes in the abundance of Wnd (perhaps if incrementally activated at each injured branch) by Hiw, Wnd's known negative regulator. Thus there is both a yet-undiscovered mechanism to regulate Wnd signaling, and more broadly a mechanism by which the neuron can integrate the degree of injury it has sustained. It will now be important to tease apart the mechanism(s) of this fascinating phenomenon. But even absent a clear mechanism, this is new biology that will inform the interpretation of injury signaling studies across species.

      Strengths:

      (1) A conceptually beautiful series of experiments that reveal a fascinating new phenomenon is described, with clear implications (as the authors discuss in their Discussion) for injury signaling in mammals.

      (2) Suggests a new mode of Wnd regulation, independent of Hiw.

      Weaknesses:

      (1) The use of a somatic transcriptional reporter for Wnd activity is powerful; however, the reporter indicates whether the transcriptional response was activated, not whether the injury signal was received. It remains possible that Wnd is still activated in the case of a spared branch, but that this activation is either local within the axons (impossible to determine in the absence of a local reporter) or that the retrograde signal was indeed generated but it was somehow insufficient to activate transcription when it entered the cell body. This is more of a mechanistic detail and should not detract from the overall importance of the study.

      We agree. The puc-lacZ reporter tells us about signaling in the cell body, but whether and how Wnd is regulated in axons and synaptic branches, which we think occurs upstream of the cell body response, remains to be addressed in future studies.

      (2) That the protective effect of a spared branch is independent of Hiw, the known negative regulator of Wnd, is fascinating. But this leaves open a key question: what is the signal?

      This is indeed an important future question, and would still be a question even if Hiw were part of the protective mechanism by the spared synaptic branch. Our current hypothesis (outlined in Figure 4) is that regulation of Wnd is tied to the retrograde trafficking of a signaling organelle in axons. The Hiw-independent regulation complements other observations in the literature that multiple pathways regulate Wnd/DLK (Collins et al., 2006; Feoktistov and Herman, 2016; Klinedinst et al., 2013; Li et al., 2017; Russo and DiAntonio, 2019; Valakh et al., 2013). It is logical for this critical stress response pathway to have multiple modes of regulation that may act in parallel to tune and restrain its activation. 

      Reviewer #3 (Public review):

      Summary:

      This manuscript seeks to understand how nerve injury-induced signaling to the nucleus is influenced, and it establishes a new location where these principles can be studied. By identifying and mapping specific bifurcated neuronal innervations in the Drosophila larvae, and using laser axotomy to localize the injury, the authors find that sparing a branch of a complex muscular innervation is enough to impair Wallenda-puc (analogous to DLK-JNK-cJun) signaling that is known to promote regeneration. It is only when all connections to the target are disconnected that cJun-transcriptional activation occurs.

      Overall, this is a thorough and well-performed investigation of the mechanism of spared-branch influence on axon injury signaling. The findings on control of wnd are important because this is a very widely used injury signaling pathway across species and injury models. The authors present detailed and carefully executed experiments to support their conclusions. Their effort to identify the control mechanism is admirable and will be of aid to the field as they continue to try to understand how to promote better regeneration of axons.

      Strengths:

      The paper does a very comprehensive job of investigating this phenomenon at multiple locations and through both pinpoint laser injury as well as larger crush models. They identify a non-hiw based restraint mechanism of the wnd-puc signaling axis that presumably originates from the spared terminal. They also present a large list of tests they performed to identify the actual restraint mechanism from the spared branch, which has ruled out many of the most likely explanations. This is an extremely important set of information to report, to guide future investigators in this and other model organisms on mechanisms by which regeneration signaling is controlled (or not).

      Weaknesses:

      The weakest data presented by this manuscript is the study of the actual amounts of Wallenda protein in the axon. The authors argue that increased Wnd protein is being anterogradely delivered from the soma, but no support for this is given. Whether this change is due to transcription/translation, protein stability, transport, or other means is not investigated in this work. However, because this point is not central to the arguments in the paper, it is only a minor critique.

      We agree and are glad that the reviewer considers this a minor critique; this is an area for future study. In Supplemental Figure 1 we present differences in the levels of an ectopically expressed GFP-Wnd-kinase-dead transgene, which is strikingly increased in axons that have received a full but not partial axotomy. We suspect this accumulation occurs downstream of the cell body response because of the timing. We observed the accumulations after 24 hours (Figure S1F) but not at early (1-4 hour) time points following axotomy (data not shown). Further study of the local regulation of Wnd protein and its kinase activity in axons is an important future direction.

      As far as the scope of impact: because the conclusions of the paper are focused on a single (albeit well-validated) reporter in different types of motor neurons, it is hard to determine whether the mechanism of spared branch inhibition of regeneration requires wnd-puc (DLK/cJun) signaling in all contexts (for example, sensory axons or interneurons). Is the nerve-muscle connection the rule or the exception in terms of regeneration program activation?

      DLK signaling is strongly activated in DRG sensory neurons following peripheral nerve injury (Shin et al., 2012), despite the fact that sensory neurons have bifurcated axons and their projections in the dorsal spinal cord are not directly damaged by injuries to the peripheral nerve. Therefore, it is unlikely that protection by a spared synapse is a universal rule for all neuron types. However, the molecular mechanisms that underlie this regulation may indeed be shared across different types of neurons but utilized in different ways. For instance, nerve growth factor withdrawal can lead to activation of DLK (Ghosh et al., 2011); however, neurotrophins and their receptors are regulated and implemented differently in different cell types. We suspect that the restraint of Wnd signaling by the spared synaptic branch shares a common underlying mechanism with the restraint of DLK signaling by neurotrophin signaling. Further elucidation of the molecular mechanism is an important next step towards addressing this question.

      Because changes in puc-lacZ intensity are the major readout, it would be helpful to better explain the significance of the amount of puc-lacZ in the nucleus with respect to the activation of regeneration. Is it known that scaling up the amount of puc-lacZ transcription scales functional responses (regeneration or others)? The alternative would be that only a small amount of puc-lacZ is sufficient to efficiently induce relevant pathways (threshold response).

      While induction of puc-lacZ expression correlates with Wnd-mediated phenotypes, including sprouting of injured axons (Xiong et al., 2010), protection from Wallerian degeneration (Xiong et al., 2012; Xiong and Collins, 2012) and synaptic overgrowth (Collins et al., 2006), we have not observed any correlation between the degree of puc-lacZ induction (e.g., modest, medium or high) and the phenotypic outcomes (sprouting, overgrowth, etc.). Rather, there appears to be a striking all-or-none difference in whether puc-lacZ is induced or not induced. There may indeed be a threshold that can be restrained through multiple mechanisms. We posit in Figure 4 that restraint may take place in the cell body, where it can be influenced by the spared bifurcation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      This is a beautiful study. Naturally, you're searching now for the underlying mechanism.

      A few questions:

      (1) At present you cannot determine if the Wnd signal is never initiated (when a spared branch is present) or if it gets to the cell body but is incapable of activating the puckered reporter. Is there any optical reporter (JNK activation?) that could differentiate this?

      The reviewer is correct that a tool to detect local activity of JNK kinase in axons would be ideal for probing the mechanisms that underlie our observations. A FRET reporter for JNK kinase activity has been developed and utilized in cultured cells (Fosbrink et al., 2010). It would be interesting to implement this reporter in Drosophila; it would need to be sensitive enough to visualize JNK activity in single Drosophila axons. We have previously noted Wnd-dependent phosphorylated JNK in the cell body of injured motoneurons following nerve crush (Xiong et al., 2010). However, anti-pJNK antibodies detect what appears to be a constitutive signal in uninjured axons that does not appear to be influenced by activation or inhibition of Wnd (Xiong et al., 2010).

      (2) What happens when you injure the axon in a dSarm KO? This is more of a curiosity, not a necessity, but is it the axon dying or the detection of the injury itself?

      We have tested whether overexpression of Nmnat or the WldS transgene, which inhibit Wallerian degeneration of injured axons, affects the induction of puc-lacZ following nerve injury. This manipulation has no effect on puc-lacZ expression in uninjured animals, and also has no effect on the induction of puc-lacZ following peripheral nerve crush (TJ Waller, personal communication).

      (3) Are Wnd rescue experiments possible in this context? Would be an interesting place to do Wnd structure-function and compare it to the synaptic work.

      This is not possible with current reagents. Expression of wild-type wnd cDNA under the Gal4/UAS promoter leads to strong induction of puc-lacZ in uninjured animals, even when weak Gal4 driver lines are used (Xiong et al., 2012, 2010). Similar observations of constitutively active signaling have been made in expression studies of DLK in mammalian cells (Hao et al., 2016; Huntwork-Rodriguez et al., 2013; Nihalani et al., 2000; and data not shown). These and other observations suggest that the levels of Wnd/DLK protein are tightly controlled by posttranscriptional mechanisms. Delineation of sequences within Wnd/DLK that are required for its regulation would be helpful for addressing this question.

      This will be required reading in my lab.

      That is an honor. We look forward to help from the field to understand how and why this pathway is restrained at synapses. Your students may bring new ideas to the table.

      Reviewer #3 (Recommendations for the authors):

      Piezo is spelled incorrectly in the supplemental table in multiple places.

      Thank you for pointing this out! We have made the correction.

      References cited (in rebuttal)

      Collins CA, Wairkar YP, Johnson SL, DiAntonio A. 2006. Highwire restrains synaptic growth by attenuating a MAP kinase signal. Neuron 51:57–69.

      Dadon-Nachum M, Melamed E, Offen D. 2011. The “dying-back” phenomenon of motor neurons in ALS. J Mol Neurosci 43:470–477.

      Feoktistov AI, Herman TG. 2016. Wallenda/DLK protein levels are temporally downregulated by Tramtrack69 to allow R7 growth cones to become stationary boutons. Development 143:2983–2993.

      Fernandes KA, Harder JM, John SW, Shrager P, Libby RT. 2014. DLK-dependent signaling is important for somal but not axonal degeneration of retinal ganglion cells following axonal injury. Neurobiol Dis 69:108–116.

      Ghosh AS, Wang B, Pozniak CD, Chen M, Watts RJ, Lewcock JW. 2011. DLK induces developmental neuronal degeneration via selective regulation of proapoptotic JNK activity. J Cell Biol 194:751–764.

      Hao Y, Frey E, Yoon C, Wong H, Nestorovski D, Holzman LB, Giger RJ, DiAntonio A, Collins C. 2016. An evolutionarily conserved mechanism for cAMP elicited axonal regeneration involves direct activation of the dual leucine zipper kinase DLK. Elife 5. doi:10.7554/eLife.14048

      Huntwork-Rodriguez S, Wang B, Watkins T, Ghosh AS, Pozniak CD, Bustos D, Newton K, Kirkpatrick DS, Lewcock JW. 2013. JNK-mediated phosphorylation of DLK suppresses its ubiquitination to promote neuronal apoptosis. J Cell Biol 202:747–763.

      Katz JS, Rothstein JD, Cudkowicz ME, Genge A, Oskarsson B, Hains AB, Chen C, Galanter J, Burgess BL, Cho W, Kerchner GA, Yeh FL, Ghosh AS, Cheeti S, Brooks L, Honigberg L, Couch JA, Rothenberg ME, Brunstein F, Sharma KR, van den Berg L, Berry JD, Glass JD. 2022. A Phase 1 study of GDC-0134, a dual leucine zipper kinase inhibitor, in ALS. Ann Clin Transl Neurol 9:50–66.

      Klinedinst S, Wang X, Xiong X, Haenfler JM, Collins CA. 2013. Independent pathways downstream of the Wnd/DLK MAPKKK regulate synaptic structure, axonal transport, and injury signaling. J Neurosci 33:12764–12778.

      Le K, Soth MJ, Cross JB, Liu G, Ray WJ, Ma J, Goodwani SG, Acton PJ, Buggia-Prevot V, Akkermans O, Barker J, Conner ML, Jiang Y, Liu Z, McEwan P, Warner-Schmidt J, Xu A, Zebisch M, Heijnen CJ, Abrahams B, Jones P. 2023. Discovery of IACS-52825, a potent and selective DLK inhibitor for treatment of chemotherapy-induced peripheral neuropathy. J Med Chem 66:9954–9971.

      Le Pichon CE, Meilandt WJ, Dominguez S, Solanoy H, Lin H, Ngu H, Gogineni A, Sengupta Ghosh A, Jiang Z, Lee S-H, Maloney J, Gandham VD, Pozniak CD, Wang B, Lee S, Siu M, Patel S, Modrusan Z, Liu X, Rudhard Y, Baca M, Gustafson A, Kaminker J, Carano RAD, Huang EJ, Foreman O, Weimer R, Scearce-Levie K, Lewcock JW. 2017. Loss of dual leucine zipper kinase signaling is protective in animal models of neurodegenerative disease. Sci Transl Med 9. doi:10.1126/scitranslmed.aag0394

      Li J, Zhang YV, Asghari Adib E, Stanchev DT, Xiong X, Klinedinst S, Soppina P, Jahn TR, Hume RI, Rasse TM, Collins CA. 2017. Restraint of presynaptic protein levels by Wnd/DLK signaling mediates synaptic defects associated with the kinesin-3 motor Unc-104. Elife 6. doi:10.7554/eLife.24271

      Miller BR, Press C, Daniels RW, Sasaki Y, Milbrandt J, DiAntonio A. 2009. A dual leucine kinase-dependent axon self-destruction program promotes Wallerian degeneration. Nat Neurosci 12:387–389.

      Nihalani D, Merritt S, Holzman LB. 2000. Identification of structural and functional domains in mixed lineage kinase dual leucine zipper-bearing kinase required for complex formation and stress-activated protein kinase activation. J Biol Chem 275:7273–7279.

      Russo A, DiAntonio A. 2019. Wnd/DLK is a critical target of FMRP responsible for neurodevelopmental and behavior defects in the Drosophila model of fragile X syndrome. Cell Rep 28:2581–2593.e5.

      Shin JE, Cho Y, Beirowski B, Milbrandt J, Cavalli V, DiAntonio A. 2012. Dual leucine zipper kinase is required for retrograde injury signaling and axonal regeneration. Neuron 74:1015–1022.

      Siu M, Sengupta Ghosh A, Lewcock JW. 2018. Dual Leucine Zipper Kinase Inhibitors for the Treatment of Neurodegeneration. J Med Chem 61:8078–8087.

      Valakh V, Walker LJ, Skeath JB, DiAntonio A. 2013. Loss of the spectraplakin short stop activates the DLK injury response pathway in Drosophila. J Neurosci 33:17863–17873.

      Verma S, Khurana S, Vats A, Sahu B, Ganguly NK, Chakraborti P, Gourie-Devi M, Taneja V. 2022. Neuromuscular junction dysfunction in amyotrophic lateral sclerosis. Mol Neurobiol 59:1502–1527.

      Wlaschin JJ, Donahue C, Gluski J, Osborne JF, Ramos LM, Silberberg H, Le Pichon CE. 2023. Promoting regeneration while blocking cell death preserves motor neuron function in a model of ALS. Brain 146:2016–2028.

      Xiong X, Collins CA. 2012. A conditioning lesion protects axons from degeneration via the Wallenda/DLK MAP kinase signaling cascade. J Neurosci 32:610–615.

      Xiong X, Hao Y, Sun K, Li J, Li X, Mishra B, Soppina P, Wu C, Hume RI, Collins CA. 2012. The Highwire ubiquitin ligase promotes axonal degeneration by tuning levels of Nmnat protein. PLoS Biol 10:e1001440.

      Xiong X, Wang X, Ewanek R, Bhat P, DiAntonio A, Collins CA. 2010. Protein turnover of the Wallenda/DLK kinase regulates a retrograde response to axonal injury. J Cell Biol 191:211–223.

    1. eLife Assessment

      This study reports that the RNA-binding and cardiomyopathy-associated protein RBM20 is expressed in specific populations of neurons in the CNS, where it binds to and regulates the expression of synapse-related RNAs. This is an important finding because it reveals a new mechanism for gene regulation in neurons by an RNA-binding protein previously studied in the heart; the authors also provide data to suggest that the mechanism by which RBM20 acts in neurons may be distinct from the splicing regulation studied in cardiac tissue. The data in support of the binding and regulation of RNAs by RBM20 are compelling, using leading-edge sequencing methods to determine RNA binding profiles, and cell type-specific genetics for evaluation of function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study set out to find RNA binding proteins in the CNS in cell-type specific sequencing data and discover that the cardiomyopathy-associated protein RBM20 is selectively expressed in olfactory bulb glutamatergic neurons and PV+ GABAergic neurons. They make an HA-tagged RBM20 allele to perform CLIP-seq to identify RBM20 binding sites and find direct targets of RBM20 in olfactory bulb glutamatergic neurons. In these neurons, RBM20 binds intronic regions. RBM20 has previously been implicated in splicing, but when they selectively knock out RBM20 in glutamatergic neurons they do not see changes in splicing, but they do see changes in RNA abundance, especially of long genes with many introns, which are enriched for synapse-associated functions. These data show that RBM20 has important functions in gene regulation in neurons, which was previously unknown, and they suggest it acts through a mechanism distinct from what has been studied before in cardiomyocytes.

      Strengths:

      The study finds expression of the cardiomyopathy-associated RNA binding protein RBM20 in specific neurons in the brain, opening new windows into its potential functions there.

      The study uses CLIP-seq to identify RBM20 binding RNAs in olfactory bulb neurons.

      Conditional knockout of RBM20 in glutamatergic or PV neurons allows the authors to detect mRNA expression that is regulated by RBM20.

      The data include substantial controls and quality control information to support the rigor of the findings.

      Weaknesses:

      The authors do not fully identify the mechanism by which RBM20 acts to regulate RNA expression in neurons, though they do provide data suggesting that neuronal RBM20 does not regulate alternative splicing, which is an interesting contrast to its proposed mechanism of function in cardiomyocytes. Discovery of the RNA regulatory functions of RBM20 in neurons is left as a question for future studies.

      The study does not identify functional consequences of the RNA changes in the conditional knockout cells, so this is also a question for the future.

    3. Reviewer #2 (Public review):

      Summary:

      The group around Prof. Scheiffele has made seminal discoveries regarding alternative splicing, as reflected by a current ERC advanced grant and landmark papers in eLife (2015), Science (2016), and Nature Neuroscience (2019). Recently, the group investigated proteins that contain an RRM motif in the mouse cortex. One of them, termed RBM20, was originally thought to be muscle-specific and involved in alternative splicing in cardiomyocytes. However, upon close inspection, RBM20 is expressed in a particular set of interneurons (PV-positive cells of the somatosensory cortex) in the cortex as well as in mitral cells of the olfactory bulb (OB). Importantly, they used CLIP to identify targets in the OB and heart. Next and quite importantly, they generated a knock-in mouse line with a His-biotin acceptor peptide and an HA epitope to perform specific biochemistry. Not surprisingly, this allowed them to specifically identify transcripts with long introns; however, most of the intronic binding sites were very distant from the splice sites. Closer GO term inspection revealed that RBM20 specifically regulates synapse-related transcripts. In order to get in vivo insight into its function in the brain, the authors generated both global and conditional KO mice. Surprisingly, there were no significant differences in RBM20 PV interneurons; however, 409 transcripts were deregulated in OB glutamatergic neurons. Here, CLIP sites were mostly found to be very distant from differentially expressed exons. Furthermore, loss-of-function RBM20 primarily yields loss of transcripts, whereas upregulation appears to be indirect. Together, these results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Strengths:

      The quality of the data and the figures is high, impressive and convincing. The reported results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Weaknesses:

      In their revised manuscript, the authors significantly improved the introduction and results sections, which are now much better suited for a general audience and make it easier to follow the logic of the experiments. Also, the discussion has now been expanded, doing better justice to the importance of the findings presented.

      In my opinion, the revised manuscript is clearly improved and represents a timely and important study, which provides major new insight into the expression and possible function of RBM20 in tissues outside of muscle.

    4. Reviewer #3 (Public review):

      Summary:

      The authors identified RBM20 expression in neural tissues using cell type-specific transcriptomic analysis. This discovery was further validated through in vitro and in vivo approaches, including RNA fluorescent in situ hybridization (FISH), open-source datasets, immunostaining, western blotting, and gene-edited RBM20 knockout (KO) mice. CLIP-seq and RiboTRAP data demonstrated that RBM20 regulates common targets in both neural and cardiac tissues, while also modulating tissue-specific targets. Furthermore, the study revealed that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins.

      Strengths:

      • Utilization of a large dataset combined with experimental evidence to identify and validate RBM20 expression in neural tissues.
      • Global and tissue-specific RBM20 KO mouse models provide robust support for RBM20 localization and expression.
      • Employing heart tissue as a control highlights the unique findings in neural tissues.

      Weaknesses:

      • Lack of physiological functional studies to explore RBM20's role in neural tissues.
      • Data quality requires improvement for stronger conclusions.

      Comments on revisions:

      The authors have effectively addressed most of my concerns, which has significantly improved the quality and reliability of the data. While sufficient functional data were not provided, the current findings offer valuable and novel insights into the expression of RBM20 in neurons. I have no further concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We thank the three reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We have now addressed these comments in a revised manuscript as follows:

      (1) We have revised the text according to the reviewer suggestions and provide more detailed explanations in the results and discussion.

      (2) We have uploaded higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We have included additional data on eCLIP control experiments in the supplementary figures.

      (4) We have performed additional replications of the western blot analysis for Rbm20 knock-out animals and provided the data in a new Figure.

      Recommendations for the authors:

      Reviewer #1:

      (1) The study is missing CLIP-seq data from control mice that do not express HA, or HA-knocked into a safe-harbor locus. This is important because there is plenty of background HA staining in Figure S2B, in wild-type mice. Including this control would allow subsequent peak calling to distinguish between non-specific HA peaks and RBM20 specific peaks.

      The biochemical conditions used in immunostaining are much less stringent than the buffers employed for immunoprecipitation in the eCLIP protocol. Thus, background staining is not an informative reference to assess specificity of CLIP isolations. In previous experiments, we confirmed very low background with the anti-HA antibodies in our eCLIP protocol. In the present study, we used a “no-crosslinking control” where samples were not irradiated with UV light. This negative control is now included in Supplementary Figure 4.

      (2) The GO analysis performed to infer synapse-gene specific regulation would be more useful if the authors would discuss specific genes that are represented within these terms and have been shown to be associated with neuronal function.

      We have now noted several synapse-related genes identified in the text.

      (3) Some figures would benefit from larger size and higher resolution including Fig S1, S3.

      We had previously embedded Figures as png files in the text document. In the revised version we uploaded the figures in higher resolution as individual jpeg files. Moreover, we now split Figure S1 into two separate supplementary figures (new Fig.S2) which allowed for enlarging the size of panels. We further enlarged the panels of (former) Fig.S3 (now Fig.S4).

      (4) RBP genes in Figure 1A x-axis are all lowercase. This is not standard mouse gene nomenclature.

      We corrected this.

      (5) Typo in Figure S4F rightmost panel y-axis - 'Length' is misspelled.

      We corrected this.

      Reviewer #2:

      Minor points:

      - Shortly explain DESEQ2 (p4)

      We have now added a brief note and the corresponding reference in the main text of the manuscript.

      - Is RBM20 a shuttling protein? Any detection in the cytoplasm?

      Our immunostainings for the endogenous RBM20 in heart and olfactory bulb cells suggest that the vast majority of wild-type RBM20 is localized to the nucleus. Previous work on RBM20 disease mutants suggests that pathological forms can accumulate in the cytoplasm. However, with the sensitivity of our detection we did not obtain evidence for a significant cytoplasmic pool in neurons. This does not exclude the possibility that the protein is shuttling, but assessing this would require different types of experiments.

      Reviewer #3:

      (1) Figure 1C: It is shown that some of the RBM20 staining do not colocalize with PV. This observation requires further explanation and discussion to clarify the significance.

      As seen in the fluorescent in situ hybridizations as well as the RiboTRap purifications (Fig.S1C,D), we observe Rbm20 mRNA expression not only in parvalbumin-positive interneurons but also in somatostatin-positive cells of the neocortex. Accordingly, some RBM20-positive cells do not express parvalbumin. We have now clarified this in the text.

      Additionally, in Figure S1C, the resolution of the image is low, making it difficult to conclusively determine whether RBM20 RNA is localized in the nucleus. A high-resolution image would be beneficial to address this ambiguity.

      The Rbm20 mRNA is localized in the nucleus and cytoplasm. We have now split Figure S1 into two separate figures to enlarge the panels for S1C and make this more visible. Moreover, we uploaded higher resolution figure files.

      (2) Figure 1E: The molecular weight of RBM20 is approximately 135 kDa, yet there is a band near 135 kDa in the KO heart. How do the authors determine that the 150 kDa band represents RBM20 rather than the 135 kDa band? The authors may consider increasing the sample size to confirm whether the smaller band consistently appears across all KO heart tissues.

      We appreciate that in this higher molecular weight range, the indicated weight markers may not be entirely accurate. We used a validated knock-out mouse line to identify the appropriate RBM20 protein band. As the 150 kDa band was reproducibly lost in the knock-out brain and heart tissue, whereas the fainter band of lower apparent molecular weight remained, we concluded that on our gel system RBM20 protein has an apparent molecular weight of 150 kDa. This is further supported by the fact that the endogenously tagged RBM20 protein also has a similar mobility.

      As suggested by the reviewer, we now re-ran Western blots from multiple wild-type and corresponding knock-out tissues. This further confirmed the migration of the protein and loss of the 150 kDa band in the mutant mice (new Figure 1E).

      (3) Figure 2A: A higher-resolution image is recommended. Prior studies on RBM20 mutation knock-in mice suggest that when RBM20 localizes to the cytoplasm, it promotes molecular condensate formation. This seems to be the case in Figure 2A; however, the low image quality makes it difficult to see these molecular condensates.

      Figure 2A shows endogenous RBM20 (not the epitope-tagged protein in the knock-in mice). The vast majority of the protein is localized in the nucleus rather than the cytoplasm. We are a bit uncertain which “condensates” the reviewer refers to. In the heart, we indeed see accumulations of RBM20 in foci (as described previously in the literature). As judged by their location within the DAPI-positive area, these foci are in the nucleus. By contrast, in the olfactory bulb neurons (which express lower levels of RBM20) we do not see a comparable concentration in nuclear foci but rather broad and diffuse staining. This is consistent with the hypothesis that the nuclear foci depend on the presence of highly expressed target transcripts such as titin. To better visualize this, we have now uploaded files with higher resolution for the revised manuscript.

      (4) Figure 4D: This figure is not cited in the main text and should be referenced appropriately.

      We corrected this.

      (5) Page 5: The sentence "Finally, introns bound by RBM20 were significantly longer than expected by chance as assed..." contains a typo. The word "assed" should be corrected to "assessed".

      We corrected this.

      (6) Functional data: The study would benefit from functional experiments to elucidate the physiological role of RBM20 in PV neurons. For instance, since RBM20 regulates calcium-handling genes in neurons, does its absence impair calcium signaling in PV neurons? Additionally, given that RBM20 is involved in synaptic regulation, could RBM20 KO disrupt synaptic function? While it may not be feasible to address all these questions, providing some functional data would greatly enhance the overall significance of the study.

      We completely agree with the reviewer that this would greatly advance the study and that the lack of data on cellular functions is the most significant limitation of this work. We attempted to obtain insights into cellular function through the structural investigations (Fig.S5). We had obtained some data on a behavioral phenotype in the mice which indicate that knock-out in vGLUT2 neurons precipitates alterations in behavior. However, due to conditions in our animal facility (emissions from construction) we struggled to solidify and confirm these data. Thus, in the interest of sharing the existing data in a timely manner, we felt that more elaborate functional studies on synaptic transmission or calcium imaging would be better performed in a separate effort.

    1. eLife Assessment

      This study presents a useful method based on flow cytometry to study partitioning noise during cell division. The evidence supporting the claims of the authors is incomplete, as the method neglects other sources of noise present in cells. With the theoretical part extended, this paper would be of interest to cell biologists and biophysicists working on asymmetric partitioning during cell division.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry-based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. This seems a large weakness because it is well known that fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. However, while I appreciate the overall goal and motivation of this work, I was not entirely convinced by the strength of this contribution. The approach focuses on a quite specific case, where the dynamics of the labelled component depend purely on partitioning. As such it seems incompatible with studying the partitioning noise of endogenous components that exhibit production/turnover. The description of the methods was partly hard to follow and should be improved. In addition, I have several technical comments, which I hope will be helpful to the authors.

      Comments:

      (1) In the theoretical model, copy numbers are considered to be conserved across generations. As a consequence, concentrations will decrease over generations due to dilution. While this consideration seems plausible for the considered experimental system, it seems incompatible with components that exhibit production and turnover dynamics. I am therefore wondering about the applicability/scope of the presented approach and to what extent it can be used to study partitioning noise for endogenous components. As presented, the approach seems to be limited to a fairly small class of experiments/situations.

      (2) Similar to the previous comment, I am wondering what would happen in situations where the generations could not be as clearly identified as in the presented experimental system (e.g., due to variability in cell-cycle length/stage). In this case, it seems to be challenging to identify generations using a Gaussian Mixture Model. Can the authors comment on how to deal with such situations? In the abstract, the authors motivate their work by arguing that detecting cell divisions from microscopy is difficult, but doesn't their flow cytometry-based approach have a similar problem?

      (3) I could not find any formal definition of division asymmetry. Since this is the most important quantity of this paper, it should be defined clearly.

      (4) The description of the model is unclear/imprecise in several parts. For instance, it seems to me that the index "i" does not really refer to a cell in the population, but rather a subpopulation of cells that has undergone a certain number of divisions. Furthermore, why is the argument of Equation 11 suddenly the fraction f as opposed to the component number? I strongly recommend carefully rewriting and streamlining the model description and clearly defining all quantities and how they relate to each other.

      (5) Similarly, I was not able to follow the logic of Section D. I recommend carefully rewriting this section to make the rationale, logic, and conclusions clear to the reader.

      (6) Much theoretical work has been done recently to couple cell-cycle variability to intracellular dynamics. While the authors neglect the latter for simplicity, it would be important to further discuss these approaches and why their simplified model is suitable for their particular experiments.

      (7) In the discussion the authors note that the microscopy-based estimates may lead to an overestimation of the fluctuations due to limited statistics. I could not follow that reasoning. Due to the gating in the flow cytometry measurements, I could imagine that the resulting populations are more stringently selected as compared to microscopy. Could that also be an explanation? More generally, it would be interesting to see how robust the results are in terms of different gating diameters.

      (8) It would be helpful to show flow cytometry plots including the identified subpopulations for all cell lines, currently, they are shown only for HCT116 cells. More generally, very little raw data is shown.

      (9) The title of the manuscript could be tailored more to the considered problem. At the moment it is very generic.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry-based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. This seems a large weakness because it is well known that fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise.

      We thank the Reviewer for her/his evaluation of our manuscript. The point raised is indeed a crucial one. In a cell division cycle, there are at least three distinct sources of noise that affect component numbers [1]:

      (1) Gene expression and degradation, which determine component number fluctuations during cell growth.

      (2) Variability in cell division time, which depending on the underlying model may or may not be a function of protein level and gene expression.

      (3) Noise in the partitioning/inheritance of components between mother and daughter cells.

      Our approach specifically addresses the latter, with the goal of providing a quantitative measure of this noise source. For this reason, in the present work, we consider homogeneous cancer cell populations that could be considered to be stationary from a population point-of-view. By tracking the time evolution of the distribution of tagged components via live fluorescent markers, we aim at isolating partitioning noise effects. However, as noted by the Reviewer, other sources of noise are present, and depending on the considered system the relative contributions of the different sources may change. Thus, we agree that a quantification of the effect of the various noise sources on the accuracy of our measurements will improve the reliability of our method. 

      In this respect, assuming independence between noise sources, we reasoned that variability in cell cycle length would affect the timing of population emergence but not the intrinsic properties of those populations (e.g., Gaussian variance). To test this hypothesis, we conducted a preliminary set of simulations in which cell division times were drawn from an Erlang distribution (mean = 18 h, k = 4). The results, showing the behavior of the mean and variance of the component distributions across generations, are presented in Author response image 1. Under the assumption of independence between different noise sources, no significant effects were observed. Next, we plan to quantify the accuracy of our measurements in the presence of crosstalk between the various noise sources. As suggested, we will update the manuscript to include a more complete discussion on this topic and an evaluation of our model’s stability.

      Author response image 1.

      Variance and mean of the distribution of fluorescence intensity as a function of the generation for time-course dynamics with cell-cycle length variability. We repeated the same simulations as in Figure 1 of the manuscript, but introduced a variable division time for each cell. The division time of each cell is drawn from an Erlang distribution (mean = 18 h and k = 4). As the plots show, the results of our theoretical framework are not affected by the introduction of this variability. Hence, the Gaussian Mixture Model is still able to give the correct results even in a noisy environment.
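
      The kind of simulation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the founder distribution (mean 1000, standard deviation 50), the Gaussian partition fraction with standard deviation `sigma_f`, the measurement time `t_end`, and all function names are assumptions introduced here. Only the Erlang division time (mean = 18 h, k = 4) comes from the response. Because the partition fraction is drawn independently of the division time, the per-generation mean should halve at each generation regardless of the timing noise.

```python
import random
import statistics
from collections import defaultdict


def simulate_population(n_founders=2000, t_end=48.0, mean_div=18.0, k=4,
                        sigma_f=0.05, seed=0):
    """Grow a population until t_end and group component amounts by generation.

    All parameter values except mean_div and k are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each stack entry: (birth time, generation, component amount).
    stack = [(0.0, 0, rng.gauss(1000.0, 50.0)) for _ in range(n_founders)]
    by_generation = defaultdict(list)
    while stack:
        birth, gen, amount = stack.pop()
        # Erlang(k) waiting time with the requested mean
        # (a gamma distribution with integer shape k and scale mean_div / k).
        division_time = birth + rng.gammavariate(k, mean_div / k)
        if division_time > t_end:
            # The cell is still alive at the measurement time: record it.
            by_generation[gen].append(amount)
            continue
        # Partition fraction drawn independently of the division time.
        f = min(max(rng.gauss(0.5, sigma_f), 0.0), 1.0)
        stack.append((division_time, gen + 1, amount * f))
        stack.append((division_time, gen + 1, amount * (1.0 - f)))
    return by_generation


def generation_stats(by_generation):
    """Return {generation: (mean, variance)} for generations with >= 2 cells."""
    return {g: (statistics.fmean(v), statistics.pvariance(v))
            for g, v in sorted(by_generation.items()) if len(v) >= 2}


stats = generation_stats(simulate_population())
for g, (mean, var) in stats.items():
    print(g, round(mean, 1), round(var, 1))
```

      Under these assumptions, timing variability changes how many cells of each generation are present at the measurement time, but not the mean or variance within a generation. Coupling `f` to the division time would be one way to probe the crosstalk case mentioned above.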

      (1) Soltani, Mohammad, et al. "Intercellular variability in protein levels from stochastic expression and noisy cell cycle processes." PLoS computational biology 12.8 (2016): e1004972.

      Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. However, while I appreciate the overall goal and motivation of this work, I was not entirely convinced by the strength of this contribution. The approach focuses on a quite specific case, where the dynamics of the labelled component depend purely on partitioning. As such it seems incompatible with studying the partitioning noise of endogenous components that exhibit production/turnover. The description of the methods was partly hard to follow and should be improved. In addition, I have several technical comments, which I hope will be helpful to the authors.

      We are grateful to the Reviewer for her/his comments. Indeed, both partitioning noise and production/turnover noise are in general fundamental processes. At present, the only way to consider them together is through time-consuming and costly transfection/microscopy/tracking experiments. In this work, we aimed at developing a method to effectively pinpoint the first component, i.e., partitioning noise; thus we opted to separate the two noise sources.

      Below, we provide a point-by-point response that we hope will clarify all raised concerns.

      Comments:

      (1) In the theoretical model, copy numbers are considered to be conserved across generations. As a consequence, concentrations will decrease over generations due to dilution. While this consideration seems plausible for the considered experimental system, it seems incompatible with components that exhibit production and turnover dynamics. I am therefore wondering about the applicability/scope of the presented approach and to what extent it can be used to study partitioning noise for endogenous components. As presented, the approach seems to be limited to a fairly small class of experiments/situations.

      We see the Reviewer's point. Indeed, we are proposing a high-throughput and robust procedure to measure the partitioning/inheritance noise of cell components through flow cytometry time courses. By using live-cell staining of cellular compounds, we can track the effect of partitioning noise on the fluorescence intensity distribution across successive generations. This specific procedure is purposely optimized to isolate partitioning noise from other sources and, as it is, cannot track endogenous components or dyes that require fixation. While this certainly poses limits to the proposed approach, there are numerous contexts in which our methodology could be used to explore the role of asymmetric inheritance. Among others, (i) investigating how specific organelles are differentially partitioned and how this influences cellular behavior could provide deeper insights into fundamental biological processes: asymmetric segregation of organelles is a key factor in cell differentiation, aging, and stress response. During cell division, organelles such as mitochondria, the endoplasmic reticulum, lysosomes, peroxisomes, and centrosomes can be unequally distributed between daughter cells, leading to functional differences that influence their fate. For instance, Katajisto et al. [1] proposed that asymmetric division of mitochondria in stem cells is associated with the retention of stemness traits in one daughter cell and differentiation in the other. As organisms age, stem cells accumulate damage, and to prevent exhaustion and compromised tissue function, cells may use asymmetric inheritance to segregate older or damaged subcellular components into one daughter cell. (ii) Asymmetric division has also been linked to therapeutic resistance in cancer stem cells [2]. Although the functional consequences are not yet fully determined, the asymmetric inheritance of mitochondria is recognized as playing a pivotal role [3].

      Another potential application of our methodology may be (iii) the inheritance of lysosomes, which, together with mitochondria, appears to play a crucial role in determining the fate of human blood stem cells [4]. Furthermore, similar to studies conducted on liquid tumors [5][6], our approach could be extended to investigate cell growth dynamics and the origins of cell size homeostasis in adherent cells [7][8][9]. These case studies can be readily addressed with our approach, which is applicable whenever live-cell dyes can be used. We will add a discussion of the strengths and limitations of the method in the Discussion section of the revised version of the manuscript.

      (1) Katajisto, Pekka, et al. "Asymmetric apportioning of aged mitochondria between daughter cells is required for stemness." Science 348.6232 (2015): 340-343.

      (2) Hitomi, Masahiro, et al. "Asymmetric cell division promotes therapeutic resistance in glioblastoma stem cells." JCI insight 6.3 (2021): e130510.

      (3) García-Heredia, José Manuel, and Amancio Carnero. "Role of mitochondria in cancer stem cell resistance." Cells 9.7 (2020): 1693.

      (4) Loeffler, Dirk, et al. "Asymmetric organelle inheritance predicts human blood stem cell fate." Blood, The Journal of the American Society of Hematology 139.13 (2022): 2011-2023.

      (5) Miotto, Mattia, et al. "Determining cancer cells division strategy." arXiv preprint arXiv:2306.10905 (2023).

      (6) Miotto, Mattia, et al. "A size-dependent division strategy accounts for leukemia cell size heterogeneity." Communications Physics 7.1 (2024): 248.

      (7) Kussell, Edo, and Stanislas Leibler. "Phenotypic diversity, population growth, and information in fluctuating environments." Science 309.5743 (2005): 2075-2078.

      (8) McGranahan, Nicholas, and Charles Swanton. "Clonal heterogeneity and tumor evolution: past, present, and the future." Cell 168.4 (2017): 613-628.

      (9) De Martino, Andrea, Thomas Gueudré, and Mattia Miotto. "Exploration-exploitation tradeoffs dictate the optimal distributions of phenotypes for populations subject to fitness fluctuations." Physical Review E 99.1 (2019): 012417.

      (2) Similar to the previous comment, I am wondering what would happen in situations where the generations could not be as clearly identified as in the presented experimental system (e.g., due to variability in cell-cycle length/stage). In this case, it seems to be challenging to identify generations using a Gaussian Mixture Model. Can the authors comment on how to deal with such situations? In the abstract, the authors motivate their work by arguing that detecting cell divisions from microscopy is difficult, but doesn't their flow cytometry-based approach have a similar problem?

      The point raised is an important one, as it highlights the fundamental role of the gating strategy. The ability to identify the distribution of different generations using the Gaussian Mixture Model (GMM) strongly depends on the degree of overlap between distributions. The more the distributions overlap, the less capable we are of accurately separating them.

      The extent of overlap is influenced by the coefficients of variation (CV) of both the partitioning distribution function and the initial component distribution. Specifically, the component distribution at time t results from the convolution of the component distribution itself at time t−1 and the partitioning distribution function. Therefore, starting with a narrow initial component distribution allows for better separation of the generation peaks. The balance between partitioning asymmetry and the width of the initial component distribution is thus crucial.

      As shown in Author response image 2, increasing the CV of either distribution reduces the ability to distinguish between different generations.

      Author response image 2.

      Component distributions at varying CVs of the initial component distribution and of the partitioning distribution. Starting from a condition in which both the division asymmetry and the width of the initial component distribution are low and different generations are clearly separable, increasing either CV leads to mixing of the distributions and greater reconstruction difficulty.
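
      The dependence of peak overlap on the two CVs can be illustrated with a short calculation. The sketch below assumes a purely multiplicative model that is not spelled out in the response: a cell of generation g carries X_g = X_0 · f_1 · … · f_g, with independent partition fractions of mean 1/2 and coefficient of variation `cv_f`, and an initial distribution with coefficient of variation `cv0`. Under that assumption, CV_g² = (1 + cv0²)(1 + cv_f²)^g − 1. The separation score and the function names are hypothetical and introduced only for illustration.

```python
import math


def generation_cv(cv0, cv_f, g):
    """CV of the component amount after g divisions (multiplicative model)."""
    return math.sqrt((1.0 + cv0 ** 2) * (1.0 + cv_f ** 2) ** g - 1.0)


def peak_separation(cv0, cv_f, g, mean0=1.0):
    """Distance between the peaks of generations g and g + 1, in units of
    the sum of their standard deviations (larger means easier to separate)."""
    mean_g = mean0 / 2 ** g
    mean_next = mean0 / 2 ** (g + 1)
    sd_g = mean_g * generation_cv(cv0, cv_f, g)
    sd_next = mean_next * generation_cv(cv0, cv_f, g + 1)
    if sd_g + sd_next == 0.0:
        return math.inf  # no spread at all: peaks are perfectly separated
    return (mean_g - mean_next) / (sd_g + sd_next)


# Compare a narrow and a wide initial gate, and a low and a high asymmetry.
for cv0, cv_f in [(0.05, 0.05), (0.20, 0.05), (0.05, 0.20)]:
    scores = [round(peak_separation(cv0, cv_f, g), 2) for g in range(4)]
    print(f"cv0={cv0}, cv_f={cv_f}: separation per generation {scores}")
```

      In this simplified picture, widening either the initial gate or the partitioning distribution lowers the separation score at every generation, which is the trade-off described in the text.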

      However, the variance of the initial distribution cannot be reduced arbitrarily. While selecting a narrow distribution facilitates a better reconstruction of the distributions, it simultaneously limits the number of cells available for the experiment. Therefore, for components exhibiting a high level of asymmetry, further narrowing of the initial distribution becomes experimentally impractical.

      In such cases, an approach previously tested on liquid tumors [1] involves applying the Gaussian Mixture Model (GMM) in two dimensions by co-staining another cellular component with lower division asymmetry.

      Regarding time-lapse fluorescence microscopy, the main challenge lies not in disentangling the interplay of different noise sources, but rather in obtaining sufficient statistical power from experimental data. While microscopy provides detailed insights into the division process and component partitioning, its low throughput limits large-scale statistical analyses. Current segmentation algorithms still perform poorly in crowded environments and with complex cell shapes, requiring a substantial portion of the image analysis pipeline to be performed manually, a process that is time-consuming and difficult to scale. In contrast, our cytometry-based approach bypasses this analysis bottleneck, as it enables a direct population-wide measurement of the system's evolution. We will provide a detailed discussion on these aspects in the revised version of the manuscript.

      (1) Peruzzi, Giovanna, et al. "Asymmetric binomial statistics explains organelle partitioning variance in cancer cell proliferation." Communications Physics 4.1 (2021): 188.

      (3) I could not find any formal definition of division asymmetry. Since this is the most important quantity of this paper, it should be defined clearly.

      We thank the Reviewer for the note. By division asymmetry we refer to a quantity that reflects how similar two daughter cells are likely to be in terms of inherited components after a division process. We opted to measure it via the coefficient of variation (the square root of the variance divided by the mean) of the partitioning fraction distribution. We will add this definition to the revised version of the manuscript.
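
      As a concrete reading of this definition, the coefficient of variation of the partitioning fraction can be computed from paired daughter measurements, for example from time-lapse microscopy. The sketch below is illustrative only; the function names and the example intensities are made up, and counting both daughters of each pair (so that the mean fraction is 1/2 by construction) is an assumption made here.

```python
import statistics


def partition_fractions(daughter_pairs):
    """Fraction of the mother's content inherited by each daughter."""
    fractions = []
    for a, b in daughter_pairs:
        total = a + b
        if total <= 0:
            continue  # skip pairs without measurable signal
        fractions.append(a / total)
        fractions.append(b / total)
    return fractions


def division_asymmetry(daughter_pairs):
    """CV of the partition fraction: sqrt(variance) divided by the mean."""
    fractions = partition_fractions(daughter_pairs)
    return statistics.pstdev(fractions) / statistics.fmean(fractions)


# Hypothetical daughter intensities (arbitrary units) for three divisions.
pairs = [(50.0, 50.0), (60.0, 40.0), (45.0, 55.0)]
print(round(division_asymmetry(pairs), 3))  # approximately 0.129
```

      A perfectly symmetric division gives a value of 0, and larger values indicate that the two daughters tend to inherit more unequal shares.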

      (4) The description of the model is unclear/imprecise in several parts. For instance, it seems to me that the index "i" does not really refer to a cell in the population, but rather a subpopulation of cells that has undergone a certain number of divisions. Furthermore, why is the argument of Equation 11 suddenly the fraction f as opposed to the component number? I strongly recommend carefully rewriting and streamlining the model description and clearly defining all quantities and how they relate to each other.

      We are amending the text carefully to avoid double naming of variables and to clarify each step of the computation. In Equation 11 the variable f refers to the fluorescence intensity, but the notation will be changed to increase clarity.

      (5) Similarly, I was not able to follow the logic of Section D. I recommend carefully rewriting this section to make the rationale, logic, and conclusions clear to the reader.

      We will update the manuscript to clarify the scope of Section D and its results. In brief, Section A presents a general model to derive the variance of the partitioning distribution from flow cytometry time-course data without making any assumptions about the shape of the distribution itself. In Section D, our goal is to interpret the origin of asymmetry and propose a possible form for the partitioning distribution. Since the dyes used bind non-specifically to cytoplasmic amines, the tagged proteins are expected to be uniformly distributed throughout the cytoplasm and present in large numbers. Given these assumptions, the least complex model for division follows the binomial distribution, with a parameter that measures the bias in the process. Therefore, we performed a computation similar to that in Section A, which allows us to estimate not only the variance but also the degree of biased asymmetry. Finally, we fitted the data to this new model and proposed an experimental interpretation of the results.
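
      To illustrate why a bias parameter matters when components are numerous, consider a generic biased binomial model (a sketch under stated assumptions, not necessarily the exact form used in Section D): each of N components goes to one daughter with probability p, and the daughter that is measured is chosen at random. The inherited fraction then has mean 1/2 and variance p(1 − p)/N + (p − 1/2)². The function names and parameter values below are hypothetical.

```python
import math
import random
import statistics


def fraction_variance(n_components, p):
    """Variance of the inherited fraction for a randomly chosen daughter."""
    return p * (1.0 - p) / n_components + (p - 0.5) ** 2


def fraction_cv(n_components, p):
    """Coefficient of variation of the inherited fraction (its mean is 1/2)."""
    return math.sqrt(fraction_variance(n_components, p)) / 0.5


def simulate_fractions(n_components, p, n_divisions, seed=0):
    """Monte Carlo check: biased binomial split, random daughter labelling."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_divisions):
        inherited = sum(rng.random() < p for _ in range(n_components))
        f = inherited / n_components
        # Pick one of the two daughters with equal probability.
        fractions.append(f if rng.random() < 0.5 else 1.0 - f)
    return fractions


simulated = simulate_fractions(n_components=200, p=0.55, n_divisions=2000)
print("simulated variance:", round(statistics.pvariance(simulated), 5))
print("predicted variance:", round(fraction_variance(200, 0.55), 5))
print("CV for N = 10**6, p = 0.55:", round(fraction_cv(10 ** 6, 0.55), 4))
```

      In this model the binomial term vanishes as N grows, so for abundant components any remaining spread in the fraction is set almost entirely by the bias term (p − 1/2)².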

      (6) Much theoretical work has been done recently to couple cell-cycle variability to intracellular dynamics. While the authors neglect the latter for simplicity, it would be important to further discuss these approaches and why their simplified model is suitable for their particular experiments.

      We agree with the Reviewer and will discuss this aspect in the revised version of the manuscript.

      (7) In the discussion the authors note that the microscopy-based estimates may lead to an overestimation of the fluctuations due to limited statistics. I could not follow that reasoning. Due to the gating in the flow cytometry measurements, I could imagine that the resulting populations are more stringently selected as compared to microscopy. Could that also be an explanation? More generally, it would be interesting to see how robust the results are in terms of different gating diameters.

      The Reviewer is right on the importance of the sorting procedure. As already discussed in a previous point, the gating strategy we employed plays a fundamental role: it reduces the overlap of fluorescence distributions as generations progress, enables the selection of an initial distribution distinct from the fluorescence background, allowing for longer tracking of proliferation, and synchronizes the initial population. The narrower the initial distribution, the more separated the peaks of different generations will be. However, this also results in a smaller number of cells available for the experiment, requiring a careful balance between precision and experimental feasibility. A similar procedure, although it would certainly limit the estimation error, would be impracticable In the case of microscopy. Indeed, the primary limitation and source of error is the number of recorded events. Our pipeline allowed us to track on the order of hundreds of division dynamics, but the analysis time scales non-linearly with the number of events. Significantly increasing the dataset would have been extremely time-consuming. Reducing the analysis to cells with similar fluorescence, although theoretically true, would have reduced the statistics to a level where the sampling error would drastically dominate the measure. Moreover, different experiments would have been hardly comparable, since different fluorescences could map in equally sized cells. In light of these factors, we expect higher CV for the microscopy measure than for flow cytometry’s ones.  In the plots below, we show the behaviour of the mean and the standard deviation of N numbers sampled from a gaussian distribution N(0,1) as a function of the sampling number N. The higher is N the closer the sampled distribution will be to the true one. The region in the hundreds of samples is still very noisy, but to do much better we would have to reach the order of thousands. 
      We will add a discussion of these aspects in the revised version of the manuscript.

      Author response image 3.

      Standard deviation and mean value of a set of points sampled from a Gaussian distribution with mean 0 and standard deviation 1, versus the number of samples, N. Increasing N leads to a closer approximation of the expected values. The Microscopy Working Region (Microscopy WR), highlighted in orange, corresponds to the number of samples we are able to reach with microscopy experiments. The region highlighted in yellow is the one we would have to reach to lower the estimation error, although doing so is very expensive in terms of analysis time.
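
      The sampling argument illustrated in this image can be reproduced numerically. The following is a minimal sketch, not the code used to produce the response image: the function names, the seed, and the number of repeated experiments are hypothetical choices made for illustration. It draws repeated samples of size N from N(0,1) and measures how much the estimated standard deviation scatters from one experiment to the next, alongside the standard large-sample approximation 1/sqrt(2(N-1)) for Gaussian data with unit standard deviation.

```python
import math
import random


def sample_std(values):
    """Unbiased-variance sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def spread_of_std_estimate(n, repeats=200, seed=0):
    """Scatter of the estimated standard deviation across `repeats`
    independent experiments, each with n draws from N(0, 1).
    `repeats` and `seed` are illustrative assumptions."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(sample_std(draws))
    return sample_std(estimates)


def approx_std_error(n):
    """Large-sample approximation for the standard error of the
    sample standard deviation of Gaussian data with sigma = 1."""
    return 1.0 / math.sqrt(2.0 * (n - 1))


if __name__ == "__main__":
    # Hundreds of samples (microscopy-like) versus thousands.
    for n in (100, 300, 1000, 3000):
        print(n, round(spread_of_std_estimate(n), 4), round(approx_std_error(n), 4))
```

      Under these assumptions, the scatter of the estimate shrinks roughly as the inverse square root of N, so moving from hundreds to thousands of events reduces the error by about a factor of three, which is consistent with the qualitative point made in the response.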

      (8) It would be helpful to show flow cytometry plots including the identified subpopulations for all cell lines, currently, they are shown only for HCT116 cells. More generally, very little raw data is shown.

      We will provide the requested plots for the other cell lines together with additional raw data coming from simulations in the Supplementary Material. 

      (9) The title of the manuscript could be tailored more to the considered problem. At the moment it is very generic.

      We see the Reviewer's point. The proposed title aims to convey the wide applicability of the presented approach, which ultimately allows for the assessment of fluctuations in the levels of cellular components at division. This in turn reflects the asymmetry of the division.

    1. eLife Assessment

      This study presents valuable findings suggesting that the late maturation of prefrontal cortex-based control processes enhances conceptual learning by allowing a period of less-constrained knowledge acquisition. The authors provide convincing computational evidence that delayed semantic control promotes learning without compromising representation integrity, with the strongest benefits emerging when control connections target intermediate layers of the model. However, the model's narrow scope raises concerns about scalability to more complex, real-world learning environments, and the meta-analysis, while supporting the developmental trajectory, does not directly test the model's specific predictions regarding task outcomes or error patterns.

    2. Reviewer #1 (Public review):

      Summary:

      This study was motivated by the general claim that delayed development of cognitive control can be beneficial for learning, and investigated this claim in the specific domain of conceptual development. A comprehensive set of computational model simulations showed that delaying the onset of semantic control produces faster learning with only minimal effects on conceptual abstraction. The simulations also showed that control was most effective at intermediate levels between modality-specific "spokes" and the multimodal "hub". A meta-analysis of developmental data was consistent with the claim of delayed onset of semantic control: young children show substantially better semantic knowledge than the ability to constrain that knowledge to a specific task at hand.

      Strengths:

      The computational modelling is based on a very well-established model of semantic cognition, which means that the simulations allow exploring the specific issues under investigation here in the context of a model that accounts for a very large set of semantic cognition phenomena. The simulations are comprehensive - manipulating different parameters of the model provides important insights into how (and why) it works.

      In addition to simulations exploring delayed maturation, there is an exploration of where semantic control is most effective, yielding the interesting result that control is most effective when it targets intermediate levels of semantic processing. To my knowledge, this is a novel finding and a concrete prediction for future testing.

      The meta-analysis is designed in a very clever way that allows extracting evidence of semantic control from a large body of prior work. The results are quite clear and compelling in showing that semantic knowledge is acquired before children are able to use task demands to constrain the use of that knowledge.

      Weaknesses:

      Computational models of cognition inherently require simplification in order to focus on the mechanisms under investigation. However, it is also important to keep these simplifications in mind because they limit the generality of the inferences that can be made from the simulation results. Two aspects are important in this context:

      (1) The multimodal structure was orthogonal to the surface similarity structure of the concepts to be learned. It is certainly true that multimodal structure does not perfectly mirror surface similarity, but closely related things tend to be perceptually similar. There are exceptions (whales, penguins, etc.), but they are *exceptional*, not typical. It may be that the somewhat extreme dissociation of multimodal and surface similarity structures creates demands that are not faced in natural conceptual development.

      (2) Much of the benefit of delayed semantic control seems to be because the model is not penalised for activating task-irrelevant features. This blurs the distinction between being aware of a feature and making a response based on that feature. A full model that also includes a response layer could become a lot more complicated and more difficult to understand, so maybe there is an advantage to using a simpler architecture.

      In addition, there is a bit of a misalignment between the model simulations and the meta-analysis. In the model, there are distinct modality-specific "spokes" and control is required in order to focus on modality/spoke in a task-appropriate way. The meta-analysis does not compare a task-defined selection of a modality; it compares the selection of taxonomic vs thematic relations, both of which are multimodal. One way to resolve this is to say that taxonomic and thematic relations are also represented in distinct sub-systems of semantic knowledge and semantic control is needed to select between them in a task-appropriate way.

      This is particularly relevant to the inference at the bottom of p. 38: "taxonomic and thematic relationships ...[are]... both being encoded within the same system of representation", which seems in direct contradiction to the present results, or at least to the logic of combining these simulations with this meta-analysis. The simulations are based on semantic control being used to select/constrain the correct distinct sub-system (modality-specific spoke); the meta-analysis is based on semantic control being used to select/constrain the correct relationship type. If these two things are analogous in some way, then the relationship type has to be something like a distinct sub-system.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates the idea that the protracted maturation of the prefrontal cortex - often viewed as a developmental limitation - may actually confer advantages for conceptual learning in children. The authors focus on semantic control processes, which govern the context-sensitive application of conceptual knowledge, and are closely associated with late-developing regions of the prefrontal cortex.

      Drawing on a computational model, the paper formally tests whether delayed maturation of semantic control promotes the acquisition of conceptual knowledge. The simulations demonstrate that when semantic control and anatomical connectivity mature later, conceptual learning is accelerated without compromising the integrity of the learned representations. Notably, the benefit is most apparent when control connections target intermediate layers in the computational model, suggesting a nuanced interplay between control processes and the underlying conceptual network.

      To validate these computational insights in a human developmental context, the authors conduct a meta-analysis of the classic triadic matching task - a paradigm where participants decide which of two choices best matches a reference concept based on either taxonomic or thematic relations. Critically, when these relations conflict, semantic control is required to select the context-appropriate match. Results indicate that context-sensitive semantic control develops more slowly than basic conceptual knowledge, showing marked improvements between 3 and 6 years of age.

      Overall, the paper argues that the delayed development of prefrontal cortex-based control processes allows for a period of less constrained learning, ultimately enhancing conceptual acquisition. The findings challenge the traditional view of late PFC maturation as solely disadvantageous and instead position it as an adaptive feature for building robust conceptual frameworks in early childhood.

      Strengths:

      (1) Novel Theoretical Contribution
      The paper offers a compelling, counterintuitive argument that a developmental lag in the maturation of control processes might be beneficial for semantic learning. This stands in contrast to the conventional framing of late prefrontal cortex (PFC) development as purely disadvantageous (e.g., a "necessary but unfortunate" constraint).

      (2) Well-Grounded Computational Approach
      The authors propose a neural network model that is both theoretically driven (hub-and-spoke framework) and systematically tested under various conditions (different timelines for control onset, and different connectivity patterns). Their simulations replicate and extend previous findings about how insulating the multimodal hub from direct control inputs helps preserve abstract conceptual representations.

      (3) Neuro-anatomical basis
      The paper connects its computational claims to empirical neuroanatomy, particularly the lack of direct structural connectivity between ventral ATL (the "hub") and the PFC in humans. This lends biological plausibility to the argument that control signals likely reach the ATL via intermediate regions (e.g., posterior temporal cortex).

      (4) Meta-Analysis of Triadic Match-to-Sample
      The authors leverage decades of developmental data on conceptual matching tasks, reframing them in terms of semantic control vs. semantic representation. Their analysis nicely illustrates that children can identify semantic relationships (taxonomic or thematic) at age 2 if the task does not require them to select between conflicting semantic relations. In contrast, the ability to choose a task-relevant relation only emerges more robustly between 3 and 6 years of age. This developmental pattern aligns with the computational model's predictions.

      Weaknesses:

      The contribution of the paper might be considered rather specialist and might not appeal to the broad readership typical of a generalist journal. Moreover, the scope of the model is fairly narrow - its relatively small, controlled training environment raises questions about scalability to more naturalistic, high-dimensional data. Finally, the meta-analysis does not directly test the model's predictions in terms of specific task outcomes, error patterns, or model fit, but only the developmental pattern, which was an already observed phenomenon that in part motivated the hypothesis and the model itself.

    4. Author response:

      On the control of taxonomic versus thematic information. Both reviewers had questions about the relationship between the focus of the meta-analysis, the control of responses based on taxonomic versus thematic relationships, and the simulation. Both the model and the meta-analysis focus on the same mechanism, the controlled selection of task-appropriate features. In the case of the meta-analysis, these were the features and associations needed to identify the taxonomic or thematic relationships. As reviewer 1 notes, one possibility is that these kinds of structures are represented in distinct cortical regions. For instance, Mirman, Schwartz and colleagues have suggested that temporoparietal regions may preferentially support thematic knowledge while temporal regions may preferentially support taxonomic knowledge. Alternatively, they may be supported by different features instantiated within the same regions. However, whether or not taxonomic and thematic relationships require access to features in different regions is not crucial to the conclusions of this paper. The simulations used here happen to select features based on their inclusion in a particular sensory modality, yet they could learn to select any combination of features. Indeed, prior simulations using the Jackson et al. (2021) model show that the functional impact on learning of “deep” conceptual representations (together with controlled behaviours) is the same regardless of whether the potentiated features are localised within one spoke or distributed across spokes. Thus, the key results regarding the acquisition of semantic knowledge before the maturation of control in the current work should hold regardless of whether knowledge of taxonomic and thematic relations is localised to different anatomical regions.

      On model size and scalability. Both reviewers noted the relatively small size of the model and wondered about implications for ecological validity of the simulations and scalability to larger, noisier, and potentially more systematically structured training environments. We agree this is an important direction for future research, but one that faces two nontrivial challenges. First, reviewer 1 notes that, whereas our model environment employs orthogonal structures across spokes and for the cross-modal features, perceptual structure may be better aligned with conceptual structure in real-world experience. While we appreciate the intuition, its validity depends to a large extent on how visual information about objects is encoded. Conceptual structure is certainly not apparent, for instance, in the distance between bitmap images of objects, nor in the overlap of the outputs of simple feature-extraction algorithms (such as edge detection or Fourier decomposition). Even in this age of deep vision models, it remains unclear how the visual system extracts and discerns perceptual similarity from retinal input (see e.g. Mukherjee & Rogers, 2025). Most successful contemporary models train neural networks to assign visual images to semantic categories, suggesting that the visual features the model learns, and thus the perceptual similarities it represents, depend on learning to generate semantic information. Therefore, it is not clear whether the similarity that people perceive amongst instances of the same class is natively apparent in the bottom-up visual input, or whether it depends on semantic/cross-modal learning and representation. It should also be noted that within our training environment, there are features in each modality that are predictive of features in other modalities, as well as some that are only predictive of features within the same modality. Thus, the full cross-modality conceptual structure is not orthogonal to the information available in each sensory domain; instead, there is a relationship between surface and multimodal similarity in the dataset, as in the real-world environment. In general, one virtue of the small-scale modelling endeavour in the current work is that we can be very explicit about the nature of the structure apparent within and across spokes.

      The second non-trivial issue concerns the nature of the mechanisms that allow for context-sensitive responding in large-scale language/vision models such as GPT 4. Such models are trained on web-scale language and vision and provide a means of simulating controlled behaviour with realistic stimuli, so might seem to provide a means of assessing scalability of current neuro-cognitive models. Large language/vision models rely, however, on transformer architectures whose relationship to hypothesized mechanisms of control in the mind and brain is unclear. In transformers, context-sensitive responding depends upon “attention” mechanisms that are fully distributed and integrated throughout the entire system—there is no distinction between control, representation, and short-term memory in the architecture. As a consequence, it is very difficult to understand why a model behaves the way it does, or to relate patterns of behaviour to hypothesised mechanisms in the human mind/brain. Yet transformers are currently the only models capable of exhibiting context-sensitive patterns of responding based on both language and vision. Scaling up neuro-cognitive models will require developing alternative architectures that preserve the critical hypothesised distinctions between representation and control while retaining the ability of transformers to learn from large-scale ecologically realistic corpora of language and images. In the meantime, small-scale simulations like those reported here provide some critical insights into aspects of architecture and maturation that may aid in this endeavour.

      On including a response layer. Reviewer 1 notes that our model does not separately simulate response-generation and the selective activation of relevant feature representations. We agree that there are interesting questions about how feature-potentiation and response-generation relate to one another, and that incorporating response selection in the current model would significantly complicate the analysis. The general idea that control potentiates/suppresses task-relevant feature representations in addition to simply promoting the correct response derives from classic work by Martin and others (e.g., Martin et al., 1995) showing that, for instance, regions involved in colour perception activate more strongly in tasks requiring retrieval of colour than tasks involving retrieval of action and vice versa—results consistent with the model training/testing procedure in the current work. In general, it may be counterproductive to become aware of aspects of a concept that would be irrelevant, or even actively unhelpful in making a response, suggesting guided activation is a necessary precursor to response selection (Botvinick & Cohen, 2014). Here, we focus on this important feature potentiation step.

      On the novelty of the meta-analysis. Reviewer 2 suggests the results of the meta-analysis were already known and provided motivation for the simulation. However, an important contribution of the current work is the observation that, in fact, there is little prior work on the development of semantic control. The widely known developmental delay in domain-general executive control, which did indeed motivate the study, is exclusively based on tasks requiring very different forms of executive control. Many of these involve no meaningful stimuli or require the child to completely inhibit a practiced response and generate an opposite or completely arbitrary response, instead of requiring the child to use context to select among two or more meaningful behaviours that are equally valid in different contexts (see the introduction to Part 2). This observation, coupled with recent evidence that semantic control relies on dedicated neural systems that only partially overlap with those supporting executive function, illustrates the utility of the current meta-analysis: delineating the developmental trajectory of semantic control requires a task in which control is applied to the context-appropriate retrieval and manipulation of semantic knowledge, such as the triadic matching task. Moreover, the results show that semantic control, while arising later than semantic representation, nevertheless begins to mature earlier (around 2.5 years) than typical estimates for domain-general executive control (around 4 years). Thus, the meta-analysis contributes to our understanding of cognitive development while also testing a key prediction of the model.

    1. eLife Assessment

      The study presents valuable findings regarding the incidence and clinical impact of a mutation in a cardiac muscle protein and its association with the development of atrial fibrillation. The authors provide some convincing evidence of electrophysiological disturbances in cells with this mutation which would be of interest to cellular electrophysiologists. However, evidence supporting the conclusion that this mutation causes atrial fibrillation would benefit from more rigorous electrophysiologic approaches.

    2. Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses:

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: Additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs to further validate the proposed mechanism.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidence and clinical impact of missense variants in the Titin (TTN) gene in this population. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (Iks), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the Iks current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency", yet it is unclear what they are referring to by "arrhythmic frequency".

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased . . . suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is separately interesting, the paper comes across as too descriptive because there is no merging of the data to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is definitely not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence at all. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written and there is a wide range of experimental techniques to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is extremely important to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is actually associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      (2) All of the contractility and electrophysiologic data should be acquired with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a faster rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the calcium clock and calcium loading of the SR. See #2 above for suggestions on how to adequately probe calcium handling. Such an investigation into impulse initiation mechanisms would be very powerful in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, the general concept of linking shortened APD in isolated cells alone to AF causation is more problematic. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not really at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses:

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.  

      We thank the reviewer for their insightful comment. Our study identified deleterious missense variants using a stringent REVEL score threshold of ≥0.7; however, variants with a REVEL score above 0.5 are generally considered potentially pathogenic (Ioannidis, Nilah M., et al., Am J Human Genetics 2016; 99(4): 877-885). The TTN-T32756I variant (REVEL Score: 0.58758, Supplementary Table 1) was prioritized due to its occurrence in multiple unrelated individuals within our clinical AF cohort, despite no reported family history of AF in affected individuals. While no parental inheritance was observed, the possibility of a de novo origin cannot be excluded. Furthermore, this variant is located within a region overlapping a deletion mutation recently shown to cause AF in a zebrafish model (Jiang et al., iScience, 2024;27(7):110395), supporting its potential pathogenicity. Notably, the affected individuals did not carry additional loss-of-function TTN variants. We will clarify these points in the revised manuscript.

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      We acknowledge the reviewer’s concern that patient-specific iPSC lines could further validate our findings. However, because peripheral blood mononuclear cells (PBMCs) from the patients were unavailable, we utilized a healthy iPSC line and introduced the TTN-T32756I variant using CRISPR/Cas9 genome editing. This approach ensures an isogenic background, thereby minimizing genetic variability and providing a controlled system to study the direct effects of the mutation. We will acknowledge this limitation in the revised manuscript.

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      We agree that hypertension is a common comorbidity in patients with AF and could contribute to disease progression. However, all three individuals carrying TTN-T32756I exhibited early-onset AF (onset before 66 years), with one case occurring as early as 36 years. This suggests a potential two-hit mechanism, where genetic predisposition and comorbidities influence disease risk. Importantly, our iPSC model isolates the genetic effects of TTN-T32756I from other factors, supporting a direct pathogenic role. We will explicitly discuss this in the revised manuscript.

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      We appreciate the reviewer’s suggestion and agree that additional immunostaining data would strengthen the evidence for FHL2 colocalization with the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs. We will work on obtaining these additional data to validate our mechanistic findings further.

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: To further validate the proposed mechanism, additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      We agree with the reviewer that additional functional assays would further validate the proposed mechanism. We will perform contractility and electrophysiological experiments, such as multielectrode array (MEA) assays, to better characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidences and clinical impact of missense variants in the Titin (TTN) gene in this population. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that Four-and-a-Half LIM domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (Iks), which is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the Iks current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      We acknowledge the limitation of not including a non-AF group in our clinical analysis. Our cohort is derived from a single-center registry of individuals with AF, and we do not have a matched cohort of non-AF controls to compare the incidence of TTN missense variants. We recognize this as a limitation and will clarify that further studies are needed to define the prevalence of TTN missense variants in broader, multiethnic cohorts that include both AF and non-AF individuals.

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report that "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency," yet it is unclear what they are referring to by "arrhythmic frequency."

      We appreciate the reviewer’s request for clarification regarding "arrhythmic frequency." In our study, this term refers to the increased spontaneous beating rate and irregular action potentials observed in TTN-T32756I iPSC-aCMs compared to WT. Our findings suggest that the AF-associated TTN-T32756I variant induces ion channel remodeling and beating abnormalities, possibly contributing to an arrhythmogenic substrate for AF. We will refine our wording in the revised manuscript to enhance clarity and precision.

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased … suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

      We thank the reviewer for pointing this out and apologize for the inconsistency. We intended to report on contraction duration and relaxation rather than contraction force alone. The increased contraction amplitude reflects altered contractile force, whereas the reduced contraction duration and impaired relaxation indicate dysfunctional contractile dynamics. We will revise the text and corresponding figures to convey these findings accurately.

      Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is interesting, the paper comes across as too descriptive because there is no data merging to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written, and a wide range of experimental techniques are used to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is essential to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      We recognize that AF and HFpEF frequently coexist and that HFpEF-related atrial remodeling could contribute to AF development. The primary aim of our cohort analysis was to explore the potential clinical significance of TTNmv. While we acknowledge the inherent limitations of retrospective observational data in establishing causality, our subsequent in vitro experiments were designed to demonstrate that TTNmv can alter the electrophysiological substrate, potentially predisposing individuals to AF.

      As HFpEF is a potential confounder, it is reasonable to consider whether TTNmv may also be associated with HFpEF. However, to our knowledge, no existing literature directly links TTNmv to HFpEF. In contrast, loss-of-function TTN variants are typically associated with heart failure with reduced ejection fraction (HFrEF) and dilated cardiomyopathy, and even their role in HFrEF remains controversial. To address potential confounding, our multivariable analysis for clinical outcomes was adjusted for reduced ejection fraction, and we conducted a sensitivity analysis excluding patients with nonischemic dilated cardiomyopathy (Supplementary Table 6). We will clarify these points in the revised manuscript.

      (2) All contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a quicker rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      We appreciate the reviewer’s technical concern. iPSC-derived cardiomyocytes (iPSC-CMs) exhibit spontaneous beating due to the presence of pacemaker-like currents and the absence of I<sub>k1</sub>, which allows for the study of intrinsic electrophysiological properties, ion channel function, and disease modeling. In our study, we utilized this unique property of iPSC-CMs to test our hypothesis that TTNmvs alter electrophysiological properties through ion channel remodeling.

      While iPSC-CMs with identical backgrounds are expected to show comparable electrophysiological phenotypes under the same conditions, variability due to biological and technical factors (e.g., protein expression and culture handling) can result in differences between samples. We agree with the reviewer that pacing iPSC-CMs at the same rate for action potential duration (APD) and contractility measurements will control for cycle length effects and improve the reliability and interpretability of our findings. We will incorporate this approach into our revised experimental design.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the SR's calcium clock and calcium loading. See #2 above for suggestions on how to probe calcium handling adequately. Such an investigation into impulse initiation mechanisms would be compelling in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      We agree with the reviewer that investigating abnormal automaticity or triggered activity in relation to the increased firing rate observed with the missense variant could provide valuable insights into the mechanisms underlying AF. As these processes are closely linked to calcium handling and the calcium clock, probing calcium cycling abnormalities could strengthen our understanding of how TTNmvs contribute to AF. We will incorporate additional experiments to investigate these mechanisms, further supporting our study's central hypothesis.

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, linking shortened APD in isolated cells alone to AF causation is more complicated. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.
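
      The restitution relationship the reviewer describes (APD as a function of the preceding diastolic interval) can be sketched with a mono-exponential curve, a commonly used descriptive form. All parameter values below are hypothetical placeholders chosen for illustration; they are not measurements from this study or from iPSC-aCMs.

```python
import math


def apd_ms(di_ms: float, apd_max: float = 300.0,
           amplitude: float = 120.0, tau: float = 150.0) -> float:
    """Mono-exponential restitution: APD rises toward apd_max as DI lengthens.

    apd_max, amplitude and tau are assumed values, not fitted data.
    """
    return apd_max - amplitude * math.exp(-di_ms / tau)


def restitution_slope(di_ms: float, amplitude: float = 120.0,
                      tau: float = 150.0) -> float:
    """Derivative d(APD)/d(DI) of the curve above."""
    return (amplitude / tau) * math.exp(-di_ms / tau)


# Cycle length is the sum of APD and the diastolic interval that precedes it,
# so a faster spontaneous rate (shorter cycle) implies a shorter DI and APD.
for di in (50, 100, 200, 400, 800):
    apd = apd_ms(di)
    print(f"DI={di:4d} ms  APD={apd:6.1f} ms  "
          f"cycle={di + apd:6.1f} ms  slope={restitution_slope(di):.3f}")
```

      In this toy curve APD shortens as the diastolic interval shortens, which is why APD values recorded at different spontaneous rates are not directly comparable without pacing at matched cycle lengths.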

      We appreciate the reviewer’s insightful comment. We recognize that isolated cell studies cannot directly demonstrate reentrant circuits, and we agree that reentry should not be invoked solely based on cellular data. Our claim of shortened APD is based on observed abnormalities in APD and beating patterns, which may contribute to conditions conducive to reentry at the tissue level. We will clarify this distinction in the revised manuscript and refrain from directly linking APD shortening to reentry without tissue-level evidence.

    1. eLife Assessment

      Studying the biological roles of polyphosphates in metazoans has been a longstanding challenge to the field given that the polyP synthase has yet to be discovered in metazoans. This important study capitalizes on the sophisticated genetics available in the Drosophila system and uses a combination of methodologies to start to tease apart how polyphosphate participates in Drosophila development and in the clotting of Drosophila hemolymph. The data validating the tools are solid and well-documented and they will open up a field of research into the functional roles of polyP in a metazoan model.

    2. Reviewer #1 (Public review):

      Polymers of orthophosphate of varying lengths are abundant in prokaryotes and some eukaryotes where they regulate many cellular functions. Though they exist in metazoans, few tools exist to study their function. This study documents the development of tools to extract, measure, and deplete inorganic polyphosphates in *Drosophila*. Using these tools, the authors show:

      (1) that polyP levels are negligible in embryos and larvae of all stages while they are feeding. They remain high in pupae but their levels drop in adults.

      (2) that many cells in tissues such as the salivary glands, oocytes, haemocytes, imaginal discs, optic lobe, muscle, and crop, have polyP that is either cytoplasmic or nuclear (within the nucleolus).

      (3) that polyP is necessary in plasmatocytes for blood clotting in Drosophila.

      (4) that polyP controls the timing of eclosion.

      The tools developed in the study are innovative, well-designed, tested, and well-documented. I enjoyed reading about them and I appreciate that the authors have gone looking for the functional role of polyP in flies, which hasn't been demonstrated before. The documentation of polyP in cells is convincing, as is its role in plasmatocytes in clotting. Its control of eclosion timing, however, could result from non-specific effects of expressing an exogenous protein in all cells of an animal. The RNAseq experiments and their associated analyses on polyP-depleted animals and controls have not been discussed in sufficient detail. In its current form, the data look to be extremely variable between replicates and I'm therefore unsure of how the differentially regulated genes were identified.

      It is interesting that no kinases and phosphatases have been identified in flies. Is it possible that flies are utilising the polyP from their gut microbiota? It would be interesting to see if these signatures go away in axenic animals.

    3. Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in the expression of the polyP degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings from the depletion of cytosolic polyP levels in these flies are that it accelerates eclosion and also appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have few other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. And finally, although the authors found that eclosion is accelerated in pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      Weaknesses:

      Page 4 of the Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining and show that the staining is lost.

      In the hemolymph clotting experiments, the authors collected 2 ul of hemolymph and then added 1 ul of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.
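
      The reviewer's concentration estimate can be reproduced with a short calculation. This sketch assumes the final volume is simply the 2 ul of hemolymph plus the 1 ul of added test substance, as described above; the function name is hypothetical.

```python
def final_conc_mM(amount_nmol: float, hemolymph_ul: float = 2.0,
                  added_ul: float = 1.0) -> float:
    """Final concentration after mixing; nmol per ul is numerically equal to mM."""
    total_ul = hemolymph_ul + added_ul
    return amount_nmol / total_ul


for amount in (0.8, 1.6):
    print(f"{amount} nmol in 3 ul -> {final_conc_mM(amount):.2f} mM")
```

      This gives roughly 0.27 mM and 0.53 mM, which round to the 0.3 mM and 0.6 mM quoted by the reviewer.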

    4. Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal, and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      (1) The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      (2) The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      (3) Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      Limitations:

      (1) While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control to both demonstrate that the GST-PPBD labeling protocol works, and also to establish the efficacy of compartment-specific depletion. While not necessary to do this for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      (2) The cell biological data in this study clearly indicates that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

    5. Author response:

      Our reviewers brought three things to our notice:

      (1) PolyP has not been introduced as an abbreviation in the abstract.

      (2) 'colorimetric' is misspelled as 'calorimetric' in the following sentence of the results section.

      This method involved the digestion of polyP by recombinant S. cerevisiae exopolyphosphatase 1 (_Sc_Ppx1) followed by calorimetric measurement of the released Pi by malachite green.

      (3) A reference for hNUDT3 has been deleted due to the same technical glitch from the following sentence of introduction.

      Recently, biochemical experiments led to the discovery of endopolyphosphatase NUDT3, an enzyme known as a dinucleoside phosphatase.

    1. eLife Assessment

      This is an important study that examines the impact of Streptococcus pneumoniae genetics on its in vitro growth kinetics, aiming to identify potential targets for vaccines and therapeutics. The study identified significant variations in growth characteristics among capsular serotypes and lineages, linked to phylogeny and high heritability, but genome-wide association studies did not reveal specific genomic loci associated with growth features independent of the genetic background. The evidence supporting these findings is solid.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses a diverse isolate collection of Streptococcus pneumoniae from hospital patients in the Netherlands to understand the population-level genetic basis of growth rate variation in this pathogen, which is a key determinant of S. pneumoniae within-host fitness. Previous efforts have studied this phenomenon in strain-specific comparisons, which can lack the statistical power and scope of population-level studies. The authors collected a rigorous set of in vitro growth data for each S. pneumoniae isolate and subsequently paired growth curve analysis with whole-genome analyses to identify how phylogenetics, serotype, and specific genetic loci influence in vitro growth. While there were noticeable correlations between capsular serotype and phylogeny with growth metrics, they did not identify specific loci associated with altered in vitro growth, suggesting that these phenotypes are controlled by the collective effect of the entire genetic background of a strain. This is an important finding that lays the foundation for additional, more highly-powered studies that capture more S. pneumoniae genetic diversity to identify these genetic contributions.

      Strengths:

      (1) The authors were able to completely control the experimental and genetic analyses to ensure all isolates underwent the same analysis pipeline to enhance the rigor of their findings.

      (2) The isolate collection captures an appreciable amount of S. pneumoniae diversity and, importantly, enables disentangling the contributions of the capsule and phylogenetic background to growth rates.

      (3) This study provides a population-level, rather than strain-specific, view of how genetic background influences the growth rate in S. pneumoniae. This is an advance over previous studies that have only looked at smaller sets of strains.

      (4) The methods used are well-detailed and robust to allow replication and extension of these analyses. Moreover, the manuscript is very well written and includes a thoughtful and thorough discussion of the strengths and limitations of the current study.

      Weaknesses:

      (1) As acknowledged by the authors, the genetic diversity and sample size of this newly collected isolate set are still limited relative to the known global diversity of S. pneumoniae, which evidently limits the power to detect loci with smaller/combinatorial contributions to growth rate (and ultimately infection).

      (2) The in vitro growth data is limited to a single type of rich growth medium, which may not fully reflect the nutritional and/or selective pressures present in the host.

      (3) The current study does not use genetic manipulation or in vitro/in vivo infection models to experimentally test whether alteration of growth rates as observed in this study is linked to virulence or successful infection. The availability of a naturally diverse collection with phylogenetic and serotype combinations already identified as interesting by the authors provides a strong rationale for wet-lab studies of these phenotypes.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Chaguza et al. presents a novel perspective on pneumococcal growth kinetics, suggesting that the overall genetic background of Streptococcus pneumoniae, rather than specific loci, plays a more dominant role in determining growth dynamics. Through a genome-wide association study (GWAS) approach, the authors propose a shift in how we understand growth regulation, differing from earlier findings that pinpointed individual genes, such as wchA or cpsE, as key regulators of growth kinetics. This study highlights the importance of considering the cumulative impact of the entire genetic background rather than focusing solely on individual genetic loci.

      The study emphasizes the cumulative effects of genetic variants, each contributing small individual impacts, as the key drivers of pneumococcal growth. This polygenic model moves away from the traditional focus on single-gene influences. Through rigorous statistical analyses, the authors persuasively advocate for a more holistic approach to understanding bacterial growth regulation, highlighting the complex interplay of genetic factors across the entire genome. Their findings open new avenues for investigating the intricate mechanisms underlying bacterial growth and adaptation, providing fresh insights into bacterial pathogenesis.

      Strengths:

      This study exemplifies a holistic approach to unraveling key factors in bacterial pathogenesis. By analyzing a large dataset of whole-genome sequences and employing robust statistical methodologies, the authors provide strong evidence to support their main findings. This is a leap forward from previous studies that focused on relatively small numbers of strains. Their integration of genome-wide association studies (GWAS) highlights the cumulative, polygenic influences on pneumococcal growth kinetics, challenging the traditional focus on individual loci. This comprehensive strategy not only advances our understanding of bacterial growth regulation but also establishes a foundation for future research into the genetic underpinnings of bacterial pathogenesis and adaptation. The amount of data generated and the corresponding approaches to analyzing the data are impressive as well as convincing. The figures are convincing and comprehensible too.

      Weaknesses:

      Despite the strong outcomes of the GWAS approach, this study leaves room for differing interpretations. A key point of contention lies in the title, which initially gives the impression that the research addresses growth kinetics under both in vitro and in vivo conditions. However, the study is limited to in vitro growth kinetics, with the assumption that these findings are equally applicable to in vivo scenarios, a premise that is not universally valid. To more accurately reflect the study's scope and avoid potential misrepresentation, the title should explicitly specify "in vitro" growth kinetics. This clarification would better align the title with the study's actual focus and findings.

      This study suggests that the entire genetic background significantly influences bacterial growth kinetics. However, to transform these predictions into established facts, extensive experimental validation is necessary. This would involve "bench experiments" focusing on generating and studying mutant variants of serotypes or strains with diverse genomic variations, such as targeted deletions. The growth phenotypes of these mutants should be analyzed, complemented by complementation assays to confirm the specific roles of the deleted regions. These efforts would provide critical empirical evidence to support the findings from the GWAS approach and enhance understanding of the genetic basis of bacterial growth kinetics.

      In the discussion section, the authors state that "the influence of serotype appeared to be higher than the genetic background for the average growth rate" (lines 296-298). Alongside references 13-15, this emphasizes the important role of capsular variability, which is a key determinant of serotypes, in influencing growth kinetics. However, this raises the question: why isn't a specific locus like cps, which is central to capsule biogenesis, considered a strong influencer of growth kinetics in this study?

      One plausible explanation could be the absence of "elevated signals" for cps in the GWAS analysis. GWAS relies on identifying loci with statistically significant associations to phenotypes. The lack of such signals for cps may indicate that its contribution, while biologically important, does not stand out genome-wide. This might be due to the polygenic nature of growth kinetics, where the overall genetic background exerts a cumulative effect, potentially diluting the apparent influence of individual loci like cps in statistical analyses.

    4. Reviewer #3 (Public review):

      This study provides insights into the growth kinetics of a diverse collection of Streptococcus pneumoniae, identifying capsule and lineage differences. It was not able to identify any specific loci from the genome-wide association studies (GWAS) that were associated with the growth features. It does provide a useful study linking phenotypic data with large-scale genomic population data. The methods for the large part were appropriately written in sufficient detail, and data analysis was performed with rigour. The interpretation of the results was supported by the data, although some additional explanation of the significance of e.g. ancestral state reconstruction would be useful. Efforts were made to make the underlying data fully accessible to the readers although some of the supplementary material could be formatted and explained a bit better.

    1. eLife Assessment

      This important study examines the relationship between cognition and mental health and investigates how brain, genetics, and environmental measures mediate that relationship. The methods and results are compelling and well-executed. Overall, this study will be of interest in the field of population neuroscience and in studies of mental health.

    2. Reviewer #1 (Public review):

      Summary:

      This work integrates two timepoints from the Adolescent Brain Cognitive Development (ABCD) Study to understand how neuroimaging, genetic, and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations.

      Strengths:

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors, and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as interpreting mental health-cognition associations with latent factors derived from partial least squares. The authors also use a large well-characterized and diverse cohort of adolescents from the ABCD Study. The paper is also strengthened by commonality analyses to understand the shared and unique contribution of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance.

      Weaknesses:

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where many mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental-health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less.

      Strengths:

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between the brain, cognition, genetics, and mental health - is interesting. Particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role.

      Weaknesses:

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extend it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work.

      In terms of more practical concerns, much of the paper relies on comparing r or R2 measures between different tests. These are always presented as point estimates without uncertainty. There would be some value, I think, in incorporating uncertainty from repeated sampling to better understand the improvements/differences between the reported correlations.
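
      The repeated-sampling idea raised above can be sketched as a percentile bootstrap. This is a minimal illustration on synthetic data, not the authors' analysis: the function names, the sample size of 300, the simulated effect size, and the number of resamples are all assumptions chosen for the example.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(observed, predicted, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r([observed[i] for i in idx],
                               [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return pearson_r(observed, predicted), lower, upper


# Synthetic stand-in for out-of-sample observed vs predicted scores (hypothetical).
data_rng = random.Random(1)
observed = [data_rng.gauss(0, 1) for _ in range(300)]
predicted = [0.4 * o + data_rng.gauss(0, 1) for o in observed]

r, lo, hi = bootstrap_ci(observed, predicted)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

      Two models whose intervals overlap heavily would then be reported as not clearly different, which is the comparison the point estimates alone cannot support.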

      The focus on mental health in a largely normative sample leads to the predictions being largely based on the normal range. It would be interesting to subsample the data and ask how well the extremes are predicted.

      A minor query - why are only cortical features shown in Figure 3?

    1. eLife Assessment

      This study establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution to studying social neuroscience within a laboratory setting; the approach is novel and well-executed, backed by convincing evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The current study by Xing et al. establishes the methodology (machine vision and gaze pose estimation) and behavioral apparatus for examining social interactions between pairs of marmoset monkeys. Their results enable unrestrained social interactions under more rigorous conditions with detailed quantification of position and gaze. It has been difficult to study social interactions using artificial stimuli, as opposed to genuine interactions between unrestrained animals. This study makes an important contribution for studying social neuroscience within a laboratory setting that will be valuable to the field.

      Strengths:

      Marmosets are an ideal species for studying primate social interactions due to their prosocial behavior and the ease of group housing within laboratory environments. They also predominantly orient their gaze through head movements during social monitoring. Recent advances in machine vision pose estimation set the stage for estimating 3D gaze position in marmosets but require additional innovation beyond DeepLabCut or equivalent methods. A six-point facial frame is designed to accurately fit marmoset head gaze. A key assumption in the study is that head gaze is a reliable indicator of the marmoset's gaze direction, which will also depend on the eye position. Overall, this assumption has been well supported by recent studies in head-free marmosets. Thus the current work introduces an important methodology for leveraging machine vision to track head gaze and demonstrates its utility for use with interacting marmoset dyads as a first step in that study.

      Weaknesses:

      One weakness that should be easily addressed is that no data is provided to directly assess how accurate the estimated head gaze is based on calibrations of the animals, for example, when they are looking at discrete locations like faces or video on a monitor. This would be useful to get an upper bound on how accurate the 3D gaze vector is estimated to be, for planned use in other studies. Although the accuracy appears sufficient for the current results, it would be difficult to know if it could be applied in other contexts where more precision might be necessary.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes novel technique development and experiments to track the social gaze of marmosets. The authors used video tracking of multiple cameras in pairs of marmosets to infer head orientation and gaze and then studied gaze direction as a function of distance between animals, relationships, and social conditions/stimuli.

      Strengths:

      Overall the work is interesting and well done. It addresses an area of growing interest in animal social behavior, an area that has largely been dominated by research in rodents and other non-primate species. In particular, this work addresses something that is uniquely primate (perhaps not unique, but not studied much in other laboratory model organisms), which is that primates, like humans, look at each other, and this gaze is an important social cue of their interactions. As such, the presented work is an important advance and addition to the literature that will allow more sophisticated quantification of animal behaviors. I am particularly enthusiastic with how the authors approach the cone of uncertainty in gaze, which can be both due to some error in head orientation measurements as well as variable eye position.

      Weaknesses:

      There are a few technical points in need of clarification, both in terms of the robustness of the gaze estimate, and possible confounds by gaze to non-face targets which may have relevance but are not discussed. These are relatively minor, and more suggestions than anything else.

    1. eLife Assessment

      The study describes a useful tool for assessing microglia morphology in a variety of experimental conditions. The MorphoCellSorter provides a solid platform for ranking microglia to reflect their morphology continuum and may offer new insight into changes in morphology associated with injury or disease. While the study provides an alternative approach to existing methods for measuring microglia morphology, the functional significance of the measured morphological changes was not determined.

    2. Reviewer #1 (Public review):

      The current manuscript by Bendeker et al. (2024) presents a new platform, MorphoCellSorter, for performing population wide microglial morphological analyses. This method adds to the many programs/platforms available to determine characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and present "big picture" views of how entire populations of microglia alter under different conditions. In their ranking system, Bendeker et al. (2024) use PCA to determine which of the morphological characteristics most define microglial populations, avoiding user subjective biases to determine these parameters. Compared to "expert" evaluators, MorphoCellSorter appears to perform consistently and accurately, including in different types of tissue preservation methods and in live cells, a key feature of the program. In addition, the researchers point out that this platform can be used across a wide array of imaging techniques and most microscopes that are available in a basic research lab. There are minor concerns about the platform's utility in analyzing embryonic microglia and primary microglial cultures, but overall, this platform will be another useful tool for microglial researchers to consider using in future studies. Furthermore, the method of morphological assessment aligns with the current direction of the field in identifying microglial cells in more nuanced ways.

      In their current revision, the authors have done an excellent job responding to concerns and have updated the manuscript accordingly.

    3. Reviewer #2 (Public review):

      The authors introduce MorphoCellSorter, an open-source tool available on GitHub, designed for automated morphometric analysis of microglia. Current understanding suggests that microglia represent a heterogeneous population, especially in non-steady adult states, better characterized as a continuum rather than distinct cell groups.

      This tool was developed to classify microglia along this continuum. Using stained brain sections and microscope imaging, individual microglia are binarized and processed with MorphoCellSorter, which categorizes them based on 20 morphological parameters. Notably, the tool is versatile, as it can be applied to both fluorescent and brightfield brain sections, as demonstrated by the authors. Additionally, it has been tested across various setups (both fixed and live tissues) and biological contexts (including embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures), showcasing its versatility and adaptability. Overall, the study is well-conceived and could have some value in the field.

      Numerous similar tools already exist, and the number is likely to grow, especially with advancements in AI. These tools have limited scientific utility as they provide descriptive rather than informative outputs. Microglial morphology varies due to external influences (such as developmental stages and injuries), but the significance of these variations remains largely hypothetical.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews (consolidated):

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem.

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.
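
      For readers unfamiliar with Andrews curves, the sketch below uses the textbook definition, in which a feature vector is mapped to a finite Fourier series. It is not necessarily how MorphoCellSorter derives its score; the two feature vectors and all function names are hypothetical.

```python
import math


def andrews_curve(x, t):
    """Textbook Andrews curve: x[0]/sqrt(2) + x[1] sin t + x[2] cos t + x[3] sin 2t + ..."""
    total = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        total += xi * (math.sin(harmonic * t) if i % 2 == 1 else math.cos(harmonic * t))
    return total


def curve_distance_sq(x, y, n_steps=2000):
    """Integral over [-pi, pi] of (f_x(t) - f_y(t))^2, approximated by a Riemann sum."""
    dt = 2 * math.pi / n_steps
    return sum(
        (andrews_curve(x, -math.pi + k * dt) - andrews_curve(y, -math.pi + k * dt)) ** 2
        for k in range(n_steps)
    ) * dt


# Made-up feature vectors for two hypothetical cells.
cell_a = [0.8, -0.3, 1.1, 0.2]
cell_b = [0.1, 0.4, 0.9, -0.5]

euclid_sq = sum((a - b) ** 2 for a, b in zip(cell_a, cell_b))
integral = curve_distance_sq(cell_a, cell_b)
print(integral, math.pi * euclid_sq)
```

      Under this definition the integrated squared gap between two curves equals π times the squared Euclidean distance between the feature vectors, so any curve-based distance inherits the scaling of the dataset it was computed on. That is consistent with the remark above that the distances are relative and dataset-specific.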

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. Alternative methods were already covered in the discussion (L586-598), but in response to the reviewers' request we have revised the introduction and discussion sections to address the limitations of current methods more clearly and to discuss the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a)  L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed "evenly" so that this introductory sentence covers the various morphologies of microglial cells.

      c)  L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment. We have clarified that we were referring to the metabolic challenge triggered by ischemia, and we have added a reference.

      d) L75: Is morphology truly "easy" to obtain?

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the point made by the reviewer and we found another way of writing it. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for the confusing wording. We reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not say that the contralateral hemisphere is non-pathological, but that its microglial cells have a non-pathological morphology, which is slightly different. The contralateral side in ischemic experiments is classically used as a control (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere in tMCAO mice (Filippenkov et al 2022, https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls. However, no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretations.

      g)  Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
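
      The effect of such an inversion can be sketched as follows. The excerpt does not state which operation was used, so the reciprocal below is an assumption, as are the parameter name `ramification_index` and all numeric values; any strictly decreasing transform would flip the ranking direction in the same way.

```python
# Made-up measurements for three hypothetical cells, from least to most ramified.
cells = [
    {"ramification_index": 1.2, "roundness_factor": 0.90},
    {"ramification_index": 3.5, "roundness_factor": 0.40},
    {"ramification_index": 6.1, "roundness_factor": 0.15},
]

# Parameters assumed to run in the opposite direction (high value = less ramified).
TO_INVERT = {"roundness_factor"}


def harmonize(cell, to_invert=TO_INVERT):
    """Take the reciprocal of the listed parameters so all rank in the same direction."""
    return {k: (1.0 / v if k in to_invert else v) for k, v in cell.items()}


def order_by(rows, key):
    """Indices of rows sorted by increasing value of one parameter."""
    return sorted(range(len(rows)), key=lambda i: rows[i][key])


harmonized = [harmonize(c) for c in cells]

print(order_by(cells, "ramification_index"))       # reference direction
print(order_by(cells, "roundness_factor"))         # opposite direction before inversion
print(order_by(harmonized, "roundness_factor"))    # same direction after inversion
```

      PCA itself is indifferent to the sign of an input variable, as the reviewer notes, so the benefit here is purely in reading the loadings and rankings consistently.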

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability?

      We could not find any range in the text; we usually used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although the section thickness was identical for all sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of planes acquired varies depending on how the cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia?

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the necessary structural information to accurately reflect morphological differences thus impairing the detection of existing morphological differences.

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.

      On a note, Matlab is not open-access,

      This is correct. We are currently translating this Matlab script into Python; it will be available soon on GitHub (https://github.com/Pascuallab/MorphCellSorter).

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained above, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation quality and analysis accuracy. Although we could feed MorphoCellSorter with all these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to binarize the cells automatically, but these approaches often resulted in imperfect, non-optimized segmentation for individual cells. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee’s comment, but the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglia morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglia morphologies). In any case, the choice of k does not change the underlying problem: the clusters are artificial and their boundaries arbitrary, whether k is determined manually or mathematically.
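
      The elbow issue described above can be illustrated with a one-dimensional toy example. This is a sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the Lloyd's-algorithm implementation and its quantile initialisation are written for this example only, and real morphological data are multi-dimensional.

```python
def kmeans_1d_wcss(values, k, n_iter=100):
    """Lloyd's algorithm in one dimension; returns the within-cluster sum of squares."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: centres placed at evenly spaced quantiles.
    centres = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centre.
        new_centres = [sum(g) / len(g) if g else centres[j] for j, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return sum(min((x - c) ** 2 for c in centres) for x in xs)


# A continuum: evenly spread values with no gaps.
continuum = [i / 199 for i in range(200)]
# Three well-separated groups, for contrast.
clustered = [c + 0.04 * (i / 65 - 0.5) for c in (0.0, 0.5, 1.0) for i in range(66)]

wcss_continuum = {k: kmeans_1d_wcss(continuum, k) for k in range(1, 7)}
wcss_clustered = {k: kmeans_1d_wcss(clustered, k) for k in range(1, 7)}

for k in range(1, 7):
    print(k, round(wcss_continuum[k], 4), round(wcss_clustered[k], 4))
```

      With separated groups the within-cluster sum of squares collapses once k reaches the true number of groups, whereas on a continuum it declines gradually at every k, so the elbow criterion has no sharp break to detect.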

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.

      Differences between circularity and roundness factors are not coming across and require further clarification.

      These are two distinct ways of characterizing morphological complexity, and we borrowed these parameters and kept the name from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology. Author response image 1 shows how circularity and roundness describe cells differently.

      Author response image 1.

      Correlation between Circularity and Roundness Factor in the Alzheimer disease dataset. A second order polynomial correlation exists between the two parameters in our dataset. Indeed (1) a single maximum is shared between both parameters. However, Circularity and Roundness Factor are not entirely redundant, as exemplified by (2) the possible variety of Roundness Factors for a given Circularity as well as (3) the very different morphology minima of these two parameters.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concerns the cell body by itself. The cell body is always expressed relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we have added a graphic representation of the types of measurements they provide in the revised version of the manuscript (Supplemental figure 8).

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for the comment and propose the following change, which should resolve this potential issue:

      “PC<sub>1</sub> is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment. We modified this statement as follows: “The first two components represent most of the information (about 70%), hence we can consider the plane (PC<sub>1</sub>, PC<sub>2</sub>) as the principal plane, reducing the dataset to a two-dimensional space.”
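      The fraction of information carried by the first two components can be checked directly as the share of total variance explained by the two largest eigenvalues of the covariance matrix. The sketch below is a minimal, dependency-free illustration using power iteration with deflation; it is not the Matlab implementation, and the toy data and function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenvalues(matrix, k, iterations=500):
    """Largest k eigenvalues of a symmetric matrix (power iteration + deflation)."""
    d = len(matrix)
    a = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-degenerate start
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        values.append(lam)
        # Remove the component just found so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return values


def explained_variance_ratio(rows, k=2):
    """Share of total variance captured by the first k principal components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(top_eigenvalues(cov, k)) / total


# Toy data with most of the spread along the first two axes.
toy = [(-2, 0, 0), (2, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -0.5), (0, 0, 0.5)]
print(round(explained_variance_ratio(toy, k=2), 3))
```

      A ratio well below 1 for k = 2 would indicate that the (PC<sub>1</sub>, PC<sub>2</sub>) plane discards a substantial part of the dispersion.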

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      Sorry for the misunderstanding. We did use Spearman correlation, which measures monotonic association; we have therefore replaced “linear” with “monotonic” in the text. Thanks a lot for the careful reading.
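      The distinction can be made concrete with a small example: Spearman correlation is Pearson correlation computed on ranks, so a relationship that is monotonic but not linear yields a Spearman coefficient of 1 and a lower Pearson coefficient. The sketch below is illustrative only; the data and helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]  # monotonic in x, but far from linear

print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```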

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of using MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it assigns each cell a value on an axis (Andrews score), which a human expert cannot do. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1).

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript.

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Methods

      (1) Alzheimer's disease model: was a perfusion performed and then an hour later brains extracted? Please clarify.

      This is indeed what has been done.

      (2) For in vitro microglial studies: was a percoll gradient used for the separation of immune cells? What percentage percoll was used? Was there separation of myelin and associated debris with the percoll centrifugation? Please clarify the protocol as it is not completely clear how these cells were separated from the initial brain lysate suspension. What cell density was plated?

      The protocol has been completed as follows: “Myelin and debris were then eliminated using a Percoll® PLUS solution (E0414, Sigma-Aldrich) diluted with DPBS10X (14200075, Gibco) and enriched in MgCl<sub>2</sub> and CaCl<sub>2</sub> (for 50 mL of myelin separation buffer: 90 mL of Percoll PLUS, 10 mL of DPBS10X, 90 μL of 1 M CaCl<sub>2</sub> solution, and 50 μL of 1 M MgCl<sub>2</sub> solution).” Thank you for your feedback.

      (3) How are the microglia "automatically cropped" in FIJI (for the Phox2b mutant)? Is there a function/macro in the program you used? This is very important for the workflow and needs to be clarified. The methods section of this manuscript is a guide for future users of this workflow and should be as descriptive as possible. It would be useful to give detailed information on the manual classification process, perhaps as a supplement. The authors do a nice job pointing out that these older methods are not effective in categorizing microglia that don't necessarily fit into a predefined phenotype.

      The protocol has been completed as follows: “Briefly, the centroid of each detected object (i.e. microglia), except those on the borders, was detected, and a crop of 300x300 pixels around each object was generated. Then, the pixels belonging to neighboring cells were manually removed from each generated crop.”
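      The cropping step can be sketched as follows. This is not the FIJI macro used in the study: it assumes that centroids have already been detected, represents the image as a plain list of pixel rows, and interprets the border exclusion as skipping any object whose crop window would extend past the image edge. The function name and the toy image are hypothetical.

```python
def crop_around_centroids(image, centroids, size=300):
    """Return size x size crops centred on each (row, col) centroid.

    Centroids whose window would extend past the image edge are skipped
    (an assumed reading of "except those on the borders").
    """
    height, width = len(image), len(image[0])
    half = size // 2
    crops = []
    for row, col in centroids:
        top, left = row - half, col - half
        if top < 0 or left < 0 or top + size > height or left + size > width:
            continue  # window does not fit: treat the object as a border object
        crops.append([line[left:left + size] for line in image[top:top + size]])
    return crops


# Toy 10x10 image whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(10)] for r in range(10)]
crops = crop_around_centroids(image, [(5, 5), (0, 0), (9, 9)], size=4)
print(len(crops), crops[0][0])
```

      The manual removal of pixels belonging to neighboring cells is a separate, interactive step and is not covered by this sketch.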

      (4) Please address the concern that manual tuning and thresholding are required for this method's accuracy. Is this easily reproducible?

      Yes, it is easily reproducible for a given experimenter and is better suited than automatic thresholding. Although segmentation is not the primary focus of this paper, we leave it to users to choose the segmentation method that best fits their datasets.

      To address your question, we acknowledge that automated thresholding would theoretically be ideal. However, we encountered challenges due to non-uniform image acquisitions, even within the same sample. For instance, in ischemic brain samples, lipofuscin resulting from cell death introduced background noise that could artificially influence threshold levels. We tested both global and local algorithms for automatic binarization of cells, but these approaches often produced suboptimal segmentation results for individual cells.

      Based on our experience, manually adjusting the threshold provided more accurate, reliable, and consistent selection of cellular elements, even though it introduces a degree of subjectivity. To maintain consistency, we recommend that the same individual perform the analysis across all conditions.

      This clarification has been incorporated into the discussion as follows: “Although automated thresholding would be ideal, in our case image acquisitions were not entirely uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. This effect is observed even when comparing contralateral and ipsilateral sides of the same brain. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions.”

      (5) How are the authors performing the PCA---what program (e.g .R)? Again, please be explicit about how these mathematical operations were computed. (lines 302-345).

      The PCA was performed in Matlab; the code can be found on GitHub (https://github.com/Pascuallab/MorphCellSorter), as stated in the discussion.

      Other:

      (1) Can the authors comment on the challenges of the in vitro microglial analyses? The correlation of the experts v. MorphoCellSorter is much less than the fixed tissue. This is not addressed in the manuscript.

      In vitro, microglial cells exhibit a narrower range of morphological diversity compared to ex vivo or in vivo conditions. A higher proportion of cells share similar morphologies or morphologies with comparable complexities, which makes establishing a precise ranking more challenging. Consequently, the rank of many cells could be adjusted without significantly affecting the overall quality of the ranking.

      This explains why the rankings tend to show slightly greater divergence between experts. Interestingly, the ranking generated by MorphoCellSorter, which is objective and not subject to human bias, lies roughly midway between the rankings of the two experts.

      (2) You point out that the MorphoCellSorter may not be suited for embryonic/prenatal microglial analysis.

      This must be a misunderstanding because it is not what we concluded; we found that the ranking was correct but that we could not spot any differences due to transgenic alteration.

      The lack of differences observed in the embryonic microglia (Figure 5) is not necessarily surprising, as embryonic microglia have diverse morphological characteristics--- immature microglia do not possess highly ramified processes until postnatal development [see Hirosawa et al. (2005) https://doi.org/10.1002/jnr.20480 -they use an Iba1-GFP transgenic mouse to visualize prenatal microglia]. Also, see Bennett et al. (2016) [https://doi.org/10.1073/pnas.1525528113] which shows mature microglia not appearing until 14 days postnatal.

      We agree with the reviewer on that point. Nonetheless, MorphoCellSorter shows that the population is homogeneous and that the mutation has no effect on morphology.

      (3) Although a semantic issue, Figure 1's categorization of microglia shows predefined groups of microglia do not necessarily usefully bin many cells. Is it still possible to categorize the microglia without using hotly debated categorization methods? The literature review in the current manuscript correctly points out the spectrum phenomenon of microglial activation states, though some of the suggestions from Paolicelli et al. (2022) are not put into action. The use of "activated" only further perpetuates the oversimplified classification of microglia. Perhaps the authors could consider using the term "reactive", as it is recognized by the Microglial nomenclature paper cited above. Are "amoeboid microglia" not "activated microglia"? "Reactive" is a less loaded term and is a recommended descriptor. Amoeboid microglia are commonly understood to be indicative of a highly proinflammatory environment, though you could potentially use "hyper-reactive" to differentiate them from the slightly ramified "reactive" cells.

      We changed “activated microglia” to “reactive microglia” in the text, as requested by the reviewer. Thanks a lot for your comment.

      (4) The graphs in Figures 3 B-D are visually difficult to interpret. Better color contrast between the MorphoCellSorter/Expert and Expert1/Expert2 would be useful--- perhaps a color for Expert 1 and a different color for Expert 2. Is this the ranking from the same data in Figure 1 (lines 420-421)? It is unclear what the x-axis represents in 3B-D. E-G is much more intuitive.

      We believe the confusion stems more from Figure 1 than Figure 3, as both figures use similar representations for entirely different analyses (clustering vs. ranking). To address this, we have provided an updated version of Figure 1 to help clarify this distinction and avoid any potential misinterpretation.

      Regarding Figure 3B-D, we do not fully see the need for changing the colors. These panels are histograms that display the distribution of rank differences either between experts and MorphoCellSorter or between the two experts. Assigning specific colors to the experts or MorphoCellSorter would be challenging, as the histograms represent comparative distributions involving both an expert and MorphoCellSorter or the ranking differences between the two experts.

      The same reasoning applies to Figures 3E-G. In these scatter plots, each point is defined by an ordinate (ranking value for one expert) and an abscissa (ranking value for either the other expert or MorphoCellSorter). Therefore, it would not be straightforward or meaningful to assign distinct colors to these elements within this context.
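      To make the quantity plotted in the Figure 3B-D histograms concrete, the sketch below computes per-cell rank differences between two rankings and tallies them. It is a minimal illustration with made-up cell identifiers and rankings, not the analysis code used for the figure.

```python
from collections import Counter


def rank_differences(ranking_a, ranking_b):
    """Per-cell rank difference (rank in A minus rank in B).

    Each ranking lists the same cell identifiers, ordered from first to last.
    """
    position_b = {cell: rank for rank, cell in enumerate(ranking_b)}
    return [rank - position_b[cell] for rank, cell in enumerate(ranking_a)]


# Hypothetical rankings of five cells by one expert and by the tool.
expert = ["c1", "c2", "c3", "c4", "c5"]
tool = ["c2", "c1", "c3", "c5", "c4"]

differences = rank_differences(expert, tool)
histogram = Counter(differences)  # rank difference -> number of cells
print(sorted(histogram.items()))
```

      Each bar therefore belongs to a pair of rankings rather than to a single rater, which is why a per-rater color cannot be assigned.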

      (5) Line 217: use the term "imaged" rather than "generated" ... or "images were generated of clusters of microglia located .... using MICROSCOPE and Zen software." You aren't generating microglia, rather, you are generating images.

      Thanks a lot for raising this problem. We changed the sentence as follows: “For the AD model, crops of individual microglial cells located in the secondary visual cortex were extracted from images using the Zen software (v3.5, Zeiss) and exported to the Tif image format.”

      (6) Elaborate on how an "inversion operation" was applied to Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio, and skeleton processes. (Lines 299-300) Furthermore, a paragraph separation would be useful if the "inversion operation" is not what is described in the text immediately after this description.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
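      The effect of such an inversion on ranking direction can be illustrated with a short sketch. The exact operation is not specified here, so the code assumes one simple possibility, reflecting each value within the observed range of the parameter, which reverses the ordering while keeping the same scale. The function names and values are hypothetical.

```python
def invert(values):
    """Reflect values within their own range (assumed form of the inversion)."""
    low, high = min(values), max(values)
    return [high + low - v for v in values]


def rank_order(values):
    """Cell indices sorted from the smallest to the largest parameter value."""
    return sorted(range(len(values)), key=lambda i: values[i])


lacunarity = [0.2, 0.9, 0.5]  # made-up values for three cells
inverted = invert(lacunarity)

print(rank_order(lacunarity))  # original ranking direction
print(rank_order(inverted))    # reversed ranking direction
```

      After inversion, all parameters order the cells in the same direction, so their contributions to the PCA no longer oppose each other for purely conventional reasons.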

      (7) Line 560: "measureclarke" seems to be an error associated with the reference. Please correct.

      Thanks a lot, this has been corrected.

      (8) Discussion: compare MorphoCellSorter to the MIC-MAC program used by Salamanca et al. (2019). They use a similar approach, albeit not Andrew's plot.

      We have added the Salamanca reference.

      Reviewer #2 (Recommendations for the authors):

      While it's not expected that the authors address the significance of the morphology in relation to function here, they could help highlight the issue and produce data that would enhance the paper's significance. Therefore, I recommend a small-scale and straightforward study where the authors couple their analysis with a marker (e.g. Lysotracker or Mitotracker) to produce data that link their morphometric analysis to more functional readouts. Furthermore, I encourage the authors to elaborate on the practical applications of these morphometric tools and the implications of their measurements, as this would provide context for their work, which, as it stands, feels like just another tool.

      We would like to thank the reviewer for their thoughtful comment and suggestion. Indeed, MorphoCellSorter is simply another tool, but one that offers a more convenient and efficient approach, producing a variety of results tailored to specific research needs. We strongly believe that MorphoCellSorter should be used in conjunction with other tools, depending on the specific research question.

      In our view, MorphoCellSorter is particularly well-suited for researchers who need a quick and efficient way to determine whether their treatment, gene invalidation, or other experimental conditions affect microglial morphology. In this context, MorphoCellSorter is fast, user-friendly, and highly effective. However, for those who aim to uncover detailed differences in cell morphology, other tools requiring more time-intensive, full reconstructions of the cells would be more appropriate.

      Providing additional data on the relationship between cellular function and morphology could certainly pave the way for new questions and more robust evidence. For instance, combining single-cell transcriptomics with morphological analysis would be an excellent approach to exploring the relationship between function and morphology. However, this would involve significant time, expense, and effort, and it represents a different line of inquiry altogether.

      While it would be ideal to clearly demonstrate the link between morphology and function, we are concerned that pursuing such a goal would considerably delay the implementation and adoption of our tool, potentially raising additional questions beyond the scope of this study.

      Minor comments:

      (1) Can MorphCellSorter be adapted for use with other cell types (e.g., astrocytes)?

      Yes, it could. We have performed some fairly conclusive analyses on astrocytes, but some parameters have to be adapted before release.

      (2) What modifications would be necessary? If it is not applicable, would a name that includes "Microglia" be more descriptive?

      The modifications would be quite minor: mainly the parameters being considered would change. This is why we will keep the MorphoCellSorter name. Thank you for the suggestion!

      (3) A common challenge with such tools is the technical expertise required to use them. Could a user-friendly interface be developed to better fulfill its intended purpose and benefit the community?

      This is a good point, thank you. The answer is yes: we will translate our Matlab code to Python to open it to a wider audience, and we will certainly work on a user-friendly interface.

      (4) Given that this tool relies on imaging, can users trace a cell (or group of cells) back to the original image?

      Yes, it is possible if each crop is annotated with its spatial coordinates during the segmentation step. This is not yet implemented in the current version of the software, and it mainly depends on the way segmentation is performed, which is not the topic of the paper.

      (5)  Line 36: The "biologically relevant" statement is central and needs to be expanded.

      This is not easy, as this sentence is in the abstract, which has a word limit. What we mean is that, when classifying cells, mathematical tools force each cell into a group based on metrics that do not necessarily have a biological meaning. We suggest the following modification: “However, this classification may lack biological relevance, as microglial morphologies represent a continuum rather than distinct, separate groups, and do not correspond to mathematically defined clusters that are unrelated to microglial cell function.”

      (6) Line 49-50: Provide reference and elaborate. For example, does this apply during early life?

      We have slightly changed the sentence and added a reference.

      (7) Line 69: Provide reference.

      The reference (Hubert et al., 2021) has been added.

      (8) Lines 78-88: A table summarizing other efforts in morphometric characterization of microglia would be helpful in distinguishing your work from others.

      This has already been done in some review articles; we have therefore added references to direct readers to these reviews. Here is the revised version of the sentence: “To date, the literature contains a wide variety of criteria to quantitatively describe microglial morphology, ranging from descriptive measures such as cell body surface area, perimeter, and process length to indices calculating different parameters such as circularity, roundness, branching index, and clustering (Adaikkan et al., 2019; Heindl et al., 2018; Kongsui, Beynon, Johnson, & Walker, 2014; Morrison et al., 2017; Young & Morrison, 2018).”

      (9) Lines 130, 145: Please provide complete genotype information and the sources of the animals used.

      It has been done

      (10) Materials and Methods:

      (1) Standardize the presentation of products (e.g., using # consistently).

      It has been done

      (2) Provide versions of software used.

      We have modified accordingly

      (3) Lines 372-373: A table listing the 20 parameters with brief explanations (as partially done in Materials and Methods) would greatly improve readability.

      This is done in Supplemental figure 8.

      (4) Since nomenclature is a critical issue in the literature, you used specific definitions (lines 376-383). However, please indicate (with a reference) why you use the term "activated," as it implies that the others are non-activated. Alternatively, define "activated" cluster differently.

      We changed “activated microglia” to “reactive microglia”, as requested by Reviewer #1.

      (4) Figure 1: In my opinion placing this figure as the first main figure is problematic as it confuses the message of the paper. Since the authors are introducing a new approach for morphological characterization in Figure 2, I recommend the latter for the sake of readability and clarity should be the first main image, while Figure 1 can move the supplements.

      We agree with the reviewer; we have therefore changed Figure 1, as explained earlier in our response to Reviewer #1. Nonetheless, because it is an important step in our reasoning, we believe it can remain a main figure. We hope the change made to Figure 1 clarifies the message of the paper.

      (5) Figure 1: Please indicate on the figure the marker for the analysis.

      Figure 2 has been changed

      (6) No funding agencies are communicated.

      This has been corrected

    1. eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

    2. Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      The paper would be further strengthened by establishing the functional significance of the altered neural activity detected in the nTS (as measured by cFos and GCaMP/photometry) in the context of opioid-induced respiratory depression.

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids so in some respects, the effects reported here are somewhat incremental.

      Comments on the latest version:

      I think the authors have adequately addressed previous critiques and I don't have any additional comments.

    4. Reviewer #3 (Public review):

      Summary

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with a potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes, and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      All weaknesses were addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

      We thank the reviewers for their insightful comments and critiques, which we have incorporated into the manuscript. We believe these revisions have significantly improved the manuscript. Additionally, following discussions among the authors, we have revised the color scheme across all figures. For example, the color of the symbols in Figure 1B-D now match the bars in Figure 1E-J, rather than the symbols. We feel that this change improves the clarity and visual consistency of the figures, making it easier to interpret the data across figures.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      A weakness of the study is the fact that the functional significance of opioid-induced changes in neural activity in the nTS (as measured by cFos and GCaMP/photometry) is not established. Does the nTS regulate fentanyl-induced respiratory depression, and are changes in nTS activity induced by naloxone and naloxone methiodide relevant to their ability to reverse respiratory depression?

      Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both the prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated the activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids, so in some respects the effects reported here are somewhat incremental.

      The reviewer is correct that naloxone is the standard antidote for reversing opioid-induced respiratory depression. However, its limitations, including the risk of precipitated withdrawal, are well-documented in both preclinical and clinical studies. The likelihood of withdrawal increases when multiple doses of naloxone are administered. Since naloxone-induced withdrawal is centrally mediated, this study aimed to evaluate a peripherally restricted MOR antagonist for its ability to prevent or reverse fentanyl-induced respiratory depression. A key finding is that NLXM reversed OIRD without inducing aversive behavior. This suggests that peripheral antagonists like NLXM may be integrated into intervention strategies that save lives while preventing the adverse behavioral and physiological effects that are observed after treatment with naloxone.

      Reviewer #3 (Public review):

      Summary:

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      The major weakness of this version relates to how the data analysis assessed sex-specific contributors to the results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some points for the authors to consider are:

      (1) In the Abstract, it is unclear why "high potency and lipophilicity" contribute to opioid-induced respiratory depression.

      The higher potency of fentanyl compared to other opioids significantly increases the risk of overdose and subsequent respiratory depression. Its high lipophilicity facilitates rapid absorption and central nervous system penetration, which contributes to the rapid onset of cardiorespiratory depression. The narrow therapeutic window of fentanyl further emphasizes the critical need for timely intervention when an overdose has occurred, and for effective antagonists to reverse respiratory depression and save lives. We have revised the abstract to clarify these points.

      (2) Are the doses of fentanyl used in the study (2, 20, or 50 µg/kg IV) relevant to those achieved by fentanyl-exposed human drug users?

      In these studies, we intravenously administered three doses of fentanyl. The human equivalent doses (HED) of 20 µg/kg and 50 µg/kg fentanyl are ~3 µg/kg and ~8 µg/kg, respectively. These doses have previously been shown to induce respiratory depression in humans (Dahan et al., 2005).
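
      The response does not state how the HED values were derived. As a minimal sketch, assuming the standard body-surface-area conversion (Km factors of 6 for rat and 37 for human, which are an assumption here and not taken from the response), the quoted figures can be reproduced as follows. The function name is illustrative only.

```python
# Illustrative only: body-surface-area scaling with assumed Km factors.
# The author response does not say which conversion method was used.
KM_RAT = 6.0     # assumed Km factor for rat
KM_HUMAN = 37.0  # assumed Km factor for adult human


def human_equivalent_dose(animal_dose_ug_per_kg, km_animal=KM_RAT, km_human=KM_HUMAN):
    """Scale an animal dose (ug/kg) to a human equivalent dose (ug/kg)."""
    return animal_dose_ug_per_kg * km_animal / km_human


if __name__ == "__main__":
    for dose in (20, 50):
        hed = human_equivalent_dose(dose)
        print(f"{dose} ug/kg in rat corresponds to about {hed:.1f} ug/kg in human")
```

      Under these assumptions, the 20 and 50 µg/kg rat doses scale to roughly 3.2 and 8.1 µg/kg, consistent with the approximate values quoted above.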

      (3) In Figure 1, it appeared that only a small fraction of tyrosine hydroxylase-positive (TH+) neurons expressed cFos in response to fentanyl, and the degree of cFos expression was largely similar across all fentanyl doses tested. Thus, it is unclear whether TH+ neurons play a role in fentanyl-induced respiratory depression, and the value of these data is unclear (see point #6 below also).

      As shown in the mean data, the lowest dose of fentanyl, which was below the threshold for inducing OIRD, activated approximately 50% of tyrosine hydroxylase-positive (TH+) nTS neurons. In contrast, the highest dose of fentanyl resulted in a statistically significant increase, with ~75% of TH+ cells co-expressing Fos-IR.

      We included the assessment of catecholaminergic nTS cells for several reasons. The regions of the nTS evaluated in this study contain high expression of MOR and are the termination points of sensory afferent fibers transmitting cardiorespiratory information to the nTS (Aicher et al., 2000; Furdui et al., 2024). Catecholaminergic cells receive direct excitatory inputs from visceral afferents (Appleyard et al., 2007) and exhibit intensity-dependent increases in Fos-IR in rats exposed to hypoxic air (Kline et al., 2010; King et al., 2012). These neurons are essential for generating appropriate cardiorespiratory responses to hypoxic challenges (Bathina et al., 2013; King et al., 2015). As the reviewer notes, rats exposed to fentanyl exhibit a high degree of Fos-IR in the nTS, including catecholaminergic neurons. Despite the robust fentanyl-induced activation (increased Fos-IR) of nTS neurons, there appears to be a failure to initiate appropriate chemoreflex-mediated cardiorespiratory responses. Our photometry data further indicate that fentanyl-induced changes in neuronal activity are mediated, in part, by peripheral MOR. Collectively, these findings suggest that fentanyl impacts nTS activity through alterations in peripheral afferent signaling to the nTS, which may contribute to the severity and duration of OIRD.

      (4) It would help with the flow of the paper if the pharmacokinetic data shown in Figure 6 were presented earlier (as part of Figure 2).

      We have moved the biodistribution data earlier in the manuscript, now presenting it as Figure 2. The numbering of all subsequent figures has been adjusted accordingly.

      (5) In Figure 5, there appears to be a large number of GCaMP-expressing neurons located outside the nTS. To what degree can the changes in calcium signaling, attributed to alterations in neural activity in the nTS, be explained by altered activity of neurons located outside the nTS?

      The reviewer is correct that our viral spread extends beyond the boundaries of the nTS, raising the possibility that the responses observed in Figure 5 may be influenced by neural activity of cells outside the nTS. While some viral spread beyond the target region is unavoidable, calcium transients were measured at the tip of the fiber, which was positioned directly within the nTS.

      To address this concern further, we performed Fos immunohistochemistry in a subset of animals that received bilateral GCaMP virus injections into the nTS. Following fentanyl administration (50 µg/kg IV), brains were collected two hours later. As shown in the accompanying image, we observed Fos-IR co-expression with GCaMP exclusively within the nTS boundaries. No Fos-IR was detected outside the nTS, including in GCaMP cells. Taken together, these findings support our conclusion that the data depicted in our photometry figure (now Figure 6) accurately represent fentanyl-induced activity changes in nTS neurons.

      Author response image 1.

      Arrowheads: Fos-negative GCaMP cell; Arrows: Co-labeled Fos/GCaMP cell; Asterisk: Fos+ GCaMP-negative cell

      (6) Currently, the cFos and photometry data are descriptive in nature. Are opioid-induced changes in nTS neural activity relevant to respiratory depression? If so, one might expect DREADD-mediated stimulation of the nTS neural activity (or stimulating nTS activity by some other means) would reverse fentanyl-induced respiratory depression similar to naloxone and methyl-naloxone.

      The reviewer raises an interesting point regarding the relevance of the nTS in the context of OIRD. The nTS is a major site of integration of sensory afferent information and is involved in the initiation of reflex responses that facilitate a return to homeostasis. As described above, we characterized the collective response of nTS neurons to intravenous fentanyl using both Fos immunohistochemistry and fiber photometry. Our data indicate that fentanyl-induced changes in nTS activity are strongly mediated by peripheral MOR. While the suggestion to use global chemogenetic activation of nTS neurons to reverse fentanyl-induced respiratory depression is intriguing, results from these experiments may be difficult to interpret due to the extensive heterogeneity of the nTS. However, we are currently conducting similar experiments using a more selective approach that will allow us to isolate and evaluate specific nTS phenotypes to better understand their contributions to OIRD.

      (7) Are peripherally restricted mu opioid receptor (MOR) agonists available? If so, it would strengthen the paper if such compounds could be used to show that stimulation of peripheral MORs is sufficient to induce respiratory distress independent of actions on centrally located MORs.

      Peripherally acting Mu Opioid Receptor Antagonists (PAMORAs) are indeed available and currently being evaluated in our laboratory.

      Reviewer #2 (Recommendations for the authors):

      Consider having the figures/data numbered in the order that they appear in the manuscript. Right now, Figure 6 is mentioned between Figures 1 and 2 (minor).

      Thank you for this suggestion. We have reordered the figures so that the biodistribution figure appears before the MOR antagonist pretreatment and reversal figures.

      Reviewer #3 (Recommendations for the authors):

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      While this is an exciting and important study, there are a few minor to moderate critiques for the authors to consider. These are below.

      (1) Title: "devoid of aversive effects" - While CPA is a good, cumulative indicator of potential aversive effects, it is not an exhaustive one. Since no other withdrawal measures were included, this is an overstatement.

      The reviewer is correct in noting that our analysis of aversive effects is not exhaustive. Since we only assessed changes in aversive behavior between NLX and NLXM, we believe it is more accurate to modify the title accordingly. We have changed the title from “devoid of aversive effects” to “devoid of aversive behavior” to better reflect the scope of the experiments conducted.

      (2) Page 3, top line: MOR (mu opioid receptor) is highly expressed...

      An article should likely be included prior to MOR or make plural and adjust the sentence.

      Thank you for this suggestion. We have reworked this section in the manuscript.

      (3) Figure 6D: this figure is very important for the interpretation of every single figure. It should either be moved to figure 1 or 2 or combined with figure 1 or 2.

      Thank you for this suggestion. The biodistribution figure has been moved to Figure 2.

      (4) Page 5, line 164, Figure 21-D: remove the 1.

      Done.

      (5) Sex differences (or lack thereof):

      Throughout the manuscript, the authors report a lack of sex differences. However, while the data are not powered for the distinction of sex differences, there appears to be a bimodal distribution of the individual data points that likely corresponds to sex across most experiments. For example, in Figure 2E there are both colored and clear dots, which this reviewer assumes indicate sex (however, this wasn't easily apparent if it was commented on at all in the paper). If you look at the saline oxygen saturation (nadir) levels (Figure 2E), there is wide variability with the red-filled circles, but not the clear ones. This may indicate a bimodal distribution (and may be related to the baseline HR sex differences highlighted). This is also the case in Figure 2L but is perhaps more obvious in the CPA score data (Figure 4D), where it seems the NLX negative CPA effects were likely driven primarily by one sex. While this reviewer does not expect a full powering of experiments for sex differences (and also is very appreciative of the inclusion of both sexes), full raw data with sex indicated included in the supplemental data would greatly aid the field in general and allow for those with a specific interest in this area to build upon this data. Additionally, further discussion regarding the potential role of sex differences in the translational value of these findings is also warranted.

      For all bar graphs, open symbols represent females and filled symbols represent males. This information can be found in the first paragraph of the Materials and Methods section. We have also added this information to each figure for increased visibility. We appreciate the acknowledgement of our inclusion of both sexes. For all experiments, we attempted to balance by sex. Unfortunately, we occasionally had to exclude animals for technical reasons (with clogged catheters being the most common reason for exclusion). This sometimes led to an imbalance in sex in some groups, as the reviewer has noted. In the graph of oxygen saturation nadir values in Fig 2E (now Fig 3E in the revised manuscript), all animals received intravenous fentanyl at a dose of 20 µg/kg. The reviewer is correct that there is greater variability in the males (filled symbols) compared to the females (open symbols) in this graph. However, this variability in the distribution was not observed in Fig 1E or Fig 4E, in which male and female rats received an identical dose of 20 µg/kg. Taking this into account, our overall interpretation of the data is that there is a relatively minor sex difference in the responses observed after intravenous fentanyl, and the variability in Fig 3E is primarily due to a lower n compared to Fig 1E.

      All raw data will be uploaded to a data repository.

      (6) Page 7, line 209: Figure 5D should be Figure 6D.

      We have incorporated this change.

      (7) Page 8, line 267: Cure should be Curve.

      We have incorporated this change.

      (8) Discussion: Page 10, line 322 states that "no detectable NLX ... was found in brain tissue". This is incorrect based on Figure 6.

      The sentence the reviewer highlighted refers to detection of NLX or NLXM in brain tissue from animals that received intravenous NLXM. As demonstrated in the biodistribution figure (now Figure 2 in the manuscript), our data demonstrate that an intravenous injection of NLXM did not result in NLX formation in the brain. We have reworked the sentence for clarity.

      (9) jGCaMP injections: Figure 5B/C shows the distribution of GCaMP across animals. The optic fiber is placed directly over the NTS. However, how are we certain there isn't a nearby nucleus/structure outside the NTS that is contributing to the photometry data presented in D-G?

      See our response to Reviewer #1, point (5), above.

      (10) Fiber Photometry and Sex: These studies unfortunately may have had only 1 of a sex included in the fiber photometry data. While the inclusion is overall good, the single value for a sex suggests that there are differences, given the clustering of the data. While the anesthesia may be driving this potential sex effect, it is not clear based on the data presented. For reference: https://link.springer.com/article/10.1007/s12975-012-0229-y

      The reviewer is correct that there was an imbalance of sex in this dataset. While we made every attempt to balance for sex across all experiments, we unfortunately had to exclude some animals for technical reasons (clogged catheter, missed injection site, etc.). This produced an imbalance in our photometry studies and did not allow us to thoroughly evaluate sex differences in fentanyl-induced changes in neural activity or in the responses to anesthesia. We have expanded on this limitation in the discussion.

      (11) Figure 5 - the bars are not the color indicated by the legend.

      We have corrected this in the figure. Thank you.

    1. eLife Assessment

      In this revised work, Barzó et al. assessed the electrophysiological and anatomical properties of a large number of layer 2/3 pyramidal neurons in brain slices of human neocortex across a wide range of ages, from infancy to elderly individuals, using whole-cell patch clamp recordings and anatomical reconstructions. This large data set represents an important contribution to our understanding of how these properties change across the human lifespan, supported by convincing data and analyses. The authors have addressed the concerns raised in previous reviews. Overall, this study strengthens our understanding of how the neural properties of human cortical neurons change with age and will contribute to building more realistic models of human cortical function.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al. is very clear and very well written, demonstrating the electrophysiological and morphological properties of human cortical layer 2/3 pyramidal cells across a wide age range, from 1 month to 85 years, using whole-cell patch clamp. To my knowledge, this is the first study that looks at cross-age differences in the biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Comments on revisions:

      Thanks to the authors for addressing my comments and providing greater clarity in the methodology. The analysis is much clearer now. I also appreciate their additional data analysis, particularly on morphology, which strengthens the paper.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Barzó and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      The authors group patient ages into developmentally organised bins, which are elaborated on in supplementary analysis, exemplifying the importance of determining the effect of early postnatal development on human neuron function.

      Weaknesses:

      The authors' use of (perhaps) arbitrary categorisation of spine morphology could limit the full usefulness of these data.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human brain development and implications for how different neuronal properties may influence life and disease associated with neurological conditions.

      Comments on revisions:

      Overall, the authors have satisfied my concerns. I fully appreciate their candour with their data and the potential limitations. I especially appreciate their supplementary data inclusions, which I believe truly strengthen their conclusions and are a valuable resource for the field.

      I agree whole-heartedly with the authors' assertion that using the most sophisticated equipment is not always the most appropriate choice. However, statistical rigour should still be standard. As such, my one remaining concern relates to the inappropriate choice of replicates for the spine morphology data in Figure 6. I commend the authors' inclusion of additional reconstructions and morphology data from further cells in this data set. However, to the best of my interpretation, these still represent data from 3 cells and 1 patient per age. I feel it would be more helpful to plot cell averages +/- SD for each cell, even if side-by-side with data from all spines. Likewise, it is unclear what statistical test was performed on these data and whether it took into account the fact that a) these values are from 3 technical replicates per group, or b) many of the data sets consist of many zero values (would a categorical test be more appropriate?).
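
      The distinction between per-spine and per-cell summaries can be sketched as follows. This is a minimal illustration with invented placeholder values and hypothetical cell identifiers, not the authors' data or analysis; it only shows how a per-cell mean +/- SD differs from pooling all spines when cells contribute unequal numbers of measurements.

```python
from statistics import mean, stdev


def summarize_by_cell(measurements):
    """Return {cell_id: (mean, sample SD, n)} from {cell_id: [per-spine values]}."""
    summary = {}
    for cell_id, values in measurements.items():
        # Sample SD is undefined for a single value, so report 0.0 in that case.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[cell_id] = (mean(values), sd, len(values))
    return summary


def mean_of_cell_means(summary):
    """Average the per-cell means, treating each cell as one replicate."""
    return mean(cell_mean for cell_mean, _, _ in summary.values())


def pooled_mean(measurements):
    """Average over all spines, ignoring which cell they came from."""
    return mean(v for values in measurements.values() for v in values)


if __name__ == "__main__":
    # Placeholder values for illustration only (hypothetical cells).
    example = {"cell_a": [0.0, 0.0, 1.0, 1.0], "cell_b": [2.0, 4.0]}
    per_cell = summarize_by_cell(example)
    for cell_id, (m, sd, n) in per_cell.items():
        print(f"{cell_id}: mean={m:.2f}, SD={sd:.2f}, n={n}")
    print(f"mean of cell means: {mean_of_cell_means(per_cell):.2f}")
    print(f"pooled spine mean:  {pooled_mean(example):.2f}")
```

      In this toy example the two summaries diverge because the cell with more spines dominates the pooled estimate, which is the replicate-choice issue raised above.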

    4. Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

      Comments on revisions:

      I believe that the current version has been sufficiently revised based on my comments.

    1. eLife assessment:

      This study describes a new set of genetic tools for optimized Cre-mediated gene deletion in mice. The advances are substantial and will facilitate biomedical research. Although the tools have been validated using solid methodologies, the quantitative assessment of their recombination efficiency is not yet sufficiently described. Evaluating their ability to mediate the deletion of multiple alleles in a mosaic setting would also be a highly valuable addition.

    2. Reviewer #1 (Public Review):

      Summary:

      Shi and colleagues report the use of modified Cre lines in which the coding region of Cre is disrupted by rox-STOP-rox or lox-STOP-lox sequences to prevent the expression of functional protein in the absence of Dre or Cre activity, respectively. The main purpose of these tools is to enable intersectional or tamoxifen-induced Cre activity with minimal or no leaky activity from the second, Cre-expressing allele. It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      Strengths:

      The new tools can reduce Cre leak in vivo.

      Weaknesses:

      (1) Activity of R26-loxCre line. As the authors point out, the greatest value of this approach is to accomplish a more complete Cre-mediated gene deletion using CreER transgenes that are combined with low-efficiency floxed alleles using their R26-loxCre line that is similar to the iSure Cre reported by Benedito and colleagues. The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression. Thus, while the line appears to have minimal leak, as the design would predict, how much of a deletion increase is obtained over simple use of the CreER transgene alone is a key question for use by investigators. This is further addressed in Figure 6 where it is compared with Alb-CreER alone to recombine the Ctnnb1 floxed allele. They demonstrate that recombination frequency is clearly improved, but the western blot in Figure 6E does not look like there was a large amount of remaining b-catenin to remove. These data are certainly promising, but the most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least, a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      (2) In vivo analysis of mCre activities. Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      (3) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

    3. Reviewer #2 (Public Review):

      Summary:

      This work presents new genetic tools for enhanced Cre-mediated gene deletion and genetic lineage tracing. The authors optimise and generate mouse models that convert temporally controlled CreER or DreER activity to constitutive Cre expression, coupled with the expression of tdT reporter for the visualizing and tracing of gene-deleted cells. This was achieved by inserting a stop cassette into the coding region of Cre, splitting it into N- and C-terminal segments. Removal of the stop cassette by Cre-lox or Dre-rox recombination results in the generation of modified Cre that is shown to exhibit similar activity to native Cre. The authors further demonstrate efficient gene knockout in cells marked by the reporter using these tools, including intersectional genetic targeting of pericentral hepatocytes.

      Strengths:

      The new models offer several important advantages. They enable tightly controlled and highly effective genetic deletion of even alleles that are difficult to recombine. By coupling Cre expression to reporter expression, these models reliably report Cre-expressing i.e. gene-targeted cells, and circumvent false positives that can complicate analyses in genetic mutants relying on separate reporter alleles. Moreover, the combinatorial use of Dre/Cre permits intersectional genetic targeting, allowing for more precise fate mapping.

      Weaknesses:

      The scenario where the lines would demonstrate their full potential compared to existing models has not been tested. Mosaic genetics is increasingly recognized as a key methodology for assessing cell-autonomous gene functions. The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      In addition, a drawback of this line is the constitutive expression of Cre. When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results. Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction. These drawbacks should be acknowledged.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors report a new version of the iSuRe-Cre approach, which was originally developed by Rui Benedito's group in Spain (https://doi.org/10.1038/s41467-019-10239-4). Shi et al claim that their approach shows reduced leakiness compared to the iSuRe-Cre line. Shi et al elaborate strongly about the leakiness of iSuRe-Cre mice, although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509). Furthermore, a new R26-roxCre-tdT mouse line was established after extensive testing, which enables efficient expression of the Cre recombinase after activation of the Dre recombinase.

      Strengths:

      The authors carefully evaluated the efficiency and leakiness of the new strains and demonstrated the applicability by marking peri-central hepatocytes in an intersectional genetics approach, amongst others. I can only find very few weaknesses in the paper, which represents the result of an enormous effort. Carefully conducted technical studies have considerable value. However, I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      Weaknesses:

      Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      The R26-GFP or R26-tdT reporters, Alb-roxCre1-tdT, Cdh5-roxCre4-tdT, Alb-roxCre7-GFP, and Cdh5-roxCre10-GFP demonstrate no leakiness without Dre-rox recombination (Figure S1-S2). Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      The enhanced efficiency of loxCre and roxCre systems holds promise for reducing the necessary tamoxifen dosage, potentially reducing toxicity and side effects. In Figure 6, the authors demonstrate an enhanced recombination efficiency of loxCre mice, which makes it possible to achieve efficient deletion of Ctnnb1 with a single dose of tamoxifen, whereas a conventional driver (Alb-CreER) requires five doses. It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in lines #251 and #256, and also in the passage from line #278 to #301. In lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination at some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the knock-in strategy for R26-loxCre-tdT mice, a WPRE sequence is placed downstream of the loxCre-P2A-tdT sequence.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      Following the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT in the revised manuscript, using Alb-CreER as the driver and R26-Confetti as the target.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After screening out four robust versions of mCre, we generated these four roxCre knock-in mice. We could not predict which mCre would be the most robust in vivo; one or two mCre versions might work efficiently. For example, if Alb-mCre1 were competitive with Cdh5-mCre10, we could use them for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as nucleotide sequences for mice generation in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT in the revised manuscript, using Alb-CreER as the driver and R26-Confetti as the target.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion will be a future direction for the roxCre and loxCre strategies. We will include some discussion of using such a strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive Cre expression and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and carry similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the author’s comments on iSuRe-Cre were made from a reader’s perspective, and the summary is based on the originally published paper (10.1038/s41467-019-10239-4). We have since tested iSuRe-Cre in our own hands. We did detect some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript. We have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for the concerns about the toxicity of high Cre expression. The toxicity of constitutive Cre expression and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and carry similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we generated new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in Supplementary Figure 1, Supplementary Figure 2, and Figure 4D, Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT are not leaky. Therefore, if any leakiness appears when the inducible DreER or CreER allele is introduced, it derives from the DreER or CreER allele. We will supplement relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern. We can generate a dose-response curve in the revision.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the file-upload website limits file size, image compression left some of the signal unclear. The following are the zoomed-out figures. The staining in Figure 4F will be optimized and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We checked these signals carefully and did not find a “much lower” tdT signal. Because the file-upload website limits file size, image compression left some of the signal unclear. We have attached clear high-resolution images here. The following figure shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in lines #251 and #256, and also in the passage from line #278 to #301. In lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. eLife Assessment

      This is an important study that characterizes a surprising interaction between two different cytokine/hormone receptors using nanoscale resolution (dSTORM) microscopy. The study provides solid evidence that the interaction is ligand-dependent, and is mediated by the receptor-associated intracellular signalling molecule JAK2. While at present limited to growth hormone and prolactin receptors in a limited number of cell lines, there are potentially broad implications for cytokine signalling, as such JAK2-mediated interactions could occur between a range of different cytokines. Moreover, the specific hormone interactions shown in the manuscript may have significant implications for understanding how these hormones can have differential effects in breast cancer, under different conditions.

    2. Reviewer #3 (Public review):

      Summary:

      The authors are interested in the relative importance of PRL versus GH and their interactive signaling in breast cancer. After examining GHR-PRLR interactions in response to ligands, they suggest that a reduction in cell surface GHR in response to PRL may be a mechanism whereby PRL can sometimes be protective against breast cancer.

      Strengths:

      The strengths of the study include the interesting question being addressed and the application of multiple complementary techniques, including dSTORM, which is technically very challenging, especially when using double labeling. Thus, dSTORM is used to analyze co-clustering of GHR and PRLR, and, in response to PRL, rapid internalization of GHR and increased cell surface PRLR. Conclusions from proximity ligation assays are that some GHR and PRLR are within 40 nm (≈ 4 plasma membranes) of each other and that upon ligand stimulation, they move apart. Intact receptor knockin and knockout approaches and receptor constructs without the Jak2 binding domain demonstrate a) a requirement for the PRLR for there to be PRL-driven internalization of GHR, and b) that Jak2-PRLR interactions are necessary for stability of the GHR-PRLR colocalizations.

      Weaknesses:

      Although improved over the first version, the manuscript still suffers from a lack of detail, which in places makes it difficult to evaluate the data and would make it very difficult for the results to be replicated by others.

      Comments on revised version:

      Points for improvement of the manuscript:

      (1) There is still insufficient detail about the proximity ligation assay. For example, PLAs that use reagents from Sigma (as now reported) require primary antibodies from two different species and yet both the anti-PRLR and anti-GHR used for dSTORM were mouse monoclonals. On line 356 it says that the ECD antibodies were used for microscopy and the PLA is microscopy. Were instead the ICD antibodies used for the PLA? If so, how do we know that one or more of the proteins in the very strong "non-specific" bands seen on Figure 5A are not what is being localized? Could you do a Western blot of just cell membrane proteins? There needs to be further clarity/explanation.

      (2) Although the manuscript now shows a Western blot using the antibodies against intracellular regions of the receptor, a full Western blot is not provided for the antibodies against the S2 extracellular domain used for the dSTORM. While I haven't checked the papers showing characterization of the anti-GHR, I did re-check reference 70, which the authors say shows full characterization of the PRLR antibody, and this does not show a full Western (only portions of gels). How do we know that this antibody is not recognizing some other cell surface molecule, the surface expression of which increases upon stimulation of the cells with PRL? Is there only one band when blotting whole cell extracts with either the GHR or PRLR ECD antibodies so we can be sure of specificity? Figure S2 helps some, but these are different cells and the relative expression of the PRLR versus some other potential cell surface protein in these engineered cells may well be completely different.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The questions after reading this manuscript are what novel insights have been gained that significantly go beyond what was already known about the interaction of these receptors and, more importantly, what are the physiological implications of these findings? The proposed significance of the results in the last paragraph of the Discussion section is speculative since none of the receptor interactions have been investigated in TNBC cell lines. Moreover, no physiological experiments were conducted using the PRLR and GH knockout T47D cells to provide biological relevance for the receptor heteromers. The proposed role of JAK2 in the cell surface distribution and association of both receptors as stated in the title was only derived from the analysis of box 1 domain receptor mutants. A knockout of JAK2 was not conducted to assess heteromers formation.

      We thank the reviewer for these comments. The novel insight is that two different cytokine receptors can interact in an asymmetric, ligand-dependent manner, such that one receptor regulates the other receptor’s surface availability, mediated by JAK2. To our knowledge this has not been reported before. Beyond our observations, there is the question of whether this could be a much more common regulatory mechanism and whether it has therapeutic relevance. However, answering these questions is beyond the scope of this work.

      Along the same lines, the question regarding the biological relevance of our receptor heteromers and JAK2’s role in cell surface distribution is undoubtedly very important. Studying GHR-PRLR cell surface distributions in JAK2 knockout cells and certain TNBC cell lines as proposed by the reviewer could perhaps be insightful. However, most TNBCs down-regulate PRLR [1], so we would first have to identify TNBC cell lines that actually express PRLR at sufficiently high levels. Moreover, knocking out JAK2 is known to significantly reduce GHR surface availability [2,3], such that the proposed experiment would probably provide only limited insights.

      Unfortunately, our team is currently not in a position to perform any experiments (due to lack of funding and shortage of personnel). However, to address the reviewer’s comment as much as possible, we have revised the respective paragraph of the Discussion section to emphasize the speculative nature of our statement and have added another paragraph discussing shortcomings and future experiments (see revised manuscript, pages 23-24).

      (1) López-Ozuna, V., Hachim, I., Hachim, M. et al. Prolactin Pro-Differentiation Pathway in Triple Negative Breast Cancer: Impact on Prognosis and Potential Therapy. Sci Rep 6, 30934 (2016). https://www.nature.com/articles/srep30934

      (2) He, K., Wang, X., Jiang, J., Guan, R., Bernstein, K.E., Sayeski, P.P., Frank, S.J. Janus kinase 2 determinants for growth hormone receptor association, surface assembly, and signaling. Mol Endocrinol. 2003;17(11):2211-27. doi: 10.1210/me.2003-0256. PMID: 12920237.

      (3) He, K., Loesch, K., Cowan, J.W., Li, X., Deng, L., Wang, X., Jiang, J., Frank, S.J. Janus Kinase 2 Enhances the Stability of the Mature Growth Hormone Receptor, Endocrinology, Volume 146, Issue 11, 2005, Pages 4755–4765, https://doi.org/10.1210/en.2005-0514

      (2) Except for some investigation of γ2A-JAK2 cells, most of the experiments in this study were conducted on a single breast cancer cell line. In terms of rigor and reproducibility, this is somewhat borderline. The CRISPR/Cas9 mutant T47D cells were not used for rescue experiments with the corresponding full-length receptors and the box1 mutants. A missed opportunity is the lack of an investigation correlating the number of receptors with physiological changes upon ligand stimulation (e.g., cellular clustering, proliferation, downstream signaling strength).

      We appreciate the reviewer’s comments. While we are confident in the reproducibility of our findings, including those obtained in the T47D cell line, we acknowledge that testing in additional cell lines would have strengthened the generalizability of our results. We also recognize that performing a rescue experiment using our T47D hPRLR or hGHR KO cells would have been valuable. Furthermore, examining physiological changes, such as proliferation rates and downstream signaling responses, would have provided additional insights. Unfortunately, these experiments were not conducted at the time, and we currently lack the resources to carry them out.

      (3) An obvious shortcoming of the study that was not discussed seems to be that the main methodology used in this study (super-resolution microscopy) does not distinguish the presence of various isoforms of the PRLR on the cell surface. Is it possible that the ligand stimulation changes the ratio between different isoforms? Which isoforms besides the long form may be involved in heteromers formation, presumably all that can bind JAK2?

      This is a very good point. We fully agree with the reviewer that a discussion of the results in the light of different PRLR isoforms is appropriate. We have added information on PRLR isoforms to the Introduction (see revised manuscript, page 2) and Discussion sections (see revised manuscript, pages 23-24).

      (4) Changes in the ligand-inducible activation of JAK2 and STAT5 were not investigated in the T47D knockout models for the PRL and GHR. It is also a missed opportunity to use super-resolution microscopy as a validation tool for the knockouts on the single cell level and how it might affect the distribution of the corresponding other receptor that is still expressed.

      We thank the reviewer for this comment. We fully agree that such additional experiments could be very valuable. Unfortunately, as mentioned above, we are not able to address this at this stage due to lack of personnel and funding. However, we do hope to address these and other proposed experiments in the future.

      (5) Why does the binding of PRL not cause a similar decrease (internalization and downregulation) of the PRLR, and instead, an increase in cell surface localization? This seems to be contrary to previous observations in MCF-7 cells (J Biol Chem. 2005 October 7; 280(40): 33909-33916).

      It has been recently reported for GHR that not only JAK2 but also LYN binds to the box1-box2 region, creating competition that results in divergent signaling cascades and affects GHR nanoclustering [1]. So, it is reasonable to assume that similar mechanisms may be at work that regulate PRLR cell surface availability. Differences in cells’ expression of such kinases could perhaps play a role in the perceived inconsistency. Also, Lu et al. [2] studied the downregulation of the long PRLR isoform in response to PRL. All other PRLR isoforms were not detectable in MCF-7 cells. So, differences between MCF-7 and T47D may lead to this perceived contradiction.

      At this stage, we can only speculate about the actual reasons for these seemingly contradictory results. However, for full transparency, we are now mentioning this apparent contradiction in the Discussion section (see page 23) and have added the references below.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      https://www.cell.com/cell-reports/pdf/S2211-1247(23)00501-6.pdf

      (2) Lu, J.C., Piazza, T.M., Schuler, L.A. Proteasomes mediate prolactin-induced receptor down-regulation and fragment generation in breast cancer cells. J Biol Chem. 2005 Oct 7;280(40):33909-16. doi: 10.1074/jbc.M508118200. PMID: 16103113; PMCID: PMC1976473.

      (6) Some figures and illustrations are of poor quality and were put together without paying attention to detail. For example, in Fig 5A, the GHR was cut off, possibly to omit other nonspecific bands, and the WB images look 'washed out'. 5B, 5D: the labels are not in one line over the bars, and what is the point of showing all individual data points when the bar graphs with all annotations and SD lines are disappearing? As done for the γ2A cells, the illustrations in 5B-5E should indicate what cell lines were used. No loading controls in Fig 5F, is there any protein in the first lane? No loading controls in Fig 6B and 6H.

      We thank the reviewer for pointing this out. We have amended Fig. 5A to now show larger crops of the two GHR and PRLR Western Blot images and thus a greater range of proteins present in the extracts. Please note that the bands in the WBs other than what is identified as GHR and PRLR are non-specific and reflect roughly equivalent loading of protein in each lane.

      We also made some changes to Figures 5B-5E.

      (7) The proximity ligation method was not described in the M&M section of the manuscript.

      We thank the reviewer for pointing this out. We have added a description of the PL method to the Methods section.

      Reviewer #1 (Recommendations for the Authors):

      A final suggestion for future investigations: Instead of focusing on the heteromer formation of the GHR/PRLR which both signal all through the same downstream effectors (JAK2, STAT5), it would have been more cancer-relevant, and perhaps even more interesting, to look for heteromers between the PRLR and receptors of the IL-6 family since it had been shown that PRL can stimulate STAT3, which is a unique feature of cancer cells. If that is the case, this would require a different modality of the interaction between different JAK kinases.

      We highly appreciate the reviewer’s recommendation and hope to follow up on it in the near future.

      Reviewer #2 (Public Review):

      (1) I could not fully evaluate some of the data, mainly because several details on acquisition and analysis are lacking. It would be useful to know what the background signal was in dSTORM and how the authors distinguished the specific signal from unspecific background fluorescence, which can be quite prominent in these experiments. Typically, one would evaluate the signal coming from antibodies randomly bound to a substrate around the cells to determine the switching properties of the dyes in their buffer and the average number of localisations representing one antibody. This would help evaluate if GHR or PRLR appeared as monomers or multimers in the plasma membrane before stimulation, which is currently a matter of debate. It would also provide better support for the model proposed in Figure 8.

      We are grateful for the reviewer’s comment. In our experience, the background signal is more relevant in dSTORM when imaging proteins that are located at deeper depths (> 3 μm) above the coverslip surface. In our experiments, cells are attached to the coverslip surface and the proteins being imaged are on the cell membrane. In addition, we employed dSTORM’s TIRF (total internal reflection fluorescence) microscopy mode to image membrane receptor proteins. TIRFM exploits the unique properties of an induced evanescent field in a limited specimen region immediately adjacent to the interface between two media having different refractive indices. It thereby dramatically reduces background by rejecting fluorescence from out-of-focus areas in the detection path and illuminating only the area right near the surface.

      Having said that, a few other sources such as auto-fluorescence, scattering, and non-bleached fluorescent molecules close to and distant from the focal plane can contribute to the background signal. We tried to reduce auto-fluorescence by ensuring that cells are grown in phenol-red-free media, imaging is performed in STORM buffer which reduces autofluorescence, and our immunostaining protocol includes a quenching step aside from using blocking buffer with different serum, in addition to BSA. Moreover, we employed extensive washing steps following antibody incubations to eliminate non-specifically bound antibodies. Ensuring that the TIRF illumination field is uniform helps reduce scatter. Additionally, an extended bleach step prior to the acquisition of frames to determine localizations helped further reduce the probability of non-bleached fluorescent molecules.

      In short, due to the experimental design we do not expect much background. However, in the future, we will address this concern and estimate background in a subtype dependent manner. To this end we will distinguish two types of background noise: (A) background with a small change between subsequent frames, which mainly consists of auto-fluorescence and non-bleached out-of-focus fluorescent molecules; and (B) background that changes every imaging frame, which is mainly from non-bleached fluorescent molecules near the focal plane. For type (A) background, temporal filters must be used for background estimation [1]; for type (B) background, low-pass filters (e.g., wavelet transform) should be used for background estimation [2].

      (1) Hoogendoorn, Crosby, Leyton-Puig, Breedijk, Jalink, Gadella, and Postma (2014). The fidelity of stochastic single-molecule super-resolution reconstructions critically depends upon robust background estimation. Scientific reports, 4, 3854. https://doi.org/10.1038/srep03854

      (2) Patel, Williamson, Owen, and Cohen (2021). Blinking statistics and molecular counting in direct stochastic reconstruction microscopy (dSTORM). Bioinformatics, Volume 37, Issue 17, September 2021, Pages 2730–2737, https://doi.org/10.1093/bioinformatics/btab136
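
      As a rough illustration of the temporal filtering mentioned above for type (A) background, the sketch below estimates a per-pixel background with a running temporal median and subtracts it from each frame. It is a minimal sketch under stated assumptions, not the analysis pipeline used in this study: the function names, the `window` parameter, and the synthetic image stack in the usage lines are hypothetical, and a real dSTORM workflow would tune the window length to the blinking kinetics of the dye.

```python
import numpy as np


def temporal_median_background(stack, window=101):
    """Estimate slowly varying background per pixel with a running temporal median.

    stack  : array of shape (n_frames, height, width)
    window : number of frames in the running window (hypothetical default)
    """
    n_frames = stack.shape[0]
    half = window // 2
    background = np.empty(stack.shape, dtype=float)
    for t in range(n_frames):
        lo = max(0, t - half)              # window is truncated at the stack edges
        hi = min(n_frames, t + half + 1)
        background[t] = np.median(stack[lo:hi], axis=0)
    return background


def subtract_background(stack, window=101):
    """Subtract the running-median background and clip negative values to zero."""
    background = temporal_median_background(stack, window)
    return np.clip(stack - background, 0, None)


# Usage on synthetic data: constant background of 100 counts plus one brief blink.
frames = np.full((50, 4, 4), 100.0)
frames[25, 2, 2] += 500.0
corrected = subtract_background(frames, window=11)
```

      Because a single-molecule blink occupies only a few frames within the window, the median follows the slowly varying background rather than the blink itself.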

      (2) Since many of the findings in this work come from the evaluation of localisation clusters, an image showing actual localisations would help support the main conclusions. I believe that the dSTORM images in Figures 1 and 2 are density maps, although this was not explicitly stated. Alexa 568 and Alexa 647 typically give a very different number of localisations, and this is also dependent on the concentration of BME. Did the authors take that into account when interpreting the results and creating the model in Figures 2 and 8?

      I believe that including this information is important as findings in this paper heavily rely on the number of localisations detected under different conditions.

      Including information on proximity labelling and CRISPR/Cas9 in the methods section would help with the reproducibility of these findings by other groups.

      Figures 1 and 2 show Gaussian interpolations of actual localizations, not density maps. Imaging captured the fluorophores’ blinking events, and localizations were counted as true localizations when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting. In other words, we show reconstructed images based on identifying true localizations using Gaussian fitting and strict parameters to identify true fluorophore blinking. This allowed us to identify true localizations with high confidence and generate a high-resolution image for membrane receptors.
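
      One possible reading of the "at least 5 consecutive blinking events" criterion is sketched below: a candidate localization is accepted only if it was detected in at least five consecutive frames. This is an illustrative assumption, not the Nikon software's algorithm. The function name and the input format (a list of frame indices in which one candidate was detected) are hypothetical.

```python
def has_consecutive_run(frames, min_run=5):
    """Return True if `frames` contains at least `min_run` consecutive frame indices.

    frames  : iterable of integer frame indices in which a candidate was detected
    min_run : required length of the consecutive run (5 in the text above)
    """
    run = 0
    previous = None
    for f in sorted(set(frames)):
        # Extend the current run if this frame directly follows the previous one,
        # otherwise start a new run of length 1.
        run = run + 1 if previous is not None and f == previous + 1 else 1
        if run >= min_run:
            return True
        previous = f
    return False


# Usage: the first candidate is accepted, the second is rejected.
accepted = has_consecutive_run([3, 4, 5, 6, 7])
rejected = has_consecutive_run([3, 4, 5, 6, 8, 9])
```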

      Indeed, Alexa 568 and 647 give different numbers of localizations. This is dependent on the intrinsic photo-physics of the fluorophores. Specifically, each fluorophore has a different duty cycle, switching cycle, and survival fraction. However, we note that we focused on capturing the relative changes in receptor numbers over time, before and after stimulation by ligands, not the absolute numbers of surface GHR and PRLR. We are not comparing the absolute numbers of localizations or drawing comparisons for localization numbers between 568 and 647. For all these different conditions/times, the photo-physics for a particular fluorophore remains the same. This allows us to make relative comparisons.

      As far as the effect of BME is concerned, the concentration of mercaptoethanol needs to be carefully optimized, as too high a concentration can potentially quench the fluorescence or affect the overall stability of the sample. However, we are using an optimized concentration which has been previously validated across multiple STORM experiments. This makes the concerns relating to the concentration of BME irrelevant to the current experimental design. Besides, the concentration of BME is maintained across all experimental conditions.

      We have added information regarding PL and CRISPR/Cas9 for generating hGHR KO and hPRLR KO cells in two new subsections to the Methods section.

      Reviewer #2 (Recommendations for the authors):

      In the methods please include:

      (1) A section with details on proximity ligation assays.

      We have added a description of the PL method to the Methods section.

      (2) A section on CRISPR/Cas9 technology.

      We have added two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR  or hPRLR knockout” to the Methods section.

      (3) List the precise composition of the buffer or cite the paper that you followed.

      We used the buffer recipe described in this protocol [1] and have added the components with concentrations as well as the following reference to the manuscript.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      (4) Exposure time used for image acquisition to put 40 000 frames in the context of total imaging time and clarify why you decided to take 40 000 images per channel.

      Our Nikon Ti2 N-STORM microscope is equipped with an iXon DU-897 Ultra EMCCD camera from Andor (Oxford Instruments). According to the camera’s manufacturer, this camera platform uses a back-illuminated 512 x 512 frame transfer sensor and overclocks readout to 17 MHz, pushing speed performance to 56 fps (in full frame mode). We note that we always tried to acquire STORM images at the maximal frame rate. As for the exposure time, according to the manufacturer it can be as short as 17.8 ms. We would like to emphasize that we did not specify/alter the exposure time.

      See also: https://andor.oxinst.com/assets/uploads/products/andor/documents/andor-ixon-ultra-emccd-specifications.pdf

      The decision to take 40,000 frames per channel was based on our intention to identify the true population of the molecules of interest that are localized and accurately represented in the final reconstructed image. The total number of frames depends on the sample complexity, the density of sample labeling, and the desired resolution. We tested a range of 20,000 to 60,000 frames and found that, for our experimental design and output requirements, 40,000 frames provided the best balance between maximal resolution and the number of localizations needed to make consistent and accurate localization estimates across different stimulation conditions compared to basal controls.
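
      To put the frame counts in the context of total imaging time, the following sketch converts frames to acquisition time. It assumes acquisition at the camera's quoted full-frame maximum of 56 fps; the actual frame rate achieved in the experiments is not stated, so the resulting times are lower bounds under that assumption. The function name is hypothetical.

```python
# Illustrative arithmetic only: acquisition time per channel if every frame
# is acquired at the quoted full-frame maximum rate (an assumption).

FULL_FRAME_FPS = 56.0  # manufacturer's quoted maximum in full-frame mode


def acquisition_time_seconds(n_frames, frames_per_second=FULL_FRAME_FPS):
    """Return the time needed to acquire n_frames at a fixed frame rate."""
    return n_frames / frames_per_second


if __name__ == "__main__":
    # Range tested by the authors, plus the value they settled on.
    for n_frames in (20_000, 40_000, 60_000):
        seconds = acquisition_time_seconds(n_frames)
        print(f"{n_frames} frames: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

      At 56 fps, 40,000 frames correspond to roughly 714 s, or about 12 minutes per channel.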

      (5) The lasers used to switch Alexa 568 and Alexa 647. Were you alternating between the lasers for switching and imaging of dyes? Intermittent and continuous illumination will produce very different unspecific background fluorescence.

      Yes, we used an alternating approach for the lasers exciting Alexa 647 and Alexa 568, for both switching and imaging of the dyes.

      (6) A paragraph with a detailed description of methods used to differentiate the background fluorescence from the signal.

      We have addressed the background fluorescence under Point 1 (Public Review). We have added a paragraph in the Methods section on this issue.

      (7) Minor corrections to the text:

      It appears as though there is a large difference in the expression level of GHR and PRLR in basal conditions in Figure 1. This can be due to the switching properties of the dyes, which is related to the amount of BME in the buffer, or it can be because there is indeed more PRL. Would the authors be able to comment on this?

      We thank the reviewer for this suggestion. According to expression data available online, there is indeed more PRLR than GHR in T47D cells. According to CellMiner [1], T47D cells have an RNA-Seq gene expression level log2(FPKM + 1) of 6.814 for PRLR and 3.587 for GHR, strongly suggesting that there is more PRLR than GHR in basal conditions, matching the reviewer’s interpretation of our images in Fig. 1 (basal). However, we would advise against using STORM images for direct comparisons of receptor expression. First, with TIRF images, we are only looking at the membrane fraction that is attached to the coverslip (within ~150 nm of the coverslip-membrane interface). Second, as discussed above, our data represent relative cell surface receptor levels that allow for comparison of different conditions (basal vs. stimulation) and do not represent absolute quantifications. Everything is relative and in comparison to controls.
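
      The CellMiner values quoted above are on a log2(FPKM + 1) scale. The sketch below converts them back to a linear scale to make the difference easier to read. It is plain arithmetic on the two quoted numbers; the function name is hypothetical, and transcript abundance is not a measure of cell surface protein.

```python
# Illustrative arithmetic only: invert log2(FPKM + 1) for the two quoted
# RNA-Seq values and compare them on a linear scale.


def fpkm_from_log2(value):
    """Invert the log2(FPKM + 1) transform."""
    return 2.0 ** value - 1.0


if __name__ == "__main__":
    prlr_fpkm = fpkm_from_log2(6.814)  # PRLR in T47D, as quoted
    ghr_fpkm = fpkm_from_log2(3.587)   # GHR in T47D, as quoted
    print(f"PRLR FPKM: {prlr_fpkm:.1f}")
    print(f"GHR FPKM:  {ghr_fpkm:.1f}")
    print(f"PRLR/GHR ratio: {prlr_fpkm / ghr_fpkm:.1f}")
```

      On the linear scale this is roughly a tenfold difference in transcript abundance.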

      Also, BME is not going to change the level of expression. The differences in growth factor expression as estimated by relative comparison can be attributed to the actual changes in growth factors and is not an artifact of the amount of BME in the buffer or the properties of dyes. These factors are maintained across all experimental conditions and do not influence the final outcome.

      (1) https://discover.nci.nih.gov/cellminer/

      (8) I would encourage the authors to use unspecific binding to characterize the signal coming from single antibodies bound to the substrate. This would provide a mean number of localizations that a single antibody generates. With this information, one can evaluate how many receptors there are per cluster, which would strengthen the findings and potentially provide additional support for the model presented in Figure 8. It would also explain why the distributions of localisations per cluster in Fig. 3B look very different for hGHR and hPRLR. As the authors point out in the discussion, the results on predimerization of these receptors in basal conditions are conflicting and therefore it is important to shed more light on this topic.

      We thank the reviewer for this suggestion. While we are unable to perform this experiment at this stage, we will keep it in mind for future experiments.

      (9) Minor corrections to the figures:

      Figure 1:

      In the legend, please say what representation was used. Are these density maps or another representation? Please provide examples of actual localisations (either as dots or crosses representing the peaks of the Gaussians). Most findings of this work rely on the characterisation of the clusters of localisations and therefore it is of essence to show what the clusters look like. This could potentially go to the supplemental info to minimise additional work. It's very hard to see the puncta in this figure.

      If the authors created zoomed regions in each of the images (as in Figure 3), it would be much easier to evaluate the expression level and the extent of colocalisation. Halfway through GHR 3 min green pixels become grey, but this may be the issue with the document that was created. Please check. Either increase the font on the scale bars in this figure or delete it.

      As described above, Figure 1 does not show density maps. Imaging captured the fluorophores’ blinking events and localizations were counted as true localizations, when at least 5 consecutive blinking events had been observed. Nikon software was used for Gaussian fitting and smoothing.

      We have generated zoomed regions. In our files (original as well as pdf) we do not see pixels become grey. We increased the font size above one of the scale bars and removed all others.

      Figure 3:

      In A, the GHR clusters are colour coded but PRLR are not. Are both DBSCAN images? Explain the meaning of colour coding or show it as black and white. Was brightness also increased in the PRLR image? The font on the scale bars is too small. In B, right panels, the font on the axes is too small. In the figure legend explain the meaning of 33.3 and 16.7.

      In our document, both GHR and PRLR are color coded, but the hGHR clusters are certainly bigger and therefore appear brighter than the hPRLR clusters. Both are DBSCAN images. The color coding serves only to distinguish different clusters (it has no other meaning). We have kept the color coding but have added a sentence to the caption addressing this. Brightness was increased in both images of Panel B equally. 33.3 and 16.7 are the median cluster sizes. We have added a sentence to the caption explaining this. We have increased the font on the axes in B (right panels).

      Figure 4:

      I struggled to see any colocalization in the 2nd and the 3rd image. Please show zoomed-in sections. In the panels B and C, the data are presented as fractions. Is this per cell? My interpretation is that ~80% of PRL clusters also contain GHR.

      Is this in agreement with Figures 1 and 2? In Figure 1, PRL 3 min, Merge, colocalization seems much smaller. Could the authors give the total numbers of GHR and PRLR from which the fractions were calculated at least in basal conditions?

      We have provided zoom-in views. As for panels B and C, fractions are number of clusters containing both receptors divided by the total number of clusters. We used the same strategy that we had used for calculating the localization changes: We randomly selected 4 ROIs (regions of interest) per cell to calculate fractions and then calculated the average of three different cells from independently repeated experiments. We did not calculate total numbers of GHR/PRLR. The numbers are fractions of cluster numbers.

      Moreover, the reviewer interprets the results in panels B and C as showing that ~80% of PRLR clusters also contain GHR. We assume the reviewer refers to the basal state. This interpretation is not correct for the following reason: ~80% of all clusters contain both receptors. The panels do not reveal how many of the remaining (~20%) clusters contain only PRLR or only GHR. Only if 100% of clusters contain PRLR can we conclude that 80% of PRLR clusters also contain GHR.
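
      The argument above can be made concrete with a small worked example. The cluster counts below are hypothetical (100 clusters, 80 of which contain both receptors) and are not taken from the data. The sketch only shows how the fraction of PRLR clusters that also contain GHR depends on the unreported split of the remaining clusters.

```python
# Illustrative arithmetic only, using hypothetical counts. The panels report
# n_both / n_total; the fraction of PRLR clusters that also contain GHR
# additionally depends on how many clusters are PRLR-only.


def fraction_of_prlr_clusters_with_ghr(n_both, n_prlr_only):
    """Fraction of PRLR-containing clusters that also contain GHR."""
    return n_both / (n_both + n_prlr_only)


if __name__ == "__main__":
    n_total = 100  # hypothetical total number of clusters
    n_both = 80    # hypothetical: ~80% of clusters contain both receptors
    n_single = n_total - n_both

    # Vary the unknown split of the remaining clusters.
    for n_prlr_only in (0, 10, 20):
        n_ghr_only = n_single - n_prlr_only
        fraction = fraction_of_prlr_clusters_with_ghr(n_both, n_prlr_only)
        print(f"PRLR-only={n_prlr_only}, GHR-only={n_ghr_only}: "
              f"{fraction:.0%} of PRLR clusters contain GHR")
```

      With these hypothetical counts, the fraction equals 80% only when all remaining clusters are PRLR-only (so that every cluster contains PRLR), and it rises to 100% when all remaining clusters are GHR-only.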

      Also, while Figures 1 and 2 show localization based on dSTORM images, Figure 3 indicates and quantifies co-localization based on proximity ligation assays following DBSCAN analysis using Clus-DoC. We do not think that the results are directly comparable.

      Reviewer #3 (Public Review):

      (1) The manuscript suffers from a lack of detail, which in places makes it difficult to evaluate the data and would make it very difficult for the results to be replicated by others. In addition, the manuscript would very much benefit from a full discussion of the limitations of the study. For example, the manuscript is written as if there is only one form of the PRLR while the anti-PRLR antibody used for dSTORM would also recognize the intermediate form and short forms 1a and 1b on the T47D cells. Given the very different roles of these other PRLR forms in breast cancer (Dufau, Vonderhaar, Clevenger, Walker and other labs), this limitation should at the very least be discussed. Similarly, the manuscript is written as if Jak2 essentially only signals through STAT5 but Jak2 is involved in multiple other signaling pathways from the multiple PRLRs, including the long form. Also, while there are papers suggesting that PRL can be protective in breast cancer, the majority of publications in this area find that PRL promotes breast cancer. How then would the authors interpret the effect of PRL on GHR in light of all those non-protective results? [Check papers by Hallgeir Rui]

      We thank the reviewer for such thoughtful comments. We have added a paragraph in the Discussion section on the limitations of our study, including sole focus on T47D and γ2A-JAK2 cells and lack of PRLR isoform-specific data. Also, we are now mentioning that these isoforms play different roles in breast cancer, citing papers by Dufau, Vonderhaar, Clevenger, and Walker labs.

      We did not mean to imply that JAK2 signals only via STAT5 or by only binding the long form. We have made this point clear in the Introduction as well as in our revised Discussion section. Moreover, we have added information and references on JAK2 signaling and PRLR isoform specific signaling.

      In our Discussion section, we now also mention the findings that PRL promotes breast cancer. We would like to point out that it is conceivable that PRL is protective in BC by reducing surface hGHR availability, but that this effect may depend on JAK2 levels as well as on expression levels of other kinases that competitively bind Box1 and/or Box2 [1]. In addition, PRL’s effect could be BC stage dependent. In any case, we have emphasized the speculative nature of our statement.

      (1) Chhabra, Y., Seiffert, P., Gormal, R.S., et al. Tyrosine kinases compete for growth hormone receptor binding and regulate receptor mobility and degradation. Cell Rep. 2023;42(5):112490. doi: 10.1016/j.celrep.2023.112490. PMID: 37163374.

      Reviewer #3 (Recommendations for the authors):

      Points for improvement of the manuscript:

      (1) Method details -

      a) "we utilized CRISPR/Cas9 to generate hPRLR knockout T47D cells ......" Exactly how? Nothing is said under methods. Can we be sure that you knocked out the whole gene?

      We have addressed this point by adding two new sections on “Generating hGHR knockout and hPRLR knockout T47D cells” and “Design of sgRNAs for hGHR or hPRLR knockout” to the Methods section.

      b) Some of the Western blots are missing mol wt markers. How specific are the various antibodies used for Westerns? For example, the previous publications are quoted as providing characterization of the antibodies also seem to use just band cutouts and do not show the full molecular weight range of whole cell extracts blotted. Anti-PRLR antibodies are notoriously bad and so this is important.

      There is an antibody referred to in Figure 5 that is not listed under "antibodies" in the methods.

      We have modified Figure 5a, showing the entire gel as well as molecular weight markers. As for specificity of our antibodies, we used monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48, which have been previously tested and used. In addition, we did our own control experiments to ensure specificity. We have added some of our many control results as Supplementary Figures S2 and S3.

      We thank the reviewer for noticing the missing antibody in the Methods section. We have now added information about this antibody.

      c) There is no description of the proximity ligation assay.

      We have addressed this by adding a paragraph on PLA in the Methods section.

      d) What is the level of expression of GHR, PRLR, and Jak2 in the gamma2A-JAK2 cells compared to the T47D cells? Artifacts of overexpression are always a worry.

      The γ2A-JAK2 cell series over-expresses the receptors. For that reason, we did not rely only on observations in the γ2A-JAK2 cell lines but also performed the experiments in T47D cells.

      e) There are no concentrations given for components of the dSTORM imaging buffer. On line 380, I think the authors mean alternating lasers not alternatively.

      Thank you. Indeed, we meant alternating lasers. We are referring to [1] (the protocol we followed) for information on the imaging buffer.

      (1) Beggs, R.R., Dean, W.F., Mattheyses, A.L. (2020). dSTORM Imaging and Analysis of Desmosome Architecture. In: Turksen, K. (eds) Permeability Barrier. Methods in Molecular Biology, vol 2367. Humana, New York, NY. https://doi.org/10.1007/7651_2020_325

      f) In general, a read-through to determine whether there is enough detail for others to replicate is required. 4% PFA in what? Do you mean PBS or should it be Dulbecco's PBS etc., etc.?

      We prepared a 4% PFA in PBS solution. We mean Dulbecco's PBS.

      (2) There are no controls shown or described for the dSTORM. For example, non-specific primary antibody and second antibodies alone for non-specific sticking. Do the second antibodies cross-react with the other primary antibody? Is there only one band when blotting whole cell extracts with the GHR antibody so we can be sure of specificity?

      We used monoclonal antibodies Anti-GHR-ext-mAB 74.3 and Anti-PRLR-ext-mAB 1.48 (but also tested several other antibodies). While these antibodies have been previously tested and used, we performed additional control experiments to ensure specificity of our primary antibodies and absence of non-specific binding of our secondary antibodies. We have added some of our many control results as Supplementary Figures S2 and S3.

      (3) Writing/figures-

      a) As discussed in the public review regarding different forms of the PRLR and the presence of other Jak2-dependent signaling

      We have added paragraphs on PRLR isoforms and other JAK2-dependent signaling pathways to the Introduction. Also, we have added a paragraph on PRLR isoforms (in the context of our findings) to the Discussion section.

      b) What are the units for figure 3c and d?

      The figures show numbers of localizations (obtained from fluorophore blinking events). In the figure caption to 3C and 3D, we have specified the unit (i.e. counts).

      c) The wheat germ agglutinin stains more than the plasma membrane and so this sentence needs some adjustment.

      We thank the reviewer for this comment. We have rephrased this sentence (see caption to Fig. 4).

      d) It might be better not to use the term "downregulation" since this is usually associated with expression and not internalization.

      While we understand the reviewer’s discomfort with the use of the word “downregulation”, we still think that it best describes the observed effect. Moreover, we would like to note that in the field of receptorology “downregulation” is a specific term for trafficking of cell surface receptors in response to ligands. That said, to address the reviewer’s comment, we are now using the terms “cell surface downregulation” or “downregulation of cell surface [..] receptor” throughout the manuscript in order to explicitly distinguish it from gene downregulation.

      e) Line 420 talks about "previous work", a term that usually indicates work from the same lab. My apologies if I am wrong, but the reference doesn't seem to be associated with the authors.

      At the end of the sentence containing the phrase “previous work”, we are referring to reference [57], which has Dr. Stuart Frank as senior and corresponding author. Dr. Frank is also a co-corresponding author on this manuscript. While in our opinion, “previous work” does not imply some sort of ownership, we are happy to confirm that one of us was responsible for the work we are referencing.

      Reviewing Editor's recommendations:

      The reviewers have all provided a very constructive assessment of the work and offered many useful suggestions to improve the manuscript. I'd advise thinking carefully about how many of these can be reasonably addressed. Most will not require further experiments. I consider it essential to improve the methods to ensure others could repeat the work. This includes adding methods for the PLA and including detail about the controls for the dSTORM. The reviewers have offered suggestions about types of controls to include if these have not already been done.

      We thank the editor for their recommendations. We have revised the methods section, which now includes a paragraph on PLA as well as on CRISPR/Cas9-based generation of mutant cell lines. We have also added information on the dSTORM buffer to the manuscript. Data of controls indicating antibody specificity (using confocal microscopy) have been added to the manuscript’s supplementary material (see Fig. S2 and S3).

      I agree with the reviewers that the different isoforms of the prolactin receptor need to be considered. I think this could be done as an acknowledgment and point of discussion.

      We have revised the discussions section and have added a paragraph on the different PRLR isoforms, among others.

      For Figure 2E, make it clear in the figure (or at least in legend) that the middle line is the basal condition.

      We thank the editor for their comment. We have made changes to Fig. 2E and have added a sentence to the legend making it clear that the middle line depicts the basal condition.

      My biggest concern overall was the fact that this is all largely conducted in a single cell line. This was echoed by at least one of the reviewers. I wonder if you have replicated this in other breast cancer cell lines or mammary epithelial cells? I don't think this is necessary for the current manuscript but would increase confidence if available.

      We thank the editor for their comment and fully agree with their assessment. Unfortunately, we have not replicated these experiments in other BC cell lines or in mammary epithelial cells, but we would certainly want to do so in the near future.

    1. eLife Assessment

      In their valuable study, Lee et al. explore a role for the Hippo signaling pathway, specifically wts-1/LATS and the downstream regulator yap, in age-dependent neurodegeneration and microtubule dynamics using C. elegans mechanosensory neurons as a model. The authors demonstrate that disruption of wts-1/LATS leads to age-associated morphological and functional neuronal abnormalities, linked to enhanced microtubule stabilization, and show a genetic connection between yap and microtubule stability. Overall, the study employs robust genetic and molecular approaches to reveal a convincing link between the Hippo pathway, microtubule dynamics, and neurodegeneration.

    2. Joint Public Review:

      The Lee et al. study has been revised in response to reviewer comments. It presents a valuable investigation into the role of the Hippo signaling pathway (specifically wts-1/LATS and yap) in age-dependent neurodegeneration and microtubule dynamics in C. elegans TRNs. The authors convincingly demonstrated that disruption of wts-1/LATS leads to age-associated neuronal abnormalities and enhanced microtubule stabilization, with a genetic link to yap. While the study was praised for its well-conducted and well-controlled approaches, reviewers raised concerns about the specificity of the Hippo pathway's effects to TRNs, the correlation of Hpo signaling decline in TRNs with age, and the mechanistic link between Hpo-mediated gene expression and microtubule regulation. The authors addressed the TRN specificity by suggesting the unique microtubule structure of these neurons might contribute to their susceptibility. They acknowledged the difficulty in detecting Hpo signaling decline specifically in aged TRNs but noted increased YAP-1 nuclear localization in other tissues. Importantly, the authors provided evidence suggesting that YAP-TEAD-mediated transcriptional regulation is responsible for neuronal degeneration, as loss of yap-1 or egl-44 restored the wts-1 mutant phenotype. However, the specific transcriptional targets of YAP-1 regulating microtubule stability remain unidentified, representing a key limitation. The authors also discussed the possibility of non-cell-autonomous effects of YAP-1 and offered explanations for the seemingly moderate impairment of the touch response despite structural damage. Finally, they attributed the shorter lifespan of wts-1 and wts-1; yap-1 mutants to roles of wts-1 beyond TRNs and potential synergistic effects of yap-1. Overall, the study provides significant insights into the Hippo pathway's role in neuronal aging and microtubule dynamics, while acknowledging remaining mechanistic gaps.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of microtubule dynamics and its effects on neuronal aging. Using C. elegans as a model, the authors investigate the role of evolutionarily conserved Hippo pathway in microtubule dynamics of touch receptor neurons (TRNs) in an age-dependent manner. Using genetic, molecular, behavioral, and pharmacological approaches, the authors show that age-dependent loss of microtubule dynamics might underlie structural and functional aging of TRNs. Further, the authors show that the Hippo pathway specifically functions in these neurons to regulate microtubule dynamics. Specifically, authors show that hyperactivation of YAP-1, a downstream component of the Hippo pathway that is usually inhibited by the kinase activity of the upstream components of the pathway, results in microtubule stabilization and that might underlie the structural and functional decline of TRNs with age. However, how the Hippo pathway regulates microtubule dynamics and neuronal aging was not investigated by the authors.

      Strengths:

      This is a well-conducted and well-controlled study, and the authors have used multiple approaches to address different questions.

      Weaknesses:

      There are no major weaknesses identified, except that the effect of the Hippo pathway seems to be specific to only a subset of neurons. I would like the authors to address the specificity of the effect of the Hippo pathway in TRNs, in their resubmission.

      Although our genetic experiments, including TRN-specific rescue/overexpression of YAP-1 and knockdown of WTS-1, strongly suggest a cell-autonomous function of the WTS-1-YAP-1 axis in TRNs, the Hpo pathway could have broader roles in neuroprotection. While this pathway may regulate microtubule stability in multiple neurons, other characteristics of TRNs, such as their anatomical localization near the cuticle or their long projections along the body axis, could contribute to their susceptibility to age-related deformation. Alternatively, the role of the Hpo pathway may be truly TRN-specific. TRNs have unique microtubules in terms of both composition and structure. Among the nine α- and six β-tubulin genes in C. elegans, one α-tubulin (mec-12) and one β-tubulin (mec-7) show highly enriched expression in TRNs [1, 2], and TRNs contain a special 15-protofilament microtubule structure, while all other neurons in C. elegans have 11-protofilament microtubules [3]. Transcriptional regulation through YAP-1 may affect the specific microtubule structure of TRNs, leading to premature neuronal deformation. We have included this in the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons.

      Strengths:

      This study examines a novel role of the Hpo signaling pathway, specifically of wts-1/LATS and the downstream regulator of gene expression, yap, in age-related neurodegeneration in C. elegans touch-responsive mechanosensory neurons, ALM and PLM. The study shows that knockdown or deletion of wts-1/LATS causes age-associated morphological abnormalities of these neurons, accompanied by functional loss of touch responsiveness. This is further associated with enhanced, abnormal, microtubule stabilization in these neurons. Strong pharmacological and especially genetic manipulations of MT-stabilizing or severing proteins show a strong genetic link between yap and regulation of MTs stability. The study is strong and uses robust approaches, especially strong genetics. The demonstrations on the aging-related roles of the Hpo signaling pathway, and the link to MTs, are novel and compelling. Nevertheless, the study also has mechanistic weaknesses (see below).

      Weaknesses:

      Specific comments:

      (1) The study demonstrates age-specific roles of the Hpo pathway, specifically of wts-1/LATS and yap, specifically in TRN mechanosensory neurons, without observing developmental defects in these neurons, or effects in other neurons. This is a strong demonstration. Nevertheless, the study does not address whether there is a correlation of Hpo signaling pathway activity decline specifically in these neurons, and not other neurons, and at the observed L4 stage and onwards (including the first day of adulthood, 1DA stage). Such demonstrations of spatio-temporal regulation of the Hpo signaling pathway and its activation seem important for linking the Hpo pathway with the observed age-related neurodegeneration. Can this age-related response be correlated to indeed a decline in Hpo signaling during adulthood? Especially at L4 and onwards? It will be informative to measure this by examining the decline in wts1 as well as yap levels and yap nuclear localization.

      As described above, we have included possible explanations for the specificity of the Hpo pathway in TRNs. Since components of the Hpo pathway are expressed in various tissues, including the intestine and hypodermis, this pathway could have broader neuroprotective roles across multiple neurons. Alternatively, it could function specifically in TRNs. Given that TRNs possess microtubules that are unique in both structure and composition, and that the Hpo pathway has crucial roles in regulating microtubule stability, the roles of the Hpo pathway may indeed be TRN-specific. As we described in the manuscript, our observations, along with those of others, indicate that neuronal deformation of TRNs begins around the 4th day of adulthood. Additionally, the degree of morphological deformation in wts-1 mutants at the L4 stage is comparable to that of aged wild-type worms on the 15th day of adulthood. Therefore, to assess the functional decline of WTS-1 or nuclear localization of YAP-1, observations should begin in 4-day-old animals. Using fluorescence-tagged YAP-1 under the mec-4 promoter, we could not detect a significant increase in nuclear YAP-1 in TRNs of 4-day-old adults. Additionally, we were unable to assess YAP-1 intracellular localization in older animals, such as 10-day-old animals, possibly due to the small cell size of neurons or morphological alterations of TRNs that accompany aging. Although we did not detect functional decline of WTS-1 or increased nuclear YAP-1 in TRNs, nuclear localization of YAP-1 increases with age in other tissues, such as the intestine and hypodermis (Author response image 1). This may result from inactivation of the Hippo (Hpo) pathway, from an indirect consequence of structural and functional decline (such as tissue stiffness associated with aging), or from a combination of both. Additionally, given that morphological deformation of TRNs appears to begin around the fourth day of adulthood, nuclear localization of YAP-1 in the intestine and hypodermis seems to have a later onset and to be more moderate. It is possible that YAP-1 nuclear localization in TRNs occurs earlier or that other factors contribute to early-stage touch neuron deformation.

      Author response image 1.

      Quantification of the proportion of worms exhibiting nuclear localization of YAP-1. We used GFP-tagged YAP-1 driven by its own 4 kb promoter. A total of 90 animals were observed each day.

      (2) The Hpo pathway eventually activates gene expression via yap. Although the study uses robust genetic manipulations of yap and wts-1/LATS, it is not clear whether the observed effects are attributed to yap-mediated regulation of gene expression (see 3).

      Given that the neuronal deformation in the wts-1 mutant was completely restored by the loss of yap-1 or egl-44, it strongly suggests that YAP-TEAD-mediated transcriptional regulation is responsible for the premature neuronal degeneration of the wts-1 mutant. However, in this study, we were unable to identify specific transcriptional target genes associated with these phenomena, which represents a limitation of our research (please see below).

      (3) The observations on the abnormal MT stabilization, and the subsequent genetic examinations of MT-stability/severing genes, are a significant strength of the study. Nevertheless, despite the strong genetic links to yap and wts-1/LATS, it is not clear whether MT-regulatory genes are regulated by transcription downstream of the Hpo pathway, thus not enabling a strong causal link between MT regulation and Hpo-mediated gene expression, making this strong part of the study mechanistically circumstantial. Specifically, it will be good to examine whether the genes addressed herein, for example, Spastin, are transcriptionally regulated downstream of the Hpo pathway. This comment is augmented by the finding that in the wts-1/yap-1 double mutants, MT abnormality, and subsequent neuronal morphology and touch responses are restored, clearly indicating that there is an associated transcriptional regulation.

      If the target genes of YAP-1 are not identified, it will be difficult to fully understand how YAP-1 regulates microtubule stability. Microtubule-stabilizing genes, whose knockdown alleviates wts-1 mutant neuronal deformation, could be potential transcriptional targets of YAP-1. Among these genes, PTRN-1 and DLK-1 contain MCAT sequences (CATTCCA/T), a well-conserved DNA motif recognized by the TEAD transcription factor, in their promoters near the transcription start site (TSS). We hypothesized that the expression of fluorescence-tagged reporters driven by promoter regions containing these MCAT sequences would be enhanced in the absence of wts-1 activity. Although both reporters were expressed in TRNs, they did not show significant changes in the wts-1 mutant background. We also focused on spv-1, a worm homolog of ARHGAP29, which negatively regulates RhoA. YAP is known to modulate actin cytoskeleton rigidity through transcriptional regulation of ARHGAP29 [4]. The promoter of spv-1 contains 2 MCAT sequences, and loss of spv-1 mitigated neuronal deformation of the wts-1 mutant. However, reporters driven by promoter regions containing these MCAT sequences were only weakly expressed in the processes of TRNs. More importantly, ectopic expression of a dominant-negative form of rho-1/rhoA did not lead to significant deformation of TRNs. While YAP typically functions as a transcriptional co-activator, it has also been reported to repress target gene expression, such as that of DDIT4 and Trail, in collaboration with the TEAD transcription factor [5]. As the reviewer pointed out, spas-1 might be transcriptionally repressed by yap-1, given that its loss leads to premature deformation of TRNs. However, since the phenotype of the spas-1 mutant has a later onset than that of the wts-1 mutant and is relatively restricted to ALM, we excluded it from our candidate gene search. Despite extensive genetic approaches, we were unable to establish a strong causal link between YAP-1 and the regulation of microtubule stability. Unbiased screens, such as tissue-specific transcriptome analysis, may help address the remaining questions. We have outlined the limitations of this study in the discussion section of the revised manuscript.
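
To illustrate what looking for MCAT sequences in a promoter involves, the following is a minimal sketch of a motif scan on both strands. It is not the procedure used in the study: the function name `find_mcat_sites` and the example sequence are hypothetical, and only the consensus CATTCCA/T is taken from the response above.

```python
import re

# MCAT consensus as quoted in the response: CATTCC followed by A or T.
# Lookaheads are used so that overlapping sites would also be reported.
MCAT_FORWARD = re.compile(r"(?=(CATTCC[AT]))")
# Reverse complement of CATTCC[AT], i.e. a site located on the minus strand.
MCAT_REVERSE = re.compile(r"(?=([AT]GGAATG))")


def find_mcat_sites(promoter):
    """Return sorted (0-based start, strand, matched text) tuples.

    The matched text is always reported as it reads on the forward strand.
    """
    seq = promoter.upper()
    hits = []
    for match in MCAT_FORWARD.finditer(seq):
        hits.append((match.start(1), "+", match.group(1)))
    for match in MCAT_REVERSE.finditer(seq):
        hits.append((match.start(1), "-", match.group(1)))
    return sorted(hits)


# Hypothetical promoter fragment, invented purely for illustration.
example_promoter = "ttgaCATTCCAggctaAGGAATGcc"
for start, strand, site in find_mcat_sites(example_promoter):
    print(start, strand, site)
```

A scan of this kind only nominates candidate TEAD-binding sites. As the reporter experiments described above show, the presence of a motif does not by itself establish regulation by YAP-1.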

      Other comments:

      (1) The TRN-specific knockdown of wts-1 and yap-1 is a clear strength. Nevertheless, these do not necessarily show cell-autonomous effects, as the yap transcription factor may regulate the expression of external cues, secreted or otherwise, thus generating non-cell autonomous effects. For example, it is known that yap regulates TGF-beta expression and signaling.

      In the absence of LATS1/2 activity, activated YAP has been reported to drive biliary epithelial cell lineage specification by directly regulating TGF-β transcription during and after liver development [6]. Even when functioning in an autocrine manner, TGF-β can exhibit non-cell autonomous effects. While it primarily acts on the same cell that secretes it, some molecules may also affect neighboring cells, leading to paracrine effects. Additionally, TGF-β can modify the extracellular matrix (ECM), indirectly affecting surrounding cells. Similarly, if YAP regulates transcription of secretory proteins in TRNs, the resulting extracellular factors or surrounding cells may influence touch neuronal microtubules in a non-cell-autonomous manner. Although our genetic data strongly suggest a cell-autonomous function of WTS-1-YAP-1 in TRNs, we cannot exclude the possibility that YAP-1 functions non-cell-autonomously, as we were unable to identify its transcriptional targets. We have included this in the discussion section of the revised manuscript.

      (2) Continuing from comment (3) above, it seems that many of the MT-regulators chosen here for genetic examinations were chosen based on demonstrated roles in neurodegeneration in other studies. It would be good to show whether these MT-associated genes are directly regulated by transcription by the Hpo pathway.

      As we described above, several MT-associated genes, such as ptrn-1, dlk-1 and spv-1, contain MCAT sequences in their promoters and their knockdown alleviated wts-1-induced neuronal deformation. These genes were tested to determine whether they were directly regulated by WTS-1-YAP-1. Based on our findings, we concluded that they were unlikely to be regulated by the Hpo pathway in TRNs.

      (3) The impairment of the touch response may not be robust: it is only a 30-40% reduction at L4, and even less reduction at 1DA. It would be good to offer possible explanations for this finding.

      As pointed out by the reviewer, the touch responses of wts-1 mutants showed an approximately 33% reduction at both L4 and 1DA compared to age-matched wild-type animals. At the L4 stage, control worms responded to nearly every gentle touch (94%), whereas wts-1 mutants responded to only 60% of stimuli. By 1DA, control worms exhibited a slight decline in touch responses compared to L4 (82.5%), whereas wts-1 mutants displayed a more pronounced impairment (55.7%) (Fig 1E). Considering the severity and frequency of structural degeneration in wts-1 mutants at both stages, this functional impairment appears to be relatively moderate. As we noted in the manuscript, our observations, along with those of others, indicate that structural abnormalities in ALM and PLM neurons begin to appear around the fourth day of adulthood and progressively worsen as the worms age [7]. In a previous study, Tank et al. categorized day 10 worms into two groups based on their movement ability and then assessed structural deformation in each animal to determine whether structural and functional degeneration of TRNs were correlated. In these same groups of animals, they examined the gentle touch response and found that animals responded to 46 ± 5.1% and 84 ± 12.2% of gentle touches, respectively [8]. Day 10 animals therefore had a touch response of about 65% on average, which is consistent with our observation in day 10 animals (Fig. 5E, 56.3%). Given these observations, the function of TRNs in wts-1 mutants or aged animals appears to be preserved despite severe structural defects. The gentle touch response evokes an escape behavior in which animals quickly move away from the stimulus; thus, proper touch responses are essential for avoiding predators and ensuring survival. The response has been reported to be necessary for evading fungal predation, such as escaping from a constricting hyphal ring [9]. Given that the gentle touch response is crucial for survival, its function is likely well preserved despite structural abnormalities, such as age-related deformation.
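
The percentages quoted in this response can be checked with a short calculation. The sketch below uses only the response rates stated above; the helper name `relative_reduction` is hypothetical, and the day 10 average assumes the two groups from Tank et al. are weighted equally, which the response does not state.

```python
def relative_reduction(control_pct, mutant_pct):
    """Percent reduction of the mutant response relative to the control."""
    return (control_pct - mutant_pct) / control_pct * 100.0


# Response rates quoted in the text (percent of gentle touches answered).
l4_reduction = relative_reduction(94.0, 60.0)
day1_reduction = relative_reduction(82.5, 55.7)

# Unweighted mean of the two day 10 groups (46% and 84%);
# equal group sizes are an assumption made here for illustration.
day10_mean = (46.0 + 84.0) / 2

print(round(l4_reduction, 1), round(day1_reduction, 1), day10_mean)
```

Under these assumptions the reductions come out at roughly a third at both stages, in line with the approximate figure given in the response.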

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) Why is the effect of the Hippo pathway on microtubule dynamics specific to TRNs? Is it the structure of TRNs that makes them prone to the effects of age-dependent decline in microtubule dynamics? The authors are advised to discuss it in their resubmission.

      As described above, we have included possible explanations for the tissue specificity of the Hpo pathway in TRNs and the vulnerability of TRNs to age-associated decline in the discussion section of the revised manuscript.

      (2) The authors are advised to explain the shorter life span of wts-1; yap-1 double mutants (with restored TRNs) compared to wts-1 single mutants in Figure 2F. The life span of yap-1 single mutants should be included in Figure 2F. Further, based on the data, the shorter lifespan of wts-1 mutants cannot be attributed to abnormal TRNs as the lifespan of wts-1; yap-1 double mutants is even shorter. The authors are advised to explain the shorter life span of wts-1 mutants compared to wild-type controls.

      wts-1 is known to be involved in various developmental processes, including the maintenance of apicobasal polarity in the intestine, growth rate control, and dauer formation [10-12]. Since WTS-1 activity is restored in the intestine of the mutant used for lifespan measurement, the shorter lifespan of the wts-1 mutant may result from the loss of WTS-1 in tissues other than the intestine. Although we were unable to include lifespan data for the yap-1 mutant, recent studies indicate that yap-1(tm1416) mutant or yap-1 RNAi-treated worms exhibit a shortened lifespan [13, 14]. Thus, our data showing a slightly shorter lifespan of the wts-1; yap-1 mutant compared with the wts-1 mutant may result from the synergistic action of yap-1 and yap-1-independent downstream factors of wts-1. While this study does not provide an explanation for the shortened lifespan of wts-1 or wts-1; yap-1 mutants, the fact that the wts-1; yap-1 double mutant with restored TRNs still has a shorter lifespan than the wts-1 mutant strongly suggests that premature deformation of wts-1 neurons is a touch neuron-specific event, rather than being associated with the whole body, as described in the manuscript.

      Minor comments:

      (1) In the abstract, please provide definitions for LATS and YAP. Authors can mention that LATS is a kinase and YAP a transcriptional co-activator in the Hippo pathway.

      (2) In the last paragraph on page 9, change "these function" to "this function", and change "knock-downed" to "knocked down".

      (3) On page 10, paragraph 2, change "regarding the action mechanism" to "regarding the mechanism of action".

      (4) On page 11, paragraph 1, change "endogenous WTS-1 could inhibits" to "endogenous WTS-1 could inhibit".

      (5) On page 16, paragraph 1, change "consistent to the hypothesis" to "consistent with this hypothesis".

      (6) Overall, the paper is well written. However, there is still room to improve the language and diction used by the authors.

      We have addressed all minor comments raised by the reviewer in the revised manuscript.

      References

      (1) Hamelin M, Scott IM, Way JC, Culotti JG. The mec-7 beta-tubulin gene of Caenorhabditis elegans is expressed primarily in the touch receptor neurons. EMBO J. 1992;11(8):2885-93. Epub 1992/08/01. doi: 10.1002/j.1460-2075.1992.tb05357.x. PubMed PMID: 1639062; PubMed Central PMCID: PMC556769.

      (2) Fukushige T, Siddiqui ZK, Chou M, Culotti JG, Gogonea CB, Siddiqui SS, et al. MEC-12, an alpha-tubulin required for touch sensitivity in C. elegans. J Cell Sci. 1999;112(Pt 3):395-403. Epub 1999/01/14. doi: 10.1242/jcs.112.3.395. PubMed PMID: 9885292.

      (3) Chalfie M, Thomson JN. Structural and functional diversity in the neuronal microtubules of Caenorhabditis elegans. J Cell Biol. 1982;93(1):15-23. Epub 1982/04/01. doi: 10.1083/jcb.93.1.15. PubMed PMID: 7068753; PubMed Central PMCID: PMC2112106.

      (4) Qiao Y, Chen J, Lim YB, Finch-Edmondson ML, Seshachalam VP, Qin L, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell Rep. 2017;19(8):1495-502. Epub 2017/05/26. doi: 10.1016/j.celrep.2017.04.075. PubMed PMID: 28538170.

      (5) Kim M, Kim T, Johnson RL, Lim DS. Transcriptional co-repressor function of the hippo pathway transducers YAP and TAZ. Cell Rep. 2015;11(2):270-82. Epub 2015/04/07. doi: 10.1016/j.celrep.2015.03.015. PubMed PMID: 25843714.

      (6) Lee DH, Park JO, Kim TS, Kim SK, Kim TH, Kim MC, et al. LATS-YAP/TAZ controls lineage specification by regulating TGFbeta signaling and Hnf4alpha expression during liver development. Nat Commun. 2016;7:11961. Epub 2016/07/01. doi: 10.1038/ncomms11961. PubMed PMID: 27358050; PubMed Central PMCID: PMC4931324.

      (7) Toth ML, Melentijevic I, Shah L, Bhatia A, Lu K, Talwar A, et al. Neurite sprouting and synapse deterioration in the aging Caenorhabditis elegans nervous system. J Neurosci. 2012;32(26):8778-90. Epub 2012/06/30. doi: 10.1523/JNEUROSCI.1494-11.2012. PubMed PMID: 22745480; PubMed Central PMCID: PMC3427745.

      (8) Tank EM, Rodgers KE, Kenyon C. Spontaneous age-related neurite branching in Caenorhabditis elegans. J Neurosci. 2011;31(25):9279-88. Epub 2011/06/24. doi: 10.1523/JNEUROSCI.6606-10.2011. PubMed PMID: 21697377; PubMed Central PMCID: PMC3148144.

      (9) Maguire SM, Clark CM, Nunnari J, Pirri JK, Alkema MJ. The C. elegans touch response facilitates escape from predacious fungi. Curr Biol. 2011;21(15):1326-30. Epub 2011/08/02. doi: 10.1016/j.cub.2011.06.063. PubMed PMID: 21802299; PubMed Central PMCID: PMC3266163.

      (10) Cai Q, Wang W, Gao Y, Yang Y, Zhu Z, Fan Q. Ce-wts-1 plays important roles in Caenorhabditis elegans development. FEBS Lett. 2009;583(19):3158-64. Epub 2009/09/10. doi: 10.1016/j.febslet.2009.09.002. PubMed PMID: 19737560.

      (11) Kang J, Shin D, Yu JR, Lee J. Lats kinase is involved in the intestinal apical membrane integrity in the nematode Caenorhabditis elegans. Development. 2009;136(16):2705-15. Epub 2009/07/15. doi: 10.1242/dev.035485. PubMed PMID: 19605499.

      (12) Lee H, Kang J, Ahn S, Lee J. The Hippo Pathway Is Essential for Maintenance of Apicobasal Polarity in the Growing Intestine of Caenorhabditis elegans. Genetics. 2019;213(2):501-15. Epub 2019/07/29. doi: 10.1534/genetics.119.302477. PubMed PMID: 31358532; PubMed Central PMCID: PMC6781910.

      (13) Teuscher AC, Statzer C, Goyala A, Domenig SA, Schoen I, Hess M, et al. Longevity interventions modulate mechanotransduction and extracellular matrix homeostasis in C. elegans. Nat Commun. 2024;15(1):276. Epub 2024/01/05. doi: 10.1038/s41467-023-44409-2. PubMed PMID: 38177158; PubMed Central PMCID: PMC10766642.

      (14) Saul N, Dhondt I, Kuokkanen M, Perola M, Verschuuren C, Wouters B, et al. Identification of healthspan-promoting genes in Caenorhabditis elegans based on a human GWAS study. Biogerontology. 2022;23(4):431-52. Epub 2022/06/25. doi: 10.1007/s10522-022-09969-8. PubMed PMID: 35748965; PubMed Central PMCID: PMC9388463.

    1. eLife Assessment

      This important study aims to understand the function of ProSAP-interacting protein 1 (Prosapip1) in the brain. Using a conditional Prosapip1 KO mouse (floxed Prosapip1 crossed with the Syn1-Cre line), the authors performed analyses including protein biochemistry, synaptic physiology, and behavioral learning. Convincing evidence from this study supports a role of Prosapip1 in synaptic protein composition, synaptic NMDA responses, LTP, and spatial memory.

    2. Reviewer #1 (Public review):

      Summary:

      Summary of what the authors were trying to achieve: In the manuscript by Hoisington et al., the authors utilized a novel conditional neuronal prosap2-interacting protein 1 (Prosapip1) knockout mouse to delineate how both neuronal and dorsal hippocampal (dHP)-specific knockout of Prosapip1 impacts biochemical and electrophysiological neuroadaptations within the dHP that may mediate behaviors associated with this brain region.

      Strengths:

      (1) Methodological Strengths

      a) The generation and use of a conditional neuronal knockout of Prosapip1 is a strength. These mice will be useful for anyone interested in studying or comparing and contrasting the effects of loss of Prosapip1 in different brain regions or in non-neuronal tissues.

      b) The use of biochemical, electrophysiological, and behavioral approaches is a strength. By providing data across multiple domains, a picture begins to emerge about the mechanistic role for Prosapip1. While questions still remain, the use of the 3 domains is a strength.

      c) The use of both global, constitutive neuronal loss of Prosapip1 and postnatal dHP-specific knockout of Prosapip1 helps support and validate the behavioral conclusions.

      (2) Strengths of the results

      a) It is interesting that loss of Prosapip1 leads to specific alterations in the expression of GluN2B and PSD95 but not GluA1 or GluN2A in a post-homogenization fraction that the authors term a "synaptic" fraction. Therefore, these results suggest protein-specific modulation of glutamatergic receptors within a "synaptic" fraction.

      b) The electrophysiological data demonstrate an NMDAR-dependent alteration in measures of hippocampal synaptic plasticity, including long-term potentiation (LTP) and NMDAR input/output. These data correspond with the biochemical data demonstrating a biochemical effect on GluN2B localization. Therefore, the conclusion that loss of Prosapip1 influences NMDAR function is well supported.

      c) The behavioral data suggest deficits in memory, in particular novel object recognition and spatial memory, in the Prosapip1 knockout mice. These data are strongly bolstered by both the pan-neuronal knockout and the dHP Cre transduction.

      The authors highlight potential future studies to further the understanding of Prosapip1.

    3. Reviewer #2 (Public review):

      The authors provide valuable findings characterizing a Prosapip1 conditional knockout mouse and the effects of knockout on hippocampal excitatory transmission, NMDAR transmission, and several learning behaviors. Furthermore, the authors selectively and conditionally knock out Prosapip1 in the dorsal hippocampus and show that it is required for the same spatial learning and memory assessed in the conditional knockout mice. The study uncovers how Prosapip1 is involved in PSD organization and is a functional and critical player in dorsal hippocampal LTP via its interaction with GluN2B subunits. The study is well controlled and detailed, and the data in the paper match the conclusions.

      Comments on revisions:

      The authors have addressed all concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      The biochemical fractionation and use of the term "synaptic" were my biggest issues. I would recommend using a more targeted approach to measure the PSD or compare and contrast synaptic from extrasynaptic. For instance, PMID 16797717 does a PSD purification, whereas other papers have fractionated extrasynaptic from synaptic. Moreover, a PSD95 immunoprecipitation may be of interest, as one question that could arise is: since you see decreases in PSD95 and GluN2B, but not 2A or GluA1, could the association of PSD95 with the different proteins be altered? To evaluate this, proteomics or some other unbiased methodology could enhance an understanding of the full panoply of changes induced by Prosapip1 within the dHP.

      The reviewer makes valid points; however, this is a large endeavor, which we will address in future experiments.

      There seems to be a missed opportunity to really determine how Prosapip1 is influencing protein expression and/or phosphorylation at the PSD.

      There is no indication that Prosapip1 is linked to transcription or translation machinery; therefore, we don’t see the value of examining protein expression in this context. Phosphorylation is a broad term, and although this can be answered through phosphoproteomics, this is outside the scope of this study.

      At the very least, additional discussion within this realm would help the reader contextualize the biochemical data.

      Further studies are needed to determine the mechanism by which Prosapip1 controls the localization of PSD95, GluN2B, and potentially others. It is plausible that posttranslational modifications are responsible for Prosapip1 function. For example, the Prosapip1 sequence contains a potential glycosylation site (Ser622), and several potential phosphorylation sites (https://glygen.org/protein/O60299#Glycosylation, https://www.phosphosite.org/proteinAction.action?id=18395&showAllSites=true#appletMsg). These posttranslational modifications can contribute to the stabilization of the synaptic localization of GluN2B and PSD95.

      We added to the discussion the paragraph above as well as the caveat that proteomic studies are needed for a comprehensive study of the role of Prosapip1 in the PSD.

      Weaknesses:

      (1) Methodological Weaknesses

      a. The synapsin-Cre mice may express Cre-recombinase more broadly than just in neuronal tissues. Specifically, according to Jackson Laboratories, there is a concern with these mice expressing Cre-recombinase in the germline. As the human protein atlas suggests that Prosapip1 protein is expressed extraneuronally, validation of neuron or at least brain-specific knockout would be helpful in interpreting the data. Having said that, the data demonstrating that the brain region-specific knockout has similar behavioral impacts helps alleviate this concern somewhat; however, there are no biochemical or electrophysiological readouts from these animals, and therefore an alternative mechanism in this adult knockout cannot be excluded.

      This is a valuable insight from the reviewer, especially considering the information from Jackson Laboratories. As mentioned in the paper, we exclusively used female Syn1-Cre carrying breeders to avoid germline recombination. Furthermore, we consistently assessed the prevalence of the Prosapip1 flox sites alongside the presence of Syn1-Cre with our regular litter genotyping, confirming the presence of Prosapip1. Additionally, Prosapip1 protein expression was directly examined in rats in Wendholdt et al., 2006, where this group reported that Prosapip1 is a brain-specific protein, minimizing the potential consequences of a peripheral loss of Prosapip1. In addition, to confirm that Prosapip1 is a brain-specific protein in mice, we performed a western blot analysis on the dorsal hippocampus, liver, and kidney of a C57BL/6 mouse (Author response image 1), and found that Prosapip1 protein is not found in these peripheral organs, aligning with the findings in rats reported by Wendholdt et al.

      Author response image 1. Prosapip1 protein in the dorsal hippocampus, liver, and kidney of C57BL/6 mice.

      b. The use of the word synaptic and the crude fractionation make some of the data difficult to interpret/contextualize. It is unclear how a single centrifugation that eliminates the staining of a nuclear protein can be considered a "synaptic" fraction. This is highlighted by the presence of GAPDH in this fraction which is a cytosolically-enriched protein. While GAPDH may be associated with some membranes it is not a synaptic protein. There is no quantification of GAPDH against total protein to validate that it is not enriched in this fraction over control. Moreover, it should not be used as a loading control in the synaptic fraction. There are multiple different ways to enrich membranes, extrasynaptic fractions, and PSDs and a better discussion on the caveats of the biochemical fractionation is a minimum to help contextualize the changes in PSD95 and GluN2B.

      We apologize for the confusion. As we described in the methods section, the crude synaptosome was isolated by several centrifugations, as depicted in the figure that we are now including in the manuscript. As shown in Extended Figure 2, the P2 fraction does contain PSD-95 and synapsin, as well as GluN2B, GluN2A, and GluA1; however, it does not contain the transcription factor CREB, indicating the isolation of the crude synaptosomal fraction. As shown in the figure, a small amount of GAPDH is present in the crude synaptosomal fraction. The presence of GAPDH in the crude synaptosomal fraction has been previously reported (Ikemoto et al., 2003; Lee et al., 2016; Wang et al., 2012). As we have added to the discussion, there remains a caveat that we cannot differentiate the pre- and post-synaptic fractions, and as a result we do not know if Prosapip1 plays a role in the assembly of axonal proteins.

      c. Also, the word synaptosomal on page 7 is not correct. One issue is this is more than synaptosomes and another issue is synaptosomes are exclusively presynaptic terminals. The correct term to use is synaptoneurosome, which includes both pre and postsynaptic components. Moreover, as stated above, this may contain these components but is most likely not a pure or even enriched fraction.

      Since we cannot exclude the possibility that Prosapip1 is also expressed in glia, we do not believe that the term synaptoneurosome is accurate.

      d. The age at which the mice underwent injection of the Cre virus was not mentioned.

      We apologize for the oversight. As now noted in the methods, the mice used for experiments underwent surgery to infect neurons with the AAV-GFP or AAV-Cre viruses between 5 and 6 weeks of age to ensure full viral expression by the experimental window beginning at 8 weeks old.

      (2) Weaknesses of Results

      a. There were no measures of GluN1 or GluA2 in the biochemical assays. As GluN1 is the obligate subunit, how it is impacted by the loss of Prosapip1 may help contextualize the fact that GluN2B, but not GluN2A, is altered. Moreover, as GluA2 has different calcium permeance, alterations in it may be informative.

      Since we detect NMDAR current, which requires the obligatory subunit GluN1 and at least one GluN2 subunit (GluN2A, GluN2B, GluN2C, GluN2D), we did not see the rationale behind examining the level of GluN1 in the Prosapip1 knockout mice.

      b. While there was no difference in GluA1 expression in the "synaptic" fraction, it does not mean that AMPAR function is not impacted by the loss of Prosapip1. This is particularly important as Prosapip1 may interact with kinases or phosphatases or their targeting proteins. Therefore, measuring AMPAR function electrophysiologically or synaptic protein phosphorylation would be informative.

      We agree with the reviewer that the loss of Prosapip1 could potentially impact AMPAR function. To address this, we measured spontaneous excitatory postsynaptic currents (sEPSCs) in hippocampal pyramidal neurons from both Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Given that neurons were voltage-clamped at -70 mV and extracellular Mg²⁺ was maintained at 1.3 mM, the sEPSCs we recorded were primarily mediated by AMPARs.

      We found no significant differences in either the frequency or amplitude of these AMPA-mediated sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice, suggesting that AMPAR function in hippocampal pyramidal neurons is not noticeably affected by the loss of Prosapip1 (see Author response image 2 below).

      Author response image 2. Comparison of hippocampal sEPSCs between Prosapip1(flx/flx);Syn1-Cre(-) (Cre(-)) and Prosapip1(flx/flx);Syn1-Cre(+) (Cre(+)) mice. sEPSCs were recorded in the presence of 1.3 mM Mg²⁺ and 0.1 mM picrotoxin, with neurons clamped at -70 mV. (A) Sample sEPSC traces from Prosapip1(flx/flx);Syn1-Cre(-) (top) and Prosapip1(flx/flx);Syn1-Cre(+) (bottom) mice. (B, C) Bar graphs showing no significant differences in sEPSC frequency (B) or amplitude (C) between Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice. Statistical analysis was performed using an unpaired t-test; p > 0.05, n.s. (not significant). Data represent 11 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(-) mice (11/3) and 8 neurons from 3 Prosapip1(flx/flx);Syn1-Cre(+) mice (8/3).

      c. There is a lack of mechanistic data on what specifically and how GluN2B and PSD95 expression is altered. This is due to some of the challenges with interpreting the biochemical fractionation and a lack of results regarding changes in protein posttranslational modifications.

      See response above.

      d. The loss of social novelty measures in both the global and dHP-specific Prosapip1 knockout mice were not very robust. As they were consistently lost in both approaches and as there were other consistent memory deficits, this does not impact the conclusions, but may be important to temper discussion to match these smaller deficits within this domain.

      There is a clear difference between the Prosapip1(flx/flx);Syn1-Cre(-) and Prosapip1(flx/flx);Syn1-Cre(+) mice as well as the AAV-GFP and AAV-Cre mice in the loss of social novelty metric. We have emphasized that the Prosapip1(flx/flx);Syn1-Cre(+) mice and AAV-Cre mice do not recognize social novelty, which is supported by the statistics.

      4E: Two-way ANOVA: Effect of Social Novelty F(1,20) = 17.60, p = 0.0002; Post hoc Familiar vs. Novel (Cre(-)) p = 0.0008, Familiar vs. Novel (Cre(+)) p = 0.1451.

      5I: Two-way ANOVA: Effect of Social Novelty F(1,31) = 9.777, p = 0.0038; Post hoc Familiar vs. Novel (AAV-GFP) p = 0.0303, Familiar vs. Novel (AAV-Cre) p = 0.1319.

      e. Alterations in presynaptic paired-pulse ratio measures are intriguing and may point to a role for Prosapip1 in synapse development, as discussed in the manuscript. It would be interesting to delineate if these PPR changes also occur in the adult knockout to help detail the specific Prosapip1-induced neuroadaptations that link to the alterations in novelty-induced behaviors.

      This interesting question will be addressed in future studies.

      Reviewer #2 (Recommendations for the authors):

      (1) The test statistics are required for each experiment for completeness. Currently, only p-values, tests used, and N are included.

      The entirety of the statistical information can be found in Table 1, including test statistics and degrees of freedom (see Column 7, ‘Result’).

      (2) The authors claim that the function of Prosapip1 is not known in vivo, yet detail a study in the NAc where they investigated its function in vivo. The wording or discussion around what is and is not known should be altered to reflect this.

      The reviewer is correct to point to our previous manuscript (Laguesse et al., Neuron, 2017), in which we found that Prosapip1 is important in mechanisms underlying alcohol-associated molecular, cellular and behavioral adaptations. However, these findings are specific to alcohol-related paradigms. Since the normal physiological role of Prosapip1 has never been delineated, this study aimed to begin addressing this gap in knowledge.

      References

      Wang, M., Li, S., Zhang, H. et al. Direct interaction between GluR2 and GAPDH regulates AMPAR-mediated excitotoxicity. Mol Brain 5, 13 (2012). https://doi.org/10.1186/1756-6606-5-13

      Ikemoto, A., Bole, D.G., Ueda, T. Glycolysis and Glutamate Accumulation into Synaptic Vesicles: Role of Glyceraldehyde Phosphate Dehydrogenase and 3-Phosphoglycerate Kinase. J Biol Chem 278(8) (2003). https://doi.org/10.1074/jbc.M211617200

      Lee, F., Su, P., Xie, YF. et al. Disrupting GluA2-GAPDH Interaction Affects Axon and Dendrite Development. Sci Rep 6, 30458 (2016). https://doi.org/10.1038/srep30458

    1. eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements, and achieve an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain-machine interfaces: one can now decode individual elements within a sequence with high precision, but these representations are not static and instead develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods.

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      Weaknesses:

      A formal analysis and quantification of how head movement may have contributed to the results should be included in the paper or supplemental material. The type of correlated head movements coming from vigorous key presses aren't necessarily visible to the naked eye, and even if arms etc. are restricted, this will not necessarily preclude shoulder, neck or head movement; if ICA was conducted, for example, the authors are in the position to show the components that relate to such movement; but eye-balling the data would not seem sufficient. The related issue of eye movements is addressed via classifier analysis. A formal analysis which directly accounts for finger/eye movements in the same analysis as the main result (i.e., any variance related to these factors) should be presented.

      This reviewer recommends inclusion of a formal analysis showing that the intra- vs. inter-parcel features are indeed completely independent. For example, the authors state that the inter-parcel features reflect "lower spatially resolved whole-brain activity patterns or global brain dynamics". A formal quantitative demonstration that the signals indeed show "complete independence" (as claimed by the authors) and are orthogonal would be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%). This part seems driven almost by pure engineering interest in gaining as high decoding accuracy as possible.

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked more scientific questions about how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

      Strengths:

      Each part has its own strength. For the first part, the use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. For the second part, the finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

      Weaknesses:

      Despite the strengths raised, the specific goal for each part of the current paper, i.e., achieving high decoding accuracy and answering the scientific question of early skill learning, seems not to harmonize with each other very well. In short, the current approach, which is solely optimized for achieving high decoding accuracy, does not provide enough support and interpretability for the paper's interesting scientific claim. This reminds me of the accuracy-explainability tradeoff in machine learning studies (e.g., Linardatos et al., 2020). More details follow.

      There are a number of different neural processes occurring before and after a key press, such as planning of the upcoming movement and of movements further ahead in premotor/parietal cortices, motor command generation in primary motor cortex, sensory feedback-related processes in sensory cortices, and performance monitoring/evaluation in prefrontal areas. Some of these may show learning-dependent change and others may not.

      Given the use of whole-brain MEG features with a wide time window (up to ~200 ms after each key press) under the situation of 3~4 Hz (i.e., 250~330 ms press interval) typing speed, these different processes in different brain regions could have contributed to the expression of the "contextualization," making it difficult to interpret what really contributed to the "contextualization" and whether it is learning related. Critically, the majority of data used for decoder training has the chance of such potential overlap of signal, as the typing speed almost reached a plateau already at the end of the 11th trial and stayed until the 36th trial. Thus, the decoder could have relied on such overlapping features related to the future presses. If that is the case, a gradual increase in "contextualization" (pattern separation) during earlier trials makes sense, simply because the temporal overlap of the MEG feature was insufficient for the earlier trials due to slower typing speed.

      Direct ways to address the above concern, at some cost to decoding accuracy, would be, for example, to use a shorter temporal window for the MEG features, or to train the model with the early learning period data only (trials 1 through 11), and to check whether the main results are unaffected.

    4. Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training, and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design (mainly the use of a single sequence) and which are described below, question the neurobiological implications proposed by the authors, and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      Specifically:

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence, and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least ±100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution, and, of course, provides no evidence of absence. The authors also pointed out that such "mixing" would hamper the discriminability of the two ordinal positions of the index finger, given that "ordinal position 5" is systematically followed by "ordinal position 1". This is a valid point which, however, cannot rule out that "contextualization" nevertheless reflects the described "mixing".

      During the review process, the authors responded to my concern that training of a single sequence introduces the potential confound of "mixing" described above, which could have been avoided by training on several sequences, as in Kornysheva et al. (Neuron 2019), by arguing that Day 2 in their study did include control sequences. However, the authors' findings regarding these control sequences are fundamentally different from the findings in Kornysheva et al. (2019), and do not provide any indication of effector-independent ordinal information in the described contextualization - but, actually, the contrary. In Kornysheva et al. (Neuron 2019), ordinal, or positional, information refers purely to the rank of a movement in a sequence. In line with the idea of competitive queuing, Kornysheva et al. (2019) have shown that humans prepare for a motor sequence via a simultaneous representation of several of the upcoming movements, weighted by their rank in the sequence. Importantly, they could show that this gradient carries information that is largely devoid of information about the order of specific effectors involved in a sequence, or their timing, in line with competitive queuing. They showed this by training a classifier to discriminate between the five consecutive movements that constituted one specific sequence of finger movements (five classes: 1st, 2nd, 3rd, 4th, 5th movement in the sequence) and then testing whether that classifier could identify the rank (1st, 2nd, 3rd, etc) of movements in another sequence, in which the fingers moved in a different order, and with different timings. Importantly, this approach demonstrated that the graded representations observed during preparation were largely maintained after this cross-decoding, indicating that the sequence was represented via ordinal position information that was largely devoid of information about the specific effectors or timings involved in sequence execution. 
This result differs completely from the findings in the current manuscript. Dash et al. report a drop in detected ordinal position information (degree of contextualization in figure 5C) when testing for contextualization in their novel, untrained sequences on Day 2, indicating that context and ordinal information as defined in Dash et al. is not at all devoid of information about the specific effectors involved in a sequence. In this regard, a main concern in my public review, as well as the second reviewer's public review, is that Dash et al. cannot tell apart, by design, whether there is truly contextualization in the neural representation of a sequence (which they claim), or whether their results regarding "contextualization" are explained by what they call "mixing" in their author response, i.e., an overlap of representations of consecutive movements, as suggested as an alternative explanation by Reviewer 2 and myself.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - figure supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - figure supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time, and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).
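
      The within-subject regression suggested above could be sketched as follows. This is only an illustration under stated assumptions: the per-trial values are made-up numbers for a single hypothetical participant, and the helper name `ols_slope_and_r` is not taken from the manuscript.

```python
from statistics import mean

def ols_slope_and_r(x, y):
    """Simple linear regression of y on x.

    Returns (slope, intercept, Pearson r). Hypothetical helper for illustration.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Made-up per-trial values for ONE participant (illustration only, not real data).
tapping_speed = [1.0, 1.5, 2.0, 2.5, 3.0, 3.2]          # keypresses per second
ordinal_coding = [0.55, 0.60, 0.68, 0.74, 0.80, 0.82]   # 2-class decoding accuracy

slope, intercept, r = ols_slope_and_r(tapping_speed, ordinal_coding)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

      Repeating this per participant and then testing the resulting slopes at the group level would speak to within-subject dynamics, which a between-subject analysis alone cannot.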

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence, but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. During the review process, the authors reported a confusion matrix from a classification of asterisk position based on eye tracking data recorded during the task, and concluded that the classifier performed at chance level and gaze was, thus, apparently not biased by the visual stimulation. However, the confusion matrix showed a huge bias that was difficult to interpret (a very strong tendency to predict one of the five asterisk positions, despite chance-level performance). Without including additional information for this analysis (or simply the gaze position as a function of the number of asterisks on the screen) in the manuscript, this important control analysis cannot be properly assessed, and is not available to the public.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, this does not address the question whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - i.e., the question whether performance changes (micro-offline gains) are less pronounced across rest periods for which the change in "contextualization" is relatively low. The single-subject correlation between contextualization changes "during" rest and micro-offline gains (Figure 5 - figure supplement 4) addresses this question. However, the critical statistical test (are correlation coefficients significantly different from zero) is not included. Given the displayed distribution, it seems unlikely that correlation coefficients are significantly above zero.
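
      One common way to run the missing test is to Fisher z-transform the single-subject correlation coefficients and test the transformed values against zero with a one-sample t-test. The sketch below assumes this approach (the review does not prescribe a specific test), uses made-up coefficients, and returns only the t statistic and degrees of freedom; the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def fisher_z_one_sample_t(correlations):
    """One-sample t statistic (against zero) on Fisher z-transformed correlations.

    Returns (t, degrees_of_freedom). Hypothetical helper for illustration.
    """
    z = [math.atanh(r) for r in correlations]      # Fisher z-transform
    n = len(z)
    t = mean(z) / (stdev(z) / math.sqrt(n))        # mean divided by its standard error
    return t, n - 1

# Made-up single-subject correlation coefficients (illustration only, not real data).
subject_r = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.12]

t_stat, dof = fisher_z_one_sample_t(subject_r)
print(f"t({dof}) = {t_stat:.3f}")
```

      The resulting statistic would then be compared against a t distribution with the returned degrees of freedom; a non-parametric alternative such as a sign-flip permutation test would serve the same purpose.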

      The authors follow the assumption that micro-offline gains reflect offline learning. However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and instead reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

      Along these lines, the authors' claim, based on Bönstrup et al. 2020, that "retroactive interference immediately following practice periods reduces micro-offline learning", is not supported by that very reference. Citing Bönstrup et al. (2020), "Regarding early learning dynamics (trials 1-5), we found no differences in microscale learning parameters (micro-online/offline) or total early learning between both interference groups." That is, contrary to Dash et al.'s current claim, Bönstrup et al. (2020) did not find any retroactive interference effect on the specific behavioral readout (micro-offline gains) that the authors assume to reflect consolidation.

      The authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods (see, e.g., abstract). However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition). That is, the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes. This becomes strikingly clear in the recent Nature paper by Griffin et al. (2025), who computed micro-offline gains as the difference in average performance across the first five sequences in a practice period (a block, in their terminology) and the last five sequences in the previous practice period. Averaging across sequences in this way minimises the chance to detect online performance changes, and inflates changes in performance "offline". The problem that "offline" gains (or contextualization) are actually computed from data entirely generated online, and are therefore subject to processes that occur online, is inherent in the very definition of micro-offline gains, whether or not they are computed from averaged performance.
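
      The point about the definition can be made concrete with a small sketch. It assumes that a micro-offline gain is the performance at the start of one practice period minus the performance at the end of the previous one, optionally averaged over `k` sequences; the function name and the numbers are hypothetical.

```python
from statistics import mean

def micro_offline_gains(trials, k=1):
    """Micro-offline gain per rest period, computed only from practice data.

    trials: list of practice periods, each a list of per-sequence speeds.
    k: number of sequences averaged at the end of one period and the start of the next.
    """
    gains = []
    for prev, nxt in zip(trials, trials[1:]):
        end_prev = mean(prev[-k:])      # last k sequences of the previous practice period
        start_next = mean(nxt[:k])      # first k sequences of the next practice period
        gains.append(start_next - end_prev)
    return gains

# Made-up tapping speeds (keypresses per second) for three practice periods.
trials = [[1.0, 1.2, 1.4], [1.8, 1.9, 2.0], [2.3, 2.4, 2.4]]

print(micro_offline_gains(trials, k=1))
print(micro_offline_gains(trials, k=2))
```

      Every input to this computation is recorded during practice; no measurement from the rest period itself enters. In this made-up example, where speed rises within each period, averaging over two sequences yields larger "offline" gains than using single sequences.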

      A simple control analysis based on shuffled class labels could lend further support to the authors' complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance-level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). During the review process, the authors reported this analysis to the reviewers. Given that readers may consider following the presented decoding approach in their own work, it would have been important to include that control analysis in the manuscript to convince readers of its validity.
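
      A minimal sketch of such a shuffled-label control is given below. It is not the authors' pipeline: the nearest-centroid classifier, the synthetic two-class data, the single train/test split and all names are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean

def nearest_centroid_accuracy(X, y, train_idx, test_idx):
    """Fit a nearest-centroid classifier on the training split, score it on the test split."""
    classes = sorted(set(y[i] for i in train_idx))
    centroids = {}
    for c in classes:
        rows = [X[i] for i in train_idx if y[i] == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    correct = 0
    for i in test_idx:
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])))
        correct += pred == y[i]
    return correct / len(test_idx)

rng = random.Random(0)

# Synthetic, well-separated two-class data (illustration only, not MEG features).
X = [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)] for mu in (0.0, 4.0) for _ in range(50)]
y = [0] * 50 + [1] * 50

idx = list(range(len(y)))
rng.shuffle(idx)
train_idx, test_idx = idx[:70], idx[70:]

true_acc = nearest_centroid_accuracy(X, y, train_idx, test_idx)

# Null distribution: shuffle the labels, then train and test exactly as before.
n_perm = 200
null_acc = []
for _ in range(n_perm):
    y_perm = y[:]
    rng.shuffle(y_perm)
    null_acc.append(nearest_centroid_accuracy(X, y_perm, train_idx, test_idx))

empirical_chance = mean(null_acc)
p_value = (1 + sum(a >= true_acc for a in null_acc)) / (1 + n_perm)
print(f"true={true_acc:.3f}, empirical chance={empirical_chance:.3f}, p={p_value:.4f}")
```

      The mean of the null accuracies gives the empirical chance level mentioned above, and the proportion of null accuracies at or above the true accuracy gives a permutation p-value. For the control to be meaningful, every step of the real pipeline, including feature selection and dimension reduction, would have to be rerun inside each permutation.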

      Furthermore, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - it is unclear what the authors refer to when they talk about the sign of the "average source", line 477).

    1. eLife Assessment

      This work investigates the functional difference between the most commonly expressed form of PTH, and a mutant form of PTH, identified in a patient with chronic hypocalcemia and hyperphosphatemia which characterizes hypoparathyroidism. The authors investigate the hypothesis that this mutant PTH assumes a dimeric form in vivo and serves anabolic functions in the bone. The data are compelling and the translational aspects are fundamental in understanding PTH-1 Receptor activation.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Comments on revisions: No further recommendations for revisions. Acceptable as the paper stands.

      [Editors' note: the original reviews are here, https://doi.org/10.7554/eLife.97579.1.sa1]

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a current anabolic therapy. The authors have achieved the aims of the study.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Recommendations for the authors:

      (1) In your response to the reviewers you included a figure. You said it was for the reviewers only. We are *not* including it here. Is that correct or should it be in the Public Reviews?

      We apologize for any confusion and appreciate your thorough review. The phrase “data only for reviewers” was intended to indicate that the content was included in the revision based on reviewers’ comments, not in the main text (article). However, we acknowledge that this phrasing may be inappropriate. We agree to include the figure from the previous author response in the public reviews. Accordingly, we propose to revise the previous author response as follows:

      - Remove "(data only for reviewers)".

      -  Correct the typo from "perosteal" to "periosteal".

      - “Thank you for your comment. First, we ensured that the bones sampled during the experiment showed no defects, and we carefully separated the femur bones from the mice to preserve their integrity. In the 3-point bending test, PTH treatment significantly increased the maximum load of the femur bone compared to the OVX-control group. Additionally, the maximum load in the PTH treatment group was significantly greater than that observed in the PTH dimer group. Furthermore, structural factors influencing bone strength, such as the periosteal perimeter and the endocortical bone perimeter, were also increased in the PTH treatment group compared to the PTH dimer group.”

      (2) Do you mean to always have R<sup>0</sup> (have a superscript) and RG (never have a superscript) or should they be shown in the same way throughout your paper?

      Thank you for your thorough review. Based on previous studies that addressed the conformation of PTH1R, R<sup>0</sup> is typically shown with a superscript, while RG is not (Hoare et al., 2001; Dean et al., 2006; Okazaki et al., 2008). We have followed this notation and will ensure consistency throughout our paper.

      Hoare, S. R., Gardella, T. J., & Usdin, T. B. (2001). Evaluating the signal transduction mechanism of the parathyroid hormone 1 receptor: effect of receptor-G-protein interaction on the ligand binding mechanism and receptor conformation. Journal of Biological Chemistry, 276(11), 7741-7753.

      Dean, T., Linglart, A., Mahon, M. J., Bastepe, M., Jüppner, H., Potts Jr, J. T., & Gardella, T. J. (2006). Mechanisms of ligand binding to the parathyroid hormone (PTH)/PTH-related protein receptor: selectivity of a modified PTH (1–15) radioligand for GαS-coupled receptor conformations. Molecular endocrinology, 20(4), 931-943.

      Okazaki, M., Ferrandon, S., Vilardaga, J. P., Bouxsein, M. L., Potts Jr, J. T., & Gardella, T. J. (2008). Prolonged signaling at the parathyroid hormone receptor by peptide ligands targeted to a specific receptor conformation. Proceedings of the National Academy of Sciences, 105(43), 16525-16530.

      (3) The following grammatical and fact changes and word changes are requested.

      We appreciate the thoughtful review and thank you for pointing out the grammatical, factual, and word changes required. We have carefully reviewed and addressed each of these corrections to ensure the paper's accuracy and readability.

      We appreciate the reviewers' detailed and constructive reviews. We have addressed all the comments to improve the quality of our paper.

    1. eLife Assessment

      Catani and colleagues provide data on antigenic properties of neuraminidase proteins of pandemic H1N1 and show that antigenic diversity of the neuraminidase from 2009 to 2020 largely falls into two groups. These antigenic groups map to two phylogenetic groups, and substitutions at positions 432 and 321 are likely associated with the antigenic change. These data and results allow useful insights into the antigenic properties of N1 influenza and the evidence supporting the conclusions is solid.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have performed an antigenic assay for human seasonal N1 neuraminidase using antigens and mouse sera from 2009-2020 (with one avian N1 antigen). This shows two distinct antigen groups. There is poorer reactivity with sera from 2009-2012 against antigens from 2015-2019, and poorer reactivity with sera from 2015-2020 against antigens from 2009-2013. There is a long branch separating these two groups. However, 321 and 432 are the only two positions that are consistently different between the two groups. Therefore these are the most likely cause of these antigenic differences.

      Strengths:

      (1) A sensible rationale was given for the choice of sera, in terms of the genetic diversity.

      (2) There were two independent batches of one of the antigens used for generating sera, which demonstrated the level of heterogeneity in the experimental process.

      (3) Replicate of the Wisconsin/588/2019 antigen (as H1 and H6) is another useful measure of heterogeneity.

      (4) The presentation of the data, e.g. Figure 2, clearly shows two main antigenic groups.

      (5) The most modern sera are more recent than those in other related papers, which demonstrates that there has been no major antigenic change.

      Weaknesses:

      (1) Issues with experimental methods

      As I am not an experimentalist, I cannot comment fully on the experimental methods. However, I note that BALB/c mice sera were used, whereas outbred ferret sera are typically used in influenza antigenic characterisation, so the antigenic difference observed may not be relevant in humans. Similarly, the mice were immunised with an artificial NA immunogen where the typical approach would be to infect the ferret with live virus intra-nasally.

      (2) Five mice sera were generated per immunogen and then pooled, but data was not presented that demonstrated these sera were sufficiently homogenous that this approach is valid.

      (3) There were no homologous antigens for most of the sera. This makes the responses difficult to interpret as the homologous titre is often used to assess the overall reactivity of a serum. The sequence of the antigens used is not described, which again makes it difficult to interpret the results.

      (4) To be able to untangle the effects of the individual substitutions at 321, 386, and 432, it would have been useful to have included the naturally occurring variants at these positions, or to have generated mutants at these positions. Gao et al clearly show an antigenic difference with ferret sera correlated separately with N386K and I321V/K432E.

      (5) The challenge experiments in Gao et al showed that NI titre was not a good correlate of protection, so that limits the interpretation of these results.

      Issues with the computational methods

      (6) The NAI titres were normalised using the ELISA results, and the motivation for this is not explained. It would be nice to see the raw values.

      (7) It is not clear what value the random forest analysis adds here, given that positions 321 and 432 are the only two that consistently differ between the two groups.

      (8) As with the previous N2 paper, the metric for antigenic distance (the root mean square of the difference between the titres for two sera) is not one that would be consistent when different sera are included. More usual metrics of distance are Archetti-Horsfall, fold down from homologous, or fold down from maximum.

      (9) Antigenic cartography of these data is fraught. I wonder whether 2 dimensions are required for what seems like a 1-dimensional antigenic difference - certainly, the antigens, excluding the H5N1, are in a line. The map may be skewed by the high reactivity Brisbane/18 antigen. It is not clear if the column bases (normalisation factors for calculating antigenic distance) have been adjusted to account for the lack of homologous antigens. It is typical to present antigenic maps with a 1:1 x:y ratio.

      Issues with interpretation

      (10) Figure 2 shows the NAI titres split into two groups for the antigens, however, A/Brisbane is an outlier in the second antigenic group with high reactivity.

      (11) Following Gao et al, I think you can claim that it is more likely that the antigenic change is due to K432E than I321V, based on a comparison of the amino acid change.

      Appraisal:

      Taking into account the limitations of the experimental techniques (which I appreciate are due to resource constraints), this paper meets its aim of measuring the antigenic relationships between 2009-2020 seasonal N1s, showing that there were two main groups. The authors discovered that the difference between the two antigenic groups was likely attributable to positions 321 and 432, as these were the only two positions that were consistently different between the two groups. They came to this finding by using a random forest model, but other simpler methods could have been used.

      Impact:

      This paper contributes to the growing literature on the potential benefit of NA in the influenza vaccine.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Catani et al. have immunized mice with 17 recombinant N1 neuraminidases (NAs) from human isolates circulating between 2009-2020 to investigate antigenic diversity. NA inhibition (NAI) titers revealed two groups that were antigenically and phylogenetically distinct. Machine learning was used to estimate the antigenic distances between the N1 NAs and mutations at residues K432E and I321V were identified as key determinants of N1 NA antigenicity.

      Strengths:

      Observation of mutations associated with N1 antigenic drift.

      Weaknesses:

      Validation that K432E and I321V are responsible for antigenic drift was not determined in a background strain with native K432 and I321 or the restitution of antibody binding by reversion to K432 and I321 in strains that evaded sera.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have performed an antigenic assay for human seasonal N1 neuraminidase using antigens and mouse sera from 2009-2020 (with one avian N1 antigen). This shows two distinct antigen groups. There is poorer reactivity with sera from 2009-2012 against antigens from 2015-2019, and poorer reactivity with sera from 2015-2020 against antigens from 2009-2013. There is a long branch separating these two groups. However, 321 and 432 are the only two positions that are consistently different between the two groups. Therefore these are the most likely cause of these antigenic differences.

      Strengths:

      (1) A sensible rationale was given for the choice of sera, in terms of the genetic diversity.

      (2) There were two independent batches of one of the antigens used for generating sera, which demonstrated the level of heterogeneity in the experimental process.

      (3) Replicate of the Wisconsin/588/2019 antigen (as H1 and H6) is another useful measure of heterogeneity.

      (4) The presentation of the data, e.g. Figure 2, clearly shows two main antigenic groups.

      (5) The most modern sera are more recent than those in other related papers, which demonstrates that there has been no major antigenic change.

      Weaknesses:

      (1) Issues with experimental methods

      As I am not an experimentalist, I cannot comment fully on the experimental methods. However, I note that BALB/c mice sera were used, whereas outbred ferret sera are typically used in influenza antigenic characterisation, so the antigenic difference observed may not be relevant in humans. Similarly, the mice were immunised with an artificial NA immunogen where the typical approach would be to infect the ferret with live virus intra-nasally.

      Indeed, ferrets are the gold standard model for the study of influenza. The main reason for this is the susceptibility of ferrets to infection with primary human influenza virus isolates and their ability to transmit human influenza A and B viruses. Although mouse models often require the use of mouse-adapted influenza virus strains, mice are still the most widely used model to study new developments in influenza vaccines.

      In our previous publication we performed a parallel analysis of sera from ferrets that were primed by infection and boosted with recombinant protein, and from mice that, as in this study focusing on N1 NA, were prime-boosted with purified recombinant NA proteins in the presence of an adjuvant. Our data indicate that the NAI responses in immune sera from ferrets after infection and after boost enable similar antigenic classification and correlate strongly with those induced in mice that had been prime-boosted with adjuvanted recombinant NA (Catani et al., eLife 2024). To a large extent, the immunogenicity of an antigen relies on epitope accessibility, which may dictate a universal rule of immunogenicity and antigenicity (Altman et al., 2015).

      (2) Five mice sera were generated per immunogen and then pooled, but data was not presented that demonstrated these sera were sufficiently homogenous that this approach is valid.

      Individual sera were not tested here. However, based on previous studies from our group, we are confident that a prime-boost schedule with 1 µg of adjuvanted soluble tetrameric NA induces a highly homogeneous response in mice (Catani et al., 2022).

      (3) There were no homologous antigens for most of the sera. This makes the responses difficult to interpret as the homologous titre is often used to assess the overall reactivity of a serum. The sequence of the antigens used is not described, which again makes it difficult to interpret the results.

      The absence of homologous antigens may indeed make interpretation more difficult. However, we have observed that homologous sera do not always coincide with the highest reactivity, although highest reactivity is always found within an antigenic cluster. A sequence comparison would be appropriate to improve interpretability of the data. Therefore, a sequence alignment and a pairwise comparison will be provided in the revised manuscript as supplement. 

      (4) To be able to untangle the effects of the individual substitutions at 321, 386, and 432, it would have been useful to have included the naturally occurring variants at these positions, or to have generated mutants at these positions. Gao et al clearly show an antigenic difference with ferret sera correlated separately with N386K and I321V/K432E.

      The prevalence of single amino acid substitutions in N1 NA of clinical H1N1 virus strains isolated between 2009 and 2024 is minimal, which may indicate reduced fitness (see Author response image 1) in strains with these substitutions in NA. Nevertheless, we agree that the rescue of single mutants would provide important evidence to untangle those individual impacts on antigenicity. We plan to generate mutants with substitution at these positions in NA of A/Wisconsin/588/2019 H1N1 and determine the NAI against our panel of sera.

      Author response image 1.

      Prevalence of the indicated N1 NA substitutions in all clinical human H1N1 isolates with unique sequences deposited in the GISAID data bank since 2009.

      (5) The challenge experiments in Gao et al showed that NI titre was not a good correlate of protection, so that limits the interpretation of these results.

      On the contrary, the challenge experiments confirmed that drift occurred in NA from H1N1 viruses isolated between 2009 (CA/09) and 2015 (MI/15). The dilution of transferred sera to equal inhibitory titers indicates that the homologous ferret sera (shown in Figure 5e-f of Gao et al., 2019) are still effective in protecting against infection while heterologous sera are not. This result emphasises that the nature of the homologous NAI response is well-suited for protection against a homologous challenge, although mechanistic data were not provided.

      Issues with the computational methods

      (6) The NAI titres were normalised using the ELISA results, and the motivation for this is not explained. It would be nice to see the raw values.

      Mice were immunized with different batches of recombinant protein. Each of those batches may have distinct intrinsic immunogenicity, as observed in Figure 1d. For that reason, NAI values were normalized using homologous ELISA titers induced by each respective NA antigen. A table with the raw values will be included in the revised manuscript.

      (7) It is not clear what value the random forest analysis adds here, given that positions 321 and 432 are the only two that consistently differ between the two groups.

      The substitutions at positions 321 and 432 are indeed the only two consistently differing amino acids among the tested N1s. Although their correlation with antigenic clustering may be obvious after analysis, a random forest analysis can reveal less obvious substitutions that contribute to the antigenic diversity. In the future, we intend to expand this methodology to strains that are not currently included in the panel. A random forest model is a relatively simple and well-performing method for dealing with a new dataset.
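
      As an illustration of the simpler kind of check the reviewer alludes to, the sketch below scans an alignment for positions that are fixed within each antigenic group but differ between the groups. It is a minimal sketch under stated assumptions, not the analysis used in the study: the function name, the `offset` parameter, and the toy four-residue sequences are all hypothetical, and "consistently differ" is taken here to mean one residue per group at that position.

```python
def consistently_different_positions(group_a, group_b, offset=1):
    """Return (position, residue_a, residue_b) for alignment columns that are
    identical within each group but differ between the two groups.

    group_a, group_b: lists of aligned sequences (equal-length strings).
    offset: number added to the 0-based column index to report positions
            (hypothetical parameter; real NA numbering depends on the alignment).
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    seqs = list(group_a) + list(group_b)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    hits = []
    for i in range(length):
        residues_a = {s[i] for s in group_a}
        residues_b = {s[i] for s in group_b}
        # Keep the column only if each group is fixed and the groups disagree.
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            hits.append((i + offset, next(iter(residues_a)), next(iter(residues_b))))
    return hits


# Toy, made-up sequences (not real N1 NA sequences).
toy_group_1 = ["MKIA", "MKIA", "MKIS"]
toy_group_2 = ["MEVA", "MEVA", "MEVA"]
differences = consistently_different_positions(toy_group_1, toy_group_2)
```

      A scan like this only recovers positions that separate the groups perfectly; it says nothing about substitutions with partial or combined effects, which is where a model such as a random forest could add information.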

      (8) As with the previous N2 paper, the metric for antigenic distance (the root mean square of the difference between the titres for two sera) is not one that would be consistent when different sera are included. More usual metrics of distance are Archetti-Horsfall, fold down from homologous, or fold down from maximum.

      The antigenic distances calculated prior to our random forest analysis do use fold-difference as the metric, computed as log2(max(EC50) / EC50). After obtaining the fold-difference values, a pairwise dissimilarity matrix was calculated to obtain the average antigenic distance between pairs of sera. A more detailed description of the methodology will be included in the Methods section, including the R code.
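
      The calculation described above can be sketched as follows. This is a hedged illustration rather than the authors' R code: the fold-difference formula log2(max(EC50) / EC50) is taken from the response, while the use of a root mean square over shared antigens follows the reviewer's description of the distance metric and is an assumption here, as are all function names and the made-up titre values.

```python
import math


def fold_down_from_max(ec50_by_antigen):
    """Convert one serum's EC50 values into log2(max(EC50) / EC50) per antigen."""
    top = max(ec50_by_antigen.values())
    return {antigen: math.log2(top / value) for antigen, value in ec50_by_antigen.items()}


def rms_distance(profile_a, profile_b):
    """Root mean square difference between two sera over their shared antigens.

    Assumption: the exact dissimilarity function used in the study is not
    stated in this response; RMS is used only because the reviewer describes it.
    """
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        raise ValueError("the two sera share no antigens")
    squared = [(profile_a[ag] - profile_b[ag]) ** 2 for ag in shared]
    return math.sqrt(sum(squared) / len(shared))


def distance_matrix(titres_by_serum):
    """Pairwise distances between all sera, keyed by (serum, serum)."""
    profiles = {serum: fold_down_from_max(t) for serum, t in titres_by_serum.items()}
    sera = sorted(profiles)
    return {(a, b): rms_distance(profiles[a], profiles[b]) for a in sera for b in sera}


# Hypothetical titres for two sera against three antigens (illustrative values only).
example_titres = {
    "serum_A": {"ag1": 800.0, "ag2": 200.0, "ag3": 100.0},
    "serum_B": {"ag1": 100.0, "ag2": 400.0, "ag3": 400.0},
}
example_matrix = distance_matrix(example_titres)
```

      Because each serum is scaled to its own maximum, the resulting distances depend on which antigens are in the panel, which is the sensitivity the reviewer raises in comment (8).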

      (9) Antigenic cartography of these data is fraught. I wonder whether 2 dimensions are required for what seems like a 1-dimensional antigenic difference - certainly, the antigens, excluding the H5N1, are in a line. The map may be skewed by the high reactivity Brisbane/18 antigen. It is not clear if the column bases (normalisation factors for calculating antigenic distance) have been adjusted to account for the lack of homologous antigens. It is typical to present antigenic maps with a 1:1 x:y ratio.

      Antigenic cartography will be repeated excluding H5N1 and/or Brisbane/18 antigen. Data will be provided in the final rebuttal letter.

      Issues with interpretation

      (10) Figure 2 shows the NAI titres split into two groups for the antigens, however, A/Brisbane is an outlier in the second antigenic group with high reactivity.

      Indeed, A/Brisbane/02/2018 has overall higher IC50 values. However, it still falls into the same cluster that we called AG2. Highlighting A/Brisbane/02/2018 may lead to the misinterpretation of a non-existent antigenic group. 

      (11) Following Gao et al, I think you can claim that it is more likely that the antigenic change is due to K432E than I321V, based on a comparison of the amino acid change.

      Indeed, we would expect that substitution of the basic lysine with an acidic glutamate is more likely to impact antigenicity than the apolar isoleucine-to-valine substitution. Testing of mutant reassortants with single mutations may provide the definitive answer to that question.

      Appraisal:

      Taking into account the limitations of the experimental techniques (which I appreciate are due to resource constraints), this paper meets its aim of measuring the antigenic relationships between 2009-2020 seasonal N1s, showing that there were two main groups. The authors discovered that the difference between the two antigenic groups was likely attributable to positions 321 and 432, as these were the only two positions that were consistently different between the two groups. They came to this finding by using a random forest model, but other simpler methods could have been used.

      Impact:

      This paper contributes to the growing literature on the potential benefit of NA in the influenza vaccine.

      Reviewer #2 (Public review):

      Summary:

      In this study, Catani et al. have immunized mice with 17 recombinant N1 neuraminidases (NAs) from human isolates circulating between 2009-2020 to investigate antigenic diversity. NA inhibition (NAI) titers revealed two groups that were antigenically and phylogenetically distinct. Machine learning was used to estimate the antigenic distances between the N1 NAs and mutations at residues K432E and I321V were identified as key determinants of N1 NA antigenicity.

      Strengths:

      Observation of mutations associated with N1 antigenic drift.

      Weaknesses:

      Validation that K432E and I321V are responsible for antigenic drift was not determined in a background strain with native K432 and I321 or the restitution of antibody binding by reversion to K432 and I321 in strains that evaded sera.

      Reassortant A/Wisconsin/588/2019 with E432K, V321I and also K386N single mutations will be rescued and tested against the panel of sera.

    1. eLife Assessment

      The study by Shi and colleagues presents important new tools for precise genetic manipulation and lineage tracing in mice. The characterization of these new models was conducted using validated, state-of-the-art methodologies and convincingly demonstrates their ability to enhance the precision of genetic manipulation in distinct cell types. This work will be of great interest to many laboratories worldwide and will facilitate future research across various biomedical disciplines.

    2. Reviewer #1 (Public review):

      Summary:

      Shi and colleagues report the use of modified Cre lines in which the coding region of Cre is disrupted by rox-STOP-rox or lox-STOP-lox sequences to prevent the expression of functional protein in the absence of Dre or Cre activity, respectively. The main purpose of these tools is to enable intersectional or tamoxifen-induced Cre activity with minimal or no leaky activity from the second, Cre-expressing allele. It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      Strengths:

      The new tools can reduce Cre leak in vivo.

      Comments on revisions:

      The major improvement in my mind is the inclusion of Supp Fig 7 where the authors compare their loxCre to iSureCre. The discussion is somewhat improved, but still fails to discuss significant issues such as Cre toxicity in detail. As noted by most reviewers, without a biological question the paper is entirely a technical description of a couple of new tools. However, I do feel that these tools will be of use to the field.

    3. Reviewer #2 (Public review):

      This work presents new genetic tools for enhanced Cre-mediated gene deletion and genetic lineage tracing. The authors optimise and generate mouse models that convert temporally controlled CreER or DreER activity to constitutive Cre expression, coupled with the expression of a tdT reporter for visualizing and tracing gene-deleted cells. This was achieved by inserting a stop cassette into the coding region of Cre, splitting it into N- and C-terminal segments. Removal of the stop cassette by Cre-lox or Dre-rox recombination results in the generation of modified Cre that is shown to exhibit similar activity to native Cre. The authors further demonstrate efficient gene knockout in cells marked by the reporter using these tools, including intersectional genetic targeting of pericentral hepatocytes.

      The new models offer several important advantages. They enable tightly controlled and highly effective genetic deletion of even alleles that are difficult to recombine. By coupling Cre expression to reporter expression, these models reliably report Cre-expressing i.e. gene-targeted cells and circumvent false positives that can complicate analyses in genetic mutants relying on separate reporter alleles. Moreover, the combinatorial use of Dre/Cre permits intersectional genetic targeting, allowing for more precise fate mapping.

      The study and the new models also have some limitations. The efficient deletion of multiple floxed alleles in a mosaic fashion, a scenario where the lines would demonstrate their full potential compared to existing models, has not been tested in the current study. Mosaic genetics is increasingly recognized as a key methodology for assessing cell-autonomous gene functions. The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. In addition, as discussed by the authors, a limitation of this line is the constitutive expression of Cre, which is associated with toxicity in some cases.

    4. Reviewer #3 (Public review):

      Shi et al describe a new set of tools to facilitate Cre or Dre-recombinase-mediated recombination in mice. The strategies are not completely novel but have been pursued previously by the lab, which is world-leading in this field, and by others. The authors report a new version of the iSuRe-Cre approach, which was originally developed by Rui Benedito's group in Spain. Shi et al describe that their approach shows reduced leakiness compared to the iSuRe-Cre line. Furthermore, a new R26-roxCre-tdT mouse line was established after extensive testing, which enables efficient expression of the Cre recombinase after activation of the Dre recombinase. The authors carefully evaluated efficiency and leakiness of the new line and demonstrated the applicability by marking peri-central hepatocytes in an intersectional genetics approach. The paper represents the result of enormous, carefully executed efforts. Although I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, carefully conducted technical studies have a considerable value for the scientific community, justifying publication.

      It seems very likely that the new mouse lines generated in this study will enhance the precision of genetic manipulation in distinct cell types and greatly facilitate future work in numerous laboratories. The authors have expertly eradicated weaknesses from the initial submission. One minor issue remains. The authors did not investigate potential toxic effects that might be caused by high-level expression of a combination of "foreign" genes such as recombinases and fluorescence reporters. The authors refer to published studies about toxic effects, speculating that they can only be prevented by removing recombinases in an additional step. Although this is a valid argument, I would have appreciated seeing an assessment of putative toxic effects by RNA-sequencing, since different combinations of recombinases and fluorescence reporters sometimes can generate unexpected effects. However, this minor issue does not compromise the value of this important study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the rox-Cre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination on some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines mentioned in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for bringing up this point in the manuscript. In the R26-loxCre-tdT mice knock-in strategy, the WPRE sequence is added behind the loxCre-P2A-tdT sequence, as shown in Supplementary Figure 9.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      Thank you for your valuable and insightful comment. The comparison results of R26-loxCre-tdT with iSuRe-Cre using Alb-CreER and targeting R26-Confetti can be found in Supplementary Figure 7 C-E, according to the reviewer’s suggestion.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After screening out four robust versions of mCre, we generated these four roxCre knock-in mice. We could not predict which mCre would be the most robust in vivo; one or two mCre versions might work efficiently. For example, if Alb-mCre1 was competitive with Cdh5-mCre10, we could use them for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      We appreciate your thoughtful suggestions. The schematic figures, along with the nucleotide sequences for the generation of mice, can be found in the revised Supplementary Figure 9.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      Thank you for your thoughtful and constructive comment. The comparative analysis of R26-loxCre-tdT with iSuRe-Cre, employing Alb-CreER to target R26-Confetti, is provided in Supplementary Figure 7 C-E.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion would be our future direction using roxCre and loxCre strategies.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of reporter cassettes.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive expression of Cre and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot solve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the writer’s comments on iSuRe-Cre were made from a reader's perspective, and all summary statements are based on the original published paper (10.1038/s41467-019-10239-4). We have now tested iSuRe-Cre in our own hands. We did detect some leakiness in the heart and muscle, but hardly any in other tissues, as shown in Author response image 1.

      Author response image 1.

      Leakiness in the Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-loxCre-tdT as R26-roxCre-tdT in our manuscript. We have not generated the R26-roxCre-tdT mouse line. We also thank the reviewer for the concerns about the toxicity of high Cre expression. The toxicity of constitutive expression of Cre and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot solve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and present similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we developed new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in Supplementary Figure 1, Supplementary Figure 2, and Figure 4D, Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT are not leaky. Therefore, any leakiness observed when an inducible DreER or CreER allele is introduced derives from the DreER or CreER allele itself. Additional pertinent experimental data can be found in Figure S4C, Figure S7A-B, and Figure S8A.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We value your feedback and have incorporated your suggestion to strengthen our study. Relevant experimental data can be referenced in Figure S8E-G.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      The staining in Figure 4F in the revision is intended to deliver optimized and high-resolution images.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you for your careful check. We examined these signals carefully and did not find a “much lower” tdT signal. Because the file-upload website limits file size, image compression left some signals unclear. We attach clear high-resolution images here. Author response image 2 shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 2.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in lines #251 and #256, and furthermore in the passage from line #278 to #301. In lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

      Recommendations for the authors:

      Reviewer #1:

      (1) However, for it to be useful to investigators a more direct comparison with the Benedito iSure line (or the latest version) is required as that is the crux of the study.

      Thank you for emphasizing this point, which we have now addressed in the revised manuscript and in Figure S7D-G.

      (2) I would like to know how the authors will make these new lines available to outside investigators.

      Please contact the lead author by email to consult about the availability of new mouse lines developed in this study.

      (3) The discussion is overly long and fails to address potential weaknesses. Much of it reiterates what was already said in the results section.

      We are thankful for your critical evaluation, which has helped us improve our discussion.

      Reviewer #2:

      (1) Assessing the efficiency and accuracy of the lines in mosaic deletions of multiple alleles and reporting them in single cells after low-dose tamoxifen exposure would be highly beneficial to demonstrate the full potential of the models.

      We appreciate your careful consideration of this issue. Our future endeavors will focus on mosaic analysis utilizing sparse labeling and efficient gene deletion, employing both roxCre and loxCre strategies.

      (2) Performing FACS analysis to confirm that all targeted (Cre reporter-positive) cells are also tdT-positive would provide more precise data and avoid vague statements like 'virtually all' or 'almost complete' in the results section:

      Line 166: Although mCre efficiently labeled virtually all targeted cells (Figure S3A-E)…

      Line 293: ... and not a single tdT+ hepatocyte expressed Cyp2e1 (Figure 6D)... However, the authors do not provide any quantification. FACS would be ideal here.

      Line 244: ...expression of beta-catenin and GS almost disappeared in the 4W mutant sample... The resolution in the provided PDF is not adequate for assessment.

      Line 296: ... revealed almost complete deletion of Ctnnb1 in the Alb-CreER;R26-tdT2;Ctnnb1flox/flox mice...

      Thank you for suggesting these improvements, which have strengthened the robustness of our conclusions. In the revised version, we have incorporated FACS results that correspond to related sections. Additionally, a quantification statement has been included in the statistical analysis section. We appreciate your meticulous review and comments, which have significantly improved the clarity of our manuscript.

      (3) In the beginning of the results section, it is not clear which results are from this study and which are known background information (like Figure 1A). For example, it is not clear if Figure 1C presents data from R26-iSuRe-Cre. Please revise the text to more clearly present the experimental details and new findings.

      Thank you for this observation. Figure 1C belongs to this study, and the related statement has been modified in the revised version for improved clarity.

      (4) Experimental details regarding the genetic constructs and genotyping of the new knock-in lines are missing. Are R26 constructs driven by the endogenous R26 promoter or were additional enhancers used?

      Thank you for emphasizing this point. The schematic figures and nucleotide sequences for the generation of mice can be found in the revised Supplementary Figure 9, which can help to address this issue.

      (5) The method used to quantify mCre activity in terms of reporter+ target cells is not specified. From images or by FACS?

      Additionally, if images were used for quantification, it would be important to provide details on the number of images analyzed, the number of cells counted per image, and how individual cells were identified.

      Thank you for your comment. We have included the quantification statement in the statistical analysis section. Analyzing R26-Confetti+ target cells using FACS is challenging due to the limitations of the sorting instrument. Consequently, we quantified the related data from images. Each dot on the chart represents one sample, and the value for each mouse was obtained by averaging the data from five 10x fields taken from different sections.
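
      The per-mouse averaging described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the input layout, and the cell counts are all hypothetical, and it assumes that a percentage is computed for each 10x field first and that the field percentages are then averaged.

```python
from statistics import mean


def per_mouse_percentage(fields):
    """Average the reporter-positive percentage over the imaged fields of one mouse.

    `fields` is a list of (reporter_positive_cells, total_target_cells) tuples,
    one tuple per 10x field. Fields without countable target cells are skipped.
    """
    percentages = [100.0 * positive / total for positive, total in fields if total > 0]
    if not percentages:
        raise ValueError("no field contains countable target cells")
    return mean(percentages)


# Hypothetical counts for five 10x fields of one mouse (not data from the study).
example_mouse = [(48, 50), (95, 100), (40, 40), (72, 80), (57, 60)]
value_for_plot = per_mouse_percentage(example_mouse)  # one dot on the chart
```

      Averaging per-field percentages weights every field equally, whereas pooling the raw counts across fields would weight dense fields more heavily; the response does not say which variant was used, so the choice here is an assumption.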

      (6) Line 160: These data demonstrate that roxCre was functionally efficient yet non-leaky. Functional efficiency in vivo was not shown in the preceding experiments.

      Functional efficiency in vivo is shown in Figures S1-S2 and S4C.

      (7) It would be useful to provide a reference for easy vs low-efficiency recombination of different reporter alleles (lines 56-58).

      We are grateful for this comment, as it has allowed us to improve the clarity of our explanation. Consequently, we have made the necessary modifications.

      (8) Discussion on the potential drawbacks and limitations of the lines would be useful.

      We are thankful for your evaluation, which has significantly contributed to the enhancement of our discourse.

    1. eLife Assessment

      This important study examined orientation representations along the visual hierarchy during perception and working memory. The authors provide results suggesting that during working memory there is a gradient where representations are more categorical in nature later in the visual hierarchy. The evidence presented is solid, most notably a match between behavioral data, though a minor weakness can be attributed to the tasks and behaviors not having been designed to address this question. The findings should be of interest to a relatively broad audience, namely those interested in the relationship between sensory coding and memory.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, Chunharas and colleagues compared the representational differences of orientation information during a sensory task and a working memory task. By reanalyzing data from a previous fMRI study and applying representational similarity analysis (RSA), they observed that orientation information was represented differently in the two tasks: during visual perception, orientation representation resembled the veridical model, which captures the known naturalistic statistics of orientation information; whereas during visual working memory, a categorical model, which assumes different psychological distances between orientations, better explained the data, particularly in more anterior retinotopic regions. The authors suggest fundamental differences in the representational geometry of visual perception and working memory along the human retinotopic cortex.

      Strengths:

      Examining the differences in representational geometry between perception and working memory has important implications for the understanding of the nature of working memory. This study presents a carefully-executed reanalysis of previous data to address this question. The authors developed a novel method (model construction combined with RSA) to examine the representational geometry of orientation information under different tasks, and the control analyses provide rich, convincing support for their claims.

      Weaknesses:

      Although the control analyses are convincing, I still have alternative explanations for some of the results. I'm also concerned about the low sample size (n = 6 in the fMRI experiment). Overall, I think additional analyses may help to further clarify the issues and strengthen the claims.

      (1) The central claim of the current study is that orientation information is represented in a veridical manner during the sensory task, and in a categorical manner during working memory. However, in the sensory task, a third type of representational geometry was observed, especially in brain regions from V3AB and beyond. These regions showed a symmetric pattern in which oblique orientations (45 and 135 degrees) appeared more similar to each other. In fact, a similar pattern can even be found in V1-V3, although the effect looked weaker. The authors raised two possible explanations for this in the discussion, one being that participants might have used verbal labels (e.g., diagonal) for both orientations, and the other being a lack of attention to orientation. Either way, this suggests that a veridical model may not be the best fit for these ROIs. How would this symmetric model explain the sensory data, in comparison to the veridical model?

      (2) If the symmetric model also explains the sensory data well, I wonder whether this result challenges the authors' central claim, or instead suggests that the sensory task is not ideal for the purpose of the study. One way to address this issue might be to use the sample period of the working memory task as the perception task, as some other studies have been doing (e.g., Kwak & Curtis, 2022). This epoch of data might function as a stronger version of the attention task as the authors discussed in the discussion. What would the representational geometry look like in the sample period? I would also like to note that the current analyses used 5.6-13.6 s after stimulus onset for the memory task, which I think may reflect a mix of sample- and delay-related activity.

      (3) When comparing the veridical and categorical models, it is important to first show the significance of each model before making comparisons. For instance, was the veridical model significant in different ROIs in the memory task? And was either model significant in IPS1-3 in the two tasks? I'm asking about this because the two models appear to be both significant in the memory task, whereas only the veridical model was significant in the sensory task (with overall lower correlation coefficients than the categorical model in the memory task).

      (4) The current study has a low sample size of six participants. With such a small sample, it would be helpful to show results from individual participants. For example, I appreciate that Figures 2D and 3C showed individual data points, but additionally showing the representational geometry plot (i.e., Figure 1C) for each subject could better illustrate the robustness of the effect. Alternatively, the original paper from which the fMRI data were drawn actually had two fMRI experiments with similar task designs. I wonder if the authors could replicate these patterns using data from the second experiment with seven participants. This might provide even stronger support for the current findings with a more reasonable sample size.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examined the representational geometry of orientation representations during visual perception and working memory along the visual hierarchy. Using representational similarity analysis, they found that similarity was relatively evenly distributed among all orientations during perception, while higher around oblique orientations during WM. There were some noticeable differences along the visual hierarchy. IPS showed the most pronounced oblique orientation preferences during WM but no clear patterns during perception, likely due to the different task demands for the WM orientation task and the perception contrast discrimination task. The authors proposed two models to capture the differences. The veridical model estimated the representational geometry in perception by assuming an efficient coding framework, while the categorical model estimated the pattern in WM using psychological distances to measure the differences among orientations (including estimates from a separate psychophysical study performed outside the scanner). Therefore, I think this work is valuable and advances our understanding of the transition from perception to memory.

      Strengths:

      The use of RSA to identify representational biases goes beyond simply relying on response patterns and helps identify how representational formats change from perception to WM. The study nicely leverages ideas about efficient coding to explain perceptual representations that are more veridical, while leaning on ideas about abstractions of percepts that are more categorical-psychological in nature (but see (1) below). Moreover, the match between memory biases of orientation and the patterns estimated with RSA were compelling (but see (2) below). I found the analyses showing how RSA and decoding (eg, cross-generalization) are associated and how/why they may differ to be particularly interesting.

      Weaknesses:

      (1) The idea that later visual maps (ie, IPS0) encode perceptions of orientation in a veridical form and then in a categorical form during WM is an attractive idea. However, the support is somewhat weakened by a few issues. The RSA plots in Figure 1C for IPS0 appear to show a similar pattern, but just of lower amplitude during perception. But in the model fits either for orientation statistics or estimated from the psychophysics task, the Veridical model fits best for perception and the Categorical model fits best for memory in IPS0. By my eye, the modeled RSMs in Figures 2 & 3 do not look like the observed ones in Figure 1C. Those modeled RSMs look way more categorical than the observed IPS0. They look like something in between.

      (2) My biggest concern is the omission of the in-scanner behavioral data. Yes, on the one hand, they used the N=17 outside-the-scanner psychophysics dataset for the analyses in Figure 3. On the other hand, they do not even mention the behavioral data collected in the scanner along with the BOLD data. Those data had clear oblique effects if I recall correctly. Why use the data from the psychophysics experiment? Also, perhaps a missed opportunity: I wonder whether the fit of the Veridical/Categorical models to a single subject's RSA data matches that subject's behavioral biases. That would be really compelling if found.

      The data were collected (reanalysis of published study) without consideration for the aims of the current study, and are therefore not optimized to test their goals. The biggest issue is that "The distractors are really distracting me." I'm somewhat concerned about how the distractors may have impacted the results. I honestly did not notice that the authors were using delay periods that had 11s of distractor stimuli until way into the paper. On the one hand, the "patterns" of the model fits across the ROIs appear to be qualitatively similar. That's good if you want to pool data like the authors did. However, although the authors state on line 350 "..we also confirmed that the presence of distractors during the delay did not impact the pattern of results in the memory task (Supplementary Figure 5)", when looking at Supplementary Figure 5 I noticed that there are a couple of exceptions to this. In the Gratings distractor data, V1 shows a better fit to the Veridical model, while V4 and IPS0 show no better fit to either model. And in the Noise distractor data, neither model fits better for any ROI. At first glance, I was concerned, but then looking at the No distractor data, the pattern is identical to that of the combined data. Thus, this can be seen as a glass half full/empty issue as almost all of the ROIs show a similar pattern, but still it would concern me if I were leading this study. This gets me to my key question: why even use the distractor trials at all, where the interpretation can get dicey? For instance, the authors have shown in this exact data that the impact of distraction affects the fidelity of representations differently along the visual hierarchy (Rademaker, 2019), consistent with several other studies (e.g., Bettencourt & Xu, 2016; Lorenc, 2018; Hallenbeck et al., 2022) and with one of the authors' preprints (Rademaker & Serences, 2024). My guess is that without the full dataset, some of the RSA analyses are underpowered. If that is the case, I'm fine with it, but it might be nice to state that.

    1. eLife Assessment

      The songbird vocal motor nucleus HVC contains cells that project to the basal ganglia, the auditory system, or downstream vocal motor structures. In this fundamental study, the authors conduct optogenetic circuit mapping to clarify how four distinct inputs to HVC act on these distinct HVC cell types. They provide compelling evidence that all long-range projections engage inhibitory circuits in HVC and can also exhibit cell-type specific preferences in monosynaptic input strength. Understanding HVC at this microcircuit level is critical for informing models of song learning and production.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to map the synaptic connectivity between the inputs and outputs of the song premotor nucleus HVC in zebra finches, to understand how sensory (auditory) and motor circuits interact to coordinate song production and learning. The authors optimized an AAV-based optogenetic technique to manipulate auditory inputs from specific auditory areas one at a time, and recorded synaptic activity with whole-cell recordings in slice preparations, identifying each neuron's projection target by retrograde neuronal tracing. This thorough and detailed analysis provides compelling evidence of synaptic connections from 4 major auditory inputs (3 forebrain and 1 thalamic region) onto three classes of projection neurons in HVC; all areas provide monosynaptic excitatory inputs and polysynaptic inhibitory inputs, but the proportion of connections onto each class of projection neuron varied. They also find specific reciprocal connections between mMAN and Av. Taken together, the authors provide a map of the synaptic connections between intercortical sensory and motor areas that are suggested to be involved in zebra finch song production and learning.

      Strengths:

      The authors optimized optogenetic tools with eGtACR1 delivered by AAV, which allows them to manipulate synaptic inputs in a projection-specific manner in zebra finches. They also identify HVC cell types based on projection area. With this technical advance and thorough experiments, they provide a detailed map of synaptic connections.

      Weaknesses:

      As the study was performed in brain slices, the functional implications of the synaptic connectivity are limited. In particular, because all experiments were done in adult preparations, there could be a gap when discussing functions in developmental song learning.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes the synaptic connectivity of four main classes of sensory afferents onto three known classes of projection neurons in the songbird pre-motor cortical region HVC. HVC is a region associated with the generation of learned bird songs. The investigators use exclusively male zebra finches to examine the functional anatomy of this region using patch clamp methods combined with optogenetic activation of select neuronal groups.

      Strengths:

      The quality of the recordings is extremely high and the quantity of data is on a very significant scale; this will certainly aid the field.

      Weaknesses:

      The authors could make the figures a little easier to navigate. Most of the figures use actual anatomical images, but it would be nice to have these linked with a zebra finch atlas, in more of a cartoon format, accompanying each fluorescence image. Additionally, for the most part, figures showing the labeling lack scale bar values (in µm). These should be added to the figures, not just shown in the legends.

      The authors could make it clear in the abstract that this is all male zebra finches - perhaps this is obvious given the bird song focus, but it should be stated. The number of recordings from each neuron class and the overall number of birds employed should be clearly stated in the methods (this is in the figures, but it should say n=birds or cells as appropriate).

      The authors should consider sharing the actual electrophysiology records as data.

    4. Reviewer #3 (Public review):

      Nucleus HVC is critical both for song production and for learning and arguably, sitting at the top of the song control system, is the most critical node in this circuit, receiving a multitude of inputs and sending precisely timed commands that determine the temporal structure of song. The complexity of this structure and its underlying organization seem to become more apparent with each experimental manipulation, and yet our understanding of the underlying circuit organization remains relatively poor. In this study, Trusel and Roberts use classic whole-cell patch clamp techniques in brain slices coupled with optogenetic stimulation of select inputs to provide a careful characterization and quantification of synaptic inputs into HVC. By identifying individual projection neurons using retrograde tracer injections combined with pharmacological manipulations, they classify monosynaptic inputs onto each of the three main classes of glutamatergic projection neurons in HVC (RA-, Area X- and Av-projecting neurons). This study is remarkable in the amount of information that it generates, and the tremendous labor involved for each experiment, from the expression of opsins in each of the target inputs (Uva, NIf, mMAN, and Av), the retrograde labelling of each type of projection neuron, and ultimately the optical stimulation of infected axons while recording from identified projection neurons. Taken together, this study makes an important contribution to increasing our identification, and ultimately understanding, of the basic synaptic elements that make up the circuit organization of HVC, and how external inputs, which we know to be critical for song production and learning, contribute to the intrinsic computations within this critical circuit.

      This study is impressive in its scope, rigorous in its implementation, and thoughtful regarding its limitations. The manuscript is well-written, and I appreciate the clarity with which the authors use our latest understanding of the evolutionary origins of this circuit to place these studies within a larger context and their relevance to the study of vocal control, including human speech. My comments are minor and primarily about legibility, clarification of certain manipulations, and organization of some of the summary figures.

    1. eLife Assessment

      This work presents important findings that the human frontal cortex is involved in a flexible, dual role in both maintaining information in short-term memory, and controlling this memory content to guide adaptive behavior and decisions. The evidence supporting the conclusions is compelling, with a well-designed task, best-practice decoding methods, and careful control analyses. The work will be of broad interest to cognitive neuroscience researchers working on working memory and cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shao et al. investigate the contribution of different cortical areas to working memory maintenance and control processes, an important topic involving different ideas about how the human brain represents and uses information when no longer available to sensory systems. In two fMRI experiments, they demonstrate that human frontal cortex (area sPCS) represents stimulus (orientation) information during typical maintenance, and even more so when a categorical response demand is present. That is, when participants have to apply an added level of decision control to the WM stimulus, sPCS areas encode stimulus information more than in conditions without this added demand. These effects are then expanded upon using multi-area neural network models, recapitulating the empirical gradient of memory vs control effects from visual to parietal and frontal cortices. Multiple experiments and analysis frameworks provide support for the authors' conclusions, and control experiments and analysis are provided to help interpret and isolate the frontal cortex effect of interest. While some alternative explanations/theories may explain the roles of frontal cortex in this study and experiments, important additional analyses have been added that help ensure a strong level of support for these results and interpretations.

      Strengths:

      - The authors use an interesting and clever task design across two fMRI experiments that is able to parse out contributions of WM maintenance alone along with categorical, rule-based decisions. Importantly, the second experiment uses only one fixed rule, providing both an internal replication of Experiment 1's effects and extending them to a different situation when rule switching effects are not involved across mini-blocks.

      - The reported analyses using both inverted encoding models (IEM) and decoders (SVM) demonstrate the stimulus reconstruction effects across different methods, which may be sensitive to different aspects of the relationship between patterns of brain activity and the experimental stimuli.

      - Linking the multivariate activity patterns to memory behavior is critical in thinking about the potential differential roles of cortical areas in subserving successful working memory. Figure 3 nicely shows a similar interaction to that of Figure 2 in the role of sPCS in the categorization vs. maintenance tasks. This is an important contribution to the field when we consider how a distributed set of interacting cortical areas support successful working memory behavior.

      - The cross-decoding analysis in Figure 4 is a clever and interesting way to parse out how stimulus and rule/category information may be intertwined, which would have been one of the foremost potential questions or analyses requested by careful readers.

      - Additional ROI analyses in more anterior regions of the PFC help to contextualize the main effects of interest in the sPCS (and no effect in the inferior frontal areas, which are also retinotopic, adds specificity). And, more explanation for how motor areas or preparation are likely not involved strengthens the takeaways of the study (M1 control analysis).

      - Quantitative link via RDM-style analyses between the RNNs constructed and fMRI data.

      Weaknesses:

      - In the given tasks, multiple types of information codes may be present, and more detail on this possibility could always be added analytically or in discussion. However, the authors have added beneficial support to this comparison in this version of the manuscript.

      - The space of possible RNN architectures and their biological feasibility could always be explored more, but links between the fMRI and RNN data provide a good foundation for this work moving forward.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide evidence that helps resolve long-standing questions about the differential involvement of frontal and posterior cortex in working memory. They show that whereas early visual cortex shows stronger decoding of memory content in a memorization task vs a more complex categorization task, frontal cortex shows stronger decoding during categorization tasks than memorization tasks. They find that task-optimized RNNs trained to reproduce the memorized orientations show some similarities in neural decoding to people. Together, this paper presents interesting evidence for differential responsibilities of brain areas in working memory.

      Strengths:

      This paper was overall strong. It had a well-designed task, best-practice decoding methods, and careful control analyses. The neural network modeling adds additional insight into the potential computational roles of different regions.

      Weaknesses:

      Few. The RNN-fMRI correspondence could be a little more comprehensive, but the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to sincerely thank the reviewers again for their insightful comments on the previous version of our manuscript. In the last round of review, the reviewers were mostly satisfied with our revision but raised a few suggestions and/or remaining concerns. We have further edited the manuscript to address these concerns.

      Reviewer #1:

      - An explicit, quantitative link between the RNN and fMRI data is perhaps a last point that would integrate the RNN conclusion and analyses in line with the human imaging data.

      Reviewer #2:

      - Few. While more could be perhaps done to understand the RNN-fMRI correspondence, the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

      To better align the RNN and fMRI results quantitatively, we performed an additional representational similarity analysis (RSA) on the data. Specifically, we computed the representational dissimilarity matrices (RDMs) for fMRI and RNN data separately, and calculated the correlation between the RDMs to quantify the similarity between fMRI data and different RNN models. We found that, consistent with our main claims, RNN2 generally demonstrated higher similarity with the fMRI data compared to RNN1. These results provide further support that RNN2 aligns better with human neuroimaging data. We have included this result (lines 496-505) and the corresponding figure (Figure 7) in the manuscript.
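
      The RDM comparison described in this response can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a correlation-distance RDM (1 minus Pearson r between condition patterns) and a Pearson correlation between the upper triangles of two RDMs, whereas the response does not specify either choice (rank correlations are also common in RSA). All function names and the random example data are hypothetical.

```python
import numpy as np


def rdm(patterns):
    """Correlation-distance RDM for an array of shape (n_conditions, n_features)."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix as a 1-D vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def rdm_similarity(patterns_a, patterns_b):
    """Pearson correlation between the RDMs of two systems.

    The two inputs must share the same conditions in the same order, but may
    have different numbers of features (voxels vs. network units).
    """
    a = upper_triangle(rdm(patterns_a))
    b = upper_triangle(rdm(patterns_b))
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical data: 8 conditions, 200 "voxels" and 64 "units" (random numbers).
rng = np.random.default_rng(0)
fmri_like = rng.normal(size=(8, 200))
rnn_like = rng.normal(size=(8, 64))
similarity = rdm_similarity(fmri_like, rnn_like)
```

      Because only the RDMs are compared, the fMRI patterns and the RNN activations never need to live in the same feature space, which is what makes this kind of comparison between brain data and models possible.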

      Reviewer #1:

      - As Rev 2 mentions, multiple types of information codes may be present, and the response letter Figure 5 using representational similarity (RSA) gets at this question. It would strengthen the work to, at minimum, include this analysis as an extended or supplemental figure.

      Following this suggestion, we have now included Response Letter Figure 5 from the previous round of review in the manuscript (lines 381-387 and Appendix 1 – figure 7).

      Reviewer #1:

      - To sum up the results, a possible, brief schematic of each cortical area analyzed and its contribution to information coding in WM and successful subsequent behavior may help readers take away important conclusions of the cortical circuitry involved.

      Following this suggestion, we have added a schematic figure illustrating the contribution of each cortical region in our experiment to better summarize our findings (Figure 8).

      We hope that these changes further clarify the issues and strengthen the key claims in our manuscript.

    1. eLife Assessment

      This study presents a fundamental finding on how m6A levels are controlled, invoking a consolidated model where degradation of modified RNAs in the cytoplasm plays a primary role in shaping m6A patterns and dynamics, rather than needing active regulation by m6A erasers and other related processes. The evidence is compelling through its use of transcriptome-wide data and mechanistic modeling. Relevant for any reader with an interest in RNA metabolism, this new framework consolidates previous observations and highlights the importance of careful experimentation for evaluating m6A levels.

    2. Reviewer #1 (Public review):

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with more and more recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations on how much RNA metabolic events could explain an observed m6A change.

    3. Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

      Comments on revisions:

      The authors have done a good job with the revision.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with more and more recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations on how much RNA metabolic events could explain an observed m6A change.

      Weaknesses:

      It is essential to notice that m6ADyn does not exactly recapitulate the observed m6A changes. First, this can be due to m6ADyn's limitations. The authors do a great job in the Discussion highlighting these limitations. Indeed, they mention how m6ADyn cannot interpret m6A's implications on nuclear degradation or splicing and cannot model more complex scenario predictions (i.e., a scenario in which m6A both impacts export and degradation) or the contribution of single sites within a gene.

      Secondly, since predictions do not exactly recapitulate the observed m6A changes, "active" regulatory events may still play a partial role in regulating m6A changes. The authors themselves highlight situations in which data do not support m6ADyn predictions. Active mechanisms to control m6A degradation levels or mRNA export levels could exist and may still play an essential role.

      We are grateful for the reviewer’s appreciation of our findings and their implications, and are in full agreement with the reviewer regarding the limitations of our model and the discrepancies, in some cases, with our experimental measurements, which potentially point to more complex biology than is captured by m6ADyn. We certainly cannot dismiss the possibility that active mechanisms may play a role in shaping m6A dynamics at some sites, or in some contexts. Our study aims to broaden the discussion in the field, and to introduce the possibility that passive models can explain a substantial extent of the variability observed in m6A levels.

      (1) "We next sought to assess whether alternative models could readily predict the positive correlation between m6A and nuclear localization and the negative correlations between m6A and mRNA stability. We assessed how nuclear decay might impact these associations by introducing nuclear decay as an additional rate, δ. We found that both associations were robust to this additional rate (Supplementary Figure 2a-c)."

      Based on the data, I would say that model 2 (m6A-dep + nuclear degradation) is better than model 1. The discussion of these findings in the Discussion could help clarify how to interpret this prediction. Is nuclear degradation playing a significant role, more than expected by previous studies?

      This is an important point, which we’ve now clarified in the discussion. Including nonspecific nuclear degradation in the m6ADyn framework provides a model that better aligns with the observed data, particularly by mitigating unrealistic predictions such as excessive nuclear accumulation for genes with very low sampled export rates. This adjustment addresses potential artifacts in nuclear abundance and half-life estimations. However, we continued to use the simpler version of m6ADyn for most analyses, as it captures the key dynamics and relationships effectively without introducing additional complexity. While including nuclear degradation enhances the model's robustness, it does not fundamentally alter the primary conclusions or outcomes. This balance allows for a more straightforward interpretation of the results.
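
      The effect of adding a nuclear decay rate can be illustrated with a minimal two-compartment steady-state sketch. It is not the authors' m6ADyn code: the equations, the function names, and the rate values are assumptions chosen for illustration (production α, export β, cytoplasmic decay γ, faster decay γ_m for methylated transcripts, and a nonspecific nuclear decay δ).

```python
def steady_state(alpha, beta, gamma, delta=0.0):
    """Steady state of dN/dt = alpha - (beta + delta) * N and
    dC/dt = beta * N - gamma * C for one transcript species."""
    nuc = alpha / (beta + delta)
    cyt = beta * nuc / gamma
    return nuc, cyt


def gene_summary(alpha_u, alpha_m, beta, gamma, gamma_m, delta=0.0):
    """Return (m6A level, Nuc:Cyt ratio) for a gene whose unmethylated and
    methylated transcript pools differ only in cytoplasmic decay."""
    nuc_u, cyt_u = steady_state(alpha_u, beta, gamma, delta)
    nuc_m, cyt_m = steady_state(alpha_m, beta, gamma_m, delta)
    m6a_level = (nuc_m + cyt_m) / (nuc_u + cyt_u)  # methylated : unmethylated
    nuc_cyt = (nuc_u + nuc_m) / (cyt_u + cyt_m)
    return m6a_level, nuc_cyt


# A gene with a very low export rate: nuclear decay caps nuclear accumulation.
nuc_without, _ = steady_state(alpha=1.0, beta=0.01, gamma=1.0)
nuc_with, _ = steady_state(alpha=1.0, beta=0.01, gamma=1.0, delta=0.1)

# With nuclear decay present, a larger methylated share still gives a higher
# m6A level and a higher Nuc:Cyt ratio.
low = gene_summary(0.8, 0.2, beta=1.0, gamma=1.0, gamma_m=3.0, delta=0.1)
high = gene_summary(0.2, 0.8, beta=1.0, gamma=1.0, gamma_m=3.0, delta=0.1)
print(nuc_without, nuc_with, low, high)
```

      In this simplified form, δ limits nuclear build-up when β is very small, while the positive relationship between m6A level and nuclear localization is retained.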

      (2) The authors classify m6A levels as "low" or "high," and it is unclear how "low" differs from unmethylated mRNAs.

      We thank the reviewer for this observation. We analyzed gene methylation levels using the m6A-GI (m6A gene index) metric, which reflects the enrichment of the IP fraction across the entire gene body (CDS + 3′UTR). While some genes may have minimal or no methylation, most genes likely exist along a spectrum from low to high methylation levels. Unlike earlier analyses that relied on arbitrary thresholds to classify sites as methylated, GLORI data highlight the presence of many low-stoichiometry sites that are typically overlooked. To capture this spectrum, we binned genes into equal-sized groups based on their m6A-GI values, allowing a more nuanced interpretation of methylation patterns as a continuum rather than a binary or discrete classification (e.g., no, low, or high methylation).
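
      Equal-sized binning by a continuous score can be sketched as below. The rank-based helper `equal_size_bins` and the synthetic scores are hypothetical and stand in for m6A-GI values; the number of bins is an arbitrary choice.

```python
import numpy as np


def equal_size_bins(values, n_bins):
    """Assign each gene to one of n_bins rank-based groups of (near) equal size.
    Bin 0 holds the lowest scores, bin n_bins - 1 the highest."""
    values = np.asarray(values)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return ranks * n_bins // len(values)


rng = np.random.default_rng(1)
scores = rng.gamma(shape=2.0, scale=1.0, size=100)  # synthetic stand-in scores
bins = equal_size_bins(scores, n_bins=5)
print(np.bincount(bins))  # number of genes per bin
```

      Ranking before binning keeps group sizes equal even when the score distribution is skewed, which fixed-width thresholds would not.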

      (3) The authors explore whether m6A changes could be linked with differences in mRNA subcellular localization. They tested this hypothesis by looking at mRNA changes during heat stress, a complex scenario to predict with m6ADyn. According to the collected data, heat shock is not associated with dramatic changes in m6A levels. However, the authors observe a redistribution of m6A mRNAs during the treatment and recovery time, with highly methylated mRNAs being retained in the nucleus, being associated with a shorter half-life, and being transcriptionally induced by HSF1. Based on this observation, the authors use m6ADyn to predict the contribution of RNA export, RNA degradation, and RNA transcription to the observed m6A changes. However:

      (a) Do the authors have a comparison of m6ADyn predictions based on the assumption that RNA export and RNA transcription may change at the same time?

      We thank the reviewer for this point. Under the simple framework of m6ADyn in which RNA transcription and RNA export are independent of each other, the effect of simultaneously modulating two rates is additive. In Author response image 1, we simulate some scenarios wherein we simultaneously modulate two rates. For example, transcriptional upregulation and decreased export during heat shock could reinforce m6A increases, whereas transcriptional downregulation might counteract the effects of reduced export. Note that while production and export can act in similar or opposing directions, the former can only lead to temporary changes in m6A levels without impacting steady-state levels, whereas the latter (changes in export) can alter steady-state levels. We have clarified this in the manuscript results to better contextualize how these dynamics interact.

      Author response image 1.

      m6ADyn predictions of m6A gene levels (left) and Nuc to Cyt ratio (right) upon varying perturbations of a sampled gene. The left panel depicts the simulated dynamics of log2-transformed m6A gene levels under varying conditions. The lines represent the following perturbations: (1) export is reduced to 10% (β), (2) production is increased 10-fold (α) while export is reduced to 10% (β), (3) export is reduced to 10% (β) and production is reduced to 10% (α), and (4) export is only decreased for methylated transcripts (β^m6A) to 10%. The right panel shows the corresponding nuclear:cytoplasmic (log2 Nuc:Cyt) ratios for perturbations 1 and 4.
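
      The distinction between transient and steady-state effects can be reproduced in a small simulation. The sketch below is an assumption-laden toy model, not the authors' code: it tracks unmethylated and methylated transcripts in nucleus and cytoplasm, uses arbitrary rate values, integrates with forward Euler, and defines the m6A level as the ratio of methylated to unmethylated transcripts.

```python
import numpy as np

GAMMA, GAMMA_M = 1.0, 3.0  # assumed cytoplasmic decay; methylated decays faster
FRAC_M = 0.5               # assumed methylated share of newly made transcripts


def steady_state(alpha, beta):
    """Steady-state pools (nuc_u, nuc_m, cyt_u, cyt_m) for given rates."""
    a_u, a_m = alpha * (1 - FRAC_M), alpha * FRAC_M
    return a_u / beta, a_m / beta, a_u / GAMMA, a_m / GAMMA_M


def m6a_level(nu, nm, cu, cm):
    return (nm + cm) / (nu + cu)


def simulate(state, alpha, beta, t_end=80.0, dt=0.01):
    """Forward-Euler integration from `state` under new rates.
    Returns the m6A level at every time step."""
    a_u, a_m = alpha * (1 - FRAC_M), alpha * FRAC_M
    nu, nm, cu, cm = state
    levels = [m6a_level(nu, nm, cu, cm)]
    for _ in range(int(round(t_end / dt))):
        nu, nm, cu, cm = (
            nu + dt * (a_u - beta * nu),
            nm + dt * (a_m - beta * nm),
            cu + dt * (beta * nu - GAMMA * cu),
            cm + dt * (beta * nm - GAMMA_M * cm),
        )
        levels.append(m6a_level(nu, nm, cu, cm))
    return np.array(levels)


start = steady_state(alpha=1.0, beta=1.0)
baseline = m6a_level(*start)
more_production = simulate(start, alpha=10.0, beta=1.0)  # production x10
less_export = simulate(start, alpha=1.0, beta=0.1)       # export reduced to 10%
print(baseline, more_production.max(), more_production[-1], less_export[-1])
```

      In this toy model, raising production produces a temporary rise in the m6A level that relaxes back to baseline, whereas lowering export shifts the level to a new, higher steady state.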

      (b) They arbitrarily set the global reduction of export to 10%, but I'm not sure we can completely rule out whether m6A mRNAs have an export rate during heat shock similar to the non-methylated mRNAs. What happens if the authors simulate that the block in export could be preferential for m6A mRNAs only?

      We thank the reviewer for this interesting suggestion. While we cannot fully rule out such a scenario, we can identify arguments against it being an exclusive explanation. Specifically, an exclusive reduction in the export rate of methylated transcripts would be expected to increase the relationship between steady-state m6A levels (the ratio of methylated to unmethylated transcripts) and changes in localization, such that genes with higher m6A levels would exhibit a greater relative increase in the nuclear-to-cytoplasmic (Nuc:Cyt) ratio. However, the attached analysis shows only a weak association during heat stress, where genes with higher m6A-GI levels tend to increase just a little more in the Nuc:Cyt ratio, likely due to cytoplasmic depletion. A global reduction of export (β 10%) produces a similar association, while a scenario where only the export of methylated transcripts is reduced (β^m6A 10%) results in a significantly stronger association (Author response image 2). This supports the plausibility of a global export reduction. Additionally, genes with very low methylation levels in control conditions also show a significant increase in the Nuc:Cyt ratio, which is inconsistent with a scenario of preferential export reduction for methylated transcripts (data not shown).

      Author response image 2.

      Wild-type MEFs m6A-GIs (x-axis) vs. fold change nuclear:cytoplasmic localization heat shock 1.5 h and control (y-axis), Pearson’s correlation indicated (left panel). m6ADyn, rates sampled for 100 genes based on gamma distributions and simulation based on reducing the global export rate (β) to 10% (middle panel). m6ADyn simulation for reducing the export rate for m6A methylated transcripts (β^m6A) to 10% (right panel).
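
      A steady-state version of this comparison can be written in a few lines. It is a deliberately simplified sketch with assumed rates and hypothetical function names; because it ignores the time course, it cannot reproduce the weak association reported for the global scenario, which the response attributes to cytoplasmic depletion. It only shows why a methylation-specific export block would tie the localization change tightly to methylation.

```python
def nuc_cyt_ratio(a_u, a_m, beta_u, beta_m, gamma=1.0, gamma_m=3.0):
    """Steady-state Nuc:Cyt ratio for unmethylated (u) and methylated (m) pools.
    Cytoplasmic steady state depends only on production and decay."""
    nuclear = a_u / beta_u + a_m / beta_m
    cytoplasmic = a_u / gamma + a_m / gamma_m
    return nuclear / cytoplasmic


def fold_change(frac_m, scenario, beta=1.0):
    """Fold change in Nuc:Cyt after reducing export to 10%, either for all
    transcripts ("global") or for methylated transcripts only ("m6a_only")."""
    a_u, a_m = 1.0 - frac_m, frac_m
    before = nuc_cyt_ratio(a_u, a_m, beta, beta)
    if scenario == "global":
        after = nuc_cyt_ratio(a_u, a_m, 0.1 * beta, 0.1 * beta)
    elif scenario == "m6a_only":
        after = nuc_cyt_ratio(a_u, a_m, beta, 0.1 * beta)
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return after / before


for frac_m in (0.1, 0.8):  # lowly vs highly methylated gene
    print(frac_m, fold_change(frac_m, "global"), fold_change(frac_m, "m6a_only"))
```

      Under these assumptions the global block changes Nuc:Cyt by the same factor for every gene at steady state, while the methylation-specific block produces a fold change that grows with the methylated fraction.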

      (c) The dramatic increase in the nuclear:cytoplasmic ratio of mRNA upon heat stress may not reflect the overall m6A mRNA distribution upon heat stress. It would be interesting to repeat the same experiment in METTL3 KO cells. Of note, m6A mRNA granules have been observed within 30 minutes of heat shock. Thus, some m6A mRNAs may still be preferentially enriched in these granules for storage rather than being directly degraded. Overall, it would be interesting to understand the authors' position relative to previous studies of m6A during heat stress.

      The reviewer suggests that methylation is actively driving localization during heat shock, rather than being passively regulated. To address this question, we have now knocked down WTAP, an essential component of the methylation machinery, and monitored nuclear:cytoplasmic localization over the course of a heat shock response. Even with reduced m6A levels, high PC1 genes exhibit increased nuclear abundance during heat shock. Notably, the dynamics of this trend are altered, with the peak effect delayed from 1.5 h of heat shock in siCTRL samples to 4 h in siWTAP samples (Supplementary Figure 4). This finding underscores that m6A is not the primary driver of these mRNA localization changes but rather reflects broader mRNA metabolic shifts during heat shock. These findings have been added as panel e of Supplementary Figure 4.

      (d) Gene Ontology analysis based on the top 1000 PC1 genes shows an enrichment of GOs involved in post-translational protein modification more than GOs involved in cellular response to stress, which is highlighted by the authors and used as justification to study RNA transcriptional events upon heat shock. How do the authors think that GOs involved in post-translational protein modification may contribute to the observed data?

      High PC1 genes exhibit increased methylation and a shift in nuclear-to-cytoplasmic localization during heat stress. While the enriched GO terms for these genes are not exclusively related to stress-response proteins, one could speculate that their nuclear retention reduces translation during heat stress. The heat stress response genes are of particular interest, which are massively transcriptionally induced and display increased methylation. This observation supports m6ADyn predictions that elevated methylation levels in these genes are driven by transcriptional induction rather than solely by decreased export rates.

      (e) Additionally, the authors first mention that there is no dramatic change in m6A levels upon heat shock, "subtle quantitative differences were apparent," but then mention a "systematic increase in m6A levels observed in heat stress". It is unclear which systematic increase they are referring to. Are the authors referring to previous studies? It is confusing in the field what exactly is going on after heat stress. For instance, in some papers, a preferential increase of 5'UTR m6A has been proposed rather than a systematic and general increase.

      We thank the reviewer for raising this point. In our manuscript, we sought to emphasize, on the one hand, the fact that m6A profiles are - at first approximation - “constitutive”, as indicated by high Pearson correlations between conditions (Supplementary Figure 4a). On the other hand, we sought to emphasize that the above notwithstanding, subtle quantitative differences are apparent in heat shock, encompassing large numbers of genes, and these differences are coherent with time following heat shock (and in this sense ‘systematic’), rather than randomly fluctuating across time points. Based on our analysis, these changes do not appear to be preferentially enriched at 5′UTR sites but occur more broadly across gene bodies (potentially with a slight 3′ bias). A quick analysis of the HSF1-induced heat stress response genes, focusing on their relative enrichment of methylation upon heat shock, shows that the 5'UTR regions exhibit a roughly similar increase in methylation after 1.5 hours of heat stress compared to the rest of the gene body (Author response image 3). Although a prominent previous publication (Zhou et al. 2015) suggested that m6A levels specifically increase in the 5'UTR of HSPA1A in a YTHDF2- and HSF1-dependent manner, and highlighted the role of 5'UTR m6A methylation in regulating cap-independent translation, our findings do not support a 5'UTR-specific enrichment. However, we do observe that the methylation changes are still HSF1-dependent. Of note, the m6A-GI (m6A gene level) is a metric that captures the m6A enrichment of the gene body excluding the 5′UTR, due to an overlap with transcription start site-associated, m6Am-derived signal.

      Author response image 3.

      Fold change of m6A enrichment (m6A-IP / input) comparing 1.5 h heat shock and control conditions for the 5′UTR region and the rest of the gene body (CDS and 3′UTR) in the 10 HSF1-dependent stress response genes.

      Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation, and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.

      We thank the reviewer for the spot-on suggestions and comments on this manuscript.

      Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription, are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

      We sincerely thank the reviewer for their thoughtful evaluation and constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Supplementary Figure 5A Data: Please double-check the label of the y-axis and the matching legend.

      We corrected this.

      (2) A better description of how the nuclear:cytoplasmic fractionation is performed is needed.

      We added missing information to the Material & Methods section.

      (3) Rec 1hr or Rec 4hr instead of r1 and r4 to indicate the recovery.

      For brevity in Figure panels, we have chosen to stick with r1 and r4.

      (4) Figure 2D: are hours plotted?

      Plotted is the fold change (FC) of the calculated half-lives in hours (right). For the model (left), the plotted values are the fold change in half-life, expressed in dimensionless time units, between conditions with and without m6A-facilitated degradation. We have now clarified this in the legend.

      (5) How many genes do we have in each category? How many genes are you investigating each time?

      We thank the reviewer for this question. In all cases where we binned genes, we used equal-sized bins of genes that met the required coverage thresholds. We have reviewed the manuscript to ensure that the number of genes included in each analysis or the specific coverage thresholds used are clearly stated throughout the text.

      (6) Simulations on 1000 genes or 2000 genes?

      We thank the reviewer for this question and went over the text to correct for cases in which this was not clearly stated.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) The manuscript is very clear and well-written. However, some arguments are a bit difficult to understand. It would be helpful to clearly discriminate between active and passive events. For example, in the sentence: "For example, increasing the m6A deposition rate (⍺m6A) results in increased nuclear localization of a transcript, due to the increased cytoplasmic decay to which m6A-containing transcripts are subjected", I would directly write "increased relative nuclear localization" or "apparent increase in nuclear localization".

      We thank the reviewer for this careful observation. We have modified the quoted sentence, and also sought to correct additional instances of ambiguity in the text.

      Also, it is important to ensure that all relationships are described correctly. For example, in the sentence: "This model recovers the positive association between m6A and nuclear localization but gives rise to a positive association between m6A and decay", I think "decay" should be replaced with "stability". Similarly, the sentence: "Both the decrease in mRNA production rates and the reduction in export are predicted by m6ADyn to result in increasing m6A levels, ..." should it be "Both the increase in mRNA production and..."?

      We have corrected this.

      This sentence was difficult for me to understand: "Our findings raise the possibility that such changes could, at least in part, also be indirect and be mediated by the redistribution of mRNAs secondary to loss of cytoplasmic m6A-dependent decay." Please consider rephrasing it.

      We rephrased this sentence as suggested.

      (2) Figure 2d: "A final set of predictions of m6ADyn concerns m6A-dependent decay. m6ADyn predicts that (a) cytoplasmic genes will be more susceptible to increased m6A mediated decay, independent of their m6A levels, and (b) more methylated genes will undergo increased decay, independently of their relative localization (Figure 2d left) ... Strikingly, the experimental data supported the dual, independent impact of m6A levels and localization on mRNA stability (Figure 2d, right)."

      I do not understand, either from the text or from the figure, why the authors claim that m6A levels and localization independently affect mRNA stability. It is clear that "cytoplasmic genes will be more susceptible to increased m6A mediated decay", as they always show shorter half-lives (top-to-bottom perspective in Figure 2d). Nonetheless, as I understand it, the effect is not "independent of their m6A levels", as half-lives are clearly the shortest with the highest m6A levels (left-to-right perspective in each row).

      The two-dimensional heatmaps allow for exploring conditional independence between conditions. If an effect (in this case delta half-life) is a function of the X axis (in this case m6A levels), continuous increases should be seen going from one column to another. Conversely, if it is a function of the Y axis (in this case localization), a continuous effect should be observed from one row to another. Given that effects are generally observed both across rows and across columns, we concluded that the two act independently. The fact that half-life is shortest when genes are most cytoplasmic and have the highest m6A levels is therefore not necessarily inconsistent with two effects acting independently, but instead interpreted by us as the additive outcome of two independent effects. Having said this, a close inspection of this plot does reveal a very low impact of localization in contexts where m6A levels are very low, which could point at some degree of synergism between m6A levels and localization. We have therefore now revised the text to avoid describing the effects as "independent."
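
      One way to make the additive-versus-synergistic distinction concrete is to subtract row and column effects from the heatmap and inspect what remains. The sketch below uses made-up grids and a hypothetical helper, `additive_residual`; it is not an analysis of the authors' data.

```python
import numpy as np


def additive_residual(grid):
    """Remove row and column main effects from a 2D grid of effect sizes.
    The residual is zero everywhere if the two factors combine additively."""
    grid = np.asarray(grid, dtype=float)
    row_effect = grid.mean(axis=1, keepdims=True)
    col_effect = grid.mean(axis=0, keepdims=True)
    return grid - row_effect - col_effect + grid.mean()


localization = np.array([0.0, 1.0, 2.0])  # rows: hypothetical localization bins
m6a = np.array([0.0, 1.0, 2.0, 3.0])      # columns: hypothetical m6A bins

additive = localization[:, None] + m6a[None, :]     # two separate contributions
synergistic = localization[:, None] * m6a[None, :]  # effect of one depends on the other

print(np.abs(additive_residual(additive)).max())
print(np.abs(additive_residual(synergistic)).max())
```

      A grid in which localization matters little when m6A levels are very low would leave a nonzero residual, which is the pattern the response describes as possible synergism.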

      (3) The methods part should be extended. For example, the description of the mRNA half-life estimation is far too short and lacks details. Also, information on the PCA analysis (Figure 4e & f) is completely missing. The code should be made available, at least for the differential model.

      We thank the reviewer for this point and expanded the methods section on mRNA stability analysis and PCA. Additionally, we added a supplementary file, providing R code for a basic m6ADyn simulation of m6A depleted to normal conditions (added Source Code 1).

      https://docs.google.com/spreadsheets/d/1Wy42QGDEPdfT-OAnmH01Bzq83hWVrYLsjy_B4nCJGFA/edit?usp=sharing

      (4) Figure 4e, f: The authors use a PCA analysis to achieve an unbiased ranking of genes based on their m6A level changes. From the present text and figures, it is unclear how this PCA was performed. Besides a description in the methods sections, the authors could show additional evidence that the PCA results in a meaningful clustering and that PC1 indeed captures induced/reduced m6A level changes for high/low-PC1 genes.

      We have added passages to the text, hoping to clarify the analysis approach.
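
      Since the response does not spell out the PCA procedure, the following is only a generic sketch of how PC1 could rank genes by the direction of their m6A changes. The centering choices, the sign convention, and the synthetic data are all assumptions and may differ from what was done in the manuscript.

```python
import numpy as np


def pc1_scores(levels):
    """PCA on a genes x conditions matrix of m6A levels.
    Returns per-gene PC1 scores and the PC1 loading over conditions, oriented
    so that a positive score means rising levels along the condition axis."""
    centered = levels - levels.mean(axis=1, keepdims=True)      # drop each gene's mean level
    centered = centered - centered.mean(axis=0, keepdims=True)  # column-center for PCA
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loading = vt[0]
    scores = centered @ loading
    if loading[-1] < loading[0]:  # resolve the arbitrary sign of the component
        loading, scores = -loading, -scores
    return scores, loading


rng = np.random.default_rng(2)
time = np.array([0.0, 1.0, 2.0, 3.0])            # hypothetical time points
slopes = np.r_[np.full(50, 0.5), np.full(50, -0.5)]  # 50 rising, 50 falling genes
levels = (rng.normal(size=(100, 1))              # gene-specific baseline
          + slopes[:, None] * time[None, :]
          + 0.05 * rng.normal(size=(100, 4)))    # measurement noise

scores, loading = pc1_scores(levels)
ranking = np.argsort(scores)[::-1]  # genes ordered from most induced to most reduced
print(loading, ranking[:5])
```

      Checking that the PC1 loading changes monotonically across conditions is one simple way to confirm that high and low PC1 scores correspond to induced and reduced m6A levels.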

      (5) In Figure 4i, I was surprised about the m6A dynamics for the HSF1-independent genes, with two clusters of increasing or decreasing m6A levels across the time course. Can the model explain these changes? Since expression does not seem to be systematically altered, are there differences in subcellular localization between the two clusters after heat shock?

      A general aspect of our manuscript is attributing changes in m6A levels during heat stress to alterations in mRNA metabolism, such as production or export. As shown in Supplementary Figure 4d, even in WT conditions, m6A level changes are not strictly associated with apparent changes in expression, but we try to show that these are a reflection of the decreased export rate. In the specific context of HSF1-dependent stress response genes, we observe a clear co-occurrence of increased m6A levels with increased expression levels, which we propose to be attributed to enhanced production rates during heat stress. This suggests that transcriptional induction can drive the apparent rise in m6A levels. We try to control for this with the HSF1 KO cells, in which the m6A level changes, as well as the increased production rates, are absent for the specific cluster of stress-induced genes, further supporting the role of transcriptional activation in shaping m6A levels for these genes. For HSF1-independent genes, the HSF1 KO cells mirror the behavior of WT conditions when looking at the 500 highest- and lowest-PC1 genes (based on the prior analysis in WT cells), suggesting that changes in m6A levels are primarily driven by altered export rates rather than changes in production.

      Among the HSF1 targets, Hspa1a seems to show an inverse behaviour, with the highest methylation in ctrl, even though expression strongly goes up after heat shock. Is this related to the subcellular localization of this particular transcript before and after heat shock?

      Upon reviewing the heat stress target genes, we identified an issue with the proper labeling of the gene symbols, which has now been corrected (Figure 4 panel i). The inverse behavior observed for Hspb1 and partially for Hsp90aa1 is not accounted for by the m6ADyn model, and is indeed an interesting exception with respect to all other induced genes. Further investigation will be required to understand the methylation dynamics of Hspb1 during the response to heat stress.

      Reviewer #3 (Recommendations for the authors):

      Page 4. Indicate reference for "a more recent study finding reduced m6A levels in chromatin-associated RNA.".

      We thank the reviewer for this point and have added two publications, one of them very recent, both showing that chromatin-associated nascent RNA has less m6A methylation.

      The manuscript is perhaps a bit too long. It took me a long time to get to the end. The findings can be clearly presented in a more concise manner and that will ensure that anyone starting to read will finish it. This is not a weakness, but a hope that the authors can reduce the text.

      We have respectfully chosen to maintain the length of the manuscript. The model, its predictions and their relationship to experimental observations are somewhat complex, and we felt that further reduction of the text would come at the expense of clarity.

    1. eLife Assessment

      This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors provide convincing evidence showing that the obtained field of view is significantly increased with corrected, versus uncorrected microendoscopes. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the useable field of view of two long (>6mm) thin (500 um diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (esp. Antonini et al, 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.

      Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in piriform cortex, which is difficult to access, especially in chronic preparations.

      The design, characterization, and simulations are clear and thorough, but they do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes to their own research.

      Strengths:

      The text is clearly written, the ex vivo analysis is thorough and well supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.

      The authors did a good job of addressing issues I raised in initial review, including analyses of chromaticity and the axial field of view, descriptions of manufacturing and assembly yield, explanations in the text of differences between ex vivo and in vivo imaging conditions, and basic analysis of the in vivo recordings relative to odor presentations. They have also shortened the text, reduced repetition, and better motivated their approach in the introduction.

      Weaknesses:

      As discussed in review and nicely simulated by the authors, the large figure error indicated by profilometry (~10 um in some cases on average) is inconsistent with the optical performance improvements observed, suggesting that those measurements are inaccurate. I see no reason to include these inaccurate measurements.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.

      This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions such as the most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.

      Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.

      The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory bulb, will also enable deep imaging in larger mammals such as rats or marmosets.

      Comments on revisions:

      The authors have clearly addressed all my comments.

    4. Reviewer #3 (Public review):

      Summary:

      This work presents the development, characterization and use of new thin microendoscopes (500µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two microendoscopes of different lengths (6.4mm and 8.8mm) have been developed, allowing imaging of neuronal activity in brain regions >4mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and can therefore be easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.

      Strengths:

      (1) The paper is generally clear and well written. The scientific approach is well structured and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected microendoscopes:

      - PSFs measured with corrected microendoscopes 75µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected microendoscopes.

      - Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected microendoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.

      - Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.

      (2) There is a strong need for high quality microendoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient and potentially easy to disseminate within the neuroscience community.

      Weaknesses:

      Weaknesses that were present in the first version of the paper were carefully addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study builds on previous work by the authors by presenting a potentially key method for correcting optical aberrations in GRIN lens-based microendoscopes used for imaging deep brain regions. By combining simulations and experiments, the authors show that the obtained field of view is significantly increased with corrected, versus uncorrected microendoscopes. The evidence supporting the claims of the authors is solid, although some aspects of the manuscript should be clarified and missing information provided. Because the approach described in this paper does not require any microscope or software modifications, it can be readily adopted by neuroscientists who wish to image neuronal activity deep in the brain.

      We thank the Referees for their interest in the paper and for the constructive feedback. We have taken the time necessary to address all of their comments, acquiring new data and performing additional analyses. With the inclusion of these new results, we modified four main figures (Figures 1, 6, 7, and 8), added three new Supplementary Figures (Supplementary Figures 1, 2, and 3), and significantly edited the text. Based on the additional work suggested by the Referees, we believe that we have improved our manuscript, provided missing information, and clarified some aspects of the manuscript, which the Referees pointed our attention to.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Referee’s comment: Sattin, Nardin, and colleagues designed and evaluated corrective microlenses that increase the useable field of view of two long (>6mm) thin (500 um diameter) GRIN lenses used in deep-tissue two-photon imaging. This paper closely follows the thread of earlier work from the same group (e.g. Antonini et al, 2020; eLife), filling out the quiver of available extended-field-of-view 2P endoscopes with these longer lenses. The lenses are made by a molding process that appears practical and easy to adopt with conventional two-photon microscopes.

      Simulations are used to motivate the benefits of extended field of view, demonstrating that more cells can be recorded, with less mixing of signals in extracted traces, when recorded with higher optical resolution. In vivo tests were performed in the piriform cortex, which is difficult to access, especially in chronic preparations.

      The design, characterization, and simulations are clear and thorough, but not exhaustive (see below), and do not break new ground in optical design or biological application. However, the approach shows much promise, including for applications not mentioned in the present text such as miniaturized GRIN-based microscopes. Readers will largely be interested in this work for practical reasons: to apply the authors' corrected endoscopes.

      Strengths:

      The text is clearly written, the ex vivo analysis is thorough and well-supported, and the figures are clear. The authors achieved their aims, as evidenced by the images presented, and were able to make measurements from large numbers of cells simultaneously in vivo in a difficult preparation.

      Weaknesses:

      Referee’s comment: (1) The novelty of the present work over previous efforts from the same group is not well explained. What needed to be done differently to correct these longer GRIN lenses?

      We thank the Referee for the positive evaluation of our work. The optical properties of GRIN lenses depend on the geometrical and optical features of the specific GRIN lens type considered, i.e. its diameter, length, numerical aperture, pitch, and radial modulation of the refractive index. Our approach is based on the addition of a corrective optical element at the back end of the GRIN lens to compensate for aberrations that light encounters as it travels through the GRIN lens. The corrective optical element must, therefore, be specifically tailored to the specific GRIN lens type we aim to correct the aberrations of. The novelty of the present article lies in the successful execution of the ray-trace simulations and two-photon lithography fabrication of corrective optical elements necessary to achieve aberration correction in the two novel and long GRIN lens types, i.e. NEM-050-25-15-860-S-1.5p and NEM-050-23-15-860-S-2.0p (GRIN length, 6.4 mm and 8.8 mm, respectively). Our previous work (Antonini et al. eLife 2020) demonstrated aberration correction with GRIN lenses shorter than 4.1 mm. The design and fabrication of a single corrective optical element suitable to enlarge the field-of-view (FOV) in these longer GRIN lenses is not obvious, especially because longer GRIN lenses are affected by stronger aberrations. To better clarify this point, we revised the Introduction at page 5 (lines 3-10 from bottom) as follows:

      “Recently, a novel method based on 3D microprinting of polymer optics was developed to correct for GRIN aberrations by placing specifically designed aspherical corrective lenses at the back end of the GRIN lens 7. This approach is attractive because it is built-in on the GRIN lens and corrected microendoscopes are ready-to-use, requiring no change in the optical set-up. However, previous work demonstrated the feasibility of this method only for GRIN lenses of length < 4.1 mm 7, which are too short to reach the most ventral regions of the mouse brain. The applicability of this technology to longer GRIN lenses, which are affected by stronger optical aberrations 19, remained to be proven.”

      (2) Some strong motivations for the method are not presented. For example, the introduction (page 3) focuses on identifying neurons with different coding properties, but this can be done with electrophysiology (albeit with different strengths and weaknesses). Compared to electrophysiology, optical methods more clearly excel at genetic targeting, subcellular measurements, and molecular specificity; these could be mentioned.

      Thank you for the comment. We added a paragraph in the Introduction (page 3, lines 2-8), as suggested by the Reviewer:

      “High resolution 2P fluorescence imaging of the awake brain is a fundamental tool to investigate the relationship between the structure and the function of brain circuits 1. Compared to electrophysiological techniques, functional imaging in combination with genetically encoded indicators allows monitoring the activity of genetically targeted cell types, access to subcellular compartments, and tracking the dynamics of many biochemical signals in the brain (2). However, a critical limitation of multiphoton microscopy lies in its limited (< 1 mm) penetration depth in scattering biological media 3”.

      Another example, in comparing microfabricated lenses to other approaches, an unmentioned advantage is miniaturization and potential application to mini-2P microscopes, which use GRIN lenses.

      We added the concept suggested by the Reviewer in the Discussion (page 21, lines 4-7 from bottom). The text now reads:

      “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes 42-44, allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.

      (3) Some potentially useful information is lacking, leaving critical questions for potential adopters:

      How sensitive is the assembly to decenter between the corrective optic and the GRIN lens?

      Following the Referee’s comment, we conducted new optical simulations to evaluate the decrease in optical performance of the corrected endoscopes as a function of the radial shift of the corrective lens from the optical axis of the GRIN rod (decentering, new Supplementary Figure 3), using light rays passing either off- or on-axis. For off-axis rays, we found that the Strehl ratio remained above 0.8 (Maréchal criterion) for positive translations in the range 6-11.5 microns and 16-50 microns for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, while the Strehl ratio decreased below 0.8 for negative translations of amplitude ~ 5 microns. Please note that for the most marginal rays, a negative translation produces a mismatch between the corrective microlens and the GRIN lens such that the light rays no longer pass through the corrective lens. In contrast, rays passing near the optical axis were still focused by the corrected probe with Strehl ratio above 0.8 in a range of radial shifts of -40 – 40 microns for both microendoscope types. Altogether, these novel simulations suggest that a decentering of < 5 microns between the corrective microlens and the GRIN lens does not substantially affect the optical properties of the corrected endoscopes. These new results are now displayed in Supplementary Figure 3 and described on page 7 (lines 3-5 from bottom).
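
      For readers less familiar with the 0.8 threshold quoted above, the following minimal sketch relates a Strehl ratio to the RMS wavefront error using the extended Maréchal approximation, S ≈ exp(-(2πσ)²) with σ in waves. This textbook approximation and the function names are assumptions of the sketch; it is not the ray-trace procedure used for the simulations described in the response. The 920 nm wavelength is the one quoted elsewhere in the response.

```python
import math

MARECHAL_THRESHOLD = 0.8   # Strehl ratio threshold used in the response
WAVELENGTH_NM = 920.0      # simulation wavelength quoted in the response


def strehl_from_rms(rms_waves: float) -> float:
    """Extended Marechal approximation (assumed here): S ~ exp(-(2*pi*sigma)^2).

    rms_waves is the RMS wavefront error expressed in units of wavelength.
    """
    return math.exp(-(2.0 * math.pi * rms_waves) ** 2)


def rms_for_strehl(strehl: float) -> float:
    """Invert the approximation: RMS wavefront error (in waves) for a given Strehl ratio."""
    if not 0.0 < strehl <= 1.0:
        raise ValueError("Strehl ratio must lie in (0, 1]")
    return math.sqrt(-math.log(strehl)) / (2.0 * math.pi)


# RMS wavefront error tolerated at the Marechal threshold, in waves and in nm
sigma_waves = rms_for_strehl(MARECHAL_THRESHOLD)
sigma_nm = sigma_waves * WAVELENGTH_NM

if __name__ == "__main__":
    print(f"RMS error at S = {MARECHAL_THRESHOLD}: "
          f"{sigma_waves:.4f} waves ({sigma_nm:.1f} nm at {WAVELENGTH_NM:.0f} nm)")
```

      Under this approximation, a Strehl ratio of 0.8 corresponds to an RMS wavefront error of roughly one fourteenth of a wavelength, which is why it is commonly used as a practical diffraction-limited criterion.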

      What is the yield of fabrication and of assembly?

      The fabrication yield using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with a stereomicroscope and, in case of air bubble formation, they were discarded.

      The assembly yield, i.e. correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).

      We added this information in the Methods at page 29 (lines 1-12), as follows:

      “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C) 7. The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see (7)”.

      Supplementary Figure 1: Is this really a good agreement between the design and measured profile? Does the figure error (~10 um in some cases on average) noticeably degrade the image?

      As the Reviewer correctly noticed, the discrepancy between the simulated profile and the experimentally measured profile can be up to 5-10 microns at specific radial positions. This discrepancy could be due to issues with: (i) the fabrication of the microlens; (ii) the experimental measurement of the lens profile with the stylus profilometer. To discriminate among these two possibilities, we asked what would be the expected optical properties of the corrected endoscope should the corrective lens have the experimentally measured (not the simulated) profile. To this aim, we performed new optical simulations of the point spread function (PSF) of the corrected probe using, as corrective microlens profile, the average, experimentally measured, profile of a fabricated corrective lens. For both microendoscope types, we first fitted the mean experimentally measured profile of the fabricated lens with the aspherical function reported in equation (1) of the main text:

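      $$z(r) = \frac{c\,r^{2}}{1+\sqrt{1-(1+k)\,c^{2}r^{2}}} + \sum_{i=1}^{n}\beta_{i}\,r^{2i} + h$$

      where:

      - $r$ is the radial distance from the optical axis;

      - $c$ is equal to $1/R$, where $R$ is the radius of curvature;

      - $k$ is the conic constant;

      - $\beta_{1}$ − $\beta_{n}$ are asphericity coefficients;

      - $h$ is the height of the microlens profile on-axis.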

      The fitting values of the parameters of equation (1) for the two lenses are reported for the Referee’s inspection here below (variables describing distances are expressed in mm):

      Author response table 1.

      Fitting values for the parameters of Equation (1) describing the profile of corrective microlens replicas measured with the stylus profilometer. Distances are expressed in mm.
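
      As an illustration of how a profile of this kind can be evaluated from its parameters, the sketch below implements the standard even-asphere sag formula (conic term plus even-order polynomial terms plus an on-axis offset). The exact functional form, the function name, and all numerical values are assumptions chosen for illustration; they are not the fitted values of Author response table 1.

```python
import math
from typing import Sequence


def asphere_profile(r: float, c: float, k: float,
                    betas: Sequence[float], h: float) -> float:
    """Height of an even-asphere profile at radial distance r (assumed standard form).

    r     : radial distance from the optical axis (mm)
    c     : curvature, 1 / R, with R the radius of curvature (1/mm)
    k     : conic constant
    betas : asphericity coefficients multiplying r^2, r^4, ... (assumed even orders)
    h     : height of the profile on-axis (mm)
    """
    radicand = 1.0 - (1.0 + k) * c ** 2 * r ** 2
    if radicand < 0.0:
        raise ValueError("r lies outside the domain of the conic term")
    conic = c * r ** 2 / (1.0 + math.sqrt(radicand))
    poly = sum(b * r ** (2 * (i + 1)) for i, b in enumerate(betas))
    return h + conic + poly


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only
    params = dict(c=2.0, k=-1.0, betas=[0.1, -0.5], h=0.05)
    for r_mm in (0.0, 0.05, 0.1, 0.2):
        print(f"r = {r_mm:.2f} mm -> z = {asphere_profile(r_mm, **params):.5f} mm")
```

      Evaluating such a function on the radial positions sampled by a profilometer gives a direct point-by-point comparison between a designed and a measured profile.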

      We then assumed that the profiles of the corrective microlenses were equal to the mean experimentally measured profiles and used the aspherical fitting functions in the optical simulations to compute the performance of corrected microendoscopes. For both microendoscope types, we found that the Strehl ratio was lower than 0.35, well below the theoretical diffraction-limited threshold of 0.8 (Maréchal criterion) at moderate distances from the optical axis (68 μm-94 μm and 67 μm-92 μm on the focal plane in the object space, after the front end of the GRIN lens, for the 6.4 mm- and the 8.8 mm-long corrected microendoscope, respectively, Author response image 1A, C), and the PSF was strongly distorted (Author response image 1B, D).

      Author response image 1.

      Simulated optical performance of corrected probes with profiles of corrective microlenses equal to the mean experimentally measured profiles of fabricated corrective lenses. A) The Strehl ratio for the 6.4 mm-long corrected microendoscope with measured microlens profile (black dots) is computed on-axis (distance from the center of the FOV d = 0 µm) and at two radial distances off-axis (d = 68 μm and 94 μm on the focal plane in the object space) and compared to the Strehl ratio of the uncorrected (red line) and corrected (blue line) microendoscopes. B) Lateral (x,y) and axial (x,z) fluorescence intensity (F) profiles of simulated PSFs on-axis (left) and off-axis (right, at the indicated distance d computed on the focal plane in the object space) for the 6.4 mm-long corrected microendoscope with measured microlens profile. C) Same as in (A) for the 8.8 mm-long corrected microendoscope (off-axis d = 67 μm and 92 μm on the focal plane in the object space). D) Same as in (B) for the 8.8 mm-long corrected microendoscope.

      These simulated findings are in contrast with the experimentally measured optical properties of our corrected endoscopes (Figure 3). In other words, these novel simulated results show that experimentally measured profiles of the corrected lenses are incompatible with the experimental measurements of the optical properties of the corrected endoscopes. Therefore, our experimental recording of the lens profile shown in Supplementary Figure 1 of the first submission (now Supplementary Figure 4) should be used only as a coarse measure of the lens shape and cannot be used to precisely compare simulated lens profiles with measured lens profiles.

      How do individual radial profiles compare to the presented means?

      We provide below a modified version of Supplementary Figure 4 (Supplementary Figure 1 in the first submission), where individual profiles measured with the stylus profilometer and the mean profile are displayed for both microendoscope types (Author response image 2). In the manuscript (Supplementary Figure 4), we suggest continuing to show mean profiles ± standard errors of the mean, as we did in the original submission.

      Author response image 2.

      Characterization of polymeric corrective lens replicas. A) Stylus profilometer measurements were performed along the radius of the corrective polymer microlens replica for the 6.4 mm-long corrected microendoscope. Individual measured profiles (grey solid lines) obtained from n = 3 profile measurements on m = 3 different corrective lens replicas, plus the mean profile (black solid line) are displayed. B) Same as (A) for the 8.8 mm-long microendoscope.

      What is the practical effect of the strong field curvature? Are the edges of the field, which come very close to the lens surface, a practical limitation?

      A first practical effect of the field curvature is that structures at different z coordinates are sampled. The observed field curvature of corrected endoscopes may therefore impact imaging in brain regions characterized by strongly axially organized anatomy (e.g., the pyramidal layer of the hippocampus), but would not significantly affect imaging in regions with homogeneous cell density within the axial extension of the field curvature (< 170 µm, see more details below). A second consequence of the field curvature, as the Referee correctly points out, is that cells at the border of the FOV are closer to the front end of the GRIN lens. In measurements of subresolved fluorescent layers (Figure 3A-D), we observed that the field curvature extends in the axial direction to ~ 110 μm and ~ 170 μm for the 6.4 mm- and the 8.8 mm-long microendoscopes, respectively. Considering that the nominal working distances on the object side of the 6.4 mm- and the 8.8 mm-long microendoscopes were, respectively, 210 μm and 178 μm (Table 3), structures positioned at the very edge of the FOV were ~ 100 μm and ~ 8 μm away from the GRIN front end for the 6.4 mm-long and for the 8.8 mm-long probe, respectively. Previous studies have shown that brain tissue within 50-100 μm from the GRIN front end may show signs of tissue reaction to the implant (Curreli et al. PLOS Biology 2022, Attardo et al. Nature 2015). Therefore, structures at the very edge of the FOV of the 8.8 mm-long endoscopes, but not those at the edge of the 6.4 mm-long endoscopes, may be within the volume showing tissue reaction. We added a paragraph in the text to discuss these points (page 18 lines 10-14).
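
      The edge-of-field distances quoted above follow from subtracting the axial extent of the field curvature from the nominal working distance. The short sketch below reproduces that arithmetic with the values given in the response; the dictionary layout, the function name, and the use of 100 μm as the outer bound of the tissue-reaction zone are illustrative assumptions.

```python
# Values quoted in the response (all in micrometres)
PROBES = {
    "6.4 mm": {"working_distance_um": 210, "field_curvature_um": 110},
    "8.8 mm": {"working_distance_um": 178, "field_curvature_um": 170},
}

# Assumed outer bound of the zone in which tissue reaction has been reported (50-100 um)
REACTION_ZONE_OUTER_UM = 100


def edge_clearance_um(working_distance_um: float, field_curvature_um: float) -> float:
    """Distance between the GRIN front end and structures imaged at the FOV edge."""
    return working_distance_um - field_curvature_um


report = {}
for name, probe in PROBES.items():
    clearance = edge_clearance_um(probe["working_distance_um"],
                                  probe["field_curvature_um"])
    report[name] = {
        "edge_clearance_um": clearance,
        # Strictly closer to the front end than the outer bound of the reaction zone
        "within_reaction_zone": clearance < REACTION_ZONE_OUTER_UM,
    }

if __name__ == "__main__":
    for name, entry in report.items():
        print(name, entry)
```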

      The lenses appear to be corrected for monochromatic light; high-performance microscopes are generally achromatic. Is the bandwidth of two-photon excitation sufficient to warrant optimization over multiple wavelengths?

      Thanks for this comment. All optical simulations described in the first submission were performed at a fixed wavelength (λ = 920 nm). Following the Referee’s request, we explored the effect of changing wavelength on the Strehl ratio using new optical simulations. We found that the Strehl ratio remains > 0.8 at least within ± 10 nm from λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, on a much wider wavelength range (800 - 1040 nm), high Strehl ratio is obtained, but at different z planes (new Supplementary Figure 1A-D, right panels). This means that the corrective lens is working as expected also for wavelengths which are different from 920 nm, with different wavelengths having the most enlarged FOV located at different working distances. These new results are now described on page 7 (lines 8-10).

      GRIN lenses are often used to access a 3D volume by scanning in z (including in this study). How does the corrective lens affect imaging performance over the 3D field of view?

      The optical simulations we did to design the corrective lenses were performed maximizing aberration correction only in the focal plane of the endoscope. Following the Referee’s comment, we explored the effect of aberration correction outside the focal plane using new optical simulations. In corrected endoscopes, we found that for off-axis rays (radial distance from the optical axis > 40 μm) the Strehl ratio was > 0.8 (Maréchal criterion) in a larger volume compared to uncorrected endoscopes (new Supplementary Figure 2), demonstrating that the aberration correction method developed in this study does extend beyond the focal plane for short distances. For example, at a radial distance of ~ 90 μm from the optical axis, the axial range in which the Strehl ratio was > 0.8 in corrected endoscopes was 28 μm and 19 μm for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. These new results are now described on page 7 (lines 10-19).

      (4) The in vivo images (Figure 7D) have a less impressive resolution and field than the ex vivo images (Figure 4B), and the reason for this is not clear. Given the difference in performance, how does this compare to an uncorrected endoscope in the same preparation? Is the reduced performance related to uncorrected motion, field curvature, working distance, etc?

      In comparing images in Figure 4B with images shown in Figure 7D, the following points should be considered:

      (1) Figure 4B is a maximum fluorescence intensity projection of multiple axial planes of a z-stack acquired through a thin brain slice (slice thickness: 50 µm) using 8 frame averages for each plane. In contrast, images in Figure 7D are median projections of a t-series acquired on a single plane in the awake mouse at 30 Hz resonant scanning imaging (8 min, 14,400 frames).

      (2) Images of the fixed brain slice in Figure 4B were acquired at 1024 pixels x 1024 pixels resolution, nominal pixel size 0.45 µm/pixel, and with objective NA = 0.50, whereas in vivo images in Figure 7D were acquired at 512 pixels x 512 pixels resolution, nominal pixel size 0.72 - 0.84 µm/pixel, and with objective NA = 0.45.

      (3) In the in vivo preparation (Figure 7D), excitation and emission light travel through > 180 µm of scattering and absorbing brain tissue, reducing spatial resolution and the SNR of the collected fluorescence signal.

      (4) By shifting the sample in the x, y plane, in Figure 4B we could choose a FOV containing homogeneously stained cells. x, y shifting and selecting across multiple FOVs was not possible in vivo, as the GRIN lens was cemented on the animal skull.

      (5) Images in Figure 7D were motion corrected, but we cannot exclude that part of the decrease in resolution observed in Figure 7D when compared to images in Figure 4B is due to incomplete correction of motion artifacts.

      For all the reasons listed above, we believe that lower resolution and contrast are to be expected in images recorded in vivo (Figure 7D) compared to images acquired in fixed tissue (Figure 4B).
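
      To make the difference between the two kinds of projection in point (1) concrete, the sketch below computes a maximum-intensity projection (as used across z planes for Figure 4B) and a median projection (as used across time for Figure 7D) on a tiny toy stack, and checks the frame count implied by 8 min of imaging at 30 Hz. The helper names and the toy pixel values are assumptions for illustration; real data would normally be handled with an array library.

```python
from statistics import median
from typing import Callable, List

Frame = List[List[float]]  # rows of pixel values


def n_frames(duration_min: float, frame_rate_hz: float) -> int:
    """Number of frames acquired in duration_min minutes at frame_rate_hz."""
    return int(round(duration_min * 60 * frame_rate_hz))


def project(stack: List[Frame], reducer: Callable) -> Frame:
    """Collapse a stack of frames into one image, pixel by pixel, using reducer."""
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [[reducer([frame[y][x] for frame in stack]) for x in range(n_cols)]
            for y in range(n_rows)]


def max_projection(stack: List[Frame]) -> Frame:
    return project(stack, max)


def median_projection(stack: List[Frame]) -> Frame:
    return project(stack, median)


# Toy 3-frame, 2 x 2 pixel stack; the value 100 mimics a single bright transient
toy_stack = [
    [[1, 5], [2, 0]],
    [[3, 4], [2, 9]],
    [[2, 100], [2, 1]],
]

if __name__ == "__main__":
    print("frames in 8 min at 30 Hz:", n_frames(8, 30))
    print("max projection:   ", max_projection(toy_stack))
    print("median projection:", median_projection(toy_stack))
```

      In the toy stack, the single bright value dominates the maximum projection but leaves the median projection almost unchanged, which is one reason the two kinds of image are not directly comparable in contrast.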

      Regarding the question of how images from uncorrected and corrected endoscopes compare in vivo, we think that this comparison is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in in vivo recordings (Figure 7). In fact, in the brain of living mice, motion artifacts, changes in fluorophore expression level, and variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors. Moreover, the major advantage of quantifying how the optical properties of uncorrected and corrected endoscopes impact the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and what their electrical activity is). This is clearly not possible in the in vivo recordings.

      Regarding Figure 7, there is no analysis of the biological significance of the calcium signals or even a description of where olfactory stimuli were presented.

      We appreciate the Reviewer pointing out the lack of detailed analysis regarding the biological significance of the calcium signals and the presentation of olfactory stimuli in Figure 7. Our initial focus was on demonstrating the effectiveness of the optimized GRIN lenses for imaging deep brain areas like the piriform cortex, with an emphasis on the improved signal-to-noise ratio (SNR) these lenses provide. However, we agree that including more context about the experimental conditions would enhance the manuscript. To address this point, we added a new panel (Figure 7F) showing calcium transients aligned with the onset of olfactory stimulus presentations, which are now indicated by shaded light blue areas. Additionally, we have specified the timing of each stimulus presented in Figure 7E. This revision allows readers to better understand the relationship between the calcium signals and the olfactory stimuli.

      The timescale of jGCaMP8f signals in Figure 7E is uncharacteristically slow for this indicator (compared to Zhang et al 2023 (Nature)), though perhaps this is related to the physiology of these cells or the stimuli.

      Regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a mislabeling we inserted in the original manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals.

      (5) The claim of unprecedented spatial resolution across the FOV (page 18) is hard to evaluate and is not supported by references to quantitative comparisons. The promises of the method for future studies (pages 18-19) could also be better supported by analysis or experiment, but these are minor and to me, do not detract from the appeal of the work.

      GRIN lens-based imaging of piriform cortex in the awake mouse had already been done in Wang et al., Neuron 2020. The GRIN lens used in that work was NEM-050-50-00920-S-1.5p (GRINTECH, length: 6.4 mm; diameter: 0.5 mm), similar to the one that we used to design the 6.4 mm-long corrected microendoscope. Here we used a microendoscope specifically designed to correct off-axis aberrations and enlarge the FOV, in order to maximize the number of neurons recorded with the highest possible spatial resolution, while keeping tissue invasiveness to a minimum. Following the Referee’s comments, we revised the sentence at page 19 (lines 68 from bottom) as follows:

      “We used long corrected microendoscopes to measure population dynamics in the olfactory cortex of awake head-restrained mice with unprecedented combination of high spatial resolution across the FOV and minimal invasiveness(17)”.

      (6) The text is lengthy and the material is repeated, especially between the introduction and conclusion. Consolidating introductory material to the introduction would avoid diluting interesting points in the discussion.

      We thank the Reviewer for this comment. As suggested, we edited the Introduction and shortened the Discussion.

      Reviewer #2 (Public review):

      In this manuscript, the authors present an approach to correct GRIN lens aberrations, which primarily cause a decrease in signal-to-noise ratio (SNR), particularly in the lateral regions of the field-of-view (FOV), thereby limiting the usable FOV. The authors propose to mitigate these aberrations by designing and fabricating aspherical corrective lenses using ray trace simulations and two-photon lithography, respectively; the corrective lenses are then mounted on the back aperture of the GRIN lens.

      This approach was previously demonstrated by the same lab for GRIN lenses shorter than 4.1 mm (Antonini et al., eLife, 2020). In the current work, the authors extend their method to a new class of GRIN lenses with lengths exceeding 6 mm, enabling access to deeper brain regions such as the most ventral regions of the mouse brain. Specifically, they designed and characterized corrective lenses for GRIN lenses measuring 6.4 mm and 8.8 mm in length. Finally, they applied these corrected long micro-endoscopes to perform high-precision calcium signal recordings in the olfactory cortex.

      Compared with alternative approaches using adaptive optics, the main strength of this method is that it does not require hardware or software modifications, nor does it limit the system's temporal resolution. The manuscript is well-written, the data are clearly presented, and the experiments convincingly demonstrate the advantages of the corrective lenses.

      The implementation of these long corrected micro-endoscopes, demonstrated here for deep imaging in the mouse olfactory bulb, will also enable deep imaging in larger mammals such as rats or marmosets.

      We thank the Referee for the positive comments on our study. We address the points indicated by the Referee in the “Recommendation to the authors” section below.

      Reviewer #3 (Public review):

      Summary:

      This work presents the development, characterization, and use of new thin microendoscopes (500µm diameter) whose accessible field of view has been extended by the addition of a corrective optical element glued to the entrance face. Two micro endoscopes of different lengths (6.4mm and 8.8mm) have been developed, allowing imaging of neuronal activity in brain regions >4mm deep. An alternative solution to increase the field of view could be to add an adaptive optics loop to the microscope to correct the aberrations of the GRIN lens. The solution presented in this paper does not require any modification of the optical microscope and can therefore be easily accessible to any neuroscience laboratory performing optical imaging of neuronal activity.

      Strengths:

      (1) The paper is generally clear and well-written. The scientific approach is well structured and numerous experiments and simulations are presented to evaluate the performance of corrected microendoscopes. In particular, we can highlight several consistent and convincing pieces of evidence for the improved performance of corrected micro endoscopes:

      a) PSFs measured with corrected micro endoscopes 75µm from the centre of the FOV show a significant reduction in optical aberrations compared to PSFs measured with uncorrected micro endoscopes.

      b) Morphological imaging of fixed brain slices shows that optical resolution is maintained over a larger field of view with corrected micro endoscopes compared to uncorrected ones, allowing neuronal processes to be revealed even close to the edge of the FOV.

      c) Using synthetic calcium data, the authors showed that the signals obtained with the corrected microendoscopes have a significantly stronger correlation with the ground truth signals than those obtained with uncorrected microendoscopes.

      (2) There is a strong need for high-quality micro endoscopes to image deep brain regions in vivo. The solution proposed by the authors is simple, efficient, and potentially easy to disseminate within the neuroscience community.

      Weaknesses:

      (1) Many points need to be clarified/discussed. Here are a few examples:

      a) It is written in the methods: “The uncorrected microendoscopes were assembled either using different optical elements compared to the corrected ones or were obtained from the corrected probes after the mechanical removal of the corrective lens.”

      This is not very clear: the uncorrected microendoscopes are not simply the unmodified GRIN lenses?

      We apologize for not being clear enough on this point. Uncorrected microendoscopes are not simply unmodified GRIN lenses; rather, they are GRIN lenses attached to a round glass coverslip (thickness: 100 μm). The glass coverslip was included in the ray-trace optical simulations of the uncorrected system, and this is the reason why commercial GRIN lenses and the corresponding uncorrected microendoscopes have different working distances, as reported in Tables 2-3. To make the text clearer, we added the following sentence at page 27 (last 4 lines):

      “To evaluate the impact of corrective microlenses on the optical performance of GRIN-based microendoscopes, we also simulated uncorrected microendoscopes composed of the same optical elements of corrected probes (glass coverslip and GRIN rod), but in the absence of the corrective microlens”.

      b) In the results of the simulation of neuronal activity (Figure 5A, for example), the neurons in the center of the FOV have a very large diameter (of about 30µm). This should be discussed.

      Thanks for this comment. In synthetic calcium imaging t-series, cell radii were randomly sampled from a Gaussian distribution with mean = 10 µm and standard deviation (SD) = 3 µm. Both values were estimated from the literature (ref. no. 28: Suzuki & Bekkers, Journal of Neuroscience, 2011) as described in the Methods (page 35). In the image shown in Figure 5A, neurons near the center of the FOV have a radius of ~ 20 µm, corresponding to the right tail of the distribution (mean + 3SD = 19 µm). It is also important to note that, for corrected microendoscopes, neurons in the central portion of the FOV appear larger than cells located near the edges of the FOV, because the magnification depends on the distance from the optical axis (see Figure 3E, F) and near the center the magnification is > 1 for both microendoscope types.
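
      As a minimal sketch of the sampling step described above (not the simulation code used in the study), the radii can be drawn with NumPy. The mean and SD are the values quoted in the response; the function name, sample size and seed are assumptions made for illustration.

```python
import numpy as np

# Values quoted in the response: mean radius 10 µm, SD 3 µm.
MEAN_RADIUS_UM = 10.0
SD_RADIUS_UM = 3.0


def sample_cell_radii(n_cells, seed=0):
    """Draw cell radii (µm) from a Gaussian distribution (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.normal(MEAN_RADIUS_UM, SD_RADIUS_UM, size=n_cells)


# Sample size and seed are arbitrary choices for this illustration.
radii = sample_cell_radii(100_000)

# Right-tail reference used in the response: mean + 3 SD.
upper_tail_um = MEAN_RADIUS_UM + 3 * SD_RADIUS_UM

# Fraction of sampled cells whose radius exceeds mean + 3 SD.
fraction_above = float(np.mean(radii > upper_tail_um))
```

      Under these assumptions only a small fraction of sampled cells falls beyond mean + 3SD, which is consistent with large somata being rare in any given simulated FOV.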

      Also, why is the optical resolution so low on these images?

      Images shown in Figure 5 are median fluorescence intensity projections of 5-minute-long simulated t-series. Simulated calcium data were generated with pixel size 0.8 μm/pixel and frame rate 30 Hz, similarly to in vivo recordings. In the simulations, pixels not belonging to any cell soma were assigned a value of background fluorescence randomly sampled from a normal distribution with mean and standard deviation estimated from experimental data, as described in the Methods section (page 37). To simulate activity, the mean spiking rate of neurons was set to 0.3 Hz, thus in a large fraction of frames neurons do not show calcium transients. Therefore, the median fluorescence intensity value of somata will be close to their baseline fluorescence value (F0). Since in simulations F0 values (~ 45-80 a.u.) were not much higher than the background fluorescence level (~ 45 a.u.), this may generate the appearance of a low-contrast image in Figure 5A. Finally, we suspect that PDF rendering also contributed to degrading the quality of those images. We will now submit high-resolution images alongside the PDF file.
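
      The argument that a median projection of sparsely active cells stays close to baseline can be illustrated with a short simulation. This is a sketch under stated assumptions, not the generative model used in the study: the frame rate, duration, spiking rate and background level come from the response, whereas the baseline value, the exponential kernel, its decay constant and the per-spike amplitude are assumed here.

```python
import numpy as np

FRAME_RATE_HZ = 30.0   # from the response
DURATION_S = 300.0     # 5-minute t-series, from the response
SPIKE_RATE_HZ = 0.3    # mean spiking rate, from the response
BACKGROUND = 45.0      # approximate background level (a.u.), from the response
F0 = 60.0              # assumed baseline, inside the quoted 45-80 a.u. range
DECAY_TAU_S = 0.5      # assumed decay time constant of the indicator
DFF_PER_SPIKE = 1.0    # assumed peak dF/F0 produced by one spike

rng = np.random.default_rng(0)
n_frames = int(FRAME_RATE_HZ * DURATION_S)

# Poisson spike counts per frame at the given mean rate.
spikes = rng.poisson(SPIKE_RATE_HZ / FRAME_RATE_HZ, size=n_frames)

# Exponential kernel truncated at 5 time constants (assumed transient shape).
t = np.arange(0.0, 5 * DECAY_TAU_S, 1.0 / FRAME_RATE_HZ)
kernel = np.exp(-t / DECAY_TAU_S)

# Noise-free fluorescence trace of a single soma.
dff = DFF_PER_SPIKE * np.convolve(spikes, kernel)[:n_frames]
trace = F0 * (1.0 + dff)

median_f = float(np.median(trace))   # value shown in a median projection
mean_f = float(np.mean(trace))       # pulled upward by the transients
contrast_vs_background = median_f / BACKGROUND
```

      With these assumed parameters the median stays within a few percent of F0 while the mean is noticeably higher, so a median projection mostly reflects baseline fluorescence rather than activity.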

      c) It seems that we can't see the same neurons on the left and right panels of Figure 5D. This should be discussed.

      The Referee is correct. When we intersected the simulated 3D volume of ground truth neurons with the focal surface of microendoscopes, the center of the FOV for the 8.8 mm-long corrected microendoscope was located at a larger depth than the FOV of the 8.8 mm uncorrected microendoscope. This effect was due to the larger field curvature of corrected 8.8 mm-long endoscopes compared to 8.8 mm-long uncorrected endoscopes. This is the reason why different neurons were displayed for uncorrected and corrected endoscopes in Figure 5D. We added this explanation in the text at page 37 (lines 1-4). The text reads:

      “Due to the stronger field curvature of the 8.8 mm-long corrected microendoscope (Figure 1C) compared to 8.8 mm-long uncorrected microendoscopes, the center of the corrected imaging focal surface resulted at a larger depth in the simulated volume compared to the center of the uncorrected focal surface(s). Therefore, different simulated neurons were sampled in the two cases”.

      d) It is not very clear to me why in Figure 6A, F the fraction of adjacent cell pairs that are more correlated than expected increases as a function of the threshold on peak SNR. The authors showed in Supplementary Figure 3B that the mean purity index increases as a function of the threshold on peak SNR for all micro endoscopes. Therefore, I would have expected the correlation between adjacent cells to decrease as a function of the threshold on peak SNR. Similarly, the mean purity index for the corrected short microendoscope is close to 1 for high thresholds on peak SNR: therefore, I would have expected the fraction of adjacent cell pairs that are more correlated than expected to be close to 0 under these conditions. It would be interesting to clarify these points.

      Thanks for raising this point. We defined the fraction of adjacent cell pairs more correlated than expected as the number of adjacent cell pairs more correlated than expected divided by the number of adjacent cell pairs. The reason why this fraction rises as a function of the SNR threshold is shown in Supplementary Figure 2 in the first submission (now Supplementary Figure 5). There, we separately plotted the number of adjacent cell pairs more correlated than expected (numerator) and the number of adjacent cell pairs (denominator) as a function of the SNR threshold. For both microendoscope types, we observed that the denominator decreased more rapidly with the peak SNR threshold than the numerator. Therefore, the fraction of adjacent cell pairs more correlated than expected increases with the peak SNR threshold.

      To understand why the denominator decreases with SNR threshold, it should be considered that, due to the deterioration of spatial resolution and attenuation of fluorescent signal collection as a function of the radial distance from the optical axis (see for example fluorescent film profiles in Figure 3A, C), increasing the threshold on the peak SNR of extracted calcium traces implies limiting cell detection to those cells located within smaller distance from the center of the FOV. This information is shown in Figure 5C, F.

      In the manuscript text, this point is discussed at page 12 (lines 1-3 from bottom) and page 13 (lines 1-4):

      “The fraction of pairs of adjacent cells (out of the total number of adjacent pairs) whose activity correlated significantly more than expected increased as a function of the SNR threshold for corrected and uncorrected microendoscopes of both lengths (Fig. 6A, F). This effect was due to a larger decrease of the total number of pairs of adjacent cells as a function of the SNR threshold compared to the decrease in the number of pairs of adjacent cells whose activity was more correlated than expected (Supplementary Figure 5)”.
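
      The arithmetic behind this effect can be sketched with a toy example. All pair counts and thresholds below are hypothetical, chosen only to show that a ratio can rise when both of its terms fall; they are not values from the study.

```python
def correlated_fraction(n_correlated_pairs, n_adjacent_pairs):
    """Fraction of adjacent pairs that are more correlated than expected."""
    return n_correlated_pairs / n_adjacent_pairs


# Hypothetical peak-SNR thresholds and pair counts (illustration only).
snr_thresholds = [10, 20, 30]
n_correlated = [40, 30, 24]     # numerator: decreases slowly
n_adjacent = [400, 200, 120]    # denominator: decreases quickly

fractions = [
    correlated_fraction(num, den) for num, den in zip(n_correlated, n_adjacent)
]
```

      In this toy case both counts decrease with the threshold, yet the fraction increases because the denominator shrinks faster than the numerator.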

      e) Figures 6C, H: I think it would be fairer to compare the uncorrected and corrected endomicroscopes using the same effective FOV.

      To address the Reviewer’s concern, we repeated the linear regression of purity index as a function of the radial distance using the same range of radial distances for the uncorrected and corrected case of both microendoscope types. Below, we provide an updated version of Figure 6C, H for the Referee’s perusal. Please note that the maximum value displayed on the x-axis of both graphs now corresponds to the smaller of the two maximum radial distance values obtained in the uncorrected and corrected case (maximum radial distance displayed: 151.6 µm and 142.1 μm for the 6.4 mm- and the 8.8 mm-long GRIN rod, respectively). Using the same effective FOV, we found that the purity index drops significantly more rapidly with the radial distance for uncorrected microendoscopes compared to the corrected ones, similarly to what was observed in the original version of Figure 6. The values of the linear regression parameters and the statistical significance of the difference between the slopes in the uncorrected and corrected cases are stated in the Author response image 3 caption below for both microendoscope types. In the manuscript, we suggest that we keep showing data corresponding to all detected cells, as we did in the original submission.

      Author response image 3.

      Linear regression of purity index as a function of the radial distance. A) Purity index of extracted traces with peak SNR > 10 was estimated using a GLM of ground truth source contributions and plotted as a function of the radial distance of cell identities from the center of the FOV for n = 13 simulated experiments with the 6.4 mm-long uncorrected (red) and corrected (blue) microendoscope. Black lines represent the linear regression of data ± 95% confidence intervals (shaded colored areas). Maximum value of radial distance displayed: 151.6 µm. Slopes ± standard error (s.e.): uncorrected, (-0.0015 ± 0.0002) µm⁻¹; corrected, (-0.0006 ± 0.0001) µm⁻¹. Uncorrected, n = 991; corrected, n = 1156. Statistical comparison of slopes, p < 10⁻¹⁰, permutation test. B) Same as (A) for n = 15 simulated experiments with the 8.8 mm-long uncorrected and corrected microendoscope. Maximum value of radial distance displayed: 142.1 µm. Slopes ± s.e.: uncorrected, (-0.0014 ± 0.0003) µm⁻¹; corrected, (-0.0010 ± 0.0002) µm⁻¹. Uncorrected, n = 718; corrected, n = 1328. Statistical comparison of slopes, p = 0.0082, permutation test.
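
      One way to compare two regression slopes with a permutation test is sketched below. The response does not describe the exact permutation scheme, so the label-shuffling procedure, the function names and the synthetic data are assumptions; only the two slope values used to generate the synthetic data are borrowed from panel A of the caption.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y versus x."""
    return np.polyfit(x, y, 1)[0]


def permutation_test_slopes(x1, y1, x2, y2, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference between two slopes.

    Assumed scheme: pool all (x, y) points, shuffle the group labels,
    and recompute the slope difference for each shuffle.
    """
    rng = np.random.default_rng(seed)
    observed = slope(x1, y1) - slope(x2, y2)
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    n1 = len(x1)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(x))
        a, b = idx[:n1], idx[n1:]
        diff = slope(x[a], y[a]) - slope(x[b], y[b])
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)


# Synthetic purity-versus-distance data (illustration only, not study data).
rng = np.random.default_rng(1)
r_unc = rng.uniform(0, 150, 500)   # radial distance, µm
r_cor = rng.uniform(0, 150, 500)
purity_unc = 1.0 - 0.0015 * r_unc + rng.normal(0, 0.05, 500)
purity_cor = 1.0 - 0.0006 * r_cor + rng.normal(0, 0.05, 500)

observed_diff, p_value = permutation_test_slopes(
    r_unc, purity_unc, r_cor, purity_cor
)
```

      With n_perm shuffles the smallest p-value this sketch can report is 1/(n_perm + 1), so resolving very small p-values would require many more permutations or a different estimator.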

      f) Figure 7E: Many calcium transients have a strange shape, with a very fast decay following a plateau or a slower decay. Is this the result of motion artefacts or analysis artefacts?

      Thank you for raising this point about the unusual shapes of the calcium transients in Figure 7E. The observed rapid decay following a plateau or a slower decay is indeed a result of how the data were presented in the original submission. Our experimental protocol consisted of 22 s-long trials with an inter-trial interval of 10 s (see Methods section, page 44). In the original figure, data from multiple trials were concatenated, which led to artefactual time courses and apparent discontinuities in the calcium signals. To resolve this issue, we revised Figure 7E to accurately represent individual concatenated trials. We also added a new panel (please see new Figure 7F) showing examples of single cell calcium responses in individual trials without concatenation, with annotations indicating the timing and identity of presented olfactory stimuli.

      Also, the duration of many calcium transients seems to be long (several seconds) for GCaMP8f. These points should be discussed.

      Author response: regarding the timescale of the calcium signals observed in Figure 7E, we apologize for the confusion caused by a mislabeling we inserted in the manuscript. The experiments presented in Figure 7 were conducted using jGCaMP7f, not jGCaMP8f as previously stated (both indicators were used in this study, but in separate experiments). We have corrected this error in the Results section (caption of Figure 7D, E). It is important to note that jGCaMP7f has a longer half-decay time compared to jGCaMP8f, which could in part account for the slower decay kinetics observed in our data. Furthermore, the prolonged calcium signals can be attributed to the physiological properties of neurons in the piriform cortex. Upon olfactory stimulation, these neurons often fire multiple action potentials, resulting in extended calcium transients that can last several seconds. This sustained activity has been documented in previous studies, such as Roland et al. (eLife 2017, Figure 1C therein) in anesthetized animals and Wang et al. (Neuron 2020, Figure 1E therein) in awake animals, which report similar durations for calcium signals. We cite these references in the text. We believe that these revisions and clarifications address the Reviewer's concern and enhance the overall clarity of our manuscript.

      g) The authors do not mention the influence of the neuropil on their data. Did they subtract the neuropil's contribution to the signals from the somata? It is known from the literature that the presence of the neuropil creates artificial correlations between neurons, which decrease with the distance between the neurons (Grødem, S., Nymoen, I., Vatne, G.H. et al. An updated suite of viral vectors for in vivo calcium imaging using intracerebral and retro-orbital injections in male mice. Nat Commun 14, 608 (2023). https://doi.org/10.1038/s41467-023-36324-3; Keemink SW, Lowe SC, Pakan JMP, Dylda E, van Rossum MCW, Rochefort NL. FISSA: A neuropil decontamination toolbox for calcium imaging signals. Sci Rep. 2018 Feb 22;8(1):3493. doi: 10.1038/s41598-018-21640-2. PMID: 29472547; PMCID: PMC5823956).

      This point should be addressed.

      We apologize for not being clear enough in our previous version of the manuscript. The neuropil was subtracted from calcium traces both in simulated and experimental data. Please note that instead of using the term “neuropil”, we used the word “background”. We decided to use the more general term “background” because it also applies to the case of synthetic calcium t-series, where neurons were modeled as spheres devoid of processes. The background subtraction is described in the Methods on page 39:

      “F(t) was computed frame-by-frame as the difference between the average signal of pixels in each ROI and the background signal. The background was calculated as the average signal of pixels that: i) did not belong to any bounding box; ii) had intensity values higher than the mean noise value measured in pixels located at the corners of the rectangular image, which do not belong to the circular FOV of the microendoscope; iii) had intensity values lower than the maximum value of pixels within the boxes”.
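
      The three criteria quoted above can be sketched for a single frame as follows. This is an illustrative reading of the Methods text, not the analysis code used in the study: the function and mask names are hypothetical, and the "corner" pixels are approximated here as all pixels lying outside the circular FOV.

```python
import numpy as np


def background_signal(frame, box_mask, fov_mask):
    """Average of pixels that satisfy the three quoted criteria.

    box_mask: True for pixels inside any ROI bounding box.
    fov_mask: True for pixels inside the circular FOV.
    """
    # Assumption: corner noise is the mean of all pixels outside the FOV.
    noise_mean = frame[~fov_mask].mean()
    box_max = frame[box_mask].max()
    candidates = (~box_mask) & (frame > noise_mean) & (frame < box_max)
    return frame[candidates].mean()


def roi_fluorescence(frame, roi_mask, box_mask, fov_mask):
    """F for one frame: mean ROI signal minus the background signal."""
    return frame[roi_mask].mean() - background_signal(frame, box_mask, fov_mask)


# Toy frame (illustration only): circular FOV, one ROI inside a bounding box.
yy, xx = np.mgrid[:64, :64]
fov_mask = (yy - 32) ** 2 + (xx - 32) ** 2 <= 30 ** 2
frame = np.where(fov_mask, 45.0, 5.0)   # 45 a.u. inside the FOV, 5 a.u. outside
box_mask = np.zeros((64, 64), dtype=bool)
box_mask[28:37, 28:37] = True
roi_mask = (yy - 32) ** 2 + (xx - 32) ** 2 <= 3 ** 2
frame[roi_mask] = 80.0                  # soma pixels

f_value = roi_fluorescence(frame, roi_mask, box_mask, fov_mask)
```

      Applying roi_fluorescence to every frame of a t-series would yield F(t) for that ROI; in the toy frame the background evaluates to the 45 a.u. level assigned to non-soma pixels inside the FOV.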

      h) Also, what are the expected correlations between neurons in the pyriform cortex? Are there measurements in the literature with which the authors could compare their data?

      We appreciate the reviewer's interest in the correlations between neurons in the piriform cortex. The overall low correlations between piriform neurons we observed (Figure 8) are consistent with a published study describing ‘near-zero noise correlations during odor inhalation’ in the anterior piriform cortex of rats, based on extracellular recordings (Miura et al., Neuron 2013). However, to the best of our knowledge, measurements directly comparable to ours have not been described in the literature. Recent analyses of the correlations between piriform neurons were restricted to odor exposure windows, with the goal to quantify odor-specific activation patterns (e.g. Roland et al., eLife 2017; Bolding et al., eLife 2017, Pashkovski et al., Nature 2020; Wang et al., Neuron 2020). Here, we used correlation analyses to characterize the technical advancement of the optimized GRIN lens-based endoscopes. We showed that correlations of pairs of adjacent neurons were independent from radial distance (Figure 8B), highlighting homogeneous spatial resolution in the field of view.

      (2) The way the data is presented doesn't always make it easy to compare the performance of corrected and uncorrected lenses. Here are two examples:

      a) In Figures 4 to 6, it would be easier to compare the FOVs of corrected and uncorrected lenses if the scale bars (at the centre of the FOV) were identical. In this way, the neurons at the centre of the FOV would appear the same size in the two images, and the distances between the neurons at the centre of the FOV would appear similar. Here, the scale bar is significantly larger for the corrected lenses, which may give the illusion of a larger effective FOV.

      We appreciate the Referee’s comment. Below, we explain why we believe that the way we currently present imaging data in the manuscript is preferable:

      (1) current figures show images of the acquired FOV as they are recorded from the microscope (raw data), without rescaling. In this way, we exactly show what potential users will obtain when using a corrected microendoscope.

      (2) In the current version of the figures, the fact that the pixel size is not homogeneous across the FOV, nor equal between uncorrected and corrected microendoscopes, is initially shown in Figure 3E, F and then explicitly stated throughout the manuscript when images acquired with a corrected microendoscope are shown.

      (3) Rescaling images acquired with the corrected endoscopes gives the impression that the acquisition parameters were different between acquisitions with the corrected and uncorrected microendoscopes, which was not the case.

      Importantly, the larger FOV of the corrected microendoscope, which is one of the important technological achievements presented in this study, can be appreciated in the images regardless of the presentation format.

      b) In Figures 3A-D it would be more informative to plot the distances in microns rather than pixels. This would also allow a better comparison of the micro endoscopes (as the pixel sizes seem to be different for the corrected and uncorrected micro endoscopes).

      The Referee is correct that the pixel size is different between the corrected and uncorrected probes. This is because of the different magnification factor introduced by the corrective microlens, as described in Figure 3E, F. The rationale for showing images in Figure 3A-D in pixels rather than microns is the following:

      (1) Optical simulations in Figure 1 suggest that a corrective optical element is effective in compensating for some of the optical aberrations in GRIN microendoscopes.

      (2) After fabricating the corrective optical element (Figure 2), in Figure 3A-D we conduct a preliminary analysis of the effect of the corrective optical element on the optical properties of the GRIN lens. We observed that the microfabricated optical element corrected for some aberrations (e.g., astigmatism), but also that the microfabricated optical element was characterized by significant field curvature. This can be appreciated showing distances in pixels.

      (3) The observed field curvature and the aspherical profile of the corrected lens prompted us to characterize the magnification factor of the corrected endoscopes as a function of the radial distance. We found that the magnification factor changed as a function of the radial distance (Figure 3E-F) and that pixel size was different between uncorrected and corrected endoscopes. We also observed that, in corrected endoscopes, pixel size was a function of the radial distance (Figure 3E-F).

      (4) Once all of the above was established and quantified, we assigned precise pixel size to images of uncorrected and corrected endoscopes and we show all following images of the study (Figure 3G on) using a micron (rather than pixel) scale.

      (3) There seems to be a discrepancy between the performance of the long lenses (8.8 mm) in the different experiments, which should be discussed in the article. For example, the results in Figure 4 show a considerable enlargement of the FOV, whereas the results in Figure 6 show a very moderate enlargement of the distance at which the Pearson's correlation with the first ground truth emitter starts to drop.

      Thanks for raising this point and helping us clarify the data presentation. Images in Figure 4B are average z-projections of z-stacks acquired through a fixed mouse brain slice, and they were taken with the purpose of showing all the neurons that could be visualized from the same sample using an uncorrected and a corrected microendoscope. In Figure 4B, all illuminated neurons are visible regardless of whether they were imaged with high axial resolution (e.g., < 10 µm as defined in Figure 3J) or poor axial resolution. In contrast, in Figure 6J we evaluated the correlation between the calcium trace extracted from a given ROI and the real activity trace of the first simulated ground truth emitter for that specific ROI. The moderate increase in the correlation for the corrected microendoscope compared to the uncorrected microendoscope (Figure 6J) is consistent with the moderate improvement in the axial resolution of the corrected probe compared to the uncorrected probe at intermediate radial distances (60-100 µm from the optical axis, see Figure 3J). We added a paragraph in the Results section (page 14, lines 8-18) to summarize the points described above.

      a) There is also a significant discrepancy between measured and simulated optical performance, which is not discussed. Optical simulations (Figure 1) show that the useful FOV (defined as the radius for which the size of the PSF along the optical axis remains below 10µm) should be at least 90µm for the corrected microendoscopes of both lengths. However, for the long microendoscopes, Figure 3J shows that the axial resolution at 90µm is 17µm. It would be interesting to discuss the origin of this discrepancy: does it depend on the microendoscope used?

      As the Reviewer correctly pointed out, the size of simulated PSFs at a given radial distance (e.g., 90 µm) tends to be generally smaller than that of the experimentally measured PSFs. This might be due to multiple reasons:

      (1) simulated PSFs are excitation PSFs, i.e. they describe the intensity spatial distribution of focused excitation light. On the contrary, measured PSFs result from the excitation and emission process, thus they are also affected by aberrations of light emitted by fluorescent beads and collected by the microscope.

      (2) in the optical simulations, the Zemax file of the GRIN lenses contained first-order aberrations. High-order aberrations were therefore not included in simulated PSFs.

      (3) sources of experimental variability (e.g., intrinsic variability of the fabrication process, alignment of the microendoscope to the optical axis of the microscope, the distance between the GRIN back end and the objective…) are not considered in the simulations.

      We added a paragraph in the Discussion section (page 17, lines 9-18) summarizing the abovementioned points.

      Are there inaccuracies in the construction of the aspheric corrective lens or in the assembly with the GRIN lens? If there is variability between different lenses, how are the lenses selected for imaging experiments?

      The fabrication yield, i.e. the yield of generating the corrective lenses, using molding was ~ 90% (N > 30 molded lenses). The main limitation of this procedure was the formation of air bubbles between the mold negative and the glass coverslip. Molded lenses were visually inspected with the stereoscope and, in case of air bubble formation, they were discarded.

      The assembly yield, i.e. the yield of correct positioning of the GRIN lens with respect to the coverslip, was 100 % (N = 27 endoscopes).

      We added this information in the Methods at page 29 (lines 1-12), as follows:

      “After UV curing, the microlens was visually inspected at the stereomicroscope. In case of formation of air bubbles, the microlens was discarded (yield of the molding procedure: ~ 90 %, N > 30 molded lenses). The coverslip with the attached corrective lens was sealed to a customized metal or plastic support ring of appropriate diameter (Fig. 2C). The support ring, the coverslip and the aspherical lens formed the upper part of the corrected microendoscope, to be subsequently coupled to the proper GRIN rod (Table 2) using a custom-built opto-mechanical stage and NOA63 (Fig. 2C)(7). The GRIN rod was positioned perpendicularly to the glass coverslip, on the other side of the coverslip compared to the corrective lens, and aligned to the aspherical lens perimeter (Fig. 2C) under the guidance of a wide field microscope equipped with a camera. The yield of the assembly procedure for the probes used in this work was 100 % (N = 27 endoscopes). For further details on the assembly of corrected microendoscope see(7)”.

      Reviewer #1 (Recommendations for the authors):

      (1) Page 4, what is meant by 'ad-hoc" in describing software control?

      With “ad-hoc” we meant “specifically designed”. We revised the text to make this clear.

      (2) It was hard to tell how the PSF was modeled for the simulations (especially on page 34, describing the two spherical shells of the astigmatic PSF and ellipsoids modeled along them). Images or especially videos that show the modeling would make this easier to follow.

      Simulated calcium t-series were generated following previous work by our group (Antonini et al., eLife 2020), as stated in the Methods on page 37 (line 5). In Figure 4A of Antonini et al. eLife 2020, we provided a schematic to visually describe the procedure of simulated data generation. In the present paper, we decided not to include a similar drawing and cite the eLife 2020 article to avoid redundancy.

      (3) Some math symbols are missing from the methods in my version of the text (page 36/37).

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it at the time of submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      (4) The Z extent of stacks (i.e. number of steps) used to generate images in Figure 4 is missing.

      We thank the Reviewer for the comment and we now revised the caption of Figure 4 and the Methods section as follows:

      “Figure 4. Aberration correction in long GRIN lens-based microendoscopes enables high-resolution imaging of biological structures over enlarged FOVs. A) jGCaMP7f-stained neurons in a fixed mouse brain slice were imaged using 2PLSM (λexc = 920 nm) through an uncorrected (left) and a corrected (right) microendoscope based on the 6.4 mm-long GRIN rod. Images are maximum fluorescence intensity (F) projections of a z-stack acquired with a 5 μm step size. Number of steps: 32 and 29 for uncorrected and corrected microendoscope, respectively. Scale bars: 50 μm. Left: the scale applies to the entire FOV. Right, the scale bar refers only to the center of the FOV; off-axis scale bar at any radial distance (x and y axes) is locally determined multiplying the length of the drawn scale bar on-axis by the corresponding normalized magnification factor shown in the horizontal color-coded bar placed below the image (see also Fig. 3, Supplementary Table 3, and Materials and Methods for more details). B) Same results for the microendoscope based on the 8.8 mm-long GRIN rod. Number of steps: 23 and 31 for uncorrected and corrected microendoscope, respectively”.

      We also modified the text in the Methods (page 35, lines 1-2):

      “(1024 pixels x 1024 pixels resolution; nominal pixel size: 0.45 µm/pixel; axial step: 5 µm; number of axial steps: 23-32; frame averaging = 8)”.

      (5) Overall, the text is wordy and a bit repetitive and could be cut down significantly in length without loss of clarity. This is true throughout, but especially when comparing the introduction and discussion.

      We edited the text (Discussion and Introduction), as suggested by the Reviewer.

      (6) Although I don't think it's necessary, I would advise including comparison data with an uncorrected endoscope in the same in vivo preparation.

      We thank the Referee for the suggestion. Below, we list the reasons why we decided not to perform the comparison between the uncorrected and corrected endoscopes in the in vivo preparation:

      (1) We believe that the comparison between uncorrected and corrected endoscopes is better performed in fixed tissue (Figure 4) or in simulated calcium data (Figure 5-6), rather than in vivo recordings (Figure 7). In fact, in the brain of living mice motion artifacts, changes in fluorophore expression level, variation in the optical properties of the brain (e.g., the presence of a blood vessel over the FOV) may make the comparison of images acquired with uncorrected and corrected microendoscopes difficult, requiring a large number of animals to cancel out the contributions of all these factors. Comparing optical properties in fixed tissue is, in contrast, devoid of these confounding factors.

      (2) A major advantage of quantifying how the optical properties of uncorrected and corrected endoscopes affect the ability to extract information about neuronal activity in simulated calcium data is that, under simulated conditions, we can count on a known ground truth as reference (e.g., how many neurons are in the FOV, where they are, and what their electrical activity is). This is clearly not possible under in vivo conditions.

      (3) The proposed experiment requires imaging the awake mouse with a corrected microendoscope, then anesthetizing the animal to carefully remove the corrective microlens using forceps, and finally repeating the optical recordings in the awake mouse with the uncorrected microendoscope. Although this is feasible (we performed the proposed experiment in Antonini et al. eLife 2020 using a 4.1 mm-long microendoscope), the success rate of these experiments is low. The low yield is due to the fact that the mechanical force applied on top of the microendoscope to remove the corrective microlens may move the GRIN lens inside the brain, in both vertical and horizontal directions. This can randomly result in a change of the focal plane, death or damage of the cells, tissue inflammation, and bleeding. From our own experience, the number of animals required for this experiment is expected to be high.

      Reviewer #2 (Recommendations for the authors):

      Below, I provide a few minor corrections and suggestions for the authors to consider before final submission.

      (1) Page 5: when referring to Table 1 maybe add "Table 1 and Methods".

      Following the Reviewer’s comment, we revised the text at page 6 (lines 4-5 from bottom) as follows:

      “(see Supplementary Table 1 and Materials and Methods for details on simulation parameters)”.

      (2) Page 8: "We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long micro endoscope and the 8.8 mm-long micro endoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3AD)." I could not find the information given in this paragraph, specifically:

      a) Upon examining the black triangles in Figure 3I and J, the enlargement of the effective FOV does not appear to be 4.7 and 2.3 times.

      In Figure 3I, J, black triangles mark the intersections between the curves fitting the data and the threshold of 10 µm on the axial resolution. The values on the x-axis corresponding to the intersections (Table 1, “Effective FOV radius”) represent the estimated radius of the effective FOV of the probes, i.e. the radius within which the microendoscope has spatial resolution below the threshold of 10 μm. The ratios of the effective FOV radii are 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively, which correspond to 4.7 and 2.3 times larger FOV (Table 1). To make this point clearer, we modified the indicated sentence as follows (page 10, lines 3-11 from bottom):

      “We set a threshold of 10 µm on the axial resolution to define the radius of the effective FOV (corresponding to the black triangles in Fig. 3I, J) in uncorrected and corrected microendoscopes. We observed a relative increase of the effective FOV radius of 2.17 and 1.53 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively (Table 1). This corresponded to an enlargement of the effective FOV area of 4.7 times and 2.3 times for the 6.4 mm-long microendoscope and the 8.8 mm-long microendoscope, respectively (Table 1). These findings were in agreement with the results of the ray-trace simulations (Figure 1) and the measurement of the subresolved fluorescence layers (Figure 3A-D)”.
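
      As a quick check of the arithmetic linking the two sets of numbers, the sketch below squares the radius ratios quoted in the response. It assumes a circular effective FOV, so that area scales with the square of the radius; the variable and function names are illustrative and not taken from the manuscript.

```python
# Radius ratios (corrected / uncorrected effective FOV radius) quoted in the response.
radius_ratio = {"6.4 mm": 2.17, "8.8 mm": 1.53}


def area_enlargement(r_ratio: float) -> float:
    """Assuming a circular FOV, area scales with the square of the radius."""
    return r_ratio ** 2


# Area enlargement for each probe length.
area_ratio = {probe: area_enlargement(r) for probe, r in radius_ratio.items()}

for probe, ratio in area_ratio.items():
    print(f"{probe}-long probe: area enlargement of about {ratio:.1f} times")
```

      Rounded to one decimal place, the squared ratios reproduce the 4.7 and 2.3 values reported in Table 1.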

      b) I do not understand how the enlargements in Figure 3I and J align with the ray trace simulations in Figure 1, indicating an enlargement of 5.4 and 5.6.

      In Figure 1C, E of the first submission we showed the Strehl ratio of focal spots focused after the microendoscope, in the object plane, as a function of the radial distance from the optical axis of focal spots focused in the focal plane at the back end of the GRIN rod (“Objective focal plane” in Figure 1A, B), before the light has traveled along the GRIN lens. After reading the Referee’s comment, we realized this choice does not facilitate the comparison between Figure 1 and Figure 3I, J. We therefore decided to modify Figure 1C, E by showing the Strehl ratio of focal spots focused after the microendoscope as a function of their radial distance from the optical axis in the object plane (where the Strehl ratio is computed), after the light has traveled through the GRIN lens (radial distances are still computed on a plane, not along the curved focal surface represented by the “imaging plane” in Figure 1A, B). Computing radial distances in the object space, we found that the relative increase in the radius of the FOV due to the correction of aberrations was 3.50 and 3.35 for the 6.4 mm- and the 8.8 mm-long microendoscope, respectively. We also revised the manuscript text accordingly (page 7, lines 6-8):

      “The simulated increase in the radius of the diffraction-limited FOV was 3.50 times and 3.35 times for the 6.4 mm-long and 8.8 mm-long probe, respectively (Fig. 1C, E)”. We believe this change should facilitate the comparison of the data presented in Figure 1 and Figure 3.

      Moreover, in comparing results in Figure 1 and Figure 3, it is important to keep in mind that:

      (1) the definitions of the effective FOV radius were different in simulations (Figure 1) and real measurements (Figure 3). In simulations, we considered a theoretical criterion (Maréchal criterion) and set the lower threshold for a diffraction-limited FOV to a Strehl ratio value of 0.8. In real measures, the effective FOV radius obtained from fluorescent bead measurements was defined based on the empirical criterion of setting the upper threshold for the axial resolution to 10 µm.

      (2) the Zemax file of the GRIN lenses contained low-order aberrations and not high-order aberrations.

      (3) the small variability in some of the experimental parameters (e.g., the distance between the GRIN back end and the focusing objective) were not reflected in the simulations.

      Given the reasons listed above, it is expected that the predictions of the simulations do not perfectly match the experimental measurements and tend to indicate larger improvements from aberration correction than those measured experimentally.
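
      To give a sense of what the Strehl ratio threshold of 0.8 in point (1) implies, the sketch below uses the textbook extended Maréchal approximation, S ≈ exp(-(2πσ)²), with σ the RMS wavefront error in waves. This approximation is an assumption introduced here for illustration; the simulations in the manuscript were run in Zemax, and the function names below are hypothetical.

```python
import math


def strehl_from_rms(sigma_waves: float) -> float:
    """Extended Marechal approximation: S ~ exp(-(2*pi*sigma)^2), sigma in waves."""
    return math.exp(-(2.0 * math.pi * sigma_waves) ** 2)


def rms_for_strehl(strehl: float) -> float:
    """Invert the approximation: RMS wavefront error (in waves) for a given Strehl ratio."""
    return math.sqrt(-math.log(strehl)) / (2.0 * math.pi)


# RMS wavefront error at the diffraction-limited threshold used in the simulations.
sigma_limit = rms_for_strehl(0.8)

# Same limit expressed in nanometers at the excitation wavelength quoted in the text.
wavelength_nm = 920.0
sigma_limit_nm = sigma_limit * wavelength_nm

print(f"RMS wavefront error at S = 0.8: {sigma_limit:.3f} waves")
print(f"At {wavelength_nm:.0f} nm this is about {sigma_limit_nm:.0f} nm")
```

      Under this approximation the threshold corresponds to an RMS wavefront error of roughly 0.075 waves, whereas the 10 µm axial-resolution threshold used for the bead measurements is an empirical criterion, so the two FOV definitions need not coincide.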

      c) Finally, how can the enlargement in Figure 3I be compared to the measurements of the sub-resolved fluorescence layers in Figures 3A-D? Could the authors please clarify these points?

      When comparing measurements of subresolved fluorescent films and beads it is important to keep in mind that the two measures have different purposes and spatial resolution. We used subresolved fluorescent films to visualize the shape and extent of the focal surface of microendoscopes in a continuous way along the radial dimension (in contrast to bead measurements that are quantized in space). This approach comes at the cost of spatial resolution, as we are using fluorescent layers, which are subresolved in the axial but not in the radial dimension. Therefore, fluorescent film profiles are not used in our study to extract relevant quantitative information about effective FOV enlargement or spatial resolution of corrected microendoscopes. In contrast, to quantitatively characterize axial and lateral resolutions we used measurements of 100 nm-diameter fluorescent beads (therefore subresolved in the x, y, and z dimensions) located at different radial distances from the center of the FOV, using a much smaller nominal pixel size compared to the fluorescent films (beads, lateral resolution: 0.049 µm/pixel, axial resolution: 0.5 µm/pixel; films, lateral resolution: 1.73 µm/pixel, axial resolution: 2 µm/pixel).

      (3) On page 15, the statement "significantly enlarge the FOV" should be more specific by providing the actual values for the increase. It would also be good to mention that this is not a xy lateral increase; rather, as one moves further from the center, more of the imaged cells belong to axially different planes.

      The values of the experimentally determined FOV enlargements (4.7 times and 2.3 times for 6.4 mm- and 8.8 mm-long microendoscope, respectively) are provided in Table 1 and are now referenced on page 10. Following the Referee’s request, we added the following sentence in the discussion (page 18, lines 10-14) to underline that the extended FOV samples on different axial positions because of the field curvature effect:

      “It must be considered, however, that the extended FOV achieved by our aberration correction method was characterized by a curved focal plane. Therefore, cells located in different radial positions within the image were located at different axial positions and cells at the border of the FOV were closer to the front end of the microendoscope”.

      (4) On page 36, most of the formulas appear to be corrupted. This may have occurred during the conversion to the merged PDF. Please verify this and check for similar problems in other equations throughout the text as well.

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      (5) In the discussion, the authors could potentially add comments on how the verified performance of the corrective lenses depends on the wavelength and mention the range within which the wavelength can be changed without the need to redesign a new corrective lens.

      Following this comment and those of other Reviewers, we explored the effect of changing wavelength on the Strehl ratio using new Zemax simulations. We found that the Strehl ratio remains > 0.8 within at least ± 10 nm of λ = 920 nm (new Supplementary Figure 1A-D, left panels), which covers the limited bandwidth of our femtosecond laser. Moreover, these simulations demonstrate that, over a much wider wavelength range (800 - 1040 nm), a high Strehl ratio is obtained but at different z planes (new Supplementary Figure 1A-D, right panels). These new results are now described on page 7 (lines 8-10).

      (6) Also, they could discuss if and how the corrective lens could be integrated into fiberscopes for freely moving experiments.

      Following the Referee’s suggestion, we added a short text in the Discussion (page 21, lines 4-7 from bottom). It reads:

      “Another advantage of long corrected microendoscopes described here over adaptive optics approaches is the possibility to couple corrected microendoscopes with portable 2P microscopes(42-44), allowing high resolution functional imaging of deep brain circuits on an enlarged FOV during naturalistic behavior in freely moving mice”.

      (7) Finally, since the main advantage of this approach is its simplicity, the authors should also comment on or outline the steps to follow for potential users who are interested in using the corrective lenses in their systems.

      Thanks for this comment. The Materials and Methods section of this study and that of Antonini et al. eLife 2020 describe in detail the experimental steps that potential users need to follow to reproduce the corrective lenses and apply them to their own experimental configuration.

      Reviewer #3 (Recommendations for the authors):

      (1) Suggestions for improved or additional experiments, data, or analyses, and Recommendations for improving the writing and presentation:

      See Public Review.

      Please see our point-by-point response above.

      (2) Minor corrections on text and figures: a) Figure 6A: is the fraction of cells expressed in %?

      Author response: yes, that is correct. Thank you for spotting it. We added the “%” symbol to the y label.

      b) Figure 8A, left: The second line is blue and not red dashed. In addition, it could be interesting to also show a line corresponding to the 0 value.

      Thank you for the suggestions. We modified Figure 8 according to the Referee’s comments.

      c) Some parts of equation (1) and some variables in the Material and Methods section are missing

      We apologize for the inconvenience. This issue arose in the PDF conversion of our Word document and we did not spot it upon submission. We will now make sure the PDF version of our manuscript correctly reports symbols and equations.

      d) In the methods, the authors mention a calibration ruler with ticks spaced every 10 µm along two orthogonal directions and refer to the following product: 4-dot calibration slide, Cat. No. 1101002300142, Motic, Hong Kong. However, this product does not seem to correspond to a calibration ruler.

      We double-checked. The catalog number 1101002300142 is correct and product details can be found at the following link:

      https://moticmicroscopes.com/products/calibration-slide-4-dots-1101002300142?srsltid=AfmBOorGYx9PcXtAlIMmSs_tEpxS4nX21qIcV8Kfn4qGwizQK3LYOQn3

    1. eLife Assessment

      This paper represents an important contribution to the field. Summarizing results from neural recording experiments in mice across ten labs, the work provides compelling evidence that basic electrophysiology features, single-neuron functional properties, and population-level decoding are fairly reproducible across labs with proper preprocessing. The results and suggestions regarding preprocessing and quality metrics may be of significant interest to investigators carrying out such experiments in their own labs.

    2. Reviewer #1 (Public review):

      The IBL here presents an important paper that aims to assess potential reproducibility issues in rodent electrophysiological recordings across labs and suggests solutions to these. The authors carried out a series of analyses on data collected across 10 laboratories while mice performed the same decision-making task, and provided convincing evidence that basic electrophysiology features, single-neuron functional properties, and population-level decoding were fairly reproducible across labs with proper preprocessing. This well-motivated large-scale collaboration allowed systematic assessment of lab-to-lab reproducibility of electrophysiological data, and the suggestions outlined in the paper for streamlining preprocessing pipelines and quality metrics will provide general guidance for the field, especially with continued effort to benchmark against standard practices (such as manual curation).

      The authors have carefully incorporated our suggestions. As a result, the paper now better reflects where reproducibility is affected when using common, simple, and more complex analyses and preprocessing methods, and it is more informative and more reflective of the field overall. We thank the authors for this thorough revision. We have 2 remaining suggestions on text clarification:

      (1) Regarding benchmarking the automated metrics to manual curation of units: although we appreciate that a proper comparison may require a lot of effort potentially beyond the scope of the current paper; we do think that explicit discussion regarding this point is needed in the text, to remind the readers (and indeed future generations of electrophysiologists) the pros and cons of different approaches.

      In addition to what the authors have currently stated (line 469-470): "Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility."

      Maybe also add: "In particular, a thorough comparison of automated metrics against a careful, large, manually-curated dataset, is an important benchmarking step for future studies."

      (2) In Figure 3-Figure Supplement 1, the authors now include a plot that highlights how much probe depth is adjusted by using electrophysiological features such as LFP power to estimate probe and channel depth. This plot is immensely informative for the field, as it implies that there can be substantial variability (sometimes up to 1 mm discrepancy between insertions) in depth estimation based on anatomical DiI track tips alone. Using electrophysiological features in this way for probe depth estimation is currently not standard in the field and has only been made possible with Neuropixels, which span several millimeters. These figures highlight that this should be a critical step in preprocessing pipelines, and the paper provides solid evidence for this.

      Currently, this part of the figure is only briefly referenced in the text. We think it would be helpful to explicitly reference this particular panel and discuss its implications in the text.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether analyses of large-scale electrophysiology data obtained from 10 different individual laboratories are reproducible when they use standardized procedures and quality control measures. They were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Strengths:

      This paper gathered a standardized dataset across 10 labs and performed a host of state-of-the-art analyses on it. Their ability to assess the reproducibility of each analysis across this kind of data is an important contribution to the field.

      Comments on revisions:

      The authors have addressed almost all of the concerns that I raised in this revised version. The new RIGOR notebook is helpful, as are the new analyses.

      This paper attributes much error in probe insertion trajectory planning to the fact that the Allen CCF and standard stereotaxic coordinate systems are not aligned. Consequently, it would be very helpful for the community if this paper could recommend software tools, procedures, or code to do trajectory planning that accounts for this.

      I think it would still be helpful for the paper to have some discussion comparing/contrasting the use of the RIGOR framework with existing spike sorting statistics. They mention in their response to reviewers that this is indeed a large space of existing approaches. Most labs performing Neuropixels recordings already do some type of quality control, but these approaches are not standardized. This work is well-positioned to discuss the advantages and disadvantages of these alternative approaches (even briefly) but does not currently do so. It does not need to run any of these competing approaches to helpfully mention ideas for what a reader of the paper should do for quality control with their own data.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful read of our paper, and appreciate the thoughtful comments.

      Both reviewers agreed that our work had several major strengths: the large dataset collected in collaboration across ten labs, the streamlined processing pipelines, the release of code repositories, the multi-task neural network, and that we definitively determined that electrode placement is an important source of variability between datasets.

      However, a number of key potential improvements were noted: the reviewers felt that a more standard model-based characterization of single neuron responses would benefit our reproducibility analysis, that more detail was needed about the number of cells, sessions, and animals, and that more information was needed to allow users to deploy the RIGOR standards and to understand their relationship to other metrics in the field.

      We agree with these suggestions and have implemented many major updates in our revised manuscript. Some highlights include:

      (1) A new regression analysis that specifies the response profile of each neuron, allowing a comparison of how similar these are across labs and areas (See Figure 7 in the new section, “Single neuron coefficients from a regression-based analysis are reproducible across labs”);

      (2) A new decoding analysis (See Figure 9 in the section, “Decodability of task variables is consistent across labs, but varies by brain region”);

      (3) A new RIGOR notebook to ease useability;

      (4) A wealth of additional information about the cells, animals and sessions in each figure;

      (5) Many new additional figure panels in the main text and supplementary material to clarify the specific points raised by the reviewers.

      Again, we are grateful to the reviewers and editors for their helpful comments, which have significantly improved the work. We are hopeful that the many revisions we have implemented will be sufficient to change the “incomplete” designation that was originally assigned to the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors explore a large-scale electrophysiological dataset collected in 10 labs while mice performed the same behavioral task, and aim to establish guidelines to aid reproducibility of results collected across labs. They introduce a series of metrics for quality control of electrophysiological data and show that histological verification of recording sites is important for interpreting findings across labs and should be reported in addition to planned coordinates. Furthermore, the authors suggest that although basic electrophysiology features were comparable across labs, task modulation of single neurons can be variable, particularly for some brain regions. The authors then use a multi-task neural network model to examine how neural dynamics relate to multiple interacting task- and experimenter-related variables, and find that lab-specific differences contribute little to the variance observed. Therefore, analysis approaches that account for correlated behavioral variables are important for establishing reproducible results when working with electrophysiological data from animals performing decision-making tasks. This paper is very well-motivated and needed. However, what is missing is a direct comparison of task modulation of neurons across labs using standard analysis practices in the field, such as generalized linear models (GLMs). This could clarify how much behavioral variance contributes to the neural variance across labs, and more accurately estimate the scale of the reproducibility issues in behavioral systems neuroscience, where conclusions often depend on these standard analysis methods.

      We fully agree that a comparison of task-modulation across labs is essential. To address this, we have performed two new analyses and added new corresponding figures to the main text (Figures 7 and 9). As the reviewer hoped, these analyses did indeed clarify how much behavioral variance contributes to the variance across labs. Critically, they suggested that our results were more reproducible than the more traditional analyses would indicate.

      Additional details are provided below (See detailed response to R1P1b).

      Strengths:

      (1) This is a well-motivated paper that addresses the critical question of reproducibility in behavioural systems neuroscience. The authors should be commended for their efforts.

      (2) A key strength of this study comes from the large dataset collected in collaboration across ten labs. This allows the authors to assess lab-to-lab reproducibility of electrophysiological data in mice performing the same decision-making task.

      (3) The authors' attempt to streamline preprocessing pipelines and quality metrics is highly relevant in a field that is collecting increasingly large-scale datasets where automation of these steps is increasingly needed.

      (4) Another major strength is the release of code repositories to streamline preprocessing pipelines across labs collecting electrophysiological data.

      (5) Finally, the application of MTNN for characterizing functional modulation of neurons, although not yet widely used in systems neuroscience, seems to have several advantages over traditional methods.

      Thanks very much for noting these strengths of our work.

      Weaknesses:

      (1) In several places the assumptions about standard practices in the field, including preprocessing and analyses of electrophysiology data, seem to be inaccurately presented:

      a) The estimation of how much the histologically verified recording location differs from the intended recording location is valuable information. Importantly, this paper provides citable evidence for why that is important. However, histological verification of recording sites is standard practice in the field, even if not all studies report them. Although we appreciate the authors' effort to further motivate this practice, the current description in the paper may give readers outside the field a false impression of the level of rigor in the field.

      We agree that labs typically do perform histological verification. Still, our methods offer a substantial improvement over standard practice, and this was critical in allowing us to identify errors in targeting. For instance, we used new software, LASAGNA, which is an innovation over the traditional, more informal approach to localizing recording sites. Second, the requirement that two independent reviewers concur on each proposed location for a recording site is also an improvement over standard practice. Importantly, these reviewers use electrophysiological features to more precisely localize electrodes, when needed, which is an improvement over many labs. Finally, most labs use standard 2D atlases to identify recording location (a traditional approach); our use of a 3D atlas and a modern image registration pipeline has improved the accuracy of identifying the true placement of probes in 3D space.

      Importantly, we don’t necessarily advocate that all labs adopt our pipeline; indeed, this would be infeasible for many labs. Instead, our hope is that the variability in probe trajectory that we uncovered will be taken into account in future studies. Here are 3 example ways in which that could happen. First, groups hoping to target a small area for an experiment might elect to use a larger cohort than previously planned, knowing that some insertions will miss their target. Second, our observation that some targeting error arose because experimenters had to move probes due to blood vessels will impact future surgeries: when an experimenter realizes that a blood vessel is in the way, they might still re-position the probe, but they can also adjust its trajectory (e.g., changing the angle) knowing that even little nudges to avoid blood vessels can have a large impact on the resulting insertion trajectory. Third, our observation of a 7 degree deviation between stereotaxic coordinates and Allen Institute coordinates can be used for future trajectory planning steps to improve accuracy of placement. Uncovering this deviation required many insertions and our standardized pipeline, but now that it is known, it can be easily corrected without needing such a pipeline.

      We thank the reviewer for bringing up this issue and have added new text (and modified existing text) in the Discussion to highlight the innovations we introduced that allowed us to carefully quantify probe trajectory across labs (lines 500 - 515):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset. … Detecting this offset relied on a large cohort size and an automated histological pipeline, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Minimizing variance in probe targeting is another important element in increasing reproducibility, as slight deviations in probe entry position and angle can lead to samples from different populations of neurons. Collecting structural MRI data in advance of implantation could reduce targeting error, although this is infeasible for most labs. A more feasible solution is to rely on stereotaxic coordinates but account for the inevitable off-target measurements by increasing cohort sizes and adjusting probe angles when blood vessels obscure the desired location.”

      b) When identifying which and how neurons encode particular aspects of stimuli or behaviour in behaving animals (when variables are correlated by the nature of the animals' behaviour), it has become the standard in behavioral systems neuroscience to use GLMs - indeed many labs participating in the IBL also have a long history of doing this (e.g., Steinmetz et al., 2019; Musall et al., 2023; Orsolic et al., 2021; Park et al., 2014). The reproducibility of results when using GLMs is never explicitly shown, but the supplementary figures to Figure 7 indicate that results may be reproducible across labs when using GLMs (as it has similar prediction performance to the MTNN). This should be introduced as the first analysis method used in a new dedicated figure (i.e., following Figure 3 and showing results of analyses similar to what was shown for the MTNN in Figure 7). This will help put into perspective the degree of reproducibility issues the field is facing when analyzing with appropriate and common methods. The authors can then go on to show how simpler approaches (currently in Figures 4 and 5) - not accounting for a lot of uncontrolled variabilities when working with behaving animals - may cause reproducibility issues.

      We fully agree with the reviewer's suggestion. We have addressed their concern by implementing a Reduced-Rank Regression (RRR) model, which builds upon and extends the principles of Generalized Linear Models (GLMs). The RRR model retains the core regression framework of GLMs while introducing shared, trainable temporal bases across neurons, enhancing the model’s capacity to capture the structure in neural activity (Posani, Wang, et al., bioRxiv, 2024). Importantly, Posani, Wang et al compared the predictive performance of GLMs vs the RRR model, and found that the RRR model provided (slightly) improved performance, so we chose the RRR approach here.

      We highlight this analysis in a new section (lines 350-377) titled, “Single neuron coefficients from a regression-based analysis are reproducible across labs”. This section includes an entirely new Figure (Fig. 7), where this new analysis felt most appropriate, since it is closer in spirit to the MTNN analysis that follows (rather than as a new Figure 3, as the reviewer suggested). As the reviewer hoped, this analysis provides some reassurance that including many variables when characterizing neural activity furnishes results with improved reproducibility. We now state this in the Results and the Discussion (line 456-457), highlighting that these analyses complement the more traditional selectivity analyses, and that using both methods together can be informative.

      When the authors introduce a neural network approach (i.e. MTNN) as an alternative to the analyses in Figures 4 and 5, they suggest: 'generalized linear models (GLMs) are likely too inflexible to capture the nonlinear contributions that many of these variables, including lab identity and spatial positions of neurons, might make to neural activity'. This is despite the comparison between MTNN and GLM prediction performance (Supplement 1 to Figure 7) showing that the MTNN is only slightly better at predicting neural activity compared to standard GLMs. The introduction of new models to capture neural variability is always welcome, but the conclusion that standard analyses in the field are not reproducible can be unfair unless directly compared to GLMs.

      In essence, it is really useful to demonstrate how different analysis methods and preprocessing approaches affect reproducibility. But the authors should highlight what is actually standard in the field, and then provide suggestions to improve from there.

      Thanks again for these comments. We have also edited the MTNN section slightly to accommodate the addition of the previous new RRR section (line 401-402).

      (2) The authors attempt to establish a series of new quality control metrics for the inclusion of recordings and single units. This is much needed, with the goal to standardize unit inclusion across labs that bypasses the manual process while keeping the nuances from manual curation. However, the authors should benchmark these metrics to other automated metrics and to manual curation, which is still a gold standard in the field. The authors did this for whole-session assessment but not for individual clusters. If the authors can find metrics that capture agreed-upon manual cluster labels, without the need for manual intervention, that would be extremely helpful for the field.

      We thank the reviewer for their insightful suggestions regarding benchmarking our quality control metrics against manual curation and other automated methods at the level of individual clusters. We are indeed, as the reviewer notes, publishing results from spike sorting outputs that have been automatically but not manually verified on a neuron-by-neuron basis. To get to the point where we trust these results to be of publishable quality, we manually reviewed hundreds of recordings and thousands of neurons, refining both the preprocessing pipeline and the single-unit quality metrics along the way. All clusters, both those passing QCs and those not passing QCs, are available to review with detailed plots and quantifications at https://viz.internationalbrainlab.org/app (turn on “show advanced metrics” in the upper right, and navigate to the plots furthest down the page, which are at the individual unit level). We would emphasize that these metrics are definitely imperfect (and fully-automated spike sorting remains a work in progress), but so is manual clustering. Our fully automated approach has the advantage of being fully reproducible, which is absolutely critical for the analyses in the present paper. Indeed, if we had actually done manual clustering or curation, one would wonder whether our results were actually reproducible independently. Nevertheless, it is not part of the present manuscript’s objectives to validate or defend these specific choices for automated metrics, which have been described in detail elsewhere (see our Spike Sorting whitepaper, https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080). It would be a valuable exercise to thoroughly compare these metrics against a careful, large, manually-curated set, but doing this properly would be a paper in itself and is beyond the scope of the current paper.
      We also acknowledge that our analyses studying reproducibility across labs could, in principle, result in more or less reproducibility under a different choice of metrics, which we now describe in the Discussion (line 469-470):

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      (3) With the goal of improving reproducibility and providing new guidelines for standard practice for data analysis, the authors should report the n of cells, sessions, and animals used in plots and analyses throughout the paper, both to aid understanding of the variability in the plots and to set a good example.

      We wholeheartedly agree and have added the number of cells, mice and sessions for each figure. This information is included as new tabs in our quality control spreadsheet (https://docs.google.com/spreadsheets/d/1_bJLDG0HNLFx3SOb4GxLxL52H4R2uPRcpUlIw6n4n-E/). This is referred to in line 158-159 (as well as its original location on line 554 in the section, “Quality control and data inclusion”).

      Other general comments:

      (1) In the discussion (line 383) the authors conclude: 'This is reassuring, but points to the need for large sample sizes of neurons to overcome the inherent variability of single neuron recording'. - Based on what is presented in this paper we would rather say that their results suggest that appropriate analytical choices are needed to ensure reproducibility, rather than large datasets - and they need to show whether using standard GLMs actually allows for reproducible results.

      Thanks. The new GLM-style RRR analysis in Figure 7, following the reviewer’s suggestion, does indeed indicate improved reproducibility across labs. As described above, we see this new analysis as complementary to more traditional analyses of neural selectivity and argue that the two can be used together. The new text (line 461) states:

      “This is reassuring, and points to the need for appropriate analytical choices to ensure reproducibility.”

      (2) A general assumption in the across-lab reproducibility questions in the paper relies on intralab variability vs across-lab variability. An alternative measure that may better reflect experimental noise is across-researcher variability, as well as the amount of experimenter experience (if the latter is a factor, it could suggest researchers may need more training before collecting data for publication). The authors state in the discussion that this is not possible. But maybe certain measures can be used to assess this (e.g. years of conducting surgeries/ephys recordings etc)?

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (line 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) Figure 3b and c: Are these plots before or after the probe depth has been adjusted based on physiological features such as the LFP power? In other words, is the IBL electrophysiological alignment toolbox used here and is the reliability of location before using physiological criteria or after? Beyond clarification, showing both before and after would help the readers to understand how much the additional alignment based on electrophysiological features adjusts probe location. It would also be informative if they sorted these penetrations by which penetrations were closest to the planned trajectory after histological verification.

      The plots in Figure 3b and 3c reflect data after the probe depth has been adjusted based on electrophysiological features. This adjustment incorporates criteria such as LFP power and spiking activity to refine the trajectory and ensure precise alignment with anatomical landmarks. The trajectories have also been reviewed and confirmed by two independent reviewers. We have clarified this in line 180 and in the caption of Figure 3.

      To address this concern, we have added a new panel c in Figure 3 supplementary 1 (also shown below) that shows the LFP features along the probes prior to using the IBL alignment toolbox. We hope the reviewer agrees that a comparison of panels (a) and (c) below makes clear the improvement afforded by our alignment tools.

      In Figure 3 and Figure 3 supplementary 1, as suggested, we have also now sorted the probes by those that were closest to the planned trajectory. This way of visualizing the data makes it clear that as the distance from the planned trajectory increases, the power spectral density in the hippocampal regions becomes less pronounced and the number of probes that have a large portion of the channels localized to VISa/am, LP and PO decreases. We have added text to the caption to describe this. We thank the reviewer for this suggestion and agree that it will help readers to understand how much the additional alignment (based on electrophysiological features) adjusts probe location.

      (4) In Figures 4 and 6: If the authors use a 0.05 threshold (alpha) and a cell simply has to be significant on 1/6 tests to be considered task modulated, that means that they have a false positive rate of ~30% (0.05*6=0.3). We ran a simple simulation looking for significant units (from random null distribution) from these criteria which shows that out of 100.000 units, 26500 units would come out significant (false error rate: 26.5%). That is very high (and unlikely to be accepted in most papers), and therefore not surprising that the fraction of task-modulated units across labs is highly variable. This high false error rate may also have implications for the investigation of the spatial position of task-modulated units (as effects of the spatial position may drown in falsely labelled 'task-modulated' cells).

      Thank you for this concern. The different tests were kept separate, so we did not consider a neuron modulated if it was significant in only one out of six tests, but instead we asked whether a neuron was modulated according to test one, whether it was modulated according to test two, etc., and performed further analyses separately for each test. Thus, we are only vulnerable to the ‘typical’ false positive rate of 0.05 for any given test. We made this clearer in the text (lines 232-236) and hope that the 5% false positive rate seems more acceptable.
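
      A minimal sketch of the arithmetic at issue, assuming the six tests are independent and using hypothetical names throughout (it is not part of the analysis code): a null neuron passes at least one of six tests at a rate of 1 - (1 - 0.05)^6, which is close to the reviewer's simulated 26.5%, whereas a neuron judged on one test at a time is exposed only to the nominal 0.05.

```python
import random

ALPHA = 0.05   # per-test significance threshold
N_TESTS = 6    # number of tests applied to each neuron

# Analytic rate at which a null neuron passes at least one of the tests,
# assuming the tests are independent of each other.
any_test_rate = 1 - (1 - ALPHA) ** N_TESTS


def simulate_any_test_rate(n_units=100_000, seed=0):
    """Fraction of simulated null units with at least one p-value below ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_units):
        # Under the null hypothesis each p-value is uniform on [0, 1].
        if any(rng.random() < ALPHA for _ in range(N_TESTS)):
            hits += 1
    return hits / n_units


simulated_rate = simulate_any_test_rate()
print(f"pass any of {N_TESTS} tests: analytic {any_test_rate:.3f}, simulated {simulated_rate:.3f}")
print(f"each test considered separately: {ALPHA:.3f}")
```

      The first number applies only if a neuron is labelled task-modulated whenever any single test is significant; keeping the tests separate, as described above, leaves each test at its own 0.05 rate.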

      (5) The authors state from Figure 5b that the majority of cells could be well described by 2 PCs. The distribution of R2 across neurons is almost uniform, so depending on what R2 value one considers a 'good' description, that is the fraction of 'good' cells. Furthermore, movement onset has now been well-established to be affecting cells widely and in large fractions, so while this analysis may work for something with global influence - like movement - more sparsely encoded variables (as many are in the brain) may not be well approximated with this suggestion. The authors could expand this analysis into other epochs like activity around stimulus presentation, to better understand how this type of analysis reproduces across labs for features that have a less global influence.

      We thank the reviewer for the suggestion and fully agree that the window used in our original analysis would tend to favor movement-driven neurons. To address this, we repeated the analysis, this time using a window centered around stimulus onset (from 0.5 s before stimulus onset until 0.1 s after stimulus onset). As the reviewer suspected, far fewer neurons were active in this window and consequently far fewer were modelled well by the first two PCs, as shown in Author response image 1b (below). Similar to our original analysis using the post-movement window, we found mixed results for the stimulus-centered window across labs. Interestingly, regional differences were weaker in this new analysis compared to the original analysis of the post-movement window. We have added a sentence to the results describing this. Because the results are similar to the post-movement window main figure, we would prefer to restrict the new analysis only to this point-by-point response, in the hopes of streamlining the paper.

      Author response image 1.

      PCA analysis applied to a stimulus-aligned window ([-0.5, 0.1] sec relative to stim onset). Figure conventions as in main text Fig 5. Results are comparable to the post-movement window analysis; however, regional differences are weaker here, possibly because fewer cells were active in the pre-movement window. We added panel j here and in the main figure, showing cell-number-controlled results. That is, for each test, every class being compared (for example, the labs within a region) was subsampled to the neuron count of the smallest class; this sampling was repeated 1000 times and the p-values were combined via Fisher’s method, which resulted in far fewer significant differences across laboratories and, independently, across regions.

      (6) Additionally, in Figure 5i: could the finding that one can only distinguish labs when taking cells from all regions, simply be a result of a different number of cells recorded in each region for each lab? It makes more sense to focus on the lab/area pairing as the authors also do, but not to make their main conclusion from it. If the authors wish to do the comparison across regions, they will need to correct for the number of cells recorded in each region for each lab. In general, it was a struggle to fully understand the purpose of Figure 5. While population analysis and dimensionality reduction are commonplace, this seems to be a very unusual use of it.

      We agree that controlling for varying cell numbers is a valuable addition to this analysis. We added panel j in Fig. 5 showing cell-number-controlled test results of panel i. That is, for a given statistical comparison, we subsample each of the compared classes to the cell count of the smallest class, run the test, and repeat this sampling 1000 times, before combining the p-values using Fisher’s method. This cell-number-controlled version of the tests resulted in clearly fewer significant differences across distributions, as seen similarly for the pre-movement window shown in panel j of Author response image 1. We hope this clarifies our aim to illustrate that low-dimensional embedding of cells’ trial-averaged activity can show how regional differences compare with laboratory differences.
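
      The mechanics of this control can be sketched as follows. This is an illustrative, standard-library-only sketch and not the code used for the figure: the function names, the toy data and the asymptotic KS p-value approximation are all assumptions made for the example. Note also that Fisher’s method formally assumes independent p-values, so the sketch shows the procedure rather than its statistical calibration.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value (Kolmogorov series, small-sample correction)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series is numerically 1 in this range
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k dof under the null."""
    k = len(pvalues)
    half = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # statistic / 2
    if half == 0.0:
        return 1.0
    # Chi-square survival function for even dof, evaluated in log space.
    log_terms = [i * math.log(half) - math.lgamma(i + 1) - half for i in range(k)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def subsampled_fisher(group_a, group_b, n_repeats=1000, seed=0):
    """Subsample both groups to the smaller size, test, repeat, combine p-values."""
    rng = random.Random(seed)
    n_min = min(len(group_a), len(group_b))
    pvalues = []
    for _ in range(n_repeats):
        sub_a = rng.sample(group_a, n_min)
        sub_b = rng.sample(group_b, n_min)
        pvalues.append(ks_pvalue(ks_statistic(sub_a, sub_b), n_min, n_min))
    return fisher_combine(pvalues)


# Hypothetical PC scores for two labs with unequal cell counts.
toy_rng = random.Random(1)
lab_a = [toy_rng.gauss(0.0, 1.0) for _ in range(200)]
lab_b = [toy_rng.gauss(3.0, 1.0) for _ in range(40)]  # clearly shifted distribution
combined_p = subsampled_fisher(lab_a, lab_b, n_repeats=200)
print(combined_p)
```

      Subsampling to the smallest class keeps every comparison at the same sample size, so a difference in statistical power between well-sampled and sparsely sampled labs or regions cannot by itself produce a significant result.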

      As a complementary statistical analysis to the KS tests shown, we fitted a linear mixed-effects model (mixedlm from statsmodels.formula.api) to the first and second PC, separately for each PC and each activity window (“Move”: [-0.5,1] first movement aligned; “Stim”: [-0.5,0.1] stimulus onset aligned). Author response image 2 (in this rebuttal only) is broadly in line with the KS results, showing more regional than lab influences on the distributions of first PCs for the post-movement window.

      Author response image 2:

      Linear mixed effects model results for two PCs and two activity windows. For the post-movement window (“Move”), regional influences are significant (red color in plots) for all but one region while only one lab has a significant model coefficient for PC1. For PC2 more labs and three regions have significant coefficients. For the pre-movement window (“Stim”) one region for PC1 or PC2 has significant coefficients. The variance due to session id was smaller than all other effects (“eids Var”). “Intercept” shows the expected value of the response variable (PC1, PC2) before accounting for any fixed or random effects. All p-values were grouped as one hypothesis family and corrected for multiple comparisons via Benjamini-Hochberg.
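
      For readers unfamiliar with the correction named in this caption, the Benjamini-Hochberg procedure can be sketched as below. This is a plain-Python illustration with made-up p-values, not the code used for the figure; in practice a library routine such as `multipletests(..., method="fdr_bh")` from statsmodels would typically be used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is capped
    # by the ones ranked above it, which keeps the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values treated as a single hypothesis family.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
print(list(zip(raw_p, adj_p, significant)))
```

      Grouping all coefficients into one family, as the caption describes, means the rank and the family size m are computed over every p-value in the figure rather than per panel.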

      (7) In the discussion the authors state: " Indeed this approach is a more effective and streamlined way of doing it, but it is questionable whether it 'exceeds' what is done in many labs.

      Classically, scientists trace each probe manually with light microscopy and designate each area based on anatomical landmarks identified with Nissl or DAPI stains together with gross landmarks. When not automated with 2-PI serial tomography and anatomically aligned to a standard atlas, this is a less effective process, but it is not clear that it is less precise, especially in studies before Neuropixels where active electrodes were located in a much smaller area. While more effective, transforming into a common atlas does make additional assumptions about warping the brain into the standard atlas - especially in cases where the brain has been damaged/lesioned. Readers can appreciate the effectiveness and streamlining provided by these new tools without the need to invalidate previous approaches.

      We thank the reviewer for highlighting the effectiveness of manual tracing methods used traditionally. Our intention in the statement was not to invalidate the precision or value of these classical methods but rather to emphasize the scalability and streamlining offered by our pipeline. We have revised the language to more accurately reflect this (line 500-504):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset.”

      (8) What about across-lab population-level representation of task variables, such as in the coding direction for stimulus or choice? Is the general decodability of task variables from the population comparable across labs?

      Excellent question, thanks! We have added the new section “Decodability of task variables is consistent across labs, but varies by brain region” (line 423-448) and Figure 9 in the revised manuscript to address this question. In short, yes, the general decodability of task variables from the population is comparable across labs, providing additional reassurance of reproducibility.

      Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether observations made in separate individual laboratories are reproducible when they use standardized procedures and quality control measures. This is a key question for the field. If ten systems neuroscience labs try very hard to do the exact same experiment and analyses, do they get the same core results? If the answer is no, this is very bad news for everyone else! Fortunately, they were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Major Comments:

      The paper had two principal goals:

      (1) to assess reproducibility between labs on a carefully coordinated experiment

      (2) distill the knowledge learned into a set of standards that can be applied across the field.

      The manuscript made progress towards both of these goals but leaves room for improvement.

      (1) The first goal of the study was to perform exactly the same experiment and analyses across 10 different labs and see if you got the same results. The rationale for doing this was to test how reproducible large-scale rodent systems neuroscience experiments really are. In this, the study did a great job showing that when a consortium of labs went to great lengths to do everything the same, even decoding algorithms could not clearly discern laboratory identity from looking at the raw data. However, the amount of coordination between the labs was so great that these findings are hard to generalize to the situation where similar (or conflicting!) results are generated by two labs working independently.

      Importantly, the study found that electrode placement (and thus likely also errors inherent to the electrode placement reconstruction pipeline) was a key source of variability between datasets. To remedy this, they implemented a very sophisticated electrode reconstruction pipeline (involving two-photon tomography and multiple blinded data validators) in just one lab, and all brains were sliced and reconstructed in this one location. This is a fantastic approach for ensuring similar results within the IBL collaboration, but makes it unclear how much variance would have been observed if each lab had attempted to reconstruct their probe trajectories themselves using a mix of histology techniques from conventional brain slicing, to light sheet microscopy, to MRI imaging.

      This approach also raises a few questions. The use of standard procedures, pipelines, etc. is a great goal, but most labs are trying to do something unique with their setup. Bigger picture, shouldn't highly "significant" biological findings akin to the discovery of place cells or grid cells, be so clear and robust that they can be identified with different recording modalities and analysis pipelines?

      We agree, and hope that this work may help readers understand what effect sizes may be considered “clear and robust” from datasets like these. We certainly support the reviewer’s point that multiple approaches and modalities can help to confirm any biological findings, but we would contend that a clear understanding of the capabilities and limitations of each approach is valuable, and we hope that our paper helps to achieve this.

      Related to this, how many labs outside of the IBL collaboration have implemented the IBL pipeline for their own purposes? In what aspects do these other labs find it challenging to reproduce the approaches presented in the paper? If labs were supposed to perform this same experiment, but without coordinating directly, how much more variance between labs would have been seen? Obviously investigating these topics is beyond the scope of this paper. The current manuscript is well-written and clear as is, and I think it is a valuable contribution to the field. However, some additional discussion of these issues would be helpful.

      We thank the reviewer for raising this important issue. We know of at least 13 labs that have implemented the behavioral task software and hardware that we published in eLife in 2021, and we expect that over the next several years labs will also implement these analysis pipelines (note that it is considerably cheaper and faster to implement software pipelines than hardware). In particular, a major goal of the staff in the coming years is to continue and improve the support for pipeline deployment and use. However, our goal in this work, which we have aimed to state more clearly in the revised manuscript, was not so much to advocate that others adopt our pipeline, but instead to use our standardized approach as a means of assessing reproducibility under the best of circumstances (see lines 48-52): “A high level of reproducibility of results across laboratories when procedures are carefully matched is a prerequisite to reproducibility in the more common scenario in which two investigators approach the same high-level question with slightly different experimental protocols.”

      Further, a number of our findings are relevant to other labs regardless of whether they implement our exact pipeline, a modified version of our pipeline, or something else entirely. For example, we found probe targeting to be a large source of variability. Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Relatedly, we found that slight deviations in probe entry position can lead to samples from different populations of neurons. Although this took large cohort sizes to discover, knowledge of this discovery means that future experiments can plan for larger cohort sizes to allow for off-target trajectories, and can re-compute probe angle when the presence of blood vessels necessitates moving probes slightly. These points are now highlighted in the Discussion (lines 500-515).

      Second, the proportion of responsive neurons (a quantity often used to determine that a particular area subserves a particular function) sometimes failed to reproduce across labs. For example, for movement-driven activity in PO, UCLA reported an average change of 0 spikes/s, while CCU reported a large and consistent change (Figure 4d, rightmost panel, compare orange vs. yellow traces). This argues that neuron-to-neuron variability means that comparisons across labs require large cohort sizes. A small number of outlier neurons in a session can heavily bias responses. We anticipate that this problem will be remedied as tools for large scale neural recordings become more widely used. Indeed, the use of 4-shank instead of single-shank Neuropixels (as we used here) would have greatly enhanced the number of PO neurons we measured in each session. We have added new text to Results explaining this (lines 264-268):

      “We anticipate that the feasibility of even larger scale recordings will make lab-to-lab comparisons easier in future experiments; multi-shank probes could be especially beneficial for cortical recordings, which tend to be the most vulnerable to low cell counts since the cortex is thin and is the most superficial structure in the brain and thus the most vulnerable to damage. Analyses that characterize responses to multiple parameters are another possible solution (See Figure 7).”

      (2) The second goal of the study was to present a set of data curation standards (RIGOR) that could be applied widely across the field. This is a great idea, but its implementation needs to be improved if adoption outside of the IBL is to be expected. Here are three issues:

      (a) The GitHub repo for this project (https://github.com/int-brain-lab/paper-reproducible-ephys/) is nicely documented if the reader's goal is to reproduce the figures in the manuscript. Consequently, the code for producing the RIGOR statistics seems mostly designed for re-computing statistics on the existing IBL-formatted datasets. There doesn't appear to be any clear documentation about how to run it on arbitrary outputs from a spike sorter (i.e. the inputs to Phy).

      We agree that clear documentation is key for others to adopt our standards. To address this, we have added a section at the end of the README of the repository that links to a jupyter notebook (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb) that runs the RIGOR metrics on a user’s own spike sorted dataset. The notebook also contains a tutorial that walks through how to visually assess the quality of the raw and spike sorted data, and computes the noise level metrics on the raw data as well as the single cell metrics on the spike sorted data.

      (b) Other sets of spike sorting metrics that are more easily computed for labs that are not using the IBL pipeline already exist (e.g. "quality_metrics" from the Allen Institute ecephys pipeline [https://github.com/AllenInstitute/ecephys_spike_sorting/blob/main/ecephys_spike_sorting/modules/quality_metrics/README.md] and the similar module in the Spike Interface package [https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics.html]). The manuscript does not compare these approaches to those proposed here, but some of the same statistics already exist (amplitude cutoff, median spike amplitude, refractory period violation).

      There is a long history of researchers providing analysis algorithms and code for spike sorting quality metrics, and we agree that the Allen Institute’s ecephys code and the Spike Interface package are the current options most widely used (but see also, for example, Fabre et al. https://github.com/Julie-Fabre/bombcell). Our primary goal in the present work is not to advocate for a particular implementation of any quality metrics (or any spike sorting algorithm, for that matter), but instead to assess reproducibility of results, given one specific choice of spike sorting algorithm and quality metrics. That is why, in our comparison of yield across datasets (Fig 1F), we downloaded the raw data from those comparison datasets and re-ran them under our single fixed pipeline, to establish a fair standard of comparison. A full comparison of the analyses presented here under different choices of quality metrics and spike sorting algorithms would undoubtedly be interesting and useful for the field - however, we consider it to be beyond the scope of the present work. It is therefore an important assumption of our work that the result would not differ materially under a different choice of sorting algorithm and quality metrics. We have added text to the Discussion to clarify this limitation:

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      That said, we still intend for external users to be able to easily run our pipelines and quality metrics.

      (c) Some of the RIGOR criteria are qualitative and must be visually assessed manually. Conceptually, these features make sense to include as metrics to examine, but would ideally be applied in a standardized way across the field. The manuscript doesn't appear to contain a detailed protocol for how to assess these features. A procedure for how to apply these criteria for curating non-IBL data (or for implementing an automated classifier) would be helpful.

      We agree. To address this, we have provided a notebook that runs the RIGOR metrics on a user’s own dataset, and contains a tutorial on how to interpret the resulting plots and metrics (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb).

      Within this notebook there is a section focused on visually assessing the quality of both the raw data and the spike sorted data. The code in this section can be used to generate plots, such as raw data snippets or the raster map of the spiking activity, which are typically used to visually assess the quality of the data. In Figure 1 Supplement 2 we have provided examples of such plots that show different types of artifactual activity that should be inspected.

      Other Comments:

      (1) How did the authors select the metrics they would use to evaluate reproducibility? Was this selection made before doing the study?

      Our metrics were selected on the basis of our experience and expertise with extracellular electrophysiology. For example: some of us previously published on epileptiform activity and its characteristics in some mice (Steinmetz et al. 2017), so we included detection of that type of artifact here; and, some of us previously published detailed investigations of instability in extracellular electrophysiological recordings and methods for correcting them (Steinmetz et al. 2021, Windolf et al. 2024), so we included assessment of that property here. These metrics therefore represent our best expert knowledge about the kinds of quality issues that can affect this type of dataset, but it is certainly possible that future investigators will discover and characterize other quality issues.

      The selection of metrics was primarily performed before the study (we used these assessments internally before embarking on the extensive quantifications reported here), and in cases where we refined them further during the course of preparing this work, it was done without reference to statistical results on reproducibility but instead on the basis of manual inspection of data quality and metric performance.

      (2) Was reproducibility within-lab dependent on experimenter identity?

      We thank the reviewer for this question. We have addressed it in our response to R1 General comment 2, as follows:

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (lines 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) They note that UCLA and UW datasets tended to miss deeper brain region targets (lines 185-188) - they do not speculate why these labs show systematic differences. Were they not following standardized procedures?

      Thank you for raising this point. All researchers across labs were indeed following standardised procedures. We note that our statistical analysis of probe targeting coordinates and angles did not reveal a significant effect of lab identity on targeting error, even though we noted the large number of mis-targeted recordings in UCLA and UW to help draw attention to the appropriate feature in the figure. Given that these differences were not statistically significant, we can see how it was misleading to call out these two labs specifically. While the overall probe placement surface error and angle error both show no such systematic difference, the magnitude of surface error showed a non-significant tendency to be higher for samples in UCLA & UW, which, compounded with the direction of probe angle error, caused these probe insertions to land in a final location outside LP & PO.

      This shows how subtle differences in probe placement & angle accuracy can lead to compounded inaccuracies at the probe tip, especially when targeting deep brain regions, even when following standard procedures. We believe this is driven partly by the accuracy limit or resolution of the stereotaxic system, along with slight deviations in probe angle, occurring during the setup of the stereotaxic coordinate system during these recordings.

      We have updated the relevant text in lines 187-190 as follows, to clarify:

      “Several trajectories missed their targets in deeper brain regions (LP, PO), as indicated by gray blocks, despite the lack of significant lab-dependent effects in targeting as reported above. These off-target trajectories tended to have both a large displacement from the target insertion coordinates and a probe angle that unfavorably drew the insertions away from thalamic nuclei (Figure 2f).”

      (4) The authors suggest that geometrical variance (difference between planned and final identified probe position acquired from reconstructed histology) in probe placement at the brain surface is driven by inaccuracies in defining the stereotaxic coordinate system, including discrepancies between skull landmarks and the underlying brain structures. In this case, the use of skull landmarks (e.g. bregma) to determine locations of brain structures might be unreliable and provide an error of ~360 microns. While it is known that there is indeed variance in the position between skull landmarks and brain areas in different animals, the quantification of this error is a useful value for the field.

      We thank the reviewer for their thoughtful comment and are glad that they found the quantification of variance useful for the field.

      (5) Why are the thalamic recording results particularly hard to reproduce? Does the anatomy of the thalamus simply make it more sensitive to small errors in probe positioning relative to the other recorded areas?

      We thank the reviewer for raising this interesting question. We believe that they are referring to Figure 4: indeed when we analyzed the distribution of firing rate modulations, we saw some failures of reproducibility in area PO (bottom panel, Figure 4h). However, the thalamic nuclei were not, in other analyses, more vulnerable to failures in reproducibility. For example, in the top panel of Figure 4h, VisAM shows failures of reproducibility for modulation by the visual stimulus. In Fig. 5i, area CA1 showed a failure of reproducibility. We fear that the figure legend title in the previous version (which referred to the thalamus specifically) was misleading, and we have revised this. The new title is, “Neural activity is modulated during decision-making in five neural structures and is variable between laboratories.” This new text more accurately reflects that there were a number of small, idiosyncratic failures of reproducibility, but that these were not restricted to a specific structure. The new analysis requested by R1 (now in Figure 7) provides further reassurance of overall reproducibility, including in the thalamus (see Fig. 7a, right panels; lab identity could not be decoded from single neuron metrics, even in the thalamus).

      Reviewer #1 (Recommendations for the authors):

      (1) Figure font sizes and formatting are variable across panels and figures. Please streamline the presentation of results.

      Thank you for your feedback. We have remade all figures with the same standardized font sizes and formatting.

      (2) Please correct the noncontinuous color scales in Figures 3b and 3d.

      Thank you for pointing this out; we have fixed the color bar.

      (3) In Figures 5d and g, the error bars are described as: 'Error bands are standard deviation across cells normalised by the square root of the number of sessions in the region'. How does one interpret this error? It seems to be related to the standard error of the mean (std/sqrt(n)) but instead of using the n from which the standard deviation is calculated (in this case across cells), the authors use the number of sessions as n. If they took the standard deviation across sessions this would be the sem across sessions, and interpretable (as sem*1.96 is the 95% parametric confidence interval of the mean). Please justify why these error bands are used here and how they can be interpreted - it also seems like it is the only time these types of error bands are used.

      We agree. For clarity, we now use the standard error across cells; the error bars do not change dramatically either way.
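
      To make the quantity discussed above concrete, the following is a minimal Python sketch of a standard error of the mean and the corresponding parametric 95% interval (mean ± 1.96 × SEM). It is an illustration only, not the code used to produce Figure 5; the function name `sem_and_ci` and the example firing rates are hypothetical.

```python
import math
import statistics


def sem_and_ci(values, z=1.96):
    """Return (mean, SEM, (ci_low, ci_high)) for a sequence of numbers.

    The SEM is the sample standard deviation divided by sqrt(n), where n is
    the number of values the standard deviation was computed from.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)


# Hypothetical per-cell firing rates (Hz); not data from the study.
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sem, ci = sem_and_ci(rates)
```

      Using the same n for both the standard deviation and the square-root normalisation is what makes the resulting band interpretable as a confidence interval on the mean.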

      (4) It is difficult to understand what is plotted in Figures 5e,h, please unpack this further and clarify.

      Thank you for pointing this out. We have added additional explanation in the figure caption (See caption for Figure 5c) to explain the KS test.

      (5) In lines 198-201 the authors state that they were worried that Bonferroni correction with 5 criteria would be too lenient, and therefore used 0.01 as alpha. I am unsure whether the authors mean that they are correcting for multiple comparisons across features or areas. Either way, 0.01 alpha is exactly what a Bonferroni corrected alpha would be when correcting for either 5 features or 5 areas: 0.05/5=0.01. Or do they mean they apply the Bonferroni correction to the new 0.01 alpha: i.e., 0.01/5=0.002? Please clarify.

      Thank you, that was indeed written confusingly. We considered all tests and regions as a whole, so 7 tests * 5 regions = 35 tests, which would result in a very strong Bonferroni correction. Indeed, if one considers the different tests individually, the correction we apply from 0.05 to 0.01 can be considered as correcting for the number of regions, which we now highlight better. We apply no further corrections of any kind to our alpha=0.01. We clarified this in the manuscript in all relevant places (lines 205-208, 246, 297-298, and 726-727).
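
      The arithmetic behind the two options can be sketched in a few lines of Python. The counts (7 tests, 5 regions) and the family-wise level of 0.05 are taken from the response above; the helper name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


# Correcting for the 5 regions only gives the alpha used in the manuscript.
alpha_per_region = bonferroni_alpha(0.05, 5)

# Correcting for all 7 tests x 5 regions = 35 comparisons is much stricter.
alpha_all_tests = bonferroni_alpha(0.05, 7 * 5)
```

      Here `alpha_per_region` equals 0.01, whereas `alpha_all_tests` is roughly 0.0014, which illustrates why the full correction was regarded as very strong.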

      (6) Did the authors take into account how many times a probe was used/how clean the probe was before each recording. Was this streamlined between labs? This can have an effect on yield and quality of recording.

      We appreciate the reviewer highlighting the potential impact of probe use and cleanliness on recording quality and yield. While we did not track the number of times each probe was used, we ensured that all probes were cleaned thoroughly after each use using a standardized cleaning protocol (Section 16: Cleaning the electrode after data acquisition in Appendix 2: IBL protocol for electrophysiology recording using Neuropixels probe). We acknowledge that tracking the specific usage history of each probe could provide additional insights, but unfortunately we did not track this information for this project. In prior work the re-usability of probes has been quantified, showing insignificant degradation with use (e.g. Extended Data Fig 7d from Jun et al. 2017).

      (7) Figure 3, Supplement1: DY_013 missed DG entirely? Was this included in the analysis?

      Thank you for this question. We believe the reviewer is referring to the lack of a prominent high-amplitude LFP band in this mouse, and lack of high-quality sorted units in that region. Despite this, our histology did localize the recording trajectory to DG. This recording did pass our quality control criteria overall, as indicated by the green label, and was used in relevant analyses.

      The lack of normal LFP features and neuron yield might reflect the range of biological variability (several other sessions also have relatively weak DG LFP and yield, though DY_013 is the weakest), or could reflect some damage to the tissue, for example as caused by local bleeding. Because we could not conclusively identify the source of this observation, we did not exclude it.

      (8) Given that the authors argue for using the MTNN over GLMs, it would be useful to know exactly how much better the MTNN is at predicting activity in the held-out dataset (shown in Figure 7, Supplement 1). It looks like a very small increase in prediction performance between MTNN and GLMs, is it significantly different?

      The average variance explained on the held-out dataset, as shown in Figure 8–Figure Supplement 1 Panel B, is 0.065 for the GLMs and 0.071 for the MTNN. As the reviewer correctly noted, this difference is not significant. However, one of the key advantages of the MTNN over GLMs lies in its flexibility to easily incorporate covariates, such as electrophysiological characteristics or session/lab IDs, directly into the analysis. This feature is particularly valuable for assessing effect sizes and understanding the contributions of various factors.
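
      For readers unfamiliar with the quantity being compared, the sketch below shows one common definition of variance explained (the coefficient of determination, R^2 = 1 - SS_res / SS_tot) on held-out data. We assume this standard definition for illustration; the exact computation used for the MTNN and GLM comparison is described in the manuscript, and the function name and example values here are hypothetical.

```python
def variance_explained(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out responses and model predictions.
held_out = [1.0, 2.0, 3.0, 4.0]
prediction = [1.5, 2.0, 3.0, 3.5]
r2 = variance_explained(held_out, prediction)
```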

      (9) In line 723: why is the threshold for mean firing rate for a unit to be included in the MTNN results so high (>5Hz), and how does it perform on units with lower firing rates?      

      We thank the reviewer for pointing this out. The threshold for including units with a mean firing rate above 5 Hz was set because most units with firing rates below this threshold were silent in many trials, and reducing the number of units helped keep the MTNN training time reasonable. Based on this comment, we ran the MTNN experiments including all units with firing rates above 1 Hz, and the results remained consistent with our previous conclusions (Figure 8). Crucially, the leave-one-out analysis consistently showed that lab and session IDs had effect sizes close to zero, indicating that both within-lab and between-lab random effects are small and comparable.
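
      The inclusion rule described above amounts to a simple filter on mean firing rate. The following Python sketch illustrates it under the assumption that the mean rate is the total spike count divided by the recording duration; the function name, unit identifiers, and counts are hypothetical and do not come from the MTNN code.

```python
def select_units(spike_counts, duration_s, min_rate_hz):
    """Return IDs of units whose mean firing rate exceeds min_rate_hz."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [
        unit_id
        for unit_id, n_spikes in spike_counts.items()
        if n_spikes / duration_s > min_rate_hz
    ]


# Hypothetical spike counts for three units in a 1200 s session.
counts = {"u1": 12000, "u2": 3000, "u3": 600}
duration = 1200.0

strict = select_units(counts, duration, 5.0)   # original 5 Hz threshold
relaxed = select_units(counts, duration, 1.0)  # relaxed 1 Hz threshold
```

      Lowering the threshold admits more units (here `u2`, at 2.5 Hz), at the cost of longer training time.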

      Reviewer #2 (Recommendations for the authors):

      (1) Most of the more major issues were already listed in the above comments. The strongest recommendation for additional work would be to improve the description and implementation of the RIGOR statistics such that non-IBL labs that might use Neuropixels probes but not use the entire IBL pipeline might be able to apply the RIGOR framework to their own data.

      We thank the reviewer for highlighting the importance of making the RIGOR statistics more accessible to a broader audience. We agree that improving the description and implementation of the RIGOR framework is essential if non-IBL labs using Neuropixels probes are to adopt it. To address this, we created a jupyter notebook with step-by-step guidance that does not depend on the IBL pipeline. This tool (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/develop/RIGOR_script.ipynb) is publicly available through the repository, accompanied by example datasets and usage tutorials.

      (2) Table 1: How are qualitative features like "drift" defined? Some quantitative statistics like "presence ratio" (the fraction of the dataset where spikes are present) already exist in packages like ecephys_spike_sorting. Who measured these qualitative features? What are the best practices for doing these qualitative analyses?

      At the probe level, we compute the estimate of the relative motion of the electrodes to the brain tissue at multiple depths along the electrode. We overlay the drift estimation over a raster plot to detect sharp displacements as a function of time. Quantitatively, the drift is the cumulative absolute electrode motion estimated during spike sorting (µm). We clarified the corresponding text in Table 1.

      The qualitative assessments were carried out by IBL staff and experimentalists. We have now provided code to run the RIGOR metrics along with an embedded tutorial, to complement the supplemental figures we have shown about qualitative metric interpretation.
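
      The two quantitative definitions mentioned in this exchange can be sketched as follows. This is an illustration under stated assumptions, not the RIGOR or ecephys_spike_sorting implementation: presence ratio is taken as the fraction of equal-width time bins containing at least one spike (the bin count is an arbitrary choice), and cumulative drift is taken as the sum of absolute changes between successive motion estimates. Both function names are hypothetical.

```python
def presence_ratio(spike_times, t_start, t_stop, n_bins=100):
    """Fraction of equal-width time bins that contain at least one spike."""
    if t_stop <= t_start or n_bins < 1:
        raise ValueError("need t_stop > t_start and n_bins >= 1")
    bin_width = (t_stop - t_start) / n_bins
    occupied = set()
    for t in spike_times:
        if t_start <= t < t_stop:
            # Clamp to guard against floating-point rounding at the upper edge.
            occupied.add(min(int((t - t_start) / bin_width), n_bins - 1))
    return len(occupied) / n_bins


def cumulative_drift(displacement_um):
    """Sum of absolute changes between successive motion estimates (um)."""
    return sum(abs(b - a) for a, b in zip(displacement_um, displacement_um[1:]))


# Hypothetical spike times (s) and motion estimates (um).
ratio = presence_ratio([0.5, 1.5, 1.7, 8.2], t_start=0.0, t_stop=10.0, n_bins=10)
drift = cumulative_drift([0.0, 2.0, -1.0, -1.0, 3.0])
```

      In this toy example the spikes occupy 3 of 10 bins, and the electrode moves a total of 9 µm over the recording.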

      (3) Table 1: What are the units for the LFP derivative?

      We thank the reviewer for noting that the unit was missing. The unit (decibel per unit of space) is now in the table.

      (4) Table 1: For "amplitude cutoff", the table says that "each neuron must pass a metric". What is the metric?

      We have revised the table to include this information. This metric was designed to detect potential issues in amplitude distributions caused by thresholding during deconvolution, which could result in missed spikes. There are quantitative thresholds on the distribution of the low tail of the amplitude histogram relative to the high tail, and on the relative magnitude of the bins in the low tail. We now reference the methods text from the table, which includes a more extended description and gives the specific threshold numbers. Also, the metric and thresholds are more easily understood with graphical assistance; see the IBL Spike Sorting Whitepaper for this (Fig. 17 in that document and nearby text; https://doi.org/10.6084/m9.figshare.19705522.v4). This reference is now also cited in the text.

      (5) Figure 2: In panel A, the brain images look corrupted.

      Thanks; in the revised version we have changed the filetype to improve the quality of the panel image.

      (6) Figure 7: In panel D, make R2 into R^2 (with a superscript)

      Panel D y-axis label has been revised to include superscript (note that this figure is now Figure 8).

      Works Cited

      Julie M.J. Fabre, Enny H. van Beest, Andrew J. Peters, Matteo Carandini, and Kenneth D. Harris. Bombcell: automated curation and cell classification of spike-sorted electrophysiology data, July 2023. URL https://doi.org/10.5281/zenodo.8172822.

      James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, Brian Barbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çağatay Aydın, Mladen Barbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy, Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez, Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich, Cyrille Rossant, Wei-lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, Christof Koch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232–236, Nov 2017. ISSN 1476-4687. doi: 10.1038/nature24636. URL https://doi.org/10.1038/nature24636.

      Simon Musall, Xiaonan R. Sun, Hemanth Mohan, Xu An, Steven Gluf, Shu-Jing Li, Rhonda Drewes, Emma Cravo, Irene Lenzi, Chaoqun Yin, Björn M. Kampa, and Anne K. Churchland. Pyramidal cell types drive functionally distinct cortical activity patterns during decision-making. Nature Neuroscience, 26(3):495–505, Mar 2023. ISSN 1546-1726. doi: 10.1038/s41593-022-01245-9. URL https://doi.org/10.1038/s41593-022-01245-9.

      Ivana Orsolic, Maxime Rio, Thomas D Mrsic-Flogel, and Petr Znamenskiy. Mesoscale cortical dynamics reflect the interaction of sensory evidence and temporal expectation during perceptual decision-making. Neuron, 109(11):1861–1875.e10, April 2021.

      Hyeong-Dong Park, Stéphanie Correia, Antoine Ducorps, and Catherine Tallon-Baudry. Spontaneous fluctuations in neural responses to heartbeats predict visual detection. Nature Neuroscience, 17(4):612–618, Apr 2014. ISSN 1546-1726. doi: 10.1038/nn.3671. URL https://doi.org/10.1038/nn.3671.

      Lorenzo Posani, Shuqi Wang, Samuel Muscinelli, Liam Paninski, and Stefano Fusi. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy. bioRxiv, 2024. doi: 10.1101/2024.11.15.623878. URL https://www.biorxiv.org/content/early/2024/12/09/2024.11.15.623878.

      Nicholas A. Steinmetz, Christina Buetfering, Jerome Lecoq, Christian R. Lee, Andrew J. Peters, Elina A. K. Jacobs, Philip Coen, Douglas R. Ollerenshaw, Matthew T. Valley, Saskia E. J. de Vries, Marina Garrett, Jun Zhuang, Peter A. Groblewski, Sahar Manavi, Jesse Miles, Casey White, Eric Lee, Fiona Griffin, Joshua D. Larkin, Kate Roll, Sissy Cross, Thuyanh V. Nguyen, Rachael Larsen, Julie Pendergraft, Tanya Daigle, Bosiljka Tasic, Carol L. Thompson, Jack Waters, Shawn Olsen, David J. Margolis, Hongkui Zeng, Michael Häusser, Matteo Carandini, and Kenneth D. Harris. Aberrant cortical activity in multiple GCaMP6-expressing transgenic mouse lines. eNeuro, 4(5), 2017. doi: 10.1523/ENEURO.0207-17.2017. URL https://www.eneuro.org/content/4/5/ENEURO.0207-17.2017.

      Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786):266–273, Dec 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-1787-x. URL https://doi.org/10.1038/s41586-019-1787-x.

      Nicholas A. Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, Susu Chen, Jennifer Colonell, Richard J. Gardner, Bill Karsh, Fabian Kloosterman, Dimitar Kostadinov, Carolina Mora-Lopez, John O’Callaghan, Junchol Park, Jan Putzeys, Britton Sauerbrei, Rik J. J. van Daal, Abraham Z. Vollan, Shiwei Wang, Marleen Welkenhuysen, Zhiwen Ye, Joshua T. Dudman, Barundeb Dutta, Adam W. Hantman, Kenneth D. Harris, Albert K. Lee, Edvard I. Moser, John O’Keefe, Alfonso Renart, Karel Svoboda, Michael Häusser, Sebastian Haesler, Matteo Carandini, and Timothy D. Harris. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science, 372(6539):eabf4588, 2021. doi: 10.1126/science.abf4588. URL https://www.science.org/doi/abs/10.1126/science.abf4588.

      Charlie Windolf, Han Yu, Angelique C. Paulk, Domokos Meszéna, William Muñoz, Julien Boussard, Richard Hardstone, Irene Caprara, Mohsen Jamali, Yoav Kfir, Duo Xu, Jason E. Chung, Kristin K. Sellers, Zhiwen Ye, Jordan Shaker, Anna Lebedeva, Manu Raghavan, Eric Trautmann, Max Melin, João Couto, Samuel Garcia, Brian Coughlin, Csaba Horváth, Richárd Fiáth, István Ulbert, J. Anthony Movshon, Michael N. Shadlen, Mark M. Churchland, Anne K. Churchland, Nicholas A. Steinmetz, Edward F. Chang, Jeffrey S. Schweitzer, Ziv M. Williams, Sydney S. Cash, Liam Paninski, and Erdem Varol. Dredge: robust motion correction for high-density extracellular recordings across species. bioRxiv, 2023. doi: 10.1101/2023.10.24.563768. URL https://www.biorxiv.org/content/early/2023/10/29/2023.10.24.563768.

    1. eLife Assessment

      This study provides valuable insights into the evolutionary histories and cellular infection responses of two Salmonella Dublin genotypes. While the evidence is compelling, a more phylogenetically diverse bacterial collection would enhance the findings. This research is relevant to scientists studying Salmonella and gastroenteritis-related pathogens.

    2. Reviewer #1 (Public review):

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long-read sequencing on a subset of isolates (ST10 and ST74), and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels. Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors were likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. The methodology included in both approaches was sound and written in sufficient detail, and data analysis was performed with rigour. Source data were fully presented and accessible to readers.

      Comments on revised version:

      The authors have addressed all the points raised by the reviewer. The manuscript is now much enhanced in clarity and accuracy. The re-written Discussion is more relevant and brings in comparison with other invasive Salmonella serotypes.

      Comments:

      In light of the metadata supplied in this revision, for Australian isolates, all human cases of ST74 (n=7) were from faeces (presumably from gastroenteritis), while 18/40 of ST10 were from invasive specimens (blood and abscess). This may contradict the manuscript's findings and discussion on the different experimental phenotypes of the two STs, with ST74 showing more replication in macrophages and being potentially more invasive. The reviewer therefore suggests that the authors mention this disparity in the Discussion and discuss possible underlying reasons. This can strengthen the authors' rationale for further in vivo studies.

    3. Reviewer #2 (Public review):

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understand its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      The manuscript consists of two separate but interlinked investigations: genomic epidemiology and virulence assessment of Salmonella Dublin. ST10 dominates the epidemiological landscape of S. Dublin, while ST74 was uncommonly isolated. Detailed genomic epidemiology of ST10 unfolded the evolutionary history of this common genotype, highlighting clonal expansions linked to each distinct geography. Notably, North American ST10 was associated with more antimicrobial resistance compared to others. The authors also performed long-read sequencing on a subset of isolates (ST10 and ST74) and uncovered a novel recombinant virulence plasmid in ST10 (IncX1/IncFII/IncN). Separately, the authors performed cell invasion and cytotoxicity assays on the two S. Dublin genotypes, showing differential responses between the two STs. ST74 replicates better intracellularly in macrophages compared to ST10, but both STs induced comparable cytotoxicity levels.

      Comparative genomic analyses between the two genotypes showed certain genetic content unique to each genotype, but no further analyses were conducted to investigate which genetic factors were likely associated with the observed differences. The study provides a comprehensive and novel understanding of the evolution and adaptation of two S. Dublin genotypes, which can inform public health measures. 

      The methodology included in both approaches was sound and written in sufficient detail, and data analysis was performed with rigour. Source data were fully presented and accessible to readers. Certain aspects of the manuscript could be clarified and extended to improve the manuscript. 

      (1) For epidemiology purposes, it is not clear which human diseases were associated with the genomes included in this manuscript. This is important since S. Dublin can cause invasive bloodstream infections in humans. While such information may be unavailable for public sequences, this should be detailed for the 53 isolates sequenced for this study, especially for isolates selected to perform experiments in vitro.

      Thank you for the suggestion. We have added the sample type for the 53 isolates sequenced for this study. These additional details have been added to Supplementary Tables 1, 4, 9 and 10.

      (2) The major AMR plasmid described in S. Dublin was the IncC plasmid associated with clonal expansion in North America. While this plasmid is not found in the Australian isolates sequenced in this study, the reviewer finds that it is still important to include its characterization, since it carries blaCMY-2 and was stably inherited in ST10 clade 5. If the plasmid structure is already published, the authors should include the accession number in the Main Results.

      We have provided accessions and context for two of the IncC hybrid plasmids that have been previously reported in the literature in the Introduction. The text now reads:

      “These MDR S. Dublin isolates all type as sequence type 10 (ST10), and the AMR determinants have been demonstrated to be carried on an IncC plasmid that has recombined with a virulence plasmid encoding the spvRABCD operon (12,16,18,19).  This has resulted in hybrid virulence and AMR plasmids circulating in North America including a 329kb megaplasmid with IncX1, IncFIA, IncFIB, and IncFII replicons (isolate CVM22429, NCBI accession CP032397.1) (12,16) and a smaller hybrid plasmid 172,265 bases in size with an IncX1 replicon (isolate N13-01125, NCBI accession KX815983.1) (19).”

      Further characterisation of the IncA/C plasmid circulating in North America was beyond the scope of this study.

      (3) The reviewer is concerned that the multiple missing annotations in the plasmid structures in Supplementary Figures 5 & 6, and in the genetic content unique to ST10 and ST74, were due to insufficient annotation by Prokka. I would recommend the authors use another annotation tool, such as Bakta (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8743544/) for plasmid annotation, and reconstruction of the pangenome described in Supplementary Figure 10. Since the recombinant virulence plasmid in ST10 is a novel one, I would recommend putting Supplementary Figure 5 as a main figure, with better annotations to show the virulence region, plasmid maintenance/replication, and possible conjugation cluster.

      In the supplementary figures of the plasmids, we sought to highlight key traits of interest on the plasmids, namely plasmid replicons, antimicrobial resistance and heavy metal resistance (Supplementary Figure 5) and virulence genes (Supplementary Figure 6). The accessions of publicly available isolates are included to provide references for characterised plasmids such as the S. Dublin virulence plasmid (NCBI accession: CP001143).

      For the potentially hybrid plasmid with IncN/IncX1/IncFII reported in Supplementary Figure 6, we have undertaken additional analyses of the two Australian isolates, reannotating them with Bakta, which provides more detailed annotations.

      We have added new text to the methods which reads as: 

      “The final genome assemblies were confirmed as S. Dublin using SISTR and annotated using both Prokka v1.14.6 (69) for consistency with the draft genome assemblies and Bakta v1.10.1 (93) which provides for more detailed annotations (Supplementary Table 13). Both Prokka and Bakta annotations were in agreement for AMR, HMR and virulence genes, with Bakta annotating between 3 and 7 additional CDS which were largely ‘hypothetical protein’.”

      For the pangenome analysis of the seven ST74 and ten ST10 isolates, we have continued to use the Prokka annotated draft genome assemblies for input to Panaroo. 

      (4) The authors are lauded for the use of multiple strains of ST10 and ST74 in the in vitro experiment. While results for ST74 were more consistent, readouts from ST10 were more heterogeneous (Figures 5, 6). This is interesting as the tested ST10 were mostly clade 1, so ST10 was, as expected, of lower genetic diversity compared to tested ST74 (partly shown in Figure 1D). Could the authors confirm this by constructing an SNP table separately for tested ST10 and ST74? Additionally, the tested ST10 did not represent the phylogenetic diversity of the global epidemiology, and this limitation should be reflected in the Discussion.

      In response to the reviewer’s comments, we have provided a detailed SNP table (Supplementary Table 12) to further clarify the genetic diversity within the tested ST10 and ST74 strains. 
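
      For readers unfamiliar with how such an SNP table is derived, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a core-genome alignment is already available as equal-length sequences; the isolate names, the toy sequences and the helper names `snp_distance` and `pairwise_snp_table` are hypothetical and introduced here only for illustration.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-?"):
    """Count aligned positions where two sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (any character in `ignore`) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_snp_table(alignment):
    """Build a symmetric table of pairwise SNP distances.

    `alignment` maps isolate name -> aligned sequence.
    Returns a nested dict: table[name_a][name_b] -> SNP count.
    """
    names = sorted(alignment)
    table = {name: {name: 0} for name in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        table[a][b] = d
        table[b][a] = d
    return table


# Toy alignment with hypothetical isolate names (not data from the study).
toy_alignment = {
    "ST10_a": "ACGTACGTAC",
    "ST10_b": "ACGTACGTAT",
    "ST74_a": "ACGAACCTNT",
}
snp_table = pairwise_snp_table(toy_alignment)
```

      In practice a separate table would be built for each genotype by passing only the ST10 or only the ST74 sequences, which is what the reviewer requested.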

      Additionally, we have expanded on the limitation regarding the phylogenetic diversity of the ST10 isolates in the Discussion, highlighting how the strains used in the in vitro experiments may not fully represent the global epidemiological diversity of S. Dublin ST10. The new text now reads:

      “This study has limitations, including a focus on ST10 isolates from clade 1, which do not represent global phylogenetic diversity. Nonetheless, our pangenome analysis identified >900 uncharacterised genes unique to ST74, offering potential targets for future research. Another limitation is the geographic bias in available genomes, with underrepresentation from Asia and South America. This reflects broader disparities in genomic research resources but may improve as public health genomics capacity expands globally.”

      (5) The comparative genomics between ST10 and ST74 can be further improved to allow more interpretation of the experiments. Why were only SPI-1, 2, 6, and 19 included in the search for virulome, how about other SPIs? ST74 lacks SPI-19 and has truncated SPI-6, so what would explain the larger genome size of ST74? Have the authors screened for other SPIs using more well-annotated databases or references (S. Typhi CT18 or S. Typhimurium ST313)? The mismatching between in silico prediction of invasiveness and phenotypes also warrants a brief discussion, perhaps linked to bigger ST74 genome size (as intracellular lifestyle is usually linked with genome degradation).

      Systematic screening for SPIs with detailed reporting on individual genes and known effectors is still an area of development in Salmonella comparative genomics. In our characterisation of the virulome in this S. Dublin dataset we decided to focus on SPI-1, SPI-2, SPI-6 and SPI-19 as these had been identified in previous studies and were considered to be most likely linked to the invasive phenotype of S. Dublin. We thought the truncation of SPI-6 and lack of SPI-19 in ST74 compared to the ST10 isolates would provide a basis to explore genomic differences in the two genotypes, with the screening for individual genes on each SPI reported in Supplementary Figure 7 and Supplementary Table 9.

      We have expanded upon the mismatch between the in silico prediction of invasiveness and the phenotypes in the Discussion. We now explore the increased genome size and intracellular replication of the ST74 population. We hypothesise that invasiveness has not been studied as thoroughly in zoonotic iNTS as in human-adapted iNTS and S. Typhi, and that the increased genome content may be required for survival in different host species. The new text now reads:

      “Our phenotypic data demonstrated a striking difference in replication dynamics between ST10 and ST74 populations in human macrophages. ST74 isolates replicated significantly over 24 hours, whereas ST10 isolates were rapidly cleared after 9 hours of infection. ST74 induced significantly less host cell death during the early-mid stage of macrophage infection, supported by limited processing and release of IL-1ß at 9 hpi. While NTS are generally potent inflammasome activators (60), most supporting data come from laboratory-adapted S. Typhimurium strains. Our findings suggest that ST74 isolates may employ immune evasion mechanisms to avoid host recognition and activation of cell death signaling in early infection stages. Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection. Consistent with this, we observed comparable cytotoxicity between ST10 and ST74 isolates at 24 hpi, suggesting ST74 induces cell death via alternative mechanisms once intracellular bacterial numbers are unsustainable. Further research is needed to identify genomic factors underpinning these observations.”

      (6) On the epidemiology scale, ST10 is more successful, perhaps due to its ongoing adaptation to replication inside GI epithelial cells, favouring shedding. ST74 may tend to cause more invasive disease and less transmission via fecal shedding. The presence of T6SS in ST10 also can benefit its competition with other gut commensals, overcoming gut colonization resistance. The reviewer thinks that these details should be more clearly rephrased in the Discussion, as the results highly suggested different adaptations of two genotypes of the same serovar, leading to different epidemiological success.

      We thank the reviewer for highlighting that we could rephrase this important point. We have added additional text in the Discussion to better interpret the differences in the two genotypes of S. Dublin and how these relate to differing epidemiological success. The new text now reads:

      “While machine learning predicted lower invasiveness for ST74 compared to ST10, the increased genomic content of ST74 may support higher replication in macrophages. We speculate that increased intracellular replication could enhance systemic dissemination, though this requires in vivo validation. Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage. Collectively, these findings highlight phenotypic differences between S. Dublin populations ST10 and ST74. Enhanced intra-macrophage survival of ST74 could promote invasive disease, whereas the prevalence of ST10 may relate to better intestinal adaptation and enhanced faecal shedding. In vivo models are needed to test this hypothesis. Interestingly, the absence of SPI-19 in ST74, which encodes a T6SS, may reflect adaptation to enhanced replication in macrophages. SPI-19 has been linked to intestinal colonisation in poultry (23,56) and mucosal virulence in mice (56). It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts. These findings highlight important knowledge gaps in zoonotic NTS host-pathogen interactions and drivers of emerging invasive NTS lineages with broad host ranges.”

      Reviewer #2 (Public review): 

      This is a comprehensive analysis of Salmonella Dublin genomes that offers insights into the global spread of this pathogen and region-specific traits that are important to understanding its evolution. The phenotyping of isolates of ST10 and ST74 also offers insights into the variability that can be seen in S. Dublin, which is also seen in other Salmonella serovars, and reminds the field that it is important to look beyond lab-adapted strains to truly understand these pathogens. This is a valuable contribution to the field. The only limitation, which the authors also acknowledge, is the bias towards S. Dublin genomes from high-income settings. However, there is no selection bias; this is simply a consequence of publicly available sequences.

      Reviewer #1 (Recommendations for the authors): 

      (1) The Abstract did not summarize the main findings of the study. The authors should rewrite to highlight the key findings in genomic epidemiology (low AMR generally, novel plasmid of which Inc type, etc.) and the in vitro experiments. The findings clearly illustrate the differing adaptations of the two genotypes. Suggest to omit 'economic burden' and 'livestock' as this study did not specifically address them.

      We agree with the Reviewer and have re-written the abstract to directly reflect the major outcomes of the research. We have also deleted wording such as ‘livestock’, ‘economic burden’ and ‘One Health’ as we did not specifically address these issues as highlighted by the Reviewer. 

      (2) Figure 2: The MCC tree should include posterior support in major internal nodes. The current colour scheme is also confusing to readers (columns 1, 2). Suggest to revise and include additional key information as columns: major AMR genes (blaCMY-2, strAB, floR) and mer locus, so this info can be visualized in the main figure. 

      Thank you for your valuable feedback. We have revised Figure 2 with the MCC tree to include posterior support on the internal nodes. We have also amended the figure legend to explain the additional coloured internal nodes. We have also amended the heatmap in Figure 2 to include additional white space between the columns to make it easier for the readers to distinguish. We didn’t change the colours in this figure as we have used the same colours throughout for the different traits reported in this study. Further, we chose to keep the AMR profiles reported in Figure 2 at the level of susceptible, resistant or MDR. This was done to convey the overview of the AMR profiles, and we provide detail on the AMR and HMR determinants in the Supplementary Figures and Tables.

      (3) The manuscript title is not informative, as it did not study the 'dynamics' of the two genotypes. Suggest to revise the study title along the lines of main results.

      Thank you for the feedback on the title. We have amended this to better reflect the main findings of the study, and it now reads as “Distinct adaptation and epidemiological success of different genotypes within Salmonella enterica serovar Dublin”

      (4) The co-occurrence of AMR and heavy metal resistance genes (like mer) are quite common in Salmonella and E. coli. This is not a novel finding. The reviewer would suggest shortening the details related to heavy metal resistance in Results and Discussion, to make the writing more streamlined. 

      In line with the Reviewer comments, we have shortened the details in the Results and Discussion on the co-occurrence of AMR and HMR.  

      (5) L185: missing info after n=82. 

      This has been revised to now read as “n=82 from Canada”. 

      (6) I think Vi refers to the capsular antigen, not flagella. Please double-check this.

      Thank you for highlighting this mistake. We have revised all instances.

      (7) L252-253: which statistic was used to state 'no association'. Also, there is no evidence presented to support 'no fitness cost associated with resistance and virulence."

      We have removed this sentence.

      (8) L320: Figure 6F is a scatterplot, not PCA. Please confirm.

      The reviewer is correct, this is in fact a scatterplot. We have amended the figure legend and text.

      (9) For Discussion, it would be helpful to compare the phenotype findings with that of other invasive Salmonella like Typhi or Typhimurium ST313.

      Thank you for noting this. We had alluded to findings from ST313 but have now expanded the text to include further comparisons to S. Typhimurium ST313 and added references for these within the Discussion. The additional text now reads:

      “Similar trends have been observed with S. Typhimurium ST313, which induces less inflammasome activation than ST19 during murine macrophage infection (61). This could facilitate increased replication and dissemination at later stages of infection.”

      “Invasiveness of S. enterica is often linked to genome degradation (4,62–64). However, this is mostly based on studies of human-adapted iNTS (ST313) and S. Typhi, leaving open the possibility that the additional genomic content of ST74 supports survival in diverse host species. An uncharacterised virulence factor may underlie this replication advantage.”

      (10) L440: no evidence for "successful colonization" of ST74. Actually, the findings suggested otherwise.

      Thank you for picking this up, we have amended the sentence to better reflect the findings. The amended text now reads as:

      “It’s possible that the efficient replication of ST74 in macrophages might compensate for the absence of SPI-19, relying instead on phagocyte uptake via M cells or dendritic cells. The larger pangenome of ST74 compared to ST10 could further enhance survival within hosts.”

      (11) L460-461: The data did not show an increasing trend of iNTS related to S. Dublin.

      Thank you for identifying this. This sentence has been revised accordingly and now reads as:

      “While the data did not indicate an increasing trend of iNTS associated with S. Dublin, the potential public health risk of this pathogen suggests it may still warrant considering it a notifiable disease, similar to typhoid and paratyphoid fever.”

      (12) L465: Data were not analyzed explicitly in the context of animal vs. human. Suggest omitting 'One Health' from the conclusion.

      Thank you for the suggestion. We have omitted “One Health” from the conclusion.

      (13) L500: Was the alignment not checked for recombination using Gubbins? The approach here is inconsistent with the method described in the subtree selected for BEAST analysis (L546).

      We have now applied Gubbins to the phylogenetic tree constructed using IQTREE, and the methods and results have been updated accordingly.

      (14) What was the output of TempEst? Correlation or R² value?

      We have now included the R² value from TempEst and reported this in the manuscript.
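
      As background on the statistic being requested: a temporal-signal check of this kind regresses each tip's root-to-tip genetic distance on its sampling date, and R² summarises how much of the variation in distance is explained by time. The sketch below is a generic illustration of that regression, not TempEst's implementation and not the authors' analysis; the function name `root_to_tip_r2` and the example dates and distances are hypothetical.

```python
def root_to_tip_r2(dates, distances):
    """Ordinary least-squares regression of root-to-tip distance on sampling date.

    Returns (slope, r_squared). The slope is a crude estimate of the
    substitution rate per unit time; r_squared is the proportion of
    variance in distance explained by sampling date.
    """
    n = len(dates)
    if n != len(distances):
        raise ValueError("dates and distances must have the same length")
    if n < 3:
        raise ValueError("need at least three tips for a meaningful fit")

    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must each vary across tips")

    slope = sxy / sxx
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, r_squared


# Hypothetical, perfectly clock-like toy data (not values from the study).
example_dates = [2000.0, 2005.0, 2010.0, 2015.0]
example_distances = [0.0010, 0.0015, 0.0020, 0.0025]
example_slope, example_r2 = root_to_tip_r2(example_dates, example_distances)
```

      An R² near 1 indicates strongly clock-like data; real datasets typically give lower values, which is why reporting the number helps readers judge the temporal signal behind the dated phylogeny.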

      (15) L556: marginal likelihood to allow evaluation of the best-fit model. Please rephrase to state this clearly.

      We have rephrased this in the manuscript to state this clearly.

    1. eLife Assessment

      This valuable study reports that epididymal proteins are required for embryogenesis after fertilization. The data presented are generally supportive of the conclusion and considered solid. This work will be of interest to reproductive biologists and andrologists.

    2. Reviewer #1 (Public review):

      Summary:

      The main observation that the sperm from CRISP proteins 1 and 3 KO lines are post-fertilization less developmentally competent is convincing. The data showing progressive acquisition of the sperm defects during epididymal transport and the exchange fluid studies showing the altered epididymal environment are important. However, the molecular characterization of the mechanism(s) that leads to these defects requires additional studies.

      Strengths:

      The generation of these double mutant mice is valuable for the field. Moreover, the fact that the double mutant line of Crisp 1 and 3 is phenotypically different from the Crisp 1 and 4 line suggests different functions of these epididymis proteins. The methods used to demonstrate that developmental defects are largely due to post-fertilization defects are also a considerable strength. The initial characterization showing that these sperm have altered intracellular Ca2+ levels and increased rates of DNA fragmentation is valuable. The increased fragmentation of control sperm DNA when exposed to mutant epididymal fluid is significant and an excellent platform for future studies.

      Weaknesses:

      The study is mechanistically incomplete because evidence of how these proteins alter the environment is not shown. What are the target(s) of these proteins that result in increased Ca2+?

    3. Reviewer #2 (Public review):

      Summary:

      The study highlights the role of CRISP1 and CRISP3, two epididymal proteins, in early embryo development through DNA integrity. The authors demonstrate that C1/C3 DKO sperm exhibit defects in the DNA integrity, probably due to Ca2+ dysregulation in the epididymis. However, direct evidence for this mechanism requires further experiments. The finding of the involvement of the epididymal environment in embryogenesis is significant, but some results on sperm fertilizing ability of C1/C3 DKO mice were similar to the previous report. Thus, this point raises concern about the perspective of novelty.

      Strengths:

      The authors demonstrate that CRISP1 and CRISP3 regulate Ca2+ in the epididymal fluid, and loss of CRISP1 and CRISP3 disrupts Ca2+ regulation in the epididymal fluid, leading to sperm DNA fragmentation and impaired embryonic development after fertilization. This proposed mechanism is both novel and intriguing, offering valuable insights into the epididymal control of sperm quality.

      Weaknesses:

      The evidence supporting the mechanism of CRISP1 and CRISP3 in calcium regulation within epididymis and its contribution to the sperm DNA damage remains limited.

      Major comments:

      The data provided in this manuscript (Figure 2A and B) appear to overlap with data in a previously published paper (PMID:33037689), despite differences in the duration of in vivo fertilization after mating. The results in both studies show similar findings, raising concerns about potential data redundancy.

      As shown in Figure 6A, while wild-type sperm were exposed to the epididymal fluid of C1/C3 DKO mice, the wild-type sperm exhibited DNA fragmentation. Additionally, when wild-type sperm were exposed to the epididymal fluid of wild-type mice with 10 mM Ca2+, DNA fragmentation is still observed. Therefore, the authors conclude that the DNA fragmentation in C1/C3 DKO sperm is due to the increased level of the Ca2+. However, the connection between the DNA damage in wild-type sperm exposed to the epididymal fluid of C1/C3 DKO mice and the increased levels of Ca2+ remains unclear. To clarify this, it is suggested that intracellular calcium levels in the wild type sperm should be analyzed before and after exposure to the epididymal fluid of C1/C3 DKO mice (or before and after adding 10 mM Ca2+ into wild-type fluid). Furthermore, the author should explain detailed information on epididymal fluid collection, because Ca2+ levels vary between different sections of the epididymis.

      In lines 321-323, the authors mention the selection system of the female reproductive tract that only allows high-quality sperm to reach the eggs (Cummins and Yanagimachi 1982), but this paper is not listed in the bibliography. It is important to ensure proper referencing.

      The discussion section is too long and difficult to follow well because there is redundancy of the results in many parts. It is recommended to shorten it by focusing only on relevant and important information.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main observation that the sperm from CRISP proteins 1 and 3 KO lines are post-fertilization less developmentally competent is convincing. However, the molecular characterization of the mechanism that leads to these defects and the temporal appearance of the defects requires additional studies.

      We thank the reviewer for the valuable comments. As requested, additional experiments were carried out to analyze both the molecular mechanisms and the temporal appearance of the observed defects. Our results showed that DNA integrity defects appear during epididymal maturation and/or storage (see Figure 5B), that the epididymal fluid contributes to sperm DNA fragmentation defects (see Figure 6A) and that these defects seem not to be due to an increase in oxidative stress (Figure 5C) but rather to a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis (Figure 6A,B).

      Strengths:

      The generation of these double mutant mice is valuable for the field. Moreover, the fact that the double mutant line of Crisp 1 and 3 is phenotypically different from the Crisp 1 and 4 line suggests different functions of these epididymis proteins. The methods used to demonstrate that developmental defects are largely due to post-fertilization defects are also a considerable strength. The initial characterization showing that these sperm have altered intracellular Ca<sup>2+</sup> levels and increased rates of DNA fragmentation is valuable.

      We thank the reviewer for the positive comments on our work.

      Weaknesses:

      The study is mechanistically incomplete because there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid, wherein during the passage through the epididymis the sperm become affected. Also, a direct demonstration of how the proteins in question cause or lead to DNA damage and increased Ca<sup>2+</sup> requires further characterization.

      The new experiments included in the revised version (see Figure 6A) showed that exposure of control WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid wherein the sperm become affected. In addition, new observations showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the finding that mutant sperm exhibit higher intracellular Ca<sup>2+</sup> levels (Figure 6B) but no higher levels of ROS, strongly support a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Reviewer #2 (Public Review):

      The authors showed that CRISP1 and CRISP3, secreted proteins in the epididymis, are required for early embryogenesis after fertilization through DNA integrity in cauda epididymal sperm. This paper is the first report showing that the epididymal proteins are required for embryogenesis after fertilization. However, some data in this paper (Table 1 and Figure 2A) are overlapped in a published paper (Curci et al., FASEB J, 34,15718-15733, 2020; PMID: 33037689). Furthermore, the authors did not address why the disruption of CRISP1/3 leads to these phenomena (the increased level of the intracellular Ca<sup>2+</sup> level and impaired DNA integrity in sperm) with direct evidence. Therefore, if the authors can address the following comments to improve the paper's novelty and clarification, this paper may be worthwhile to readers.

      We thank the reviewer for the constructive comments. Regarding the data included in Table 1 and Figure 2A, it is important to note that Table 1 includes previously unpublished data on embryo development for C1/C4 DKO mice, for which the embryo development data from C1/C3 DKO mice served as a simultaneous control. Figure 2A shows in vivo fertilization results at a short time after mating (4 h instead of 18 h), which have not been reported before either.

      Regarding studies to address why the disruption of CRISP1 and CRISP3 leads to defects in DNA integrity and Ca<sup>2+</sup> levels, we have carried out new experiments showing that mutant sperm do not exhibit higher levels of ROS (see Figure 5C), which argues against oxidative stress as the mechanism underlying mutant sperm defects. In addition, we found that DNA integrity defects develop during epididymal transit (Figure 5B) and that exposure of WT sperm to epididymal fluid from mutant mice leads to an increase in sperm DNA fragmentation levels (Figure 6A), confirming that the absence of CRISP1 and CRISP3 alters the epididymal fluid. Finally, our new results showing that WT sperm exposed to WT epididymal fluid in the presence of Ca<sup>2+</sup> also exhibit higher DNA fragmentation levels (Figure 6A), together with the higher intracellular Ca<sup>2+</sup> levels detected in mutant sperm (Figure 6B), strongly support a dysregulation in Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main cause of the DNA integrity defects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall comments:

      This manuscript investigates the mechanisms whereby the absence of the epididymal CRISP proteins 1 and 3 (Cysteine-Rich Secretory Proteins) causes infertility and lower embryo developmental rates. This strain's infertility seems to have a post-fertilization origin because the rates of in vivo fertilization are like the controls, but the development to the blastocyst stage is decreased. The results of this study show that (1) mutant sperm viability, progressive motility, and morphology are normal;

      (2) in vivo fertilization rates are comparable to controls, but embryo development is reduced;

      (3) in vitro fertilization studies found reduced fertilization rates and activation rates even in zona-free studies;

      (4) additional functional studies showed increased rates of DNA fragmentation and elevated Ca<sup>2+</sup> levels in mutant sperm.

      The results presented are credible and hint that the epididymis might play a role before and after fertilization and directly affect embryo development. However, the study is mechanistically incomplete, as there is no direct demonstration that the absence of these proteins alters the epididymal environment and fluid, wherein the passage through the epididymis the sperm become functionally defective, and whether mutant or control epididymal fluid or purified CRISP proteins can change, either reduce or overcome, respectively, the developmental competence of the control or mutant sperm and induce functional changes in the counterpart sperm. In summary, the main observation that the sperm from CRISP proteins 1 and 3 KO lines are post-fertilization less developmentally competent is significant and important, but the molecular characterization of the defects and the temporal appearance of defects requires additional studies.

      Specific comments:

      (1) Introduction.

      It is too long. The description of the function of the epididymis should be reduced. The functional properties of the Crisp genes should also be substantially shortened.

      As requested, the Introduction has been revised and descriptions of the epididymis and CRISP have been shortened.

      (2) Results.

      • Lines 140 to 142. Remove these initial lines. Start directly addressing the results of the C1/C3 strain, which is the mutant under consideration here. Referring to the C1/C4 results detracts from the focus of the study.

      As suggested by the reviewer, lines 140 to 142 have been removed.

      • Table 1. Move the two-cell embryo line to the top of the Table and place the Blastocyst line below it. This organization is the conventional method to present this type of data.

      As suggested, the order of the lines in Table 1 has been modified to align with the conventional presentation method.

      • Figures 1 and 2A and B data are solid and support the notion that enough sperm reach the site of fertilization, and that the sperm are defective in their capacity to support embryo development. Figures 2C and D have interesting data, although additional information would strengthen these results. The authors concluded that the sperm were defective in the epididymis. Where in the epididymis? These sperm were all from the cauda. Could the authors collect sperm from the upper portion of the cauda, or midportion, and compare if the defects manifest gradually?

      We appreciate this interesting and appropriate comment from the reviewer. All the studies in our work were carried out using sperm from the whole cauda epididymis, which is why we could not determine where in the epididymis defective sperm appear. In view of this, we have now conducted a comparative DNA fragmentation analysis between caput and cauda sperm from both genotypes. Our findings indicate that while cauda mutant sperm again showed higher DNA fragmentation levels than controls, caput sperm exhibited levels of DNA damage that were not significantly different between genotypes. These results confirm that defects in DNA appear following sperm passage through the epididymal caput, supporting the hypothesis that defects in DNA fragmentation manifest during sperm transit through the epididymis and/or during storage in the cauda. These results have been included in the revised version of the manuscript (see lines 235-240/Figure 5B of the revised version).

      • Figure 3 displays the results of in vitro fertilization, either COCs A-C or zona-free fertilization D-F. The results are important and differ from those produced by fertilization in vivo. The authors indicate that these confirm that the in vivo conditions overcome in vitro defects. However, this study never addresses the reason behind it. Is there less expression of proteins related to these functions, or the function of some proteins is compromised? The authors should advance a hypothesis or a rationale to explain these results.

      As indicated by the reviewer, our results showed differences between the fertilization rates observed for mutant mice under in vivo and in vitro conditions, as previously observed for all our single and multiple KO models (Da Ros et al., 2008; PMID: 18571638, Brukman et al., 2016; PMID: 26786179, Weigel Muñoz et al., 2018; PMID: 29481619, Ernesto et al., 2015; PMID: 26416967, Carvajal et al., 2018; PMID: 30510210) and also reported by other groups (Okabe et al., 2007; PMID: 17558467). In this regard, it has been well established that, although millions of sperm are ejaculated into the female tract, only a few (approximately one per oocyte) reach the fertilization site (i.e. the ampulla) (Cummins and Yanagimachi, 1982; doi:10.1002/mrd.1120050304). This efficient selection system by the female reproductive tract leads to the arrival of only the best sperm at the fertilization site, even in males with reproductive deficiencies, thereby “masking” sperm defects that can be detected under in vitro conditions due to the competition between good and bad quality sperm for the egg. Thus, although we cannot exclude other mechanisms to explain the commonly observed differences between in vivo and in vitro fertilization rates, our rationale is that the natural and efficient sperm selection process that takes place within the female reproductive tract masks sperm defects that can otherwise be detected under the competitive in vitro conditions. This explanation is now included in the discussion of the revised version of the manuscript (see lines 320-325).

      • Data in Figures 4 and 5 support the interpretation of the authors. However, it is necessary to establish the level of oxidative stress in the mutant sperm vs. the controls. Also, a question to explore is for how long does the sperm need to reside in that mutant environment to start undergoing the DNA fragmentation reported?

      In response to the valuable request from the reviewer regarding the level of oxidative stress in sperm, we have analyzed reactive oxygen species (ROS) levels in mutant and control epididymal sperm. Our results showed that ROS levels in mutant sperm were not higher than those observed in the control group, supporting the idea that mechanisms other than oxidative stress may be leading to the increased DNA fragmentation observed in mutant sperm. These results are now included in the revised version of the manuscript (see Figure 5C).

      Regarding the question on how long the sperm need to reside in the mutant environment to undergo DNA fragmentation, recent experiments carried out in response to this reviewer in which we analyzed DNA fragmentation in caput sperm led us to conclude that DNA fragmentation develops during epididymal transit and/or storage in the cauda. While these observations do not precisely define the time within the epididymis that sperm require for exhibiting DNA fragmentation, our additional new in vitro experiments analyzing the effect of epididymal fluids on sperm DNA integrity showed that exposure of WT sperm to DKO fluid for only 1 hr already leads to an increase in DNA fragmentation (see Figure 6A of the revised manuscript), suggesting that sperm do not need long periods within the mutant environment to be affected.

      (3) The length of the Discussion section should be shortened, especially by not recapitulating data presented in the Results section.

      As requested by the reviewer, sections recapitulating results have been modified.

      Minor comments:

      (1) The sentence in lines 171 and 172 is unclear, "However, despite the short time after mating, once again, the in vivo fertilized eggs corresponding to the mutant group exhibited clear defects to reach the blastocyst stage in vitro compared to controls." What do the authors mean by short time? It is the expected time, correct?

      It is well established that after copulatory plug formation, most oocytes are fertilized within 2 to 8 hours, with fertilization rates that increase over time: 0–5% at 1.5 hours post-mating, 40% at 4 hours post-mating and more than 90% at 7 hours post-mating (Muro et al., 2016; PMID: 26962112, La Spina et al., 2016; PMID: 26872876). In order to examine whether the embryo development defects observed for mutant mice were due to a delayed arrival of sperm at the ampulla, we decided to analyze the percentage of fertilized eggs recovered from the ampulla at “short times” (4 hours) after mating. This avoids the possibility that the prolonged stay of sperm within the female tract under the usual “overnight mating” schedule could give defective sperm enough time to reach the ampulla and finally fertilize the eggs (i.e. delayed fertilization). Our results showed that, despite the expected lower fertilization rates observed for both control and mutant males when analyzed just 4 hours after mating, the fertilized eggs corresponding to the mutant group still exhibited clear defects in developing into blastocysts compared to controls, which does not favor the idea that the embryo development defects were due to delayed fertilization. The sentence in lines 171 and 172 has been modified in the revised version of the manuscript to better explain this conclusion (see lines 152-155 of the revised version).

      (2) Line 177. Mutant epididymal sperm already carry defects leading to embryo development failure. Under this subheading, the authors compare within the same female the ability of mutant and control sperm delivered into different horns to support fertilization and embryo development. They show that the embryo development induced by mutant sperm is diminished vs. controls under very similar conditions, confirming the previous results of post-fertilization failure. The data also answers the question raised by the authors of whether the fertilization defects appear during or after epididymal transit; the interpretation of the results is the functional defects in the sperm are present before the transport into the female tract. Important unaddressed questions are, could these defects begin even earlier before arriving at the cauda? Did the authors try to incubate the mutant sperm with the epididymal fluid of WT mice to examine if the sperm defects could be rescued? The opposite experiment could also be performed, where WT sperm are incubated with the epididymal fluid of mutant mice, and the treated sperm examined for altered Ca<sup>2+</sup> levels or DNA fragmentation.

      First of all, we would like to clarify that our question about whether the fertilization defects appear “during or after epididymal transit” was in fact referring to whether defects appear during epididymal maturation or later on, at the moment of ejaculation. In this regard, our in vivo and in vitro fertilization studies allowed us to conclude that defects were already present in epididymal sperm without excluding the possibility that additional defects could appear at the vas deferens or at the moment of ejaculation due to the contribution of seminal plasma secretions.

      Regarding whether sperm defects could appear even before arrival at the cauda, we have now analyzed DNA fragmentation in caput vs. cauda sperm from both mutant and control mice, observing differences between genotypes only for cauda sperm. Based on these observations, we conclude that DNA integrity defects appear within the epididymis after sperm passage through the caput, either when sperm reach the corpus or the cauda epididymis, or during their storage within the cauda region.

      Also, as suggested by the reviewer, we incubated WT sperm in vitro with epididymal fluid from DKO mice (and vice versa) and then analyzed DNA fragmentation levels. Results showed that exposure of control sperm to the mutant epididymal fluid for 1 hr significantly increased DNA fragmentation levels. When mutant sperm (exhibiting higher levels of DNA fragmentation than control sperm) were exposed to epididymal fluid from WT mice, no differences between groups were observed. Together, these results confirm both that the epididymal fluid from mutant mice contributes to the higher DNA fragmentation levels detected in mutant sperm, and that normal epididymal fluid is not able to rescue the DNA fragmentation present in mutant cells. These results are now included in the revised version of the manuscript (see Figure 6A).

      (3) Lines 203 to 216. In these paragraphs the authors indicate "that mutant sperm had a lower percentage of fertilization and lower rates of blastocysts (Figure 3D, E), indicating that defects in egg coat penetration were not responsible for embryo development failure." Later, they indicated that a few eggs fertilized by mutant sperm failed to activate. It is shown that Ca<sup>2+</sup> oscillations are normal, indicating that the defects lie elsewhere. Could the authors propose a mechanism based on their sperm DNA defects?

      As described in the Results and Discussion sections of the original manuscript, we decided to investigate the existence of possible defects in sperm DNA fragmentation based on evidence indicating that delays in early embryo development may result from the time taken by the egg to repair damaged paternal DNA (Esbert et al., 2018; PMID: 30259705, Newman et al., 2022; PMID: 34954800, Nguyen et al., 2023; PMID: 37658763). In this regard, it is known that time is needed before the first embryonic cell division for activation of the egg DNA repair machinery (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800) and that increased sperm DNA damage may necessitate more time for repair by the oocyte (Martin et al., 2019; PMID: 30541031, Newman et al., 2022; PMID: 34954800). Based on this, we decided to examine possible DNA damage in sperm. Our finding that sperm DNA fragmentation was in fact clearly increased in mutant sperm led us to propose that delays in early embryo development in our mutant colonies may result from the time required by the egg to repair sperm DNA fragmentation.

      (4) The demonstration that C1/C3 sperm have abnormal rates of DNA fragmentation and Ca<sup>2+</sup> levels is significant. Additional studies would strengthen the findings reported here. For example, what are the levels of oxidative stress in these sperm? Are there other changes related to oxidative stress? Performing a TUNEL assay will strengthen the notion of DNA damage demonstrated here with the chromatin dispersion assay.

      As mentioned previously, we analyzed oxidative stress by evaluating ROS levels in control and mutant sperm and observed no differences between genotypes. These results have been included in the revised version of the manuscript (see Figure 5C). We appreciate the suggestion of performing a TUNEL assay in future studies.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) There are some reports that small RNAs gained during the epididymal transit of sperm are essential for embryonic development (e.g., Conine et al., Dev Cell, 46, 470-480, 2018; PMID: 30057276), suggesting that the luminal changes in the Crisp1/3 double KO (dKO) epididymis lead to the phenotype in this study. In fact, there is no evidence that CRISP1/CRISP3 secreted from the epididymis is present in cauda epididymal sperm and directly controls the observed phenomena. Also, the authors wrote that there is no strong evidence to exclude a possible role of small RNAs in Crisp1/3 dKO sperm (lines 370-372). Therefore, it is at least necessary to measure small RNA abundance in dKO mice.

      As mentioned by the reviewer and as cited in our manuscript, there is a report indicating that the small RNAs gained during epididymal transit may play a role in embryonic development (Conine et al., 2018; PMID: 30057276). However, the need for small RNAs in embryonic development remains a topic of debate (Wang et al., 2020; PMCID: PMC7799177). In this regard, clear evidence that sperm DNA fragmentation is associated with embryo development defects, together with the increase in DNA fragmentation levels observed in mutant sperm, supports sperm DNA damage as one of the causes of the phenotype observed in our mutant mice. Moreover, recent experiments carried out in response to Reviewer 1’s comments revealed that exposure of control sperm to epididymal fluid from mutant mice significantly increases DNA fragmentation levels, confirming that the absence of CRISP1 and CRISP3 proteins in epididymal fluid contributes to sperm DNA damage in mutant sperm. Finally, whereas oxidative stress might also lead to embryo development impairment as mentioned in our original manuscript, recent evaluation of ROS levels in control and mutant sperm carried out in response to Reviewer 1’s comments did not show higher ROS levels in mutant sperm. Thus, although, as mentioned in the manuscript, we do not exclude the possibility that small RNAs may also contribute to embryo development defects, our observations support DNA fragmentation and a dysregulation of Ca<sup>2+</sup> homeostasis within the epididymis and sperm as the main factors responsible for embryo development failure in our mutant males. The experiments using epididymal fluid (Figure 6A) and those evaluating ROS levels (Figure 5C) have been included in the revised version of the manuscript and discussed accordingly.

      (2) Lines 245-248 and 354-374: According to Figure 5C, the intracellular Ca<sup>2+</sup> level significantly increased in Crisp1/3 dKO sperm compared to control. The authors hypothesized that this increase could destroy sperm DNA integrity, causing defects in early embryogenesis. However, the authors did not show direct evidence for this.

      Specifically, as CRISP1 inhibits CatSper (line 95), the authors believed the increased Ca<sup>2+</sup> level in Crisp1/3 dKO sperm was observed. Crisp1/3 dKO and Crisp1/4 dKO mice share the disruption of Crisp1, but the phenotype is totally different. Thus, the authors should also examine the CatSper activity in Crisp1/3 dKO sperm.

      We appreciate the reviewer's insightful comments. Whereas the C1/C3 and C1/C4 DKO colonies share the disruption of Crisp1, the intracellular Ca<sup>2+</sup> levels in these two colonies are different, as no increase in sperm intracellular Ca<sup>2+</sup> was detected in C1/C4 DKO mice. This difference in intracellular Ca<sup>2+</sup> levels might explain the different embryo development phenotypes observed in our two DKO colonies. In this regard, our results revealed that sperm intracellular Ca<sup>2+</sup> levels differ depending on the Crisp gene being deleted. Whereas the lack of Crisp1 did not affect intracellular sperm Ca<sup>2+</sup> levels (Weigel Muñoz et al., 2018; PMID: 29481619), there was an increase in Ca<sup>2+</sup> levels in CRISP2 KO sperm (Brukman et al., 2016; PMID: 26786179) and a decrease in sperm when Crisp4 was deleted (Carvajal 2019, Ph.D. Thesis). Thus, although the ability of CRISP3 to regulate sperm Ca<sup>2+</sup> channels has not yet been reported, the existence of functional compensations between homologous CRISP members (Curci et al., 2020; PMID: 33037689) makes it complicated to draw straightforward conclusions based on the behavior of each individual protein in Ca<sup>2+</sup> regulation. In fact, while the lack of CRISP1 and CRISP4 does not affect sperm Ca<sup>2+</sup> concentration (Carvajal 2019, Ph.D. Thesis), the simultaneous lack of CRISP1 and CRISP3 produced an increase in Ca<sup>2+</sup> levels, and the lack of the four CRISP proteins produced a decrease in the intracellular levels of the cation after capacitation (Curci et al., 2020). Based on these observations, we conclude that the absence of CRISP1 may or may not lead to altered intracellular Ca<sup>2+</sup> levels depending on the other simultaneously deleted gene(s).

      The authors hypothesize that the increased Ca<sup>2+</sup> level may lead to damaged DNA integrity by citing a published paper (lines 360-363). In the published paper, the authors examined the influence of the luminal fluid of the epididymis and vas deferens on sperm chromatin fragmentation (Gawecka et al., 2015). However, they did not mention the increased DNA fragmentation in epididymal sperm when these sperm were incubated with Ca<sup>2+</sup> or Mn<sup>2+</sup>. So, the authors' hypothesis is over discussion. Thus, the correlation between the intracellular Ca<sup>2+</sup> level and DNA integrity in sperm is still unclear. So, the authors should show why the increased Ca<sup>2+</sup> level leads to DNA fragmentation with direct evidence.

      We appreciate the reviewer’s comment regarding the work by Gawecka et al. (2015), and the opportunity to clarify the proposed mechanism underlying our observations. In the above-mentioned paper, the authors reported that when mouse epididymal or vas deferens sperm were incubated with divalent cations (Ca<sup>2+</sup> and Mn<sup>2+</sup>) in the presence of luminal fluid, they were induced to degrade their DNA in a process termed sperm chromatin fragmentation (SCF). The fact that both the ejaculated and epididymal mutant sperm used in our studies had been exposed to epididymal fluid lacking CRISP proteins known to regulate sperm Ca<sup>2+</sup> channels opened the possibility that changes in Ca<sup>2+</sup> levels within the epididymal fluid and/or sperm could be responsible for the higher DNA fragmentation levels observed in mutant cells. In this regard, it is important to note that, as requested by Reviewer 1, we performed additional in vitro experiments in which WT epididymal sperm were exposed to mutant or WT epididymal fluid in the presence or absence of Ca<sup>2+</sup>, and DNA fragmentation was analyzed at the end of incubation. Results showed a significant increase in DNA fragmentation in WT sperm exposed to either mutant epididymal fluid or WT fluid in the presence of Ca<sup>2+</sup> (Figure 6A). We believe these observations, together with the higher intracellular Ca<sup>2+</sup> levels detected in DKO sperm (Figure 6B), provide strong evidence supporting changes in Ca<sup>2+</sup> homeostasis in the epididymis and sperm as the main factor responsible for the observed sperm DNA integrity defects. This could be mediated by the activation of Ca<sup>2+</sup>-dependent nucleases present within the epididymal fluid and/or sperm cells as previously suggested (Shaman et al., 2006; PMID: 16914690, Sotolongo et al., 2005; PMID: 15713834, Boaz et al., 2008; PMID: 17879959, Dominguez and Ward, 2009; PMID: 19938954). These observations have now been included and discussed in the revised version of the manuscript (see lines 245-265 and 427-439).

      Minor Comments:

      (3) Standards for measuring rates should be clarified, such as two-cell rates are determined by dividing the number of two-cell embryos by the total number of eggs.

      As requested, standards for measuring rates have now been clarified in the corresponding figure legends.

    1. eLife Assessment

      This study provides valuable information on a novel gene that regulates meiotic progression in both male and female meiosis. The evidence supporting the conclusions of the authors is solid. This study will be of interest to developmental and reproductive biologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses highlighted previously:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice.

    3. Reviewer #2 (Public review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double-strand breaks in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field.

    4. Reviewer #3 (Public review):

      Huang et al. investigated the phenotype of Bend2 mutant mice, which express a truncated isoform. Males carrying the Bend2 deletion were fertile, which enabled the authors to analyze BEND2 function in females. They showed that the Bend2 deletion in females decreased follicle number, which may lead to loss of ovarian reserve.

      Strengths:

      They found the truncated isoform of Bend2 and the depletion of this isoform showed decreasing follicle number at birth.

      Weaknesses highlighted previously:

      The authors showed novel factors that impact ovarian reserve. Although the number of follicles and conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to the human case.

      [Editors' note: We thank the authors for considering the previous recommendations and suggested corrections.]

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool. 

      Strengths: 

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool. 

      Weaknesses: 

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. 

      We sincerely appreciate the reviewer’s thoughtful evaluation of our work and recognition of the strengths of our study. We are especially grateful for the acknowledgment of the novelty of our findings regarding the role of BEND2 in female fertility. While we extensively characterized the effects of BEND2 depletion in male meiosis, we agree that the phenotype observed in females provides particularly interesting insights into the establishment of the primordial follicle pool.

      Reviewer #2 (Public review): 

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for repair of meiotic double-strand breaks in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has been historically difficult to study due to its location on the X chromosome and male sterility of global knockout animals.

      The authors have been extremely responsive to reviewer critiques and have presented strong data and appropriate conclusions, making it an excellent addition to the field. 

      We are truly grateful for the reviewer’s thoughtful review and recognition of the key contributions of our study. We appreciate the acknowledgment of how our model overcomes the challenges in studying BEND2 and the importance of our findings in both male and female meiosis. We also value the reviewer’s encouraging comments on our responsiveness to their feedback and the quality of our data and conclusions.

      Reviewer #3 (Public review): 

      Huang et al. investigated the phenotype of Bend2 mutant mice, which express a truncated isoform. Males carrying the Bend2 deletion were fertile, which enabled the authors to analyze BEND2 function in females. They showed that the Bend2 deletion in females decreased follicle number, which may lead to loss of ovarian reserve.

      Strengths: 

      They found the truncated isoform of Bend2 and the depletion of this isoform showed decreasing follicle number at birth. 

      Weaknesses: 

      The authors showed novel factors that impact ovarian reserve. Although the number of follicles and conception rate are reduced in mutant mice, the in vitro fertilization rate is normal and follicles remain at 40 weeks of age. It is difficult to know how critical this is when applied to the human case. 

      We greatly appreciate the reviewer’s comments and recognition of the strengths of our work. We are grateful for their acknowledgment of our findings related to the truncated isoform of Bend2 and its effect on ovarian reserve. We also agree that, although our study provides important insights, we are still far from directly applying these results to human clinical scenarios. Much further research is needed before these findings can be translated.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed all concerns both editorially and experimentally. This is a very nice manuscript, and I congratulate the authors on their work. 

      We sincerely appreciate your kind words and thoughtful review. Your feedback has been invaluable in improving our manuscript, and we are grateful for your time and effort. Thank you for your support and encouragement!

      Reviewer #2 (Recommendations for the authors):

      In Figure 3, graphs in panels C & D have typos in the early zygotene column where it reads "zyotene". 

      We appreciate your careful review and thank you for pointing out the typos in Figure 4, which have been corrected in the new version of the manuscript.

      Reviewer #3 (Recommendations for the authors): 

      ・Since there are two isoforms of Bend2, and the authors depleted one isoform, this is not suitable to use "full length" in the titles and in the manuscripts. 

      We respectfully disagree with the reviewer’s comment. In our mouse model, we specifically remove the full-length isoform of Bend2. Therefore, we consider it appropriate to refer to it as such in the manuscript. Our results indicate that the full-length isoform is not required to complete meiotic prophase in males but is indispensable for setting up the ovarian reserve in females. We appreciate the reviewer’s input and are happy to clarify this point further if needed.

      ・Is there any reason why the authors used 7-month-old females for in vitro fertilization? They may not be considered aged mice, but this seems a bit old for IVF, especially when the ovarian reserve in mutant mice is decreased. If there is a reason, please clarify it. In addition, since the authors added IVF data showing a similar fertilization ratio between control and mutant, the authors need to discuss why the litter size was decreased in mutant mice. It may be too strong to conclude "subfertility".

      We used 7-month-old females for IVF because this falls within the age range of the samples analyzed for ovarian reserve, with the oldest females being 8 months old. Regarding the apparent discrepancy between IVF results and litter size, we addressed this in the discussion section of the manuscript: 'Interestingly, our mutant oocyte quality analysis suggests that mature oocytes from mutant females are equally competent to develop into a blastocyst as control ones. These data suggest that the subfertility observed in Bend2 mutants may be due to errors in later developmental stages, such as implantation or organogenesis.' We appreciate the reviewer’s feedback and hope this clarification helps.

    1. eLife Assessment

      This important study shows that a very slow (infraslow) oscillation occurs in voltage recordings from the dentate gyrus of the adult mouse. The authors suggest that it is related to sleep stage and serotonin acting at one type of serotonin receptor in the dentate gyrus. The results are significant because they suggest new insight into how a slow oscillation affects memory through serotonin receptors in the dentate gyrus. Convincing data are provided to support the claims.

    2. Reviewer #1 (Public review):

      Turi, Teng and the team used state-of-the-art techniques to provide convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory. First, they showed that the glutamatergic DG cells become activated following an infraslow rhythm during NREM sleep. In addition, the infraslow oscillation in the DG is correlated with rhythmic serotonin release during sleep. Finally, they found that specific knockdown of 5-HT receptors in the DG impairs the infraslow rhythm and memory, suggesting that serotonergic signaling is crucial for regulating DG activity during sleep. Given that the functional role of infraslow rhythm still remains to be studied, their findings deepen our understanding of the role of DG cells and serotonergic signaling in regulating infraslow rhythm, sleep microarchitecture and memory.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep. The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep

      (2) The GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses from the original round of review:

      (1) The current data set and analysis are insufficient to interpret the observation correctly [...].

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Fig. 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Fig. 4), which reduces the reliability of this study.

      Comments on revisions:

      Thank you for the clarification of the detection criteria and the quantification of the specific events. This reviewer can now follow the authors' interpretation.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Turi, Teng and the team used state-of-the-art techniques to provide convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory. First, they showed that the glutamatergic DG cells become activated following an infraslow rhythm during NREM sleep. In addition, the infraslow oscillation in the DG is correlated with rhythmic serotonin release during sleep. Finally, they found that specific knockdown of 5-HT receptors in the DG impairs the infraslow rhythm and memory, suggesting that serotonergic signaling is crucial for regulating DG activity during sleep. Given that the functional role of infraslow rhythm still remains to be studied, their findings deepen our understanding on the role of DG cells and serotonergic signaling in regulating infraslow rhythm, sleep microarchitecture and memory.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep. The important findings are 1) the antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and 2) the GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Fig 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      b. In Fig 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, the readers can find 8 Ca dips which are not associated with MAs from Fig 1E. If MAs were clustered, please describe this properly.

      c. In Fig 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      d. In Fig 1C, please provide line plots connecting the same session. This request applies to all related figures.

      e. In Fig 2C, the significant increase during REM and the same level during NREM are not convincing. In Fig 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, In Fig 2E, the 4th MA bout seems to be the wake bout because the EMG increase lasts more than 15 seconds.

      f. Fig 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      g. In Fig 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Fig. 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Fig. 4), which reduces the reliability of this study.

      Responses to weaknesses mentioned above have been addressed in the first revision.

      Comments on revisions:

      In the first revision, I pointed out the inappropriate analysis of the EEG/EMG/photometry data and gave examples. The authors responded only to the points raised and did not seem to see the need to improve the overall analysis and description. In this second revision, I would like to ask the authors to improve them. The biggest problem is that the detection criteria and the quantification of the specific event are not described at all in Methods and it is extremely difficult to follow the statement. All interpretations are made by the inappropriate data analysis; therefore, I have to say that the statement is not supported by the data.

      Please read my following concerns carefully and improve them.

      (1) The definition of the event is critical to the detection of the event and the subsequent analysis. In particular, the authors should explicitly describe the definitions of MA (microarousal), the trough and peak of the population level of intracellular Ca concentrations, and the onset of the decline and surge of Ca levels.

      (1-1) The authors categorized wake bouts of <15 seconds with high EMG activity as MA (in Methods). What degree of high EMG is relevant to MA and what is the lower limit of high EMG? In Fig 1E, there are some EMG spikes, but it was unclear which spike/wave (amplitude/duration) was detected as MA-relevant spike and which spike was not detected. In Fig 2E, the 3rd MA coincides with the EMG spike, but other EMG spikes have comparable amplitude to the 3rd MA-relevant EMG spike. Correct counting of MA events is critical in Fig 1F, 2F, 4C.

      We have added more information about the MA definition in Methods, including EMG amplitude. Furthermore, we have re-analyzed MA and MA-related calcium signals in Fig1 and Fig2. Fig-S1 shows the traces of EMG amplitude for all MA events shown in Fig1G and Fig2G.

      (1-2) Please describe the definition of Ca trough in your experiments. In Fig 1G, the averaged trough time is clear (~2.5 s), so I can acknowledge that MA is followed by Ca trough. However, the authors state on page 4 that "30% of the calcium troughs during NREM sleep were followed by an MA epoch". This discrepancy should be corrected.

      We apologize for the misleading statement. We meant 30% of ISO events during NREM sleep. We have corrected this. To detect the calcium trough of ISO, we first calculated a moving baseline (blue line in Fig-S2 below) by smoothing the calcium signals over 60 s, then set a threshold (0.2 standard deviation from the moving baseline) for events of calcium decrease, and finally detected the minimum point (red dots in Fig-S2) in each event as the calcium trough. We have added these details to the Methods.
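
      The trough-detection steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the use of a simple moving average as the smoother, and the choice to take the standard deviation over the whole trace are all assumptions, since the response does not specify them.

```python
import numpy as np


def detect_calcium_troughs(signal, fs, window_s=60.0, thresh_sd=0.2):
    """Return sample indices of calcium troughs (hypothetical helper).

    signal    : 1-D calcium trace
    fs        : sampling rate in Hz
    window_s  : smoothing window for the moving baseline (60 s in the response)
    thresh_sd : threshold below the baseline, in SD units (0.2 in the response)
    """
    signal = np.asarray(signal, dtype=float)
    win = max(1, int(round(window_s * fs)))

    # Moving baseline: moving average with edge padding so that the
    # baseline has the same length as the signal (assumed smoother).
    pad_left = win // 2
    pad_right = win - 1 - pad_left
    padded = np.pad(signal, (pad_left, pad_right), mode="edge")
    baseline = np.convolve(padded, np.ones(win) / win, mode="valid")

    # Assumption: the SD is computed over the whole trace.
    below = signal < baseline - thresh_sd * np.std(signal)

    # Each contiguous run of below-threshold samples is one decrease event.
    edges = np.diff(np.concatenate(([0], below.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)

    # The trough is the minimum point within each event.
    return np.array(
        [s + int(np.argmin(signal[s:e])) for s, e in zip(starts, ends)],
        dtype=int,
    )
```

      With a trace sampled at 10 Hz, `detect_calcium_troughs(trace, fs=10)` returns one index per below-threshold event. Samples near the start and end of a recording can produce spurious events because the baseline there is estimated from padded data.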

      (1-3) Relating to comment 1-2, I agree that the latency is between MA and Ca trough on page 4, as the authors explain in the methods, but in Fig 1G, t (latency) is labeled at an incorrect position. Please correct this.

      We are sorry for the mistake in describing the latency in the Methods. The latency was defined as the time difference between the onset of calcium decline (see details below in 1-4) and the onset of the MA. We have corrected this in the revised manuscript. Thus, the labeling in Fig1G was correct.

      (1-4) The authors may want to determine the onset of the decline in population Ca activity and the latency between onset and trough (Fig 1G, latency t). If so, please describe how the onset of the decline is determined. In Fig 1G, 2G, S6, I can find the horizontal dashed line and infer that the intersection of the horizontal line and the Ca curve is considered the onset. However, I have to say that the placement of this horizontal line is super arbitrary. The results (t and Drop) are highly dependent on the position of horizontal line, so the authors need to describe how to set the horizontal line.

      Indeed, we used the onset of calcium decline to calculate the latency as mentioned above. First, we defined the baseline (dashed line in Fig1G) by calculating the average of calcium signals in the 10 s window before the MA (from -15 s to -5 s in Fig1G). The onset of calcium decline is defined as the timepoint where the calcium decrease was larger than 0.05 SD from this baseline. We have added these details to the Methods.
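
      The onset and latency definition above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name is hypothetical, the trace is assumed to be aligned so that the MA onset is at t = 0, the onset is taken as the first threshold crossing after the baseline window, and the SD defaults to that of the whole aligned trace because the response does not say which SD was used.

```python
import numpy as np


def decline_onset_latency(t, ca, sd=None, base_win=(-15.0, -5.0), thresh_sd=0.05):
    """Return the onset of calcium decline relative to MA onset, in seconds.

    t        : time axis in seconds, with the MA onset at t = 0 (assumed)
    ca       : MA-aligned calcium trace, same length as t
    sd       : SD used to scale the threshold; defaults to the SD of `ca`
               (assumption, not specified in the response)
    base_win : baseline window, -15 s to -5 s before the MA as in the response
    Returns None when the trace never crosses the threshold.
    """
    t = np.asarray(t, dtype=float)
    ca = np.asarray(ca, dtype=float)
    if sd is None:
        sd = np.std(ca)

    # Baseline: mean calcium signal in the 10 s window before the MA.
    in_base = (t >= base_win[0]) & (t < base_win[1])
    baseline = ca[in_base].mean()

    # Onset: first sample after the baseline window where the decrease
    # exceeds thresh_sd * sd (taking the first crossing is an assumption).
    crossed = (t >= base_win[1]) & (ca < baseline - thresh_sd * sd)
    if not crossed.any():
        return None
    return float(t[np.argmax(crossed)])
```

      Under this sign convention, a negative return value means that the calcium decline began before the MA onset. Because the 0.05 SD threshold is small, the result is sensitive to noise in the baseline window.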

      (1-5) In order to follow Fig 1F correctly, the authors need to indicate the detection criteria of "Ca dip (in legend)". Please indicate "each Ca dip" in Fig 1E. As a reader, I would like to agree with the Ca dip detection of this Ca curve based on the criteria. Please also indicate "each Ca dip" in Fig 2E and 2F. In the case of the 2nd and 3rd MAs, do they follow a single Ca dip or does each MA follow each Ca dip? This chart is highly dependent on the detection criteria of Ca dip.

      We have indicated each Ca dip in Fig 1 and Fig 2.

      As I mentioned above, most of the quantifications are not based on the clear detection criteria. The authors need to re-analyze the data and fix the quantification. Please interpret data and discuss the cellular mechanism of ISO based on the re-analyzed quantification.

      As suggested, we have re-analyzed the MA and MA-related photometry signals. Accordingly, parts of Fig1 and Fig2 have been revised. Although there are some small changes, the main results and conclusions remain unchanged.

      Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has been previously linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells to microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, micro arousals, and sensory sensitivity.

      Weaknesses:

      - The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine, but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      – The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.", but the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      – Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      – The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Fig. 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      – Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      – While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have been reported also at 0.02 Hz associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1a knockdown on page 7 (Fig. 6), how is cortical EEG affected? Is the ISO still seen in the EEG but attenuated in the DG?

      – The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in the sigma band, to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Fig 1 or G as well as broader sleep architecture are not affected.

      – On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA correlated activity. I would like to see the equivalent of Fig 1,2 G panels with the 5-HT1a manipulation.

      Responses to Reviewer #3 have been addressed in the first revision.

      Reviewer #1 (Recommendations for the authors):

      Minor comment: Several recent publications from different laboratories have shown rhythmic release of norepinephrine (NE) (~0.03 Hz) in the medial prefrontal cortex, the thalamus, and in the locus coeruleus (LC) of the mouse during sleep-wake cycles-> Please add "preoptic area" here

      We have added the citation.

      Reviewer #2 (Recommendations for the authors):

      Minor

      (1) (abstract, page 2 line 9) what kind of "increased activity" did the authors find?

      Increased activity compared to that during wakefulness. We have added this.

      (2) (result, page 4) please define first, early, and late stage of NREM sleep in the methods.

      We have added these in the Methods.

      (3) (result, page 6) please define "the risetime of the phasic increase".

      It refers to the latency between the increase of 5-HT and the MA onset. We have clarified this in the text.

      (4) (supplement Fig 3 legend) please reword "5-HT events" and "5-HT signals" because these are ambiguous.

      We have defined the events in the legend.

      (5) (Fig 5A) please replace the picture without bubbles.

      We have replaced the image in Fig5A.

    1. eLife Assessment

      This important manuscript proposes a dual behavioral/computational approach to assess emotional regulation in humans. The authors present solid evidence for the idea that emotional distancing (as routinely used in clinical interventions for e.g. mood and anxiety disorders) enhances emotional control.

    2. Reviewer #1 (Public review):

      Summary:

      Using sequences of short videos to elicit emotional changes in participants, Malamud & Huys demonstrate how a brief, controlled emotion regulation intervention (distancing) can effectively alter subsequent emotion ratings. The authors employ latent state-space models to capture the trajectories of emotion ratings, leveraging tools from control theory to quantify the intervention's impact on emotion dynamics.

      Strengths:

      The experiment is well-designed and tailored to the computational modeling approach advanced in the paper. It also relies on a selection of stimuli that were previously validated. Within the constraints of a controlled experiment, the intervention successfully implements a relatively common tool of psychotherapeutic treatment, ensuring its clinical relevance.

      The computational modeling is grounded in the well-established framework of dynamical systems and control theory. This foundation offers a conceptually clear formalization along with powerful quantification tools that go beyond previous more data-driven approaches.

      Overall, the study presents a coherent approach that bridges concepts from clinical psychology and computational theories, providing a timely stepping stone toward advancing quantified, evidence-based psychological interventions targeting emotion control.

      Weaknesses:

      A primary limitation of this study, acknowledged by the authors, is its reliance on self-reports of participants' emotional states. Although considerable effort was made to minimize expectation effects, further research is needed to confirm that the observed behavioral changes reflect genuine alterations in emotional states. Additionally, the generalizability of the findings to long-term remediation strategies remains an open question.

      Second, the statistical analysis, particularly the computational approach, sometimes lacks sufficient detail and refinement. While I will not elaborate on specific points here, one notable issue is the interpretation of the intrinsic matrix (A). The model-free analysis reveals correlations between emotions at a given time or within an emotional state across time points. However, it does not provide evidence to support lagged interactions across states that would justify non-diagonal elements in A. The other result concerning the dynamics matrix only highlights a trend in the dominant eigenvalue, which is difficult to interpret in isolation. The absence of a statistically significant group x intervention interaction furthermore makes this finding less compelling. This weakens the study's conclusions about the importance of intrinsic dynamics, as claimed in the title.

      Finally, to avoid potential misunderstandings of their work, the authors should be more careful about their use of terms pertaining to the control theory and take the time to properly define them. For example, the "controllability" of emotional states can either denote that those states are more changeable (control theory definition), or, conversely, more tightly regulated (common interpretation, as used in the abstract). This is true for numerous terms (stability, sensitivity, Gramian, etc.) for which no clear definition nor references are provided. Readers unfamiliar with the framework of control theory will likely be at a loss without more guidance.
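
      For readers unfamiliar with these terms, the control-theory quantities mentioned above can be illustrated with a short sketch. It assumes a generic discrete-time linear system x[t+1] = A x[t] + B u[t], where A is the intrinsic dynamics matrix and B holds the input weights. That form, the function names, and the finite horizon are assumptions made for illustration; they are not taken from the manuscript and may differ from the authors' model.

```python
import numpy as np


def dominant_eigenvalue_modulus(A):
    """Largest eigenvalue modulus of A; values below 1 indicate stable dynamics."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def controllability_gramian(A, B, horizon):
    """Finite-horizon controllability Gramian.

    W = sum_{k=0}^{horizon-1} A^k B B^T (A^T)^k
    for the assumed system x[t+1] = A x[t] + B u[t].
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)  # holds A^k, starting with k = 0
    for _ in range(horizon):
        AkB = Ak @ B
        W += AkB @ AkB.T
        Ak = A @ Ak
    return W
```

      In the control-theory sense, a larger Gramian (for example, a larger trace) means that the states are more easily moved by external inputs. This is the "more changeable" reading of controllability, as opposed to the everyday reading of "more tightly regulated".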

    3. Reviewer #2 (Public review):

      Summary:

      In this well-conceived and timely study, the authors assess the controllability of emotions in a quantitative way using the framework of control theory. They use a controlled distancing intervention halfway through an emotion rating task where emotion-inducing short videos from a validated database are shown and find that the intervention enables a better controllability of externally induced emotions in the experimental group.

      Strengths:

      It is a highly original idea to address the external controllability of emotions using the formal framework of control theory. It is also a very propitious approach to take what could be called a 'micro-therapeutic' perspective which looks at the immediate effect of an intervention instead of the 'macro-therapeutic' mid- or long-term effect of a whole course of therapy.

      Weaknesses:

      Acquiring data online inevitably gives rise to selection and self-selection effects. This needs to be acknowledged clearly. Exacerbating this, participant remuneration seems low at an amount below the minimum or living wage in Western countries (do the authors know where their participants came from?).

      Another concern is that the intervention does not simply take place before the second block begins but is ongoing during the whole of the second block in that it is integrated into the phrasing of the task on each trial. It is therefore somewhat misleading to speak of a period 'after the intervention', and it would have been interesting to assess the effect of this by including a third group where the phrasing does not change, but the floating leaves intervention takes place.

      As mentioned in the Limitations section, observation noise was assumed and not estimated. While this is understandable in this case, the effect of this assumption could have been assessed by simulation with varying levels of observation (and process) noise.

      Relatedly, the reliance on formal model comparison is unfortunate since the outcome of such comparisons is easily influenced by slight changes to assumptions such as noise levels. An alternative approach would have been to develop a favoured model based on its suitability to address the research question and its ability, established by simulation, to distill relevant changes of behaviour into reliable parameter estimates.

      The statistical analyses clearly show the limitations of classical statistical testing with highly complex models of the kind the authors (commendably) use. Hunting for statistically significant interactions in a multivariate repeated-measures design relying on inputs from time series-derived point estimates is a difficult proposition. While the authors make the best of the bad situation they create by using null-hypothesis significance testing, a more promising approach would have been to estimate parameters using a sampler like Stan or PyMC and then draw conclusions based on posterior predictive simulations.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript takes a dynamical systems perspective on emotion regulation, meaning that rather than a simplistic model conceptualising regulation as applying to a single emotion (e.g. regulation of sadness), emotion regulation could cause a shift in the dynamics of a whole system of emotions (which are linked mathematically to one another). This builds on the idea that there are 'attractor states' of emotions between which people transition, governed by both the system's intrinsic characteristics (e.g. temporal autocorrelation of a particular emotion/person) and external driving forces (having a stressful week). Conceptually this is a very useful advance because it is very unlikely that emotions are elicited (or reduced) singly, without affecting other emotions. This paper is a timely implementation of these ideas in the context of psychotherapeutic intervention, distancing, which participants were trained (randomised) to perform while watching emotion-inducing videos.

      The authors' main conclusion is that distancing both stabilises specific emotional patterns and reduces the impact of external video clips. I would consider these results strong and believable, and to have the potential to impact models of emotion regulation as well as the field's broader views on the mechanisms of psychological therapies.

      Strengths:

      This paper has very many strengths: I would especially note the authors' very-well-matched active control condition and the robustness of their model comparison approach. One feature of the authors' approach is that they explicitly add noise - not what you typically see in an emotion time-series analysis - which allows participants to make errors in their own subjective ratings (a reasonable thing to assume); this noise can then be smoothed during filtering. In their model comparison approach, they explicitly test whether a true dynamical system explains emotion change/emotion regulation effect on emotions - demonstrating that both intrinsic dynamics and external inputs were needed to explain subjective emotion. Powerfully, they also used this approach to test the differential effects of the treatment groups (see below).

      The main result seems quite robust statistically. Verifying the effects of the distancing intervention on emotion, the authors found an interaction between time (pre- to post-intervention) and intervention group (distancing vs. relaxation) suggesting that distancing (but not relaxation) reduced ratings of almost all emotions. Participants allocated to the distancing intervention also showed decreased variability of emotion ratings compared to those in the relaxation intervention (though note this interaction was not significant).

      Using a model comparison approach, the authors then demonstrated that whilst the control group was best explained by a model that did not change its dynamics of emotions, the active intervention (distancing) group was best explained by a model that captured both changing emotion dynamics and a changing input weights (influence of the videos) - results confirmed in follow-up analyses. This is convincing evidence that emotion regulation strategies may specifically affect the dynamics of emotions - both their relationships to one another and their susceptibility to changes evoked by external influences.

      The authors also perform analyses that suggest their result is not attributable to a demand effect (finding that participants were quicker during the control intervention, which one would expect if they had already decided how to respond in advance of the emotion question). I personally also think a demand effect is unlikely given the robustness of their control intervention (which participants would be just as likely to interpret as mental health-enhancing training as distancing), and I am convinced by the notion that demand effects would be unlikely to elicit their more specific effects on the dynamic quality of emotions.

      Weaknesses:

      An interesting but perhaps at present slightly confusing aspect of their described results relates to the 'controllability' of emotions, which they define as their susceptibility to external inputs. Readers should note this definition is (as I understand it) quite distinct from, and sometimes even orthogonal to, concepts of emotional control in the emotion literature, which refer to intentional control of emotions (by emotion regulation strategies such as distancing). The authors also use this second meaning in the discussion. Because of the centrality of control/controllability (in both meanings) to this paper, at present it is key for readers to bear these dual meanings in mind for juxtaposed results that distancing "reduces controllability" while causing "enhanced emotional control".

      As above, the authors use an active control - a relaxation intervention - which is extremely closely matched with their active intervention (and a major strength). However, there was an additional difference between the groups (as I currently understand it): "in the group allocated to the distancing intervention, the phrasing of the question about their feelings in the second video block reminded participants about the intervention, stating: 'You observed your emotions and let them pass like the leaves floating by on the stream.'" I do wonder if the effects of distancing could also have been partially driven by some degree of reappraisal (considered a separate emotion regulation strategy) since this reminder might have evoked retrospective changes in ratings.

      Not necessarily a weakness, but an unanswered question is exactly how distancing is producing these effects. As the authors point out, there is a possibility that eye-movement avoidance of the more emotionally salient aspects of scenes could be changing participants' exposure to the emotions somewhat. Not discussed by the authors, but possibly relevant, is the literature on differences between emotion types on oculomotor avoidance, which could have contributed to differential effects on different emotions.

    5. Author response:

      Reviewer 1:

      A primary limitation of this study, acknowledged by the authors, is its reliance on self-reports of participants’ emotional states. Although considerable effort was made to minimize expectation effects, further research is needed to confirm that the observed behavioral changes reflect genuine alterations in emotional states.

      Thank you very much for raising this point. We fully agree that self-reported emotional states are inherently subjective and that the ramifications of this need to be clarified in the manuscript. However, we would suggest that the focus on self-report may be a strength rather than a limitation. First, the regularities and rules underlying and determining emotional self-report are of primary importance and interest in their own right, and the work presented here does, we believe, shed light on a rich structure present in multivariate timeseries of subjective self-reports and their response to external inputs. Second, there is no clear definition of what a “genuine emotion state” might be, particularly if there is a discrepancy with self-reported emotions.

      Additionally, the generalizability of the findings to long-term remediation strategies remains an open question.

      Yes, we agree that what we have described is limited to a short-term intervention and change.

      Whether these changes bear on longer-term changes remains to be assessed. Furthermore, the mechanisms or processes that would support such maintenance are of substantial interest, and will be the focus of future work.

      Second, the statistical analysis, particularly the computational approach, sometimes lacks sufficient detail and refinement. While I will not elaborate on specific points here, one notable issue is the interpretation of the intrinsic matrix (A). The model-free analysis reveals correlations between emotions at a given time or within an emotional state across time points. However, it does not provide evidence to support lagged interactions across states that would justify non-diagonal elements in A. The other result concerning the dynamics matrix only highlights a trend in the dominant eigenvalue, which is difficult to interpret in isolation. The absence of a statistically significant group x intervention interaction furthermore makes this finding less compelling. This weakens the study’s conclusions about the importance of intrinsic dynamics, as claimed in the title.

      We appreciate the reviewer’s detailed feedback on the statistical analysis and interpretation of the intrinsic dynamics matrix. It is true that the model-free analysis as presented focuses on within-state correlations and that we have not provided such model-free evidence for lagged interactions across states. We do note that the model comparison suggested that the intervention caused changes in the full A matrix. This would be unlikely if there had not been meaningful cross-emotion lagged effects. Similarly, inference of the A matrix could have revealed a diagonal matrix, and we preferred not to impose such an assumption a priori, as it is very restrictive. Nevertheless, in the absence of a statistically significant group x intervention interaction, the findings regarding the A matrix are less compelling than those related to the control analyses. While this is likely due to a lack of statistical power, these are important points which we will consider in more detail in the revision.

      Finally, to avoid potential misunderstandings of their work, the authors should be more careful about their use of terms pertaining to control theory and take the time to properly define them. For example, the “controllability” of emotional states can either denote that those states are more changeable (control theory definition), or, conversely, more tightly regulated (common interpretation, as used in the abstract). This is true for numerous terms (stability, sensitivity, Gramian, etc.) for which neither a clear definition nor references are provided. Readers unfamiliar with the framework of control theory will likely be at a loss without more guidance.

      Thank you for this point. We recognize the potential for misunderstanding due to the dual usage of terms such as “controllability” and will improve the clarity to avoid any misunderstanding.
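
      To make the control-theory sense of these terms concrete, the quantities can be sketched for a generic discrete-time linear system x_{t+1} = A x_t + B u_t. This is a minimal illustration only: the matrices, the horizon, and the function names below are assumptions chosen for the example, not the fitted values or analysis code of the study.

```python
import numpy as np


def controllability_gramian(A, B, horizon):
    """Finite-horizon controllability Gramian: sum_k A^k B B^T (A^T)^k."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)  # holds A^k, starting at k = 0
    for _ in range(horizon):
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return W


def dominant_eigenvalue_modulus(A):
    """Largest absolute eigenvalue of the dynamics matrix A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


# Hypothetical two-state example: each state decays by half per step,
# and each state receives its own external input.
A = np.array([[0.5, 0.0],
              [0.0, 0.5]])
B = np.eye(2)

W = controllability_gramian(A, B, horizon=50)
rho = dominant_eigenvalue_modulus(A)
```

      In the control-theory sense, a larger Gramian (for example, a larger trace) means that inputs of a given energy can move the state further, i.e. the states are more changeable. A dominant eigenvalue modulus below 1 means that, without input, the states decay back towards baseline.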

      Reviewer 2:

      Acquiring data online inevitably gives rise to selection and self-selection effects. This needs to be acknowledged clearly. Exacerbating this, participant remuneration seems low at an amount below the minimum or living wage in Western countries (do the authors know where their participants came from?).

      Thank you for this point. We certainly agree that different experimental settings can induce different biases, and this is no different for online settings. However, online tasks such as the one used here have become accepted, and there is now a substantial literature showing that in-lab effects are often well-replicated in online settings (Gillan and Rutledge, 2021). For the current study, it is not clear that an in-person setting would not induce comparably complex biases, e.g. to do with differences between experimenters. All participants were from the UK. Remuneration rates were comparable to other experimental settings, in keeping with other online studies, UK living wage recommendations, and ultimately determined according to institutional ethical guidance.

      Another concern is that the intervention does not simply take place before the second block begins but is ongoing during the whole of the second block in that it is integrated into the phrasing of the task on each trial. It is therefore somewhat misleading to speak of a period ‘after the intervention’, and it would have been interesting to assess the effect of this by including a third group where the phrasing does not change, but the floating leaves intervention takes place.

      Thank you for this point. We acknowledge that the phrasing of the emotion question in the second block may have influenced the observed effects. Including a third group without the reminder would have provided valuable insights and is an important consideration for future studies. We will acknowledge this limitation.

      As mentioned in the Limitations section, observation noise was assumed and not estimated. While this is understandable in this case, the effect of this assumption could have been assessed by simulation with varying levels of observation (and process) noise.

      Thank you for this comment. We would like to clarify that both observation noise and process noise were estimated in the analyses. We will ensure this is emphasized better in the revised version to avoid future misunderstandings.

      Relatedly, the reliance on formal model comparison is unfortunate since the outcome of such comparisons is easily influenced by slight changes to assumptions such as noise levels. An alternative approach would have been to develop a favoured model based on its suitability to address the research question and its ability, established by simulation, to distill relevant changes of behaviour into reliable parameter estimates.

      We agree that model comparison alone is insufficient. This is why we have also included extensive simulations, including posterior predictive checks, and have followed established best-practice procedures (Wilson and Collins, 2019). We have focused on a relatively simple model space to avoid overfitting to the dataset, and hence reduce the risk of spurious findings. While we agree that outcomes will be influenced by underlying assumptions, this would persist with the suggested approach of relying on a favoured model. Simulations themselves rely on predefined structures and noise specifications, which inherently shape parameter recovery and inference. Relying only on a favoured model might risk model misspecification, whereby the model may not actually capture the data, and the parameters intended to capture the intervention effect could be confounded. We will clarify the reasoning behind our approach in the revised version.

      The statistical analyses clearly show the limitations of classical statistical testing with highly complex models of the kind the authors (commendably) use. Hunting for statistically significant interactions in a multivariate repeated-measures design relying on inputs from time series-derived point estimates is a difficult proposition. While the authors make the best of the bad situation they create by using null-hypothesis significance testing, a more promising approach would have been to estimate parameters using a sampler like Stan or PyMC and then draw conclusions based on posterior predictive simulations.

      This comment raises several interesting points. First, we agree that the value of classical tests on individual parameters within such complex situations is limited. This is why our main focus is on global measures like model comparison. Our use of the classical tests is more to support the understanding of the nature of the data, i.e. they have a more descriptive aim. We hope to clarify this further in the revision. Second, in terms of sampling, we would like to emphasize that the Kalman filter is both efficient and analytically tractable, making it well-suited to our data and research question. It may have been possible to use sampling to obtain posterior distributions rather than point estimates. However, we did not judge this to be worth the (substantial) additional computational cost.
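
      The analytical tractability mentioned here can be illustrated with a single predict/update step of a Kalman filter for a linear-Gaussian state-space model. This is a generic textbook sketch: the function name, the toy matrices, and the omission of an external input term are assumptions made for brevity and do not reproduce the analysis code of the study.

```python
import numpy as np


def kalman_step(mean, cov, y, A, C, Q, R):
    """One Kalman filter step for x_t = A x_{t-1} + w, y_t = C x_t + v.

    Q is the process-noise covariance (w), R the observation-noise
    covariance (v). Returns the filtered mean and covariance.
    """
    # Predict: propagate the previous estimate through the dynamics.
    mean_pred = A @ mean
    cov_pred = A @ cov @ A.T + Q

    # Update: correct the prediction using the new observation y.
    S = C @ cov_pred @ C.T + R              # innovation covariance
    K = cov_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    mean_new = mean_pred + K @ (y - C @ mean_pred)
    cov_new = (np.eye(len(mean)) - K @ C) @ cov_pred
    return mean_new, cov_new


# Hypothetical one-dimensional example with no process noise.
A = np.eye(1)
C = np.eye(1)
Q = np.zeros((1, 1))
R = np.eye(1)

mean, cov = kalman_step(np.zeros(1), np.eye(1), np.array([2.0]), A, C, Q, R)
```

      Every quantity is obtained in closed form from matrix operations, which is why no sampling is needed to compute the filtered estimates.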

      Reviewer 3:

      An interesting but perhaps at present slightly confusing aspect of their described results relates to the ‘controllability’ of emotions, which they define as their susceptibility to external inputs. Readers should note this definition is (as I understand it) quite distinct from, and sometimes even orthogonal to, concepts of emotional control in the emotion literature, which refer to intentional control of emotions (by emotion regulation strategies such as distancing). The authors also use this second meaning in the discussion. Because of the centrality of control/controllability (in both meanings) to this paper, at present it is key for readers to bear these dual meanings in mind for juxtaposed results that distancing “reduces controllability” while causing “enhanced emotional control”.

      We fully agree with the reviewer’s observation that “controllability” can be interpreted in different ways. We will revise the text to ensure consistent usage and explicitly state the distinction between the control theory definition of controllability and its interpretation in the emotion regulation literature.

      As above the authors use an active control - a relaxation intervention - which is extremely closely matched with their active intervention (and a major strength). However, there was an additional difference between the groups (as I currently understand it): “in the group allocated to the distancing intervention, the phrasing of the question about their feelings in the second video block reminded participants about the intervention, stating: ‘You observed your emotions and let them pass like the leaves floating by on the stream.’” I do wonder if the effects of distancing may also have been partially driven by some degree of reappraisal (considered a separate emotion regulation strategy) since this reminder might have evoked retrospective changes in ratings.

      We appreciate this substantial point. While our study was designed to isolate the effects of distancing, we acknowledge that elements of reappraisal may also have influenced the results. We will discuss this in the revised version. Additionally, as noted in our response to Reviewer 2, including a third group without the reminder could have provided valuable information, and we consider this to be an important direction for future research.

      Not necessarily a weakness, but an unanswered question is exactly how distancing is producing these effects. As the authors point out, there is a possibility that eye-movement avoidance of the more emotionally salient aspects of scenes could be changing participants’ exposure to the emotions somewhat. Not discussed by the authors, but possibly relevant, is the literature on differences between emotion types on oculomotor avoidance, which could have contributed to differential effects on different emotions.

      Thank you very much for these suggestions. It is very true that different emotions can elicit different patterns of oculomotor avoidance, which could have contributed to our observed effects. Research suggests that emotions such as disgust are associated with visual avoidance (Armstrong et al., 2014; Dalmaijer et al., 2021), whereas anxiety and other negative emotions have been associated with increased attentional bias after fear conditioning (Kelly and Forsyth, 2009; Pischek-Simpson et al., 2009). It would be very interesting to repeat the experiment with eye-tracking to examine these possibilities. What would be particularly interesting to examine is whether or not a distancing intervention induces multiple, emotion-specific behaviours.

      References

      Armstrong, T., McClenahan, L., Kittle, J., and Olatunji, B. O. (2014). Don’t look now! Oculomotor avoidance as a conditioned disgust response. Emotion, 14(1):95–104.

      Dalmaijer, E. S., Lee, A., Leiter, R., Brown, Z., and Armstrong, T. (2021). Forever yuck: Oculomotor avoidance of disgusting stimuli resists habituation. Journal of Experimental Psychology: General, 150(8):1598–1611.

      Gillan, C. M. and Rutledge, R. B. (2021). Smartphones and the Neuroscience of Mental Health. Annual Review of Neuroscience, 44:129–151.

      Kelly, M. M. and Forsyth, J. P. (2009). Associations between emotional avoidance, anxiety sensitivity, and reactions to an observational fear challenge procedure. Behaviour Research and Therapy, 47(4):331–338.

      Pischek-Simpson, L. K., Boschen, M. J., Neumann, D. L., and Waters, A. M. (2009). The development of an attentional bias for angry faces following Pavlovian fear conditioning. Behaviour Research and Therapy, 47(4):322–330.

      Wilson, R. C. and Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8:e49547.

    1. eLife Assessment

      This important study has modified ChIP-seq and 4C-seq procedures with a urea step and shows that this drastically changes the pattern of chromatin interactions observed for SATB1 but not other proteins (CTCF, Jarid2, Suz12, Ezh2). Multiple controls make the data convincing. The findings shed new light on the role of SATB1 in genome organization and will be of interest to those who study chromosome structure and nuclear organization.

    2. Reviewer #1 (Public review):

      Summary:

      The nuclear protein SATB-1 was originally identified as a protein of the 'nuclear matrix', an aggregate of nuclear components that arose upon extracting nuclei with high salt. While the protein was assumed to have a global function in chromatin organization, it has subsequently been linked to a variety of pathological conditions, notably cancer. The mapping of the factor by conventional ChIP procedures showed strong enrichment in active, accessible chromatin, suggesting a direct role in gene regulation, perhaps in enhancer-promoter communication. These findings did not explain why SATB-1-chromatin interaction resisted the 2 M salt extraction during early biochemical fractionation of nuclei.

      The authors, who have studied SATB-1 for many years, now developed an unusual variation of the ChIP procedure, in which they purify crosslinked chromatin by centrifugation through 8 M urea. Remarkably, while they lose all previously mapped signals for SATB-1 in active chromatin, they now gain many binding events in silent regions of the genome, represented by lamin-associated domains (LADs).

      SATB-1 had previously been shown by the authors and others to bind to DNA with special properties, termed BUR (for 'base-unpairing regions'). BURs are AT-rich and apparently enriched in equally AT-rich LADs. The 'urea-ChIP' pattern is essentially complementary to the classical ChIP pattern. The authors now speculate that the previously known SATB-1 binding pattern, which does not overlap BURs particularly well, is due to indirect chromatin binding, whereas they consider the urea-ChIP profile that fits better to the BUR distribution on the chromosome to be due to direct binding.

      Building on the success with urea-ChIP the authors adapted the 4C-procedure of chromosome conformation mapping to work with urea-purified chromatin. The data suggest that BUR-bound SATB-1 in heterochromatin mediates long-distance interaction with loci in active chromatin. They close with a model, whereby SATB-1 tethers active chromatin to the nuclear lamina. Because cell type-specific differences are observed, they suggest that the SATB-1 interactions are functionally relevant.

      Strengths:

      Given the unusual finding of essentially mutually exclusive 'standard ChIP' and 'urea-ChIP' profiles for SATB-1, the authors conducted many appropriate controls. They showed that all SATB-1 peaks in urea-ChIP and 96% of peaks in standard-ChIP represent true signals, as they are not observed in a SATB-1 knockout cell line. They also show that urea-ChIP and standard ChIP yield similar profiles for CTCF. The data appear reproducible, judged by at least two replicates and triangulation. The SATB-1 KO cells provide a nice control for the specificity of signals, including those that arise from their elaborately modified 4C protocol.

      Weaknesses:

      The weaknesses mainly relate to missing qualifier statements and overinterpretations. I also found some aspects of the model not yet well supported by the data.

      (1) Under high urea conditions the BUR elements should be rendered single-stranded, and one wonders whether this has any effect on the procedure. The authors should alert the reader of these circumstances.

      (2) An important conclusion is that urea-ChIP reveals direct DNA binding events, whereas standard ChIP shows indirect binding (which is stripped off by urea). I do not yet see any evidence for direct binding. It cannot be excluded, for example, that the binding is RNA-mediated. The authors mention in passing that urea-ChIP material still contains (specific!) RNA. Given that this is a new procedure, the authors should document the RNA content of urea-ChIP and RNase-treat their samples prior to ChIP to monitor an RNA contribution.

      (3) An important aspect of the model is that SATB-1 tethers active genes to inactive LADs. However, in the 4C experiment the BUR elements used to anchor the looping are both in the accessible, active chromatin domain.

    3. Reviewer #2 (Public review):

      Summary:

      The report by Kohwi-Shigematsu et al. describes the key observation that SATB1 binds directly to so-called BUR elements. This is in contrast to several other reports describing SATB1 binding to promoters and enhancers. This discrepancy is explained by the authors to depend on the features of the ChIP technique being used. Urea-ChIP, innovated by the authors, strips off protein-protein interactions that are maintained in conventional ChIP. The authors convincingly make the case that SATB1 and the key genome organiser CTCF co-localize by conventional ChIP but not urea ChIP, as particularly evident in Figure 2A. SATB1 controls long-range interactions in thymocytes and the expression of gene clusters. This feature is independent of TADs, as the knockdown of SATB1 expression does not affect the TAD patterns.

      Strengths:

      A new and innovative adaptation of the urea ChIP-seq technique has enabled the authors to reveal a new aspect of SATB1 binding to the genome. The authors provide a wealth of data to reinforce their claims. This report thus sheds new light on SATB1 function, which is particularly important given its role in metastasising cancer cells.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    1. eLife Assessment

      This important study investigates the different mechanisms that provide instructions for a missing body part to regenerate its appropriate identity. The authors use two species of planarians to identify a key role for bodywide canonical Wnt gradients in controlling the outcome of regeneration. The study provides convincing evidence for variable regeneration efficiency among planarian species that will be of interest to developmental biologists interested in regeneration. However, some of the results are over-interpreted and additional experiments could provide better support for the authors' claims.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript entitled 'A comparative analysis of planarian regeneration specificity reveals tissue polarity contributions of the axial cWnt signalling gradient', Cleland et al. study the robustness of regenerating a head or a tail in the proper position in two different planarian species (S. mediterranea and G. sinensis). The authors find that the expression of notum, a Wnt inhibitor that is triggered after any cut, shows different dynamics in the two planarian species, being more symmetrical in the species that displays a higher number of double-headed or Janus heads (G. sinensis), which they refer to as less robust regeneration. The authors claim that the reduced robustness of G. sinensis regeneration is partially explained by this anterior-posterior symmetric expression of notum, since in S. mediterranea, which shows 'robust regeneration', it appears asymmetric. So, the first claim of the manuscript is that the symmetry in notum expression could underlie the poor robustness of regenerating a head/tail in small bipolar regenerating planarian fragments.

      Then, they analyse the role of a proposed tail-to-head cWnt signalling gradient during the regeneration of heads and tails in the same planarian species. To do so they develop an antibody that allows the quantification of b-catenin activity along the AP axis, together with a pharmacological approach that reduces the pre-existent cWnt gradient without affecting the wound-induced signal. Through this strategy the authors can demonstrate the slope of the b-catenin activity, which is a very nice result, and that it changes according to the size of the animal. Furthermore, they are able to demonstrate that by reducing the cWnt signalling in the pre-existent tissue, there is an increase in the number of double-headed regenerates (Janus heads) and that it depends on the body size and on the decreasing steepness of the cWnt gradient. This result relies on the G. sinensis species, since the drug is not so effective in S. mediterranea. Thus, the authors' second claim is that the slope of the cWnt gradient may contribute to head-tail regeneration specificity in planarians.

      To conclude, it is proposed that regeneration of the correct identity in each wound depends on multiple cues acting in parallel and that their species-specificity provides variations in the regenerative capability of the different planarian species.

      The study has great potential to have a high impact on the regeneration community, since the opportunity to compare mechanisms between close species provides the framework for understanding the essential mechanism of regeneration.

      Strengths:

      The project has several strengths. The authors are able to reproduce the Janus heads phenotypes described by Morgan TH by analysing different planarian species. This is of great importance in the planarian field, because with the current model species, S. mediterranea, this could not be reproduced. So, these results demonstrate that small planarian fragments do make errors during regeneration, giving rise to double-headed animals, which supports the well-known hypothesis that there exists an anteroposterior gradient underlying anteroposterior identity during regeneration. However, and importantly, it does not occur in all planarian species. So, there are differences between planarian species in the robustness of regeneration and perhaps in the mechanisms that drive this regeneration. The finding of different behaviours and gene expressions in different planarian species is very interesting and promising in the field of regeneration.

      A second strength of the study is the demonstration of the b-catenin1 slope in planarians and how it changes with the animal size, and also the establishment of a method to decrease it in the pre-existent tissue but not in the wound. This strategy allows us to examine specifically the role of the pre-existent cWnt signal, demonstrating that it does have a role in the decision of making head or tail during regeneration, which was an essential question in the field of planarians and animal regeneration.

      Weaknesses:

      (1) The finding that notum, which is the main head determinant identified in planarians, has different dynamics in the two planarian species is very suggestive. However, the different dynamics of notum expression during regeneration, which are the basis of the subsequent rationale, are not properly demonstrated, nor is their correlation with the robustness in regenerating a proper head/tail identity. Main concerns regarding this point:

      a) The authors observe that 'In regenerating S. mediterranea 2 mm trunk pieces cut from 6 mm animals, notum expression was induced predominantly at anterior-facing wounds as early as 6 h post-amputation (Figure 2A), as previously reported (Petersen and Reddien 2011)'. However, in the graphics in Figures 2B and C, the expression of notum at 6h is shown as symmetric. It definitely does not agree with the in situ, with the text, or with the published data. How was it measured? It should be corrected and explained since it is the basis of the subsequent rationale.

      b) Then, when measuring notum in G. sinensis the authors conclude: 'Strikingly and in sharp contrast to S. mediterranea, the number of notum expressing cells was nearly identical between anterior and posterior wounds without any discernible A/P asymmetry at any of the examined time points (Figures 2E-F)'. However, in the in situ results of 12 h regenerating G. sinensis, there is a clear difference in notum expression between anterior and posterior wounds. Is the image not representative? Again, how exactly were the measurements performed? Are dots or pixels quantified? It is not explained in the text. This is a crucial result that has to be consistent.

      c) A more general weakness of this part of the manuscript is that even if the authors demonstrate that in G. sinensis the expression of notum is symmetrical in contrast to S. mediterranea, this is just an observation of 1 species that has symmetrical notum and regenerates less robustly than 1 species that has asymmetrical expression and regenerates more robustly. If they for instance look at the expression of wnt1, maybe they also see differences between both species that could be linked to their different regeneration properties (related to this, see below the comment on wnt1 expression). That is to say, comparing 1 to 1 species cannot give any cause-effect evidence.

      Furthermore, the authors rely on the fact that notum inhibition rescues the wild-type phenotype to conclude that it is the symmetric expression of notum that underlies the appearance of Janus heads. This is what can be read in the results: 'Significantly, the rescue of wild-type regenerates by notum(RNAi) suggests that the symmetric G. sinensis notum expression contributes to the formation of double-heads and thus to reduced regeneration specificity'; and in the Summary: 'We found that the reduced regeneration robustness of G. sinensis was partially explained by wound site-symmetric expression of the head determinant notum, which is highly anterior-specific in S. mediterranea.' However, notum RNAi decreases notum in both wounds, so it does not produce an asymmetric expression (at least this is not shown). So, it does not link the symmetry or asymmetry of notum with the appearance of Janus heads.

      d) If the authors want to maintain the claim that the symmetry of notum is one of the reasons that explain the increase in Janus head phenotype in G. sinensis, there are several possibilities to test it. For instance:

      i) Analyse notum expression in different planarian species and relate its symmetry or asymmetry with the appearance of Janus heads. If the claim is true, the species that are more robust should show more asymmetric expression of notum. This would sustain strongly the first claim, and would really be a breakthrough in the field of regeneration.

      ii) Another possibility is a more in-depth analysis of notum expression in the species of the study. If the authors show that larger fragments show fewer Janus heads, and also that it depends on the anteroposterior level of the fragments, they could try to relate the rate of Janus heads with the degree of asymmetry in notum expression in both wounds. For instance, they could analyze notum expression in bipolar regenerating fragments along the anteroposterior axis in both species; it should be more symmetric in G. sinensis, in all fragments, according to Figure 2L. Or they could analyze notum expression in bipolar regenerating fragments of different sizes, mainly in 1 or 2 mm fragments of big planarians, since they are the fragments analyzed that form or not the Janus heads. In G. sinensis the expression of notum should be more symmetrical than in S. mediterranea in these fragments.

      iii) The authors could design an experiment to demonstrate that the symmetry in the expression of notum affects the rate of Janus heads. The experiment that the authors show is the rescue of the Janus heads in G. sinensis after notum RNAi. However, notum RNAi suppresses notum in both wounds, thus not making them asymmetric. Furthermore, the rescue could be explained by the posteriorizing effect that notum RNAi has in planarians, as reported by several authors. A possibility could be to inhibit APC, which increases notum expression in S. mediterranea (Petersen and Reddien 2011). If APC RNAi in G. sinensis produces an increase in notum in both wounds and the rate of Janus heads is not rescued, then it would support the hypothesis that notum symmetry is the cause of the Janus heads. However, if it produces an increase of notum in an asymmetric manner, then the Janus phenotype should be rescued.

      (2) The second weakness of the study is related to the methodology used to support the second claim, that the slope of bcatenin1 activity has a role in the decision to regenerate a head or a tail at the correct end. The main concerns relate to the specificity of the anti-bcatenin1 antibody and to the broad effect of C59 in the secretion of all Wnts.

      a) Raising an antibody against beta-catenin1 that allows the quantification by western blot is a strength of the study, since beta-catenin1 is the key element of the cWnt pathway, and its levels are directly associated with the activation of the pathway. Since this is one of the tools that support the second claim of the study, a characterization of the antibody and additional tests to prove its specificity are required. The authors show a Western blot in which the band intensity decreases after beta-catenin1 inhibition in both species. Further analysis should be shown:

      i) Demonstration that the intensity of the band increases after APC or Axin inhibition.

      ii) Does the antibody work in immunohistochemistry? It would provide further evidence of the specificity if a nuclear signal could be demonstrated.

      iii) Explanation and discussion of the protocol used to analyse the levels of b-catenin1 activity along the anteroposterior axis is required. It has been reported that beta-catenin1 is highly expressed and required in the brain in planarians, and also in the pharynx, and in the sexual organs (Hill and Petersen 2015, Sureda-Gomez et al 2016). How then is the anterior-to-posterior gradient of expression of beta-catenin1 seen in this study in both species explained? Has the pharynx been removed before the protein extraction? What about the beta-catenin1 activity demonstrated in the brain? Why is it not reflected in the western blot analysis using the antibody? This point should be clarified.

      b) The second tool used in the second part of the manuscript is the drug C59, which inhibits Porcupine, a protein required for palmitoylation and secretion of Wnts. Because Porcupine could be required for the secretion of all Wnts, the phenotype obtained with the drug could be the sum of the inhibition of the cWnt signal (wnt1, for instance) and of non-canonical Wnt signaling (such as wnt5). This is in fact the phenotype resulting from the inhibition of Wntless in planarians (Adell et al. 2009), which is also required for the secretion of Wnts. Thus, in the phenotypes resulting from C59 treatment, analysis of the nervous system and of posterior/anterior markers is required. Looking at the in vivo phenotype, it appears that the drug is in fact affecting both canonical and non-canonical pathways, since the animal with protrusions in the lateral part (Figure 4B-double head, or Supplementary Figure 3A) is very similar to the one reported after Wntless inhibition. If the phenotypes observed also show non-canonical Wnt inhibition, this should be clearly shown and discussed.

      The above-mentioned weaknesses are the most important concerns about the present manuscript. However, there are other concerns related to a further analysis of the phenotypes and the analysis of additional Wnt elements as wnt1, which are essential to complete the study and are directly discussed with the authors.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies a key role for bodywide canonical Wnt gradients in controlling the outcome of regeneration within planarians, likely acting in parallel to injury-induced cues that also use tissue asymmetry to control this process. In S. mediterranea a central part of this decision process is the asymmetric expression of the Wnt inhibitor notum specifically at injury sites facing in the anterior direction to promote head formation and inhibit tail formation through regulation of canonical Wnt signaling pathways. Leveraging classic studies by T.H. Morgan over a century ago, which found that amputated thin transverse fragments occasionally incorrectly regenerate 2 heads rather than a head and a tail in a species of Girardia planarians, this study identifies a closely related species, G. sinensis, which undergoes errors in regeneration specificity under similar challenges. Morgan had proposed that his results might arise from the use of a "gradient of materials" providing axis information across the body axis such that small tissue fragments are too narrow to interpret gradient differences, leading to head/tail polarity defects in regeneration. The authors show very convincingly that this species of planaria undergoes notum expression after injury, but unlike in S. mediterranea, this occurs symmetrically at the onset of regeneration. Using RNAi, they show notum participates in the regeneration of mispolarized heads (though interestingly apparently not in normal head regeneration unlike in Smeds, at least under these conditions). G. sinensis planarians, like many organisms, have abundant expression of Wnt genes posteriorly. To test whether this gradient of Wnts may participate in regeneration distinct from any Wnt signals activated after injury, the authors use chemical inhibition to reduce Wnt signaling prior to injury and then alleviate inhibition following injury by removing the drug, confirming successful washout using mass spec. They also raise a new antibody that can detect beta-catenin-1 in this species in order to monitor the body-wide cWnt gradient after these treatments, and correlate this with outcomes on the head/tail regeneration decision. Using this approach, they find that homeostatic inhibition of porcupine (required for Wnt secretion) could dampen the cWnt/beta-catenin gradient and increase the incidence of inappropriate head regeneration at posterior-facing wounds. In addition, they find that the cWnt gradient is less steep in larger animals that also concurrently have a higher incidence of mistakes in regeneration specificity. Together, the paper presents compelling experiments and analysis to support the conclusion that cWnt gradients are an important determinant of head/tail identity determination decisions in G. sinensis, and thereby proposes a plausible model that the notum asymmetry present in S. mediterranea could act in parallel to support the higher regeneration robustness observed in that species.

      Strengths:

      This is a great paper, an instant classic. It addresses an enduring problem that Morgan and others initiated more than a century ago and brings a new synthesis of ideas to clarify an important mechanism. I also like the term "regeneration specificity" which can provide a nice unification and generalization of ideas that other authors have variously described as regeneration patterning or regeneration polarity. The work is a tour de force that creatively builds new tools and observations to leverage a new model of planarian species for unraveling general mechanisms of regeneration decision-making. The experiments are rigorously conducted and I find the overall data to be quite compelling. I have some comments for the authors to consider below for drawing out the interpretation and also clarifying the underlying mechanism.

      Comments:

      (1) The G. sinensis species showed accurate head/tail specificity in 2 mm thick fragments but was strongly impaired at 1 mm thick. I am assuming that pieces greater than 2 mm would make similarly robust head/tail choices, implying a rather sharp transition occurring between 1 and 2 mm. In that case, in the gradient model, are there theoretical reasons to predict that polarity outcomes would decline sharply rather than gradually as fragment thickness decreases? I think the muscle fibers themselves are thought to have lengths on the order of 200 microns, so I wonder what could account for the characteristic length of less than 1 mm here? From the lab's prior analysis of the beta-cat gradient, is this perhaps the minimal length where a difference in bcat protein levels can be detected? This is not essential to resolve in this draft (in my view), just a very interesting question arising from the present study. Relatedly, it seems that the slope of cWnt at the wound site itself might not be enough information for polarity because at a highly granular level, this should be identical at posterior-facing wounds from trunk fragments versus thin transverse fragments obtained at the same AP position, yet trunk fragments succeed at regeneration specificity whereas thin transverse fragments fail.

      (2) The paper nicely shows strong evidence that notum expression is definitely symmetric at the first occurrence of its expression by 6 hours in G. sinensis, and this is a really important result of the paper. At 12 hours, it does look to me in the FISH experiments that there is more persistence of expression at the anterior-facing wound versus the posterior-facing wounds (Fig 2D), although the methods for quantification in Fig2E/F do not show a difference in expression at the two wound sites at this time point. Could this difference arise from differences in the perdurance or timing of early wound-induced signaling at the two wound sites that was perhaps too subtle to detect in the quantification methods used? Or perhaps these images do not represent the population? On a related note, the quantification method seems to fail to show that in 6h Smeds, notum expression is indeed asymmetric. Probably the issue here is not the data in the FISH images themselves, which strongly support the authors' interpretations, but rather a deficiency or limitation of the quantification method used, which should be resolved so that the conclusions from the single FISH images can be interpreted robustly. For example, some authors have used a method of counting notum+ cells and I wonder if this could provide better quantitative information here.

      (3) Given that the double-headed phenotype is observed from thin transverse fragments, ideally, the symmetry of notum could be established to occur in those types of fragments as well. This experiment would clarify that notum is expressed at posterior-facing wounds in the very same types of fragments that undergo the highest levels of mistakes in regeneration specificity.

      (4) Is wnt1 expressed symmetrically at wound sites in this species? It seems there are cases like acoels where wound-induced Wnt activation can occur asymmetrically but through preferential expression of Wnts at posterior-facing wounds, rather than notum. It would be interesting to know although I also think the work the authors already have done in this study itself already constitutes a very comprehensive advance and could be the subject of future work.

      (5) I agree that notum is relatively much more strongly expressed at the far posterior region in G. sinensis than in Smeds, but it does seem from the RNAseq data that it also has some locally enriched expression at the anterior pole. Because the RNAseq analysis involves scaling expression across the regions for each gene, it is difficult to know if the anterior expression is relatively lower or perhaps even about the same level of expression as the anterior pole expression of this gene in Smeds. Though not essential to make the desired arguments, in situs on notum in the intact animals would be helpful to clarify this. Relatedly, it would be fascinating to know whether G. sinensis notum undergoes anterior-pole expression around the 72 hour or similar timepoint as in Smeds.

      (6) The assessment of beta-catenin gradients was done through protein extractions from whole tissue fragments. However, it has been shown in other planarian species that beta-catenin can have strong tissue-specific expression in, for example, the pharynx, brain, and reproductive systems. Some supporting evidence or argument should be presented to clarify the interpretation that the graded expression observed by western blotting cannot be fully explained by this kind of tissue-specific expression of beta-catenin rather than representing a true signaling gradient as interpreted by the authors. For example, if this antibody could be used in immunostaining, this could support the beta-catenin signaling gradient. Alternatively, information about the location of the pharynx or any other posterior reproductive tissues in G. sinensis could be calibrated with respect to the fragment bins used for the gradient. Perhaps a portion of the C59-dependent body-wide gradient measured here occurs fully within tail tissue that lacks other regionalized tissue that could be a potential additional source of beta-catenin. Further discussion and interpretation, or additional experiments, should be included to rule out alternative confounding sources of beta-catenin in order to clarify the interpretation of the western blot as representing a beta-catenin signaling gradient.

      (7) I find the analysis in Figure 5 to be quite compelling for showing the importance of cWnt/Bcat gradients in contributing to head/tail determination, and I also think that the authors' discussion of the limitations of the approach is well articulated and considered. Based on prior literature, it also seems very likely that there is a third redundantly acting component to regeneration specificity, which is the amplification of small differences in cWnt in a direction-dependent manner early in the regeneration process (24-72 hours in Smeds). This would explain why post-amputation treatment with porcupine inhibitor in G. sinensis caused 100% penetrant defects in regeneration specificity while the pre-treatment paradigm caused a weaker effect (25-40% for larger animals). In Smeds, it is known already that delivery of dsRNAs against beta-catenin-1, wnt1, and notum only after injury caused polarity defects, and thus all three genes certainly have a function relevant for head/tail after injury (Petersen and Reddien 2008, 2009, 2011; please note these experiments were reported in the text of these studies and not in individual figures). This evidence, combined with extensive FISH and complementary RNAi studies in the field, strongly suggests that some combination of the 6-18h injury-induced phase but also very likely the subsequent "pole-specific phase" of wnt1 expression is likely to be important for driving or enacting the tail fate program and is therefore a component of the regeneration specificity mechanism described here.

      (8) Prior work has also demonstrated roles for Wnt genes expressed in gradients in regeneration specificity. In particular, inhibition of the wntP-2/wnt11-5 gene, which is expressed in an animal-wide gradient, strongly enhanced the effects of inhibition of wnt1, which is the earliest wound-activated Wnt gene, to cause 100% penetrant posterior head regeneration phenotypes in S. mediterranea (Petersen and Reddien 2009). These observations are complementary to the present study by implicating Wnts expressed in bodywide gradients in the process of regeneration decision-making. That study also showed that wnt1 is necessary for new wntP-2 expression during the wound-induced early phase and that wnt1 activation does not require beta-catenin. Collectively, these findings suggest a more complex process involved in gradient detection, with the involvement of wound signals likely extending beyond autoregulation of the cWnt gradient or notum asymmetry mechanisms. Although this paper is cited already, framing the present study more fully in context with this and other relevant prior work would be helpful to contextualize the advance for the field.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors revisit the hypothesis of gradient-based polarity specification during planarian regeneration proposed over a century ago, but here they apply molecular techniques and a valuable comparative approach. By using a comparative analysis with classic and modern planarian model organisms, the authors have identified variable molecular mechanisms that different planarian species utilize to ensure that the proper tissues are regenerated following wounding.

      Strengths:

      The comparative approach of using 2 different planarian species allowed the study to elucidate different molecular mechanisms that planarians utilize in re-establishing anterior-posterior axis polarity during regeneration. Without this comparative approach, the mystery of T.H. Morgan's classic studies that demonstrate mistakes in this axis re-polarization would remain unanswered. Furthermore, the use of both a modern molecular model species and another more classical planarian species, which the authors have fully developed with molecular tools and techniques, sheds light on the diversity of genetic processes that closely related species seem to utilize in regeneration. To dissect the role of a long-hypothesized canonical cWnt signaling gradient, the authors developed a novel strategy using chemical genetics to titer this gradient, which led to phenotypes with enhanced aberrant axis polarity re-establishment. Together these experimental approaches establish a well-supported initial model for explaining the molecular mechanisms that different planarian species utilize to allow for proper regeneration of lost tissues.

      Weaknesses:

      While pharmacological perturbation of signaling pathways could produce off-target effects, the authors provide well-documented evidence that canonical Wnt signaling is altered with drug treatment. The correlation between altered cWnt signaling gradients and the incidence of double-headed regeneration is strong, but it is not clear that the axial cWnt signaling gradient is the ultimate cause of the modified regeneration polarity. However, the model established here and supported by considerable data provides a useful alternative to the mechanism of notum upregulation that has been well-documented in Schmidtea mediterranea, the workhorse model in planarian research. Throughout the manuscript, the authors suggest that Girardia sinensis lost the ability to upregulate notum at anterior-facing wounds, but until additional planarian species are evaluated, it remains plausible (and equally parsimonious) that S. mediterranea could have innovated a novel strategy to re-establish axis-polarity through asymmetric notum expression.

      The study is very well-designed with considerable confirmation of results, especially in the novel use of the pharmacological inhibitor C59. This study is invaluable in its comparative approach, finding that well-established molecular processes may not explain similar developmental outcomes for different species; this corroborates the need to study additional model organisms and how an evolutionary approach to the study of development is imperative.

    1. eLife Assessment

      This valuable study reports a potential connection between the seminal microbiome and sperm quality/male fertility. The data are generally convincing. This study will be of interest to clinicians and biomedical researchers who work on microbiome and male fertility.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyzed the bacterial colonization of human sperm using 16S rRNA profiling. Patterns of microbiota colonization were subsequently correlated with clinical data, such as spermiogram analysis, presence of reactive oxygen species (ROS), and DNA fragmentation. The authors identified three main clusters dominated by Streptococcus, Prevotella, and Lactobacillus & Gardnerella, respectively, which aligns with previous observations. Specific associations were observed for certain bacterial genera, such as Flavobacterium and semen quality. Overall, it is a well-conducted study that further supports the importance of the seminal microbiota.

      Strengths:

      - The authors performed the analysis on 223 samples, which is the largest dataset in semen microbiota analysis so far

      - Inclusion of negative controls to control contaminations.

      - Inclusion of a positive control group consisting of men with proven fertility.

      [Editors' note: the authors addressed the concerns raised in the previous round of review.]

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Discussion: Could the authors discuss more the findings about Flavobacterium? Has it ever been associated with the urogenital tract?

      Page 13-14, line 252-268:

      ‘The genus Flavobacterium was defined in 1923 to encompass gram-negative, non-spore-forming rods, of yellow pigment (44). The inclusiveness of this definition resulted in a collective of heterogenous species. By 1984 the genus had been restricted to those that were also non-motile and non-gliding (44). More recently, with an increase in genomic profiling, many species previously considered to be of genus Flavobacterium have been reclassified to genus Chryseobacterium, Cytophaga, and Weeksella (45). Increasing numbers of Flavobacterium species are being discovered such as gondwanense, Collinsii, branchiarum, branchiicola, salegens and scophthalmum (46) (47) (48). The allocation of Flavobacterium aquatile to this genus remains controversial due to its motility (49). Flavobacterium species are widely distributed in the environment including soil, fresh water and saltwater habitats (50) (51).  There are many reports of pathogenic infections of Flavobacterium species in fish, however human infections are rare (48).  A handful of case reports have described opportunistic infections to include pneumonia, urinary tract infection, peritonitis and meningitis (52) (53) (54) (55). Flavobacterium lindanitolerans and Flavobacterium ceti have been isolated as causative agents in some (56) (54). Case reports also describe Flavobacterium odoratum as a causative agent in urinary tract infection, most often in the immunocompromised or those with indwelling devices (57) (58) (59). However, this was one of many species previously of genus Flavobacterium reclassified, in this case to genus Myroides (60). Notably in our sample participants were asymptomatic of urinary tract infection’. 

      What is the relative abundance of Flavobacterium in the present study: this type of bacterium has been previously associated with contaminations (PMID: 25387460, 30497919).

      Page 13, line 244-247:

      ‘The Flavobacterium genus taxon we identified as significantly associated with abnormal semen quality and sperm morphology was present in 36.28% of the samples, with a mean relative abundance of 1.15% in those samples. This information and the mention of previous findings of Flavobacterium in contamination studies have been added to the discussion’.

      Figure 1: Increase the size of panel A.

      Amended.

      Figure 3: Can the authors indicate the relative abundance of each genus/species by the size of the node?

      Co-occurrence network figure has been modified to display relative abundance of nodes.

      Supplementary data: I don't see anywhere the decontam plots.

      Decontam plots as suggested in the package vignette https://benjjneb.github.io/decontam/vignettes/decontam_intro.html have been added in the GitHub repository. For practical purposes, the plot corresponding to the frequency testing only displays a random subset (n=15) of the total taxa (n=82) flagged by this test as contaminants. The .csv files with the outputs of each filter are available in the same directory.

      Line 12: Check the sentence

      Line 15: Genera in italics

      Line 33: Change "overall quality of the spermatozoa" to "overall semen quality"

      Lines 18-20: Rephrase

      Line 87: 28F-Borrelia

      Line 134: "Seminal microbiota" or "Composition of the seminal microbiota"

      Line 159: "These included ... genera"

      Line 166: "Of note, Flavobacterium genus was..."

      Lines 187-188: Check sentence

      Thank you, these have been amended

    1. eLife Assessment

      This compelling study introduces a set of novel genetically encoded tools for the selective and reversible ablation of excitatory and inhibitory synapses. These new tools enable selective and efficient ablation of excitatory synapses, and photoactivatable and chemically inducible methods for inhibitory synapse ablation in specific cell types, providing valuable methods for disrupting neural circuits. This approach holds broad potential for investigating the roles of specific synaptic input onto genetically determined cells.

    2. Reviewer #1 (Public review):

      Summary:

      This work is a continuation of a previous paper from the Arnold group, where they engineered GFE3, which allows specific ablation of inhibitory synapses. Here, the authors generate 3 different actuators:

      (1) An excitatory synapse ablator.
      (2) A photoactivatable inhibitory synapse ablator.
      (3) A chemically inducible inhibitory synapse ablator.

      Following initial engineering, the authors present characterization and optimization data to showcase that these new tools allow one to specifically ablate synapses, without toxicity and with specificity. Furthermore, they showcase that these manipulations are reversible.

      Altogether, these new tools would be important for the neuroscience community.

      Strengths:

      The authors convincingly demonstrate the engineering, optimization and characterization of these new probes. The main novelty here is the new excitatory synapse ablator, which has not been shown yet and thus could be a valuable tool for neuroscientists.

      Weaknesses:

      The authors have convincingly demonstrated the use of these tools in cultured neurons. The biggest weakness is the limited information given for the use of these tools in in vivo studies. The authors provide one example of the use of these new tools to study retinal circuits, and show evidence that the excitatory synapse ablator reduces synaptic transmission in retinal slices. Still, more work will be required to use these tools in intact neuronal circuits. It remains unclear whether it would be trivial to characterize how well these tools express and operate in vivo. This could be substantially different and present some limitations as to the utility of these tools.

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces a set of genetically encoded tools for the selective and reversible ablation of excitatory and inhibitory synapses. Previously, the authors developed GFE3, a tool that efficiently ablates inhibitory synapses by targeting an E3 ligase to the inhibitory scaffolding protein Gephyrin via GPHN.FingR, a recombinant, antibody-like protein (Gross et al., 2016). Building on this work, they now present three new ablation tools: PFE3, which targets excitatory synapses, and two new versions of GFE3 (paGFE3 and chGFE3) that are photoactivatable and chemically inducible, respectively. These tools enable selective and efficient synapse ablation in specific cell types, providing valuable methods for disrupting neural circuits. This approach holds broad potential for investigating the roles of specific synaptic input onto genetically determined cells.

      Strengths:

      The primary strength of this study lies in the rational design and robust validation of each tool's effectiveness, building on previous work by the authors' group (Gross et al., 2016). Each tool serves distinct research needs: PFE3 enables efficient degradation of PSD-95 at excitatory synapses, while paGFE3 and chGFE3 allow for targeted degradation of Gephyrin, offering spatiotemporal control over inhibitory synapses via light or chemical activation. These tools are efficiently validated through robust experiments demonstrating reductions in synaptic markers (PSD-95 and Gephyrin) and confirming reversibility, which is crucial for transient ablation. By providing tools with both optogenetic and chemical control options, this study broadens the applicability of synapse manipulation across varied experimental conditions, enhancing the utility of E3 ligase-based approaches for synapse ablation.

      Weaknesses:

      While this study provides valuable tools and addresses many critical points for validation, examining potential issues with specificity and background ubiquitination in further detail could strengthen the paper. For PFE3, the study demonstrates reductions in both PSD-95 and GluA1. In their previous work, GFE3 selectively reduced Gephyrin without affecting major Gephyrin interactors or other PSD proteins. Clarifying whether PFE3 affects additional PSD proteins beyond GluA1 would be important for accurately interpreting results in experiments using PFE3. Additionally, further insight into PFE3's impact on inhibitory synapses would be valuable to assess the excitatory specificity and potential for circuit-level applications. For paGFE3 and chGFE3, the E3 ligase (RING domain of Mdm2) is overexpressed and thus freely diffusible within the cell as a separate construct. Although the authors show that Gephyrin is not significantly reduced without light or chemical activation, it remains possible that other proteins, particularly non-synaptic proteins, could be ubiquitinated due to the presence of freely diffusing E3 ligase in the cytosol. Addressing these points would clarify the strengths and limitations of the tools, providing users with valuable information.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1:

      The biggest concern in this regard is: that almost all the characterization is performed in cultured dissociated neurons…

      While it is true that most of the characterization done in this paper was in cultured neurons, we verified that PFE3 mediates functional ablation of excitatory synapses in vivo (Fig. 3). Furthermore, the GPHN.FingR-XIAP (GFE3), a protein very similar to the complex formed following activation of paGFE3 and chGFE3, has been extensively tested by us and others in vivo(1-4).

      Reviewer #2:

      For paGFE3 and chGFE3, the E3 ligase (RING domain of Mdm2) is overexpressed throughout cells as a separate construct. Although the authors show that Gephyrin is not significantly reduced without light or chemical activation, it remains possible that other proteins could be ubiquitinated due to the overexpressed E3 domain.

      In our previous paper(1), we tested neurons under 3 conditions: 1. expressing a construct similar to PBP-E3, consisting of a FingR with a randomized binding domain fused to the same XIAP ring domain used in paGFE3 and chGFE3 (RAND-E3). 2. expressing GPHN.FingR. 3. not expressing any exogenous proteins (control neurons). In each case, we found that expression of a variety of excitatory and inhibitory synaptic proteins was not significantly different when exposed to either of these exogenous proteins compared with control neurons.

      Recommendations for the authors:

      (1)  Can the authors use the tools to show the ablation of endogenous PSD95 without FingR overexpression?

      The experiments described in Fig. 3 are an example of this type of experiment. Furthermore, the PSD-95.FingR was extensively tested and has been used in dozens of studies without any indication that its expression alters cellular function or morphology. Note also that the transcriptional regulation system of PSD-95.FingR limits the expression such that there is virtually no background, so it is not really being overexpressed.

      (2) I am missing some control experiments for the excitatory synapses ablator- can the authors show that cells transfected with the plasmid and no DOX, show similar numbers of synapses as neurons without transfection?

      We have added an experiment comparing cells expressing PSD-95.FingR alone, and others expressing PFE3 with no Dox. We found that the two types of cells express amounts of PSD-95 that are not significantly different (Fig. S2L).

      (3) I am not quite sure how they used paired statistics on staining since they could only stain the cell at the end of the experiment. Are the comparisons performed on different cells?

      These experiments were done on the same cells. However, the methods of labeling were different- the initial counting of synapses was done, so we agree with the reviewer that it would be best not to use a paired analysis. Accordingly, we have changed Figs. 1F and 2D.

    1. eLife Assessment

      The paper describes a novel approach for inferring features of synaptic networks from recordings of individual cells within the network. The paper will be a valuable contribution to those studying central pattern generators, including those involved in respiration. However, the theoretical approach to drawing inferences regarding the underlying synaptic currents is incomplete as it relies on unsupported simplifying assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      The paper develops a phase method to obtain the excitatory and inhibitory afferents to certain neuron populations in the brainstem. The inferred contributions are then compared to the results of voltage clamp and current clamp experiments measuring the synaptic contributions to post-I, aug-E and ramp-I neurons.

      Strengths:

      The electrophysiology part of the paper is sound and reports novel features with respect to earlier work by JC Smith et al 2012, Paton et al 2022 (and others) who have mapped circuits of the respiratory central pattern generator. Measurements on ramp-I neurons, late-I neurons and two types of post-I neurons in Fig.2 besides measurements of synaptic inputs to these neurons in Fig.5 are to my knowledge new.

      Weaknesses:

      The phase method for inferring synaptic conductances fails to convince. The method rests on many layers of assumptions and the inferred connections in Fig.4 remain speculative. To be convincing, such a method ought to be tested first on a model CPG with known connectivity to assess how good it is at inferring known connections back from the analysis of spatio-temporal oscillations. For biological data, once the network connectivity has been inferred as claimed, the straightforward validation is to reconstruct the experimental oscillations (Fig.2), noting that Rybak et al (Rybak, Paton, Schwaber, J. Neurophysiol. 77, 1994 (1997)) have already derived models for the respiratory neurons.

      The transformation from time to phase space, unlike in the Kuramoto model, is not justified here (L.94) and is wrong. The underpinning idea that "the synaptic conductances depend on the cycle phase and not on time explicitly" is flawed because synapses have characteristic decay times and delays to response which remain fixed when the period of network oscillations increases. Synaptic properties depend on time and not on phase in the network. One major consequence relevant to the present identification of excitatory or inhibitory behaviour, is that it cannot account for change in the behaviour of inhibitory synapses - from inhibitory to excitatory action - when the inhibitory decay time becomes commensurable to the period of network oscillations (Wang & Buzsaki Journal of Neuroscience 16, 6402 (1996), van Vreeswijk et al. J. Comp. Neuroscience 1,313 (1994), Borgers and Kopell Neural Comput. 15, 2003). In addition, even small delays in the inhibitory synapse response relative to the pre-synaptic action potential also produce in-phase synchronization (Chauhan et al., Sci. Rep. 8, 11431 (2018); Borgers and Kopell, Neural Comput. 15, 509 (2003)). The present assumptions are way too simplistic because you cannot account for these commensurability effects with a single parameter like the network phase. There is therefore little confidence that this model can reliably distinguish excitatory from inhibitory synapses when their dynamic properties are not properly taken into account.

      L.82, Eq.1 makes extremely crude assumptions that the displacement current (CdV/dt) is negligible and that the ion channel currents are all negligible. Vm(t) is also not defined. The assumption that the activation/inactivation times of all ion channels are small compared to the 10-20ms decay time of synaptic currents is not true in general. Same for the displacement current. The leak conductance is typically g~0.05-0.09mS/cm^2 while C~1uF/cm^2. Therefore the ratio C/g leak is in the 10-20ms range - the same as the typical docking neurotransmitter time in synapses.

      Models of brainstem CPG circuits have been known to exist for decades: JC Smith et al 2012, Paton et al 2022, Bellingham Clin. Exp. Pharm. And Physiol. 25, 847 (1998); Rubin et al., J. Neurophysiol. 101, 2146 (2009) among others. The present paper does not discuss existing knowledge on respiratory networks and gives the impression of reinventing the wheel from scratch. How will this paper add to existing knowledge?

      Comments on revisions:

      The authors have done a good job at revising the manuscript to put this work into the context of earlier work on brainstem central pattern generators.

      I still believe the case for the method is not as convincing as it would have been if the method had been validated first on oscillations produced by a known CPG model. Why would the inference of synaptic types from the model CPG voltage oscillations be predetermined? Such inverse problems are quite complicated and their solution is often not unique or sufficiently constrained. Recovering synaptic weights (or CPG parameters) from limited observations of a highly nonlinear system is not warranted (Gutenkunst et al., Universally sloppy parameter sensitivities in systems biology models, PLoS Comp. Biol. 2007; www.doi.org/10.1371/journal.pcbi.0030189) especially when using surrogate biological models like Hodgkin-Huxley models.

      In p.2, the edited section refers to the interspike interval being much smaller than the period of the network. More important is to mention the relationship between the decay time of inhibitory synapses and the period of the network.

    3. Reviewer #2 (Public review):

      Summary:

      By measuring intracellular changes in membrane voltage from a single neuron of the medulla the authors describe a method for determining the balance of excitatory and inhibitory synaptic drive onto a single neuron within this important brain region.

      Strengths:

      This data-driven approach to exploring neural circuits is well described and could be valuable in identifying microcircuits that generate rhythms. Importantly, perhaps, this inference method could enable microcircuits to be studied without the need for time consuming anatomical tracing or other more involved electrophysiological techniques. Therefore, I definitely can see the value in developing an approach of this type.

      Weaknesses:

      There are many assumptions that need to be accepted in order to successfully apply this technique and I was pleased to see that several of these assumptions have been explored by the authors in this study.

      For example, this approach involves assuming the reversal potential that is associated with the different permeant ions that underlie the excitation and inhibition as well as the application of Ohm's law to estimate the contribution of excitation and inhibitory conductance. My first concern was that this approach relies on a linear I-V relationship between the measured voltage and the estimated reversal potential. However, open rectification is a feature of any I-V relationship generated by asymmetric distributions of ions (see the GHK current equation) and will therefore be a particular issue for the inhibition resulting from asymmetrical Cl- ion gradients across GABA-A receptors. The mixed cation conductance that underlies most synaptic excitation will also generate a non-linear I-V relationship due to the inward rectification associated with polyamine block of AMPA receptors. The authors present evidence that over most of the voltage range examined the I-V relationship is linear and this is a helpful addition.

      This approach has similarities to earlier studies undertaken in the visual cortex that estimated the excitatory and inhibitory synaptic conductance changes that contributed to membrane voltage changes during receptive field stimulation. However, these approaches also involved the recording of transmembrane current changes during visual stimulation that were undertaken in voltage-clamp at various command voltages to estimate the underlying conductance changes. Molkov et al have attempted to essentially deconvolve the underlying conductance changes without this information and I am concerned that this simply may not be possible.

      The current balance equation (1) cited in this study is based upon the parallel conductance model developed by Hodgkin & Huxley. One key element of the HH equations is the inclusion of an estimate of the capacitive current generated due to the change in voltage across the membrane capacitance. While the present study takes into account the impact of membrane capacitance, a deeper discussion on how variations in capacitance across different neuron types might affect inference accuracy would be useful. Differences in capacitance could introduce variability in inferred conductances, potentially influencing model predictions.

      Studies using acute slicing preparations to examine circuit effects have often been limited to the study of small microcircuits - especially feedforward and feedback interneuron circuits. It is widely accepted that any information gained from this approach will always be compromised by the absence of patterned afferent input from outside the brain region being studied. In this study, descending control from the Pons and the neocortex will not be contributing much to the synaptic drive and ascending information from respiratory muscles will also be absent completely. This may not have been such a major concern if this study was limited to demonstrating the feasibility of a methodological approach. However, this limitation does need to be considered when using an approach of this type to speculate on the prevalence of specific circuit motifs within the medulla (Figure 4). Therefore, I would argue that some discussion of this limitation should be included in this manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper develops a phase method to obtain the excitatory and inhibitory afferents to certain neuron populations in the brainstem. The inferred contributions are then compared to the results of voltage clamp and current clamp experiments measuring the synaptic contributions to post-I, aug-E, and ramp-I neurons.

      Strengths:

      The electrophysiology part of the paper is sound and reports novel features with respect to earlier work by JC Smith et al 2012, Paton et al 2022 (and others) who have mapped circuits of the respiratory central pattern generator. Measurements on ramp-I neurons, late-I neurons, and two types of post-I neurons in Figure 2 besides measurements of synaptic inputs to these neurons in Figure 5 are to my knowledge new.

      Weaknesses:

      The phase method for inferring synaptic conductances fails to convince. The method rests on many layers of assumptions and the inferred connections in Figure 4 remain speculative. 

      We hope that the additional method justifications now incorporated in the manuscript will make our method more convincing and change this reviewer’s opinion.

      To be convincing, such a method ought to be tested first on a model CPG with known connectivity to assess how good it is at inferring known connections back from the analysis of spatio-temporal oscillations. 

      We respectfully disagree with this critique. Existing respiratory CPG models are based on a conductance-based formalism. Since the neurons recorded using our approach are typically hyperpolarized, in the model at the corresponding values of the membrane potential, all voltage-gated channels will be deactivated. Therefore, the current balance equation used in this study will closely align with the descriptions used in these models. This alignment will result in a near-exact correspondence between the synaptic conductance values inferred by our method and their model counterparts. However, we believe that such a demonstration, while predetermined to be successful, would not be convincing for a computationally savvy audience.

      For biological data, once the network connectivity has been inferred as claimed, the straightforward validation is to reconstruct the experimental oscillations (Figure 2) noting that Rybak et al (Rybak, Paton Schwaber J. Neurophysiol. 77, 1994 (1997)) have already derived models for the respiratory neurons.

      Running such simulations is beyond the scope of this paper, which focuses on our methods for extracting synaptic conductances during network activity cycles from intracellular recordings. However, the existing, largely speculative, respiratory CPG models can be validated against the "ground truth" of the inferences we present here. To illustrate how our circuit connection motifs elaborate on existing respiratory CPG models, we have now included a combinatorial connectivity model in the manuscript derived from the connectivity motifs in the supplemental figures (Figure 4 Supplemental Figure 1) with comparisons to the model schematic utilized by Rybak, Smith et al. in simulation studies of a rhythmic three-phase respiratory pattern. Enough mechanistically important connectivity features are conserved between these schematics to suggest that our more elaborate connectivity scheme would almost certainly generate the three-phase patterns of neuronal firing and network rhythmic activity.

      The transformation from time to phase space, unlike in the Kuramoto model, is not justified here (Line 94) and is wrong. The underpinning idea that "the synaptic conductances depend on the cycle phase and not on time explicitly" is flawed because synapses have characteristic decay times and delays to response which remain fixed when the period of network oscillations increases. Synaptic properties depend on time and not on phase in the network. 

      The primary assumption of our method is that all variables within the system are periodic functions of time. Therefore, the inputs to the recorded neuron, at minimum, are fully defined by the oscillation's phase. While the transduction of the input into postsynaptic conductance may have its own time dependence, the characteristic timescale of synaptic dynamics (10-20 ms, as suggested by the reviewer) is much smaller than the period of network oscillations. This is certainly true for the test system we are using. This valid assumption of our method is now further clarified in the revised manuscript.

      One major consequence relevant to the present identification of excitatory or inhibitory behaviour, is that it cannot account for change in the behaviour of inhibitory synapses - from inhibitory to excitatory action - when the inhibitory decay time becomes commensurable to the period of network oscillations (Wang & Buzsaki Journal of Neuroscience 16, 6402 (1996), van Vreeswijk et al. J. Comp. Neuroscience 1,313 (1994), Borgers and Kopell Neural Comput. 15, 2003). 

      Our method focuses on recovering synaptic conductances rather than directly measuring presynaptic inputs. The conversion of presynaptic inputs (spike trains) into postsynaptic conductances involves its own time scales. This can lead to complex dynamical effects when synaptic delay or decay times are comparable to the oscillation period. In such cases, although our conductance calculation remains accurate, we might misinterpret the phase of the presynaptic input, as it may not align with the phase of the postsynaptic conductance peak. However, this discrepancy is not significant for applications where the synaptic delay/decay times are considerably shorter than the oscillation period.

      In addition, even small delays in the inhibitory synapse response relative to the pre-synaptic action potential also produce in-phase synchronization (Chauhan et al., Sci. Rep. 8, 11431 (2018); Borgers and Kopell, Neural Comput. 15, 509 (2003)). 

      The reviewer is referring to a phenomenon involving interspike synchronization that generates oscillations with very short periods, comparable to synaptic delay times. Our technique, in contrast, is designed for systems of asynchronously firing neurons forming functional populations whose oscillations emerge on a much longer time scale or are driven by periodic stimuli (e.g., sensory input) with a period much longer than the interspike intervals of individual neurons. The time scale difference we are addressing in our test system is two orders of magnitude.

      The present assumptions are way too simplistic because you cannot account for these commensurability effects with a single parameter like the network phase. There is therefore little confidence that this model can reliably distinguish excitatory from inhibitory synapses when their dynamic properties are not properly taken into account.

      As we explained in our previous responses, in our test system, we can reliably resolve post-synaptic conductance variations at 1/100th of the oscillation period. This is due to a >100X time scale difference between the oscillation period and the synaptic/membrane decay time constants. The efficiency of our method in other systems may vary depending on the relationship between the membrane time constant and the oscillation period. The text now provides a clearer discussion of the method's resolution.

      To interpret post-synaptic conductance profiles in terms of presynaptic inputs (e.g., to reconstruct connectivity), one should consider the input-to-conductance transduction processes. We did not aim to provide a general solution for this step in our paper (hence the title) as these processes may differ for different neurotransmitter systems and involve individual dynamics. However, in our test system, as discussed, the oscillation period is much longer than the synaptic decay times of the fast-acting neurotransmitters involved (i.e., glutamate, glycine, and GABA). This means that the possible phase difference between presynaptic neuronal activity and the corresponding postsynaptic conductances is negligible. This allows for a straightforward interpretation of conductance profiles in terms of the functional connectivity of the network. In other systems, the situation may, of course, be different and additional efforts for inferring the presynaptic activity from postsynaptic conductance profiles may be necessary.

      Line 82, Equation 1 makes extremely crude assumptions that the displacement current (CdV/dt) is negligible and that the ion channel currents are all negligible. Vm(t) is also not defined. The assumption that the activation/inactivation times of all ion channels are small compared to the 10-20ms decay time of synaptic currents is not true in general. Same for the displacement current. The leak conductance is typically g~0.05-0.09mS/cm^2 while C~1uF/cm^2. Therefore the ratio C/g leak is in the 10-20ms range - the same as the typical docking neurotransmitter time in synapses.

      We have explicitly included capacitive current in the model formulation and described the time scale separation requirement that justifies our approach. Additionally, we now explain within the text that the current injection protocol involves hyperpolarizing the recorded neuron to ensure voltage-dependent currents remain deactivated during the recording. The remarkable linearity of the current-voltage relationships observed in the vast majority of recorded neurons provides post-hoc evidence supporting this assumption. For further details, please refer to our responses to Reviewer 2 and Figure 1 Supplemental Figure 1 as an example.
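      The linearity argument above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used in the manuscript: it assumes a passive membrane at quasi-steady state, fixed reversal potentials (`e_exc`, `e_inh`, hypothetical values), a known leak (`g_leak`, `e_leak`, also hypothetical), and phase-averaged voltages measured at several levels of injected current. At each phase, a straight-line fit of injected current against voltage gives the total conductance (slope) and the summed driving term (minus the intercept), from which the excitatory and inhibitory conductances follow by solving two linear equations.

```python
import numpy as np

def infer_conductances(i_inj, v, e_exc=0.0, e_inh=-75.0, g_leak=0.0, e_leak=-65.0):
    """Per-phase conductance estimate from linear I-V fits (illustrative only).

    i_inj : (n_levels,) injected currents in pA
    v     : (n_levels, n_phases) phase-averaged membrane voltage in mV
    Returns (g_exc, g_inh), each of shape (n_phases,), in nS.
    Assumes I_inj = g_tot * V - (g_L*E_L + g_E*E_E + g_I*E_I) at every phase.
    """
    i_inj = np.asarray(i_inj, dtype=float)
    v = np.asarray(v, dtype=float)
    g_exc = np.empty(v.shape[1])
    g_inh = np.empty(v.shape[1])
    for k in range(v.shape[1]):
        # Slope = total conductance, intercept = -(sum of g * E)
        slope, intercept = np.polyfit(v[:, k], i_inj, 1)
        g_syn = slope - g_leak                  # g_E + g_I
        syn_drive = -intercept - g_leak * e_leak  # g_E*E_E + g_I*E_I
        g_inh[k] = (g_syn * e_exc - syn_drive) / (e_exc - e_inh)
        g_exc[k] = g_syn - g_inh[k]
    return g_exc, g_inh

# Synthetic check with hypothetical conductance profiles over one cycle
phase = np.linspace(0.0, 1.0, 50, endpoint=False)
g_e_true = 2.0 + 1.5 * np.sin(2 * np.pi * phase)   # nS
g_i_true = 3.0 + 2.0 * np.cos(2 * np.pi * phase)   # nS
g_l, e_l = 5.0, -65.0                              # assumed leak (nS, mV)
i_levels = np.array([-300.0, -200.0, -100.0, 0.0]) # hyperpolarizing steps, pA

g_tot = g_l + g_e_true + g_i_true
drive = g_l * e_l + g_e_true * 0.0 + g_i_true * (-75.0)
v = (i_levels[:, None] + drive[None, :]) / g_tot[None, :]

g_e_hat, g_i_hat = infer_conductances(i_levels, v, g_leak=g_l, e_leak=e_l)
```

      In this noise-free synthetic case the fit returns the conductances that generated the data; with real recordings the residuals of the linear fit at each phase would indicate where the passive, quasi-steady-state assumption breaks down.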

      Models of brainstem CPG circuits have been known to exist for decades: JC Smith et al 2012, Paton et al 2022, Bellingham Clin. Exp. Pharm. And Physiol. 25, 847 (1998); Rubin et al., J. Neurophysiol. 101, 2146 (2009) among others. The present paper does not discuss existing knowledge on respiratory networks and gives the impression of reinventing the wheel from scratch. How will this paper add to existing knowledge?

      We appreciate this comment, and in fact, in the original submitted version of this manuscript, we discussed existing knowledge of respiratory networks, but there was editorial concern that this material was above and beyond the technical aspects that we were trying to convey and therefore may detract from the paper as a technical submission. To strike a balance, we have re-incorporated some of this material in abbreviated form into the Discussion section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture”.

      Reviewer #2 (Public review):

      Summary:

      By measuring intracellular changes in membrane voltage from a single neuron of the medulla the authors describe a method for determining the balance of excitatory and inhibitory synaptic drive onto a single neuron within this important brain region.

      Strengths:

      This approach could be valuable in describing the microcircuits that generate rhythms within this respiratory control centre. This method could more generally be used to enable microcircuits to be studied without the need for time-consuming anatomical tracing or other more involved electrophysiological techniques.

      Weaknesses:

      This approach involves assuming the reversal potential that is associated with the different permeant ions that underlie the excitation and inhibition as well as the application of Ohm's law to estimate the contribution of excitation and inhibitory conductance. My first concern is that this approach relies on a linear I-V relationship between the measured voltage and the estimated reversal potential. However, open rectification is a feature of any I-V relationship generated by asymmetric distributions of ions (see the GHK current equation) and will therefore be a particular issue for the inhibition resulting from asymmetrical Cl- ion gradients across GABA-A receptors. The mixed cation conductance that underlies most synaptic excitation will also generate a non-linear I-V relationship due to the inward rectification associated with the polyamine block of AMPA receptors. Could the authors please speculate what impact these non-linearities could have on results obtained using their approach?

      In our Figure 1 Supplemental Figure 1, we illustrated that I-V relationships for each particular phase of the cycle (except for transitions between inspiration and expiration where our error estimates are greatest) are remarkably linear. 

      In Author response image 1 we compare the I-V dependence for Cl- as predicted by the GHK equation and its linear approximation using constant conductance and the Cl- Nernst potential. One can see that in the typical range of voltages used (shown by solid black vertical lines), the linear approximation appears quite adequate.

      Author response image 1.
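      A comparison of this kind can be sketched in a few lines of Python. The chloride concentrations, temperature, and voltage window below are hypothetical placeholders, not the values behind the image, and the linear conductance is taken here as the slope of the GHK curve at the Nernst potential, which is one possible choice among several. The size of the deviation depends on these assumptions.

```python
import numpy as np

F = 96485.332   # Faraday constant, C/mol
R = 8.314462    # gas constant, J/(mol K)

def nernst_mv(c_out, c_in, z, temp_k):
    """Nernst potential in mV for an ion of valence z."""
    return 1e3 * R * temp_k / (z * F) * np.log(c_out / c_in)

def ghk_current(v_mv, c_in, c_out, z, temp_k, perm=1.0):
    """GHK current in arbitrary units (positive = outward current)."""
    u = z * F * (np.asarray(v_mv, dtype=float) * 1e-3) / (R * temp_k)
    # u / (1 - exp(-u)) tends to 1 as u -> 0; guard against 0/0
    small = np.abs(u) < 1e-9
    u_safe = np.where(small, 1.0, u)
    factor = np.where(small, 1.0, u_safe / (-np.expm1(-u_safe)))
    return perm * z * F * factor * (c_in - c_out * np.exp(-u))

# Hypothetical values: intracellular / extracellular Cl- in mM, temperature in K
CL_IN, CL_OUT, TEMP_K = 8.0, 130.0, 304.15
e_cl = nernst_mv(CL_OUT, CL_IN, z=-1, temp_k=TEMP_K)

# Constant conductance = slope of the GHK curve at E_Cl (central difference)
dv = 0.01
g_slope = (ghk_current(e_cl + dv, CL_IN, CL_OUT, -1, TEMP_K)
           - ghk_current(e_cl - dv, CL_IN, CL_OUT, -1, TEMP_K)) / (2 * dv)

v = np.linspace(-100.0, -60.0, 81)      # assumed voltage window, mV
i_ghk = ghk_current(v, CL_IN, CL_OUT, -1, TEMP_K)
i_lin = g_slope * (v - e_cl)            # linear (ohmic) approximation
max_dev = np.max(np.abs(i_ghk - i_lin)) / np.max(np.abs(i_ghk))
print(f"E_Cl = {e_cl:.1f} mV, max deviation = {100 * max_dev:.1f}% of peak GHK current")
```

      Narrowing the voltage window around the Nernst potential, or choosing the chord conductance over the window instead of the local slope, changes the reported deviation.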

      This approach has similarities to earlier studies undertaken in the visual cortex that estimated the excitatory and inhibitory synaptic conductance changes that contributed to membrane voltage changes during receptive field stimulation. However, these approaches also involved the recording of transmembrane current changes during visual stimulation that were undertaken in voltage-clamp at various command voltages to estimate the underlying conductance changes. Molkov et al have attempted to essentially deconvolve the underlying conductance changes without this information and I am concerned that this simply may not be possible. 

      This was why we compared the results of our reconstructions applied to current- and voltage-clamp recordings from the same neurons and we found, as illustrated, that the synaptic conductance profiles are qualitatively identical with both techniques.

      The current balance equation (1) cited in this study is based on the parallel conductance model developed by Hodgkin & Huxley. However, one key element of the HH equations is the inclusion of an estimate of the capacitive current generated due to the change in voltage across the membrane capacitance. I would always consider this to be the most important motivation for the development of the voltage-clamp technique in the 1930's. Indeed, without subtraction of the membrane capacitance, it is not possible to isolate the transmembrane current in the way that previous studies have done. In the current study, I feel it is important that the voltage change due to capacitive currents is taken into consideration in some way before the contribution of the underlying conductance changes are inferred.

      We have incorporated the capacitive current into the initial model formulation and established explicit requirements for time scale separation. These requirements justify the application of our method. Specifically, the membrane time constant (C/g ~ 10ms in our test system) must be substantially shorter than the period of network oscillations (T ~ 2s in our test system). Under this condition, aggregate variations in synaptic conductances can be considered slow, allowing us to treat membrane voltage as being in instantaneous equilibrium. This defines the time resolution of our method. Please refer to our responses to Reviewer 1 and the revised manuscript text for a more detailed explanation.
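      The time scale separation argument can be written out as a short worked equation. The notation below is an assumption made for illustration (a single-compartment passive membrane with leak, excitatory, and inhibitory conductances); it is not copied from the manuscript's Equation 1. Only the values C/g ~ 10 ms and T ~ 2 s come from the response above.

```latex
% Current balance with the capacitive term, phase-dependent synaptic conductances
C\,\frac{dV}{dt} = -g_L\,(V - E_L) - g_E(\phi)\,(V - E_E) - g_I(\phi)\,(V - E_I) + I_{\mathrm{inj}},
\qquad \tau = \frac{C}{g_L + g_E + g_I}.

% If the conductances vary on the time scale T \gg \tau, the voltage tracks its
% instantaneous equilibrium and the capacitive term can be dropped:
V(\phi) \approx \frac{g_L E_L + g_E(\phi)\,E_E + g_I(\phi)\,E_I + I_{\mathrm{inj}}}
                     {g_L + g_E(\phi) + g_I(\phi)},
\qquad \frac{\tau}{T} \sim \frac{10\ \mathrm{ms}}{2\ \mathrm{s}} = 5 \times 10^{-3}.
```

      Under these assumptions the neglected capacitive term is small relative to the ionic terms whenever the conductances change slowly compared with the membrane time constant, which is the condition stated above.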

      Studies using acute slicing preparations to examine circuit effects have often been limited to the study of small microcircuits - especially feedforward and feedback interneuron circuits. It is widely accepted that any information gained from this approach will always be compromised by the absence of patterned afferent input from outside the brain region being studied. In this study, descending control from the Pons and the neocortex will not be contributing much to the synaptic drive and ascending information from respiratory muscles will also be absent completely. This may not have been such a major concern if this study was limited to demonstrating the feasibility of a methodological approach. However, this limitation does need to be considered when using an approach of this type to speculate on the prevalence of specific circuit motifs within the medulla (Figure 4). Therefore, I would argue that some discussion of this limitation should be included in this manuscript.

      Our experimental brainstem-spinal cord in situ preparation does include important inputs from the pons that are necessary to generate the 3-phase respiratory pattern (e.g., Smith et al. (2013). Brainstem respiratory networks: building blocks and microcircuits. Trends Neurosci, 36(3), 152-162), but we agree that other inputs such as from midbrain and cortex as well as important peripheral afferents are absent, and we have now noted this limitation in the text at the end of the new section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture”. We show specific circuit motifs simply to illustrate how our readout of synaptic conductances from single neurons and the information on the main neuronal activity patterns in our experimental preparation can be interpreted. We thought that it would be useful to illustrate and interpret inferred connectivity motifs as an output of our methodological approach. As we now discuss in the section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture” in response to Reviewer #1, our circuit motifs are consistent with some sets of connections that have been speculated in the literature, but they also provide some novel information about connectivity that we have been able to infer for respiratory circuits from the complex sets of synaptic conductances indicated by our approach.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) My recommendation is to clarify how each neuron population was identified. Individual populations are very hard to identify based on morphology alone in brain slices such as Supplemental Figure 1. I assume the authors identified each population based on their phase difference relative to the inspiratory pulse in the phrenic nerve. This ought to be clarified. 

      Neuronal populations were classified based on their firing patterns within the respiratory cycle. Immunohistochemistry was only used for post-hoc identification of the transmitter phenotype in select neurons. Specifically, recorded neurons were categorized according to the phase range of the respiratory cycle in which they fired and their firing pattern during that range. For example, neurons firing during inspiration (synchronously with the phrenic nerve) with a progressively increasing firing rate were classified as ramp-I, etc., as illustrated in the figure depicting phase-dependent firing patterns. This classification is detailed in the "Firing patterns of respiratory interneurons" sub-section.

      It would also be beneficial to discuss the benefits and limitations of using this preparation relative to brainstem slices and in-vivo preparations (e.g. Moraes et al. J. Physiol. 599, 3237 (2021)) for measuring live network activity.

      We provided the reference to an important recent review (Paton et al. 2022, Advancing respiratory-cardiovascular physiology with the working heart-brainstem preparation over 25 years. J Physiol, 600(9), 2049-2075) on the benefits and limitations of using the in situ rodent brainstem-spinal cord preparation employed in our study. 

      (2) The background on inference methods is similarly thin. The works in line 47 are mainly experimental characterizations of excitatory and inhibitory cells. Techniques for estimating network conductances/parameters ought to be covered. One reference that comes to mind: Armstrong, E. Statistical data assimilation for estimating electrophysiology simultaneously with connectivity within a biological neuronal network. Physical Review E 101, 012415, 2020.

      Our technique is not intended to estimate synaptic connections between neurons from paired recordings. Instead, we calculate the dynamics of inhibitory and excitatory synaptic conductances that result from many concurrent synaptic inputs representing aggregate activities of the functionally interacting populations. The previous studies that we cited are the ones that have direct or indirect relation to this paradigm. 

      (3) How the "patterns of synaptic conductances" in phase diagrams imply the network connectivity (l.244) is not clear. Are the patterns of "activity patterns" depicted in Figure 2 the only neuron populations driving the postsynaptic neurons in Figure 4? 

      Figure 2 shows all of the basic firing patterns that we have recorded in our experimental preparation. So, yes, assuming that all periodic inputs in this network originate from within the network, those 6 populations are the main sources of the corresponding patterns.

      The methodology for constructing the networks is unclear, 

      This is explained in detail in the section "Synaptic Conductances and Functional Connectome of Respiratory Interneurons". Specifically, when a neuron with a given firing pattern (and thus belonging to a corresponding population, e.g., pre-I/I) exhibits excitatory or inhibitory conductance during a particular phase of the respiratory cycle (e.g., inhibition during the first half of expiration, as in Figure 3A1), we infer that the population with the same firing pattern receives input from a population with an activity pattern matching the postsynaptic conductance profile (e.g., the pre-I/I population receives post-I inhibition, as in Figure 4A1).
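      The matching step described above can be made concrete with a small Python sketch. This is a hypothetical formalization for illustration only: the response does not state that matching was done by correlation, and the population templates, phase boundaries, and conductance values below are invented placeholders. The sketch scores each candidate presynaptic activity pattern by its Pearson correlation with an inferred conductance profile and reports the best match.

```python
import numpy as np

def best_matching_population(g_profile, templates):
    """Return (best_name, scores) using Pearson correlation as the match score.

    g_profile : (n_phases,) inferred conductance over one cycle
    templates : dict mapping population name -> (n_phases,) activity pattern
    """
    g = np.asarray(g_profile, dtype=float)
    scores = {
        name: float(np.corrcoef(g, np.asarray(t, dtype=float))[0, 1])
        for name, t in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

phase = np.linspace(0.0, 1.0, 100, endpoint=False)

# Hypothetical templates: inspiration assumed to span phase [0, 0.4)
templates = {
    "pre-I/I": ((phase < 0.4) | (phase >= 0.9)).astype(float),
    "post-I": ((phase >= 0.4) & (phase < 0.7)).astype(float),
    "aug-E": np.where(phase >= 0.4, (phase - 0.4) / 0.6, 0.0),
}

# Synthetic inhibitory conductance: elevated during the first half of expiration
g_inh = 0.2 + 3.0 * ((phase >= 0.4) & (phase < 0.7))

best, scores = best_matching_population(g_inh, templates)
```

      A correlation score ignores absolute conductance magnitude and cannot separate populations with overlapping activity, so in practice such a score would at most support, not replace, inspection of the profiles.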

      yet 6 lines later (l.251) the narrative jumps to the conclusion that "the information on inhibitory transmitter phenotypes...indeed corroborates that subsets of the presynaptic neurons are inhibitory" and further "conductance profiles, which gives additional confidence in the correlation between pre-synaptic firing patterns and likely post-synaptic interactions". The method also blends in empirical information from immune labelling. It is unclear what method can actually infer on its own.

      The functional connections that we were able to infer implied that neurons with specific firing patterns (e.g., post-I neurons) must include neurons with specific transmitter phenotypes (e.g., inhibitory). Immunolabeling results were used to show that there are indeed neurons having corresponding firing patterns and neurotransmitter phenotypes. This is independent of the inference method. It just shows that our assumption about various inhibitory inputs originating from within the network is plausible.

      (4) Figure 3 - why does the Early-I population which is connected by the same mutually inhibitory links as Post-I and Aug-E within the respiratory CPG have the opposite conductance activation sequence as post-I and aug-E. Namely, it receives excitatory input at phases 0,1,2 when post-I and aug-E receive inhibitory input?

      We added the section “Implications of reconstructed synaptic conductance profiles for respiratory functional circuit architecture” discussing the correspondence and inconsistencies between our findings and existing respiratory CPG models (see Figure 4 Supplemental Figure 1). For this specific question, phases 0, 1 and 2 represent the same phase of the respiratory cycle, corresponding to the transition from expiration to inspiration. According to the Rybak et al. models, the early-I population receives excitation from the pre-I/I population which is active at the E-I transition and throughout the entire inspiratory phase of the cycle. This is largely consistent with our findings shown in Figure 3. Also, according to Rybak et al., post-I and aug-E populations are inhibited by early-I neurons, which is also consistent with inspiratory inhibition in all examples of these neurons that we show in Figure 3. As noted in other responses to the reviewers’ comments, the new section also covers some comparisons to previously inferred connectivity in the respiratory network.

      Minor comments:

      (1) l.39 - The terminology "patterns of inhibitory and excitatory synaptic conductances" used throughout the manuscript (l.66, 241, 244, 259...) is vague.

      We defined this terminology in the updated version.

      (2) Figure 1 what is the integration time of the moving median in Figure 1a?

      0.1s. Now included in the figure legend.

      (3) L.128 "rhythmic inspiratory neuron" which one is this post-I, aug-E, early-I?

      This example demonstrates a pre-I/I firing pattern, as the neuron begins firing slightly before the phrenic burst and continues throughout inspiration (as defined by phrenic nerve activity). However, this is merely an arbitrary example used to illustrate the methodology. The actual firing pattern of the recorded neuron is not considered in any way for synaptic conductance inference.

      (4) Figure 3 What the panel labelling means A1, B1, A2, etc. is not disclosed in the caption.

      These labels are used in the text to refer to specific examples. Now it is explained in the caption that the letter corresponds to the firing phenotype indicated on the top of each column and the digit refers to the example number.

      (5) L.129/ L.133 - the diagram of the medulla in Supplementary Figure 1 ought to be inserted early on in the main text when introducing the respiratory CPG, phrenic and vagal signals.

      This is a good suggestion and we have linked this figure specifically to Figure 2 as Figure 2 Supplemental Figure 1 in the main text to better orient readers.

      (6) L. 457 - Reference needed on reversal potentials.

      We report what we observed, so it is unclear what reference the reviewer means.

    1. eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

    2. Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight. First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants. The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      This caveat has been addressed in the revised manuscript.
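      One common way to control for age in such a correlation is a partial correlation: regress age out of both measures and correlate the residuals. The following is only a minimal sketch of that general idea, not the analysis used in the manuscript. The function names (`residualize`, `partial_corr`) are hypothetical, and a linear relationship with age is assumed.

```python
import numpy as np


def residualize(values, covariate):
    """Return residuals of `values` after a least-squares fit on `covariate`.

    Hypothetical helper: fits values ~ intercept + covariate and returns
    what is left over once the linear covariate effect is removed.
    """
    values = np.asarray(values, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


def partial_corr(x, y, covariate):
    """Pearson correlation between x and y after removing the linear
    effect of `covariate` (e.g. age) from both variables."""
    rx = residualize(x, covariate)
    ry = residualize(y, covariate)
    return float(np.corrcoef(rx, ry)[0, 1])
```

      With only ten participants per group, any such estimate remains highly uncertain, so this kind of adjustment complements rather than replaces the caution about exploratory correlations.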

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction said that there was a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on revisions:

      In the first revision, the authors made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript was overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      In their second revision, the authors pointed to justifications for their analyses, careful interpretation and tempered claims to clarify their response to the initial feedback. However, my assessment of the first revision has not been changed after the second revision, because there were no further modifications of their responses to my feedback.

    3. Reviewer #2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex. The manuscript reports precious behavioural, electrophysiological and magnetic resonance data from a rare population. Although the findings are useful for stimulating further research in the field, they only provide incomplete support to the authors' claims.

      The main claim is that sight recovery impacts the excitation/inhibition balance in the visual cortex; however, the paradigm does not allow one to distinguish the effects of sight recovery from those of visual deprivation (i.e. in patients who were born blind but recovered vision after several months/years vs. patients who were born blind and never recovered vision); moreover, the link between electrophysiological findings and cortical excitation/inhibition is tentative and its interpretation remains speculative.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that the Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that the E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      Comments on revisions:

      The authors' revisions did not substantially alter the manuscript. As such, my assessment above remains unaltered.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance, the interrelationship of its measures, and their relationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and slope of the aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration. First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. Second, although the authors addressed some of my concerns on the previous version of this manuscript, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results.

      Persistent specific concerns include:
      (1 3.1) Response to Variability in Visual Deprivation
      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      (2 3.2) Small Sample Size
      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      (3 3.3) Statistical Concerns
      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.
      This has been addressed in the final revision.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.
      This has been addressed in the final revision.

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.
      This has been addressed in the final revision.

      (8) Figure 2C
      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      (9 3.4) Interpretation of Aperiodic Signal
      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      (10) Additionally, the authors state:
      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      (12 3.5) Problems with EEG Preprocessing and Analysis
      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice confounds subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern than when analyzing the aperiodic signal. Furthermore, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis.

      (14) The authors mention: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.
      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.
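
      To illustrate the permutation statistics recommended in point (6), the sketch below tests a Pearson correlation by shuffling one variable. It is only an illustration under stated assumptions, not the authors' analysis. The function name `permutation_corr_test` and its parameters are hypothetical, and it uses a Monte Carlo sample of permutations, which approximates the exact test obtained by enumerating all permutations.

```python
import numpy as np


def permutation_corr_test(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation test for a Pearson correlation.

    Hypothetical helper: shuffles y relative to x to build a null
    distribution of r, then reports how often |r_null| >= |r_observed|.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]

    n_extreme = 0
    for _ in range(n_perm):
        r_null = np.corrcoef(x, rng.permutation(y))[0, 1]
        # Small tolerance guards against floating-point ties.
        if abs(r_null) >= abs(r_obs) - 1e-12:
            n_extreme += 1

    # Add-one correction keeps the p-value strictly above zero.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return float(r_obs), p_value
```

      With n = 10 there are 10! (about 3.6 million) possible orderings, so full enumeration is feasible if an exact p-value is required. The random sample here is a faster approximation.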

      Comments on revisions:

      The current version of the manuscript is almost unchanged compared to the last version. Unfortunately, I observed that the authors have not adequately addressed most of my previous suggestions; rather, they provided justifications for not incorporating them.

      Given this, I do not see the need to modify my initial assessment.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      In order to assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community, measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls, but hardly ever individuals with a different, e.g. late blindness history. In order to improve the specificity of our conclusions, we used a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note, there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase, and the existence of nystagmus is a recruitment criterion to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” of congenital visual deprivation conditions and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanent congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any results of such future work would need to refer to our study, and results in these additional groups would not invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in response to review 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      This caveat has been addressed in the revised manuscript.

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Fig. 4, yet the introduction said that there was a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested that Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in congenitally permanently blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous efforts (and time for the patients, who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patients group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patients group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      It has to be noted that in the present study significance testing for correlations was corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of an E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from that of normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenital cataract reversal individuals with those of congenitally permanently blind humans (from previous studies) allowed us to identify changes of E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer #3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing measures of cortical E/I balance, their interrelationship, and their relationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1 3.1) Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Review 2 and Review 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting such correlations between visual deprivation history and measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. The existence of sensitive periods, which are typically assumed not to follow a linear gradual decline of neuroplasticity, does not necessarily allow predicting a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects of early deprivation on the brain and behavior which are observed only later in life, when the function/neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2 3.2) Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3 3.3) Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We have added the confidence intervals for all measured correlations to the second revision of our manuscript.
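
      To illustrate what such an interval looks like at this sample size, the following minimal Python sketch computes a Pearson correlation and an approximate confidence interval via the Fisher z-transformation. It is not the authors' code: the function names `pearson_r` and `fisher_ci` are hypothetical, and the use of the Fisher method is an assumption, since the response does not state how the intervals were computed.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation r from n pairs.

    Assumption: Fisher z-transformation with standard error 1/sqrt(n - 3).
    """
    if n < 4:
        raise ValueError("need at least 4 observations")
    z = math.atanh(r)                       # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

      For n = 10 and r = 0.5, `fisher_ci(0.5, 10)` gives an interval of roughly -0.19 to 0.86, which shows how wide the uncertainty around a moderate correlation is in a sample of ten.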

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than make strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9), and reported our findings with effect sizes, appropriate caution and context.
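
      For readers unfamiliar with the reviewer's suggestion, a permutation test for a correlation can be sketched as follows. This is an illustrative Monte Carlo version, not an analysis reported in the manuscript; the function names, the number of permutations and the seed are hypothetical choices. An exact test would enumerate all orderings instead of sampling them (10! = 3,628,800 for ten participants).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a Pearson correlation.

    The pairing between x and y is broken by shuffling y; the p-value is the
    share of shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so that ties caused by rounding are counted.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction: the observed ordering counts as one permutation.
    return (hits + 1) / (n_perm + 1)
```

      Because the null distribution is built from the data themselves, this approach does not rely on the normality assumption that the Shapiro-Wilk test is meant to check.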

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we have changed Figure 4 to say ‘adjusted p,’ which we indeed reported.
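
      As a reminder of why adjusted values such as p > 0.999 can occur, a Bonferroni adjustment multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below is illustrative only; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values (raw p times number of tests, capped at 1)."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three exploratory correlations.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```

      Here the third raw value (0.5) is capped at 1 after adjustment, which a report would typically print as p > 0.999.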

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were only performed in the congenital cataract reversal group, since the sighted control group comprised individuals with vision in the normal range; thus these analyses would not make sense. Table 1, with the individual visual acuities for all participants including the normally sighted controls, shows the low variance in the latter group.

      For variables in which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We have now highlighted these motivations more clearly in the Methods of the revised manuscript (Page 16, Lines 405-410).

      (9 3.4) Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been the subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurements (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.

      Quote:

      “(3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we have cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy et al. (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG (Muthukumaraswamy & Liley, 2018). Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity needs to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first which assessed MRS measured neurotransmitter levels and EEG aperiodic activity. “

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12 3.5) Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice conflates subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern compared to analyzing the aperiodic signal. Furthermore, in contrast, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As previously mentioned in the Methods (Page 15, Line 376) and in the previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency (as per the Nyquist theorem; https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
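
      The aliasing concern at issue here can be made concrete with a small, self-contained Python sketch. It shows what happens without an anti-aliasing filter: a 50 Hz component sampled at 60 Hz folds down to 10 Hz, inside the 1-20 Hz range analyzed in the study. The sketch does not reproduce the filter that pop_resample applies; the function names and the 600 Hz reference rate are hypothetical choices for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform for bins 0..n/2."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def peak_frequency(signal, fs):
    """Frequency (Hz) of the largest spectral peak of a signal sampled at fs."""
    mags = dft_magnitudes(signal)
    k_peak = max(range(len(mags)), key=mags.__getitem__)
    return k_peak * fs / len(signal)


def sampled_sine(freq, fs, duration=1.0):
    """A sine wave of the given frequency sampled at fs for `duration` seconds."""
    n = int(round(fs * duration))
    return [math.sin(2 * math.pi * freq * t / fs) for t in range(n)]


# Sampled fast enough (600 Hz), the 50 Hz component sits at 50 Hz.
print(peak_frequency(sampled_sine(50, 600), 600))
# Sampled at 60 Hz with no prior low-pass filter, it appears at 10 Hz.
print(peak_frequency(sampled_sine(50, 60), 60))
```

      A low-pass filter applied before the rate reduction removes such components before they can fold into the analyzed band, which is the role the response attributes to the filter in pop_resample.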

      Quote:

      “- The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method was developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We have made this clearer in the Methods of the second revision (Page 15, Line 367).

      Our consistent results of group differences across all three EEG conditions, thus, exclude any possibility that they were driven by aliasing artifacts.

      The expected effects of this anti-aliasing filter can be seen in the attached Author response image 1, showing an example participant’s spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz plotted in the manuscript), clearly showing a 30-40 dB drop at 30 Hz. Any aliasing due to, for example, remaining line noise, would additionally be visible in this figure (as well as Figure 3) as a peak.

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript, and in previous reviews, so far there has been no consensus on the exact range of measuring aperiodic activity. We made a principled decision based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz) (Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies) and the purpose of the visual stimulation experiment (to look at the lower frequency range by stimulating up to 60 Hz, thereby limiting us to quantifying below 30 Hz), that 1-20 Hz would be the fit range in this dataset.

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40Hz and 1-19Hz). This is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018). “

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal and/or a request to provide reasons for why it might not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work in the time (not frequency) domain on event-related potential (ERP) analysis has suggested that the baselining step might cause spurious effects (Delorme, 2023) (although see (Tanner et al., 2016)). We did not perform ERP analysis at any stage. One recent study suggests spurious group differences in the 1/f signal might be driven by an inappropriate dB division baselining method (Gyurkovics et al., 2021), which we did not perform.

      Any effect of our baselining procedure on the FFT spectrum would be below the 1 Hz range, which we did not analyze.  
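
      The reasoning behind this statement can be checked numerically. In the minimal Python sketch below (hypothetical function names, synthetic data), subtracting the epoch mean changes only the 0 Hz bin of the discrete Fourier transform of that epoch and leaves every other bin untouched. This holds for a plain transform of the whole epoch; if a taper is applied before the transform, a constant offset would leak into neighboring bins, so the sketch should not be read as a description of the authors' exact spectral estimation.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform (all n bins) of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def demean(signal):
    """Subtract the mean across the epoch from every data point."""
    mean = sum(signal) / len(signal)
    return [value - mean for value in signal]


# Synthetic epoch: an oscillation plus a constant offset and a slow drift.
epoch = [math.sin(2 * math.pi * 3 * t / 64) + 5.0 + 0.1 * t for t in range(64)]

raw_spectrum = dft(epoch)
demeaned_spectrum = dft(demean(epoch))

# Largest change in any non-zero-frequency bin (numerically zero).
max_change = max(abs(a - b) for a, b in zip(raw_spectrum[1:], demeaned_spectrum[1:]))
print(abs(demeaned_spectrum[0]), max_change)
```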

      Each of the preprocessing steps in the manuscript match pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19), reports of other experimenters, groups and locations, validating that our results are robust.

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “(3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the Methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~ 50 Hz) as an account for the observed group effects in the 1-20 Hz frequency range highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters would be contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected for each resting state condition and the percentage of 6.25-second segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10), and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method was developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.
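
The 1/f correction described in this response can be sketched as follows. This is a hedged illustration only: it assumes an ordinary least-squares line in log10-log10 space and a subtraction carried out on the log spectrum, neither of which is confirmed by the text above. The function names and the synthetic spectrum are hypothetical.

```python
import math


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Fit log10(power) ~ offset + slope * log10(freq) by least squares,
    using only bins inside fit_range and outside the excluded (alpha) band."""
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_range = fit_range[0] <= f <= fit_range[1]
        in_excluded = exclude[0] <= f <= exclude[1]
        if in_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, slope


def corrected_spectrum(freqs, power, offset, slope):
    """Subtract the fitted line from the log10 spectrum at every bin."""
    return [math.log10(p) - (offset + slope * math.log10(f))
            for f, p in zip(freqs, power)]


# Synthetic spectrum: power = 10 * f^-1.5 with an added peak at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [10.0 * f ** -1.5 for f in freqs]
power[9] *= 4.0  # the 10 Hz bin lies inside the excluded 8-14 Hz band

offset, slope = fit_aperiodic(freqs, power)
residual = corrected_spectrum(freqs, power, offset, slope)
```

In this toy case the peak does not influence the fitted slope because its bin is excluded from the fit, which is the concern the response raises about group differences in alpha power.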

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R<sup>2</sup> values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).
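
For readers unfamiliar with the fit quality metric, a minimal sketch of the coefficient of determination is given below. It uses the standard definition (one minus the ratio of residual to total sum of squares); whether the authors computed it on the log spectrum is an assumption, and the numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-power values and the values predicted by a 1/f fit.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```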

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided."

      The authors addressed this comment and adjusted the statement. However, I do not understand why the full sample published earlier (Ossandón et al., 2023) was not used in the current study.

      The recording of EEG resting state data started in 2013, while MRS testing could only be set up by the second half of 2019. Moreover, not all subjects who qualify for EEG recording qualify for being scanned (e.g. due to MRI safety, claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports 2017 7:1, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

    1. eLife Assessment

      van Vliet and colleagues show a useful correlation between internal states of a convolutional neural network (CNN) trained on visual word stimuli with three specific components of evoked MEG potentials during reading in humans. The findings are solid, though quantitative evidence that the model can produce any of the phenomena that the human visual system is known to have (e.g., feedback connections, sensitivity to word frequency), or that it has comparable performance to human behaviour (i.e., similar task accuracy with a comparable pattern of mistakes) would make the conclusions much stronger.

    2. Reviewer #2 (Public review):

      van Vliet and colleagues present results of a study correlating internal states of a convolutional neural network trained on visual word stimuli with evoked MEG potentials during reading.

      In this study, a standard deep learning image recognition model (VGG-11) trained on a large natural image set (ImageNet) that begins illiterate but is then further trained on visual word stimuli, is used on a set of predefined stimulus images to extract strings of characters from "noisy" words, pseudowords and real words. This methodology is used in hopes of creating a model which learns to apply the same nonlinear transforms that could be happening in different regions of the brain - which would be validated by studying the correlations between the weights of this model and neural responses. Specifically, the aim is that the model learns some vector embedding space, as quantified by the spread of activations across a layer's weights (L2 norm prior to the ReLU activation function), for the different kinds of stimuli, that creates a parameterized decision boundary that is similar to amplitude changes at different times for a MEG signal. More importantly, the way that the stimuli are ordered or ranked in that space should be separable to the degree we see separation in neural activity. This study does show that the layer weights corresponding to five different broad classes of stimuli do statistically correlate with three specific components in the ERP. However, I believe there are fundamental theoretical issues that limit the implications of the results of this study.
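
The layer summary mentioned in this paragraph (an L2 norm taken before the ReLU) can be illustrated with a small sketch. The vector below is hypothetical and stands in for the pre-activation values of one layer for one stimulus; this is not the authors' code.

```python
import math


def l2_norm(values):
    """Euclidean length of a vector of unit responses."""
    return math.sqrt(sum(v * v for v in values))


# Hypothetical pre-activation values of a three-unit layer.
pre_relu = [3.0, -4.0, 0.0]

# Norm taken before the ReLU keeps the contribution of negative values.
magnitude_before = l2_norm(pre_relu)

# Norm taken after the ReLU discards them.
magnitude_after = l2_norm([max(0.0, v) for v in pre_relu])
```

The comparison shows why the point at which the norm is taken matters: units driven strongly negative still count toward the pre-ReLU magnitude.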

      As has been shown over many decades, there are many potential computational algorithms, with varied model architectures, that can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      Another example of the mismatch between this model and visual cortex is the lack of feedback connections in the model. Within visual cortex there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical level processes feeds back to letter level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point it is unclear what novel contributions can be gleaned from correlating low dimensional model weights from these computational models with human neural data.

      The revised version of this manuscript has not addressed these concerns.

    3. Reviewer #3 (Public review):

      Summary:

      The authors investigate the extent to which the responses of different layers of a vision model (VGG-11) can be linked to the cascade of responses (namely, type-I, type-II and N400) in the human brain when reading words. To achieve maximal consistency between the two, they add noisy activations to VGG and fine-tune it on a character recognition task. In this setup, they observe various similarities between the behavior of VGG and the brain when presented with various transformations of the words (added noise, font modification etc).

      Strengths:<br /> - The paper is well written and well presented<br /> - The topic studied is interesting.<br /> - The fact that the response of the CNN on unseen experimental contrasts such as adding noise correlated with previous results on the brain is compelling.

      Weaknesses:<br /> - The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to figure 5)<br /> - The experiments only consider a rather outdated vision model (VGG)

      Comments on revisions:

      After rebuttal, the authors significantly strengthened their results. I now find the paper much more convincing, and thank the authors for their careful consideration of the reviewers' suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their efforts. They have pointed out several shortcomings and made very helpful suggestions. Based on their feedback, we have substantially revised the manuscript and feel the paper has been much improved because of it.

      Notable changes are:

      (1) As our model does not contain feed-back connections, the focus of the study is now more clearly communicated to be on feed-forward processes only, with appropriate justifications for this choice added to the Introduction and Discussion sections. Accordingly, the title has been changed to include the term “feed-forward”.

      (2) The old Figure 5 has been removed in favor of reporting correlation scores to the right of the response profiles in other figures.

      (3) We now discuss changes to the network architecture (new Figure 5) and fine-tuning of the hyperparameters (new Figure 6) in the main text instead of only the Supplementary Information.

      (4) The discussion on qualitative versus quantitative analysis has been extended and given its own subsection entitled “On the importance of experimental contrasts and qualitative analysis of the model”.

      Below, we address each point that the reviewers brought up in detail and outline what improvements we have made in the revision to address them.

      Reviewer #1 (Public Review):

      Summary:

      This study trained a CNN for visual word classification and supported a model that can explain key functional effects of the evoked MEG response during visual word recognition, providing an explicit computational account from detection and segmentation of letter shapes to final word-form identification.

      Strengths:

      This paper not only bridges an important gap in modeling visual word recognition, by establishing a direct link between computational processes and key findings in experimental neuroimaging studies, but also provides some conditions to enhance biological realism.

      Weaknesses:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

      We have experimented with the number of layers and the number of units in each layer. In the previous version of the manuscript, these results could be found in the supplementary information. For the revised version, we have brought some of these results into the main text and discuss them more thoroughly.

      We have added a figure (Figure 5 in the revised manuscript) showing the impact of the number of convolution and fully-connected layers on the response profiles of the layers, as well as the correlation with the three MEG components.

      We discuss the figure in the Results section as follows:

      “Various variations in model architecture and training procedure were evaluated. We found that the number of layers had a large impact on the response patterns produced by the model (Figure 5). The original VGG-11 architecture defines 5 convolution layers and 3 fully connected layers (including the output layer). Removing a convolution layer (Figure 5, top row), or removing one of the fully connected layers (Figure 5, second row), resulted in a model that did exhibit an enlarged response to noisy stimuli in the early layers that mimics the Type-I response. However, such models failed to show a sufficiently diminished response to noisy stimuli in the later layers, hence failing to produce responses that mimic the Type-II or N400m, a failure which also showed as low correlation scores.

      Adding an additional convolution layer (Figure 5, third row) resulted in a model where none of the layer response profiles mimics that of the Type-II response. The Type-II response is characterized by a reduced response to both noise and symbols, but an equally large response to consonant strings, real and pseudo words. However, in the model with an additional convolution layer, the consonant strings evoked a reduced response already in the first fully connected layer, which is a feature of the N400m rather than the Type-II. These kinds of subtleties in the response pattern, which are important for the qualitative analysis, generally did not show quantitatively in the correlation scores, as the fully connected layers in this model correlate as well with the Type-II response as models that did show a response pattern that mimics the Type-II.

      Adding an additional fully connected layer (Figure 5, fourth row) resulted in a model with similar response profiles and correlation with the MEG components as the original VGG-11 architecture (Figure 5, bottom row). The N400m-like response profile is now observed in the third fully connected layer rather than the output layer. However, the decrease in response to consonant strings versus real and pseudo words, which is typical of the N400m, is less distinct than in the original VGG-11 architecture.”

      And in the Discussion section:

      “In the model, convolution units are followed by pooling units, which serve the purpose of stratifying the response across changes in position, size and rotation within the receptive field of the pooling unit. Hence, the effect of small differences in letter shape, such as the usage of different fonts, was only present in the early convolution layers, in line with findings in the EEG literature (Chauncey et al., 2008; Grainger & Holcomb, 2009; Hauk & Pulvermüller, 2004). However, the ability of pooling units to stratify such differences depends on the size of their receptive field, which is determined by the number of convolution-and-pooling layers. As a consequence, the response profiles of the subsequent fully connected layers were also very sensitive to the number of convolution-and-pooling layers. The optimal number of such layers is likely dependent on the input size and pooling strategy. Given the VGG-11 design of doubling the receptive field after each layer, combined with an input size of 225×225 pixels, the optimal number of convolution-and-pooling layers for our model was five, or the model would struggle to produce response profiles mimicking those of the Type-II component in the subsequent fully connected layers (Figure 5).”
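
The dependence on the number of convolution-and-pooling stages can be made concrete with a short calculation. The sketch below assumes size-preserving convolutions and non-overlapping 2×2 pooling with stride 2, as in the standard VGG design; the authors' exact settings are not stated in this response, and the function names are hypothetical. The span computed here counts pooling only and ignores the extent of the convolution kernels.

```python
def pooled_sizes(input_size, n_stages, pool=2):
    """Feature-map side length after each pooling stage (floor division)."""
    sizes = [input_size]
    for _ in range(n_stages):
        sizes.append(sizes[-1] // pool)
    return sizes


def pooled_span(n_stages, pool=2):
    """Input pixels per side summarized by one unit, from pooling alone."""
    return [pool ** stage for stage in range(n_stages + 1)]


# A 225-pixel-wide input passed through five pooling stages.
sizes = pooled_sizes(225, 5)
spans = pooled_span(5)
```

Under these assumptions, five stages reduce the 225×225 input to a 7×7 map in which each unit pools over a 32-pixel-wide patch, and a sixth stage would leave only a 3×3 map.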

      Reviewer #1 (Recommendations For The Authors):

      (1) The similarity between CNNs and human MEG responses, including the type-I (100 ms), type-II (150 ms), and N400 (400 ms) components, appears to be assessed separately for each component, without capturing the sequential properties among these three components. Is the recurrent neural network (RNN), which can be trained to process and convert a sequential data input into a specific sequential data output, a better choice?

      When modeling sequential effects, meaning that the processing of the current word is influenced by the word that came before it, such as priming and top-down modulations, we agree that such a model would indeed require recurrency in its architecture. However, we feel that the focus of modeling efforts in reading has been overwhelmingly on the N400 and such priming effects, usually skipping over the pixel-to-letter process. So, for this paper, we were keen on exploring more basic effects such as noise and symbols versus letters on the type-I and type-II responses. And for these effects, a feed-forward model turns out to be sufficient, so we can keep the focus of this particular paper on bottom-up processes during single word reading, on which there is already a lot to say.

      To clarify our focus on feed-forward processes, we have modified the title of the paper to be:

      “Convolutional networks can model the functional modulation of the MEG responses associated with feed-forward processes during visual word recognition”

      Furthermore, we have revised the Introduction to highlight this choice, noting:

      “Another limitation is that these models have primarily focused on feed-back lexicosemantic effects while oversimplifying the initial feed-forward processing of the visual input.

      […]

      For this study, we chose to focus on modeling the early feed-forward processing occurring during visual word recognition, as the experimental setup in Vartiainen et al. (2011) was designed to demonstrate.

      […]

      By doing so, we restrict ourselves to an investigation of how well the three evoked components can be explained by a feed-forward CNN in an experimental setting designed to demonstrate feed-forward effects. As such, the goal is not to present a complete model of all aspects of reading, which should include feed-back effects, but rather to demonstrate the effectiveness of using a model that has a realistic form of input when the aim is to align the model with the evoked responses observed during visual word recognition.”

      And in the Discussion section:

      “In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation.”

      (2) There is no clear relationship between the layers that signal needs to traverse in the model and the relative duration of the three components in the brain.

      While some models offer a tentative mapping between layers and locations in the brain, none of the models we are aware of actually simulate time accurately and our model is no exception.

      While we provide some evidence that the three MEG components are best modeled with different types of layers, and that in our model the type-I comes before the type-II, with the N400m last, the lack of timing information is a weakness of our model we have not been able to address. In our previous version, this was already the main topic of our “Limitations of the model” section, but since this weakness was pointed out by all reviewers, we have decided to widen our discussion of it:

      “One important limitation of the current model is the lack of an explicit mapping from the units inside its layers to specific locations in the brain at specific times. The temporal ordering of the components is simulated correctly, with the response profile matching that of the type-I occurring in the layers before those matching the type-II, followed by the N400m. Furthermore, every component is best modeled by a different type of layer, with the type-I best described by convolution-and-pooling, the type-II by fully-connected linear layers and the N400m by a one-hot encoded layer. However, there is no clear relationship between the number of layers the signal needs to traverse in the model and the processing time in the brain. Even if one considers that the operations performed by the initial two convolution layers happen in the retina rather than the brain, the signal needs to propagate through three more convolution layers to reach the point where it matches the type-II component at 140-200 ms, but only through one more additional layer to reach the point where it starts to match the N400m component at 300-500 ms. Still, cutting down on the number of times convolution is performed in the model seems to make it unable to achieve the desired suppression of noise (Figure 5). It also raises the question of what the brain is doing during the time between the type-II and N400m components that seems to take so long. It is possible that the timings of the MEG components are not indicative solely of when the feed-forward signal first reaches a certain location, but are rather dictated by the resolution of feed-forward and feedback signals (Nour Eddine et al., 2024).”

      See also our response to the next comment of the Reviewer, in which we dive more into the effect of the number of layers, which could be seen as a manipulation of time.

      (3) I am impressed by the CNN that authors modified to match the human brain pattern for the visual word recognition process, by the increase and decrease of the number of layers. The result of this part was a little different from the author’s expectation; however, the author didn’t explain or address this issue.

      We are glad to hear that the reviewer found these results interesting. Accordingly, we now discuss these results more thoroughly in the main text.

      We have moved the figure from the supplementary information to the main text (Figure 5 in the revised manuscript) and describe the results in the Results section:

      “Various variations in model architecture and training procedure were evaluated. We found that the number of layers had a large impact on the response patterns produced by the model (Figure 5). The original VGG-11 architecture defines 5 convolution layers and 3 fully connected layers (including the output layer). Removing a convolution layer (Figure 5, top row), or removing one of the fully connected layers (Figure 5, second row), resulted in a model that did exhibit an enlarged response to noisy stimuli in the early layers that mimics the Type-I response. However, such models failed to show a sufficiently diminished response to noisy stimuli in the later layers, hence failing to produce responses that mimic the Type-II or N400m, a failure which also showed as low correlation scores.

      Adding an additional convolution layer (Figure 5, third row) resulted in a model where none of the layer response profiles mimics that of the Type-II response. The Type-II response is characterized by a reduced response to both noise and symbols, but an equally large response to consonant strings, real and pseudo words. However, in the model with an additional convolution layer, the consonant strings evoked a reduced response already in the first fully connected layer, which is a feature of the N400m rather than the Type-II. These kinds of subtleties in the response pattern, which are important for the qualitative analysis, generally did not show quantitatively in the correlation scores, as the fully connected layers in this model correlate as well with the Type-II response as models that did show a response pattern that mimics the Type-II.

      Adding an additional fully connected layer (Figure 5, fourth row) resulted in a model with similar response profiles and correlation with the MEG components as the original VGG-11 architecture (Figure 5, bottom row). The N400m-like response profile is now observed in the third fully connected layer rather than the output layer. However, the decrease in response to consonant strings versus real and pseudo words, which is typical of the N400m, is less distinct than in the original VGG-11 architecture.”

      We also incorporated these results in the Discussion:

      “However, the ability of pooling units to stratify such differences depends on the size of their receptive field, which is determined by the number of convolution-and-pooling layers. This might also explain why, in later layers, we observed a decreased response to stimuli where text was rendered with a font size exceeding the receptive field of the pooling units (Figure 8). Hence, the response profiles of the subsequent fully connected layers were very sensitive to the number of convolution-and-pooling layers. This number is probably dependent on the input size and pooling strategy. Given the VGG-11 design of doubling the receptive field after each layer, combined with an input size of 225×225 pixels, the optimal number of convolution-and-pooling layers for our model was five, or the model would struggle to produce response profiles mimicking those of the type-II component in the subsequent fully connected layers (Figure 5).

      […]

      A minimum of two fully connected layers was needed to achieve this in our case, and adding more fully connected layers would make them behave more like the component (Figure 5).”

      (4) Can the author explain why the number of layers in the final model is optimal by benchmarking the brain hierarchy?

      We have incorporated the figure describing the correlation between each model and the MEG components (previously Figure 5) with the figures describing the response profiles (Figures 4 and 5 in the revised manuscript and Supplementary Figures 2-6). This way, we (and the reader) can now benchmark every model qualitatively and quantitatively.

      As we stated in our response to the previous comment, we have added a more thorough discussion on the number of layers, which includes the justification for our choice for the final model. The benchmark we used was primarily whether the model shows the same response patterns as the Type I, Type II and N400 responses, which disqualifies all models with fewer than 5 convolution and 3 fully connected layers. Models with more layers also show the proper response patterns; however, we see that there is actually very little difference in the correlation scores between different models. Hence, our justification for sticking with the original VGG-11 architecture is that it produces the qualitatively best response profiles, while having roughly the same (decently high) correlation with the MEG components. Furthermore, by sticking to the standard architecture, we make it slightly easier to replicate our results as one can use readily available pre-trained ImageNet weights.

      As well as always discussing the correlation scores in tandem with the qualitative analysis, we have added the following statement to the Results:

      “Based on our qualitative and quantitative analysis, the model variant that performed best overall was the model that had the original VGG11 architecture and was preinitialized from earlier training on ImageNet, as depicted in the bottom rows of Figure 4 and Figure 5.”

      Reviewer #2 (Public Review):

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      We very much agree with the reviewer that a qualitative analysis of whether the model can explain experimental effects needs to happen before a quantitative analysis, such as evaluating model-brain correlation scores. In fact, this is one of the intended key points we wished to make.

      As we discuss at length in the Introduction, “traditional” models of reading (those that do not rely on deep learning) are not able to recognize a word regardless of exact letter shape, size, and (up to a point) rotation. In this study, our focus is on these low-level visual tasks rather than high-level tasks concerning semantics. As the Reviewer correctly states, there are many potential computational algorithms able to perform these visual tasks at a human level and so we need to evaluate the model not only on its ability to mimic human accuracy but also on generating a comparable pattern of mistakes. In our case, we need a pattern of behavior that is indicative of the visual processes at the beginning of the reading pipeline. Hence, rather than relying on behavioral responses that are produced at the very end, we chose to evaluate the model based on three MEG components that provide “snapshots” of the reading process at various stages. These components are known to manifest a distinct pattern of “behavior” in the way they respond to different experimental conditions (Figure 2), akin to what the Reviewer refers to as a “pattern of mistakes”. The model was first evaluated on its ability to replicate the behavior of the MEG components in a qualitative manner (Figure 4). Only then do we move on to a quantitative correlation analysis. In this manner, we feel we are in agreement with the approach advocated by the Reviewer.

      In the Introduction, we now clarify:

      “Another limitation is that these models have primarily focused on feed-back lexico-semantic effects while oversimplifying the initial feed-forward processing of the visual input.

      […]

      We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation, as well as humans can, so essentially perfectly, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain.

      […]

      These variations were first evaluated on their ability to replicate the experimental effects in that study, namely that the type-I response is larger for noise embedded words than all other stimuli, the type-II response is larger for all letter strings than symbols, and that the N400m is larger for real and pseudowords than consonant strings. Once a variation was found that could reproduce these effects satisfactorily, it was further evaluated based on the correlation between the amount of activation of the units in the model and MEG response amplitude.”

      To make this prerequisite clearer, we have removed what was previously Figure 5, which showed the correlation between the various models and the MEG components out of the context of their response patterns. Instead, these correlation values are now always presented next to the response patterns (Figures 4 and 5, and Supplementary Figures 2-6 in the revised manuscript). This invites the reader to always consider these metrics in relation to one another.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      We are glad the reviewer brought up the topic of frequency balancing, as it is a good example of the importance of the qualitative analysis. Frequency balancing during training only had a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing had a large impact. We now discuss this more explicitly in the revised Discussion section:

      “Overall, we found that a qualitative evaluation of the response profiles was more helpful than correlation scores. Often, a deficit in the response profile of a layer that would cause a decrease in correlation on one condition would be masked by an increased correlation in another condition. A notable example is the necessity for frequency-balancing the training data when building models with a vocabulary of 10 000. Going by correlation score alone, there does not seem to be much difference between the model trained with and without frequency balancing (Figure 4A, fifth row versus bottom row). However, without frequency balancing, we found that the model did not show a response profile where consonant strings were distinguished from words and pseudowords (Figure 4A, fifth row), which is an important behavioral trait that sets the N400m component apart from the Type-II component (Figure 2D). This underlines the importance of the qualitative evaluation in this study, which was only possible because of a straightforward link between the activity simulated within a model to measurements obtained from the brain, combined with the presence of clear experimental conditions.”

      It is true that the model, even with frequency balancing, only captures letter- and bigram-frequency effects and not the word-frequency effects that we know the N400m is sensitive to. Since our model is restricted to feed-forward processes, this finding adds to the evidence that frequency-modulated effects are driven by feed-back effects as modeled by Nour Eddine et al. (2024, doi:10.1016/j.cognition.2024.105755). See also our response to the next comment by the Reviewer where we discuss feed-back connections. We have added the following to the section about model limitations in the revised Discussion:

      “The fact that the model failed to simulate the effects of word-frequency on the N400m (Figure 8), even after frequency-balancing of the training data, is additional evidence that this effect may be driven by feed-back activity, as for example modeled by Nour Eddine et al. (2024).”

      Like the Reviewer, we initially thought that later stages of neural visual word processing would be insensitive to differences in font size. When diving into the literature to find support for this claim, we found only a few works directly studying the effect of font size on evoked responses, but, surprisingly, what we did find seemed to align with our model. We have added the following to our revised Discussion:

      “The fully connected linear layers in the model show a negative correlation with font size. While the N400 has been shown to be unaffected by font size during repetition priming (Chauncey et al., 2008), it has been shown that in the absence of priming, larger font sizes decrease the evoked activity in the 300–500 ms window (Bayer et al., 2012; Schindler et al., 2018). Those studies refer to the activity within this time window, which seems to encompass the N400, as early posterior negativity (EPN). What possibly happens in the model is that an increase in font size causes an initial stronger activation in the first layers, due to more convolution units receiving input. This leads to a better signal-to-noise ratio (SNR) later on, as the noise added to the activation of the units remains constant whilst the amplitude of the input signal increases. A better SNR translates ultimately in less co-activation of units corresponding to orthographic neighbours in the final layers, hence to a decrease in overall layer activity.”

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      We agree with the Reviewer that a full model of reading in the brain must include feed-back connections and share their sentiment that these feed-back processes play an important role and are a fascinating topic to study. The intent for the model presented in our study is very much to be a stepping stone towards extending the capabilities of models that do include such connections.

      However, there is a problem of scale that cannot be ignored.

      Current models of reading that do include feedback connections fall into the category we refer to in the paper as “traditional models” and are all only a few layers deep and operate on very simplified inputs, such as pre-defined line segments, a few pixels, or even a list of pre-recognized letters. The Heilbron et al. 2020 study that the Reviewer refers to is a good example of such a model. (This excellent and relevant work was somehow overlooked in our literature discussion in the Introduction. We thank the Reviewer for pointing it out to us.) Models incorporating realistic feed-back activity need these simplifications, because they have a tendency to no longer converge when there are too many layers and units. However, in order for models of reading to be able to simulate cognitive behavior such as resolving variations in font size or typeface, or distinguishing text from non-text, they need to operate on something close to the pixel-level data, which means they need many layers and units.

      Hence, as a stepping stone, it is reasonable to evaluate a model that has the necessary scale, but lacks the feed-back connections that would be problematic at this scale, to see what it can and cannot do in terms of explaining experimental effects in neuroimaging studies. This was the intended scope of our study. For the revision, we have attempted to make this more clear.

      We have changed the title to be:

      “Convolutional networks can model the functional modulation of the MEG responses associated with feed-forward processes during visual word recognition” and added the following to the Introduction:

      “The simulated environments in these models are extremely simplified, partly due to computational limitations and partly due to the complex interaction of feed-forward and feed-back connectivity that causes problems with convergence when the model grows too large. Consequently, these models have primarily focused on feed-back lexico-semantic effects while oversimplifying the initial feed-forward processing of the visual input. 

      […]

      This rather high level of visual representation sidesteps having to deal with issues such as visual noise, letters with different scales, rotations and fonts, segmentation of the individual letters, and so on. More importantly, it makes it impossible to create the visual noise and symbol string conditions used in the MEG study to modulate the type-I and type-II components. In order to model the process of visual word recognition to the extent where one may reproduce neuroimaging studies such as Vartiainen et al. (2011), we need to start with a model of vision that is able to directly operate on the pixels of a stimulus. We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation with very high accuracy, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain. For this model, we chose to focus on the early feed-forward processing occurring during visual word recognition, as the experimental setup in the MEG study was designed to demonstrate, rather than feed-back effects

      […]

      By doing so, we restrict ourselves to an investigation of how well the three evoked components can be explained by a feed-forward CNN in an experimental setting designed to demonstrate feed-forward effects. As such, the goal is not to present a complete model of all aspects of reading, which should include feed-back effects, but rather to demonstrate the effectiveness of using a model that has a realistic form of input when the aim is to align the model with the evoked responses observed during visual word recognition.”

      And we have added the following to the Discussion section:

      “In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation. A promising way forward may be to use a network architecture like CORNet (Kubilius et al., 2019), that performs convolution multiple times in a recurrent fashion, yet simultaneously propagates activity forward after each pass. The introduction of recursion into the model will furthermore align it better with traditional-style models, since it can cause a model to exhibit attractor behavior (McLeod et al., 2000), which will be especially important when extending the model into the semantic domain.

      Furthermore, convolution-and-pooling has recently been explored in the domain of predictive coding models (Ororbia & Mali, 2023), a type of model that seems particularly well suited to model feed-back processes during reading (Gagl et al., 2020; Heilbron et al., 2020; Nour Eddine et al., 2024).”

      We also would like to point out to the Reviewer that we did in fact perform a correlation between the model and the MNE-dSPM source estimate of all cortical locations and timepoints (Figure 7B). Such a brain-wide correlation map confirms that the three dipole groups are excellent summaries of when and where interesting effects occur within this dataset.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

      We hope that our revisions have clarified the goals and scope of this study. The CNN model we present in this study is a small but, we feel, essential piece in a bigger effort to employ deep learning techniques to further enhance already existing models of reading. In our revision, we have extended our discussion of where to go from here and outlined our vision of how these techniques could help us better model the phenomena the reviewer speaks of. We agree with the reviewer that there is a long way to go, and we are excited to be a part of it.

      In addition to the changes described above, we now end the Discussion section as follows: 

      “Despite its limitations, our model is an important milestone for computational models of reading that leverages deep learning techniques to encompass the entire computational process starting from raw pixel values to representations of wordforms in the mental lexicon. The overall goal is to work towards models that can reproduce the dynamics observed in brain activity during the large number of neuroimaging experiments with human volunteers that have been performed over the last few decades. To achieve this, models need to be able to operate on more realistic inputs than a collection of predefined lines or letter banks (for example: Coltheart et al., 2001; Heilbron et al., 2020; Laszlo & Armstrong, 2014; McClelland & Rumelhart, 1981; Nour Eddine et al., 2024). We have shown that even without feed-back connections, a CNN can simulate the behavior of three important MEG evoked components across a range of experimental conditions, but only if unit activations are noisy and the frequency of occurrence of words in the training dataset mimics their frequency of use in actual language.”

      Reviewer #3 (Public Review):

      The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      The large focus on a qualitative evaluation of the model is intentional. The ability of the model to reproduce experimental effects (Figure 4) is a pre-requisite for any subsequent quantitative metrics (such as correlation) to be valid. The introduction of frequency balancing is a good example of this. As the reviewer points out, frequency balancing during training has only a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing has a large impact.

      That said, the reviewer is right to highlight the value of quantitative analysis. An important limitation of the “traditional” models of reading that do not employ deep learning is that they operate in unrealistically simplified environments (e.g. input as predefined line segments, words of a fixed length), which makes a quantitative comparison with brain data problematic. The main benefit that deep learning brings may very well be the increase in scale that makes more direct comparisons with brain data possible. In our revision we attempt to capitalize on this benefit more. The reviewer has provided some helpful suggestions for doing so in their recommendations, which we discuss in detail below.

      We have added the following discussion on the topic of qualitative versus quantitative analysis to the Introduction:

      “We sought to construct a model that is able to recognize words regardless of length, size, typeface and rotation, as well as humans can, so essentially perfectly, whilst producing activity that mimics the type-I, type-II, and N400m components which serve as snapshots of this process unfolding in the brain.

      […]

      These variations were first evaluated on their ability to replicate the experimental effects in that study, namely that the type-I response is larger for noise embedded words than all other stimuli, the type-II response is larger for all letter strings than symbols, and that the N400m is larger for real and pseudowords than consonant strings. Once a variation was found that could reproduce these effects satisfactorily, it was further evaluated based on the correlation between the amount of activation of the units in the model and MEG response amplitude.”

      We follow this up in the Discussion with a new sub-section entitled “On the importance of experimental contrasts and qualitative analysis of the model”.

      The experiments only consider a rather outdated vision model (VGG).

      VGG was designed to use a minimal number of operations (convolution-and-pooling, fully-connected linear steps, ReLU activations, and batch normalization) and rely mostly on scale to solve the classification task. This makes VGG a good place to start our explorations and see how far a basic CNN can take us in terms of explaining experimental MEG effects in visual word recognition. However, we agree with the reviewer that it is easy to envision more advanced models that could potentially explain more. In our revision, we expand on the question of where to go from here and outline our vision on what types of models would be worth investigating and how one may go about doing that in a way that provides insights beyond higher correlation values.

      We have included the following in our Discussion sub-section on “Limitations of the current model and the path forward”:

      “The VGG-11 architecture was originally designed to achieve high image classification accuracy on the ImageNet challenge (Simonyan & Zisserman, 2015). Although we have introduced some modifications that make the model more biologically plausible, the final model is still incomplete in many ways as a complete model of brain function during reading.

      […]

      In this paper we have restricted our simulations to feed-forward processes. Now, the way is open to incorporate convolution-and-pooling principles in models of reading that simulate feed-back processes as well, which should allow the model to capture more nuance in the Type-II and N400m components, as well as extend the simulation to encompass a realistic semantic representation. A promising way forward may be to use a network architecture like CORNet (Kubilius et al., 2019), that performs convolution multiple times in a recurrent fashion, yet simultaneously propagates activity forward after each pass. The introduction of recursion into the model will furthermore align it better with traditional-style models, since it can cause a model to exhibit attractor behavior (McLeod et al., 2000), which will be especially important when extending the model into the semantic domain. Furthermore, convolution-and-pooling has recently been explored in the domain of predictive coding models (Ororbia & Mali, 2023), a type of model that seems particularly well suited to model feed-back processes during reading (Gagl et al., 2020; Heilbron et al., 2020; Nour Eddine et al., 2024).”

      Reviewer #3 (Recommendations For The Authors):

      (1) The method used to select the experimental conditions under which the behavior of the CNN is the most brain-like is rather qualitative (Figure 4). It would have been nice to have a plot where the noisiness of the activations, the vocab size and the amount of frequency balancing are varied continuously, and show how these three parameters impact the correlation of the model layers with the MEG responses.

      We now include this analysis (Figure 6 in the revised manuscript, Supplementary Figures 4–7) and discuss these factors in the revised Results section:

      “Various other aspects of the model architecture were evaluated which ultimately did not lead to any improvements of the model. The response profiles can be found in the supplementary information (Supplementary Figures 4–7) and the correlations between the models and the MEG components are presented in Figure 6. The vocabulary of the final model (10 000) exceeds the number of units in its fully-connected layers, which means that a bottleneck is created in which a sub-lexical representation is formed. The number of units in the fully-connected layers, i.e. the width of the bottleneck, has some effect on the correlation between model and brain (Figure 6A), and the amount of noise added to the unit activations less so (Figure 6B). We already saw that the size of the vocabulary, i.e. the number of wordforms in the training data and number of units in the output layer of the model, had a large effect on the response profiles (Figure 4). Having a large vocabulary is of course desirable from a functional point of view, but also modestly improves correlation between model and brain (Figure 6C). For large vocabularies, we found it beneficial to apply frequency-balancing of the training data, meaning that the number of times a word-form appears in the training data is scaled according to its frequency in a large text corpus. However, this cannot be a one-to-one scaling, since the most frequent words occur so much more often than other words that the training data would consist of mostly the top-ten most common words, with less common words only occurring once or not at all. Therefore, we decided to scale not by the frequency 𝑓 directly, but by 𝑓^𝑠, where 0 < 𝑠 < 1, opting for 𝑠 = 0.2 for the final model (Figure 6D).”
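
      As an illustration of the scaling described in the passage above, the following sketch raises each corpus frequency to the power s (with 0 < s < 1) and uses the result as a relative weight for how often a word-form appears in the training data. This is a minimal sketch, not the authors' implementation: the function names, the word list and the frequency values are all made up for the example.

```python
def balancing_weights(freqs: dict[str, float], s: float = 0.2) -> dict[str, float]:
    """Relative training weight per word: corpus frequency raised to the power s."""
    if not 0 < s < 1:
        raise ValueError("s is expected to lie strictly between 0 and 1")
    return {word: f ** s for word, f in freqs.items()}


def examples_per_word(freqs: dict[str, float], total: int, s: float = 0.2) -> dict[str, int]:
    """Split roughly `total` training examples across words in proportion to f**s.

    Every word gets at least one example, so rounding can make the sum
    differ slightly from `total`.
    """
    weights = balancing_weights(freqs, s)
    norm = sum(weights.values())
    return {word: max(1, round(total * w / norm)) for word, w in weights.items()}


# Hypothetical corpus counts spanning five orders of magnitude.
corpus = {"the": 1_000_000.0, "cat": 10_000.0, "zephyr": 10.0}
print(examples_per_word(corpus, total=1000))
weights = balancing_weights(corpus)
print(weights["the"] / weights["zephyr"])  # ratio after compression with s = 0.2
```

      With s = 0.2, a 100 000-fold difference in raw frequency is compressed to roughly a 10-fold difference in training weight, so rare words are still seen often enough to be learned while frequent words remain over-represented.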

      (2) It is not clear which layers exactly correspond to which of the three response components. For this to be clearer, it would have been nice to have a plot with all the layers of VGG on the x-axis and three curves corresponding to the correlation of each layer with each of the three response components.

      This is a great suggestion that we were happy to incorporate in the revised version of the manuscript. Every figure comparing the response patterns of the model and brain now includes a panel depicting the correlation between each layer of the model and each of the three MEG components (Figures 4 & 5, Supplementary Figures 2-5). This has given us (and now also the reader) the ability to better benchmark the different models quantitatively, adding to our discussion on qualitative versus quantitative analysis.
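
      The kind of layer-by-component comparison described above could be sketched as follows. The example computes a Pearson correlation between the activation of each model layer and the amplitude of each MEG component across experimental conditions. It is only an illustration: the layer names, the choice of one value per condition, and all numbers are invented here, and the authors' actual analysis pipeline is not shown in this response.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def layer_component_correlations(layer_activity, meg_amplitude):
    """Correlate every layer with every MEG component."""
    return {
        layer: {comp: pearson(act, amp) for comp, amp in meg_amplitude.items()}
        for layer, act in layer_activity.items()
    }


# Invented values, one per condition, in the order:
# noisy words, symbols, consonant strings, pseudowords, words.
layer_activity = {
    "conv1": [0.9, 0.3, 0.3, 0.3, 0.3],
    "fc1": [0.2, 0.1, 0.6, 0.7, 0.7],
    "output": [0.1, 0.1, 0.2, 0.8, 0.9],
}
meg_amplitude = {
    "type-I": [1.0, 0.4, 0.4, 0.4, 0.4],
    "type-II": [0.3, 0.2, 0.8, 0.8, 0.9],
    "N400m": [0.2, 0.1, 0.3, 0.9, 1.0],
}

for layer, row in layer_component_correlations(layer_activity, meg_amplitude).items():
    print(layer, {comp: round(r, 2) for comp, r in row.items()})
```

      Laying the result out as a layers-by-components table mirrors the panel the reviewer asked for and makes it easy to see which layer tracks which component most closely.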

      (3) It is not clear to me why the authors report the correlation of all layers with the MEG responses in Figure 5: why not only report the correlation of the final layers for N400, and that of the first layers for type-I?

      We agree with the reviewer that it would have been better to compare the correlation scores for those layers whose response profile matches the MEG component. While the old Figure 5 has been merged with Figure 4, and now provides the correlations between all the layers and all MEG components, we have taken the Reviewer’s advice and marked the layers which qualitatively best correspond to each MEG component, so the reader can take that into account when interpreting the correlation scores.

      (4) The authors mention that the reason that they did not reproduce the protocol with more advanced vision models is that they needed the minimal setup capable of yielding the desired experiment effect. I am not fully convinced by this and think the paper could be significantly strengthened by reporting results for a vision transformer, in particular to study the role of attention layers which are expected to play an important role in processing higher-level features.

      We appreciate and share the Reviewer’s enthusiasm in seeing how other model architectures would fare when it comes to modeling MEG components. However, we regard modifying the core model architecture (i.e., a series of convolution-and-pooling followed by fully-connected layers) to be out of scope for the current paper.

      One of the key points of our study is to create a model that reproduces the experimental effects of an existing MEG study, which necessitates modeling the initial feed-forward processing from pixel to word-form. For this purpose, a convolution-and-pooling model was the obvious choice, because these operations play a big role in cognitive models of vision in general. In order to properly capture all experimental contrasts in the MEG study, many variations of the CNN were trained and evaluated. This iterative design process concluded when all experimental contrasts could be faithfully reproduced.

      If we were to explore different model architectures, such as a transformer architecture, reproducing the experimental contrasts of the MEG study would no longer be the end goal, and it would be unclear what the end goal should be. Maximizing correlation scores has no end, and there are a nearly endless number of model architectures one could try. We could bring in a second MEG study with experimental contrasts that the CNN cannot explain and a transformer architecture potentially could and set the end goal to explain all experimental effects in both MEG studies. But even if we had access to such a dataset, this would almost double the length of the paper, which is already too long.

    1. eLife Assessment

      Hardly anything is known about the genetic basis and mechanism of male-killing. Recently, a gene called oscar, in the bacterium Wolbachia, was implicated in killing male corn borer moths by interfering with moth genes that control sex determination and proper dosage of sex-specific genes. In this paper, the authors show that a distantly related oscar gene in another strain of Wolbachia kills male tea tortrix moths by a similar mechanism. This valuable study cements our understanding of the sophisticated way that Wolbachia kills male moths and butterflies (Lepidoptera) so early in their development. The conclusions are supported by solid evidence.

    2. Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include: a) overexpressing oscar (and wmk) by injecting RNA into moth eggs, b) determining the sex of embryos by staining female sex chromosomes, c) determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq, and d) expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line. This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera).

    3. Reviewer #2 (Public review):

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include: a) overexpressing oscar (and wmk) by injecting RNA into moth eggs, b) determining the sex of embryos by staining female sex chromosomes, c) determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq, and d) expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line. This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      We would like to thank you for evaluating our manuscript. Our data demonstrated that Oscar homologs play important roles in male-killing phenotypes in moths and butterflies; however, the functional relevance of wmk remains uncertain. As you noted, whether wmk acts as a male-killing gene in insects such as flies and beetles—or even in certain lepidopteran species—requires further investigation using diverse insect models, which we are eager to explore in future research.

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts.

      Thank you for your suggestion. We have revised the section on the cell-based experiment. Further, we revised the manuscript to make it accessible to a broader audience. We believe these revisions have significantly improved the clarity and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Comments on revisions:

      The authors have already addressed the reviewer's concerns.

      We would like to thank you for evaluating our manuscript.

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which

      (1) they tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster

      (2) they examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host, Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own here. While I largely agree with the authors' conclusions that oscar is the primary MK factor in this system, I don't think we can yet rule out that wmk(s) may work synergistically or interactively with oscar in vivo. This might be worth a small note in the discussion (e.g., at line 294, "indicating that wmk likely targets factors other than masc": this could be downstream of the impacts of oscar, perhaps dependent on oscar-mediated impacts on masc first).

      We sincerely appreciate your suggestion. Whilst wmk genes themselves did not exhibit apparent lethal effects on the native host, as you noted, we cannot entirely rule out the possibility that wmk may be involved in male-killing actions, either directly or indirectly assisting the function of Hm-oscar. Following your suggestion, we have added a brief note in the discussion section regarding the interpretation of wmk functions.

      “In addition, Katsuma et al. (2022) reported that the wmk homologs encoded by wFur did not affect the masculinizing function of masc in vitro, indicating that wmk likely targets factors other than masc. Whilst we cannot rule out the possibility that wmk may work synergistically or interactively with oscar in vivo—potentially acting downstream of oscar’s impact—our results strongly suggested that Wolbachia strains have acquired multiple MK genes through evolution.” (lines 287-292)

      Regarding the perceived male-bias in Figure 2a: I think readers might be interpreting "unhatched" as "total before hatching". You could eliminate ambiguity by perhaps splitting the bars into male and female, and then within a bar, coloring by hatched versus unhatched. But this is a minor point, and I think the updated text helps clarify this.

      Thank you for your suggestion. We have revised Figure 2a accordingly. In addition, we have included more detailed information in the first sentence of the section "Males are killed mainly at the embryonic stage".

      “The sex of hatched larvae (neonates) and the remaining unhatched embryos was determined by the presence or absence of W chromatin, a condensed structure of the female-specific W chromosome observed during interphase.” (lines 171-173)

      The new Figure 4b looks to be largely redundant with the oscar information in Figure 1a.

      Thank you for your suggestion. We have removed Figure 4b due to its overlap with Figure 1a and have incorporated relevant figure legends into the Figure 1a legend.

      Updated statistical comparisons for the RNA-seq analysis are helpful. However, these analyses are based on single libraries (albeit each a pool of many individuals), so this is still a weaker aspect of the manuscript.

      Thank you for your suggestion. As you noted, the use of single libraries (due to the limited number of available individuals, though each includes approximately 50 males and females) may be a potential limitation of this study. However, as demonstrated by the qPCR assay for the Z-linked gene provided in the previous revision, we believe that our data and our conclusion (that Wolbachia/Hm-oscar disrupts dosage compensation by causing the overexpression of Z-linked genes) are well supported and robust.

      The new information on masc similarity is useful (Fig 4d) - if the authors could please include a heatmap legend for the colors, that would be helpful. Also, please avoid green and red in the same figure when key for interpretation.

      Thank you for your suggestion. We have accordingly included a heatmap legend and revised the colors.

      Figure 1A "helix-turn-helix" is misspelled. ("tern").

      We have revised this.

      Recommendations for the authors:

      Comments from the reviewing editor: I would suggest you address the comments of the reviewer on the revised version.

      We have further revised the manuscript to address all the questions, comments and suggestions provided by the reviewers. We believe that the resulting revisions have significantly enhanced the quality and comprehensiveness of our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Thank you for revising this manuscript. I have a few last recommendations:

      - Line 214: re: 'Statistical data are available in the supplementary data file', it would be more helpful to add a few words here that actually summarize the statistical results

      We would like to thank you for your suggestion. We have revised the sentence to describe the overview of the statistical results.

      “RNA-seq analysis revealed that, in Hm-oscar-injected embryos, Z-linked genes (homologs on the B. mori chromosomes 1 and 15) were more expressed in males than in females (Fig. 3a), which was not observed in the GFP-injected group (Fig. 3b). Similarly, as previously reported by Arai et al. (2023a), high levels of Z-linked gene expression were also observed in wHm-t-infected males, but not in NSR males (Fig. 3c,d). The high (i.e., doubled) Z-linked gene expression in both Hm-oscar-expressed and wHm-t-infected males was further confirmed by quantification of the Z-linked Hmtpi gene (Fig. 3e). These trends were statistically supported, with all data available in the supplementary data file.” (lines 205-213)

      - Figure 1 legend: do you mean 'bridged' instead of 'brigged'?

      We have revised this accordingly; thank you for the suggestion.

    1. eLife Assessment

      The authors have developed a biosensor for programmed cell death. They use this biosensor to provide cell death measurements during a specific early developmental period. The findings are useful in a specific context; however, the application of this biosensor is incomplete as it does not take into account existing literature and is missing controls.

    2. Joint Public Review:

      Summary:

      Jia and colleagues developed a fluorescence resonance energy transfer (FRET)-based biosensor to study programmed cell death in the zebrafish spinal cord. They applied this tool to study death of zebrafish spinal motor neurons.

      Strengths:

      Their analysis shows that the tool is a useful biosensor of motor neuron apoptosis in living zebrafish and can reveal which part of the neuron undergoes caspase activation first.

      Weaknesses:

      As far as it is possible to tell, the authors focus on death of motor neurons innervating axial muscles. Previous work from over 30 years ago revealed that only a small number of these motor neurons die early in development. So this is not new, although following the cells and learning details of their apoptosis is new. Most of the work on motor neuron death in tetrapods was carried out on limb innervating motor neurons. Zebrafish have paired pectoral and pelvic fins, homologs of tetrapod paired limbs. These fins are innervated by distinct sets of motor neurons in zebrafish, as they are in tetrapods. However, the authors have not focused on these particular motor neurons, and thus have not made a fair comparison with tetrapods. In fact, they do not tell us which spinal levels they observed or whether they have been consistent from animal to animal. Pelvic fins emerge much later than pectoral fins in zebrafish, so it is possible that the time frame during which the authors imaged motor neuron death does not include motor neurons innervating pelvic fins.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      (1) The results do not support the conclusions. The main "selling point" as summarized in the title is that the apoptotic rate of zebrafish motorneurons during development is strikingly low (~2%) as compared to the much higher estimate (~50%) by previous studies in other systems. The results used to support the conclusion are that only a small percentage (under 2%) of apoptotic cells were found over a large population at a variety of stages 24-120hpf. This is fundamentally flawed logic, as a percentage measured in a short time window cannot represent the percentage over the long term. For example, in any given year under 1% of the human population dies, but over 100 years >99% of the starting group will have died. To find the real percentage of motorneurons that died, the motorneurons born at different times must be tracked over the long term, or the new motorneuron birth rate must be estimated. A similar argument can be applied to the macrophage results.
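
      The arithmetic behind this point can be sketched as follows. This is a minimal illustration, not an analysis from the manuscript: the function name is hypothetical, and it assumes a constant, independent death probability per interval with no new cells added to the cohort. Under that constant-rate assumption a 1% rate per interval gives a cumulative loss of well over half the cohort after 100 intervals; the reviewer's ">99%" figure for humans additionally reflects mortality that rises with age, which this sketch does not model.

```python
def cumulative_fraction_dead(per_interval_rate: float, n_intervals: int) -> float:
    """Fraction of a starting cohort that has died after n_intervals.

    Assumptions (illustrative only): the death probability is the same in
    every interval, deaths are independent, and no new cells join the cohort.
    """
    if not 0.0 <= per_interval_rate <= 1.0:
        raise ValueError("per_interval_rate must be between 0 and 1")
    if n_intervals < 0:
        raise ValueError("n_intervals must be non-negative")
    # Survival compounds multiplicatively across intervals.
    surviving = (1.0 - per_interval_rate) ** n_intervals
    return 1.0 - surviving


if __name__ == "__main__":
    # A small per-interval rate accumulates into a large long-term fraction.
    for n in (1, 10, 100):
        print(n, round(cumulative_fraction_dead(0.01, n), 3))
```

      The same compounding applies to any snapshot measurement: a low fraction of dying cells at each imaging time point does not by itself bound the fraction lost over the whole developmental window.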

      In the revised manuscript (revised Figure 4), we extended the observation time window as long as possible, from 24 hpf to 240 hpf. After 240 hpf, the transparency of zebrafish body decreased dramatically, which made optical imaging quite difficult.

      We are confident that this 24-240 hpf time window covers the major time window during which motor neurons undergo programmed cell death during zebrafish early development. We chose the observation time window based on the following two reasons: 1) Previous studies showed that although the time windows of motor neuron death vary in chick (E5-E10), mouse (E11.5-E15.5), rat (E15-E18), and human (11-25 weeks of gestation), the common feature of these time windows is that they are all the developmental periods when motor neurons contact with muscle cells. The contact between zebrafish motor neurons and muscle cells occurs before 72 hpf, which is included in our observation time window. 2) Most organs of zebrafish form before 48-72 hpf, and they complete hatching during 48-72 hpf. Food-seeking and active avoidance behaviors also start at 72 hpf, indicating that motor neurons are fully functional at 72 hpf.

      Previous studies in zebrafish have shown that the production of spinal cord motor neurons largely ceases before 48 hpf, and then the motor neurons remain largely constant until adulthood (doi: 10.1016/j.celrep.2015.09.050; 10.1016/j.devcel.2013.04.012; 10.1007/BF00304606; 10.3389/fcell.2021.640414). Our observation time window covers the major motor neuron production process. Therefore, we believe that neurogenesis will not affect our findings and conclusions.

      Although we are confident that 240 h tracking is long enough to measure the motor neuron death rate, several sentences have been added in the discussion part, “In our manuscript, we tracked the motor neuron death in live zebrafish until 240 hpf, which was the longest time window we could achieve. But there was still a possibility that zebrafish motor neurons might die after 240 hpf.”

      We agreed that the “2%” description might not be very accurate. Thus, we have revised our title to “Zebrafish live imaging reveals a surprisingly small percentage of spinal cord motor neurons die during early development.”

      (2) The conclusion regarding the timing of axon and cell body caspase activation and apoptosis also has clear issues. The measurement intervals (~minutes) are too long compared with the transport/diffusion timescale between the cell body and the axon: caspase activity could have been activated in the cell body, and either the caspase or the cleaved sensor could move to the axon within several seconds. The authors' imaging frequency is not high enough to resolve these dynamics. Many statements suggest oversight of the literature, for example, in the abstract: "however, there is still no real-time observation showing this dying process in live animals."

      Real-time imaging of live animals is quite challenging in the field. Currently, using confocal microscopy, we can only achieve minute-scale tracking. In the future, with more advanced imaging techniques, the sensor fish in the present study may provide us with more detailed information on motor neuron death. We have removed “real-time” from our revised manuscript. We also revised the mentioned sentence in the abstract.

      (3) Many statements should use more scholarly terms and descriptions from the spinal cord or motorneuron, neuromuscular development fields, such as line 87 "their axons converged into one bundle to extend into individual somite, which serves as a functional unit for the development and contraction of muscle cells"

      We have removed this sentence.

      (4) The transgenic line is perhaps the most meaningful contribution to the field as the work stands. However, the mnx1 promoter is well known for its non-specific activation - while the images do suggest the authors' line is good, motorneuron markers should be used to validate the line. This is especially important for assessing this population later as mnx1 may be turned off in mature neurons. The authors' response regarding mnx1 specificity does not mitigate the original concern.

      The mnx1 promoter has been widely used to label motor neurons in transgenic zebrafish. Previous studies have shown that most of the cells labeled in the mnx1 transgenic zebrafish are motor neurons. In this study, we observed that the neuronal cells in our sensor zebrafish formed green cell bodies inside of the spinal cord and extended to the muscle region, which is an important morphological feature of the motor neurons.

      Furthermore, a few of those green cell bodies turned into blue apoptotic bodies inside the spinal cord and changed to blue axons in the muscle regions at the same time, which strongly suggests that those apoptotic neurons are not interneurons.

      In fact, no matter what method is used, such as using antibodies to stain specific markers to label motor neurons, 100% specificity cannot be achieved. More importantly, although the mnx1 promoter might have labeled some interneurons, this will not affect our major finding that only a small percentage of spinal cord motor neurons die during the early development of zebrafish.

      Reviewer 2:

      (1) Title: The 50% figure of motor neurons dying through apoptosis during early vertebrate development is not precisely accurate. In papers referenced by the authors, there is a wide distribution of percentages of motor neurons that die depending on the species and the spinal cord region. In addition, the authors did not examine limb-innervating motor neurons, which are the ones best studied in motor neuron programmed cell death in other species. Thus, a better title that reflects what they actually show would be something like "A surprisingly small percentage of early developing zebrafish motor neurons die through apoptosis in non-limb innervating regions of the spinal cord."

      In fish, there are no such structures as limbs, although fins may be evolutionarily related to limbs. In our manuscript, we studied the naturally occurring motor neuron death in the whole spinal cord during the early stage of zebrafish development. The death of limb-innervating motor neurons has been extensively studied in chicks and rodents, because operations such as amputation are easy to perform in these animals. However, previous studies have shown that this dramatic motor neuron death occurs not only in limb-innervating motor neurons but also in other spinal cord motor neurons (doi: 10.1006/dbio.1999.9413).

      We have revised our title to “Zebrafish live imaging reveals a surprisingly small percentage of spinal cord motor neurons die during early development.”

      (2) lines 18-19: "embryonic stage of vertebrates" is very broad, since zebrafish are also vertebrates; it would be better to be more specific

      lines 25-26: The authors should be more specific about which animals have widespread neuronal cell death.

      We have revised our manuscript accordingly.

      (3) lines 98-99; 110-111; 113; 122-123; 140-141: A cell can undergo apoptosis. But an axon, which is only part of a cell, cannot undergo apoptosis. Especially since the axon doesn't have a separate nucleus, and the definition of apoptosis usually includes nuclear fragmentation. A better subheading would describe the result, which is that caspase activation is seen in both the cell body and the axon.

      We have revised the subheadings and related words in the manuscript accordingly. In the introduction, we also revised the expression of the third aim from “Which part of a neuron (cell body vs. axon) will die first?” to “Which part of a neuron (cell body vs. axon) will degrade first?”.

      (4) lines 159-160; 178-179: This is an oversimplification of the literature. The authors should spell out which populations of motor neurons have been examined and say something about the similarities and differences in motor neuron death.

      We have revised it accordingly.

      (5) lines 200; 216: The authors did not observe macrophages engulfing motor neurons. But that does not mean that they cannot. Making the conclusion stated in this subheading would require some kind of experiment, not just observations.

      We observed only a few colocalizations of macrophages and dead motor neurons. To express these data more accurately, in the revised manuscript we used “colocalization” to replace “engulfment.” The subheading has been revised to “Most dead motor neurons were not colocalized with macrophages.” Accordingly, panel C of Figure 5 has also been revised.

      (6) lines 234-246: The authors seem to have missed the point about VaP motor neuron death, which was two-fold. First, VaP death has been previously described, thus it could serve as a control for the work in this paper, especially since the conditions underlying VaP death and survival have been experimentally tested. Second, they should acknowledge that previous work showed that at least some motor neuron death in zebrafish differs from that described in chick and rodents. This conclusion came from work showing that death of VaP is independent of limitations in muscle innervation area, suggesting it is not coupled to muscle-derived neurotrophic factors.

      Figures: The authors should say which level of the spinal cord they examined in each figure.

      We have compared our findings with previous findings in the revised manuscript. The death of VaP motor neurons is not related to neurotrophic factors, but the death of other motor neurons may be related to neurotrophic factors, which needs further study and evidence. Our study examined the overall motor neuron apoptosis regardless of the causes and locations. To avoid misunderstanding, in the revised manuscript, we removed the data and words related to neurotrophic factors.

      We also extended the observation time window as long as possible, from 24 hpf to 240 hpf (revised Figure 4). After 240 hpf, the transparency of zebrafish body decreased dramatically, which made the optical imaging quite difficult.

    1. eLife Assessment

      It is known from model organisms that genes' effects on traits are often modulated by environmental variables, but similar gene-by-environment (GxE) interactions have been difficult to detect using statistical analyses of genomic data, e.g., in humans. This study introduces a new framework to estimate gene-by-environment effects, treating it as a bias-variance tradeoff problem. The authors convincingly show that greater statistical power can be achieved in detecting GxE if an underlying model of polygenic GxE is assumed. This polygenic amplification model is a truly novel view with fundamental promise for the detection of GxE in genomic datasets, especially with continued development to detect more complex signals of amplification.

    2. Reviewer #1 (Public review):

      Experiments in model organisms have revealed that the effects of genes on heritable traits are often mediated by environmental factors -- so-called gene-by-environment (or GxE) interactions. In human genetics, however, where indirect statistical approaches must be taken to detect GxE, limited evidence has been found for pervasive GxE interactions. The present manuscript argues that the failure of statistical methods to detect GxE may be due to how GxE is modelled (or not modelled) by these methods.

      The authors show, via re-analysis of an existing dataset in Drosophila, that a polygenic 'amplification' model can parsimoniously explain patterns of differential genetic effects across environments. (Work from the same lab had previously shown that the amplification model is consistent with differential genetic effects across the sexes for a number of traits in humans.) The parsimony of the amplification model allows for powerful detection of GxE in scenarios in which it pertains, as the authors show via simulation.

      Before the authors consider polygenic models of GxE, however, they present a very clear analysis of a related question around GxE: When one wants to estimate the effect of an individual allele in a particular environment, when is it better to stratify one's sample by environment (reducing sample size, and therefore increasing the variance of the estimator) versus using the entire sample (including individuals not in the environment of interest, and therefore biasing the estimator away from the true effect specific to the environment of interest)? Intuitively, the sample-size cost of stratification is worth paying if true allelic effects differ substantially between the environment of interest and other environments (i.e., GxE interactions are large), but not worth paying if effects are similar across environments. The authors quantify this trade-off in a way that is both mathematically precise and conveys the above intuition very clearly. They argue on its basis that, when allelic effects are small (as in highly polygenic traits), single-locus tests for GxE may be substantially underpowered.
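
      The trade-off described above can be sketched numerically. This is a simplified illustration rather than the authors' derivation: it assumes two environments A and B, an estimator whose variance is a noise term divided by sample size, and a pooled estimator whose expectation is the sample-size-weighted mean of the two true effects. All function and parameter names are hypothetical.

```python
def mse_stratified(n_a: int, sigma2: float) -> float:
    """MSE of the effect estimate for environment A using only A's samples.

    Assumed unbiased, so the MSE is just the sampling variance sigma2 / n_a.
    """
    return sigma2 / n_a


def mse_pooled(n_a: int, n_b: int, beta_a: float, beta_b: float, sigma2: float) -> float:
    """MSE of the pooled estimate when used as an estimate of beta_a.

    Assumption: the pooled estimator's expectation is the sample-size-weighted
    mean of beta_a and beta_b, so it is biased toward beta_b but has lower variance.
    """
    n_total = n_a + n_b
    bias = (n_b / n_total) * (beta_b - beta_a)
    variance = sigma2 / n_total
    return bias ** 2 + variance


def prefer_stratified(n_a: int, n_b: int, beta_a: float, beta_b: float, sigma2: float) -> bool:
    """True when stratifying by environment gives the lower MSE for beta_a."""
    return mse_stratified(n_a, sigma2) < mse_pooled(n_a, n_b, beta_a, beta_b, sigma2)


if __name__ == "__main__":
    # Large difference in effects between environments: stratification pays off.
    print(prefer_stratified(1000, 1000, 0.1, 0.5, 1.0))
    # Small effects with a small difference: pooling gives the lower MSE.
    print(prefer_stratified(1000, 1000, 0.01, 0.02, 1.0))
```

      In this sketch the squared bias scales with the squared difference in effects, whereas the variance saved by pooling depends only on sample sizes, which is why small allelic effects tend to favor the pooled estimator.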

      The paper is an important further demonstration of the plausibility of the amplification model of GxE, which, given its parsimony, holds substantial promise for the detection and characterization of GxE in genomic datasets. However, the empirical and simulation examples considered in the paper (and previous work from the same lab) are somewhat "best-case" scenarios for the amplification model, with only two environments and with these environments amplifying equally the effects of only a single set of genes. It would be an important step forward to demonstrate the possibility of detecting amplification in more complex scenarios, with multiple environments each differentially modulating the effects of multiple sets of genes. This could be achieved via simulations similar to those presented in the current manuscript.

      Comments on revisions:

      The authors have (with reasonable justification) said that my main recommendations for strengthening the conclusions of the paper are beyond its scope, and they have thoughtfully responded to my (and the other reviewer's) other comments. The paper is now more clearly written---in particular, the connection between the single-locus bias-variance tradeoff calculations and the polygenic results is much more transparent than before. Given that the authors have (again, with fair justification) chosen not to address my major comment, my broad assessment of the paper is unchanged---I think it is an important contribution to a critical topic---and I have no further comments for its improvement (though I note an issue with figure referencing in the captions of Supplementary Figs S2 and S3).

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Experiments in model organisms have revealed that the effects of genes on heritable traits are often mediated by environmental factors---so-called gene-by-environment (or GxE) interactions. In human genetics, however, where indirect statistical approaches must be taken to detect GxE, limited evidence has been found for pervasive GxE interactions. The present manuscript argues that the failure of statistical methods to detect GxE may be due to how GxE is modelled (or not modelled) by these methods.

      The authors show, via re-analysis of an existing dataset in Drosophila, that a polygenic ‘amplification’ model can parsimoniously explain patterns of differential genetic effects across environments. (Work from the same lab had previously shown that the amplification model is consistent with differential genetic effects across the sexes for several traits in humans.) The parsimony of the amplification model allows for powerful detection of GxE in scenarios in which it pertains, as the authors show via simulation.

      Before the authors consider polygenic models of GxE, however, they present a very clear analysis of a related question around GxE: When one wants to estimate the effect of an individual allele in a particular environment, when is it better to stratify one’s sample by environment (reducing sample size, and therefore increasing the variance of the estimator) versus using the entire sample (including individuals not in the environment of interest, and therefore biasing the estimator away from the true effect specific to the environment of interest)? Intuitively, the sample-size cost of stratification is worth paying if true allelic effects differ substantially between the environment of interest and other environments (i.e., GxE interactions are large), but not worth paying if effects are similar across environments. The authors quantify this trade-off in a way that is both mathematically precise and conveys the above intuition very clearly. They argue on its basis that, when allelic effects are small (as in highly polygenic traits), single-locus tests for GxE may be substantially underpowered.

      The paper is an important further demonstration of the plausibility of the amplification model of GxE, which, given its parsimony, holds substantial promise for the detection and characterization of GxE in genomic datasets. However, the empirical and simulation examples considered in the paper (and previous work from the same lab) are somewhat “best-case” scenarios for the amplification model, with only two environments, and with these environments amplifying equally the effects of only a single set of genes. It would be an important step forward to demonstrate the possibility of detecting amplification in more complex scenarios, with multiple environments each differentially modulating the effects of multiple sets of genes. This could be achieved via simulations similar to those presented in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Wine et al. describe a framework to view the estimation of gene-context interaction analysis through the lens of bias-variance tradeoff. They show that, depending on trait variance and context-specific effect sizes, effect estimates may be estimated more accurately in context-combined analysis rather than in context-specific analysis. They proceed by investigating, primarily via simulations, implications for the study or utilization of gene-context interaction, for testing and prediction, in traits with polygenic architecture. First, the authors describe an assessment of the identification of context-specificity (or context differences) focusing on “top hits” from association analyses. Next, they describe an assessment of polygenic scores (PGSs) that account for context-specific effect sizes, showing, in simulations, that often the PGSs that do not attempt to estimate context-specific effect sizes have superior prediction performance. An exception is a PGS approach that utilizes information across contexts.

      Strengths:

      The bias-variance tradeoff framing of GxE is useful, interesting, and rigorous. The PGS analysis under pervasive amplification is also interesting and demonstrates the bias-variance tradeoff.

      Weaknesses:

      The weakness of this paper is that the first part -- the bias-variance tradeoff analysis -- is not tightly connected to, i.e. not sufficiently informing, the later parts, that focus on polygenic architecture. For example, the analysis of “top hits” focuses on the question of testing, rather than estimation, and testing was not discussed within the bias-variance tradeoff framework. Similarly, while the PGS analysis does demonstrate (well) the bias-variance tradeoff, the reader is left to wonder whether a bias-variance deviation rule (discussed in the first part of the manuscript) should or could be utilized for PGS construction.

      We thank the editors and the reviewers for their thoughtful critique and helpful suggestions throughout. In our revision, we focused on tightening the relationship between the analytical single variant bias-variance tradeoff derivation and the various empirical analyses that follow.

      We improved discussion of our scope and what is beyond our scope. For example, our language was insufficiently clear if it suggested to the editor and reviewers that we are developing a method to characterize polygenic GxE. Developing a new method that does so (let alone evaluating performance across various scenarios) is beyond the scope of this manuscript.

      Similarly, we clarify that we use amplification only as an example of a mode of GxE that is not adequately characterized by current approaches. We do not wish to argue it is an omnibus explanation for all GxE in complex traits. In many cases, a mixture of polygenic GxE relationships seems most fitting (as observed, for example, in Zhu et al., 2023, for GxSex in human physiology).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      MAJOR COMMENT

      The amplification model is based on an understanding of gene networks in which environmental variables concertedly alter the effects of clusters of genes, or modules, in the network (e.g., if an environmental variable alters the effect of some gene, it indirectly and proportionately alters the effects of genes downstream of that gene in the network---or upstream if the gene acts as a bottleneck in some pathway). It is clear in this model that (i) multiple environmental variables could amplify distinct modules, and (ii) a single environmental variable could itself amplify multiple separate modules, with a separate amplification factor for each module.

      However, perhaps inspired by their previous work on GxSex interactions in humans, the authors’ focus in the present manuscript is on cases where there are only two environments (“control” and “high-sugar diet” in the Drosophila dataset that they reanalyze, and “A” and “B” in their simulations [and single-locus mathematical analysis]), and they consider models where these environments amplify only a single set of genes, i.e., with a single amplification factor. While it is of course interesting that a single-amplification-factor model can generate data that resemble those in the Drosophila dataset that the authors re-analyze, most scenarios of amplification GxE will presumably be more complex. It seems that detecting amplification in these more complex scenarios using methods such as the authors do in their final section will be correspondingly more difficult. Indeed, in the limit of sufficiently many environmental variables amplifying sufficiently many modules, the scenario would resemble one of idiosyncratic single-locus GxE which, as the authors argue, is very difficult to detect. That more complex scenarios of amplification, with multiple environments separately amplifying multiple modules each, might be difficult to detect statistically is potentially an important limitation to the authors’ approach, and should be tested in their simulations.

      We agree that characterizing GxE when there is a mixture of drivers of context-dependency is difficult. Developing a method that does so across multiple (and perhaps not pre-defined) contexts is of high interest to us but beyond the scope of the current manuscript.

      We note that for GxSex, modeling this mixture does generally improve phenotypic prediction, and more so in traits where we infer amplification as a major mode of GxE.

      MINOR COMMENTS

      Lines 88-90: “This estimation model is equivalent to a linear model with a term for the interaction between context and reference allele count, in the sense that context-specific allelic effect estimators have the same distributions in the two models.”

      Does this equivalence require the model with the interaction term also to have an interaction term for the intercept, i.e., the slope on a binary variable for context (since the generative model in Eq. 1 allows for context-specific intercepts)?

      It does require an interaction term for the intercept. This is e_i (and its effect beta_E) in Eq. S2 (line 70 of the supplement).
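
      The point about the context term can be checked numerically. The sketch below is an illustration under assumed values (the sample size, allele frequency, intercepts, and slopes are hypothetical, and the variable names are not those of Eq. S2). It fits a linear model with a context term and a context-by-allele-count interaction, fits each context separately, and compares the implied context-specific slopes. It then drops the context term to show that the match is lost.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
e = rng.integers(0, 2, n).astype(float)    # context indicator: 0 = A, 1 = B
g = rng.binomial(2, 0.3, n).astype(float)  # reference allele count
# Hypothetical generative values: context-specific intercepts and slopes.
y = np.where(e == 0, 1.0 + 0.2 * g, 2.0 + 0.5 * g) + rng.normal(0.0, 1.0, n)

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Full model: intercept, context term, allele count, and their interaction.
b0, b_e, b_g, b_ge = ols(np.column_stack([ones, e, g, g * e]), y)

# Stratified fits, one per context.
in_a, in_b = e == 0, e == 1
a0, a_g = ols(np.column_stack([ones[in_a], g[in_a]]), y[in_a])
c0, c_g = ols(np.column_stack([ones[in_b], g[in_b]]), y[in_b])

# Slope in A is b_g; slope in B is b_g + b_ge.
slopes_match = np.allclose([b_g, b_g + b_ge], [a_g, c_g])

# Dropping the context term forces a shared intercept and breaks the match.
d0, d_g, d_ge = ols(np.column_stack([ones, g, g * e]), y)
slopes_match_without_context_term = np.allclose([d_g, d_g + d_ge], [a_g, c_g])

print(slopes_match, slopes_match_without_context_term)
```

      In this sketch the context-specific slope estimates coincide only when the context term is included, consistent with the answer above.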

      Lines 94-96: Perhaps just a language thing, but in what sense does the estimation model described in lines 92-94 “assume” a particular distribution of trait values in the combined sample? It’s just an OLS regression, and one can analyze its expected coefficients with reference to the generative model in Eq. 1, or any other model. To say that it “assumes” something presupposes its purpose, which is not clear from its description in lines 92-94.

      We corrected “assume” to “posit”.

      Lines 115-116: It should perhaps be noted that the weights wA and wB need not sum to 1.

      Indeed; it is now explicitly stated.

      Lines 154-160: I think the role of r could be made even clearer by also discussing why, when VA>>VB, it is better to use the whole-sample estimate of betaA than the sample-A-specific estimate (since this is a more counterintuitive case than the case of VA<<VB discussed by the authors).

      This is addressed in lines 153-154, stating: “Typically, this (VA<<VB) will also imply that the additive estimator is greatly preferable for estimating β_B , as β_B will be extremely noisy”

      Line 243 and Figure 4 caption: The text states that the simulated effects in the high-sugar environment are 1.1x greater than those in the control environment, while the caption states that they are 1.4x greater.

      We have corrected the text to be consistent with our simulations.

      TYPOS/WORDING

      Line 14: “harder to interpret” --> “harder-to-interpret”

      Line 22: We --> we

      Line 40: “as average effect” -> “as the average effect”?

      Line 57: “context specific” --> “context-specific”

      Line 139: “re-parmaterization” --> “re-parameterization”

      Lines 140, 158, 412: “signal to noise” --> “signal-to-noise”

      Figure 3C,D: “pule rate” --> “pulse rate”

      The caption of Figure 3: “conutinous” --> “continuous”

      Line 227: “a variant may fall” --> “a variant may fall into”

      Line 295: “conferring to more GxE” --> “conferring more GxE” or “corresponding to more GxE”?

      This is very pedantic, but I think “bias-variance” should be “bias--variance” throughout, i.e., with an en-dash rather than a hyphen.

      We have corrected all of the above typos.

      Reviewer #2 (Recommendations For The Authors):

      (This section repeats some of what I wrote earlier).

      - First polygenic architecture part: the manuscript focuses on “top hits” in trying to identify sets of variants that are context-specific. This “top hits” approach seems somewhat esoteric and, as written, not connected tightly enough to the bias-variance tradeoff issue. The first section of the paper, which focuses on the bias-variance tradeoff, mostly deals with estimation. The “top hits” section deals with testing, which introduces additional issues that are due to thresholding. Perhaps the authors can think of ways to strengthen the connection between the bias-variance tradeoff part and the “top hits” part, e.g., by introducing testing earlier on and/or discussing estimation in addition to testing in the “top hits” part of the manuscript.

      - Second polygenic architecture part: polygenic scores that account for interaction terms. Here the authors focused (well, also here) on pervasive amplification in simulations. This part combines estimation and testing (both the choice of variants and their estimated effects are important). In pervasive amplification the idea is that causal variants are shared; the results may be different than in a model with context-specific effects, and variant selection may have a large impact. Still, I think that these simulations demonstrate the idea developed in the bias-variance tradeoff part of the paper, though the reader is left to wonder whether a bias-variance decision rule should or could be utilized for PGS construction.

      In both of these sections we discuss how the consideration of polygenic GxE patterns alters the conclusions based on the single-variant tradeoff. In the “top hits” section, we show that single-variant classification itself, based on a series of marginal hypothesis tests alone, can be misleading. The PGS prediction accuracy analysis shows that both approaches are beaten by the polygenic GxE estimation approach. Intuitively, this is because the consideration of polygenic GxE can mitigate both the bias and variance, as it leverages signals from many variants.

      We agree that the links between these sections of the paper were not sufficiently clear, and have added signposting to help clarify them (lines 176-180; lines 275-277; lines 316-321).

      - Simulation of GxDiet effects on longevity: the methods of the simulation are strange, or communicated unclearly. The authors (page 17) posit a joint distribution of genetic effects (line 439), but then they simulated the standard errors of effect estimates by sampling from summary statistics (line 445), rather than simulating data and then estimating effects and their SEs. Why posit a true underlying multivariate distribution if it isn’t used?

      We rewrote the Methods section “Simulation of GxDiet effects on longevity in Drosophila” to make our simulation approach clearer (lines 427-449). We are indeed simulating the true effects from the joint distribution proposed. However, in order to mimic the noisiness of the experiment in our simulations, we sample estimated effects from the true simulated effects, with estimation noise corresponding to that estimated in the Pallares et al. dataset (i.e., sampling estimation variances from the squares of empirical SEs).
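
      The two-stage procedure described in this response can be sketched as follows. This is only an illustration of the general idea, not the authors' code: the bivariate normal form, the covariance values, the number of variants, and the array `empirical_se` (a stand-in for standard errors that would come from a real dataset) are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_variants = 1000

# Stage 1: draw true effects in two diets from a joint distribution.
# The bivariate normal and its covariance are hypothetical placeholders.
cov = np.array([[1.00, 1.20],
                [1.20, 1.96]]) * 1e-4
true_effects = rng.multivariate_normal(np.zeros(2), cov, size=n_variants)

# Placeholder standing in for empirical standard errors from a dataset.
empirical_se = rng.uniform(0.005, 0.02, size=5000)

# Stage 2: sample one estimation variance per variant and context
# from the squares of the (placeholder) empirical SEs.
est_var = rng.choice(empirical_se, size=(n_variants, 2)) ** 2

# Estimated effects = true effects + noise with the sampled variances.
estimated_effects = true_effects + rng.normal(0.0, np.sqrt(est_var))

print(true_effects.var(axis=0), estimated_effects.var(axis=0))
```

      The true effects are used in this sketch: they set the mean of each estimated effect, while the sampled variances control how noisy each estimate is.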

      - How were the “most significantly associated variants” selected into the PGS in the polygenic prediction part? Based on a context-specific test? A combined-context test of effect size estimates?

      For the “Additive” and “Additive ascertainment, GxE estimation” models (red and orange in Fig. 5, respectively), we ascertain the combined-context set. For the “GxE” and “polygenic GxE” (green and blue in Fig. 5, respectively) models, we ascertain in a context-specific test. We now state this explicitly in lines 280-288 and lines 507-526.

      - As stated, I find the conclusion statement not specific enough in light of the rest of the manuscript. “the consideration of polygenic GxE trends is key” - this is very vague. What does it mean “to consider polygenic GxE trends” in the context of this paper? I can’t tell. “The notion that complex trait analyses should combine observations at top associated loci” - I don’t think the authors really refer to combining “observations”, rather perhaps combine information from top associated loci. But this does not represent the “top hits” approach that merely counts loci by their testing patterns. “It may be a similarly important missing piece...” What does “it” refer to? The top loci? What makes it an important missing piece?

      We rewrote the conclusion paragraph to address these concerns (lines 316-321).

    1. eLife Assessment

      This important study reports numerous attempts to replicate reports on transgenerational inheritance of a learned behavior – pathogen avoidance – in C. elegans. While the authors observe parental effects that are limited to a single generation (also called intergenerational inheritance), the authors failed to find evidence for transmission over multiple generations, or transgenerational inheritance. The experiments presented are meticulously described, making for compelling evidence that in the authors' hands transgenerational inheritance cannot be observed. There remains the possibility that different assay setups explain the failure to reproduce previous observations, although the authors present data suggesting that details of the assay are not that significant. There also remains the possibility that differences in culture conditions or lab environment explain the failure to reproduce previous observations, with updates to the paper having further reduced the probability that this applies here. Even if this were the case, it would imply that the original experimental paradigm was dependent on a very specific context. Given the prominence of the original reports of transgenerational inheritance, the present study is of broad interest to anyone studying genetics, epigenetics, or learned behavior.

      [As also pointed out by the authors of this study, the authors of the original reports have provided a response on bioRxiv (DOI: https://doi.org/10.1101/2025.01.21.634111).]

    2. Reviewer #1 (Public review):

      Summary:

      The authors report an inability to reproduce a transgenerational memory of avoidance of the pathogen PA14 in C. elegans. Instead, the authors demonstrate intergenerational inheritance for a single F1 generation, in embryos of mothers exposed to OP50 and PA14, where embryos isolated from these mothers by bleaching are capable of remembering to avoid PA14 in a manner that is dependent on systemic RNAi proteins sid-1 and sid-2. This could reflect systemic sRNAs generated by neuronal daf-7 signaling that are transmitted to F1 embryos. The authors note that transgenerational memory of PA14 was reported by the Murphy group at Princeton, but that environmental or strain variation (worms or bacteria) might explain the single generation of inheritance observed at Harvard. The Hunter group tried different bacterial growth conditions and different worm growth temperatures for independent PA14 strains, which they show to be strongly pathogenic. However, the authors could not reproduce a transgenerational effect at Harvard. This paper honestly alters expectations and indicates that the model that avoidance of PA14 is remembered for multiple generations is not robust enough to be replicated in all laboratories.

      Overall, this paper demonstrates that one model for transgenerational inheritance in C. elegans is not robust. The authors do demonstrate an avoidance memory for F1 embryos that could be a maternal effect, and the authors confirm that this is mediated by a systemic small RNA response. There are several points in the manuscript where a more positive tone might be helpful.

      Strengths:

      The authors note that the high copy number daf-7::GFP transgene used by the Murphy group displayed variable expression and evidence for somatic silencing or transgene breakdown in the Hunter lab, as confirmed by the Murphy group. The authors nicely use single copy daf-7::GFP to show that neuronal daf-7::GFP is elevated in F1 but not F2 progeny with regards to memory of PA14 avoidance, speaking to an intergenerational phenotype.

      The authors nicely confirm that sid-1 and sid-2 are generally required for intergenerational avoidance of F1 embryos of moms exposed to PA14. However, these small RNA proteins did not affect daf-7::GFP elevation in the F1 progeny. This result is unexpected given previous reports that daf-7::GFP is not elevated in F1 progeny of sid mutants.

      The authors studied antisense small RNAs that change in Murphy data sets, identifying 116 mRNAs that might be regulated by sRNAs in response to PA14. The authors show that the maco-1 gene, putatively targeted by piRNAs according to the Kaletsky 2020 paper, displays few siRNAs that change in response to PA14. The authors conclude that the P11 ncRNA of PA14, which was proposed to promote interkingdom RNA communication by the Murphy group, may not affect maco-1 expression in C. elegans, although they did not formally demonstrate this. The authors define 8 genes based on their analysis of sRNAs and mRNAs that might promote resistance to PA14, but they do not further characterize these genes' role in pathogen avoidance. Others might wish to consider following up on these genes and their possible relationship with P11.

      Weaknesses:

      This very thorough and interesting manuscript is at times pugnacious.

      Please explain more clearly what is High Growth media for E. coli in the text and methods, conveying why it was used by the Murphy lab, and if Normal Growth or High Growth is better for intergenerational heritability assays.

      Comments on revisions:

      The authors have done a reasonable job cordially revising this manuscript, and the authors have addressed most reviewer concerns. It is likely that the P11 gene was in some of the PA14 Pseudomonas strains tested, as one was kindly provided by the Murphy group.

    3. Reviewer #2 (Public review):

      This paper examines the reproducibility of results reported by the Murphy lab regarding transgenerational inheritance of a learned avoidance behavior in C. elegans. It has been well established by multiple labs that worms can learn to avoid the pathogen Pseudomonas aeruginosa (PA14) after a single exposure. The Murphy lab has reported that learned avoidance is transmittable to 4 generations and dependent on a small RNA expressed by PA14 that elicits the transgenerational silencing of a gene in C. elegans. The Hunter lab now reports that although they can reproduce inheritance of the learned behavior by the first generation (F1), they cannot reproduce inheritance in subsequent generations.

      This is an important study that will be useful for the community. Although they fail to identify a "smoking gun", the study examines several possible sources for the discrepancy, and their findings will be useful to others interested in using these assays. The preference assay appears to work in their hands inasmuch as they are able to detect the learned behavior in the P0 and F1 generations, suggesting that the failure to reproduce the transgenerational effect is not due to trivial mistakes in the protocol. The authors provide a full protocol and highlight key deviations from the Murphy lab protocol. The authors provide good evidence that no single protocol modification was sufficient on its own to explain the divergent results. It remains possible that protocol differences affected the assay cumulatively or that other uncontrolled factors were responsible. Nevertheless, the authors provide good evidence that the trans-generational effect reported by the Murphy lab lacks experimental robustness, calling into question its ecological relevance in the wild.

    4. Reviewer #3 (Public review):

      Summary:

      It has been previously reported in many high-profile papers that C. elegans can learn to avoid pathogens. Moreover, this learned pathogen avoidance can be passed on to future generations - up to the F5 generation in some reports. In this paper, Gainey et al. set out to replicate these findings. They successfully replicated pathogen avoidance in the exposed animals, as well as a strong increase in daf-7 expression in ASI neurons in F1 animals, as determined by a daf-7::GFP reporter construct. However, they failed to see strong evidence for pathogen avoidance or daf-7 overexpression in the F2 generation. The failure of replication is the major focus of this work.

      Given their failure to replicate these findings, the authors embark on a thorough test of various experimental confounders that may have impacted their results. They also re-analyze the small RNA sequencing and mRNA sequencing data from one of the previously published papers and draw some new conclusions, extending this analysis.

      Strengths:

      • The authors provide a thorough description of their methods, and a marked-up version of a published protocol that describes how they adapted the protocol to their lab conditions. It should be easy to replicate the experiments.

      • The authors test source of bacteria, growth temperature (of both C. elegans and bacteria), and light/dark husbandry conditions. They also supply all their raw data, so that sample size for each testing plate can be easily seen (in the supplementary data). None of these variations appears to have a measurable effect on pathogen avoidance in the F2 generation, with all but one of the experiments failing to exhibit learned pathogen avoidance.

      • The small RNA seq and mRNA seq analysis is well performed and extends the results shown in the original paper. The original paper did not give many details of the small RNA analysis, which was an oversight. Although not a major focus of this paper, it is a worthwhile extension on the previous work.

      • It is rare that negative results such as these are accessible. Although the authors were unable to determine the reason that their results differ from those previously published, it is important to document these attempts in detail, as has been done here. Behavioral assays are notoriously difficult to perform and public discourse around these attempts may give clarity to the difficulties faced by a controversial field.

      Weaknesses:

      • Although the "standard" conditions have been tested over multiple biological replicates, many of the potential confounders that may have altered the results have been tested only once or twice. For example, changing the incubation temperature to 25°C was tested in only two biological replicates (Exp 5.1 and 5.2) - and one of these experiments actually resulted in apparent pathogen avoidance inheritance in the F2 generation (but not in the F1). An alternative pathogen source was tested in only one biological replicate (Exp 3). Given the variability observed in the F2 generation, increasing biological replicates would have added to the strengths of the report.

      • A key difference between the methods used here and those published previously is an increase in the age of the animals used for training - from mostly L4 to mostly young adults. I was unable to find a clear example of an experiment in which these two conditions were compared, although the authors state that it made no difference to their results.

      • The original paper reports a transgenerational avoidance effect up to the F5 generation. Although in this work the authors failed to see avoidance in the F2 generation, it would have been prudent to extend their tests for more generations in at least a couple of their experiments to ensure that the F2 generation was not an aberration (although this reviewer acknowledges that this seems unlikely to be the case).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      […] Overall, this is an important paper that demonstrates that one model for transgenerational inheritance in C. elegans is not reproducible. This is important because it is not clear how many of the reported models of transgenerational inheritance reported in C. elegans are reproducible. The authors do demonstrate a memory for F1 embryos that could be a maternal effect, and the authors confirm that this is mediated by a systemic small RNA response. There are several points in the manuscript where a more positive tone might be helpful.

      We would like to correct the statement made in the second to last sentence. The demonstration of an F1 response to PA14 was first reported by Moore et al., (2019) and then by Pereira et al., (2020) using a different behavioral assay. We merely confirmed these results in our hands, and confirmed the observation, first reported by Kaletsky et al., (2020), that sid-1 and sid-2 are required for this F1 response; although we did find that sid-1 and sid-2 are not required for the PA14-induced increase in daf-7p::gfp expression in ASI neurons in the F1 progeny of trained adults, which had not been addressed in the published work.

      Yes, the intergenerational F1 response could be a maternal effect, but the in utero F1 embryos and their precursor germ cells were directly exposed to PA14 metabolites and toxins (non-maternal effect) as well as any parental response, whether mediated by small RNAs, prions, hormones, or other unknown information carriers. While the F1 aversion response does require sid-1 and sid-2, we would not presume that the substrate is therefore an RNA molecule, particularly because the systemic RNAi response supported by sid-1 and sid-2 is via long double-stranded RNA. To date, no evidence suggests that either protein transports small RNAs, particularly single-stranded RNAs.

      Strengths:

      The authors note that the high copy number daf-7::GFP transgene used by the Murphy group displayed variable expression and evidence for somatic silencing or transgene breakdown in the Hunter lab, as confirmed by the Murphy group. The authors nicely use single copy daf-7::GFP to show that neuronal daf-7::GFP is elevated in F1 but not F2 progeny with regards to the memory of PA14 avoidance, speaking to an intergenerational phenotype.

      The authors nicely confirm that sid-1 and sid-2 are generally required for intergenerational avoidance of F1 embryos of moms exposed to PA14. However, these small RNA proteins did not affect daf-7::GFP elevation in the F1 progeny. This result is unexpected given previous reports that single copy daf-7::GFP is not elevated in F1 progeny of sid mutants. Because the Murphy group reported that daf-7 mutation abolishes avoidance for F1 progeny, this means that the sid genes function downstream of daf-7 or in parallel, rather than upstream as previously suggested.

      The published report (Moore et al., 2019) shows only multicopy daf-7p::gfp results and does not address the daf-7p::gfp response in sid-1 or sid-2 mutants. Thus, our discovery that systemic RNAi, exogenous RNAi, and heritable RNAi mutants don’t disrupt elevated daf-7p::gfp in ASI neurons in the F1 progeny of PA14 trained P0’s is only unexpected with respect to the published models (Moore et al., 2019, Kaletsky et al., 2020).

      The authors studied antisense small RNAs that change in Murphy data sets, identifying 116 mRNAs that might be regulated by sRNAs in response to PA14. Importantly, the authors show that the maco-1 gene, putatively targeted by piRNAs according to the Kaletsky 2020 paper, displays few siRNAs that change in response to PA14. The authors conclude that the P11 ncRNA of PA14, which was proposed to promote interkingdom RNA communication by the Murphy group, is unlikely to affect maco-1 expression by generating sRNAs that target maco-1 in C. elegans. The authors define 8 genes based on their analysis of sRNAs and mRNAs that might promote resistance to PA14, but they do not further characterize these genes' role in pathogen avoidance. The Murphy group might wish to consider following up on these genes and their possible relationship with P11.

      Weaknesses:

      This very thorough and interesting manuscript is at times pugnacious.

      We reiterate that we never claimed that Moore et al., (2019) did not obtain their reported results. We simply stated that we could not replicate their results using the published methods and then failed in our search to identify variable(s) that might account for our results. In revising the manuscript, we have striven to make clear, unmuddied statements of facts and state that future investigations may provide independent evidence that supports the original claims and explains our divergent results.

      Please explain more clearly what is High Growth media for E. coli in the text and methods, conveying why it was used by the Murphy lab, and if Normal Growth or High Growth is better for intergenerational heritability assays.

      We added the standard recipes and the following explanations in the methods section to the revised text.

      “NG plates minimally support OP50 growth, resulting in a thin lawn that facilitates visualization of larvae and embryos. HG plates (8X more peptone) support much higher OP50 growth, resulting in a thick bacterial lawn that supports larger worm populations.”

      We have also included the following text in our presentation and discussion of the effects of growth conditions on worm choice in PA14 vs OP50 choice assays.

      “Furthermore, because OP50 pathogenicity is enhanced by increased E. coli nutritive conditions (Garsin et al., 2003, Shi et al., 2006), the growth of F1-F4 progeny on High Growth (HG) plates (Moore et al., 2019; 2021b), which contain 8X more peptone than NG plates and therefore support much higher OP50 growth levels, immediately prior to the F1-F4 choice assays may further contribute to OP50 aversion among the control animals.”

      We don’t know enough to claim that HG or NG media is better than the other for intergenerational assays, but they are different. Thus, switching between the two in a multigenerational experiment likely introduces unknown variability.

      Reviewer #2 (Public Review):

      This paper examines the reproducibility of results reported by the Murphy lab regarding transgenerational inheritance of a learned avoidance behavior in C. elegans. It has been well established by multiple labs that worms can learn to avoid the pathogen Pseudomonas aeruginosa (PA14) after a single exposure. The Murphy lab has reported that learned avoidance is transmittable to 4 generations and dependent on a small RNA expressed by PA14 that elicits the transgenerational silencing of a gene in C. elegans. The Hunter lab now reports that although they can reproduce inheritance of the learned behavior by the first generation (F1), they cannot reproduce inheritance in subsequent generations.

      This is an important study that will be useful for the community. Although they fail to identify a "smoking gun", the study examines several possible sources for the discrepancy, and their findings will be useful to others interested in using these assays. The preference assay appears to work in their hands inasmuch as they are able to detect the learned behavior in the P0 and F1 generations, suggesting that the failure to reproduce the transgenerational effect is not due to trivial mistakes in the protocol. An obvious reason, however, to account for the differing results is that the culture conditions used by the authors are not permissive for the expression of the small RNA by PA14 that the Murphy lab identified as required for transgenerational inheritance. It would seem prudent for the authors to determine whether this small RNA is present in their cultures, or at least acknowledge this possibility.

      We thank the reviewer for raising this issue and have added the following statement to this effect in the revised manuscript.

      “We note that previous bacterial RNA sequence analysis identified a small non-coding RNA called P11 whose expression correlates with bacterial growth conditions that induce heritable avoidance (Kaletsky et al., 2020). Critically, C. elegans trained on a PA14 ΔP11 strain (which lacks this small RNA) still learn to avoid PA14, but their F1 and F2-F4 progeny fail to show an intergenerational or transgenerational response (Figure 3L in Kaletsky et al., 2020). The fact that we observed an intergenerational (F1) avoidance response is evidence that our PA14 growth conditions induce P11 expression.”

      We believe that this addresses the concern raised here.

      The authors should also note that their protocol was significantly different from the Murphy protocol (see comments below) and therefore it remains possible that protocol differences cumulatively account for the different results.

      As suggested below, we have added to the supplemental documents the protocol we followed for the aversion assay. In our view, this document shows that our adjustments to the core protocol were minor. Furthermore, where possible, these adjustments were explicitly tested in side-by-side experiments for both the aversion assay and the daf-7p::gfp expression assay and presented in the manuscript.

      To discover the source(s) of discrepancy between our results and the published results we subsequently introduced variations to this core protocol to exclude likely variables (worm and bacteria growth temperatures, assay conditions, worm handling methods, bacterial culture and storage conditions, and some minor developmental timing issues). Again, where possible, the effect of variations was tested in side-by-side experiments for both the aversion assay and the daf-7p::gfp expression assay and were presented in or have now been added to the manuscript.

      It remains possible that we misunderstood the published Murphy lab protocols, but we were highly motivated to replicate the results so we could use these assays to investigate the reported RNAi-pathway dependent steps, thus we read every published version with extreme care.

      Reviewer #3 (Public Review):

      […] Strengths:

      (1) The authors provide a thorough description of their methods, and a marked-up version of a published protocol that describes how they adapted the protocol to their lab conditions. It should be easy to replicate the experiments.

      As noted above in response to a suggestion by reviewer #2, we have replaced the annotated published protocol with the protocol that we followed. This will aid other groups' attempts to replicate our experimental conditions.

      (2) The authors test the source of bacteria, growth temperature (of both C. elegans and bacteria), and light/dark husbandry conditions. They also supply all their raw data, so that the sample size for each testing plate can be easily seen (in the supplementary data). None of these variations appears to have a measurable effect on pathogen avoidance in the F2 generation, with all but one of the experiments failing to exhibit learned pathogen avoidance.

      We note that the parallel analysis of daf-7p::gfp expression in ASI neurons was also tested for several of these conditions and also failed to replicate the published findings.

      (3) The small RNA seq and mRNA seq analysis is well performed and extends the results shown in the original paper. The original paper did not give many details of the small RNA analysis, which was an oversight. Although not a major focus of this paper, it is a worthwhile extension of the previous work.

      (4) It is rare that negative results such as these are accessible. Although the authors were unable to determine the reason that their results differ from those previously published, it is important to document these attempts in detail, as has been done here. Behavioral assays are notoriously difficult to perform and public discourse around these attempts may give clarity to the difficulties faced by a controversial field.

      Thank you for your support. Choosing to pursue publication of these negative results was not an easy decision, and we thank members of the community for their support and encouragement.

      Weaknesses:

      (1) Although the "standard" conditions have been tested over multiple biological replicates, many of the potential confounders that may have altered the results have been tested only once or twice. For example, changing the incubation temperature to 25°C was tested in only two biological replicates (Exp 5.1 and 5.2) - and one of these experiments actually resulted in apparent pathogen avoidance inheritance in the F2 generation (but not in the F1). An alternative pathogen source was tested in only one biological replicate (Exp 3). Given the variability observed in the F2 generation, increasing biological replicates would have added to the strengths of the report.

      We agree that our study was not exhaustive in our exploration of variables that might be interfering with our ability to detect F2 avoidance. We also note that some of these variables also failed (with many more independent experiments) to induce elevated daf-7p::gfp expression in ASI neurons in F2 progeny. Our goal was not to show that variation in some growth or assay condition would generate reproducible negative results, but the exploration was designed to tweak conditions to enable detection of a robust F2 response. Given the strength of the data presented in Moore et al., (2019) we expected that adjustment of the problematic variable would produce positive results apparent in a single replicate, which could then be followed up. If we had succeeded, then we would have documented the conditions that enabled robust F2 inheritance and would have explored molecular mechanisms that support this important but mysterious process.

      (2) A key difference between the methods used here and those published previously, is an increase in the age of the animals used for training - from mostly L4 to mostly young adults. I was unable to find a clear example of an experiment when these two conditions were compared, although the authors state that it made no difference to their results.

      We can state firmly that the apparent time delay did not affect P0 learned avoidance (new Figure S1) or, as documented in Table S1, daf-7p::gfp expression in ASI neurons. In our experience, training mostly L4’s on PA14 frequently failed to produce sufficient F1 embryos for both F1 avoidance assays or daf-7p::gfp measurements in ASI neurons and collection of F2 progeny. Indeed, in early attempts to detect heritable PA14 aversion, trained P0 and F1 progeny were not assayed in order to obtain sufficient F2’s for a choice assay. These animals failed to display aversion, but without evidence of successful P0 training or an F1 intergenerational response this was deemed a non-fruitful trouble-shooting approach. We have added supplemental Figure S1 which presents P0 choice assay results from experiments using younger trained animals that failed to produce sufficient F1’s to continue the inheritance experiments.

      The different timing at the start of training between the two protocols may reflect the age of the recovered bleached P0 embryos. It is reasonable to assume that bleaching day 1 adults vs day 2 or 3 adults from the P-1 population could shift the average age of recovered P0 embryos by several hours. The Murphy protocol only states that P0 embryos were obtained by bleaching healthy adults. Regardless, if the hypothesis entertained here is true, that a several hour difference in larval/adult age during 24 hours of training affects F2 inheritance of learned aversion but does not affect P0 learned avoidance, then we would argue that this paradigm for heritable learned avoidance, as described in Moore et al., (2019, 2021), is not sufficiently robust for mechanistic investigations.

      (3) The original paper reports a transgenerational avoidance effect up to the F5 generation. Although in this work the authors failed to see avoidance in the F2 generation, it would have been prudent to extend their tests for more generations in at least a couple of their experiments to ensure that the F2 generation was not an aberration (although this reviewer acknowledges that this seems unlikely to be the case).

      We would point out that we also failed to robustly replicate the F2 response in the daf-7p::gfp expression assays. An F2-specific aberration that affects two different assays seems quite unlikely, and it remains unclear how we would interpret a positive result in F3 and F4 generations without a positive result in the F2 generation. Were we to further extend these investigations, we believe that exploration of additional culture conditions would warrant higher priority than extension of our results to the F3 and F4 generations.

      Reviewing Editor Comments:

      The reviewers' suggestions for improving the manuscript were mostly minor, to change the wording in some places and to add some more explanation regarding the methods.

      What should be highlighted in the section on OP50 growth conditions is that the initial preference for PA14 in the Murphy lab has also been observed by multiple other labs (Bargmann, Kim, Zhang, Aballay). The fact that this preference was not observed by the Hunter lab is one of several indicators of subtle differences in the environment that might add up to explain the differences in results.

      We agree that subtle known and unknown differences in OP50 and PA14 culture conditions can have measurable effects on the detection of PA14 attraction/aversion relative to OP50 attraction/aversion that could obscure or create the appearance of heritable effects between generations. We have added (see below) to the text a fuller description of the variability in the initial or naive preference observed in different laboratories using similar or variant 2-choice assays and culture conditions. It is worth emphasizing that direct comparison of the OP50 growth conditions specified in Moore et al., (2021) frequently revealed a much larger effect on the naïve choice index than is reported between labs (Figure 4).  

      “Naïve (OP50 grown) worms often show a bias towards PA14 in choice assays (Zhang et al., 2005; Ha et al., 2010; Moore et al., 2019; Pereira et al., 2020; Lalsiamthara and Aballay, 2022). This response, rather than representing an innate attraction to PA14, likely reflects the context of the worm's recent growth on OP50, a mild C. elegans pathogen (Garigan et al., 2002; Garsin et al., 2003; Shi et al., 2006). Thus, the naïve worms presented with a choice between a recently experienced mild pathogen (OP50) and a novel food choice (PA14) initially choose the novel food instead of the known mild pathogen (OP50 aversion).

      In line with our results, some other groups have also reported higher naïve choice index scores (Lee et al., 2017). This variability in naïve choice may reflect differences in growth conditions of either the OP50 or PA14 bacteria. In addition, we note that among the studies that show naïve worm attraction to Pseudomonas (OP50 aversion) there are extensive methodological differences from the methods in Moore et al., (2019; 2021b), including differences in bacterial growth temperature, incubation time, whether the bacteria are diluted or concentrated prior to placement on the choice plates, the concentration of peptone in the choice plates, the length of the choice assay, and the inclusion of sodium azide in the choice assays (Zhang et al., 2005; Ha et al., 2010; Moore et al., 2019; Pereira et al., 2020; Lalsiamthara and Aballay, 2022). Thus, the cause of the variability across published reports is not clear.”

      Overall, an emphasis on the absence of robustness of the reported results, rather than failure to reproduce them (which can always have many reasons), is appropriate.

      We agree that an emphasis on robustness is appropriate and have modified the text throughout the manuscript to shift the emphasis to absence of robustness. This includes a change to the manuscript title, which is now “Reported transgenerational responses to Pseudomonas aeruginosa in C. elegans are not robust”.

      A significant experimental addition would be some attempts to determine whether the bacterial PA14 pathogen in the authors' lab produces the P11 small RNA, which has been proposed to have a causal role in initiating the previously reported transgenerational inheritance.

      We acknowledge in the revised manuscript that a subsequent publication (Kaletsky et al., 2020) identified a correlation between PA14 training conditions that induced transgenerational memory and the expression of P11, a P. aeruginosa small non-coding RNA (see our response above to Reviewer #2’s similar query). While testing for the presence of P11 in Harvard culture conditions would be an important assay in any study whose purpose was to investigate the proposed P11-mediated mechanism underlying the transgenerational responses reported by the Murphy Lab, our goal was rather to replicate the robust transgenerational (F2) responses to PA14 training and then to investigate in more detail how sid-1 and sid-2 contribute to transgenerational epigenetic inheritance. Neither sid-1 nor sid-2 are predicted to transport small RNAs or single-stranded RNAs, thus testing for the presence of P11 is less relevant to our goals. Regardless, we note that Figure 3L in Kaletsky et al., (2020) showed that PA14 ΔP11 bacteria failed to induce an F1 avoidance response. Thus, the fact that we observed F1 avoidance implies that our culture conditions successfully induced P11 expression.

      Reviewer #1 (Recommendations For The Authors):

      The abstract could be more positive by concluding that 'We conclude that this example of transgenerational inheritance lacks robustness but instead reflects an example of small RNA-mediated intergenerational inheritance.'

      As recommended, we have added additional clarifying information to the abstract and moderated the conclusion sentence.

      “We did confirm that the dsRNA transport proteins SID-1 and SID-2 are required for the intergenerational (F1) inheritance of pathogen avoidance, but not for the F1 inheritance of elevated daf-7 expression. Furthermore, our reanalysis of RNA seq data provides additional evidence that this intergenerational inherited PA14 response may be mediated by small RNAs.”

      “We conclude that this example of transgenerational inheritance lacks robustness, confirm that the intergenerational avoidance response, but not the elevated daf-7p::gfp expression in F1 progeny, requires sid-1 and sid-2, and identify candidate siRNAs and target genes that may mediate this intergenerational response.”

      Differential expression of sRNAs or mRNAs might be better understood quantitatively by presenting data in scatterplots (Reed and Montgomery 2020) rather than in volcano plots.

      We agree and have modified Figure 6A and 6B.

      This statement in the main text might be unnecessary, as it affects the tenor of the conclusion of this significant manuscript. 'We note that none of the raw data for the published figures and unpublished replicate experiments . . . this hampered our ability to fully compare'.

      We have rewritten this paragraph to focus on our goal: to identify the source of the discrepancy between our results and the published results. We considered discarding this statement but ultimately decided that our inability to directly compare our data to that of previously published work is a shortcoming of our study that deserves to be acknowledged and explained.

      “Ideally, we would have compared our results with the published results (Moore et al., 2019), to possibly identify additional experimental parameters for further investigation; for example, a quantitative comparison of naïve choice in the P0 and F1 generations could help to determine the role of bacterial growth in the choice assay response. However, none of the raw data for the published figures and unpublished replicate experiments (Moore et al., 2019) were available on the publisher’s website or provided upon request to the corresponding author. In the absence of a quantitative comparison, it remains possible that an explanation for the discrepancies between our results and those of Moore et al., (2019) has been overlooked.”

      The final sentence of the Discussion could be tempered and more positive by stating 'Thus independent reproducibility is of paramount concern, and we have tried to be completely transparent as a model for how heritability research should be conducted within the C. elegans community'.

      Thank you. The suggested sentence nicely captures our intention. We now use it, almost verbatim, as our final sentence.

      “Thus, independent reproducibility is of paramount concern, and we have tried to be completely transparent as a model for how heritability research should be presented within the C. elegans community.”

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) Protocol: It is difficult to assess from the Methods the exact protocol used by the authors to assay food preference. The annotated Murphy protocol is not sufficient. The authors should provide their own protocol - a detailed lab-ready protocol where every step is outlined, and any steps that deviate from the Murphy lab protocol are called out.

      Thank you for this excellent suggestion. We now include a protocol that documents the precise steps, timings, and controls that we followed (S1_aversion_protocol). We also include footnotes to both explain the reasons behind particular steps and to document known differences to the published protocol. Given the thoroughness of this suggested approach, we have thus removed the annotated version of Moore et al., (2021) from the revised submission.

      (2) The authors imply in the methods that, unlike the Murphy lab, they did NOT use azide in the assay, and instead used 4°C to "freeze" the worms in place - It is not clear whether this method was used throughout all their assays and whether this could be a source of the difference. This change is NOT indicated in the annotated Murphy lab STAR Protocol they provide in the supplement.

      We apologize for the lack of clarity. Concerned that azide might be interfering with our ability to detect heritable silencing, we tested and then used cold-induced rigor to preserve worm choice in some choice assays. This was not a change to the core protocol, but a variation used in some assays to determine whether azide could reduce our ability to detect heritable behavioral responses to PA14 exposure. As Moore et al., (2021) show, too much azide can affect measurement of worm choice. Too little or ineffective azide can also affect measurement of worm choice. Azide also affects bacteria (both OP50 and PA14), which could affect the production of molecules that attract or repel worms, much like performing the assay in light vs dark conditions can influence the measured choice index.

      In our hands, cold-induced rigor worked well and within biological replicates was indistinguishable from azide (Figure S10). Thus, we include those results in our analysis and now indicate in Tables 2 and S2 and in Figures 1 and 3 which experiments used which method. As suggested, we now provide a detailed protocol that includes a note describing our precise method for cold-induced rigor.

      Also, the number of worms used in each assay needs to be specified (same or different from Murphy protocol?), and whether any worms were "censored" as in the Murphy protocol, and if so on what basis.

      While we published the exact number of worms scored in each assay (on each plate), it is unknown how this might compare to the results published in Moore et al., (2019), as the number of animals in the presented choice assays (either per plate or per choice) was not reported. Details on censoring, when to exclude data, and additional criteria to abandon an in-progress experiment are now provided in the protocol (S1_aversion_protocol).

      (3) Several instances in the text cite changes in the protocol as producing "no meaningful differences" without referring to a specific experiment that supports that statement (for example, line 399 regarding azide).

      We now include data and methods comparing azide and cold-induced rigor (Supplemental document S1_aversion_protocol, Supplemental Figure S10), and data showing the P0 choice index for 48-52 hour post-bleach L4/young adults (Supplemental Figure S1), in addition to the previously noted absence of effects due to differences in embryo bleaching protocols (Figures 2, 3 and Tables 1, 2, S1, and S2).

      (4) If the authors want to claim the irreproducibility of the Murphy lab results, they should use the exact protocol used by the Murphy lab in its entirety. It is not sufficient to show that individual changes do not affect the outcome, since the protocol they use appears to include SEVERAL changes which could cumulatively affect the results. If the authors do not want to do this, they should at least acknowledge and summarize in their discussion ALL their protocol changes.

      We acknowledge these minor differences between the protocols we followed and the published methods but disagree that they invalidate our results. We transparently present the effect of known minimal protocol changes. We also present analysis of possible invalidating variations (number of animals in a choice assay). We emphasize that in our hands both measures of TEI, the choice assay and measurement of daf-7p::gfp in ASI neurons, failed to replicate the published transgenerational results.

      If the protocol is sensitive to how animals are counted, whether bleached embryos are mixed gently or vigorously or a few hours difference in age at training, then in our view this TEI paradigm is not robust.

      See also our response to reviewer #3’s public reviews above.

      (5) The authors acknowledge that "non-obvious growth culture differences" could account for the different results. In this respect, the Murphy lab has proposed that the transgenerational effect requires a small RNA expressed in PA14. The authors should check that this RNA is expressed in the cultures they grow in their lab and use for their experiments. This could potentially identify where the two protocols diverge.

      The bacterial culture conditions and worm training procedures described in Moore et al., (2019) successfully produced trained P0 animals that transmitted a PA14 aversion response to their F1 progeny. In a subsequent publication (Kaletsky et al., 2020), the Murphy lab showed a correlation between the culture conditions that induce heritable avoidance and the expression of P11, a P. aeruginosa small non-coding RNA. As mentioned above in response to Reviewer #2’s public review and the Reviewing Editor’s comments to authors, the Murphy lab showed that PA14 ΔP11 bacteria fail to induce an F1 avoidance response (Figure 3L in Kaletsky et al., (2020)). Thus, the fact that we observed F1 avoidance implies that our culture conditions successfully induced P11 expression. We believe that this addresses the concern raised here. Furthermore, if P11 is not reliably expressed in pathogenic PA14, then the published model is unlikely to be relevant in a natural environment. Again, we thank the reviewer for raising this issue and have added this information to the revised manuscript (see above response to Reviewer #2’s Public Reviews).

      (6) Legend to Figure 1: please clarify which experiments were done with which PA14 isolates especially for A-C. What is the origin of the N2 strain used here?

      These details from Tables 2 and S2 have been added to Figure 1 panels A-C and Figure 3. Bristol N2, obtained from the CGC (reference 257), was used for aversion experiments.

      (7) Growth conditions: "These young adults produced comparable P0 and F1 results (Figure 1, Figure 2, and Figure 3)." It is not clear from the text what specific figure panels need to be compared to examine the effect of the variables described in the text. Please indicate which figure panels should be compared (lines 70-95).

      The information for the daf-7p::gfp expression experiments displayed in Figure 1 and Figure 2 is presented in Table 1 and Table S1. The data for P0 aversion training using younger animals is now presented in Figure S1.

      Reviewer #3 (Recommendations For The Authors):

      While overall I found this easy to follow and well-written, I think the clarity of the figures could be improved by incorporating some of the information from S2 into Figure 3. Besides the figure label listing the experiment (Exp1, Exp2, etc) it would be helpful to add pertinent information about the experiment. For example Exp 1.1 (light, 20°C), Exp1.2 (dark, 20°C), Exp 5 (25°C, light), etc.

      Thank you for the suggestion. These details from Tables 2 and S2 have been added to Figures 1 A-C, and 3.

      Citations

      • Moore, R.S., Kaletsky, R., and Murphy, C.T. (2019). Piwi/PRG-1 Argonaute and TGF-beta Mediate Transgenerational Learned Pathogenic Avoidance. Cell 177, 1827-1841 e1812.

      • Moore, R.S., Kaletsky, R., and Murphy, C.T. (2021). Protocol for transgenerational learned pathogen avoidance behavior assays in Caenorhabditis elegans. STAR Protoc 2, 100384.

      • Kaletsky, R., Moore, R.S., Vrla, G.D., Parsons, L.R., Gitai, Z., and Murphy, C.T. (2020). C. elegans interprets bacterial non-coding RNAs to learn pathogenic avoidance. Nature 586, 445-451.

      • Pereira, A.G., Gracida, X., Kagias, K., and Zhang, Y. (2020). C. elegans aversive olfactory learning generates diverse intergenerational effects. J Neurogenet 34, 378-388.

    1. eLife Assessment

      This work investigates ZC3H11A as a cause of high myopia through the analysis of human data and experiments with genetic knockout of Zc3h11a in mouse, providing a useful model of myopia. The evidence supporting the conclusion is still incomplete in the revised manuscript as the concerns raised in the previous review were not fully addressed. The article will benefit from further strengthening the genetic analysis, full presentation of human phenotypic data, and explaining the reasons why there was no increased axial length in mice with myopia. The work will be of interest to ophthalmologists and researchers working on myopia.

    2. Reviewer #2 (Public review):

      Summary:

      The authors reported that mutations were identified in the ZC3H11A gene in four adolescents from 1015 high myopia subjects in their myopia cohort. They further generated Zc3h11a knockout mice utilizing the CRISPR/Cas9 technology.

      Comments on revisions:

      Chong Chen and colleagues revised the manuscript; however, none of my suggestions from the initial review have been sufficiently addressed.

      (1) I indicated that the pathogenicity and novelty of the mutation need to be determined according to established guidelines and databases. However, the conclusion was still drawn without sufficient justification.

      (2) The phenotype of heterozygous mutant mice is too weak to support the gene's contribution to high myopia. The revised manuscript does not adequately address these discrepancies. Furthermore, no explanation was provided for why conditional gene deletion was not used to avoid embryonic lethality, nor was there any discussion on tissue- or cell-specific mechanistic investigations.

      (3) The title, abstract, and main text continue to misrepresent the role of the inflammatory intracellular PI3K-AKT and NF-κB signaling cascade in inducing high myopia. No specific cell types have been identified as contributors to the phenotype. The mice did not develop high myopia, and no relationship between intracellular signaling and myopia progression has been demonstrated in this study.

    3. Reviewer #3 (Public review):

      Chen et al have identified a new candidate gene for high myopia, ZC3H11A, and using a knock-out mouse model, have attempted to validate it as a myopia gene and explain a potential mechanism. They identified 4 heterozygous missense variants in highly myopic teenagers. These variants are in conserved regions of the protein, and predicted to be damaging, but the only evidence the authors provide that these specific variants affect protein function is a supplement figure showing decreased levels of IκBα after transfection with overexpression plasmids (not specified what type of cells were transfected). This does not prove that these mutations cause loss of function, in fact it implies they have a gain-of-function mechanism. They then created a knock-out mouse. Heterozygotes show myopia at all ages examined but increased axial length only at very early ages. Unfortunately, the authors do not address this point or examine corneal structure in these animals. They show that the mice have decreased B-wave amplitude on electroretinogram (a sign of retinal dysfunction associated with bipolar cells), and decreased expression of a bipolar cell marker, PKCα. On electron microscopy, there are morphologic differences in the outer nuclear layer (where bipolar, amacrine, and horizontal cell bodies reside). Transcriptome analysis identified over 700 differentially expressed genes. The authors chose to focus on the PI3K-AKT and NF-κB signaling pathways and show changes in expression of genes and proteins in those pathways, including PI3K, AKT, IκBα, NF-κB, TGF-β1, MMP-2 and IL-6, although there is very high variability between animals. They propose that myopia may develop in these animals either as a result of visual abnormality (decreased bipolar cell function in the retina) or by alteration of NF-κB signaling. These data provide an interesting new candidate variant for development of high myopia, and provide additional data that MMP2 and IL6 have a role in myopia development. 
      For this revision, none of my previous suggestions have been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Chen and colleagues investigated ZC3H11A as a potential cause of high myopia (HM) in humans through the analysis of exome sequencing in 1,015 adolescents and experiments involving Zc3h11a knock-out mice. The authors showed four possibly pathogenic missense variants in four adolescents with HM. After that, the authors presented the phenotypic features of Zc3h11a knock-out mice, the result of RNA-sequencing, and a comparison of mRNA and protein levels of the functional candidates between wild-type and Zc3h11a knock-out mice. Based on their observations, the authors concluded that ZC3H11A protein contributes to the early onset of myopia.

      The strengths of this manuscript include: (1) successful identification of characteristic ophthalmic phenotypes in Zc3h11a knock-out mice, (2) demonstration of biological features related to myopia, such as PI3K-AKT and NF-κB pathways, and (3) inclusion of supporting human genetic data in individuals with HM. On the other hand, the weaknesses of this paper appear to be: (1) the lack of robust evidence from their genomic analysis, and (2) insufficient evidence to support phenotypic similarity between humans with ZC3H11A mutations and Zc3h11a knock-out mice. Given that the biological mechanisms of high myopia are not fully understood, the identification of a novel gene is valuable. As described in the manuscript, it is worth noting that a previous study using a myopic mouse model has implicated ZC3H11A in the etiology of myopia (Fan et al. PLoS Genet 2012).

      Thank you very much for your valuable suggestions.

      Specific comments:

      (1) I am concerned about the certainty of similarity in phenotypes between individuals with ZC3H11A mutation and Zc3h11a knock-out mice. A crucial point would be that there are no statistical differences in axial lengths (ALs) between wild-type and Zc3h11a knock-out mice at 8W and 10W, even though ALs in the individuals with ZC3H11A mutation were long. I would also like to note that the phenotypic information of these individuals is not available in the manuscript, although the authors indicated the suppressed b-wave amplitude in Zc3h11a knock-out mice. Considering that the authors described that "Detailed ophthalmic examinations were performed (lines: 321-323)", the detailed clinical features of these individuals should be included in the manuscript.

      Thank you for your valuable comments. The axial length of Zc3h11a Het-KO mice was found to be significantly greater than that of WT littermates at weeks 4 and 6 (independent samples t-test, p<0.05; Figure 2A and B). Although no significant differences were observed at other time points, there was still some degree of increase in these parameters. We continued to measure corneal curvature and found no significant differences between the two groups. Therefore, the difference in refraction may be due to the small size of the mouse eye. A 1 D change in refraction corresponds to only a 5-6 μm change in AL(1). However, the SD-OCT resolution used in this study is relatively low (theoretical resolution of 6 μm)(2, 3), so the small changes measured in vitreous cavity depth and AL may not be statistically significant. Additionally, some studies have shown that axial lengths reported in frozen sections are longer than those measured in vivo for age-matched mice(1, 4). Another possible explanation is that the curvature and refractive power of the lens have changed. These hypotheses provide a reasonable explanation for the mismatch between changes in refraction and ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & contact lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision research 44, 2445-2456 (2004).
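
      The resolution argument above can be made concrete with a few lines of arithmetic. The following is only a minimal sketch: it assumes the 5-6 μm per diopter figure and the 6 μm theoretical SD-OCT resolution quoted in the response, and it assumes that axial length scales linearly with refraction. The function names and the example refractive shifts are hypothetical and are not taken from the study.

```python
# Illustrative arithmetic only. The constants are the values quoted in the
# response; the function names and example shifts are hypothetical.

UM_PER_DIOPTER_RANGE = (5.0, 6.0)   # axial length change per 1 D of refraction
OCT_AXIAL_RESOLUTION_UM = 6.0       # theoretical SD-OCT resolution quoted above


def expected_al_change_um(refraction_shift_d, um_per_diopter):
    """Axial length change (um) implied by a refractive shift (D),
    assuming a simple linear relationship."""
    return abs(refraction_shift_d) * um_per_diopter


def exceeds_resolution(change_um, resolution_um=OCT_AXIAL_RESOLUTION_UM):
    """True if the implied change is strictly larger than the resolution."""
    return change_um > resolution_um


if __name__ == "__main__":
    for shift_d in (1.0, 2.0, 4.0):  # hypothetical refractive shifts
        low, high = (expected_al_change_um(shift_d, k)
                     for k in UM_PER_DIOPTER_RANGE)
        print(f"{shift_d:.0f} D -> {low:.0f}-{high:.0f} um, "
              f"above resolution: {exceeds_resolution(low)}")
```

      Under these assumptions, a 1 D shift implies a change no larger than the instrument resolution, whereas shifts of several diopters would imply changes exceeding it.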

      Additionally, regarding the “detailed ophthalmic examinations”: our patients were selected from a myopia screening cohort of over one million (children and adolescents myopia survey [CAMS] program), and the ophthalmic examination included only semi-annual refractive error measurements (five in total, with refractive error taken as the average of the three maximum values) and a single axial length measurement. The inappropriate description of “Detailed clinical features” has been removed.

      (2) The term "pathogenic variant" should be used cautiously. Please clarify the pathogenicity of the reported variants in accordance with the ACMG guideline.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified in the 1015 HM patients aged from 15 to 18 years. All of the identified mutations had very low frequencies in, or were absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most of them were predicted to be highly pathogenic by the prediction software SIFT, PolyPhen2, and CADD. Among them, c.412G>A, c.128G>A and c.461C>T were located in or around a domain named zf-CCCH_3 (Figure 1A and B). Furthermore, all of the mutation sites were located in highly conserved amino acids across different species (Figure 1C). The four mutations resulted in a higher degree of conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). Meanwhile, transfection with mutant overexpression plasmids showed that, compared to the wild type, the mRNA expression levels of IκBα in the nucleus of all four mutant types (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) were significantly reduced (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      (3) The genetic analysis does not fully support the claim that ZC3H11A is causative for HM. While the authors showed the rare allele frequencies and high CADD scores (> 20) of the identified variants, these were insufficient to establish causality. A helpful way to assess the causality would be performing a segregation analysis. An alternative approach is to show significant association by performing a gene-level association test. Assessing the pathogenicity of the variants using various prediction software, such as SIFT, PolyPhen2, and REVEL may also provide additional supportive evidence.

      Thank you for your valuable comments. We have added an assessment of the pathogenicity of the variants using various prediction software, such as SIFT, PolyPhen2, and CADD, and population variation databases, such as the Genome Aggregation Database (gnomAD_AF) and ClinVar. Meanwhile, transfection with mutant overexpression plasmids showed that, compared to the wild type, the mRNA expression levels of IκBα in the nucleus of all four mutant types (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) were significantly reduced (Supplement Figure 3).

      (4) As shown in Figure 2, significant differences in refraction were observed from 4 weeks to 10 weeks. Nevertheless, no differences were observed in AL, anterior/vitreous chamber depth, and lens depth. The author should experimentally clarify what factors contribute to the observed difference in refraction.

      Thank you for your valuable comments. The existing data show significant differences in refraction between 4 and 10 weeks, with the AL and vitreous cavity depth of Het mice being longer than those of WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, there was still some degree of increase in these parameters. We continued to measure corneal curvature and found no significant differences between the two groups. Therefore, the difference in refraction may be due to the small size of the mouse eye. A 1 D change in refraction corresponds to only a 5-6 μm change in AL(1). However, the SD-OCT resolution used in this study is relatively low (theoretical resolution of 6 μm)(2, 3), so the small changes measured in vitreous cavity depth and AL may not be statistically significant. Additionally, some studies have shown that axial lengths reported in frozen sections are longer than those measured in vivo for age-matched mice(1, 4). Another possible explanation is that the curvature and refractive power of the lens have changed. These hypotheses provide a reasonable explanation for the mismatch between changes in refraction and ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & contact lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision research 44, 2445-2456 (2004).

      (5) The gene names should be italicized throughout the manuscript.

      Thank you for your valuable comments. The gene names have been italicized throughout the manuscript.

      (6) Table 1: providing chromosomal positions and rs numbers (if available) would be helpful for readers.

      Thank you for your valuable comments. We have provided the chromosome positions and rs number (if available) of each mutation in Table 1.

      (7) Figure 5b, c, and d: the results of pathway analysis and GO enrichment analysis are difficult to interpret due to the small font size. It would be preferable to present these results in tables. Moreover, the authors should set a significant threshold in the enrichment analyses.

      Thank you for your valuable comments. We have adjusted the font size of the image. In the retina transcriptome analysis, we have set Fold change (FC) of at least two and a P value < 0.05 as thresholds to analyze differentially expressed genes (DEGs). The GO terms and KEGG pathways enrichment analysis selected the top 20 with the most significant differences or the highest number of enriched genes for display.
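
      The thresholds described above can be expressed as a short filter. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names are hypothetical, fold change is assumed to be a Het/WT expression ratio, down-regulation is assumed to count symmetrically (a ratio of 0.5 is treated like a ratio of 2), and enrichment terms are ranked by P value only, whereas the response also mentions ranking by the number of enriched genes.

```python
import math

MIN_FOLD_CHANGE = 2.0   # "fold change of at least two"
MAX_P_VALUE = 0.05      # "P value < 0.05"


def is_deg(fold_change, p_value, min_fc=MIN_FOLD_CHANGE, max_p=MAX_P_VALUE):
    """Return True if a gene passes both thresholds.

    fold_change is an expression ratio (assumed Het/WT) and must be positive.
    Up- and down-regulation are treated symmetrically via |log2(ratio)|,
    which is an assumption of this sketch.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    return abs(math.log2(fold_change)) >= math.log2(min_fc) and p_value < max_p


def top_terms(terms, n=20):
    """Return the n most significant (name, p_value) pairs, smallest P first."""
    return sorted(terms, key=lambda term: term[1])[:n]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    genes = [("gene_a", 2.5, 0.01), ("gene_b", 1.4, 0.001), ("gene_c", 0.4, 0.03)]
    print([name for name, fc, p in genes if is_deg(fc, p)])
```

      Stating the cutoffs as named constants makes it explicit that a gene must satisfy both conditions, and that a P value exactly equal to 0.05 does not pass.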

      Reviewer #2 (Public Review):

      Summary: Chong Chen and colleagues reported that mutations were identified in the ZC3H11A gene in four adolescents from 1015 high myopia subjects in their myopia cohort. They further generated Zc3h11a knockout mice utilizing the CRISPR/Cas9 technology. They analyzed the heterozygous knockout mice compared to control littermates and found refractive error changes, electrophysiological differences, and retinal inflammation-related gene expression differences. They concluded that ZC3H11A may play a role in the early onset of myopia by regulating inflammatory responses.

      Strengths:

      Data were shown from both clinical cohort and animal models.

      Weaknesses:

      Their findings are interesting and important; however, several points need to be resolved to support the current conclusion.

      (1) They described the ZC3H11A gene as a pathogenic variant for high myopia. It should be classified as pathogenic according to the guidelines of the American College of Medical Genetics and Genomics (Richards et al., Genet Med 17(5):405-24, 2015). The modes of inheritance for the families need to be shown. They also described identifying the gene as a "new" candidate. It should be checked in databases such as gnomAD and ClinVar, and any previous publications and be declared as a novel variant.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified in the 1015 HM patients aged from 15 to 18 years. All of the identified mutations had very low frequencies in, or were absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most of them were predicted to be highly pathogenic by the prediction software SIFT, PolyPhen2, and CADD. Among them, c.412G>A, c.128G>A and c.461C>T were located in or around a domain named zf-CCCH_3 (Figure 1A and B). Furthermore, all of the mutation sites were located in highly conserved amino acids across different species (Figure 1C). The four mutations resulted in a higher degree of conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). Meanwhile, transfection with mutant overexpression plasmids showed that, compared to the wild type, the mRNA expression levels of IκBα in the nucleus of all four mutant types (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) were significantly reduced (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      Unfortunately, our patients are part of the MAGIC project (aged 15 years or older), a cohort consisting of thousands of individuals with HM (patients from the children and adolescents myopia survey [CAMS] program) who have undergone WES, and their parents' relevant information was not collected, so a segregation analysis could not be performed.

      (2) The phenotypes of the heterozygote mice are weak overall. The het mice showed mild to moderate myopic refractive shifts from 4 to 10 weeks of age. However, this cannot be explained by other ocular biometrics such as anterior chamber depth or lens thickness. Some differences are found between het and WT littermates in axial length and vitreous chamber depth but disappear after 8 weeks old. Furthermore, the early differences are not enough to explain the refractive error changes. They mentioned that they did not use homozygotes because of the embryonic lethality. I would strongly suggest employing conditional knockout systems to analyze homozygotes. This will also be able to identify the causative tissues/cells because they assume bipolar cells are functional. The cells in the retinal pigment epithelium and choroid are also important to contribute to myopia development.

      Thank you for your valuable comments. The existing data show significant differences in refraction between 4 and 10 weeks, with the AL and vitreous cavity depth of Het mice being longer than those of WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, there was still some degree of increase in these parameters. We continued to measure corneal curvature and found no significant differences between the two groups. Therefore, the difference in refraction may be due to the small size of the mouse eye. A 1 D change in refraction corresponds to only a 5-6 μm change in AL(1). However, the SD-OCT resolution used in this study is relatively low (theoretical resolution of 6 μm)(2, 3), so the small changes measured in vitreous cavity depth and AL may not be statistically significant. Additionally, some studies have shown that axial lengths reported in frozen sections are longer than those measured in vivo for age-matched mice(1, 4). Another possible explanation is that the curvature and refractive power of the lens have changed. These hypotheses provide a reasonable explanation for the mismatch between changes in refraction and ocular length parameters.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & contact lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision research 44, 2445-2456 (2004).

      A limitation is that we did not conduct relevant research on homozygous knockout mice. The first reason is that our patients carry heterozygous mutations (heterozygous knockout mice can better simulate human phenotypes). The second reason is that homozygous knockout is lethal, and we did not use a conditional knockout mouse model for further research. In addition, we limited our investigation of myopia to the recognized, classical retina-sclera pathway and did not study other tissues such as the retinal pigment epithelium and choroid.

      (3) Their hypothesis regarding inflammatory gene changes and myopic development is not logical. Are the inflammatory responses evoked from bipolar cells? Did the mice show an accumulation of inflammatory cells in the inner retina? Visible retinal inflammation is not generally seen in either early-onset or high-myopia human subjects. Can this be seen in the actual subjects in the cohort? To me, this is difficult to adapt the retina-to-sclera signaling they mentioned in the discussion so far. Egr-1 may be examined as described.

      Thank you for your valuable comments. We have removed the hypothesis regarding inflammatory gene changes and myopic development. At present, the explanation is based solely on the correlation of signaling pathways; the theoretical basis comes from the following reference:

      “Lin et al., Role of Chronic Inflammation in Myopia Progression: Clinical Evidence and Experimental Validation. EBioMedicine, 2016 Aug:10:269-81, Figure 7.”

      Reviewer #3 (Public Review):

      Chen et al have identified a new candidate gene for high myopia, ZC3H11A, and using a knock-out mouse model, have attempted to validate it as a myopia gene and explain a potential mechanism. They identified 4 heterozygous missense variants in highly myopic teenagers. These variants are in conserved regions of the protein, but the authors provide no evidence that these specific variants affect protein function. They then created a knock-out mouse. Heterozygotes show myopia at all ages examined but increased axial length only at very early ages. Unfortunately, the authors do not address this point or examine corneal structure in these animals. They show that the mice have decreased B-wave amplitude on electroretinogram (a sign of retinal dysfunction associated with bipolar cells), and decreased expression of a bipolar cell marker, PKCα. They do not address, however, whether there are fewer bipolar cells, or simply decreased expression of the marker protein. On electron microscopy, there are morphologic differences in the outer nuclear layer (where bipolar, amacrine, and horizontal cell bodies reside). Transcriptome analysis identified over 700 differentially expressed genes. The authors chose to focus on the PI3K-AKT and NF-κB signaling pathways and show changes in the expression of genes and proteins in those pathways, including PI3K, AKT, IκBα, NF-κB, TGF-β1, MMP-2, and IL-6, although there is very high variability between animals. They propose that myopia may develop in these animals either as a result of visual abnormality (decreased bipolar cell function in the retina) or by alteration of NF-κB signaling. These data provide an interesting new candidate variant for the development of high myopia, and provide additional data that MMP2 and IL6 have a role in myopia development, but do not support the claim of the title that myopia is caused by an inflammatory reaction.

      Thank you for your valuable comments. Four missense mutations in the ZC3H11A gene (c.412G>A, p.V138I; c.128G>A, p.G43E; c.461C>T, p.P154L; and c.2239T>A, p.S747T) were identified in the 1015 HM patients aged from 15 to 18 years. All of the identified mutations had very low frequencies in, or were absent from, the Genome Aggregation Database (gnomAD) and ClinVar, and most of them were predicted to be highly pathogenic by the prediction software SIFT, PolyPhen2, and CADD. Among them, c.412G>A, c.128G>A and c.461C>T were located in or around a domain named zf-CCCH_3 (Figure 1A and B). Furthermore, all of the mutation sites were located in highly conserved amino acids across different species (Figure 1C). The four mutations resulted in a higher degree of conformational flexibility and altered the negative charge at the corresponding sites (Figure 1D and E). Meanwhile, transfection with mutant overexpression plasmids showed that, compared to the wild type, the mRNA expression levels of IκBα in the nucleus of all four mutant types (ZC3H11A<sup>V138I</sup>, ZC3H11A<sup>G43E</sup>, ZC3H11A<sup>P154L</sup> and ZC3H11A<sup>S747T</sup>) were significantly reduced (Supplement Figure 3). According to the ACMG guidelines, the above mutations can be classified as “Pathogenic Moderate”.

      The existing data show significant differences in refraction between 4 and 10 weeks, with the AL and vitreous cavity depth of Het mice being longer than those of WT mice at 4 and 6 weeks. Although no significant differences were observed at other time points, there was still some degree of increase in these parameters. We continued to measure corneal curvature and found no significant differences between the two groups. Therefore, the difference in refraction may be due to the small size of the mouse eye. A 1 D change in refraction corresponds to only a 5-6 μm change in AL(1). However, the SD-OCT resolution used in this study is relatively low (theoretical resolution of 6 μm)(2, 3), so the small changes measured in vitreous cavity depth and AL may not be statistically significant. Additionally, some studies have shown that axial lengths reported in frozen sections are longer than those measured in vivo for age-matched mice(1, 4). Another possible explanation is that the curvature and refractive power of the lens have changed. These hypotheses provide a reasonable explanation for the mismatch between changes in refraction and ocular length parameters.

      To evaluate the change in the number of a specific type of retinal cell, the most commonly used experimental method involves staining with antibodies specific to the target cell type, followed by fluorescence microscopy. The fluorescence intensity or the number of cells can be analyzed semi-quantitatively to assess the changes in the specific cell type in the retina. For example, in retinal degenerative models, rhodopsin-specific staining is used to identify the loss of rod cells. In our study, we selected PKC-α as a marker protein for bipolar cells to assess their number. Additionally, transmission electron microscopy (TEM) was used to observe damage to the cell morphology in the inner nuclear layer (INL) of Het mice, where bipolar cell bodies are located. Based on both sets of data, we conclude that bipolar cells have indeed undergone structural damage and a reduction in number.

      Reference

      (1) Schmucker C, Schaeffel F. A paraxial schematic eye model for the growing C57BL/6 mouse. Vision research 44, 1857-1867 (2004).

      (2) Yuan Y, Chen F, Shen M, Lu F, Wang J. Repeated measurements of the anterior segment during accommodation using long scan depth optical coherence tomography. Eye & contact lens 38, 102-108 (2012).

      (3) Shen M, et al. SD-OCT with prolonged scan depth for imaging the anterior segment of the eye. Ophthalmic Surgery, Lasers and Imaging Retina 41, S65-S69 (2010).

      (4) Schmucker C, Schaeffel F. In vivo biometry in the mouse eye with low coherence interferometry. Vision research 44, 2445-2456 (2004).

      We have removed the hypothesis regarding inflammatory gene changes and myopic development. At present, the explanation is based solely on the correlation of signaling pathways; the theoretical basis comes from the following reference:

      “Lin et al., Role of Chronic Inflammation in Myopia Progression: Clinical Evidence and Experimental Validation. EBioMedicine, 2016 Aug:10:269-81, Figure 7.”

    1. eLife Assessment

      This study follows up on Arimura et al's powerful new method MagIC-Cryo-EM for imaging native complexes at high resolution. Using a clever design embedding protein spacers between the antibody and the nucleosomes purified, thereby minimizing interference from the beads, the authors concentrate linker histone variant H1.8 containing nucleosomes. From these samples, the authors obtain convincing atomic structures of the H1.8 bound chromatosome purified from interphase and metaphase cells, finding a NPM2 chaperone bound form exists as well. Caveats previously noted have been addressed nicely in the revision, strengthening the overall conclusions. This is an important new tool in the arsenal of single molecule biologists, permitting a deep dive into structure of native complexes, and will be of high interest to a broad swathe of scientists studying native macromolecules present at low concentrations in cells.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Arimura et al describe MagIC-Cryo-EM, an innovative method for immune-selective concentrating of native molecules and macromolecular complexes for Cryo-EM imaging and single-particle analysis. Typically, Cryo-EM imaging requires much larger concentrations of biomolecules than those that are feasible to achieve by conventional biochemical fractionation. This manuscript is meticulously and clearly written and the new technique is likely to become a great asset to other electron microscopists and chromatin researchers.

      Strengths:

      Previously, Arimura et al. (Mol. Cell 2021) isolated from Xenopus extract and resolved by Cryo-EM a sub-class of native nucleosomes conjugated containing histone H1.8 at the on-dyad position, similar to that previously observed by other researchers with reconstituted nucleosomes. Here they sought to analyze immuno-selected nucleosomes aiming to observe specific modes of H1.8 positioning (e.g. on-dyad and off-dyad) and potentially reveal structural motifs responsible for the decreased affinity of H1.8 for the interphase chromatin compared to metaphase chromosomes. The main strength of this work is a clever and novel methodological design, in particular the engineered protein spacers to separate captured nucleosomes from streptavidin beads for clear imaging. The authors provide a detailed step-by-step description of MagIC-Cryo-EM procedure including nucleosome isolation, preparation of GFP nanobody attached magnetic beads, optimization of the spacer length, concentration of the nucleosomes on graphene grids, data collection and analysis, including their new DuSTER method to filter-out low signal particles. This tour de force methodology should facilitate the consideration of MagIC-Cryo-EM by other electron microscopists, especially for analysis of native nucleosome complexes.

      In pursuit of biologically important new structures, the immune-selected H1.8-containing nucleosomes were solved at about 4 Å resolution; their structure appears to be very similar to the previously determined structure of H1.8-reconstituted nucleosomes. There were no apparent differences between the metaphase and interphase complexes suggesting that the on-dyad and off-dyad positioning does not explain the differences in H1.8 - nucleosome binding. However, they were able to identify and solve complexes of H1.8-GFP with histone chaperone NPM2 in a closed and open conformation providing mechanistic insights for H1-NPM2 binding and the reduced affinity of H1.8 to interphase chromatin as compared to metaphase chromosomes.

      MagIC technique still has certain limitations resulting from formaldehyde fixation, use of bacterial-expressed recombinant H1.8-GFP, and potential effects of magnetic beads and/or spacer on protein structure, which are explicitly discussed in the text. Notwithstanding these limitations, MagIC-Cryo-EM is expected to become a great asset to other electron microscopists and biochemists studying native macromolecular complexes.

      Comments on revisions:

      In the revision, Arimura et al. have constructively addressed the reviewer's concerns, by discussing possible limitations and including additional information on proteomic analysis and H1.8-NPM2 structures.

      The revised manuscript and rebuttal letter strengthen my initial opinion that this paper describes an innovative method for immune-selective concentrating of native molecules and macromolecular complexes thus enabling Cryo-EM imaging and structural analysis of native nucleosome complexes at low concentration. This manuscript is meticulously and clearly written and may become a great asset to other electron microscopists and chromatin researchers.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a straightforward and convincing demonstration of a reagent and workflow that they collectively term "MagIC-cryo-EM", in which magnetic nanobeads combined with affinity linkers are used to specifically immobilize and locally concentrate complexes that contain a protein-of-interest. As a proof of concept, they localize, image, and reconstruct H1.8-bound nucleosomes reconstructed from frog egg extracts. The authors additionally devised an image-processing workflow termed "DuSTER", which increases the true positive detections of the partially ordered NPM2 complex. The analysis of the NPM2 complex ± H1.8 was challenging because only ~60 kDa of protein mass was ordered. Overall, single-particle cryo-EM practitioners should find this study useful.

      Strengths:

      The rationale is very logical and the data are convincing.

      Weaknesses:

      I have seen an earlier version of this study at a conference. The conference presentation was much easier to follow than the current manuscript. It is as if this manuscript had undergone review at another journal and includes additional experiments to satisfy previous reviewers. Specifically, the NPM2 results don't seem to add much to the main story (MagIC-cryo-EM) and read more like an addendum. The authors could probably publish the NPM2 results separately, which would make the core MagIC results (sans DuSTER) easier to read.

      Comments on revisions:

      The authors have addressed my concerns. Congratulations!

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Arimura et al. describe MagIC-Cryo-EM, an innovative method for immune-selective concentration of native molecules and macromolecular complexes for Cryo-EM imaging and single-particle analysis. Typically, Cryo-EM imaging requires much higher concentrations of biomolecules than are feasible to achieve by conventional biochemical fractionation. Overall, this manuscript is meticulously and clearly written and may become a great asset to other electron microscopists and chromatin researchers.

      Strengths:

      Previously, Arimura et al. (Mol. Cell 2021) isolated from Xenopus extract and resolved by Cryo-EM a sub-class of native nucleosomes containing histone H1.8 at the on-dyad position, similar to that previously observed by other researchers with reconstituted nucleosomes. Here they sought to analyze immuno-selected nucleosomes, aiming to observe specific modes of H1.8 positioning (e.g. on-dyad and off-dyad) and potentially reveal structural motifs responsible for the decreased affinity of H1.8 for interphase chromatin compared to metaphase chromosomes. The main strength of this work is a clever and novel methodological design, in particular the engineered protein spacers that separate captured nucleosomes from streptavidin beads for clear imaging. The authors provide a detailed step-by-step description of the MagIC-Cryo-EM procedure, including nucleosome isolation, preparation of GFP nanobody-attached magnetic beads, optimization of the spacer length, concentration of the nucleosomes on graphene grids, and data collection and analysis, including their new DuSTER method to filter out low-signal particles. This tour de force methodology should facilitate the adoption of MagIC-Cryo-EM by other electron microscopists, especially for analysis of native nucleosome complexes.

      In pursuit of biologically important new structures, the immune-selected H1.8-containing nucleosomes were solved at about 4 Å resolution; their structure appears to be very similar to the previously determined structure of H1.8-reconstituted nucleosomes. There were no apparent differences between the metaphase and interphase complexes, suggesting that the on-dyad and off-dyad positioning does not explain the differences in H1.8-nucleosome binding. However, they were able to identify and solve complexes of H1.8-GFP with histone chaperone NPM2 in a closed and open conformation, providing mechanistic insights for H1-NPM2 binding and the reduced affinity of H1.8 to interphase chromatin as compared to metaphase chromosomes.

      Weaknesses:

      Still, I feel that there are certain limitations and potential artifacts resulting from formaldehyde fixation, use of bacterial-expressed recombinant H1.8-GFP, and potential effects of magnetic beads and/or spacer on protein structure, that should be more explicitly discussed. 

      We thank the reviewer for recognizing the significance of our methods and for constructive comments. To respond to the reviewer's criticism, we revised the “Limitation of the study” section (page 12, line 420) as indicated by the underlines below.

      “While MagIC-cryo-EM is envisioned as a versatile approach suitable for various biomolecules from diverse sources, including cultured cells and tissues, it has thus far been tested only with H1.8-bound nucleosome and H1.8-bound NPM2, both using anti-GFP nanobodies to isolate GFP-tagged H1.8 from chromosomes assembled in Xenopus egg extracts after pre-fractionation of chromatin. To apply MagIC-cryo-EM for the other targets, the following factors must be considered: 1) Pre-fractionation. This step (e.g., density gradient or gel filtration) may be necessary to enrich the target protein in a specific complex from other diverse forms (such as monomeric forms, subcomplexes, and protein aggregates). 2) Avoiding bead aggregation. Beads may be clustered by targets (if the target complex contains multiple affinity tags or is aggregated), nonspecific binders, and the target capture modules. To directly apply antibodies that recognize the native targets and specific modifications, optimization to avoid bead aggregation will be important. 3) Stabilizing complexes. The target complexes must be stable during the sample preparation. Crosslink was necessary for the H1.8-GFP-bound nucleosome. 4) Loading the optimum number of targets on the bead. The optimal number of particles per bead differs depending on target sizes, as larger targets are more likely to overlap. For H1.8-GFP-bound nucleosomes, 500 to 2,000 particles per bead were optimal. We expect that fewer particles should be coated for larger targets.”

      We would like to note that while the use of bacterially expressed GFP-tagged H1.8 and MagIC-cryo-EM may potentially influence the structure of the H1.8-bound nucleosome, the structures of GFP-tagged H1.8-bound nucleosomes isolated from chromosomes assembled in Xenopus egg extract are essentially identical to the endogenous H1.8-bound nucleosome structure we previously determined. In addition, we have shown that GFP-H1.8 was able to replace the function of endogenous H1.8 to support the proper mitotic chromosome length (Fig. S3), which is based on the capacity of H1.8 to compete with condensin, as we have previously demonstrated (PMID 34406118). Therefore, we believe the effects of GFP-tagging to be minimal. This point has been incorporated into the main result section (page 6, line 215) to read as “The structures of GFP-tagged H1.8-bound nucleosomes isolated from Xenopus egg extract chromosomes are essentially identical to the endogenous H1.8-bound nucleosome structure we previously determined. Therefore, although the usage of GFP-tagged H1.8 and MagIC-cryo-EM potentially influence the structure of the H1.8-bound nucleosome, we consider these influences to be minimal.”

      Also, the GFP-pulled down H1.8 nucleosomes should be better characterized biochemically to determine the actual linker DNA lengths (which are known to have a strong effect of linker histone affinity) and presence or absence of other factors such as HMG proteins that may compete with linker histones and cause the multiplicity of nucleosome structural classes (such as shown on Fig. 3F) for which the association with H1.8 is uncertain.

      We addressed the concerns raised by the reviewer as follows:

      (1) DNA length

      As the reviewer correctly pointed out, linker DNA length is critical for linker histone binding, and conventional ChIP protocols often result in DNA over-digestion to lengths of 140–150 bp. To minimize DNA over-digestion and structural damage, we have optimized a gentle chromosomal nucleosome purification protocol that enabled the cryo-EM analysis of chromosomal nucleosomes (PMID: 34478647). This protocol involves DNA digestion with a minimal amount of MNase at 4 °C, producing nucleosomal DNA fragments of 180–200 bp. Additionally, before each chromatin extraction, we performed small-scale MNase assays to ensure that the DNA lengths consistently fell within the 180–200 bp range (Fig. S4B). These DNA lengths are sufficient for linker histone H1 binding, in agreement with previous findings indicating that >170 bp is adequate for linker histone association (PMID: 26212454).

      This information has been incorporated into the main text and Methods section:

      On page 5, line 178, the sentence was added to read, “To prevent dissociation of H1.8 from nucleosomes during DNA fragmentation, the MNase concentration and the reaction time were optimized to generate DNA fragment lengths with 180–200 bp (Fig. S4B), which is adequate for linker histone association (PMID 26212454).”

      On page 32, line 1192, the sentence was added to read, “To digest chromatin, MNase concentration and reaction time were tested on a small scale and optimized to the condition that produces 180-200 bp DNA fragments.”

      (2) Co-associated proteins with H1-GFP nucleosome.

      We now include mass spectrometry (MS) data for the proteins in the sucrose density gradient fraction 5 used for MagIC-cryo-EM analysis of GFP-H1.8-bound chromatin proteins as well as MS of proteins isolated with the corresponding MagIC-cryo-EM beads (Table S2 and updated Table S5). As the reviewer expected, HMG proteins (hmga2.L and hmga2.S in Table S2) were present in interphase sucrose gradient fraction 5, but their levels were less than 2% of H1.8. Accordingly, none of the known chromatin proteins besides histones and the nucleoplasmin were detected by MS in the GFP-nanobody MagIC-cryo-EM beads, including the FACT complex and PCNA, whose levels in the sucrose fraction were comparable to H1.8 (Table S2), suggesting that our MagIC-cryo-EM analysis was not meaningfully affected by HMG proteins and other chromatin proteins. Consistent with our interpretation, the structural features of H1.8-bound nucleosomes isolated from interphase and metaphase chromosomes were essentially identical.

      Reviewer #2 (Public review):

      Summary:

      The authors present a straightforward and convincing demonstration of a reagent and workflow that they collectively term "MagIC-cryo-EM", in which magnetic nanobeads combined with affinity linkers are used to specifically immobilize and locally concentrate complexes that contain a protein-of-interest. As a proof of concept, they localize, image, and reconstruct H1.8-bound nucleosomes reconstructed from frog egg extracts. The authors additionally devised an image-processing workflow termed "DuSTER", which increases the true positive detections of the partially ordered NPM2 complex. The analysis of the NPM2 complex ± H1.8 was challenging because only ~60 kDa of protein mass was ordered. Overall, single-particle cryo-EM practitioners should find this study useful.

      Strengths:

      The rationale is very logical and the data are convincing.

      Weaknesses:

      I have seen an earlier version of this study at a conference. The conference presentation was much easier to follow than the current manuscript. It is as if this manuscript had undergone review at another journal and includes additional experiments to satisfy previous reviewers. Specifically, the NPM2 results don't seem to add much to the main story (MagIC-cryo-EM), and read more like an addendum. The authors could probably publish the NPM2 results separately, which would make the core MagIC results (sans DuSTER) easier to read.

      We thank the reviewer for constructive comments. We regret to realize that the last portion of the result section, where we described a detailed analysis of NPM2 structures, was erroneously omitted from the submission due to an MS Word formatting error. We hope that the inclusion of this section will justify the inclusion of the NPM2 analysis. Specifically, we decided to include NPM2 structures to demonstrate that our method successfully determined a structure that had never been reported. Conformational changes in the NPM family have been proposed in previous studies using techniques such as NMR, negative stain EM, and simulations, and these changes are thought to play a critical role in regulating NPM function (PMID: 25772360, 36220893, 38571760), but there has been confusion in the literature, for example, on the substrate binding site and on whether NPM2 recognizes the substrate as a pentamer or decamer. Despite their low resolution, our new cryo-EM structures of NPM2 suggest that NPM2 recognizes the substrate as a pentamer, identify potential substrate-binding sites, and indicate the mechanisms underlying NPM2 conformational changes. We believe that publishing these results will provide valuable insights into the NPM research field and help guide and inspire further investigations.

      Reviewer #3 (Public review):

      Summary:

      In this paper, Arimura et al. report a new method, termed MagIC-Cryo-EM, which uses magnetic beads to capture specific proteins out of a lysate via immunoprecipitation, followed by deposition on EM grids. The so-enriched proteins can be analyzed structurally. Importantly, the nanoparticles are further functionalized with protein-based spacers, to avoid a distorted halo around the particles. This is a very elegant approach and allows the resolution of the structure of small amounts of native proteins at atomistic resolution.

      Here, the authors apply this method to study the chromatosome formation from nucleosomes and the oocyte-specific linker histone H1.8. This allows them to resolve H1.8-containing chromatosomes from oocyte extract in both interphase and metaphase conditions at 4.3 Å resolution, which reveal a common structure with H1 placed right at the dyad and contacting both entry and exit linker DNA.

      They then investigate the origin of H1.8 loss during interphase. They identify a non-nucleosomal H1.8-containing complex from interphase preparations. To resolve its structure, the authors develop a protocol (DuSTER) to exclude particles with an ambiguous center, revealing particles with five-fold symmetry that match the chaperone NPM2. MS and WB confirm that the protein is present in interphase samples but not metaphase. The authors further separate two isoforms, an open and a closed form, that coexist. Additional densities in the open form suggest that this might be bound H1.8.

      Strengths:

      Together this is an important addition to the suite of cryoEM methods, with broad applications. The authors demonstrate the method using interesting applications, showing that the methods work and they can get high resolution structures from nucleosomes in complex with H1 from native environments.

      Weaknesses:

      The structures of the NPM2 chaperone are less well resolved, and some of the interpretation in this part seems only weakly justified.

      We thank the reviewer for recognizing the significance of our methods and for constructive comments. We regret to realize that the last portion of the result section, where we described a detailed analysis of NPM2 structures, was erroneously omitted from the submission due to an MS Word formatting error. We hope that inclusion of this section will justify the inclusion of the NPM2 analysis. Specifically, we agree that our NPM2 structures are low-resolution and that our interpretations may be revised as higher-resolution structures become available, although we believe that publishing these results will provide valuable insights into the NPM research field and will also illustrate the power of MagIC-cryo-EM and DuSTER. To respond to this criticism, the revised manuscript now clearly describes the limitations of our NPM2 structures while highlighting the key insights. On page 12, line 452, the sentence was added to read, “While DuSTER enables the structural analysis of NPM2 co-isolated with H1.8-GFP, the resulting map quality is modest, and the reported numerical resolution may be overestimated. Furthermore, only partial density for H1.8 is observed. Although structural analysis of small proteins is inherently challenging, it is possible that halo-like scattering further hinders high-resolution structural determination by reducing the S/N ratio. More detailed structural analyses of the NPM2-substrate complex will be addressed in future studies.”

      Reviewer #1 (Recommendations for the authors): 

      (1) To assess the advantage provided by the new technique for imaging of isolated pure or enriched fractions of native chromatin, the nucleosome structure analysis should be matched by a proper biochemical characterization of the isolated nucleosomes. Nucleosome DNA size is known to greatly affect linker histone affinity and additional proteins like HMG may compete with linker histone for binding. SDS-PAGE of the sucrose gradient fractions (Fig. 3E) shows many nonhistone proteins where H1-GFP appears to be a minor component. However, the gradient fractions contain both bound and unbound proteins. I would suggest that a larger-scale pull-down using the same GFP antibodies and streptavidin beads should be conducted and the captured nucleosome DNA and proteins characterized. 

      We addressed the concerns raised by the reviewer as follows:

      (1) DNA length

      As the reviewer correctly pointed out, linker DNA length is critical for linker histone binding, and conventional ChIP protocols often result in DNA over-digestion to lengths of 140–150 bp. To minimize DNA over-digestion and structural damage, we have optimized a gentle chromosomal nucleosome purification protocol that enabled the cryo-EM analysis of chromosomal nucleosomes (PMID: 34478647). This protocol involves DNA digestion with a minimal amount of MNase at 4 °C, producing nucleosomal DNA fragments of 180–200 bp. Additionally, before each chromatin extraction, we performed small-scale MNase assays to ensure that the DNA lengths consistently fell within the 180–200 bp range (Fig. S4B). These DNA lengths are sufficient for linker histone H1 binding, in agreement with previous findings indicating that >170 bp is adequate for linker histone association (PMID: 26212454).

      This information has been incorporated into the main text and Methods section. 

      On page 5, line 178, the sentence was added to read, “To prevent dissociation of H1.8 from nucleosomes during DNA fragmentation, the MNase concentration and the reaction time were optimized to generate DNA fragment lengths with 180–200 bp (Fig. S4B), which is adequate for linker histone association (PMID 26212454).”

      On page 32, line 1192, the sentence was added to read, “To digest chromatin, MNase concentration and reaction time were tested on a small scale and optimized to the condition that produces 180-200 bp DNA fragments.”

      (2) Co-associated proteins with H1-GFP nucleosome.

      We now include mass spectrometry (MS) data for the proteins in the sucrose density gradient fraction 5 used for MagIC-cryo-EM analysis of GFP-H1.8-bound chromatin proteins as well as MS of proteins isolated with the corresponding MagIC-cryo-EM beads (Table S2 and updated Table S5). As the reviewer expected, HMG proteins (hmga2.L and hmga2.S in Table S2) were present in interphase sucrose gradient fraction 5, but their levels were less than 2% of H1.8. Accordingly, none of the known chromatin proteins besides histones and the nucleoplasmin were detected by MS in the GFP-nanobody MagIC-cryo-EM beads, including the FACT complex and PCNA, whose levels in the sucrose fraction were comparable to H1.8 (Table S2), suggesting that our MagIC-cryo-EM analysis was not meaningfully affected by HMG proteins and other chromatin proteins. Consistent with our interpretation, the structural features of H1.8-bound nucleosomes isolated from interphase and metaphase chromosomes were essentially identical.

      (2) A similar pull-down analysis with quantitation of NPM2 and GFP (in addition to analysis of sucrose gradient fractions) should be conducted to show whether the immune-selected particles do indeed contain a stoichiometric complex of H1.8 with NPM2.

      Proteins isolated using MagIC-cryo-EM beads were identified through mass spectrometry (Fig. 4D). The MS signal suggests that the molar ratio of NPM2 is higher than that of H1.8 or sfGFP. This observation is consistent with the idea that an NPM2 pentamer can bind between one and five H1.8-GFP molecules.

      (3) The use of recombinant, bacterially produced H1.8-GFP and of just one type of antibody (GFP) are limitations of this work. These limitations, as well as the future steps needed to use antibodies specific for native antigens, such as histone variants and epigenetic modifications, should be discussed.

      We clarified these points in the “Limitation of the study” section (page 12, line 420). The revised sections are indicated by the underlines below.

      “While MagIC-cryo-EM is envisioned as a versatile approach suitable for various biomolecules from diverse sources, including cultured cells and tissues, it has thus far been tested only with H1.8-bound nucleosome and H1.8-bound NPM2, both using anti-GFP nanobodies to isolate GFP-tagged H1.8 from chromosomes assembled in Xenopus egg extracts after pre-fractionation of chromatin. To apply MagIC-cryo-EM for the other targets, the following factors must be considered: 1) Pre-fractionation. This step (e.g., density gradient or gel filtration) may be necessary to enrich the target protein in a specific complex from other diverse forms (such as monomeric forms, subcomplexes, and protein aggregates). 2) Avoiding bead aggregation. Beads may be clustered by targets (if the target complex contains multiple affinity tags or is aggregated), nonspecific binders, and the target capture modules. To directly apply antibodies that recognize the native targets and specific modifications, optimization to avoid bead aggregation will be important. 3) Stabilizing complexes. The target complexes must be stable during the sample preparation. Crosslink was necessary for the H1.8-GFP-bound nucleosome. 4) Loading the optimum number of targets on the bead. The optimal number of particles per bead differs depending on target sizes, as larger targets are more likely to overlap. For H1.8-GFP-bound nucleosomes, 500 to 2,000 particles per bead were optimal. We expect that fewer particles should be coated for larger targets.”

      Reviewer #2 (Recommendations for the authors):  

      General: 

      Figures: Most of the figures have tiny text and schematic items (like Fig. 2B). To save readers from having to enlarge the paper on their computer screen, consider enlarging the smallest text & figure panels. 

      We enlarged the text in the main figures.

      Is it possible that the MagIC method also keeps more particles "submerged", i.e., away from the air:water interface? Does MagIC change the orientation distribution?  

      In theory, the preferred orientation bias should be reduced in MagIC-cryo-EM, as particles are submerged, and the bias is thought to arise from particle accumulation at the air-water interface. However, while the preferred orientation appears to be mitigated, the issue is not completely resolved, as demonstrated in Author response image 1.

      Author response image 1.

      A possible explanation for the remaining preferred orientation bias in MagIC-cryo-EM data is that many particles are localized on graphene-water interfaces.

      Consider adding a safety note to warn about possible pinching injuries when handling neodymium magnets. 

      This is a good idea. We added a sentence in the method section (page 24, line 878), “The two pieces of strong neodymium magnets have to be handled carefully as magnets can leap and slam together from several feet apart.”

      In the methods section, the authors state that the grids were incubated on magnets, followed by blotting and plunge freezing in the Vitrobot. Presumably, the blotting was performed in the absence of magnets. The authors may want to clarify this in the text. If so, can the authors speculate how the magnet-treated beads are better retained on the grids during blotting? Is it due to the induced aggregation and/or deposition of the nanobeads on the grid surface? 

      In the limitation section (page 12 line 446), the sentence was added to read:

      “The efficiency of magnetic bead capture can be further improved. In the current MagIC-cryo-EM workflow, the cryo-EM grid is incubated on a magnet before being transferred to the Vitrobot for vitrification. However, since the Vitrobot cannot accommodate a strong magnet, the vitrification step occurs without the magnetic force, potentially resulting in bead loss. This limitation could be addressed by developing a new plunge freezer capable of maintaining magnetic force during vitrification.”

      In the method section (page 27 line 993), the sentence was modified. The revised sections are indicated by underlines.

      “The grid was then incubated on the 40 x 20 mm N52 neodymium disc magnets for 5 min within an in-house high-humidity chamber to facilitate magnetic bead capture. Once the capture was complete, the tweezers anchoring the grid were transferred and attached to the Vitrobot Mark IV (FEI), and the grid was vitrified by employing a 2-second blotting time at room temperature under conditions of 100% humidity.”

      Do you see an extra density corresponding to the GFP in your averages?  

      Since GFP is connected to H1.8 via a flexible linker, the GFP structure was observed in complex with the anti-GFP nanobody, separate from the H1.8-nucleosome and H1.8-NPM2 complexes, as shown in Fig. S10.

      Fig. 5 & Fig. S11: The reported resolutions for NPM2 averages were ~5 Å, but the densities appear - to my eyes - to resemble lower-resolution averages.

      Although DuSTER enables the 3D structural determination of NPM2 co-isolated with H1-GFP, we recognize that the quality of the NPM2 map falls short of the standard expected for a typical 5 Å-resolution map. To appropriately convey the quality of the NPM2 maps, we have included the 3D FSC and local resolution map of the NPM2 structure (new Fig. S12). Furthermore, we have revised the manuscript to deemphasize the resolution of the NPM2 structure to avoid any potential misinterpretation.

      Fig. 5D: The cartoon says: "less H1.8 on interphase nucleosome" and "more H1.8 on metaphase nucleosome". Please help the readers understand this conclusion with the gel in Fig. 3C and the population histograms in Fig. 3F. 

      As depicted in Fig. 3A, we previously identified the preferential binding of H1.8 to metaphase nucleosomes (PMID: 34478647). In this study, to obtain sufficient H1.8-bound nucleosomes for MagIC-cryo-EM, we used 2.5 times more starting material for interphase samples compared to M-phase samples. This discrepancy complicates the comparison of H1-GFP binding ratios in western blots. However, in GelCode<sup>TM</sup> Blue staining (Fig. S4A), where both H1-GFP and histone bands are visible, the preferential binding of H1.8 to metaphase nucleosomes can be observed (see fraction 11 in interphase and metaphase).

      Abstract - that removes low signal-to-noise ratio particles -> to exclude low signal-to-noise ratio particles; The term "exclude" is more accurate and is in the DuSTER acronym itself.

      We edited it accordingly. 

      P1 - to reduce sample volume/concentration -> to lower the sample volume/concentration needed 

      We edited it accordingly.

      P1 - Flow from 1st to 2nd paragraph could be improved. It's abrupt. Maybe say that some forms of nucleoprotein complexes are rare, with one example being H1.8-bound nucleosomes in interphase chromatin? 

      We have revised the text to address the challenges involved in the structural characterization of native chromatin-associated protein complexes. The revised text reads, “Structural characterization of native chromatin-associated protein complexes is particularly challenging due to their heterogeneity and scarcity: more than 300 proteins directly bind to the histone core surface, while each of these proteins is targeted to only a fraction of nucleosomes in chromatin.”

      P2 - interacts both sides of the linker DNA -> interacts with both the entry and exit linker DNA 

      We have edited it accordingly.

      P2 - "from the chromatin sample isolated from metaphase chromosomes but not from interphase chromosomes" - meaning that the interphase nucleosomes don't have H1.8 densities at all, or that they do, but the H1.8 only interacts with one of the two linker DNAs? 

      In our original attempt to analyze nucleosome structures assembled in Xenopus egg extracts without MagIC-cryo-EM, we were not able to detect a density that could be confidently assigned to H1.8 in interphase chromatin samples. To avoid potential confusion, the revised text reads, “We were able to resolve the 3D structure of the H1.8-bound nucleosome isolated from metaphase chromosomes but not from interphase chromosomes(3). The resolved structure indicated that H1.8 in metaphase is most stably bound to the nucleosome at the on-dyad position, in which H1 interacts with both the entry and exit linker DNAs(21–24). This stable H1 association to the nucleosome in metaphase likely reflects its role in controlling the size and the shape of mitotic chromosomes through limiting chromatin accessibility of condensins(25), but it remains unclear why H1.8 binding to the nucleosome in interphase is less stable. Since the low abundance of H1.8-bound nucleosomes in interphase chromatin might have prevented us from determining their structure, we sought to solve this issue by enriching H1.8-bound nucleoprotein complexes through adapting ChIP-based methods.”

      P1, P2 - The logical leap from "by adapting ChIP-based methods" to MagIC is not clear. 

      We addressed this point by revising the text as shown above.

      P2 - "Intense halo-like noise" - This is an awkward term. These are probably the Fresnel fringes that arise from underfocus. I wouldn't call this phenomenon "noise". https://www.jeol.com/words/emterms/20121023.093457.php  

      We re-phrased it as “halo-like scattering”.

      P3 -It may help readers to explain how cryo-EM structures of the H1.8-associated interphase nucleosomes would differentiate from the two models in Fig. 3A.  

      We have revised the introduction section (lines 43~75), including the corresponding paragraph to address the comments above, highlighting the motivation behind determining the structures of interphase and metaphase H1.8-associated nucleosomes. We hope the revisions are now clear.

      P6 - "they were masked by background noise from the ice, graphene". I thought that graphene would contribute minimal noise because it is only one carbon layer thick?

      That is a valid point. We have removed the term ‘graphene’ from the sentence.

      P6 - What was the rationale to focus on particles with 60 - 80Å dimensions? 

      We observed that 60–80 Å particles were captured by MagIC-cryo-EM beads, as numerous particles of this size were clearly visible in the motion-corrected micrographs surrounding the beads. To clarify this, we revised the sentence to read: “Topaz successfully picked most of the 60–80 Å particles visible in the motion-corrected micrographs and enriched around the MagIC-cryo-EM beads (Figure S6A).”

      P7 - Please explain a technical detail about DuSTER: do independent runs of Topaz picks give particle centers that differ by up to ~40 Å, or is it that 2D classification gives particle centers that differ by up to ~40 Å? Is it possible to distinguish these two possibilities by initializing CryoSPARC on two independent 2D classification jobs on the same set of Topaz picks?

      Due to the small particle size of NPM2, the former type is predominantly generated when Topaz fails to pick the particles reproducibly. The first cycle of DuSTER removes both former-type particles (irreproducibly picked particles) and latter-type particles (irreproducibly centered particles), while subsequent cycles specifically target and remove the latter type. We have added the following sentence to clarify this (page 7, line 249). The revised sections are indicated by underlines below: “To assess the reproducibility of the particle recentering during 2D classification, two independent particle pickings were conducted by Topaz so that each particle on the grid has up to two picked points (Figure 4A, second left panel). Some particles that only have one picked point will be removed in a later step. These picked points were independently subjected to 2D classification. After recentering the picked points by 2D classification, distances (D) between recentered points from the first picking process and other recentered points from the second picking process were measured. DuSTER keeps recentered points whose D are shorter than a threshold distance (D<sub>TH</sub>). By setting D<sub>TH</sub> = 20 Å, 2D classification results were dramatically improved in this sample; a five-petal flower-shaped 2D class was reconstructed (Figure 4B). This step also removes the particles that only have one picked point.”
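
      The distance filter described in the quoted passage can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `duster_filter`, the brute-force nearest-neighbour search, the choice to report points from the first picking only, and the example coordinates are all assumptions made for illustration. It takes two lists of recentered particle centers (in Å) from independent picking runs and keeps a point only if a point from the other run lies closer than D<sub>TH</sub>.

```python
import math


def duster_filter(picks_a, picks_b, d_th=20.0):
    """Keep recentered points from run A that have a partner in run B.

    picks_a, picks_b: lists of (x, y) recentered particle centers in angstroms,
    one list per independent picking run (hypothetical input format).
    d_th: threshold distance D_TH in angstroms (20 in the quoted passage).
    """
    kept = []
    for xa, ya in picks_a:
        # D = distance to the closest recentered point from the other run;
        # infinity when the other run has no points at all.
        d_min = min(
            (math.hypot(xa - xb, ya - yb) for xb, yb in picks_b),
            default=math.inf,
        )
        # Points picked only once, or recentered irreproducibly, fail this test.
        if d_min < d_th:
            kept.append((xa, ya))
    return kept


# Hypothetical coordinates: one reproducible particle, one whose two centers
# differ by 40 angstroms, and one that was picked in only one run.
run_a = [(100.0, 100.0), (300.0, 300.0), (500.0, 120.0)]
run_b = [(105.0, 97.0), (340.0, 300.0)]
kept = duster_filter(run_a, run_b)
print(kept)  # [(100.0, 100.0)]
```

      In this toy example only the first particle survives: the second is recentered 40 Å apart in the two runs, and the third has no counterpart in the second run, which mirrors the two failure types discussed in the response.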

      P8 - NPM2 was introduced rather abruptly (it was used as an initial model for 3D refinement). I see NPM2 appear in the supplemental figures cited before the text in P8, but the significance of NPM2 was not discussed there. The authors seem to have made a logical leap that is not explained. 

      We have removed the term NPM2 in P8.

      P9 - "extra cryo-EM densities, which likely represent H1." This statement would be better supported if the resolution of the reconstruction was high enough to resolve H1-specific amino acids in the "extra densities" protruding from the petals.

      We concurred and softened the statement to read “extra cryo-EM densities, which may represent H1.8,”

      P9 - "Notably, extra cryo-EM densities, which likely represent H1.8, are clearly observed in the open form but much less in the closed form near the acidic surface regions proximal to the N terminus of beta-1 and the C terminus of beta-8 (Fig. 5A and 5B)."  It would be helpful to point out where the "extra densities" are in the figure for the open and closed form. Some readers may not be able to extrapolate from the single red arrow to the other extra densities. 

      Thank you for your comment. We have also indicated the density in Fig. 5A.

      P9 - "Supporting this idea, the acidic tract A1 (aa 36-40) and A2 (aa 120-140) are both implicated in the recognition of basic substrates such as core histones..."  Did this sentence get cut off in the next column?  

      We apologize for our oversight on this error. Due to an MS Word formatting error, the sentences (lines 316–343) were hidden beneath a figure. We have retrieved the missing sentences:

      “Supporting this idea, the acidic tract A1 (aa 36-40) and A2 (aa 120-140), which are both implicated in recognition of basic substrates such as core histones(43,50), respectively interact with and are adjacent to the putative H1.8 density (Figure 5B). In addition, the NPM2 surface that is in direct contact with the putative H1.8 density is accessible in the open form while it is internalized in the closed form (Figure 5C). This structural change of NPM2 may support more rigid binding of H1.8 to the open NPM2, whereas H1.8 binding to the closed form is less stable and likely occurs through interactions with the C-terminal A2 and A3 tracts, which are not visible in our cryo-EM structures.

      In the aforementioned NPM2-H1.8 structures, for which we applied C5 symmetry during the 3D structure reconstruction, only a partial H1.8 density could be seen (Figure 5B). One possibility is that H1.8 structure in NPM2-H1.8 does not follow C5 symmetry. As the size of the NPM2-H1.8 complex estimated from sucrose gradient elution volume is consistent with pentameric NPM2 binding to a single H1.8 (Figure 3C and Table S3), applying C5 symmetry during structural reconstruction likely blurred the density of the monomeric H1.8 that binds to the NPM2 pentamer. The structural determination of NPM2-H1.8 without applying C5 symmetry lowered the overall resolution but visualized multiple structural variants of the NPM2 protomer with different degrees of openness coexisting within a NPM2-H1.8 complex (Figure S14), raising a possibility that opening of a portion of the NPM2 pentamer may affect modes of H1.8 binding. Although more detailed structural analyses of the NPM2-substrate complex are subject of future studies, MagIC-cryo-EM and DuSTER revealed structural changes of NPM2 that was co-isolated H1.8 on interphase chromosomes.

      Discussion 

      MagIC-cryo-EM offers sub-nanometer resolution structural determination using a heterogeneous sample that contains the target molecule at 1~2 nM, which is approximately 100 to 1000 times lower than the concentration required for conventional cryo-EM methods, including affinity grid approach(9–11).”

      Reviewer #3 (Recommendations for the authors):  

      All with regards to the NPM2 part: 

      It would be great if the authors could provide micrographs where the particles are visible, in addition to the classes. 

      The particles on the motion-corrected micrographs are available in Fig S9.

      Also, the angular distribution in the SI looks very uniform. 

      I also wonder, if the authors could indicate the local resolution for all structures. 

      Could the authors provide the 3D FSC for NPM2?  

      Although DuSTER enables the 3D structural determination of NPM2 co-isolated with H1-GFP, we recognize that the quality of the NPM2 map falls short of the standard expected for a typical 5 Å resolution map. To appropriately convey the quality of the NPM2 maps, we have included the 3D FSC and local resolution map of the NPM2 structure (new Fig. S12).

      I really cannot see a difference between the open and closed forms. Looking at the models, I am skeptical that the authors can differentiate the two forms with the available resolution. Could they provide statistics that support their assignments? 

      To better highlight the structural differences between the two forms, we added a new figure to compare the maps between open and closed forms (Fig S12J-K).

      Also, the 'additional density' representing H1.8 in the NPM2 structures - I cannot see it. 

      We pointed out the density with the red arrow in the revised Fig 5A.

      Minor comments: 

      Something is missing at the end of Results, just before the beginning of the Discussion. The figure legend for Fig. S12 is truncated, so it is unclear what is going on.

      We apologize for our oversight on this error. Due to an MS Word formatting error, the sentences (lines 316–343) were hidden beneath a figure. We have retrieved the missing sentences:

      “Supporting this idea, the acidic tract A1 (aa 36-40) and A2 (aa 120-140), which are both implicated in recognition of basic substrates such as core histones(43,50), respectively interact with and are adjacent to the putative H1.8 density (Figure 5B). In addition, the NPM2 surface that is in direct contact with the putative H1.8 density is accessible in the open form while it is internalized in the closed form (Figure 5C). This structural change of NPM2 may support more rigid binding of H1.8 to the open NPM2, whereas H1.8 binding to the closed form is less stable and likely occurs through interactions with the C-terminal A2 and A3 tracts, which are not visible in our cryo-EM structures.

      In the aforementioned NPM2-H1.8 structures, for which we applied C5 symmetry during the 3D structure reconstruction, only a partial H1.8 density could be seen (Figure 5B). One possibility is that H1.8 structure in NPM2-H1.8 does not follow C5 symmetry. As the size of the NPM2-H1.8 complex estimated from sucrose gradient elution volume is consistent with pentameric NPM2 binding to a single H1.8 (Figure 3C and Table S2), applying C5 symmetry during structural reconstruction likely blurred the density of the monomeric H1.8 that binds to the NPM2 pentamer. The structural determination of NPM2-H1.8 without applying C5 symmetry lowered the overall resolution but visualized multiple structural variants of the NPM2 protomer with different degrees of openness coexisting within a NPM2-H1.8 complex (Figure S14), raising a possibility that opening of a portion of the NPM2 pentamer may affect modes of H1.8 binding. Although more detailed structural analyses of the NPM2-substrate complex are subject of future studies, MagIC-cryo-EM and DuSTER revealed structural changes of NPM2 that was co-isolated H1.8 on interphase chromosomes.

      Discussion 

      MagIC-cryo-EM offers sub-nanometer resolution structural determination using a heterogeneous sample that contains the target molecule at 1~2 nM, which is approximately 100 to 1000 times lower than the concentration required for conventional cryo-EM methods, including affinity grid approach(9–11).”

      Figure S13: I am not sure how robust these assignments are at this low resolution. Are these real structures or classification artifacts? It feels very optimistic to interpret these structures.

      We agree that our NPM2 structures are low-resolution and that our interpretations may be revised as higher-resolution structures become available. Nevertheless, we believe that publishing these results will illustrate the power of MagIC-cryo-EM and DuSTER. Conformational changes in the NPM family have been proposed in previous studies using techniques such as NMR, negative stain EM, and simulations, and these changes are thought to play a critical role in regulating NPM function (PMID: 25772360, 36220893, 38571760). However, there has been confusion in the literature, for example, on the substrate binding site and on whether NPM2 recognizes the substrate as a pentamer or decamer. Despite their low resolution, our new cryo-EM structures of NPM2 suggest that NPM2 recognizes the substrate as a pentamer, identify potential substrate-binding sites, and indicate the mechanisms underlying NPM2 conformational changes. We believe that publishing these results will provide valuable insights into the NPM research field and help guide and inspire further investigations.

      To respond to this criticism, we have revised the manuscript to clearly describe the limitations of our NPM2 structures while highlighting the key insights. On page 12, line 452, the following sentences were added: “While DuSTER enables the structural analysis of NPM2 co-isolated with H1.8-GFP, the resulting map quality is modest, and the reported numerical resolution may be overestimated. Furthermore, only partial density for H1.8 is observed. Although structural analysis of small proteins is inherently challenging, it is possible that halo-like scattering further hinders high-resolution structural determination by reducing the S/N ratio. More detailed structural analyses of the NPM2-substrate complex will be addressed in future studies.”

    1. eLife Assessment

      The formation of the Z-ring at the time of bacterial cell division interests researchers working towards understanding cell division across all domains of life. The manuscript by Jasnin et al reports the cryoET structure of toroid assembly formation of FtsZ filaments driven by ZapD as the cross linker. The findings are important and have the potential to open a new dimension in the field, but the evidence to support these exciting claims is currently incomplete, mostly because of the suboptimal "resolution of the toroids", so in the absence of additional experiments, the interpretations would need to be toned down.

    2. Reviewer #1 (Public review):

      Summary:

      The major result in the manuscript is the observation of higher-order structures in a cryoET reconstruction that could be used for understanding the assembly of toroid structures. The cross-linking ability of ZapD dimers results in bending of FtsZ filaments to a constant curvature. Many such short filaments are stitched together to form a toroid-like structure. The geometry of assembly of filaments - whether they form straight bundles or toroid-like structures - depends on the relative concentrations of FtsZ and ZapD.

      Strengths:

      In addition to a clear picture of the FtsZ assembly into ring-like structures, the authors have carried out basic biochemistry and biophysical techniques to assay the GTPase activity, the kinetics of assembly, and the ZapD to FtsZ ratio.

      Weaknesses:

      The discussion does not provide an overall perspective that correlates the cryoET structural organisation of filaments with the biophysical data. The current version has improved in terms of addressing this weakness and clearly states the lacuna in the model proposed based on the technical limitations.

      Future scope of work includes the molecular basis of curvature generation and how molecular features of FtsZ and ZapD affect the membrane binding of the higher order assembly.

    3. Reviewer #3 (Public review):

      Summary:

      Previous studies have analyzed the binding of ZapD to FtsZ and provided images of negatively stained toroids and straight bundles, where FtsZ filaments are presumably crosslinked by ZapD dimers. Toroids without ZapD have also been previously formed by treating FtsZ with crowding agents. The present study is the first to apply cryoEM tomography, which can resolve the structure of the toroids in 3D. This shows a complex mixture of filaments and sheets irregularly stacked in the Z direction and spaced radially. The most important interpretation would be to distinguish FtsZ filaments from ZapD crosslinks; this is less convincing. The authors seem aware of the ambiguity: "However, we were unable to obtain detailed structural information about the ZapD connectors due to the heterogeneity and density of the toroidal structures, which showed significant variability in the conformations of the connections between the filaments in all directions." Therefore, the reader may assume that the crosslinks identified and colored red are only suggestions, and look for their own structural interpretations. But readers should also note some inconsistencies in stoichiometry and crosslinking arrangements that are detailed under "weaknesses."

      Strengths:

      This is the first cryoEM tomography to image toroids and straight bundles of FtsZ filaments bound to ZapD. A strength is the resolution, which, at least for the straight bundles, is sufficient to resolve the ~4.5 nm spacing of ZapD dimers attached to and projecting from subunits of an FtsZ filament. Another strength is the pelleting assay to determine the stoichiometry of ZapD:FtsZ (although this also leads to weaknesses of interpretation).

      Weaknesses:

      The stoichiometry presents some problems. Fig. S5 uses pelleting to convincingly establish the stoichiometry of ZapD:FtsZ. Although ZapD is a dimer, the concentration of ZapD is always expressed as that of its subunit monomers. Fig. S5 shows the stoichiometry of ZapD:FtsZ to be 1:1 or 2:1 at equimolar or high concentrations of ZapD. Thus at equimolar ZapD, each ZapD dimer should bridge two FtsZ's, likely forming crosslinks between filaments. At high ZapD, each FtsZ should have its own ZapD dimer. However, this seems contradicted by later statements in Discussion and Results. (1) "At lower concentrations of ZapD, .. toroids are the most prominent structures, containing one ZapD dimer for every four to six FtsZ molecules." Shouldn't it be one ZapD dimer for every two FtsZ? (2) "at the high ZapD concentration...a ZapD dimer binds two FtsZ molecules connecting two filaments." Doesn't Fig. S5 show that each FtsZ subunit has its own ZapD dimer? And wouldn't this saturate the CTD sites with dimers and thus minimize crosslinking?

      A major weakness is the interpretation of the cryoEM tomograms, specifically distinguishing ZapD from FtsZ. The distinction of crosslinks seems based primarily on structure: long continuous filaments (which often appear as sheets) are FtsZ, and small masses between filaments are ZapD. The density of crosslinks seems to vary substantially over different parts of the figures. More important, the density of ZapDs identified and colored red seems much lower than the stoichiometry detailed above. Since the mass of the ZapD monomer is half that of FtsZ, the 1:1 stoichiometry in toroids means that 1/3 of the mass should be ZapD and 2/3 FtsZ. However, the connections identified as ZapD seem much fewer than the expected 1/3 of the mass. The authors conclude that connections run horizontally, diagonally and vertically, which implies no regularity. This seems likely, but I would suggest that readers consider for themselves what they would identify as a crosslink.
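
      The mass-fraction argument in the paragraph above can be written out explicitly. This is a worked sketch that takes the reviewer's premise at face value (a ZapD monomer has half the mass of an FtsZ monomer); the symbols n (ZapD monomers bound per FtsZ monomer), m, and f are introduced here only for illustration.

```latex
\[
  f_{\mathrm{ZapD}}
  = \frac{n\, m_{\mathrm{ZapD}}}{n\, m_{\mathrm{ZapD}} + m_{\mathrm{FtsZ}}},
  \qquad
  m_{\mathrm{ZapD}} = \tfrac{1}{2}\, m_{\mathrm{FtsZ}}
\]
\[
  n = 1:\quad f_{\mathrm{ZapD}} = \frac{1/2}{1/2 + 1} = \frac{1}{3},
  \qquad
  n = 2:\quad f_{\mathrm{ZapD}} = \frac{1}{1 + 1} = \frac{1}{2}
\]
```

      Under the same premise, the 1:1 stoichiometry gives the one-third ZapD mass fraction quoted above, and the 2:1 stoichiometry reported for excess ZapD would correspond to half of the mass being ZapD.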

      In contrast to the toroids formed at equimolar FtsZ and ZapD, thin bundles of straight filaments are assembled in excess ZapD. Here the stoichiometry is 2:1, which would mean that every FtsZ should have a bound ZapD DIMER. The segmentation of a single filament in Fig. 5e seems to agree with this, showing an FtsZ filament with spikes emanating like a picket fence, with a 4.5 nm periodicity. This is consistent with each spike being a ZapD dimer, and every FtsZ subunit along the filament having a bound ZapD dimer. But if each FtsZ has its own dimer, this would seem to eliminate crosslinking. The interpretative diagram in Fig. 6, far right, which shows almost all ZapD dimers bridging two FtsZs on opposite filaments, would be inconsistent with this 2:1 stoichiometry.

      In the original review I suggested a control that might help identify the structures of ZapD in the toroids. Popp et al (Biopolymers 2009) generated FtsZ toroids that were identical in size and shape to those here, but lacking ZapD. These toroids of pure FtsZ were generated by adding 8% polyvinyl chloride, a crowding agent. The filamentous substructure of these toroids in negative stain seemed very similar to that of the ZapD toroids here. CryoET of these toroids lacking ZapD might have been helpful in confirming the identification of ZapD crosslinks in the present toroids. However, the authors declined to explore this control.

      Finally, it should be noted that the CTD binding sites for ZapD should be on the outside of curved filaments, the side facing the membrane in the cell. All bound ZapD should project radially outward, and if it contacted the back side of the next filament, it should not bind (because the CTD is on the front side). The diagram second to right in Fig. 6 seems to incorporate this abortive contact.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The major result in the manuscript is the observation of higher-order structures in a cryoET reconstruction that could be used for understanding the assembly of toroid structures. The crosslinking ability of ZapD dimers results in bending of FtsZ filaments to a constant curvature. Many such short filaments are stitched together to form a toroid-like structure. The geometry of assembly of filaments - whether they form straight bundles or toroid-like structures - depends on the relative concentrations of FtsZ and ZapD.

      Strengths:

      In addition to a clear picture of the FtsZ assembly into ring-like structures, the authors have carried out basic biochemistry and biophysical techniques to assay the GTPase activity, the kinetics of assembly, and the ZapD to FtsZ ratio.

      Weaknesses:

      The discussion does not provide an overall perspective that correlates the cryoET structural organisation of filaments with the biophysical data.

      The crosslinking nature of ZapD is already established in the field. The work carried out is important to understand the ring assembly of FtsZ. However, the availability of the cryoET observations can be further analysed in detail to derive many measurements that will help validate the model, and obtain new insights.

      We thank the reviewer for these insightful comments on our work. We have edited the manuscript to resolve and clarify most of the issues raised during the review process.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors set out to better understand the mechanism by which the FtsZ-associated protein ZapD crosslinks FtsZ filaments to assemble a large-scale cytoskeletal assembly. For this aim, they use purified proteins in solution and a combination of biochemical, biophysical experiments and cryo-EM. The most significant finding of this study is the observation of FtsZ toroids that form at equimolar concentrations of the two proteins.

      Strengths:

      Many experiments in this paper confirm previous knowledge about ZapD. For example, it shows that ZapD promotes the assembly of FtsZ polymers, that ZapD bundles FtsZ filaments, that ZapD forms dimers and that it reduces FtsZ's GTPase activity. The most novel discovery is the observation of different assemblies as a function of ZapD:FtsZ ratio. In addition, using CryoEM to describe the structure of toroids and bundles, the paper provides some information about the orientation of ZapD in relation to FtsZ filaments. For example, they found that the organization of ZapD in relation to FtsZ filaments is "intrinsic heterogeneous" and that FtsZ filaments were crosslinked by ZapD dimers pointing in all directions. The authors conclude that it is this plasticity that allows for the formation of toroids and its stabilization. Unfortunately, a high-resolution structure of the protein organization was not possible. These are interesting findings that in principle deserve publication.

      We thank the reviewer for this valuable assessment. We have made several changes to the manuscript to improve its readability and comprehensibility. In addition, we have addressed the reviewer’s main concerns in the point-by-point response below.

      Weaknesses:

      While the data is convincing, their interpretation has some substantial weaknesses that the authors should address for the final version of this paper.

      We have addressed most of the aspects highlighted by the reviewer to improve the quality and comprehensibility of our results.

      For example, as the authors are the first to describe FtsZ-ZapD toroids, a discussion why this has not been observed in previous studies would be very interesting, i.e. is it due to buffer conditions, sample preparation?

      Several factors may explain the absence of observed toroidal structures in other studies. FtsZ is a highly dynamic protein, and its behavior varies significantly with different environmental conditions, as detailed in the literature. These environmental factors include pH, salt concentration, protein type, GTP levels, and the purification strategy used. Previous research has employed negative stain electron microscopy (EM) to visualize ZapD-FtsZ structures. It is important to note that FtsZ is sensitive to surface effects when it is bound to or adsorbed onto membranes (Mateos-Gil et al. 2019 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuy039). Therefore, the adsorption of FtsZ and ZapD onto the EM grid may influence the formation of higher order structures. In this study, we used cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) to visualize the 3D organization of ZapD-mediated structures. This approach allows us to avoid staining artifacts and the distortion of structures caused by adsorption or drying of the grid. In addition, we can resolve single filaments. Our buffer conditions also differ slightly from those in previous studies, which may significantly impact the behavior of FtsZ, as illustrated in Supplementary Fig. 3.

      In parts of the manuscript, the authors try a bit too hard to argue for the physiological significance of these toroids. This, however, is at least very questionable, because: The typical diameter is in the range of 0.25-1.0 μm, which requires some flexibility of the filaments to be able to accommodate this. It's difficult to see how a FtsZ-ZapD toroid, which appears to be quite rigid with a narrow size distribution of 502 nm ± 55 nm, could support cell division rather than stalling it at that cell diameter, which the authors say is similar to that of the E. coli cell.

      The toroidal structures formed by FtsZ and ZapD, with their characteristics similar to those of the bacterial division system, are significant in physiological contexts and warrant further study. The connections mediated by Zaps are expected to play a crucial role in filament organization, which is vital for the machinery enabling cellular constriction. Therefore, characterizing these structures in vitro can provide insight into divisome stabilization, assembly and constriction mechanisms. While we acknowledge the limitations of in vitro systems and do not expect to see the same toroidal structures in vivo, the way ZapD decorates and connects FtsZ filaments in vitro may resemble the processes that occur in the division ring formed inside the cell. This study represents an initial effort to characterize these toroidal structures, which could inspire further research and potentially reveal their physiological relevance.

      Regarding flexibility, it has been previously reported that an arrangement of loosely connected filaments forms the FtsZ ring. Our model is consistent with this observation despite the heterogeneity and density observed in the toroidal structures. We anticipate differences in vivo due to the high complexity of the cytoplasm, interactions with other cellular components, and attachment to the cell membrane, all of which would influence structural outcomes. However, our novel in vitro approach, which allows us to study FtsZ filament organization and connectivity – features that are challenging to explore in vivo and have not been thoroughly investigated before – has the potential to significantly advance our understanding of these structures. Consequently, these structures can aid our understanding of complex macrostructures in vivo, even if we have merely begun to scratch the surface of their characterization.

      Regarding the size of the toroids, we hypothesize that it reflects an optimal condition based on our experimental setup in solution. In vivo, these conditions are altered by interactions with various division partners, attachment to the plasma membrane, and system contraction. 

      We have reformulated and edited the manuscript to better discuss the potential physiological relevance of our toroidal structures.

      For cell division, FtsZ filaments are recruited to the membrane surface via an interaction of FtsA or ZipA with the C-terminal peptide of FtsZ. As ZapD also binds to this peptide, the question arises who wins this competition or where is ZapD when FtsZ is recruited to the membrane surface? Can such a toroidal structure of FtsZ filaments form on the membrane surface? Additional experiments would be helpful, but a more detailed discussion on how the authors think ZapD could act on membrane-bound filaments would be essential.

      We appreciate this comment, which was indeed one of our main questions. The complexity of the division system raises many questions about the interaction of FtsZ with the plasma membrane. The competition between division components to interact with FtsZ and thus modulate its behavior is still largely unknown. FtsA and ZipA appear to have a greater affinity for the C-terminal domain (CTD) of FtsZ than ZapD. However, considering all FtsZ monomers forming a filament, we expect FtsZ filaments to interact with many different division partners. The ability of FtsZ to interact with many components is necessary to explain the current model of the system. According to this model, FtsZ filaments would be decorated by many different proteins, anchoring them to the membrane while crosslinking or promoting their disassembly in a spatiotemporally controlled manner. 

      We tried experiments combining FtsA, ZipA, and ZapD on supported lipid membranes and liposomes, but they proved difficult to perform. We expect results similar to those observed for ZapA (Caldas et al. 2019 Nat Commun - DOI: 10.1038/s41467-019-13702-4). Competition between proteins for interaction with the CTD of FtsZ adds an extra layer of complexity, which makes this issue attractive to explore in the future. Nevertheless, as insightfully pointed out by Reviewer 3, our cryo-ET data of straight bundles provide new insights into how ZapD-FtsZ structures can bind to the plasma membrane. In these straight bundles, the CTDs of two parallel FtsZ filaments are oriented upwards. They can bind the plasma membrane directly or the ZapDs, which decorate the FtsZ filaments from above instead of from the side, as suggested previously (Schumacher et al. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192), allowing ZapDs to interact with the membrane.

      The authors conclude that the FtsZ filaments are dynamic, which is essential for cell division. But the evidence for dynamic FtsZ filaments within these toroids seems rather weak, as it is solely the partial reassembly after addition of GTP. As ZapD significantly slows down GTP hydrolysis, I am not sure it's obvious to make this conclusion.

      FtsZ filaments are dynamic, as they can reassemble into macrostructures relatively quickly. Decreased GTPase activity is a good indicator of the formation of lateral interactions between filaments. For instance, under crowding conditions, FtsZ also reduces its GTPase activity, although the bundles disassemble very slowly over time (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). We measured the GTPase activity during the first 5 minutes after GTP addition, conditions under which toroidal structures and bundles remain fully assembled. However, we expect GTPase activity to recover as the macrostructures disassemble, considering the reassembly of macrostructures after GTP resupply, which suggests that FtsZ filaments remain active and dynamic.

      On a similar note, on page 5 the authors claim that ZapD would transiently interact with FtsZ filaments. What is the evidence for this? They also say that this transient interaction could have a "mechanistic role in the functionality of FtsZ macrostructures." Could they elaborate?

      We have rephrased the whole paragraph in the revised version to clarify matters (page 10, lines 24-34):

      “These results are consistent with the observation that ZapD interacts with FtsZ through its central hub, which provides additional spatial freedom to connect other filaments in different conformations. This flexibility allows different filament organizations and contributes to structural heterogeneity. In addition, these results suggest that these crosslinkers can act as modulators of the dynamics of the ring structure, spacing filaments apart and allowing them to slide in an organized manner. The ability of FtsZ to treadmill directionally, together with the parallel or antiparallel arrangement of short, transiently crosslinked filaments, is considered essential for the functionality of the Z ring and its ability to exert constrictive force<sup>34,36–38,50</sup>. Thus, Zap proteins can play a critical role in ensuring correct filament placement and stabilization, which is consistent with the toroidal structure formed by ZapD.”

      The authors should also do better at putting their findings into the context of existing knowledge. For example:

      The authors observe a straightening of filament bundles with increasing ZapD concentration. This seems consistent with what was found for ZapA, but this is not explicitly discussed (Caldas et al 2019)

      We have discussed this similarity in the revised version of this manuscript (page 12, line 40 - page 13, line 8):

      “Understanding how the associative states of ZapA (as tetramers) and ZapD (as dimers), together with membrane tethering, influence the predominant structures formed in both systems is essential. The complexity of the division system raises important questions about the interaction dynamics between FtsZ and the plasma membrane. The competitive nature of the division components to engage with FtsZ and modulate its functionality remains to be thoroughly elucidated. It is important to note that FtsA and ZipA have a greater affinity for the C-terminal domain of FtsZ than ZapD. Our cryo-ET data on straight bundles provide new perspectives on how ZapD-FtsZ structures can effectively bind to the plasma membrane; in particular, the C-terminal domains of parallel FtsZ filaments are oriented upward, allowing direct membrane binding or interaction with ZapDs that reinforce these filaments from above, rather than from the side, as previously suggested.”

      A paragraph summarizing what is known about the properties of ZapD in vivo would be essential: i.e., what has been found regarding its intracellular copy number, location and dynamics?

      We thank the reviewer for this valuable suggestion. We describe the role of Zap proteins in vivo and the previous studies of ZapD in the introduction (page 2, lines 34 - page 3, line 17). Additionally, we added the estimated number of ZapD copies in the cell in the discussion (page 11, lines 2-7).

      In the introduction, the authors write that "GTP binding and hydrolysis induce a conformational change in each monomer that modifies its binding potential, enabling them to follow a treadmilling behavior". This seems inaccurate: as shown by Wagstaff et al. 2022, the conformational change of FtsZ is not associated with the nucleotide state. In addition, they write that FtsZ polymerization depends on the GTPase activity. It would be more accurate to write that polymerization depends on GTP, and disassembly on GTPase activity.

      Following the reviewer's suggestions, we have adapted and corrected these text elements as follows (page 2, lines 7-9): 

      “FtsZ undergoes treadmilling due to polymerization-dependent GTP hydrolysis, allowing the ring to exhibit its dynamic behavior.”

      On page 2 they also write that "the mechanism underlying bundling of FtsZ filaments is unknown". I would disagree, the underlying mechanism is very well known (see for example Schumacher, MA JBC 2017), but how this relates to the large-scale organization of FtsZ filaments was not clear.

      We thank the reviewer for this comment. We have corrected and clarified the related text accordingly (page 3, lines 11-12):

      “…the link between FtsZ bundling, promoted by ZapD, and the large-scale organization of FtsZ filaments remains unresolved.”

      The authors describe the toroid as a dense 3D mesh. How would this be compatible with the Z-ring and its role in cell division? I don't think this corresponds to the current model of the Z-ring (McQuillen & Xiao, 2020). Apart from the fact that it's a ring, I don't think the organization of FtsZ is obviously similar to the current model of the Z-ring in the bacterial cell, in particular because it's not obvious how FtsZ filaments can bind ZapD and membrane anchors simultaneously.

      We consider that the intrinsic characteristics of toroidal structures and the bacterial division ring have points in common. As indicated in the answer above, despite the differences and limitations that might result from an in vitro approach, the structures shown after ZapD crosslinking of FtsZ filaments can demonstrate intrinsic features occurring in vivo. The current model of the division ring consists of an arrangement of filaments loosely connected by crosslinkers in the center of the cell, forming a ring. This model is compatible with our findings, although many questions remain about the structural organization of the Z-ring in the cell.

      Reviewer 3 has brought a compelling new perspective to interpreting our cryo-ET data: ZapD decorates FtsZ from above, allowing ZapD or FtsZ to bind to the plasma membrane. We have discussed this point in more detail below. In the case of straight bundles, this favors the stacking of straight FtsZ filaments, whereas in the case of toroids, ZapD can also bind FtsZ filaments laterally and diagonally, and it is this less compact arrangement that could enable FtsZ bending and toroid size adjustment. 

      We have revised the text accordingly to incorporate the interpretation proposed by Reviewer 3 (page 12, lines 24-31):

      “The current model of the division ring consists of an array of filaments loosely connected by crosslinkers at the center of the cell, forming a ring. This model is consistent with our findings, although many questions remain regarding the structural organization of the Z ring within the cell. ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of toroid size.”

      The authors write that "most of these modulators" interact with FtsZ's CTP, but then later that ZapD is the only Zap protein that binds CTP. This seems to be inconsistent. Why not write that membrane anchors usually bind the CTP, most Zaps do not, but ZapD is the exception?

      We thank the reviewer for this pertinent suggestion, which we have followed in the revised version of the manuscript (page 2, lines 19-22):

      “Most of these modulators interact with FtsZ through its carboxy-terminal end, which modulates division assembly as a central hub.  ZapD is the only Zap protein known to crosslink FtsZ by binding its C-terminal domain, suggesting a critical Z ring structure stabilizing function.”

      I also have some comments regarding the experiments and their analysis:

      Regarding cryoET: the filaments appear like flat bands, even in the absence of ZapD, which further elongates these bands. Is this due to an anisotropic resolution? This distortion makes the conclusion that ZapD forms bi-spherical dimers unconvincing.

      The missing wedge caused by the limited angular range of the tomography data generates an elongation of the structures by a factor of 2 along the Z-axis. This feature is visible in the undecorated FtsZ filament data (Supplementary Fig. 10). The more pronounced elongation along the Z-axis observed in the presence of ZapD indicates that ZapD connects two parallel FtsZ filaments along the Z-axis (see Supplementary Figs. 8, 9 and 10). We do not have sufficient resolution to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis, but we also observed bispherical ZapDs in the XY plane (Fig. 4b-d). Unfortunately, our data do not allow for a more detailed characterization.

      The authors say that the cryoET visualization provides crucial information on the length of the filaments within this toroid. How long are they? Could the authors measure it?

      Measuring the length of single filaments is not trivial, given the dense, heterogeneous mesh promoted by ZapD crosslinking. We tried to identify and track them, but the density of filaments and connections made precise measurement very difficult. Nevertheless, we could identify the formation of these toroids by an arrangement of short filaments (Supplementary Fig. 11) instead of continuous circular filaments.

      We have removed the following sentence from the revised manuscript: “Visualization of ZapD-mediated FtsZ toroidal structures by cryo-ET provided crucial information on the 3D organization, connectivity and length of filaments within the toroid.”

      Regarding the dimerization mutant of ZapD: there is actually no direct confirmation that mZapD is monomeric. Did the authors try SEC MALS or AUC? Accordingly, the statement that dimerization is "essential" seems exaggerated (although likely true).

      Unlike the wild-type ZapD protein, the mZapD mutant exists as a mixture of monomers (~15%) and dimers, as AUC assays performed at similar protein concentrations revealed. These results demonstrate that the mutant protein has a lower tendency to form dimers than the native ZapD protein. We have included the AUC data for mZapD in the supplementary material (Supp. Fig. 15a).

      What do the authors mean that toroid formation is compatible with robust persistence length? I.e. What does robust mean? It was recently shown that FtsZ filaments are actually surprisingly flexible, which matches well the fact that the diameter of the Z-ring must continuously decrease during cell division (Dunajova et al Nature Physics 2023).

      We have corrected this sentence in the revised version of the manuscript to improve clarity (page 11, lines 9-10): 

      “The persistence length and curvature of FtsZ filaments are optimized for forming bacterial-sized ring structures.”

      The authors claim that their observations suggest "that crosslinkers ... allows filament sliding in an organized fashion". As far as I know there is no evidence of filament sliding, as FtsZ monomers in living cells and in vitro are static.

      Filament sliding may be one of the factors contributing to the force generation mechanisms involved in cell division (Nguyen et al. 2021 J Bacteriol - DOI: 10.1128/JB.00576-20). Our results indicate that ZapD can separate filaments, creating space between them and facilitating their organization.

      Although the molecular dynamics of cell constriction are not yet fully understood, it is possible that filament sliding plays a role. If this is the case, the crosslinking of short FtsZ filaments in multiple directions by ZapD could provide the necessary flexibility to adjust the diameter of the constriction ring during bacterial division.

      What is the "proto-ring FtsA protein"?

      The proto-ring denotes the first molecular assembly of the Z-ring, which in E. coli consists of FtsZ, FtsA and ZipA (see, for example, Ortiz et al. 2016 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuv040). To simplify matters, we have deleted the term “proto-ring” in the revised version of the MS.

      The authors refer to "increasing evidence" for "alternative network remodeling mechanisms that do not rely on chemical energy consumption as those in which entropic forces act through diffusible crosslinkers, similar to ZapD and FtsZ polymers." A reference should be given; I assume the authors refer to the study by Lansky et al 2015 of PRC on microtubules. However, I am not sure how the authors reached the conclusion that this applies to FtsZ and ZapD. On which evidence is this assumption based?

      We refer to cytoskeletal network remodeling mechanisms independent of chemical energy consumption (Braun et al. 2016 Bioessays - DOI: 10.1002/bies.201500183) driven by entropic forces induced by macromolecular crowding agents or diffusible crosslinkers. The latter mechanism leads to an increase in filament overlap length and the contraction of filament networks. These mechanisms complement and act in synergy with energy-consuming processes (such as those involving nucleotide hydrolysis) to modulate actin- and microtubule-based cytoskeleton remodeling. Similarly, crosslinking proteins such as ZapD may contribute to remodeling the FtsZ division ring in the cell. 

      We have revised the corresponding text of the manuscript accordingly (page 13, lines 16-24):  “In addition, our findings could greatly enhance the understanding of how polymeric cytoskeletal networks are remodeled during essential cellular processes such as cell motility and morphogenesis. Although conventional wisdom points to molecular motors as the primary drivers of filament remodeling through energy consumption, there is increasing evidence that there are alternative mechanisms that do not rely on such energy, instead harnessing entropic forces via diffusible crosslinkers. This approach may also be applicable to ZapD and FtsZ polymers, suggesting a promising avenue for optimizing conditions in the reverse engineering of the division ring to enhance force generation in minimally reconstituted systems aimed at achieving autonomous cell division.”

      Some inconsistencies in supplementary figure 3: The normalized absorbances in panel a do not seem to agree with the absolute absorbance shown in panel e, i.e. compare maximum intensity for ZapD = 20 µM and 5 µM in both panels.

      We have corrected these inconsistencies in the revised version.

      It's not obvious to me why the structure formed by ZapD and FtsZ disassembles after some time even before GTP is exhausted. Can the authors explain? As the structures disassemble, how is the "steady-state turbidity" defined? Do the structures also disassemble when they use a non-hydrolyzable analog of GTP?

      In the presence of ZapD, FtsZ rapidly forms higher order polymers after the addition of GTP, as shown by turbidity assays at 320 nm (the formation of single- or double-stranded FtsZ filaments in the absence of ZapD does not produce a significant increase in turbidity). Macrostructures formed by FtsZ in the presence of ZapD, while more stable than FtsZ filaments (which rapidly disassemble following GTP consumption), are also dynamic. These assembly reactions are GTP-dependent and considerably modify polymer dynamics. In agreement with our results, previous studies have shown that high concentrations of macromolecular crowders (such as Ficoll or dextran) promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In this case, FtsZ GTPase activity was significantly retarded compared with FtsZ filaments, resulting in a decrease in GTPase turnover. Similar mechanisms may apply to assembly reactions in the presence of ZapD.

      Parallel assembly studies replacing GTP with a slowly hydrolyzable GTP analog remain pending. We expect ZapD-containing FtsZ macrostructures to remain assembled for longer but still disassemble upon GTP consumption, as occurs with the crowding-induced FtsZ polymer networks formed in the presence of nucleotide analogs.

      Accordingly, we have revised the corresponding text to clarify matters (page 4, line 37 – page 5 line 7). 

      Conclusion: Despite some weaknesses in the interpretation of their findings, I think this paper will likely motivate other structural studies on large scale assemblies of FtsZ filaments and its associated proteins. A systematic comparison of the effects of ZapA, ZapC and ZapD and how their different modes of filament crosslinking can result in different filament networks will be very useful to understand their individual roles and possible synergistic behavior.

      We appreciate the reviewer's remarks and comments, which provided us with valuable information and helped us considerably improve the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The authors provide the first image analysis by cryoET of toroids assembled by FtsZ crosslinked by ZapD. Previously toroids of FtsZ alone have been imaged only in projection by negative stain EM. The authors attempt to distinguish ZapD crosslinks from the underlying FtsZ filaments. I did not find this distinction convincing, especially because it seems inconsistent with the 1:1 stoichiometry demonstrated by pelleting. I was intrigued by one image showing straight filament pairs, which may suggest a new model for how ZapD crosslinks FtsZ filaments.

      We thank the reviewer for these valuable comments, to which we have responded in detail below. 

      Strengths:

      (1) The first image analysis of FtsZ toroids by cryoET.

      (2) The images are accompanied by pelleting assays that convincingly establish a 1:1 stoichiometry of FtsZ:ZapD subunits.

      (3) Fig. 5 shows an image of a pair of FtsZ filaments crosslinked by ZapD. This seems to have higher resolution than the toroids. Importantly, it suggests a new model for the structure of FtsZ-ZapD that resolves previously unrecognized conflicts. (This is discussed below under weaknesses, because it is so far only supported by a single image.)

      We thank the reviewer for this assessment and, in particular, for raising point 3, which provided a new perspective on the interpretation of our data. We have also included a new example of a straight bundle in Supplementary Fig. 13.

      Weaknesses:

      This paper reports a study by cryoEM of polymers and bundles assembled from FtsZ plus ZapD. Although previous studies by other labs have focused on straight bundles of filaments, the present study found toroids mixed with these straight bundles, and they focused most of their study on the toroids. In the toroids they attempt to delineate FtsZ filaments and ZapD crosslinks. A major problem here is with the stoichiometry. Their pelleting assays convincingly established a stoichiometry of 1:1, while the mass densities identified as ZapD are sparse and apparently well below the number of FtsZ (FtsZ subunits are not resolved in the reconstructions, but the continuous sheets or belts seem to have a lot more mass than the identified crosslinks.)  

      Apart from the stoichiometry I don't find the identification of crosslinks to be convincing. It is missing an important control - cryoET of toroids assembled from pure FtsZ, without ZapD.

      However, if I ignore these and jump to Fig. 5, I think there is an important discovery that resolves controversies in the present study as well as previous ones, controversies that were not even recognized. The controversy is illustrated by the Schumacher 2017 model (their Fig. 7), which is repeated in a simplified version in Fig. 1a of the present ms. That model has two FtsZ filaments in a plane facing ZapD dimers which bridge them. In this planar model, the C-terminal linker and the ctd of FtsZ that binds ZapD face each other, with the ZapD in the middle. The contradiction arises because the C-terminus needs to face the membrane in order to attach and generate a bending force. The two FtsZ filaments in the planar model are facing 90° away from the membrane. A related contradiction is that Houseman et al 2016 showed that curved FtsZ filaments have the C terminus on the outside of the curve. In a toroid the C termini should all be facing the outside. If the paired filaments had the C termini facing each other, they could not form a toroid because the two FtsZ filaments would be bending in opposite directions.

      Fig. 5 of the present ms seems to resolve this by showing that the two FtsZ filaments and ZapD are not planar, but stacked. The two FtsZ filaments have their C termini facing the same direction, let's say up, toward the membrane, and ZapD binds on top, bridging the two. The spacing of the ctd binding sites on the ZapD dimer is 6.5 nm, which would fit the ~8 nm width of the paired filament complex observed in the present cryoEM (Fig S13). In the Schumacher model the width would be about 20 nm. Importantly, the stack model has the ctd of each filament facing the same direction, so the paired filaments could attach to the membrane and bend together (using ctd's not bound by ZapD). Finally, the new arrangement would also provide an easy way for the complex to extend from a pair of filaments to a sheet of three or four or more. A problem with this new model from Fig. 5 is that it is supported by only a single example of the paired FtsZ-ZapD complex. If this is to be the basis of the interpretation, more examples should be shown. Maybe examples could be found with three or four FtsZ filaments in a sheet.

      We thank the reviewer for asking interesting questions and suggesting a compelling model for how ZapD could bind FtsZ filaments. Cryo-ET of straight bundles revealed that high ZapD density promotes vertical stacking of FtsZ filaments and decoration of FtsZ filaments by ZapD from above. In toroids, FtsZ filaments are vertically decorated by ZapD, which explains the high elongation of the filament structures observed, consisting of FtsZ-ZapD(-FtsZ) units. In addition, we observed a high abundance of diagonal connections between FtsZ filaments of different heights, revealing a certain flexibility/malleability of ZapD to link filaments that are not perfectly aligned vertically. This configuration could give rise to curved filaments and the overall toroid structure.

      The manuscript proposes that ZapD can bind FtsZ filaments in different directions. However, it seems to have a certain tendency to bind to the upper part of FtsZ filaments, stacking them vertically or vertically with a lateral shift (Supplementary Fig. 9). We also observe lateral connections, although the features of the toroidal structures limit their visualization. This enables both the binding to the membrane by ZapD or FtsZ and the formation of higher order FtsZ polymer structures. 

      In summary, ZapD is capable of linking FtsZ filaments in multiple directions, including from the upper part of the filaments as well as laterally or diagonally. At high concentrations of ZapD, the filaments become more compactly arranged, primarily stacking vertically, which results in the loss of curvature. In contrast, at lower concentrations of ZapD, the FtsZ filaments are less tightly packed, leading to curved filaments and an overall toroidal structure that may resemble the in vivo ring structures.

      We have edited our manuscript to accommodate this hypothesis, including the abstract and the cryoET section (page 7, lines 5-16): 

      “The isosurface confirmed the presence of extended structures along the Z-axis, well beyond the elongation expected from the missing wedge effect for single FtsZ filaments (for comparison, see Supplementary Fig. 10). The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.

      These results suggest that the toroids are constructed and stabilized by interactions between ZapD and FtsZ, which are mainly formed along the Z-axis but also laterally and diagonally.”

      Page 7, lines 40-42: 

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short FtsZ filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature.”

      And in the discussion (page 12, lines 27-31): 

      “ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of the toroid size.”

      What then should be done with the toroids? I am not convinced by the identification of ZapD as "connectors." I think it is likely that the ZapD is part of the belts that I discuss below, although the relative location of ZapD in the belts is not resolved. It is likely that the resolution in the toroid reconstructions of Fig. 4, S8,9 is less than that of the isolated pf pair in Fig. 5c.

      We agree with the reviewer's interpretation that ZapD can attach to FtsZ filaments from both above and laterally. The data from the straight bundles, which are more clearly resolved due to their thinner structure, demonstrate that ZapD can decorate FtsZ filaments vertically. Additionally, the toroidal data supports the notion that ZapD can act as a crosslinker between filaments that are not perfectly vertical, allowing for lateral offsets (see, for example, Fig. 4d) or lateral connections (Fig. 4b). 

      We recognize that the resolution and high density of structures in our cryo-ET data make it challenging to accurately annotate proteins or connectors. Despite this difficulty, we have made efforts to label and identify the ZapD proteins and connectors. We employed an arbitrary labeling method to assist with visual interpretation. However, we acknowledge that some errors may exist and that ZapD proteins were not labeled, particularly along the Z-axis, where the missing wedge limits our ability to distinguish between ZapD and FtsZ proteins (page 7, lines 8-13):

      “The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.”

      We draw attention to the limitation of our manual segmentation in the text as follows (page 7, lines 20-24):

      “We manually labeled the connecting densities in the toroid isosurfaces to analyze their arrangement and connectivity with the FtsZ filaments. The high density of the toroids and the wide variety of conformations of these densities prevented the use of subtomogram averaging to resolve their structure and spatial arrangement within the toroids.”

      Importantly, if the authors want to pursue the location of ZapD in toroids, I suggest they need to compare their ZapD-containing toroids with toroids lacking ZapD. Popp et al 2009 have determined a variety of solution conditions that favor the assembly of toroids by FtsZ with no added protein crosslinker. It would be very interesting to investigate the structure of these toroids by the present cryoEM methods, and compare them to the FtsZ-ZapD toroids. I suspect that the belts seen in the ZapD toroids will not be found in the pure FtsZ toroids, confirming that their structure is generated by ZapD.

      The only reported toroidal structures of E. coli FtsZ are those described by Popp et al. (2009 Biopolymers – DOI: 10.1002/bip.21136). It is important to note that methylcellulose (MC) must be added to the working solution to induce the formation of these structures, as FtsZ toroids do not form in the absence of MC. The mechanisms by which MC promotes this assembly process go beyond mere excluded volume effects due to crowding, as the concentration of MC used is very low (less than 1 mg/ml), which is below the typical crowding regime. This suggests that there are additional interactions between MC and FtsZ. Such complexities and secondary interactions prevent the use of this system as a reliable control for the FtsZ toroidal structures reported here. Alternatively, we also considered the toroidal structures of FtsZ from Bacillus subtilis (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) and the cyanobacterium Synechocystis (Wang et al. 2019 J Biol Chem – DOI: 10.1074/jbc.RA118.005200). However, these structures do not serve as appropriate controls due to the structural and molecular differences between these FtsZ proteins.

      Recommendations for the authors:  

      Reviewing Editor:

      While the three referees recognize and appreciate the importance of this work, several technical and interpretational questions have been raised. There was a prolonged discussion amongst the three expert referees, and it was felt that the current version suffers from a number of problems that the authors need to consider. These concern (1) the stoichiometry of ZapD-FtsZ, (2) the evidence for crosslinks, (3) how the cryo-ET data correlate with the biophysical data, and (4) the physiological relevance of the elucidated structures. Please take note of the public reviews (strengths and weaknesses) as well as the "Recommendations to the authors" sections below, if you choose to prepare a revision.

      In reading the reviews very carefully (as well as while following the ensuing robust discussion between the referees) I noticed that all points raised are extremely important to address or reconcile (with experiments and/or discussion) for this study to become an outstanding contribution to the bacterial cell biology field. I would therefore urge you to consider these carefully and revise the manuscript accordingly.

      We thank the editorial board and reviewers for their excellent work evaluating and reviewing our manuscript. Their constructive suggestions and comments have been taken into account in preparing the revised version. We have paid particular attention to the four points mentioned above by the reviewing editor. We hope that the new version and this point-by-point rebuttal letter will answer most of the questions and weaknesses raised by the reviewers.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement of the manuscript:

      (1) ZapD to FtsZ ratio:

      i) Page 3: Results section, paragraph 1:

      FtsZ to ZapD shows a 1:2 ratio. How does this explain cross linking by a dimeric species, as this will be equivalent to a 1:1 ratio of FtsZ and ZapD? The crystal structure in the reference cited has FtsZ peptide bound only to one side of the dimer, however a crosslinking effect can happen only if FtsZ binds to both protomers of ZapD dimer. If the decoration is not uniform as given in the toroid model based on cryoET, this should lead to a model with excess of FtsZ in the toroid?

      On page 3 of the original manuscript, we stated that the binding stoichiometry of ZapD to FtsZ was 2:1, based on estimates derived from sedimentation velocity experiments involving the unassembled GDP form of FtsZ. However, upon reanalyzing these experiments, we found that the previous characterization of the association mode was overly simplistic. We determined that there are two predominant molecular species of ZapD:FtsZ complexes in solution, which correspond to ZapD dimers bound to either one or two FtsZ monomers, resulting in stoichiometries of 2:1 and 1:1, respectively. The revised binding stoichiometry data for ZapD and GDP-FtsZ suggest the presence of 1:1 ZapD-FtsZ complexes, which is consistent with the idea that FtsZ polymers can be crosslinked by dimeric ZapD species. In mixtures where ZapD is present in excess over FtsZ, the crosslinking corresponds to 1:1 binding stoichiometries, leading to the formation of straight macrostructures. Conversely, when the concentration of ZapD is reduced in the reaction mixture, the resulting macrostructures take the form of toroids. In this scenario, there is an excess of FtsZ because only some of the FtsZ molecules within the polymers are crosslinked by ZapD dimers, resulting in a binding stoichiometry of approximately 0.4 ZapD molecules per FtsZ, as quantified by differential sedimentation experiments.

      We have rewritten the corresponding texts in the revised version to explain these matters (page 4 lines 14-18):

      “Sedimentation velocity analysis of mixtures of the two proteins revealed the presence of two predominant molecular species of ZapD:FtsZ complexes in solution. These complexes are compatible with ZapD dimers bound to one or two FtsZ monomers, corresponding to ZapD:FtsZ stoichiometries of 2:1 and 1:1, respectively (Supplementary Fig. 1a (III-IV)). This observation is consistent with the proposed interaction model.”
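
      The arithmetic behind these stoichiometries can be sketched as follows. This is only an illustrative calculation, not part of the sedimentation analysis itself: the function names are hypothetical, and the second function assumes that each bound ZapD monomer occupies exactly one FtsZ C-terminal site, which the data above do not establish directly.

```python
from fractions import Fraction


def zapd_to_ftsz_ratio(ftsz_per_dimer: int) -> Fraction:
    """ZapD:FtsZ ratio (monomer:monomer) for one ZapD dimer bound to
    `ftsz_per_dimer` FtsZ monomers."""
    if ftsz_per_dimer not in (1, 2):
        raise ValueError("a ZapD dimer is taken to bind one or two FtsZ monomers")
    # A dimer always contributes two ZapD monomers to the complex.
    return Fraction(2, ftsz_per_dimer)


def bound_ftsz_fraction(zapd_per_ftsz: float) -> float:
    """Fraction of FtsZ monomers carrying a ZapD monomer.

    Assumption (hypothetical): every ZapD monomer in the pellet is bound to
    exactly one FtsZ monomer, so the fraction is capped at 1.0.
    """
    if zapd_per_ftsz < 0:
        raise ValueError("stoichiometry cannot be negative")
    return min(zapd_per_ftsz, 1.0)


if __name__ == "__main__":
    # Dimer bound to one FtsZ monomer: the 2:1 species.
    print("one FtsZ per dimer :", zapd_to_ftsz_ratio(1))
    # Dimer bound to two FtsZ monomers: the 1:1 species.
    print("two FtsZ per dimer :", zapd_to_ftsz_ratio(2))
    # Toroid regime, about 0.4 ZapD per FtsZ from differential sedimentation.
    print("bound FtsZ fraction:", bound_ftsz_fraction(0.4))
```

      Under that assumption, a value of about 0.4 ZapD per FtsZ would leave roughly 60% of the FtsZ C-terminal sites in the toroids unoccupied by ZapD, in line with the excess of FtsZ described above.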

      ii) How does 40 - 80 uM of ZapD correspond to a molar ratio of approximately 6?

      It was a typo from previous versions. We have corrected it in the revised version. 

      iii) The ratios of ZapD to FtsZ are different when described later in page 4 in the context of the toroid. Are these ratios relevant compared to the contradicting ratios mentioned later in page 4?

      To clarify issues related to the binding of ZapD to FtsZ, we have rewritten the sections on ZapD binding stoichiometries to both FtsZ-GDP and FtsZ polymers in the presence of GTP (see page 4 lines 14-18 and page 5 lines 15-26).

      iv) Supplementary Figure 5:

      In the representative gel shown, the amount of ZapD in the pellet does not appear to double between the 10 and 30 uM concentrations. However, the estimated amount in the plot shown in panel (c) appears to indicate that ZapD has approximately doubled at 30 uM compared to 10 uM. Please re-check the quantification.

      Without prior staining calibration of the gels, there is no simple quantitative relationship between gel band intensities after Coomassie staining and the amount of protein in a band (Darawshe et al. 1993 Anal Biochem - DOI: 10.1006/abio.1993.1581). This precludes a quantitative comparison of pelleting / SDS-PAGE data and analytical sedimentation measurements.

      v) How can a consistent ratio being maintained be explained in an irregular structure of the toroid? The number of ZapD should be much less compared to FtsZ according to the model.

      See answers to points i) and iii)

      (2) GTPase activity and assembly/disassembly of toroids:

      i) Page 3, Results section: last paragraph:

      What is the explanation or hypothesis for decrease in GTPase activity upon ZapD binding? Given that FtsZ core is not involved in the interaction of the higher order assemblies, what is the probable reason on decrease in GTPase activity upon ZapA binding?

      Excluded volume effects caused by macromolecular crowding, such as high concentrations of Ficoll or dextran, promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In these conditions, FtsZ GTPase activity is significantly slowed down compared to the activity observed in FtsZ filaments formed without crowding, leading to a decreased GTPase turnover rate. Similar mechanisms may also apply to assembly reactions in the presence of ZapD (see, for example, Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12).

      ii) How is the decrease in GTPase activity compatible with dynamics of disassembly? Please substantiate on why disassembly is linked to transient interaction with ZapD. Shouldn't disassembly and transient interaction be linked to recovery of GTPase activity rates? 

      iii) Does the decrease in GTPase activity imply a reduced turnover of disassembly of FtsZ to monomers? Hence, how is the reduction in turbidity related to the decrease in GTPase activity? How does the GTPase activity change with time? iv) How can the decrease in GTPase activity with increasing ZapD be explained?

      We conducted GTPase activity assays within the first two minutes following GTP addition, a timeframe that promotes bundle formation. Previous studies, such as those by Durand-Heredia et al. (2012 J Bacteriol - DOI: 10.1128/JB.00176-12), have also indicated a reduction in GTPase activity during the initial moments of bundling. The reviewer’s suggestion that GTPase activity should recover after the disassembly of toroids is valid and warrants further investigation. To test this hypothesis, measuring GTPase activity over extended periods would be necessary. When comparing FtsZ filaments observed in vitro, we found that ZapD-containing FtsZ bundles exhibit decreased GTPase activity. Although we did not measure it directly, we anticipate a reduction in the rate of GTP exchange within the polymer, similar to the behavior of FtsZ bundles formed in the presence of crowders (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200), which also display a delay in GTPase activity. High levels of ZapD enhance bundling, which may explain the decrease in GTPase activity as ZapD levels increase.

      (3) Treadmilling and FtsZ filament organisation:

      If the FtsZ filaments are cross linked antiparallel, how can tread milling behaviour be explained? Doesn't tread milling imply a directionality of filament orientations in the FtsZ bundles?

      Our model can only suggest filament alignment. The latter is compatible with parallel and antiparallel filament organization.

      The correlation between observed effects on GTPase activity, treadmilling and ZapD interaction will provide an interesting insight to the model.

      Establishing a detailed correlation among these three factors could yield valuable insights into the mechanisms and potential physiological implications of the structural organization of FtsZ polymers influenced by crosslinking proteins and ZapD. To precisely characterize these interactions, further time-resolved assays in solution and reconstituted systems would be necessary, which is beyond the scope of this study.

      (4) Toroid dimensions and intrinsic curvature:

      i) Page 4: What is the correlation between the toroid dimensions and the intrinsic curvature of the FtsZ filaments? Given the thickness of ~ 127 nm, please provide an explanation of how the intrinsic curvature of FtsZ is compatible with both the inner and outer diameters of 500 nm and 380 nm.

      We added a paragraph for clarification (page 6, lines 20-24):

      “Previous studies have shown different FtsZ structures at different concentrations and buffer conditions. FtsZ filaments are flexible and can generate different curvatures ranging from mini rings of ~24 nm to intermediate circular filaments of ~300 nm or toroids of ~500 nm in diameter (reviewed in Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5, and Wang et al. 2019 J Biol Chem - DOI: 10.1074/jbc.RA119.009621). It is reasonable to assume that FtsZ filaments can accommodate the toroid shape promoted by ZapD crosslinking.”

      ii) For the curvature of FtsZ filaments to be similar, the length of the filaments in the inner circles of the toroid have to be smaller than those in the outer circles? Is this true? Or are the FtsZ filaments of uniform length throughout?

      Due to the limitations in the resolution of the toroidal structure, we could not accurately measure the length or curvature of the filaments. Considering the FtsZ flexibility, these filaments may exhibit various curvatures and lengths, as previously mentioned.

      iii) Is the ZapD density uniform throughout the inner and outer regions of the toroid?

      The heterogeneity found in the structures suggests a difference in ZapD binding densities; however, we lack quantitative data to confirm this. The outer regions are likely more exposed to the attachment of free ZapDs in the surrounding environment, which leads to the recruitment of more ZapDs and the formation of straight bundles. Supplementary Fig. 7b (right) features a zoomed-in image of a toroid adorned with globular densities in the outer areas, which may correspond to ZapD oligomers. Similar characteristics appear in the straight filaments illustrated in the panels of this figure. However, these features are absent or present in significantly lower quantities in toroids with a 1:1 ratio and toroids formed under a 1:6 ratio, suggesting that the external decoration is due to ZapD saturation. Unfortunately, we cannot provide further details on the characteristics of these protein associations.

      (5) Regular arrangement and toroid structure:

      i) Page 4: last section, first sentence: What is meant by 'regular' arrangement here? The word regular will imply a periodicity, which is not a feature of the bundles.

      We have rephrased the sentence in the revised manuscript as follows (page 5, lines 35-36): “Previous studies have visualized bundles with similar features using negative-stain transmission electron microscopy.”

      ii) Similarly, page 6 first sentence mentions about a conserved toroid structure. Which aspects of the toroid structure are conserved and what are the other toroids that are compared with?

      We noted several features that are conserved in the ZapD-mediated toroidal structures, including their diameter, thickness, height, and roundness, as shown in Fig. 2d-e and Supplementary Fig. 6b-c. However, the internal organization of the toroid does not exhibit a periodic or regular structure. We have rephrased this to say: “…resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.” (page 7, lines 42-43).

      iii) Discussion, para 1, last sentence: How is the toroid structural correlated with the bacterial cell FtsZ ring? What do the authors mean by 'structural compatibility' with the ring?

      The toroidal structures described in this work are consistent with the intermediate curved conformation of FtsZ polymers observed more generally across bacterial species and are likely to be part of the FtsZ structure responsible for constriction-force generation (Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5). In the case of E. coli, if we assume an average of around 5000 FtsZ monomers in the polymeric form (two-thirds of the total found in dividing cells), this number of FtsZ molecules would be enough to encircle the cell around 6-8 times (considering the axial spacing between FtsZ monomers and the cell perimeter), which would be compatible with the structure adopting the form of a discontinuous toroidal assembly. 

      The term “structural compatibility” could be confusing, so we have removed it from the revised text. 
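      The encirclement estimate in the response to point iii) above can be reproduced with a short calculation. The sketch below is illustrative only: the axial spacing of about 4.3 nm per FtsZ monomer and the cell diameters of 0.9 to 1.1 µm are assumptions that the response does not state, and the function name is hypothetical. Only the figure of 5000 polymerized monomers comes from the text.

```python
import math

def encirclements(n_monomers, spacing_nm, cell_diameter_nm):
    """How many times a single-file FtsZ protofilament would wrap the cell."""
    filament_length_nm = n_monomers * spacing_nm   # end-to-end polymer length
    perimeter_nm = math.pi * cell_diameter_nm      # circumference at midcell
    return filament_length_nm / perimeter_nm

N_POLYMER = 5000     # FtsZ monomers in polymeric form (from the response)
SPACING_NM = 4.3     # ASSUMED axial spacing per monomer, not given in the response

# ASSUMED range of E. coli cell diameters, in nm
for diameter_nm in (900, 1000, 1100):
    turns = encirclements(N_POLYMER, SPACING_NM, diameter_nm)
    print(f"diameter {diameter_nm} nm -> {turns:.1f} turns")
```

      Under these assumed values the result falls between roughly 6 and 8 turns, in line with the range quoted in the response. A different spacing or cell diameter would shift the estimate proportionally.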

      iv) Discussion, para 2:

      Resemblance with the division ring in bacterial cells is mentioned in paragraph 2, however the features that are compared to claim resemblance comes later in the discussion. It will be helpful to rearrange the sections so that these are presented together.

      We have reorganized the sections following the reviewer’s suggestion.

      (6) CryoET of toroid and interpretation of the tomogram:

      i) Supplementary figure 10: It is not convincing that the indicated densities correspond to ZapD. Is the resolution and the quality of the tomogram sufficient to comment on the localisation of ZapD? It is challenging to see any interpretable difference between FtsZ filament dimers in 10a vs FtsZ+ZapD in panel (b).

      We acknowledge that localizing ZapDs in the structure is a challenge due to the limited resolution of the cryo-ET data (page 7, lines 11-13, 21-24). We have manually labeled putative ZapDs in the data and have done our best to identify the structures reasonably while recognizing the limitations of the segmentation. We use different colors to guide the eye without clearly stating what is or is not a ZapD. However, filaments found in 1:1 and 1:6 ratio toroids have a clear difference in thickness to those observed in the absence of ZapD. The filaments in 1:0 ratio toroids provide a reasonable control for elongation due to the missing wedge and allow us to attribute the extra filament thickness to ZapD densities confidently (page 7, lines 5-12).

      ii) How is it quantified that the elongation in Z is beyond the missing wedge effect? Please include the explanation for this in the methods or the relevant data as Supplementary figure panels.

      The missing wedge effect causes an elongation by a factor of 2 along the Z-axis. This elongation is evident in the filaments of the 1:0 ratio toroids. By comparison, the elongation in the filaments of the 1:1 and 1:6 ratio toroids exceeds that attributable to the missing wedge effect. We have also added this information to the methods section (page 17, lines 31-33).

      iii) Segmentation analysis of the tomogram and many method details of analysis and interpretation of the tomography data has not been described. This is essential to understand the reliability of the interpretation of the tomography data.

      We provided thresholds for volume extraction as isosurfaces and clarified how the putative ZapDs are colored in the revised methods section (page 17, lines 24-30). However, we could not perform quantitative analysis of the segmented structures.

      (7) Quantification of structural features of the toroid:

      i) Page 5 last sentence mentions that it provides crucial information on the connectivity and length of the filaments. Is it possible to show a quantification of these features in the toroid models?

      Based on our data, we hypothesize that ZapD crosslinks filaments by creating a network of short filaments rather than long ones. These short filaments assemble to form a complete ring. However, the current resolution of the data precludes precise quantification of this process.

      In the revised version, we have changed this last sentence to put the emphasis on the crosslinking geometry instead (page 7, lines 40-43):

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short ZapD filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature and resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.”

      ii) In toroids with increasing concentrations, will it be possible to quantify the number of blobs which have been interpreted as ZapD? Is this consistent with the data of FtsZ to ZapD ratios?

      These quantifications would assist in interpreting the data. However, due to the limited resolution of the data, we are reluctant to provide estimates.

      iii) What is the average length of the filaments in the toroid? Can this be quantified from the tomography data? Similarly, can there be an estimation of curvature of the filaments from the data?

      Unfortunately, the complexity of the toroidal structure and the limited resolution we achieved prevent us from providing accurate quantification. We attempted to track and measure the length of the filaments, but this proved challenging due to the high concentration of connections. Regarding curvature, the arrangement of the filaments into toroids makes it difficult to measure the curvature of each filament. Additionally, the filaments are not perfectly aligned, which suggests that there may be various curvatures present.

      iv) What is the average distance between the FtsZ filaments in the toroid? Does this correlate with the ZapD dimensions, when a model has been interpreted as ZapD?

      We measured the spacing (not the center-to-center distance) between filaments in the toroids and showed this in Supplementary Fig. 14b (sky blue). We observed that the distances are very similar to those found for straight bundles (light blue), with a slightly greater variability. We should point out here that the distances were measured in the XY plane to simplify the measurements.

      v) What is the estimate of average inter-filament distances within the toroid? (Similar data as in Figure 13 for bundles?) When the distance between filaments is less, is the angle between ZapD and FtsZ filament axis different from 90 degrees? This might help in validation of interpretation of some of the blobs as ZapD.

      The distances between the filaments presented in Supplementary Figure 14b include those for toroids (1:1 ratio, represented in sky blue) and straight bundles (1:6 ratio, shown in light blue). We focused solely on the distance between filaments in the XY plane and did not differentiate based on the connection angle. Although the distance may vary with changes in the angles between filaments, our data does not permit us to make any quantitative measurements regarding these variations.

      vi) How does the inter filament distance in the toroids compare with the dimensions of ZapD dimers, in the toroids and bundles? Is there a role played by the FtsZ linker in deciding the spacing?

      The dimension of a ZapD dimer is ~7 nm along the longest axis. Huecas et al. (2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) estimated an interfilament distance of ~6.5-6.7 nm for toroids of FtsZ from Bacillus subtilis. These authors also observed a difference in this spacing as a function of the linker, assuming that linker length would modulate FtsZ-FtsZ interactions. We observe a similar spacing for double filaments (5.9 ± 0.8 nm) and a longer spacing in the presence of ZapD (7.88 ± 2.1 nm). Previous studies with ZapD did not measure the distance between filaments but hypothesized that distances of 6-12 nm are allowed based on the structure of the protein (Schumacher M. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192). Longer linkers may also provide additional freedom to spread the filaments further apart and facilitate a higher degree of variability in the connections by ZapD. This discussion has been included in the revised text (page 6, lines 10-18).

      (8) Crosslinking by ZapD and toroid reorganisation by transient interactions:

      i) Page 5, paragraph 2: Presence of putative ZapD decorating a single FtsZ': When ZapD is interacting with 2 FtsZ monomers within the same protofilament, it does not have any more valency to crosslink filaments. How do the authors propose that this can connect nearby filaments?

      We thank the reviewer for raising this interesting question. We see examples of ZapD dimers binding a filament through only one of the monomers, occupying one valency of the interaction and leaving one of the monomers available for another binding. We expect to see higher densities of ZapD in the outer regions of toroids simply because there are no longer (or not as frequent) FtsZ filaments available to be attached and join the overall toroid structure. Assuming that a ZapD dimer could bind the same FtsZ filament, this region would not be able to connect to other nearby filaments via these interactions.

      ii) Page 5: How are the authors coming up with the proposal of a reorganisation of toroid structures to a bundle? Given the extensive cross linking, a transition from a toroid to a bundle has to be a cooperative process and may not be driven by transient interactions. I would imagine that the higher concentration of ZapD will directly result in straight bundles because of the increased binding events of a dimer to one filament.

      Theoretically, this is correct. A certain degree of cooperativity linked to multivalent interactions would also favor the establishment of other ZapD connections. Furthermore, the formation of these structures occurs relatively quickly, within the first two minutes following the addition of GTP. We observed various intermediate structures, ranging from sparse filament bundles to toroids and straight filaments. However, the limited data prevents us from proposing a model that eventually explains the formation of higher-order structures over time.

      iii) Given such a highly cross-linked mesh, how can you justify transient interactions and loss of ZapD leading to disassembly? The possibility that ZapD can diffuse out of such a network seems impossible. Hence, what is the significance of a transient interaction? What is the basis of calling the interactions transient?

      We have noted that the term “transient” used to define the interaction between ZapD and FtsZ seems to generate confusion. Therefore, we have decided to replace this term to improve the readability of our manuscript, which has been edited accordingly.

      iv) Does the spacing between ZapD connections decide the curvature of the toroid?

      The FtsZ linker connected to ZapD molecules could modulate filament spacing and curvature, as previously suggested (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046; Sundararajan and Goley 2017 J Biol Chem - DOI: 10.1074/jbc.M117.809939, and Sundararajan et al. 2018 Mol Microbiol - DOI: 10.1111/mmi.14081). In our structures, we observe a mixture of curvatures in the internal organization of the toroid. Despite the flexibility of FtsZ, filaments have a preferred curvature that FtsZ would initially determine. However, the amount of ZapD connections will eventually force the filament structure to adapt and align with neighboring filaments, facilitating connections with more ZapDs. Thus, the binding density of ZapD molecules significantly impacts FtsZ curvature rather than the ZapD connections themselves. However, the molecular mechanism describing the link between ZapD binding and polymer curvature remains unsolved.

      v) What is the difference in conditions between supplementary figure 6 and 12? Why is it that toroids are not observed in 12, for the same ratios?

      Both figures show images of samples under the same conditions. At high ZapD concentrations in the sample, we observe a mixture of structures, including single filaments, bundles, toroids, and straight bundles. In Supplementary Fig. 6, we have selected images of toroids, while in Supplementary Fig. 12, we have focused on single and double filaments. We aim to compare similar structures at different ZapD concentrations.

      (9) Correlation with in vivo observations:

      What is the approximate ratio of ZapD to FtsZ concentrations in the cell? In this context, within a cell which one - a toroid or bundle - will be preferred?

      Previous studies have estimated that E. coli cells contain approximately 5,000 to 15,000 FtsZ protein molecules, resulting in a concentration of around 3 to 10 µM (Rueda et al. 2003 J Bacteriol - DOI: 10.1128/JB.185.11.3344-3351.2003). Furthermore, only about two-thirds of these FtsZ molecules participate in forming the division ring (Stricker et al. 2002 PNAS - DOI: 10.1073/pnas.052595099). In contrast, ZapD is a low-abundance protein, with only around 500 molecules per cell (Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12), making it a relatively small fraction compared to the FtsZ molecules. Under these circumstances, toroidal structures are more likely to form than straight bundles, as the latter would require significantly higher concentrations of ZapD for proper assembly. We have added these considerations in the revised text (page 11, lines 1-7).
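      The copy numbers quoted above can be converted to concentrations and to a ZapD:FtsZ ratio with a few lines of arithmetic. This is a rough sketch rather than the calculation used in the cited studies: the cell volume of 2.5 fL is an assumption, chosen only because it reproduces the quoted 3 to 10 µM range, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_to_micromolar(n_molecules, cell_volume_fl):
    """Convert a per-cell copy number to a micromolar concentration."""
    volume_litres = cell_volume_fl * 1e-15          # 1 fL = 1e-15 L
    molar = n_molecules / (AVOGADRO * volume_litres)
    return molar * 1e6

CELL_VOLUME_FL = 2.5   # ASSUMED cell volume; not stated in the response
ZAPD_COPIES = 500      # ZapD molecules per cell (from the response)

# FtsZ copy numbers quoted in the response
for ftsz_copies in (5000, 15000):
    conc = molecules_to_micromolar(ftsz_copies, CELL_VOLUME_FL)
    ratio = ZAPD_COPIES / ftsz_copies
    print(f"{ftsz_copies} FtsZ -> {conc:.1f} uM, ZapD:FtsZ = {ratio:.3f}")
```

      With these inputs ZapD is outnumbered by FtsZ by a factor of 10 to 30, far below the 1:1 or higher ratios associated with straight bundles in vitro, which is the basis for the argument that toroid-like assemblies are the more likely form in the cell.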

      (10) Interpretation of mZapD results:

      i) What is the experimental proof for weakened stability of the dimer? Rather than weakened stability, does this form a population of only monomeric ZapD or a proportion of non-functional or unfolded dimer? This requires to be shown by AUC or SEC to substantiate the claim of a weakened interface.

      We have provided new AUC results indicating that mZapD is partially monomeric, which suggests a weakened dimerization interface (page 9, lines 15-16 and Supp. Fig. 15a). The assays revealed no signs of protein aggregation.

      ii) How does a weaker dimer result in thinner bundles and not toroids? A weaker dimer would imply that the number of ZapD linked to FtsZ will be less than the wild type, leading to less cross linking, which should lead to toroid formation rather than thinner bundles.

      This observation provides the most plausible explanation. However, we did not detect any toroidal structures, even at high concentrations of mZapD. This finding indicates that a more potent dimerization interface is essential for promoting the formation of toroidal structures rather than merely the number of ZapD-FtsZ connections. mZapD presumably has a reduced affinity for FtsZ, which, along with a weaker binding interface, may explain mZapD's inability to facilitate toroid formation.

      iii) This observation would imply that the geometry of the dimeric interaction plays a role in the bending of the FtsZ filaments into toroids? Please comment.

      Our data suggest that the binding density of ZapD to FtsZ polymers is a crucial factor governing the transition from toroidal structures to straight bundles. Toroids form when the polymers have excess free FtsZ (that ZapD does not crosslink). Additional factors, such as the orientation of the interactions, the length of the flexible linker, and the strength of the ZapD dimerization interface, are likely to contribute to these structural reorganizations. However, our current data do not allow for further analysis, and future experiments will be necessary to address these questions.

      (11) Curvature and plasticity of toroid:

      i) What are the factors that stabilise curved protofilaments/toroid structures in the absence of a cross linker, based on earlier studies from B. subtilis. A comparison will be insightful. ii) What is the effect of the linker length between FtsZ globular domain and CTP in the toroid spacing?

      Huecas et al. 2017 (Biophys J - DOI: 10.1016/j.bpj.2017.08.046) concluded that the disordered CTL of FtsZ serves as a spacer that modulates the self-organization of FtsZ polymers. They proposed that this intrinsically disordered CTL, which spans the gap between protofilament cores, provides approximately 70 Å of lateral spacing between the curved Bacillus subtilis FtsZ (BsFtsZ), forming toroidal structures. In contrast, the parallel filaments of tailless BsFtsZ mutants, which have a reduced spacing of 50 Å, will likely stick together, resulting in the straight bundles observed. In the full-length BsFtsZ filament, the flexibility allowed by the lateral association favors the coalescence of these curved protofilaments, leading to the formation of toroidal structures. 

      The role of the C-terminal tail of FtsZ in E. coli is critical for its functionality (Buske and Levin 2012 J Biol Chem - DOI: 10.1074/jbc.M111.330324). However, its structural involvement in complex formations remains unclear. Research indicates that any disordered peptide between 43 and 95 amino acids in length can function as a viable linker, while peptides that are significantly shorter or longer impede cell division (Gardner et al. 2013 Mol Microbiol - DOI: 10.1111/mmi.12279). Studies in E. coli and B. subtilis suggest that intrinsically disordered CTLs play a role in determining FtsZ assembly and function in vivo, and this role is dependent on the length, flexibility, and disorder of the tails. These aspects still require further exploration.

      iii) How is it concluded that the concentration of ZapD is modulating the behaviour of the toroid structure? ZapD as a molecule does not have much room for conformational flexibility beyond a few angstroms, in the absence of long flexible regions. Rather, shouldn't the linker length of FtsZ to the CTP decide the plasticity of the toroid?

      The length and flexibility of the linker can significantly influence structural interactions. As previously mentioned, a longer linker will likely enhance the range of interaction distances and orientations. However, specific interaction of ZapD and FtsZ is stronger than non-specific electrostatic FtsZ-FtsZ interactions, and this is not solely due to the flexibility of the linker. Instead, it can modulate the formation of either a toroidal structure or straight bundles.

      iv) "a minor free energy perturbation to bring about significant changes in the geometry of the fibers due to modifications in environmental conditions" - this sentence is not clear to me. How did the data described in the paper relate to minor free energy perturbations and how do environmental conditions affect this?

      This sentence aimed to convey the notion of polymorphism in FtsZ polymers. We acknowledge that the original version may have been unclear, so we have removed it in the new version of the manuscript (page 12, lines 1-2).

      (12) Missing controls:

      i) Supplementary Figure 2a: Interaction between ZapD and FtsZ: what was the negative control used in this experiment? Use of FtsZ with the CTP deletion or ZapD specific mutations will help in confirming that the Kd estimation is indeed driven by a specific interaction.

      Negative controls correspond to FtsZ and ZapD alone.

      ii) In a turbidity measurement, how will you distinguish between ZapD mediated bundling, ZapD independent bundling and FtsZ filaments alone? Here again, having a data with non-interacting mutational partners will make the data more reliable.

      The turbidity signal of individual proteins in the absence and presence of GTP is indistinguishable from that of the buffer. We have indicated this in the figure legend.

      iii) Control experiments to show that mZapD is folded (see point below) and to indeed prove that it is monomeric is missing.

      We have included the missing AUC data in the supplementary information (Supp Fig 15a).

      Minor points:

      -  Page 2, para 4: beta-sheet domain (instead of beta-strand)

      Done.

      -  Fig 2a and b: Why is a ratio mentioned in Figure 2a legend? I understood these images as individual proteins at 10 uM concentrations.

      That was a typing error; it corresponds to two individual proteins at 10 µM concentrations. 

      -  Fig 2. Y-axis - spelling of frequency (change in all figures where applicable)

      Corrected.

      -  Supplementary Figure 5: FtsZ 5 uM - change u to micro symbol. FtsZ - t is missing

      Corrected. 

      -  Molecular weight marker is xx. What does xx stand for?

      Corrected. 

      -  Fig 1: Units for GTPase activity on the y-axis is missing.

      Done.

      -  Suppl Fig 3: How was the normalisation carried out for the turbidity data?

      We have explained it in the revised methods section.

      -  Page 4, line 5: p missing in ZapD

      Done. 

      -  Page 5: paragraph 1, last sentence: stabilised or established?

      Done.

      -  Page 6: 3rd sentence from last: correct the sentence (one ZapD two FtsZ)

      Corrected. 

      -  Page 14: Fluorescence microscopy and FRAP experiments have not been described in the manuscript. Hence, these are not required in the methods.

      Corrected. 

      -  Please include representative gels of purified protein samples used in the assay for sample quality control.

      Controls for each protein are shown in Supplementary Fig. 5a as “control samples” corresponding to 5 µM of each protein before centrifugation.

      Reviewer #3 (Recommendations for the authors):

      Fig. S2a confirms and quantitates the interaction of ZapD with FtsZ-GDP monomers by F.A. It shows a surprisingly high Kd of ~10 µM. This seems important but it is ignored in the overall interpretation. Fig. S2b (FCS) suggests an even weaker interaction, but this may reflect higher order aggregates.

      As the reviewer points out, the interaction between ZapD and FtsZ in the GDP form is weak, consistent with the need for high concentrations of ZapD to form FtsZ macrostructures in the presence of GTP.

      We did not observe the formation of ZapD aggregates, even at higher protein (Author response image 1A) and salt (Author response image 1B) concentrations.

      Author response image 1.

      A) Sedimentation velocity (SV) profiles of ZapD over a concentration range of 2 to 30 µM in 50 mM KCl, 5 mM MgCl2, Tris-HCl pH 7. B) SV profiles of ZapD at 10 µM at different ionic strengths (50-500 mM KCl) in 5 mM MgCl2, 50 mM Tris-HCl pH 7. Abs280 measurements were collected at 48,000 rpm and 20 °C.

      Describing their assembly of toroids the authors state "Upon adding equimolar amounts of ZapD, corresponding to the subsaturating ZapD binding densities described in the previous section". My reading of Fig. 1b and S5 is that FtsZ is almost fully saturated at 1:1 concentration; In S5a at 5:5 µM about 25% of each is in the pellet, which is near 1:1 saturation. It is certainly >50% saturated. Shouldn't this be clarified to read "slightly substoichiometric"? Of course, that undermines the identification of ZapD as such a substoichiometric number.

      We have rephrased the sentence following the reviewer’s suggestions to clarify matters (page 5, lines 39-40).

      The cryoET images in Fig. 3 are an average of five slices with a total thickness of 32 nm. The circular "short filaments..almost parallel" are therefore not single 5 nm diameter FtsZ filaments but must be alignment of filaments axially into sheets (or belts, the axial structure shown in Fig. S8e, discussed next). Importantly, the authors indicate "connections between filaments" by red arrows. This seems wrong for two reasons. (1) The "connections" are very sparse, and therefore not consistent with the near saturation of FtsZ by ZapD. (2) To show up in the 32 nm averaged slice, connections from multiple filaments would have to be aligned. Fig. 3e is a "view of the segmented toroidal structure." I think it shows sheets of filaments as noted above, and the suggested "crosslinks" are again very sparse and no more convincing.

      We thank the reviewer for pointing this out. This was an error on our part, which we have corrected in the figure legend of the revised version of the manuscript. The tomographic slice shown in Fig. 3a is an average of 5 slices, each with a pixel size of 0.86 nm, corresponding to a total thickness of 4.31 nm. It therefore corresponds to the thickness of a single FtsZ filament. The few red arrows indicate lateral connections between filaments, and as discussed earlier, ZapD also crosslinks FtsZ filaments vertically, giving rise to the elongated structures observed in the Z-direction.

      All 3-D reconstructions and segmented renditions should have a scale bar. The axial cylindrical sheets seem to be confirmed and qualified in Fig. S8e. The cylindrical sheets are not continuous, but seem to consist of belt-like filaments that are ~8-10 nm wide in the axial direction. Adjacent belts are separated axially by ~5 nm gaps, and radially by 4-20 nm. The densest filaments in the projection image Fig. 3b are probably an axial superposition of 2-3 belts, while the lighter filaments may be individual belts.

      Fig. 4 shows a higher number of crosslinks but nowhere near a 1:1 stoichiometry. Most importantly to me, the identification of crosslinks vs filaments seems completely arbitrary. For example, if one colored grey all of the densities in 4a right panel, I would have no way to duplicate the distinctions shown in red and blue. Even if we accept the authors' distinction, it does not provide much structural insight. Continuous bands or sheets are identified as FtsZ, without any resolution of substructure, and any density outside these bands is ZapD. The spots identified as ZapD seem randomly dispersed and much too sparse to include all the ~1:1 ZapD.

      We appreciate the reviewer's comments. Scale bars are present in the tomographic slices but not in the 3D views, as these are perspective views, and it would be inappropriate to include scale bars. To provide context for the images, we added the dimensions of the toroids and toroid sections to the figure legends. 

      As previously mentioned, the resolution of our data limits our ability to accurately segment ZapD densities, especially in the Z direction. In Fig. 4, we have done our best to segment the ZapD densities at the top and sides of the FtsZ filaments, but many densities have been missed. We have clarified this point in both the text and the figure legends. This preliminary annotated view is meant to help illustrate the formation of the toroids. In Fig. 3, we have labeled only a few arrows to highlight the lateral connections between the FtsZ filaments; however, there are many more connections than those indicated.

      Fig. S12 explores the effect of increasing ZapD to 1:6, and the authors conclude "the high concentration of ZapD molecules increased the number of links between filaments and ultimately promoted the formation of straight bundles." However, the binding sites on FtsZ are already nearly saturated at 10:10.

      We cannot assume that all FtsZ binding sites are present at a 1:1 ratio. Our pelleting assay confirms the presence of both proteins in the pellet, but we should be cautious about quantification due to the limitations of this technique. Based on our cryo-EM experiments, the amount of ZapD associated with these structures is much lower. We hypothesize that ZapD proteins sediment with the large FtsZ structures, acting as an external decoration for the toroids. A single ZapD monomer may be bound to multiple outer filaments of the structures, which could effectively increase the total µM concentration observed in the pelleting assay. This situation may explain the enrichment of ZapD in the pellet at high concentrations, when theoretically only a 1:1 ratio should be possible. We have observed external decorations of ZapD at high concentrations (see Supplementary Fig. 6). We believe that the pelleting assay simplifies the system and should be used to complement the cryo-EM images.

      Minor points.

      In the Intro "..to follow a treadmilling behavior, similar to that of actin filaments.9-13." These refs have little to do with treadmilling. I suggest: Wagstaff..Lowe mBio 2017; Du..Lutkenhaus PNAS 2018; Corbin Erickson BJ 2020; Ruis..Fernandez-Tornero Plos Biol 2022.

      Following the reviewer’s suggestions, we have modified the references in the revised version. 

      The authors responded to a query during review stating that the concentration of ZapD always refers to the monomer subunit. That seems certainly the case for Fig. S1, but the caption to Fig. 1a confuses the stoichiometry issue: "expecting (sic) at around 2:1 FtsZ:ZapD." Perhaps it could be clarified by stating that the Fig. shows only half the FtsZ's occupied. But in Fig. 1b the absorbance reaches its maximum at equimolar FtsZ and ZapD. That means that all FtsZ's are bound to a ZapD monomer. Why not draw the model in 1A to show that? Fig. S5 is also consistent with this 1:1 stoichiometry. And this might be the place to contrast the planar model with the stacked model suggested by Fig. 5 where the two FtsZ filaments are ~8 nm apart, and the ZapD bridging them is on top.

      We have revised the legend for Fig. 1a to improve its readability. In Fig. 1b, the absorbance data indicate that most FtsZ proteins form macrostructures; however, this does not imply that all FtsZ proteins are bound to ZapDs. Our findings demonstrate that this binding only occurs in the case of straight bundles.

      It may help to note that some previous studies have expressed the concentration of ZapD as the dimer. E.g., Roach..Khursigara 2016 found maximal pelleting at FtsZ:ZapD(dimer) of 2:1 (their Fig. 3), completely consistent with the 1:1 FtsZ:ZapD(monomer) in the present study.

      We recognize this discrepancy in the literature. Therefore, throughout the manuscript, the molar concentrations of both proteins are expressed in terms of the FtsZ and ZapD monomer species.

    1. eLife Assessment

      Combining experimental and computation approaches, this manuscript provides convincing evidence for a post-transcriptional mechanism that provides robust control over the protein expression level of RecB in E. coli. In addition to uncovering how DNA damage drives higher levels of RecB protein, this work also reveals important tenets for how broader mechanisms that suppress noise and underlie responsive tuning of protein levels can be achieved.

    2. Reviewer #1 (Public review):

      Summary:

      In this study the authors use an elegant set of single-molecule experiments to assess the transcriptional and post-transcriptional regulation of RecB. The question stems from a previous observation from the same lab, that RecB protein levels are low and not induced under DNA damage. The authors first show that recB transcript levels are low and have a short half-life. They further show that RecB levels are likely regulated via translational control. They provide evidence for low noise in RecB protein levels across cells and show that the translation of the mRNA increases under double-strand break conditions. Authors identify Hfq binding sites in the recBCD operon and show that Hfq regulates the levels of RecB protein without changing the mRNA levels. They suggest that RecB translation is directly controlled by Hfq binding to mRNA, as mutating one of the binding sites has a direct effect on RecB protein levels.

      The implication of Hfq in regulation of RecB translation is important, and suggests mechanisms of cellular response to DNA damage that are beyond the canonically studied mechanisms (such as transcriptional regulation by LexA). Data are clearly presented and the writing is direct and easy to follow. Overall, the study is well-designed and provides novel insights into the regulation of RecB, that is part of the complex required to process break ends.

      Comments on revisions:

      All my comments are addressed - I congratulate the authors on this excellent work.

    3. Reviewer #2 (Public review):

      Summary:

      The authors carry out a careful and rigorous quantitative analysis of RecB transcript and protein levels at baseline and in response to DNA damage. Using single-molecule FISH and Halo-tagging in order to achieve sensitive measurements, they provide evidence that enhanced RecB protein levels in response to DNA damage are achieved through a post-transcriptional mechanism mediated by the La-like RNA binding protein, Hfq. In terms of biological relevance, the authors suggest that this mechanism provides a way to control the optimum level of RecB expression as both deletion and over-expression are deleterious. In addition, the proposed mechanism provides a new framework for understanding how transcriptional noise can be suppressed at the protein level.

      Strengths:

      Strengths of the manuscript include the rigorous approaches and orthogonal evidence to support the core conclusions, for example, the evidence that altering either Hfq or its recognition sequence on the RNA similarly enhance the protein to RNA ratio of RecB. The writing is clear and the experiments are well-controlled. The modeling approaches provide essential context to interpret the data, particularly given the small numbers of molecules per cell. The interpretations are careful and well supported. The findings

      Weaknesses:

      Future studies (and possibly new experimental tools) will be needed to provide further insight into the relevance of the findings to more subtle changes in RecB levels than that occurring in response to extensive DNA damage.

    4. Reviewer #3 (Public review):

      Summary:

      The work by Kalita et al. reports regulation of RecB expression by Hfq protein in E. coli cells. RecBCD is an essential complex for DNA repair and chromosome maintenance. The expression level needs to be regulated at low level under regular growth conditions but upregulated upon DNA damage. Through quantitative imaging, the authors demonstrate that recB mRNAs and proteins are expressed at low level under regular conditions. While the mRNA copy number demonstrates high noise level due to stochastic gene expression, the protein level is maintained at a lower noise level compared to expected value. Upon DNA damage, the authors claim that the recB mRNA concentration is decreased, however RecB protein level is compensated by higher translation efficiency. Through analyzing CLASH data on Hfq, they identified two Hfq binding sites on RecB polycistronic mRNA, one of which is localized at the ribosome binding site (RBS). Through measuring RecB mRNA and protein level in the ∆hfq cell, the authors conclude that binding of Hfq to the RBS region of recB mRNA suppresses translation of recB mRNA. This conclusion is further supported by the same measurement in the presence of Hfq sequestrator, the sRNA ChiX, and the deletion of the Hfq binding region on the mRNA.

      Strengths:

      (1) The manuscript is well-written and easy to understand.

      (2) While there are reported cases of Hfq regulating translation of bound mRNAs, its effect on reducing translation noise is relatively new.

      (3) The imaging and analysis are carefully performed with necessary controls.

      Comments on revisions:

      The authors have addressed my previous concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public Review):

      The authors make a compelling case for the biological need to exquisitely control RecB levels, which they suggest is achieved by the pathway they have uncovered and described in this work. However, this conclusion is largely inferred as the authors only investigate the effect on cell survival in response to (high levels of) DNA damage and in response to two perturbations - genetic knock-out or over-expression, both of which are likely more dramatic than the range of expression levels observed in unstimulated and DNA damage conditions.

      In the discussion of the updated version of the manuscript, we have clarified the limits of our interpretation of the role of the uncovered regulation.

      Lines 411-417: “It is worth noting that the observed decrease in cell viability upon DNA damage was detected for relatively drastic perturbations such as recB deletion and RecBCD overexpression. Verifying these observations in the context of more subtle changes in RecB levels would be important for further investigation of the biological role of the uncovered regulation mechanism. However, the extremely low numbers of RecB proteins make altering its abundance in a refined, controlled, and homogeneous across cells manner extremely challenging and would require the development of novel synthetic biology tools.”

      Reviewer #3 (Public Review):

      The major weaknesses include a lack of mechanistic depth, and part of the conclusions are not fully supported by the data.

      (1) Mechanistically, it is still unclear why upon DNA damage, translation level of recB mRNA increases, which makes the story less complete. The authors mention in the Discussion that a moderate (30%) decrease in Hfq protein was observed in previous study, which may explain the loss of translation repression on recB. However, given that this mRNA exists in very low copy number (a few per cell) and that Hfq copy number is on the order of a few hundred to a few thousand, it's unclear how 30% decrease in the protein level should result in a significant change in its regulation of recB mRNA.

      We agree that the entire mechanistic pathway controlling recB expression may not be limited to just Hfq involvement. We have performed additional experiments, proposed by the reviewer, suggesting that a small RNA might be involved (see below, response to comments 3&4). However, we consider that the full characterisation of all players is beyond the scope of this manuscript. In addition to describing the new data (see below), we expanded the discussion to explain more precisely why changes in Hfq abundance upon DNA damage may impact RecB translation.

      Lines 384-391: “A modest decrease (~30%) in Hfq protein abundance has been seen in a proteomic study in E. coli upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002). While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, it is conceivable that even modest changes in Hfq availability could result in significant changes in gene expression, and this could explain the increased translational efficiency of RecB under DNA damage conditions.”

      (2) Based on the experiment and the model, Hfq regulates translation of recB gene through binding to the RBS of the upstream ptrA gene through translation coupling. In this case, one would expect that the behavior of ptrA gene expression and its response to Hfq regulation would be quite similar to recB. Performing the same measurement on ptrA gene expression in the presence and absence of Hfq would strengthen the conclusion and model.

      Indeed, based on our model, we expect PtrA expression to be regulated by Hfq in a similar manner to RecB. However, the product encoded by the ptrA gene, Protease III, (i) has been poorly characterised; (ii) unlike RecB, is located in the periplasm (DOI: 10.1128/jb.149.3.1027-1033.1982); and (iii) is not involved in any DNA repair pathway. Therefore, analysing PtrA expression would take us away from the key questions of our study.

      (3) The authors agree that they cannot exclude the possibility of sRNA being involved in the translation regulation. However, this can be tested by performing the imaging experiments in the presence of Hfq proximal face mutations, which largely disrupt binding of sRNAs.

      (4) The data on construct with a long region of Hfq binding site on recB mRNA deleted is less convincing. There is no control to show that removing this sequence region itself has no effect on translation, and the effect is solely due to the lack of Hfq binding. A better experiment would be using a Hfq distal face mutant that is deficient in binding to the ARN motifs.

      We performed the requested experiments. We included this data in the manuscript in the supplementary figure (Figure S11), and our interpretation in the discussion.

      Lines 354-378: “While a few recent studies have shown evidence for direct gene regulation by Hfq in a sRNA-independent manner (DOI: 10.1101/gad.302547.117; DOI: 10.1111/mmi.14799; DOI: 10.1371/journal.pgen.1004440; DOI: 10.1111/mmi.12961; DOI: 10.1038/emboj.2013.205), we attempted to investigate whether a small RNA could be involved in the Hfq-mediated regulation of RecB expression. We tested Hfq mutants containing point mutations in the proximal and distal sides of the protein, which were shown to disrupt either binding with sRNAs or with ARN motifs of mRNA targets, respectively [DOI: 10.1016/j.jmb.2013.01.006, DOI: 10.3389/fcimb.2023.1282258]. Hfq mutated in either proximal (K56A) or distal (Y25D) faces were expressed from a plasmid in a ∆hfq background. In both cases, Hfq expression was confirmed with qPCR and did not affect recB mRNA levels (Supplementary Figure S11b). When the proximal Hfq binding side (K56A) was disrupted, RecB protein concentration was nearly similar to that obtained in a ∆hfq mutant (Supplementary Figure S11a, top panel). This observation suggests that the repression of RecB translation requires the proximal side of Hfq, and that a small RNA is likely to be involved as small RNAs (Class I and Class II) were shown to predominantly interact with the proximal face of Hfq [DOI: 10.15252/embj.201591569]. When we expressed Hfq mutated in the distal face (Y25D) which is deficient in binding to mRNAs, less efficient repression of RecB translation was detected (Supplementary Figure S11a, bottom panel). This suggests that RecB mRNA interacts with Hfq at this position. We did not observe full de-repression to the ∆hfq level, which might be explained by residual capacity of Hfq to bind its recB mRNA target in the point mutant (Y25D) (either via the distal face with less affinity or via the lateral rim Hfq interface).”

      Taken together, these results suggest that Hfq binds to recB mRNA and that a small RNA might contribute to the regulation although this sRNA has not been identified.

      (5) Ln 249-251: The authors claim that the stability of recB mRNA is not changed in ∆hfq simply based on the steady-state mRNA level. To claim so, the lifetime needs to be measured in the absence of Hfq.

      We measured recB mRNA lifetime in the absence of Hfq in a time-course experiment where transcription initiation was inhibited with rifampicin and mRNA abundance was quantified with RT-qPCR. The results confirmed that recB mRNA lifetime in hfq mutants is similar to the one in the wild type (Figure S7d, referred to in line 263 of the manuscript).
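      For readers unfamiliar with this type of measurement, the sketch below illustrates one common way a lifetime can be estimated from such a time course, assuming simple first-order (exponential) decay after transcription is blocked. It is not the analysis used in the manuscript: the function name `fit_decay_rate`, the time points, and the abundance values are hypothetical and chosen only for illustration.

```python
import math

def fit_decay_rate(times_min, abundances):
    """Estimate a first-order decay rate k (1/min) by least-squares
    regression of ln(abundance) against time (assumes exponential decay)."""
    logs = [math.log(a) for a in abundances]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return -sxy / sxx  # slope of ln(abundance) vs time is -k

# Hypothetical, noise-free relative mRNA levels after rifampicin addition.
times = [0, 1, 2, 4, 6]                    # minutes (illustrative)
true_k = 0.5                               # 1/min (illustrative)
levels = [math.exp(-true_k * t) for t in times]

k = fit_decay_rate(times, levels)
half_life = math.log(2) / k                # time for abundance to halve
mean_lifetime = 1 / k                      # average lifetime of a molecule
print(f"k = {k:.3f} 1/min, half-life = {half_life:.2f} min")
```

      Comparing the fitted rate between wild-type and hfq mutant time courses would then indicate whether mRNA stability differs between the two strains.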

      (6) What's the labeling efficiency of Halo-tag? If not 100% labeled, is it considered in the protein number quantification? Is the protein copy number quantification through imaging calibrated by an independent method? Does Halo tag affect the protein translation or degradation?

      Our previous study (DOI: 10.1038/s41598-019-44278-0) described a detailed characterization of the HaloTag labelling technique for quantifying low-copy proteins in single E. coli cells using RecB as a test case. 

      In that study, we showed complete quantitative agreement of RecB quantification between two fully independent methods: HaloTag-based labelling with cell fixation and RecB-sfGFP combined with a microfluidic device that lowers protein diffusion in the bacterial cytoplasm. This second method had previously been validated for protein quantification (DOI: 10.1038/ncomms11641) and provides detection of 80-90% of the labelled protein. Additionally, in our protocol, immediate chemical fixation of cells after the labelling and quick washing steps ensure that new, unlabelled RecB proteins are not produced. We, therefore, conclude that our approach to RecB detection is highly reliable and sufficient for comparing RecB production in different conditions and mutants.

      The RecB-HaloTag construct has been designed for minimal impact on RecB production and function. The HaloTag is translationally fused to RecB in a loop positioned after the serine present at position 47 where it is unlikely to interfere with (i) the formation of RecBCD complex (based on RecBCD structure, DOI: 10.1038/nature02988), (ii) the initiation of translation (as it is far away from the 5’UTR and the beginning of the open reading frame) and (iii) conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). In our manuscript, we showed that the RecB-HaloTag degradation rate is similar to the dilution rate due to bacterial growth. This is in line with a recent study on unlabelled proteins, which shows that RecB’s lifetime is set by the cellular growth rate (DOI: 10.1101/2022.08.01.502339).

      Furthermore, we have demonstrated (DOI: 10.1038/s41598-019-44278-0) that (i) bacterial growth is not affected by replacing the native RecB with RecB-HaloTag, (ii) RecB-HaloTag is fully functional upon DNA damage, and (iii) no proteolytic processing of the RecB-HaloTag is detected by Western blot. 

      These results suggest that RecB expression and functionality are unlikely to be affected by the translational HaloTag insertion at Ser-47 in RecB.

      In the revised version of the manuscript, we have added information about the construct and discuss the reliability of the quantification.

      Lines 141-152: “To determine whether the mRNA fluctuations we observed are transmitted to the protein level, we quantified RecB protein abundance with single-molecule accuracy in fixed individual cells using the Halo self-labelling tag (Fig. 2A&B).

      The HaloTag is translationally fused to RecB in a loop after Ser47 (DOI: 10.1038/s41598-019-44278-0) where it is unlikely to interfere with the formation of RecBCD complex (DOI: 10.1038/nature02988), the initiation of translation and conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). Consistent with minimal impact on RecB production and function, bacterial growth was not affected by replacing the native RecB with RecB-HaloTag, the fusion was fully functional upon DNA damage and no proteolytic processing of the construct was detected (DOI: 10.1038/s41598-019-44278-0). To ensure reliable quantification in bacteria with HaloTag labelling, the technique was previously verified with an independent imaging method and resulted in > 80% labelling efficiency (DOI: 10.1038/s41598-019-44278-0, DOI: 10.1038/ncomms11641). In order to minimize the number of newly produced unlabelled RecB proteins, labelling and quick washing steps were followed by immediate chemical fixation of cells.”

      Lines 164-168: “Comparison to the population growth rate [in these conditions (0.017 1/min)] suggests that RecB protein is stable and effectively removed only as a result of dilution and molecule partitioning between daughter cells. This result is consistent with a recent high-throughput study on protein turnover rates in E. coli, where the lifetime of RecB proteins was shown to be set by the doubling time (DOI: 10.1038/s41467-024-49920-8).”
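      As a rough worked illustration of the comparison above, and assuming the quoted value of 0.017 1/min is an exponential growth rate (the symbols μ and t_d are introduced here only for this example), the corresponding doubling time would be:

```latex
t_d = \frac{\ln 2}{\mu} = \frac{0.693}{0.017\ \mathrm{min}^{-1}} \approx 40.8\ \mathrm{min}
```

      Under that assumption, a protein removed only by dilution would halve in abundance roughly every 41 minutes.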

      (7) Upper panel of Fig S8a is redundant as in Fig 5B. Seems that Fig S8d is not described in the text.

      We have now stated in the legend of Fig S8a that the data in the upper panel were taken from Fig 5B to visually facilitate the comparison with the results given in the lower panel. We also noticed that we did not specify that in the upper panel in Fig S9a (the data in the upper panel of Fig S9a was taken from Fig 5C for the same reason). We added this clarification to the legend of the Fig S9 as well.

      We referred to the Fig S8d in the main text. 

      Lines 283-284: “We confirmed the functionality of the Hfq protein expressed from the pQE-Hfq plasmid in our experimental conditions (Fig. S8d).”

      Reviewer #1 (Recommendations For The Authors):

      (1) Experimental regime to measure protein and mRNA levels.

      (a) Authors expose cells to ciprofloxacin for 2 hrs. They provide a justification via a mathematical model. However, in the absence of a measurement of protein and mRNA across time, it is unclear whether this single time point is sufficient to make the conclusion on RecB induction under double-strand break.

      In our experiments, we only aimed to compare recB mRNA and RecB protein levels in two steady-state conditions: no DNA damage and DNA damage caused by sublethal levels of ciprofloxacin. We did not aim to look at RecB dynamic regulation from non-damaged to damaged conditions – this would indeed require additional measurements at different time points. We revised this part of the results to ensure that our conclusions are stated as steady-state measurements and not as dynamic changes.

      Line 203-205: “We used mathematical modelling to verify that two hours of antibiotic exposure was sufficient to detect changes in mRNA and protein levels and for RecB mRNA and protein levels to reach a new steady state in the presence of DNA damage.”

      (b) Authors use cell area to account for the elongation under damage conditions. However, it is unclear whether the number of copies of the recB gene are similar across these elongated cells. Hence, authors should report mRNA and protein levels with respect to the number of gene copies of RecB or chromosome number as well.

      Based on the experiments in DNA damaging conditions, our main conclusion is that the average translational efficiency of RecB is increased in perturbed conditions. We believe that this conclusion is well supported by our measurements and that it does not require information about the copy number of the recB gene but only the concentration of mRNA and protein. We did observe lower recB mRNA concentration upon DNA damage in comparison to the untreated conditions, which may be due to a lower concentration of genomic DNA in elongated cells upon DNA damage, as we mention in lines (221-223).

      Our calculation of translation efficiency could be affected by variations of mRNA concentration across cells in the dataset. For example, longer cells that are potentially more affected by DNA damage could have lower concentrations of mRNA. We verified that this is not the case, as recB mRNA concentration is constant across cell size distribution (see the figure below or Figure S5a from Supplementary Information).

      Therefore, we do not think that the measurements of recB gene copy would change our conclusions. We agree that measuring recB gene copies could help to investigate the reason behind the lower recB mRNA concentration under the perturbed conditions as this could be due to lower DNA content or due to shortage of resources (such as RNA polymerases). However, this is a side observation we made rather than a critical result, whose investigation is beyond the scope of this manuscript.

      Author response image 1.

      (2) RecB as a proxy for RecBCD. Authors suggest that RecB levels are regulated by hfq. However, how does this regulatory circuit affect the levels of RecC and RecD? Ratio of the three proteins has been shown to be important for the function of the complex.

      A full discussion of RecBCD complex formation regulation would require a complete quantitative model based on precise information on the dynamic of the complex formation, which is currently lacking. 

      We can however offer the following (speculative) suggestions assuming that all three subunits are present in similar abundance in native conditions (DOI: 10.1038/s41598-019-44278-0 for RecB and RecC). As the complex is formed in 1:1:1 ratio (DOI: 10.1038/nature02988), we propose that the regulation mechanism of RecB expression affects complex formation in the following way. If the RecB abundance becomes lower than the level of RecC and RecD subunits, the complex formation would be limited by the number of available RecB subunits and hence the number of functional RecBCDs will be decreased. On the contrary, if the number of RecB is higher than the baseline, then, especially in the context of low numbers, we would expect that the probability of forming a complex RecBC (and then RecBCD) will be increased. Based on this simple explanation, we might speculate that regulation of RecB expression may be sufficient to regulate RecB levels and RecBCD complex formation. However, we feel that this argument is too speculative to be added to the manuscript.

      (3) Role of Hfq in RecB regulation. While authors show the role of hfq in recB translation regulation in non-damage conditions, it is unclear as to how this regulation occurs under damage conditions.

      (a) Have the authors carried out recB mRNA and protein measurement in hfq-deleted cells under ciprofloxacin treatment?

      We attempted to perform experiments in hfq mutants under ciprofloxacin treatment. However, the cells exhibited a very strong and pleiotropic phenotype: they had large size variability and shape changes and were also frequently lysing. Therefore, we did not proceed with mRNA and protein quantification because the data would not have been reliable. 

      (b) How do the authors propose that Hfq regulation is alleviated under conditions of DNA damage, when RecB translation efficiency increases?

      We propose that Hfq could be involved in a more global response to DNA damage as follows. 

      Based on a proteomic study where Hfq protein abundance has been found to decrease (~ 30%) upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002), we suggest that this could explain the increased translational efficiency of RecB. While Hfq is a highly abundant protein, it has many targets (mRNA and sRNA), some of which are also highly abundant. Therefore the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes (DOI: 10.1046/j.1365-2958.2003.03734.x), where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding. We reason that upon DNA damage, a moderate decrease in the Hfq protein abundance (30%) can lead to a similar competition among Hfq targets where high-affinity targets outcompete low-affinity ones as well as low-abundant ones (such as recB mRNAs). Thus, the regulation of low-abundant targets of Hfq by moderate perturbations of Hfq protein level is a potential explanation for the change in RecB translation that we have observed. Potential reasons behind the changes of Hfq levels upon DNA damage would be interesting to explore, however this would require a completely different approach and is beyond the scope of this manuscript.

      We have modified the text of the discussion to explain our reasoning:

      Lines 384-391: “A modest decrease (~30%) in Hfq protein abundance has been seen in a proteomic study in E. coli upon DSB induction with ciprofloxacin (DOI: 10.1016/j.jprot.2018.03.002). While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq binding affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, it is conceivable that even modest changes in Hfq availability could result in significant changes in gene expression, and this could explain the increased translational efficiency of RecB under DNA damage conditions.”

      (c) Is there any growth phenotype associated with recB mutant where hfq binding is disrupted in damage and non-damage conditions? Does this mutation affect cell viability when over-expressed or under conditions of ciprofloxacin exposure?

      We checked the phenotype and did not detect any difference in growth or cell viability affecting the recB-5 UTR* mutants either in normal conditions or upon exposure to ciprofloxacin. However, this is expected because the repair capacity is associated with RecB protein abundance and in this mutant, while translational efficiency of recB mRNA increases, the level of RecB proteins remains similar to the wild-type (Figure 5E).

      Minor points:

      (1) Introduction - authors should also discuss the role of RecFOR at sites of fork stalling, a likely predominant pathway for breaks generated at such sites.

      The manuscript focuses on the repair of DNA double-strand breaks (DSBs). RecFOR plays a very important role in the repair of stalled forks because of single-strand gaps but is not involved in the repair of DSBs (DOI: 10.1038/35003501). We have modified the beginning of the introduction to mention the role of RecFOR. 

      Lines 35-39: “For instance, replication forks often encounter obstacles leading to fork reversal, accumulation of gaps that are repaired by the RecFOR pathway (DOI: 10.1038/35003501) or breakage which has been shown to result in spontaneous DSBs in 18% of wild-type Escherichia coli cells in each generation (DOI: 10.1371/journal.pgen.1007256), underscoring the crucial need to repair these breaks to ensure faithful DNA replication.”

      (2) Methods: The authors refer to previous papers for the method used for single RNA molecule detection. More information needs to be provided in the present manuscript to explain how single molecule detection was achieved.

      We added information to the Methods section on the fitting procedure used to quantify the number of mRNAs per detected focus.

      Lines 515-530: “Based on the peak height and spot intensity, computed from the fitting output, the specific signal was separated from false positive spots (Fig. S1a). To identify the number of co-localized mRNAs, the integrated spot intensity profile was analyzed as previously described (DOI: 10.1038/nprot.2013.066). Assuming that (i) probe hybridization is a probabilistic process, (ii) binding each RNA FISH probe happens independently, and (iii) in the majority of cases, due to low-abundance, there is one mRNA per spot, it is expected that the integrated intensities of FISH probes bound to one mRNA are Gaussian distributed. In the case of two co-localized mRNAs, there are two independent binding processes and, therefore, a wider Gaussian distribution with twice higher mean and twice larger variance is expected. In fact, the integrated spot intensity profile had a main mode corresponding to a single mRNA per focus, and a second one representing a population of spots with two co-localized mRNAs (Fig. S1b). Based on this model, the integrated spot intensity histograms were fitted to the sum of two Gaussian distributions (see equation below where a, b, c, and d are the fitting parameters), corresponding to one and two mRNA molecules per focus. An intensity equivalent corresponding to the integrated intensity of FISH probes in average bound to one mRNA was computed as a result of multiple-Gaussian fitting procedure (Fig. S1b), and all identified spots were normalized by the one-mRNA equivalent.
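      The two-component model described in the quoted passage can be sketched as follows. This is a minimal illustration, not the authors' code: the equation referred to in the quote is not reproduced here, so the roles assigned to the parameters (`a` and `d` as amplitudes, `b` as the one-mRNA mean intensity, `c` as the one-mRNA standard deviation) are assumptions chosen only to satisfy the stated constraint that the second component has twice the mean and twice the variance of the first. The helper `normalize_spots` is likewise hypothetical.

```python
import math

def two_gaussian(x, a, b, c, d):
    """Sum of two Gaussians for integrated spot intensity x.

    First term: one mRNA per spot (mean b, variance c**2).
    Second term: two co-localized mRNAs (mean 2*b, variance 2*c**2).
    The parameter roles are assumed for illustration only.
    """
    one_mrna = a * math.exp(-((x - b) ** 2) / (2 * c ** 2))
    two_mrna = d * math.exp(-((x - 2 * b) ** 2) / (2 * (2 * c ** 2)))
    return one_mrna + two_mrna

def normalize_spots(intensities, one_mrna_equivalent):
    """Express each integrated spot intensity in one-mRNA equivalents."""
    return [i / one_mrna_equivalent for i in intensities]

# Hypothetical fitted values: one-mRNA mean of 100 intensity units.
spots = normalize_spots([95.0, 210.0, 100.0], 100.0)
```

      In practice the four parameters would be estimated by fitting this function to the intensity histogram with a least-squares routine; the fitted value of `b` then serves as the one-mRNA equivalent used for normalization.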

      Reviewer #2 (Recommendations For The Authors):

      Overall the work is carefully executed and highly compelling, providing strong support for the conclusions put forth by the authors.

      One point: the potential biological consequences of the post-transcriptional mechanism uncovered in the work would be enhanced if the authors could 1) tune RecB protein levels and 2) directly monitor the role that RecB plays in generating single-stranded DNA at DSBs.

      We agree that testing the viability of cells under tunable changes in RecB levels would be important to further investigate the biological role of the uncovered regulation mechanism. However, this is a very challenging experiment, as it is technically difficult to alter the low number of RecB proteins in a controlled manner that is homogeneous across cells, and it would require the development of precisely tunable and very low-abundant synthetic designs.

      We did monitor real-time RecB dynamics by tracking single molecules in live E. coli cells in a different study (DOI: 10.1101/2023.12.22.573010) that is currently under revision. There, reduced motility of RecB proteins was observed upon DSB induction indicating that RecB is recruited to DNA to start the repair process.

    1. eLife Assessment

      In this detailed study, Cohen and Ben-Shaul characterized Accessory Olfactory Bulb (AOB) cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses varied depending on the strain and sex of the sample, but no clear differences were observed between estrous and non-estrous females. These findings provide convincing evidence that the AOB functions as a stable sensory relay, without directly modulating responses based on reproductive state, which supports the role of downstream brain regions in integrating reproductive state. Overall, this study provides valuable insights for researchers in the fields of olfaction and social neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial as it requires activation of VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

    3. Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial as it requires activation of VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      Figures will be revised to provide more detailed labeling.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      We did not perform statistical comparisons (between the mean rates across the population). We will add this analysis and the corresponding error bars. 

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      We will do it.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      We agree. As we wrote in the Discussion: “Ultimately, lacking knowledge of the chemical space associated with each of the stimuli, this and all the other ideas developed here remain speculative.”

      A better understanding of the chemical distance is an important aspect that we aim to include in our future studies. However, this is far from trivial, as it is not chemical distance per se (which in itself is hard to define), but rather the “projection” of chemical space on the vomeronasal receptor neurons array. That is, knowledge of the chemical composition of the stimuli, lacking full knowledge of which molecules are vomeronasal system ligands, will only provide a partial picture. Despite these limitations, this is an important analysis which we would have done had we access to this data.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      Measurements of some classes of molecules may be found for some of the stimuli that we used here, but not for all. We are not aware of any single dataset that contains this information for any type of molecule (e.g., MUPs) across the entire stimulus set that we have used. More generally, pooling results from different studies has limited validity because of the biological and technical variability across studies. In order to reliably interpret our current recordings, it would be necessary to measure the urinary content of the very same samples that were used for stimulation. Unfortunately, we are not able to conduct this analysis at this stage.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      It is true that the number of neurons fulfilling each of the patterns depends on the number of individual stimuli that define it. However, our measure of “over-representation” aims to overcome this bias by using bootstrapping to reveal whether the observed number of neurons fulfilling a pattern is larger than expected by chance. We also note that, more generally, a higher frequency of responses to female as compared to male stimuli has been observed in other studies, by others and by us, also when the number of male and female stimuli is matched (e.g., Bansal et al BMC Biol 2021, Ben-Shaul et al, PNAS 2010, Hendrickson et al, JNS, 2008).
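      The logic of testing a pattern count against chance can be sketched as below. This is not the authors' procedure, whose details are not given here: the sketch substitutes a label-shuffling null for their bootstrap, and the pattern definition (`fits_pattern`), the function names, and the example data are all hypothetical. The point it illustrates is that the chance level already accounts for how many stimuli define a pattern, so a pattern defined by 2 of 11 stimuli is compared against its own null rather than against patterns defined by more stimuli.

```python
import random

def fits_pattern(responses, group_idx):
    """True if every response to stimuli in the group exceeds every other response."""
    inside = [responses[i] for i in group_idx]
    outside = [r for i, r in enumerate(responses) if i not in group_idx]
    return min(inside) > max(outside)

def overrepresentation_p(units, group_idx, n_shuffles=200, seed=0):
    """Compare the observed pattern count with counts after shuffling stimulus labels."""
    rng = random.Random(seed)
    observed = sum(fits_pattern(u, group_idx) for u in units)
    at_least_observed = 0
    for _ in range(n_shuffles):
        count = 0
        for u in units:
            shuffled = list(u)
            rng.shuffle(shuffled)  # break the link between response and stimulus identity
            count += fits_pattern(shuffled, group_idx)
        at_least_observed += count >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_observed + 1) / (n_shuffles + 1)

# Hypothetical data: 20 units, 11 stimuli, the first two stimuli being "female".
female = {0, 1}
units = [[10.0, 9.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 0.5] for _ in range(20)]
observed, p_value = overrepresentation_p(units, female)
```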

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      We believe that the answer is yes, but we can and will perform this analysis to check.

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      This is an important observation. For the Euclidean distance there might be a simple explanation as the distance depends on the number of units (and there are more units recorded in non-estrus females). However, this simple explanation does not hold for the correlation distance. A higher distance implies higher discrimination during the non-estrus stage, but our other analyses of sparseness and the selectivity indices do not support this idea. We note that absolute values of distance measures should generally be interpreted cautiously, as they may depend on multiple factors including sample size. Also, a small number of non-selective units could increase the correlation in responses among stimuli, and thus globally shift the distances. For these reasons, we focus on comparisons, rather than the absolute values of the correlation distances. In the revised manuscript, we will note and discuss this important observation.
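      The different sensitivity of the two distance measures to population size can be illustrated with a small sketch. The response values are invented for illustration, and enlarging the population by repeating the same units is an idealization; the sketch only shows that Euclidean distance grows with the number of units while correlation distance does not.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two population response vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y):
    """One minus the Pearson correlation between two population response vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

# Hypothetical responses of four units to two stimuli.
stim_a = [1.0, 4.0, 2.0, 0.5]
stim_b = [2.0, 1.0, 3.0, 0.0]

# The same response statistics sampled with twice as many units.
big_a, big_b = stim_a * 2, stim_b * 2

euclid_ratio = euclidean_distance(big_a, big_b) / euclidean_distance(stim_a, stim_b)
corr_change = correlation_distance(big_a, big_b) - correlation_distance(stim_a, stim_b)
```

      Here `euclid_ratio` equals the square root of 2, whereas `corr_change` is zero up to rounding, which is why a larger sample of units can by itself account for larger Euclidean distances but not for larger correlation distances.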

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

      If we understand correctly, the idea is to show the correlation matrices from which the values in 4B and 4C are taken. We can and will do this, probably as a supplementary figure.

      Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      We will revise the text with the aim of tightening the presentation.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      Agreed. We will refer to this work in our discussion.

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      In this study, we have considered both responses that are triggered by sympathetic trunk activation and those that occur (as happens in some preparations) immediately following stimulus application (and prior to nerve trunk stimulation). An example of the latter is provided in the second unit shown in Figure 1D (and this is also indicated in the figure legend). In our revision, we will further clarify this confusing point.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      True. Thanks for catching this error. We will correct this.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      This is true. In the legend to Figure 3B, we actually write: “A neuron may fulfill more than one pattern and thus may appear in more than one row.”, but we will discuss this point in the main text as well.

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      We fully agree. We will expand our discussion and refer to Fu et al.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      We agree, and we are not claiming that dominance, nor any other feature, is derived using dedicated feature selective neurons.  Our discussion of resource allocation is inevitably speculative. Our main point in this context is that a lack of overrepresentation does not imply that a feature is not important. We will revise our discussion to better clarify our view of this issue.

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      This sentence is actually misleading and irrelevant. Ambiguous cases were not considered as dominant for urine collection. We only classified mice as dominant if they “won” the tube test and exhibited dominant behavior in the subsequent observation period in the cage. We will correct the wording in the revised manuscript.

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      True. We will add these details.

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      They are delivered manually. We will clarify this as well.

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      True, the next paragraph is not the explanation for the more permissive criteria. The more permissive criteria involving response magnitude are actually those described in Figure 3A and 3B. The sentence that was quoted above merely states that before applying those criteria, we had also searched for patterns defined by binary designation of neurons as responsive, or not responsive, to each of the stimuli (this is directly related to the next comment below). Using those binary definitions, we obtained a very small number of neurons for each pattern and thus decided to apply the approach actually used and described in the manuscript.

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      The number of cases with 4 trials is truly a minority, and we will provide the exact numbers in our revision.

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

      First, we indeed failed to mention that our criterion was 0.05. We will correct that in our revision. We did not apply any multiple comparison corrections. We consider each neuron-stimulus pair as an independent entity, and we are aware that this leads to a higher false positive rate. On the other hand, applying multiple comparison corrections would be problematic, as we do not always use the same number of stimuli in different studies, and such corrections would therefore lead to different response criteria across studies. Notably, most, if not all, of our conclusions involve comparisons across conditions, and for this purpose we think that our procedure is valid. We do not attach any special meaning to the significance threshold, but rather think of it as a basic criterion that allows us to exclude non-responsive neurons, and to compare frequencies of neurons that fulfill this criterion.
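      The size of the false-positive concern raised in point ii) can be made concrete with a short calculation. It assumes that the 11 per-stimulus tests on a neuron are independent, which is an idealization, and the function names are illustrative. The Bonferroni threshold is shown only as one common correction for comparison, not as a procedure used in the study.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

# A non-responsive neuron tested separately against each of 11 stimuli at p < 0.05.
fwer = familywise_error_rate(0.05, 11)
per_test = bonferroni_threshold(0.05, 11)
```

      Under these assumptions roughly 43% of truly non-responsive neurons would pass for at least one stimulus, and the corrected per-test threshold would be about 0.0045. Because the corrected threshold changes with the number of stimuli, response criteria would differ between studies with different stimulus sets, which is the trade-off described in the response above.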

    1. eLife Assessment

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find convincing evidence for sensory preconditioning in male mice. They also find that, in these mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. The study will be valuable to researchers in the field of learning and memory.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths:

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice.

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities.

      (3) Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as playing a crucial role in encoding mediated learning.

      Weaknesses:

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.

      (3) The manuscript would benefit from revisions to improve clarity and readability.

    3. Reviewer #2 (Public review):

      Summary:

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning.

      Strengths:

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning.

      Weaknesses:

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they didn't carry forward investigating the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase 2, or retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.
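      The concern in point (3) can be illustrated with a minimal sketch. It assumes the percent change in freezing is computed as (ON - OFF) / OFF x 100; the formula actually used in the manuscript is not given here, and the function name and freezing values below are hypothetical, not data from the study. The sketch shows that the same freezing during the stimulus can yield a much smaller percent change simply because baseline freezing is higher.

```python
def percent_change(off, on):
    """Percent change in freezing from stimulus-OFF to stimulus-ON periods.

    Assumed formula (not confirmed by the source): 100 * (on - off) / off.
    """
    if off == 0:
        raise ValueError("OFF-period freezing is zero; percent change is undefined")
    return 100.0 * (on - off) / off


# Hypothetical freezing values (% of time freezing), for illustration only.
control = percent_change(off=10.0, on=30.0)          # low baseline freezing
raised_baseline = percent_change(off=25.0, on=30.0)  # same ON freezing, higher baseline

print(control, raised_baseline)
```

      In this example the ON-period freezing is identical in both cases, so a reduced percent change alone would not distinguish a weaker conditioned response from an elevated baseline.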

    4. Reviewer #3 (Public review):

      Summary:

      Pinho et al. investigated the role of the dorsal vs ventral hippocampus and sex differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. only parvalbumin-positive (PV+) neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as of PV+ neurons only, in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates the mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking sex effect into account; (2) validates the recruitment of dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral-stimulus and a mild foot shock.

      Strengths:

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure.

      Weaknesses:

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning; however, there are several weaknesses that should be noted:

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level.

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that didn't develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should manifest a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the level of freezing during mediated vs test phases.

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not bring much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by looking also at somatostatin+ inter-neurons.

      (4) The authors observed event-related Ca2+ transients on hippocampal pan-neurons and PV+ inter-neurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. It does not undermine the main finding of CaMKII+ neurons of the dorsal, but not ventral, hippocampus being involved in the preconditioning, but not conditioning, phase. However, observing CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during different phases of sensory preconditioning. Applying then optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue.

    5. Reviewer #4 (Public review):

      Summary

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory.

      Abstract

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously.

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."

      This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding".

      Introduction

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."

      Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11."

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."

      The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential.

      Methods - Behavior

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes.

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested.

      Results - Behavior

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled.

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning."

      Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice.

      Results - Brain

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure 4D)."

      The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning?

      Discussion

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli."

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light).

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."

      Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed?

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the restrosplenial cortex56, blocked the formation of mediated learning."

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods?

    6. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths:

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice.

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities.

      (3) Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as crucial for encoding mediated learning.

      Weaknesses:

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.

      We will add more information to better explain the rationale of our experiments and/or manipulations.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.

      We will provide new photometry analysis to differentiate between early and late responses during stimuli presentations.
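      As a minimal sketch of what separating early from late responses could look like, the example below averages a signal over two time windows after stimulus onset. The window boundaries, sampling times, and signal values are hypothetical and chosen only for illustration; they are not taken from the study, and the function name is an assumption rather than part of any analysis pipeline described here.

```python
def window_mean(trace, times, start, end):
    """Mean of the samples whose timestamps fall in the half-open window [start, end)."""
    values = [v for v, t in zip(trace, times) if start <= t < end]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return sum(values) / len(values)


# Hypothetical trace: time (s) relative to stimulus onset and signal change (%).
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
dff = [0.0, 2.0, 4.0, 2.0, 1.0, 1.0, 0.5, 0.5]

early = window_mean(dff, times, 0.0, 2.0)  # assumed "early" window: 0 to 2 s
late = window_mean(dff, times, 2.0, 4.0)   # assumed "late" window: 2 to 4 s

print(early, late)
```

      Reporting the two window means separately, per trial or per animal, would allow a transient onset response to be distinguished from a sustained one.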

      (3) The manuscript would benefit from revisions to improve clarity and readability.

      We will improve the clarity and readability of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning.

      Strengths:

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning.

      Weaknesses:

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      To address the pertinent concern raised by the reviewer, we will perform new experiments to generate a new unpaired group in female mice by increasing the temporal interval between light and tone exposure during the preconditioning phase. We believe these new results will provide additional information to better understand the performance of female mice in sensory preconditioning.

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they didn't carry forward investigating the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase 2, or retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      We will perform new experiments to generate novel data by inhibiting the CaMKII-positive neurons of the dorsal hippocampus during the conditioning phase.

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.

      By combining chemogenetic and fiber photometry approaches, we will perform control experiments to demonstrate that our chemogenetic manipulations decrease CaMKII- or PV-dependent activity in the dorsal and ventral hippocampus.

      Reviewer #3 (Public review):

      Summary:

      Pinho et al. investigated the role of the dorsal vs ventral hippocampus and sex differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. only parvalbumin-positive (PV+) neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as of PV+ neurons only, in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates the mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking sex effect into account; (2) validates the recruitment of dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral-stimulus and a mild foot shock.

      Strengths:

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure.

      Weaknesses:

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning; however, there are several weaknesses that should be noted:

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level.

      As discussed above, we will perform an additional experiment to evaluate the presence of reliable sensory preconditioning in female mice. In addition, although examining sex differences at the neural level would be very interesting, we think that it is beyond the scope of the present work. However, we will mention this limitation in the Discussion of the new version of the manuscript.

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that didn't develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should manifest a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the level of freezing during mediated vs test phases.

      We will provide correlations between the behavioral responses in both probe tests.
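      A minimal sketch of such a correlation is given below, assuming a Pearson coefficient computed across animals between freezing in the two probe tests. The choice of Pearson (rather than, for example, a rank-based coefficient) is an assumption, and the per-animal freezing values are hypothetical placeholders, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-animal freezing (% of time), for illustration only.
freezing_light = [10.0, 20.0, 30.0, 40.0]  # Probe Test Light
freezing_tone = [5.0, 10.0, 15.0, 20.0]    # Probe Test Tone

r = pearson_r(freezing_light, freezing_tone)
print(r)
```

      A positive coefficient across animals would be consistent with the reviewer's expectation that weak light-shock learning goes with weak responding to the tone.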

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not bring much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by looking also at somatostatin+ inter-neurons.

      We will better justify the use of specific promoters and the targeting of PV-positive neurons. We will also add discussion on potential interesting future experiments such as the targeting of other GABAergic subtypes.

      (4) The authors observed event-related Ca2+ transients on hippocampal pan-neurons and PV+ inter-neurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. It does not undermine the main finding of CaMKII+ neurons of the dorsal, but not ventral, hippocampus being involved in the preconditioning, but not conditioning, phase. However, observing CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during different phases of sensory preconditioning. Applying then optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.

      We will perform new experiments to analyze the activity of CaMKII-positive neurons during light-tone associations in the preconditioning phase in male mice.

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue.

      We will add discussion on this issue raised by the reviewer.

      Reviewer #4 (Public review):

      Summary

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory.

      Abstract

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously.

      We will correct this sentence in the abstract.

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."

      This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding".

      Throughout the text, we will avoid the term “mediated learning” and replace it with more accurate terms. In addition, we will interrogate the role of the dHPC in Stage 2, as noted above.

      Introduction

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms.

      We will substitute “low-salience” for “innocuous”.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."

      Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11."

      Throughout the text, we will avoid the term “mediated learning” and replace it with more accurate terms.

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."

      The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential.

      We will rewrite the text to address the valid points raised by the Reviewer.

      Methods - Behavior

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes.

      We will discuss this aspect further in the manuscript.

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180 s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested.

      We will address the point raised by the reviewer in the manuscript.

      Results - Behavior

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.

      We will add statistical comparisons between male and female mice.

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled.

      We will relabel Figure 1.

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning."

      Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice.

      We will perform new experiments to test whether SPC can also be observed in female mice.

      Results - Brain

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure 4D)."

      The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning?

      We will rephrase this section of the results and expand the Discussion so that the text reflects what the graphs show. We will clarify that the group in which dHPC activity is inhibited during preconditioning is the only one in which the percent change is not significantly different from 0 (in contrast to the control group and the group in which dHPC activity was modulated during the test).

      Discussion

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli."

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light).

      We will rewrite this part depending on the results observed in female mice.

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."

      Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed?

      We will rewrite this passage to describe our results clearly, and we will also revise all the statistical analyses.

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the restrosplenial cortex56, blocked the formation of mediated learning."

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods?

      We will rewrite the text to describe our results clearly, and we will also revise all the statistical analyses. In addition, we will better explain the data showing the percent change.

    1. eLife Assessment

      In this paper, the authors report important structural and functional findings on how the group A streptococcal (GAS) M3 protein (expressed by emm3 GAS strains, which are associated with invasive disease) binds to human collagens. They demonstrate an unusual T-shaped structure within the N-terminal hypervariable region of M3 protein that can bind two copies of collagen triple helix in parallel. These solid data advance understanding of how GAS M3 interacts with human collagen, information relevant to understanding and developing treatments for GAS infection. A major limitation of the work is the lack of mutational work to test if the T-shaped structure is necessary for binding collagen.

    2. Reviewer #1 (Public review):

      Summary:

      Wojnowska et al. report structural and functional studies of the interaction of Streptococcus pyogenes M3 protein with collagen. They show through X-ray crystallographic studies that the N-terminal hypervariable region of M3 protein forms a T-like structure and that the T-like structure binds a three-stranded collagen-mimetic peptide. They indicate that the T-like structure is predicted by AlphaFold3 (with varying confidence level) in other M proteins that have sequence similarity to M3 protein and M-like proteins from group C and G streptococci. For some, but not all, of these related M and M-like proteins, AlphaFold3 predicts complexes similar to the one observed for M3-collagen. Functionally, the authors show that emm3 strains form biofilms with more mass when surfaces are coated with collagen, and this effect can be blocked by an M3 protein fragment that contains the T-structure. They also show the co-occurrence of emm3 strains and collagen in patient biopsies and a skin tissue organoid.

      Strengths:

      The paper is well-written and the data presented is mostly sound.

      Weaknesses:

      However, a major limitation of the paper is that it is almost entirely observational and fails to draw a causal relationship. This is mainly due to the near-total absence of mutational studies.

    3. Reviewer #2 (Public review):

      Streptococcus pyogenes, or group A streptococci (GAS), can cause diseases ranging from skin and mucosal infections to plasma invasion and post-infection autoimmune syndromes. M proteins are essential GAS virulence factors that include an N-terminal hypervariable region (HVR). M proteins are known to bind to numerous human proteins; a small subset of M proteins were reported to bind collagen, which is thought to promote tissue adherence. In this paper, the authors characterize M3 interactions with collagen and its role in biofilm formation. Specifically, they screened different collagen type II and III variants for full-length M3 protein binding using an ELISA-like method, detecting anti-GST antibody signal. By statistical analysis, hydrophobic amino acids and hydroxyproline were found to positively support binding, whereas acidic residues and proline negatively impacted binding (Table 1). The authors applied X-ray crystallography to determine the structure of the N-terminal domain (amino acids 42-151) of M3 protein (M3-NTD). The M3-NTD dimer (PDB 8P6K) forms a T-shaped structure with three helices (H1, H2, H3), which are stabilized by a hydrophobic core, inter-chain salt bridges and hydrogen bonds on the H1 and H2 helices, and the H3 coiled coil. The conserved Gly113 serves as the turning point between H2 and H3 (Figure 5). The M3-NTD is co-crystallized with a 24-residue peptide, JDM238, to determine the structure of M3-collagen binding. The structure (PDB 8P6J) shows that two copies of collagen in parallel bind to H1 and H2 of M3-NTD. Among the residues involved in binding, conserved Tyr96 is shown to play a critical role supported by structure and isothermal titration calorimetry (ITC). The authors also apply a crystal-violet assay and fluorescence microscopy to determine that M3, but not M1 or M28, is involved in collagen type I binding (Figure 9). Tissue biopsy staining indicates that M3 strains co-localize with collagen IV-containing tissue, while M1 strains do not.

      The authors provide generally compelling evidence to show that GAS M3 protein binds to collagen, and plays a critical role in forming biofilms, which contribute to disease pathology. This is a very well-executed study and a well-written report relevant to understanding GAS pathogenesis and approaches to combatting disease; data are also applicable to the emerging human pathogen Streptococcus dysgalactiae. One caveat that was not entirely resolved is if/how different collagen types might impact M3 binding and function. Due to technical constraints, the in vitro structure and other binding assays use type II collagen whereas in vivo, biofilm formation assays and tissue biopsy staining use type I and IV collagen; it was unclear if this difference is significant. One possibility is that M3 binds all types of collagen without bias, and that only the tissue distribution of collagens leads to the finding that M3 binds to type IV (basement membrane) and type I (various tissues, including skin) rather than type II (cartilage).

    4. Author response:

      Many thanks for assessing our submission. We are grateful for the reviews and recommendations that will inform a revised version of the paper, which will include additional data and modified text to take into account the reviewers’ comments.

      We appreciate Reviewer #1’s suggestion regarding the use of mutational work to demonstrate that collagen binding is indeed dependent on the T-shaped fold. However, we believe that this approach is neither feasible nor necessary for our study. Instead, we propose to measure collagen binding to a monomeric form of M3, which preserves all residues, including the ones involved in binding, but cannot form the T-shaped structure. This will achieve the same outcome as unravelling the T fold through mutations, while removing the risk of directly affecting binding by altering residues that are involved in both binding and definition of the T fold.

      Structural biology is by its nature observational, which is not a limitation but the very purpose of this approach. Our study goes beyond observing structures. We identify a critical residue within a previously mapped binding site, and demonstrate through mutagenesis a causal link between presence of this residue on a tertiary fold and collagen binding activity. We will firm up our mutational experiments with a characterisation of the M3 Tyr96 variants to confirm that these mutations did not affect the overall fold. We further demonstrate that the interaction between M3 and collagen promotes biofilm formation as observed in patient biopsies and a tissue model of infection. We show that other streptococci, that do not possess a surface protein presenting collagen binding sites like M3, do not form collagen-dependent biofilm. We therefore do not think that criticising our study for being almost entirely observational is justified. 

      We thank Reviewer #2 for the thorough analysis of our reported findings. The main criticism here concerns the question of whether binding of emm3 streptococci would differ for different types of collagen. We will address this point in the revised manuscript. Our collagen peptide binding assays together with the structural data identify the collagen triple helix as the binding site for M3. While collagen types differ in their functions and morphology in various tissues, they all have in common triple-helical tropocollagen regions (with very high sequence similarity) that are non-specifically recognised by M3. Therefore, our data, in conjunction with the body of published work showing binding of M3 to collagens I, II, III and IV, suggest it is highly likely that emm3 streptococci will indeed bind to many if not all types of collagen in the same manner. Whether this means all collagen types, in the various tissues where they occur, are targeted by emm3 streptococci is a very interesting question, but one that goes beyond the scope of our study.

    1. eLife Assessment

      This important theoretical study introduces an extension to the commonly used SIR model for infectious disease dynamics, to explicitly consider the role of larger group sizes. Instead of the commonly used individual-based network models, the authors developed a simplified approach based on group sampling, with discrete high- and low-risk groups, which makes the results easier to produce and interpret, at the cost of less detail in the model. The evidence is convincing in terms of the soundness of the theoretical projections and the impact that accounting for group sizes may have on inferences from surveillance data. However, it has not yet been demonstrated that the predictions provide more realistic projections when based on real-world data.

    2. Reviewer #1 (Public review):

      Summary:

      This work considers the biases introduced into pathogen surveillance due to congregation effects, and also models homophily and variants/clades. The results are primarily quantitative assessments of this bias, but some qualitative insights are gained, e.g. that initial variant transmission tends to be biased upwards due to this effect, which is closely related to classical founder effects.

      Strengths:

      The model considered involves a simplification of the process of congregation using multinomial sampling that allows for a simpler and more easily interpretable analysis.

      Weaknesses:

      This simplification removes some realism, for example, detailed temporal transmission dynamics of congregations.