  1. Aug 2022
    1. LDST 200: Introduction to Leadership Studies and Applications
      Fall 2022 (Aug. 22-Oct. 14)

      Instructor Information: Jacob H. Stutzman, Ph.D.
      Email: jhstutz@ku.edu
      Office Hours: by appointment

      Required Materials
      Heifetz, R., Grashow, A., & Linsky, M. (2009). The practice of adaptive leadership: Tools and tactics for changing your organization and the world. Boston, MA: Harvard Business Press.
      Institute for Leadership Studies. (2020). LDST 201: Introduction to Leadership Coursepack (5e) [provided on Canvas]

      Course Outcomes
      Upon completion of this course, students will
      • examine and recall various theoretical approaches to leadership and leadership development;
      • recall the four core leadership competencies and integrate each competency into their personal leadership development;
      • explain and differentiate the role of ethics, diversity, and community development in leadership;
      • theorize the ethical implications and applications of Adaptive Leadership and the four core leadership competencies;
      • identify acts of Adaptive Leadership and distinguish between technical problems and adaptive challenges;
      • distinguish between a learning/experimenting paradigm versus a problem/solution paradigm, along with contrasting the strengths and limitations of both;
      • evaluate his/her own personal leadership strengths and challenges based on deliberate reflection;
      • effectively communicate knowledge about and applications of leadership to others.

      How will we get from where we are to where we are hoping to go?
      Each week, you will work through a module that includes video lectures and readings (both from the assigned texts and some provided by the instructor) that will help you build a base of knowledge about leadership studies generally and Adaptive Leadership specifically. Each module will also include a quiz, a journal, and other writing assignments designed to help you put your knowledge to use by testing it and applying it to relevant scenarios. 
Most of the assignments will be completed individually, but there will be a limited amount of collaborative work. By thoughtfully and carefully completing each assignment, you will develop your knowledge of Adaptive Leadership and explore the ways in which the principles of Adaptive Leadership can be useful in your own contexts.

      Assessments/Assignments

      Journals (7 @25 pts ea.)
      In each module (except for Module 8) there will be a prompt based on the material in the module. Responses should integrate the material from that week. The assignment expectations and sample journal entries are available on pp. 16-23 of the coursepack.

      Quizzes (6, 80 pts total)
      In each module (except for Module 8), there is a short quiz based on the material in the module. You may take each quiz twice, but the most recent score will always be the score that is recorded. Additional information is on p. 24 of the coursepack.

      Exams (2 @80 pts ea.)
      There will be two multiple choice exams in the course. These exams will cover material presented in the readings, webinars, supporting documents, and videos comprising the Modules. Exams will open and close with the Modules, so each exam must be taken before the Module deadline for that week. Exam dates are listed in the course schedule. Once you log in to take the exam, you must complete it within 60 minutes. You will only have one chance to complete each exam. Study guides for each exam are available on Canvas throughout the semester.

      Reflection Paper (150 pts)
      You will complete a final reflection paper that draws on the material covered throughout the course. A full description, rubric, and sample paper are on pp. 38-50 of the coursepack.

      Application Project (3 parts @80 pts ea.)
      You will work through a three-part project to do the work of Adaptive Leadership in a community to which you belong. Each phase of the project will build on the work done before, with the first phase due in Module 3.

      TruTalent Assessment/Letter (10 pts for the results, 40 pts for the letter)
      Each of you will complete the TruTalent assessment through the University Career Center and upload your results when they are ready. In Module 5, there is also an assignment that calls on you to reflect on your results in the form of a letter. The specific assignment will be available in that module.

      Ethics Discussions (30 pts)
      In Modules 4, 5, & 6, you will respond to a prompt and then reply to your classmates' responses in an annotation assignment. Details will be available in Module 4.

      Ethics Paper (75 pts)
      Using the framework provided in Module 4, each of you will prepare an ethics case study on a situation of your choosing. Details are in Module 4.

      Self-Care Plan (40 pts)
      Each of you will also complete a self-care plan, because doing leadership is hard work, and you can't pour from an empty cup. Details for the assignment are in Module 7.

      Total points available: 1000

      Grade Distribution
      🦄 | B+: 875-899 | C+: 775-799 | D+: 675-699 | 💩
      A: 925-1000 | B: 825-874 | C: 725-774 | D: 625-674 | 💩
      A-: 900-924 | B-: 800-824 | C-: 700-724 | D-: 600-624 | F: 0-599

      Schedule
      Module | Date Open | Date Due | Items Due
      Course Information Module; Module 1--Introduction | Aug. 22 | Aug. 27 | Pre-Course Survey; Values Worksheet; Journal
      Module 2--History of Leadership Theories | Aug. 22 | Sept. 3 | Quiz; Journal
      Module 3--Introduction to Adaptive Leadership | Aug. 28 | Sept. 10 | Quiz; Journal; TruTalent results; Application Phase 1
      Module 4--Diversity and Ethics | Sept. 4 | Sept. 17 | Quiz; Journal; Ethics Paper; Ethics Annotation; Exam 1
      Module 5--Leadership and Personality | Sept. 11 | Sept. 24 | Quiz; Journal; TruTalent Letter; Ethics Annotation
      Module 6--Manage Self and Energize Others | Sept. 18 | Oct. 1 | Quiz; Journal; Application Phase 2; Ethics Annotation
      Module 7--Diagnose the Situation and Intervene Skillfully | Sept. 25 | Oct. 8 | Quiz; Journal; Self-Care Plan
      Module 8--Celebrations of Knowledge | Oct. 2 | Oct. 15 | Application Phase 3; Final Reflection Paper; Exam 2

      Policies, Procedures, and the Like

      Canvas and Email
      This course will use Canvas for the dissemination of all lecture materials and reading assignments (other than the textbook), as well as the collection of all assessments. It is the student's responsibility to regularly check Canvas for updates and information. Emails sent through Canvas will go to your KU email address, so you must also check that email address regularly for information and communication. If you send an email from a non-university email address, I will reply to that address, but any emails I initiate will go to your university address. Assignments should not be submitted via email unless explicit, case-by-case arrangements are made.

      Incompletes
      In accordance with KU's policy on incompletes, an I should only be assigned when some portion of the work for a course has not been done, for reasons beyond a student's control. Incompletes should be rare and will be assigned only in rare circumstances. If you believe such circumstances apply to your situation, please contact me as soon as possible.

      Civility
      Each of us is an adult who has made the choice to be in this course. Recognizing that choice, each of us is expected to respect all points of view expressed in the classroom. Each person in this classroom should feel free to express her/his opinion and should feel an obligation to ensure that everyone else in the room feels the same freedom. Intolerance and incivility will not be tolerated, though disagreement and reasoned argument are strongly encouraged.

      Title IX of the Education Amendments of 1972 prohibits sex discrimination against any participant in an educational program or activity that receives federal funds, including federal loans and grants. Title IX also prohibits student-to-student sexual harassment. 
If you encounter unlawful sexual harassment, gender-based discrimination, or other forms of prohibited harassment/discrimination, please talk with your professor or with the Office of Institutional Opportunity & Access at 785-864-6414, or go to the Institutional Opportunity & Access page for more information and reporting tools.

      Academic Integrity and Intellectual Property
      Academic misconduct of any kind is not tolerated in this class. Both the definition of academic misconduct and potential sanctions for it are defined by KU policy. Plagiarizing another's work, knowingly misrepresenting the source of any academic work, giving or receiving of unauthorized aid on assignments, and acting dishonestly in research are all subject to penalties. Similarly, submitting all or portions of an assignment completed in another class for a grade in this class is an act of academic misconduct. If you have outside work that you believe is appropriate and valuable to include in an assignment for this course, please speak with your instructor to establish appropriate guidelines.

      Additionally, all work produced for and in this course remains the intellectual property of the creator, including but not limited to: the textbooks, the lectures, and student assignments. No work may be reused, reproduced, or distributed without the express permission of the work's creator. This includes sharing notes or course materials with commercial or nonprofit services/databases. This policy does NOT include taking notes for personal use or a student volunteer taking notes for someone with a reasonable accommodation identified by the Student Access Center.

      Accessibility
      If you believe you need or would benefit from the accommodation of a disability, please contact the Student Access Center to discuss accommodations. 
Since accommodations may require early planning and generally are not provided retroactively, please contact the Center as soon as possible.
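
The point values and grade cutoffs listed in the syllabus can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the course materials: the names `ASSIGNMENT_POINTS`, `GRADE_FLOORS`, `total_points`, and `letter_grade` are hypothetical, and treating each range's lower bound as the cutoff (so that a fractional score such as 899.5 falls in the lower band) is an assumption the syllabus does not state.

```python
# Point values as listed under Assessments/Assignments in the syllabus.
ASSIGNMENT_POINTS = {
    "Journals (7 @ 25)": 7 * 25,
    "Quizzes (total)": 80,
    "Exams (2 @ 80)": 2 * 80,
    "Reflection Paper": 150,
    "Application Project (3 @ 80)": 3 * 80,
    "TruTalent results + letter": 10 + 40,
    "Ethics Discussions": 30,
    "Ethics Paper": 75,
    "Self-Care Plan": 40,
}

# Lower bound of each letter grade, taken from the Grade Distribution table.
# Assumption: a score earns the highest grade whose lower bound it meets.
GRADE_FLOORS = [
    (925, "A"), (900, "A-"),
    (875, "B+"), (825, "B"), (800, "B-"),
    (775, "C+"), (725, "C"), (700, "C-"),
    (675, "D+"), (625, "D"), (600, "D-"),
    (0, "F"),
]


def total_points():
    """Sum every assignment category."""
    return sum(ASSIGNMENT_POINTS.values())


def letter_grade(points):
    """Map a point total to a letter grade using the table's lower bounds."""
    if points < 0:
        raise ValueError("points must be non-negative")
    for floor, letter in GRADE_FLOORS:
        if points >= floor:
            return letter


if __name__ == "__main__":
    print(total_points())
    print(letter_grade(880))
```

The category sum agrees with the stated "Total points available: 1000".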

      My name is Eden. I'm from the most haunted town in Kansas, Atchison! My major is Philosophy and my minor is in History. I think my goal for this class is to learn more about leadership in a conceptual way, so that I can apply it in more real-life situations. My walk-up song would be "Wake Up" by Young the Giant!

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Neville et al addresses the link between the localization and the activity of the so-called "Pins complex" or "LGN complex", that has been shown to regulate mitotic spindle orientation in most animal cell types and tissues. In most cell types, the polarized localization of the complex in the mitotic cell (which can vary between apical and basolateral, depending on the context) localizes pulling forces to dictate the orientation. The authors reexplore the notion that this polarized localization of the complex is sufficient to dictate spindle orientation, and propose that an additional step of "activation" of the complex is necessary to refine positioning of the spindle.

      The experiments are performed in the follicular epithelium (FE), an epithelial sheet of cells that surrounds the Drosophila developing oocyte and nurse cells in the ovarium. Like in many other epithelia, cell divisions in the FE are planar (the cell divides in the plane of the epithelium). The authors first confirm that planar divisions in this epithelium depend on the function of Pins and its partner mud, and that the interaction between the two partners is necessary, like in many other epithelial structures. Planar divisions are often associated with a lateral/basolateral "ring" of the Pins complex during mitosis. The authors show that in the FE, Pins is essentially apical in interphase and becomes enriched at the lateral cortex during mitosis, however a significant apical component remains, whereas mud is almost entirely absent from the apical cortex. Pins being "upstream" of mud in the complex, this is a first hint that the localization of Pins is not sufficient to dictate the localization of mud and of the pulling forces.

      The authors then replace wt Pins, whose cortical anchoring strongly relies on its interaction with Gai subunits, with a constitutively membrane anchored version (via an N-terminal myristylation). They show that the localization of myr-Pins mimics that of wt-Pins, with a lateral enrichment in mitosis, and a significant apical component. Since a Myr-RFP alone shows a similar distribution, they conclude that the restricted localization of Pins in mitosis is a consequence of general membrane characteristics in mitosis, rather than the result of a dedicated mechanism of Pins subcellular restriction. Remarkably, Myr-Pins also rescues Pins loss-of-function spindle orientation defects.

      They further show that the cortical localization of Pins does not require its interaction with Dlg (unlike what has been suggested in other epithelia). However, spindle orientation requires Dlg, and in particular it requires the direct Dlg/Pins interaction. The activity of Dlg in the FE appears to be independent from khc73 and Gukholder, two of its partners involved in its activity in microtubule capture and spindle orientation in other cell types. Based on all these observations, the authors propose that Dlg serves as an activator that controls Pins activity in a subregion of its localization domain (in this case, the lateral cortex of the mitotic FE cell). They propose to test this idea by relocalizing Pins at the apical cortex, using Inscuteable ectopic expression. With the tools that they use to drive Inscuteable expression, they obtain two populations of cells. One population has a stronger apical than basolateral Insc distribution, and the spindle is reoriented along the apical-basal axis; the other population has higher basolateral than apical levels of Insc distribution, and the spindle remains planar. The authors write that Pins localization is unchanged between the two subsets of cells (although I do not entirely agree with them on that point, see below), and that although Mud is modestly recruited to the apical cortex in the first population, it remains essentially basolateral in both. In this situation, the localization of Insc in the cell is therefore a better predictor of spindle orientation than that of Pins or Mud. Remarkably, removing Dlg in an Insc overexpression context leads to a dramatic shift towards apical-basal reorientation of the spindle, suggesting that loss of Dlg-dependent activation of the lateral Pins complex reveals an Insc-dependent apical activation of the complex. Overall, I find the demonstration convincing and the conclusion appropriate. 
One of the limitations of the study is the use of different drivers and reporters for the localization of Pins, which makes it hard to compare different situations, but not to the point that it would jeopardize the main conclusions. I do not have major remarks on the paper, only a few minor observations and suggestion of simple experiments that would complete the study.

      Minor:

      What happens to Pins and Mud in Dlg mutant cells that overexpress Insc and behave as InscA? Are they still essentially lateral, or are they more efficiently recruited to the apical cortex?

      This is a terrific question. Of course we would love to know and intend to find out.

      One way to do this (consistent with the manuscript) would be to generate flies that are Dlg[1P20], FRT19A/RFP-nls, hsflp, FRT19A; TJ-GAL4/+; Pins-Tom, GFP-Mud/UAS-Insc. (Note that these flies would only allow us to image Mud; we would have to repeat the experiment using GFP FRT19A; hsflp 38 to see Pins. This isn’t ideal given that we’d like to image both together). Generating these flies is a major technical challenge because of the number of transgenes and chromosomes involved.

      Our preferred way to do this would be to generate flies that are Dlg[1P20]/Dlg[2]; TJ-GAL4/+; Pins-Tom, GFP-Mud/UAS-Insc. So far, we’ve been unsuccessful. We are now undertaking a modified crossing scheme that we hope will solve the problem, though we aren’t overly optimistic about the outcome. We find that the temperature-sensitive mutation Dlg[2] presents an activation barrier; while we are able to generate flies that are Dlg[2] / FM7 in combination with transgenes and/or mutations on other chromosomes, we do not always recover the Dlg[2] / Y males (which must develop at 18°C) from these complex genotypes.

      In the longer term (outside the scope of revision), we are working to develop more tools for imaging Mud and Pins that we hope will help answer this question.

      Regarding the competition between Pins and Insc for dictating the apical versus basolateral localization of Insc, the Insc-expression threshold model could be easily tested in Pins62/62 mutants, where it is expected that only InscA localization should be observed, even at 25°C (unless Pins is required for the cortical recruitment of Insc, as it is the case in NBs - see Yu et al 2000 for example).

      This is another great experiment and one we’d love to carry out. Again, the genetics are currently challenging, only because both UAS-Inscuteable and FRT82B pinsp62 are on the third chromosome. (Right now we’re trying to hop UAS-Inscuteable to the second).

      However, we do have another idea for testing the threshold model, which is to repeat the experiment in which we express UAS-Insc in cells that are Dlg[1P20]/Dlg[1P20] at 25°C. Because the relevant cells (UAS-Insc OX in Dlg mitotic clones) are relatively rare, we have not yet been able to collect enough examples to make a firm conclusion. However, our preliminary results (only six cells so far!) suggest that more InscB cells are observed at the lower temperature, consistent with the threshold model.

      I do not agree with the authors on P.10 and Figure 6A-D, when they claim that the apical enrichment of Pins is equivalent in both InscA and InscB cells. The number of measured cells is very low, and the ratio of apical/lateral Pins differs between the two sets of cells. The number of cells should be increased and the ratios compared with a relevant statistical method.

      Totally fair. We are working to add more data to these panels (6B and 6D). The trend observed in 6D may be softening in agreement with the reviewer’s prediction, although we don’t yet have enough new data points to be confident in that conclusion. Therefore, we have not yet updated the manuscript, though we expect to do so during the revision period. We will also add a statistical comparison. Importantly, as the reviewer suggested, this does not alter our conclusions.
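
For readers unfamiliar with comparing ratios between two small groups of cells, one option that makes few distributional assumptions is a two-sided permutation test on the difference in group means. The sketch below is only an illustration of that general technique; the rebuttal does not say which test the authors will use, the function name `permutation_test` is hypothetical, and the numbers in the usage example are made-up placeholders, not measurements from the manuscript.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference between group means.

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign cells to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Placeholder apical/lateral ratios, invented purely for illustration.
    insc_a_ratios = [1.2, 0.9, 1.1, 1.4, 1.0]
    insc_b_ratios = [0.8, 0.7, 1.0, 0.6, 0.9]
    print(permutation_test(insc_a_ratios, insc_b_ratios))
```

With very few cells per group the number of distinct relabelings is small, so the attainable p-values are coarse; this is one reason more data points matter for this comparison.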

      A lot of the claims on Pins localization rely on overexpression (generally in a Pins null background) of tagged Pins expressed from different promoters or drivers, and fused to different fluorescent tags. Therefore, it is difficult to evaluate to which extent the localization reflects an endogenous expression level, and to compare the different situations. As the cortical localization of Pins relies on interaction with cortical partners (mostly GDP-bound Gai) which are themselves in limiting quantity in the cell, and in the case of Gai-GDP, regulated by Pins GDI activity, this poses a problem when comparing their distribution, because the expression level of Pins may contribute to its cortical/cytoplasmic ratio, but also to its lateral/apical distribution. Although I understand that the authors have been using tools that were already available for this study, I think it would be more convincing if all the Pins localization studies were performed with endogenously tagged Pins, even those with Myr localization sequences. In an age of CRISPR-Cas-dependent homologous recombination, I think the generation of such alleles should have been possible. Although this would probably not change the main claims of the paper, it would have made a more convincing case for the localization studies.

      We don’t disagree at all with this point. We did indeed try to stick with the published UAS-Pins-myr-GFP, not only for convenience but because it allows us to make comparisons to other studies using the same tool (Chanet et al Current Biology 2017 and Camuglia et al eLife 2022). Another consideration is that we used only one driver across our experiments (Traffic jam-GAL4). It is quite weak at the developmental stages that we examine, meaning that overexpression is not a major concern. (Indeed we have struggled with the opposite problem).

      We certainly take the reviewer’s comment seriously and we therefore described it in the manuscript. We are currently working to develop endogenous tools using CRISPR.

      Paragraph added to Discussion – Limitations of our Study:

      “Another technical consideration is that our work makes use of transgenes under the control of Traffic jam-GAL4. While this strategy allows us to compare our results with previous work employing the same or similar tools, a drawback is that we cannot guarantee that Traffic jam-GAL4 drives equivalent expression to the endogenous Pins promoter (Chanet et al., 2017, Camuglia et al., 2022). However, given that Traffic jam-GAL4 is fairly weak at the developmental stages examined, we are not especially concerned about overexpression effects.”

      The authors should indicate in the figure legends or in the methods that the spindle orientation measurements for controls for Pins62/62 are reused between figures 1, 3, 4, 5, 6, and between figure 3, 4 and 5, respectively.

      Absolutely. Added to the Methods section.

      Reviewer #1 (Significance (Required)):

      Altogether, this study makes a convincing case that the localization of the core members of the pulling force complex, Pins and Mud, is not entirely sufficient to localize active force generation, and that the complex must be activated locally, at least in the FE. The notion of activation of the Pins/LGN complex has probably been on many people's mind for years: Pins/LGN works as a closed/open switch depending on the number of Gai subunits it interacts with, it must be phosphorylated, etc... suggesting that not all cortical Pins/LGN was active and involved in force generation. However the study presented here shows an interesting case where localization and activation are clearly disconnected. The authors show how Dlg plays this role in physiological conditions in the FE, and use ectopic expression of Insc to show that, at least in an artificial context, Insc can have the same "activating activity" (or at least an activating activity that is stronger than its apical recruitment capability and stronger than Dlg's activating activity). It is to my knowledge the first case of such a clear dissociation. In their discussion, the authors are careful not to generalize the observation to other tissues. Although I did not reexplore all that has been published on the Pins/LGN-NuMA/Mud complex over the last 20 years, my understanding is that despite interesting cases of distribution of the complex like that of Mud in the tricellular junction in the notum, the localization model can still explain most of the phenotypes that have been described without invoking an activation step. If it is the case, then the activation model is another variation (an interesting one!) 
on the regulation of the core machinery, which are plentiful as the authors indicate in their introduction, and is maybe specific to the FE; if not, then it would be interesting to push the discussion further by reexamining previous results in other systems, and pinpointing those phenotypes that could be better explained with an activation step.

      Overall, I find this is an elegant piece of work, which should be of interest to many cell and developmental biologists beyond the community of spindle orientation aficionados.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: The manuscript by Neville et al. addressed the mechanism how conserved spindle regulators (Pins/Mud/Gai/Dynein) control spindle orientation in the proliferating epithelia by revising "the canonical model", using the Drosophila follicular epithelium (FE). The authors examined the epistatic relationship among Pins, Mud and Dlg in FE and found that Pins controls the cortical localization of Mud by utilizing mutant analyses, and suggested their localization does not fully overlap using the newly generated knock-in allele. They also showed that Pins relocalization during mitosis depends on cortical remodeling, or passive model, where Pins localization changes with other membrane-anchored proteins. Their data further suggest that Pins cortical localization is not influenced by Dlg, but Pins-interacting domain of Dlg does affect spindle orientation. Based on these results, the authors propose that Dlg controls spindle orientation not by redistributing Pins, but by promoting (or "activating" from their definition) Pins-dependent spindle orientation. Interestingly, ectopic expression of Inscuteable (Insc) suggested that Insc localization, either apical or lateral, correlates with spindle orientation, and their localization is a dominant indicator of spindle orientation, compared to the localization of Pins and Mud, implicating potentially distinct roles of activation and localization of the spindle complex. Overall their genetic experiments are well-designed and provide stimulation for future research. However, their evidence is suggestive, but not conclusive for their proposal. I have several concerns about their conclusion and would like to request more detailed information as well as to propose additional experiments.

      Major concerns: 1. This report lacks technical and experimental details. As the typical fly paper, the authors need to show the exact genotypes of flies they used for experiments. This needs to be addressed for Figures 1-6, and Supplemental Figures. Especially, which Gal4 drivers were used for UAS-Pins wt or mutant constructs in Figure 4 with pins mutant background, Khc73, GUKH mutant backgrounds. Which exact flies were used for mutant clone experiments for Supplemental Figure 3? (A for typical mosaic, and B for MARCM). Without these details, it is impossible to evaluate results and reproduce by others.

      We take this concern very seriously!

      • We listed the GAL4 driver (Traffic jam-GAL4) in the first section of the Materials and Methods: Expression was driven by Traffic Jam-GAL4 (Olivieri et al., 2010). The transgene and relevant citation have been added to Table 1.
      • We explained the background stock for the MARCM experiment in the Materials and Methods: Mosaic Analysis with a Repressible Cell Marker (after the method of Lee and Luo) was carried out using GFP-mCD8 (under control of an actin promoter) as the marker. The transgene and relevant citation have been added to Table 1.
      • In line with other fly studies (eg. Nakajima et al., Nature 2013) and our own Drosophila work (Bergstralh et al Current Biology 2013, Bergstralh*, Lovegrove*, St Johnston NCB 2015, Bergstralh et al Development 2016, Finegan et al EMBO J 2019, Cammarota*, Finegan* et al Current Biology 2020) we were careful to show the relevant genotype components in each figure.
      • We included a fully referenced Supplementary Table (Table 1 – Drosophila genetics) listing every mutant allele or transgene with a citation and a note about availability. We have expanded this table in response to the reviewer’s concern (see above).

        Related to the comment 1, how did the author perform "clonal expression of Ubi-Pin-YFP" in page 5? As far as I understand, Ubi-Pin-YFP is expressed ubiquitously by the ubiquitin promoter.

      The reviewer makes a good point. We regret that we did not make this experiment clearer. Ubi-Pins-YFP was recombined onto an FRT chromosome (FRT82B), and we then made mitotic clones.

      We have clarified this in the Methods section as follows:

      “Mitotic clones of Ubi-Pins-YFP were made by recombining the Ubi-Pins-YFP transgene onto the FRT82B chromosome”

      2. In page 6, if Pins relocalization is passive and is associated with membrane-anchored protein remodeling during mitosis, its relocalization can be suppressed by disrupting the process of mitotic remodeling (mitotic rounding). The authors should test this: either genetic disruption or pharmacological treatment of actomyosin should cause defects in Pins relocalization, which would bolster their conclusion.

      We agree that this is a cool experiment and are happy to give it another shot. However, we do note that interpretation could be difficult. We don’t know that mitotic rounding and membrane-anchored protein remodeling during mitosis are inextricably linked. Notably, the remodeling we describe reflects cell polarity; apical components are evidently moved to the lateral cortex. This is contrary to the current understanding of rounding, which reflects isotropic actomyosin activity (Chanet et al., (2017) Curr. Biol. & Rosa et al., (2015) Dev. Cell). Therefore we don’t understand what a “negative” result would mean, or for that matter that a “positive” result would be safe to interpret.

      We have attempted many strategies to prevent cell rounding in the follicular epithelium, none of which have successfully prevented rounding. 1) We attempted to knock down Moesin genetically in the FE and did not see an effect on cell rounding. However we couldn’t confirm knockdown and therefore are not confident in this manipulation. 2) It is difficult to interpret the result of genetically disrupting Myosin, because it causes pleiotropic effects, such as inhibition of the cell cycle, and disruption of monolayer architecture. 3) We treated egg chambers with Y-27632 (a Rok-inhibitor) and examined its effect on mitotic cell rounding and on cytokinesis, which are Rok-dependent processes. Our experiments were performed using manually-dissociated ovarioles treated for 45 minutes in Schneider Cell Medium supplemented with insulin. Even at our maximum concentration of 1 mM Y-27632, several orders of magnitude above the Ki, we are unable to see any effect on mitotic cell shape or actin accumulation at the mitotic cortex and did not observe any evidence of defective cytokinesis. We also did not observe defects in spindle organization or orientation, as would be expected from failed rounding. We therefore do not believe that the inhibitor works in this tissue. One possible explanation is that the follicle cells are secretory, and likely to pass molecules taken up from the media quickly into the germline. Therefore, we do not anticipate that we can perform this experiment to our satisfaction.

      3. The critical message in this manuscript is that the core spindle complex mediated by Pins-Mud controls spindle orientation by "activation", but not localization. The findings that Pins and Mud localization is not influenced by Insc and that ectopic Insc expression and genetic Mud depletion (Figure 6) might support their proposal, but these results just suggest their localization does not matter. I wonder how the authors could conclude and define "activation". What does this activation mean in the context of spindle orientation? Can the authors test activation by enzymatic activity or assess dynamics of spindle alignment?

      We intend for the critical message of the manuscript to be that “The spindle orienting machinery requires activation, not just localization.” We absolutely do not make the claim that localization is not important, only that it is not sufficient. The reviewer recognizes this point here and in a subsequent comment: “The authors showed that Pins and Mud localization themselves are not sufficient for the control of spindle orientation with genetic analyses.”

      We also do not claim that Pins and/or Mud localization are not impacted by Inscuteable. On the contrary, we plainly see and report that they are; the intensity profiles in Figure 6 are distinct from those in Figure 2, as discussed in the text.

      We appreciate the reviewer’s point about activation. Since we do not understand these proteins to be enzymes, we aren’t sure what enzymatic activity would be tested. The dynamics of spindle alignment in this slowly developing system are prohibitively difficult to measure: the mitotic index is very low (~2%) and only a very small fraction of those cells will be in a focal plane that permits accurate live imaging in the apical-basal axis. Alternative modes of activation include conformational change and/or a connection with other important molecules. The simplest possibility would be that Dlg allows Pins to bind Mud, but so far our data do not support it. We have added the following paragraph to our discussion:

      “The mechanism of activation remains unclear. While the most straightforward possibility is that Dlg promotes interaction between Pins and Mud, our results show that Mud is recruited to the cortex even when Dlg is disrupted (Figure 4D). Alternatively, Discs large may promote a conformational change in the spindle-orientation complex and/or a change in complex composition. Furthermore, the Inscuteable mechanism is not likely to work in the same way. Dlg binds to a conserved phosphosite in the central linker domain of Pins and should therefore allow for Pins to simultaneously interact with Mud (Johnston et al., 2009). Contrastingly, binding between Pins and Inscuteable is mediated by the TPR domains of Pins, meaning that Mud is excluded (Culurgioni et al., 2011; 2018). While a stable Pins-Inscuteable complex has been suggested to promote localization of a separate Pins-Mud-dynein complex, our work raises the possibility that it might also or instead promote activation.”

      1. In page 7-8, although Pins-S436D rescue spindle orientation, but Pins-S436A does not in pins null clones background, Pins localization is not influenced by Dlg. This questions how exactly Pins and Dlg can interact, and how Dlg affect Pins function. Related to this observation, in the embryonic Pins:Tom localization in dlg mutant does not provide strong evidence to support their conclusion given the experimental context is different from previous study (Chanet et al., 2017).

      We agree with the reviewer. Our data (this paper and previous papers) and the work of others indicate that this interaction is important for spindle orientation (Bergstralh et al., 2013a; Saadaoui et al., 2014; Chanet et al., 2017). However, we show here that Dlg doesn’t obviously impact Pins localization (as proposed in our earlier paper), but does impact the ability of the spindle orientation machinery to work (hence activity).

      The reviewer makes a very good point. Our experimental context is different from the previous study concerning Pins and Dlg in embryos: Chanet et al (2017) performed their work in the embryonic head, whereas we look at divisions in the ventral embryonic ectoderm. These are distinct mitotic zones (Foe et al. (1989) Development) and exhibit distinct epithelial morphologies. We show that Pins:Tom localizes at the mitotic cell cortex in Dlg[2]/Dlg[1P20] in cells in the ventral embryonic ectoderm. Our only conclusion from this experiment is that Pins:Tom can localize without the Dlg GUK domain in another cell type (outside the follicular epithelium). In the current preliminary revision we have softened our claim as follows:

      “We also examined the relationship between Pins and Dlg in the Drosophila embryo. A previous study showed that cortical localization of Pins in embryonic head epithelial cells is lost when Dlg mRNA is knocked down (Chanet et al., 2017). We find that Pins:Tom localizes to the cortex in the ventral ectoderm of early embryos from Dlg1P20/Dlg2 mothers, indicating that Pins localization in the ventral embryonic ectoderm epithelium does not require direct interaction with Dlg. We therefore speculate that Dlg plays an additional role in that tissue, upstream of Pins (Figure 4G).”

      Our intention is to elaborate on our findings with additional data from embryos. To this end we have already acquired preliminary control data investigating the spindle angle with respect to the plane of the epithelium, and are in the process of examining spindle angles in dlg mutant embryonic tissue.

      In page 11, the authors state "... that activation of pulling in the FE requires Dlg". I was not convinced by anything related to "pulling". There is no evidence to support "pulling" or such dynamic in this paper, just showing Mud localization, correct?

      We appreciate the reviewer’s concern. The original sentence read that “We interpret [our data] to mean that interaction between Pins and Dlg, which is required for pulling, stabilizes the lateral pulling machinery even if Dlg is not a direct anchor.” This statement is based on work across multiple systems, including the C. elegans embryo (Grill et al Nature 2001), the Drosophila pupal notum (Bosveld et al, Nature 2016), and HeLa cells (Okumura et al eLife 2018), which shows that Mud/dynein-mediated pulling (on astral microtubules) orients/positions spindles. This is described in the introduction.

      To address the reviewer’s particular concern, we have replaced “pulling” with “spindle-orientation machinery,” so that this sentence now reads: “…activation of the spindle-orientation machinery in the FE requires Dlg.”

      1. Ectopic expression of Insc (Figure 6) provided a new idea and hypothesis, but the conclusion is more complicated given that Insc is not expressed in normal FE. For example, the statement that "Inscuteable and Dlg mediate distinct and competitive mechanism for activation of the spindle-orienting machinery in follicle cells" is probably right, but it does not show anything meaningful since Insc does not exist in normal FE. Is Dlg in a competitive situation during mitosis of FE? If so, which molecules are competitive against Dlg? The important issue is to provide a new interpretation of how spindle orientation is controlled epithelial cells. I strongly recommend to add models in this manuscript for clarity.

      We considered the addition of model cartoons very carefully in preparing the original manuscript, and again after review. While we are certainly not going to “dig in” on this point, our concern is that model figures would obscure rather than clarify the message. As the reviewer points out, we do not understand how activation works, and as discussed in the manuscript we don’t think it’s likely to work the same way in follicle cells (Dlg) as it does in neuroblasts (Insc). Therefore model figure(s) are premature.

      We do not agree with the statement that “Inscuteable and Dlg mediate distinct and competitive mechanism for activation of the spindle-orienting machinery in follicle cells… does not show anything meaningful.” This is a remarkable finding because it suggests that there is more than one way to activate Pins. Given the critical importance of spindle orientation in the developing nervous system, and the evolutionary history of the Dlg-Pins interaction, we think that this finding supports a model in which the Dlg-Pins interaction evolved in basal organisms, and a second Inscuteable-Pins interaction evolved subsequently to support neural complexity. These ideas are raised in the Discussion.

      The reviewer also writes that “The important issue is to provide a new interpretation of how spindle orientation is controlled epithelial cells.” We find this concern perplexing, since the reviewer clearly recognizes that we have provided a new interpretation: Dlg is not a localization factor but rather a licensing factor for Pins-dependent spindle orientation.

      Minor comments: 8. Some sections were not written well in the manuscript. "It does not" in page 6. "These predictions are not met". I just couldn't understand what they stand for. Their writing has to be improved.

      Again, we are not going to dig in here, but we would prefer to retain the original language, which in our opinion is fairly clear. Our study is hypothesis-driven and based on assumptions made by the current model. We used direct language to help the reviewer understand what happened when we tested those assumptions.

      1. In page 9, Supplementary Figure 4 should be cited in the paragraph (A potential strategy for..), not Supplemental Figure 1A, and 1B.

      Good catch, thank you! We have corrected this.

      1. In page 10, the authors examine aPKC localization in Insc expressing context of FE. Does aPKC localization correlate with Insc localization (Insc dictates aPKC?)? aPKC is not involved in spindle orientation from the author's report (Bergstralh et al., 2013), so it does not likely provide any supportive evidence.

      I’m afraid we don’t entirely understand this comment. The interdependent relationship between aPKC and Inscuteable localization is long-established in the literature and was previously addressed in the follicle epithelium (Bergstralh et al. 2016). We do not make the claim here that aPKC governs spindle orientation. We are emphasizing that the difference between InscA and InscB cells extends to the relocalization of polarity components involved in Insc localization. As described in the manuscript, these data are provided to support our threshold model:

      “In agreement with interdependence between Inscuteable and the Par complex, we find that aPKC is stabilized at the apical cortex in InscA cells but enriched at the lateral cortex in InscB cells (Figure 6E). This finding is consistent with an Inscuteable-expression threshold model; below the threshold, Pins dictates lateral localization of Inscuteable and aPKC. Above the threshold, Inscuteable dictates apical localization of Pins and aPKC.”

      1. In Discussion page 12, "In addition, we find that while the LGN S408D (Drosophila S436D) variant is reported to act as a phosphomimetic, expression of this variant has no obvious effect on division orientation (Johnston et al., 2012)". Where is the evidence for this? I interpret that this phosphomimetic form can rescue like wt-Pins not like unphosphorylatable mutant S436A, so it means that have an effect on spindle orientation, correct?

      The reviewer makes a good point. We regret the confusion. We mean to point out that the S436D variant is no different from the wild type. We have amended the text to clarify:

      “In addition, we find that while the LGN S408D (Drosophila S436D) variant is reported to act as a phosphomimetic, this variant does not cause an obvious mutant phenotype in the follicular epithelium (Johnston et al., 2012). What then is the purpose of this modification? Since the phosphosite is highly conserved through metazoans, one possibility to consider is that the phosphorylation regulates the spindle orientation role of Pins, whereas unphosphorylated Pins plays a different role (Schiller and Bergstralh, 2021).”

      Reviewer #2 (Significance (Required)):

      The authors showed that Pins and Mud localization themselves are not sufficient for the control of spindle orientation with genetic analyses. While the authors tried to challenge the concept of "canonical model", there is no clear demonstration of "activation" of the spindle complex. I appreciate their genetic evidence and new results, and understand the message that Pins and Mud effects are beyond localization, but there is no alternative mechanism to support their model. At the current stage, their evidence provides more hypothesis, not conclusion. Based on my expertise in Developmental and Cell biology, I suggest that the work has an interest in audience who studies spindle machinery, but for general audience.

      We think that the reviewer fundamentally shares our perspective on the study. Our work tests assumptions made by the canonical model and shows that they aren’t always met (meaning that the question of how spindle orientation works in epithelia at least is still unsolved), and that in the FE at least one component (Dlg) has been misunderstood. We reach a major conclusion, which is that localization of Pins is not enough to predict spindle orientation in the FE.

      It’s absolutely true that the precise molecular role of Dlg has not been solved by our study. This is a major question for the lab, and we are currently undertaking biochemical work to address it. It’s probably more work than we can (or should) do on our own, which is just one reason to share our current results with colleagues.

      One fundamental reason for undertaking this study is that 25 years of spindle orientation studies, released into an environment in which “positive” conclusions are the bar for publication success, may have burdened the field with claims that are overly speculative. We appear to have contributed to this problem ourselves in 2013. With that in mind, we contend that providing an alternative molecular mechanism at this point is premature and would impair rather than improve the literature.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Neville et al re-examine the role and regulation of Pins/LGN in Drosophila follicular epithelial cells. They argue that polar or bipolar enrichment of Pins localisation at the plasma membrane is not crucial for spindle orientation, and therefore propose that Pins must be somehow activated to function. These interpretations are not supported by the data. However, the data strongly suggest an alternative interpretation which is of major biological significance.

      As an initial point, we disagree with the summary above. We do not argue that enrichment of Pins is not crucial for spindle orientation. We argue that enrichment of Pins is not sufficient. This is why we titled the paper “The spindle orienting machinery requires activation, not just localization” instead of “The spindle orienting machinery requires activation, not localization.”

      Although we disagree with the reviewer, we appreciate their criticism of our manuscript and are glad for the opportunity to clarify our findings. In our responses to the specific comments (below) we explain why our data contradict the reviewer’s model and what we will do to improve the manuscript.

      Comments:

      1. In the experiments on Dlg mutants (Fig 4D, S3) visualising Pins:Tom, the wild-type needs to be shown next to the Dlg mutant image, otherwise a comparison cannot be made. For example, Pins:Tom looks strongly enriched at the lateral membranes in the wild-type shown in Fig 2B&C, but much more weakly localised at the lateral membranes in Dlg1P20/2 mutants in Fig 4D. Thus, it looks like the Dlg GUK domain is required for full enrichment of Pins:Tom at lateral membranes, even if some low level of Pins can still bind to the plasma membrane in the absence of the Dlg GUK domain. Quantification would likely show a reduction in Pins:Tom lateral enrichment in the Dlg1P20/2 mutants - consistent with the spindle misorientation phenotype in these mutants.

      The reviewer raises a reasonable concern about Figure 4D. We noted the difficulty of imaging Pins:Tom, which is exceedingly faint, in our original manuscript. For technical reasons, only one copy of the transgene was imaged in the experiment presented in 4G (two copies were used in Figure 2B), and the lack of signal presented an even greater challenge. In the manuscript we went with the clearest image. To address the reviewer’s concern, we have added signal intensity plots to this figure showing that Pins:Tom and Pins-myr are both laterally enriched at mitosis in Dlg[1P20]/Dlg[2] mutants. These data have been added as a new panel (E) in Figure 4. We were also able to replace the pictures in 4D with new ones generated after review.

      1. In Fig 4E, the phosphomutant PinsS436A-GFP looks more strongly apical and less strongly lateral than the wildtype Pins-GFP, consistent with the spindle misorientation phenotype in S436A rescued pins mutants.

      The reviewer has an eagle eye! We did not detect a difference in localization across the three transgenes, though we were certainly looking for it (that’s why we generated these flies in the first place). Again, the strength of signal was a major challenge in these experiments, and we therefore went with the cleanest image. In response to the reviewer’s concern, we note that the S436A and S436D examples shown have equivalent apical signal, but only the S436A fails to rescue spindle orientation.

      Together, Reviewer Comments 1 and 2 suggest a model in which Dlg is required for lateral enrichment of Pins at mitosis. As described in the manuscript, this is the very model proposed in our own previous study (Bergstralh, Lovegrove, and St Johnston; 2013), and reiterated in a subsequent review article (Bergstralh, Dawney, and St Johnston; 2017). We point these publications out because the senior author of the current manuscript is not especially enthusiastic about showing himself to be wrong (twice!) in the literature. He therefore insisted on seeing multiple lines of evidence before making the counterargument:

      • The reviewer’s model (the 2013 model) is firstly challenged by work shown in Figure 3. We find that membrane-anchored proteins (even just myristoylated RFP!) demonstrate lateral enrichment at mitosis, regardless of whether or not they interact with the Dlg-GUK domain.
      • Even stronger evidence is shown in Figure 4F. Pins-myr-GFP is very plainly enriched at the lateral cortex in Dlg[1P20]/Dlg[2] mutant cells (now demonstrated with signal intensity plots in Figure 4E). However, the spindle doesn’t orient correctly (quantified in 4C). Since Dlg is impacting spindle orientation independently of Pins localization, these data support our claim in the final sentence of the abstract: “Local enrichment of Pins is not sufficient to determine spindle orientation; an activation step is also necessary.”

        In the InscA examples, Pins:Tom looks apical. In the InscB examples, Pins:Tom looks more laterally localised, consistent with the spindle orientations in these experiments.

      These figures (6A-D) don’t only show/quantify Pins:Tom localization. They also show localization of GFP-Mud. Whereas Pins:Tom is certainly apically enriched in the InscA examples, the interesting finding is that GFP-Mud is not. In strong contrast, it instead shows a weak apical localization and a strong lateral enrichment. As described in the manuscript, this pattern of Mud localization predicts normal spindle orientation, which is not observed in these cells.

      Thus, these data appear to support the existing model that Pins enrichment at the plasma membrane is a key factor directing mitotic spindle orientation in these cells. The author's claim in the final sentence of the abstract "Local enrichment of Pins is not sufficient to determine spindle orientation; an activation step is also necessary" is not supported by the data.

      We are pleased that the reviewer shared this quote; our claim is that Pins localization is not sufficient, not that it is unnecessary (see above). We absolutely do not dispute that “Pins enrichment at the plasma membrane is a key factor directing mitotic spindle orientation.”

      The open question posed by the data is why GFP-Mud is excluded apically & basally during mitosis, while Pins:Tom is not. The simple alternative model is that Mud only localises to the plasma membrane where Pins is most strongly concentrated, such that Mud strongly amplifies any Pins asymmetry. Thus, even myr-Pins can still rescue a pins n mutant, because myr-Pins is still enriched laterally compared to apically (or basally).

      Thus, I would strongly suggest re-titling the manuscript to: "Mud/NuMA amplifies minor asymmetries in Pins localisation to orient the mitotic spindle".

      Well, that is a good-looking title, and we’re therefore sorry to decline the suggestion. However, as described above, Figure 4D shows that Pins enrichment does not always predict spindle orientation. More importantly, Figure 6A (cited by the reviewer in Comment 3) very plainly shows that Mud does not “only locali[ze] to the plasma membrane where Pins is most strongly concentrated.” In this picture – and across multiple InscA cells (Figure 6B) - Pins is strongly concentrated at the apical surface, whereas Mud is not.

      Mud/NuMA presumably achieves this amplification by binding to the plasma membrane only where Pins is concentrated above a critical threshold level. This would mean a non-linear model based on cooperativity among Pins monomers that increases the binding avidity to Mud above the threshold concentration of Pins monomers.

      This is essentially a minor revision of the standard model, which we expected would hold true in the FE. As described above, it is not supported by our data.

      Reviewer #3 (Significance (Required)):

      The manuscript is focused on the question of mitotic spindle orientation in epithelial cells, which is a fundamental unsolved problem in biology. The data reported are impressive and important, providing new insights into how the key spindle orientation factors Mud/NuMA and Pins/LGN localise during mitosis in epithelia. I recommend publication after major revisions.

      We are delighted that the reviewer finds our data impressive and important, and our experiments insightful. We understand that the “major revisions” requested are meant to bring the paper in line with their model (our own earlier model). Since the data in our original manuscript contradict that model, the revisions are instead focused on clarifying and strengthening our message.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We thank the reviewers for their feedback and were pleased that they think the manuscript is “of great interest to the scientific community”. The reviewers agree that the manuscript addresses an important question and that the identification of ASNS as a potential vulnerability of late-stage colorectal cancer is significant. The reviewers agree that our findings would be substantially strengthened by validation in state-of-the-art organoid model systems. We agree with this and are currently liaising with collaborators (Owen Sansom, Beatson Institute and Laura Thomas, Swansea University) to replicate our findings in both mouse and human colorectal organoid models. We will determine the sensitivity of colorectal organoid models to ASNS inhibition across a range of tumorigenicities and mutational profiles representing different stages of the adenoma-carcinoma progression. We believe these experiments will substantially strengthen the manuscript and lend weight to our finding that late-stage adenocarcinoma cells are vulnerable to ASNS inhibition.

      This is the predominant concern across reviewers; we are confident that we can address this and all other, relatively minor, concerns as detailed below.

      Please find below a point-by-point reply to the reviewers’ comments. Reviewer comments are in italicized text and our responses follow.

      Reviewer #1

      • All of the findings in this manuscript are limited to in vitro observations, we know that most of the in vitro findings can not be translated in vivo. The manuscript would significantly benefit from in vivo experiments using the cells described in Fig.1 A and confirming the in vitro findings.

      We agree that validation of our results in a more physiological context would significantly elevate our manuscript. In order to address this, we intend to use both human and mouse colorectal organoid models (please see the detailed description of this in our response to reviewer 2). We have decided to take this approach rather than conduct in vivo experiments using the AA series (C1, SB, 10C and M) for two main reasons. Firstly, the C1 and SB cell lines do not form tumours in mice, consistent with them representing early colorectal adenoma cells. As such, we are not able to use the entire series in in vivo experiments. Secondly, we are keen to demonstrate replication of our findings in an alternative model. An organoid model would offer increased functional relevance, whilst allowing us to retain the ability to validate our observed metabolic dependencies across the adenoma to carcinoma sequence. We hope the reviewer agrees that these experiments would address their concerns.

      • The authors should provide proliferation data for the cell lines they used in this manuscript (C1, SB, 10C and M). In Fig. 1 B they show clear differences in ECAR, can the authors provide data on glucose uptake differences in these analyzed cell lines.

      We agree that proliferation and glucose uptake data would be a useful addition to the manuscript. We will provide doubling times for the cell lines used in this study and will measure glucose uptake by analysing extracellular glucose levels in the cell culture media from each of the cell lines.
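      As a rough illustration of how a doubling time could be derived from two cell counts, the sketch below assumes simple exponential growth between the counts. The function name and the example counts are hypothetical and are not taken from the manuscript or from the authors' analysis.

```python
import math


def doubling_time(n0: float, nt: float, hours: float) -> float:
    """Estimate doubling time (hours) from two cell counts.

    Assumes exponential growth, N(t) = N0 * 2**(t / Td), so
    Td = t * ln(2) / ln(Nt / N0).
    """
    if n0 <= 0 or hours <= 0:
        raise ValueError("initial count and elapsed time must be positive")
    if nt <= n0:
        raise ValueError("final count must exceed initial count for growth")
    return hours * math.log(2) / math.log(nt / n0)


# Hypothetical counts: a four-fold increase over 48 h is two doublings,
# so the estimate is about 24 h.
print(round(doubling_time(1e5, 4e5, 48), 2))
```

      In practice a fit across several time points during log-phase growth would be more robust than a two-point estimate.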

      • In Figure 2 C the authors should provide isotope tracing data for the upper glycolysis (e.g. glucose and glucose-6-P) and alanine. In Figure 2 D the authors should provide the isotope tracing data for glutamine and glutamate.

      We have data for the glycolytic intermediates glycerol-3-phosphate and dihydroxyacetone phosphate (DHAP), and for alanine, and will add them to the figures as requested.

      • Do the authors see any sign of reductive carboxylation in their U-13C glutamine experiments?

      We observe only a low level of reductive carboxylation across the AA series cell lines (

      • Can the authors speculate how the C1, SB, 10C and M cell lines would react when glucose would be replaced with galactose in the culture environment and forcing the cells to increase oxidative phosphorylation (OXPHOS).

      We would speculate that the cells would react similarly to our experiments in low glucose conditions displayed in Fig 3A-K. Given that M cells were shown to be the most flexible with regards to fuel source, we would expect them to be able to survive and proliferate more efficiently than the other cell lines in challenging culture conditions. Additionally, we would expect the C1s to survive well in galactose conditions, given that they rely less on glycolysis for ATP production and have significantly higher spare respiratory capacity compared to the more progressed cell lines.

      • Can the authors comment whether C1, SB, 10C and M cell lines show differences in coping with oxidative stress?

      Again, we would speculate that the M cells would cope with exposure to oxidative stress best, given their metabolic flexibility. However, we would aim to test this by measuring the cellular response to hydrogen peroxide (which would induce oxidative stress) across all cell lines.

      • In the ASNS knockdown experiments do the authors detect an increase in glucose uptake in ASNS deficient cells.

      This is an interesting question; we will address it by comparing extracellular glucose levels in C1 and M cells transfected with control and siRNA targeting ASNS.

      • Can the authors provide gene expression data that would explain the metabolic switch between early and late-stage adenocarcinoma? Do the authors detect any differences in mTORC1 activation among the C1, SB, 10C and M cell lines? ASNS is an ATF4 target, can the authors provide any expression data on ATF4 in their cell lines.

      To address the first question, using our proteomics data, we have generated heatmaps showing protein abundance data from key metabolic pathways including glycolysis, the TCA cycle and the electron transport chain in the C1, SB and M cell lines. These data show an array of variation in protein expression of these pathways between the C1, SB and M cells, with no clear up- or downregulation of these pathways as a whole, but rather more intricate regulation of clusters of proteins within the pathways. These data align well with the metabolomic data presented in Figure 2 and will allow us to investigate the mechanisms underlying the metabolic switch. These heatmaps will be incorporated into the manuscript. Using the heatmaps we will identify and discuss key nodes we predict to explain the metabolic switch between early and late-stage adenocarcinoma. We will then determine experimentally whether manipulation of these nodes impacts the metabolic phenotype of the cells. For example, the heatmaps have highlighted that ENO3 (enolase 3) is strongly upregulated in the SB and M cells in comparison to the C1 cells and may be involved in driving the metabolic switch. Indeed, ENO3 has previously been found to promote colorectal cancer progression by enhancing glycolysis (Chen et al, Med Oncol, 2022), consistent with what we see here. To test this, we will knock down ENO3 across the cell line series and determine the impact on cellular phenotype and metabolism (using Seahorse extracellular flux analysis).

      With regards to mTORC1 activation, we have further analysed our proteomics data from C1, SB and M cells and have found that the M cells show significantly higher serine 235/236 phosphorylation of ribosomal S6 protein, a common readout for mTORC1 activation, compared to C1 and SB cells. Further, we aim to carry out immunoblotting across the four cell lines to analyse phospho-S6 (relative to total S6), 4E-BP1 and phospho-ULK-1 (relative to total ULK-1) levels.

      With regards to ATF4, using our proteomics data we have generated a heatmap of expression changes of ATF4 target genes in C1, SB and M cells that we will provide in supplementary material. These data suggest that there is no clear pattern of enhanced or reduced ATF4 transcriptional activity across the cell lines, with different clusters of genes within this signature up- or downregulated across the series. Moreover, Ingenuity Pathway Analysis (IPA) revealed that the ATF4 pathway showed an activation z-score of -0.41 (p=0.0134) in SB versus C1 cells, and 0.35 (p=0.00051) in M versus C1 cells (where a threshold of +/- 2 indicates activation/suppression of the pathway, respectively), confirming there is no clear regulation of this pathway between the cell lines. In addition, we will carry out immunoblotting for ATF4 expression levels across the cell line series.
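      The z-score interpretation described above can be sketched as a simple threshold rule. This is only an illustration of the +/- 2 cutoff stated in the response, not IPA's own code; the function name is hypothetical, and treating the boundary values as inclusive is an assumption.

```python
def classify_activation(z_score: float, threshold: float = 2.0) -> str:
    """Label a pathway from its activation z-score using a symmetric cutoff.

    Assumption: scores at or beyond +threshold count as activation and
    scores at or beyond -threshold count as suppression.
    """
    if z_score >= threshold:
        return "activated"
    if z_score <= -threshold:
        return "suppressed"
    return "no clear regulation"


# ATF4 pathway z-scores as quoted in the response above.
atf4_z_scores = {"SB vs C1": -0.41, "M vs C1": 0.35}

for comparison, z in atf4_z_scores.items():
    print(f"{comparison}: z = {z:+.2f} -> {classify_activation(z)}")
```

      Both quoted scores fall well inside the +/- 2 window, which is the basis for the statement that the pathway shows no clear regulation between these cell lines.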

      Reviewer #2

      *Major comments: *

      *Early CRC *

      *Molecular understanding of CRC is obviously of great interest and importance for the clinics. However, tumors of early stages are almost exclusively resected and not treated with systemic agents. Hence, the argument by the authors that the metabolic understanding of early CRC is of clinical relevance is somewhat misleading. Overall, it would have been much more clinically relevant to investigate the multiple steps of later stages during CRC progression. How about metabolic changes during metastasis. Deep mechanistic understanding of process during metastasis has striking clinical relevance. *

      We agree with the reviewer that understanding metastatic progression is of clinical relevance and should indeed be investigated in more detail. Using our model, we do shed light on a vulnerability of late-stage adenocarcinoma cells (sensitivity to asparagine synthetase (ASNS) inhibition). Indeed, we show that ASNS expression is elevated in both colorectal tumour and metastatic tissue in comparison to normal tissue, suggesting that our study may have revealed a vulnerability with utility for treating late-stage (and potentially metastatic) tumours. The reviewer raises an important issue with the way we frame the utility of the model in the manuscript text. We will rewrite this to emphasise its utility in identifying late-stage vulnerabilities and the clinical value of this approach. We maintain that the molecular understanding of colorectal cancer across all stages of its progression will provide a valuable contribution to the field but agree that we should be more specific with regards to the clinical utility of our findings.

      *Model system *

      The cell lines used in this study are not state-of-the-art to investigate the complex process during CRC progression. The original paper is from 1993 in which the cell lines were generated does not allow understanding of the characteristics of these cell lines. Recently, multiple models have been established, for example in organoids, to investigate the progression of CRC much more reliably. There are systems that use CRISPR/CAS9 edited human organoids that follow the genetic alterations of CRC progression with accompanied phenotypes. Further, extensive biobanks of organoids from patients are available (also commercially) which better represent the stages of CRC. Similarly, the question raised above of how representative this progression cell line set is needs to addressed. The mutagen-induced progression could generate various alterations that are not detected in patients, hence create an artificial system. Overall, biological replicates are missing.

      We thank the reviewer for their critique and agree that our manuscript would be significantly strengthened if we were able to replicate our key findings in another model. We agree that the cell line series we have used here has limitations and we will make sure these are discussed by adding a ‘Limitations’ section to the ‘Discussion’. We maintain that the cell line series is a valuable tool in which to effectively identify metabolic vulnerabilities for further research. A key advantage of this system is that it is a human cell line series of the same lineage. In addition, we can easily conduct metabolomics and stable isotope tracer analysis allowing us to investigate cellular metabolic activity and manipulate any identified pathways easily. As such, the cell line series is an effective tool in which to identify potential vulnerabilities, but we agree that these vulnerabilities need to be validated in state-of-the-art organoid systems for the impact of the work to be clearer.

      To address this, in collaboration with Owen Sansom (Beatson Institute) and Laura Thomas (Swansea University), we aim to validate our identified metabolic dependency in mouse and human colorectal organoids respectively. We will determine the sensitivity of colorectal organoid models across a range of tumorigenicities and mutational profiles representing different stages of the adenoma-carcinoma progression to asparagine synthetase (ASNS) inhibition. We believe these experiments will substantially strengthen the manuscript and lend weight to our finding that late-stage adenocarcinoma cells are vulnerable to ASNS inhibition.

      *Gene Expression analysis *

      In Figure 5 C and D is the expression of ASNS to stages and overall survival from online available datasets correlated. Its unclear what the difference between tumor and metastatic in C means. The labelling in D is too small. Is the difference between the two groups significant? Are these patients only at a specific stage? It seems not that ASNS is a strong prognosticator; further stratification is needed to clarify the role of ASNS in CRC.

      The data displayed in Fig 5C and 5D are from separate datasets so are not correlated. In Fig 5C ‘Tumour’ refers to gene expression from the primary tumour site (in this case the colorectum), whereas ‘Metastatic’ refers to gene expression from a metastatic tumour (from which the primary tumour was of colorectal origin). We will make this clearer in the text and figure legend. We will also make the labelling on the survival plot in D clearer, indicating that the difference between the two groups is significant and displaying the p value clearly.

      The data included in the survival plots in 5D encompass all tumour stages. We have further analysed these data, adjusting for tumour stage. We found that high ASNS expression in later stage tumours (stage 3 and 4) is associated with poorer overall survival, whereas there is no significant difference in overall survival in earlier stage tumours (stage 1 and 2) in relation to ASNS expression. We plan to add this to the supplementary materials and discuss in the main text as it is consistent with our findings from the AA cell line series.

      *Western Blot controls *

      For the Western Blots in Figure 6 A and C the total S6 and ULK1 controls are missing what is needed to assess the effect on pS6 and pULK1 correctly.

      We will add total S6 and ULK1 controls to these figures.

      In the same panels, the KO efficacy is not very high in A (-ASN). However, this is crucial to make the conclusion that this cell line (C1) is not dependent on ASNS.

      The average knockdown efficiency in the C1 cells is 72% across n=3 experiments. Therefore, levels of ASNS are significantly reduced. However, to further validate this finding, we will use L-Albizziine, a competitive inhibitor of ASNS, at the same concentration in both C1 and M cells to eliminate any issues surrounding variation in knockdown efficiency and to replicate the results obtained using ASNS siRNA. These data will be included in supplementary material.

      *Minor comments: *

      *Statistical analysis of proliferation assays *

      The statistical significance for proliferation assays are missing.

      The statistical significance at the final timepoints of the proliferation assays is displayed on bar graphs in Supplementary Figure 5 (Figure S5B and C). We will add these to the proliferation curves in the main figure.

      Reviewer #3

      *A major concern is the model used in this study: *

      Sodium butyrate and the carcinogen N-methyl-N-nitro-nitrosoguanidine (MNNG) were used for the transformation. I believe this model was developed by one of the co-authors of the study, A.C. Williams in the 1990s. The relevance of the model for in vivo colon carcinogenesis is not entirely clear to me and I miss information why in particular sodium butyrate and MNNG were used. I am not an expert on colon carcinogenesis but I did not have the impression that this model has been widely adopted and I miss detailed information on the model itself as well as a critical discussion of its limitations.

      We thank the reviewer for raising these concerns and will include a ‘Limitations’ section in the manuscript ‘Discussion’ to elaborate on both the utility and the limitations of this model system. As described in response to concerns raised by reviewer #1 and reviewer #2, we plan to validate our findings in organoid models of colorectal tumourigenesis to strengthen the discoveries made using the AA cell line series.

      With regards to the use of sodium butyrate and MNNG for transformation of the C1 cells, justification was provided in the original paper describing generation of the cell line model series (Williams et al, Cancer Research. 1990). Sodium butyrate is naturally occurring in the gut and was used for the transformation of the C1 cells as it had been proposed to play a role in promoting colorectal tumorigenesis through upregulating carcinoembryonic antigen (CEA) expression and enhancing proliferation in adenoma cells able to resist growth arrest following treatment (Berry et al, Carcinogenesis. 1988). At the time of generating the cell line series, few reagents were known to induce transformation in human epithelial cells. However, MNNG was one of those and had been previously used to transform keratinocytes (Rhim et al, Science. 1986). Crucially, tumours formed in mice from xenografted 10C cells were found to be heterogeneous, displaying areas of differentiation with glandular organisation, the presence of functional goblet cells enabling mucin production, as well as areas of poorly or undifferentiated cells. Furthermore, cytogenetic analyses revealed that genetic changes in the cell line progression model such as chromosome 18q loss and KRAS activation replicate those seen in CRC patients (Williams et al, Oncogene. 1993). Together, these characteristics recapitulate human tumours in vivo, validating the use of sodium butyrate and MNNG in generating an in vitro CRC cell line model that represents human colorectal tumorigenesis.

      Figure 6: total levels of ribosomal S6 protein and ULK1 should be detected, quantified and used for normalization.

      We agree with the reviewer and will add total S6 and ULK1 controls to these figures.

      Can you measure ASN upon inhibition of autophagy? Does it go down further?

      This is an interesting question, and we will address this experimentally by measuring ASN levels following treatment with chloroquine in the C1 and M cell lines. We will do this using stable isotope labelling and mass spectrometry and include the results in supplementary material.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      General Comments

      We thank all reviewers for providing very detailed, knowledgeable, and informative reviews.

      All reviewers were complimentary about the data and the rigor of the study. Reviewers 2 and 3 commented on the significance of the work, and their assessments were complimentary, specifically about the fact that it bridges several previous studies and links these to kinase-phosphatase regulation on the BUB complex. We agree that this is a major strength of the work. That is why we also believe that the point raised by reviewer 1, that “most of the phenotypes/observations are consistent with the literature and not surprising”, is actually a strength and not a weakness. Sometimes manuscripts that bring together various different findings into one conceptual model can be very powerful, even if each finding in isolation is not so surprising. In this case, the concept that a dual kinase-phosphatase module integrates two major mitotic processes will, we believe, prove to be a significant breakthrough that helps to explain how these processes are properly integrated at kinetochores.

      The main criticism of all reviewers related to the interpretations and writing style, which in general, we felt were valid. We will take on board all these comments, reword the manuscript during revision, and provide a detailed response to each of these points at resubmission.

      In terms of points requiring new experiments, there were 3 in total:

      1) Reviewers 1 and 3 raised an important issue about the feedback loop which will be addressed with new experiments to uncouple the feedback.

      2) Reviewer 2 made an important point about KNL1 levels, including a good suggestion to perform FRAP analysis to examine BUB complex dynamics when MELT numbers are increased. We will carry out this experiment prior to revision.

      3) Reviewer 1 had a second major comment regarding the modulation of MELT number and how this cannot be directly linked to PLK1/PP2A levels. We have 3 new experiments to add regarding this, performed already, which are discussed in the section below.

      All other comments were textual points that in most cases we felt were valid. They showed that all reviewers had a very good grasp of the paper, the concepts, and the field in general. So, we finish by thanking all reviewers again for their thorough and detailed assessments of our manuscript. The comments they raised will help us to improve the manuscript after revision.

      Description of the planned revisions

      Three main points:

      1) The role of the feedback loop [reviewers 1 and 3]:

      The general issue is explained succinctly by reviewer 3’s comment:

      “The argument linking the negative feedback loop to biological functions is not straightforward. The authors provide evidence in Figure 1 for regulatory pathways between docked PLK1 and bound PP2A. However, their assays in Figure 2 bypass the feedback loop by directly modulating PP2A activity. These experiment supports an argument that the kinase/phosphatase activity balance is important, but do not address the feedback loop specifically (which could potentially be done using mutations that disrupt the feedback regulation). The claim that "a homeostatic feedback loop maintains an optimal balance of PLK1 and PP2A on the BUB complex" is too strong because there is no direct evidence connecting the feedback loop to optimal function.”

      This is a good point that we will address at revision. We demonstrate that the enzymes regulate each other on the BUB complex in figure 1 (PLK1 recruits PP2A, and PP2A removes PLK1), which balances their levels on the BUB complex. To determine consequences of upsetting this balance we locked either the kinase-bound or phosphatase-bound states (Figure 2). Importantly, this is required to assess direct phenotypes associated with each, but it does not directly demonstrate the role of the feedback loop. To do this we will generate mutants, as suggested, and analyse their phenotypes.

      We will mutate the PLK1 binding site (T620A) and recover the PLK1-regulated sites in the KARD motif to phospho-mimicking aspartates (S676D/T680D), analyzing effects on PLK1/PP2A recruitment, chromosome alignment and SAC strength. We predict that this will remove PLK1 and recover some PP2A, but to lower levels overall than the BUBR1-B56 fusion. In that case the phenotypes will probably be milder, but that would not change the overall conclusions.

      We maintain that locking PLK1 on its phospho-binding site (in BUBR1-ΔPP2A) is the ideal scenario to test direct PLK1 roles, but we will also now create alanine mutants of the PLK1 site (S676A/T680A) and the CDK1 site (S670A) to address the feedback loops controlled by CDK1 and PLK1. Our prediction is that these will skew the balance towards PLK1, without fully removing PP2A, again likely to produce milder intermediate phenotypes.

      It is definitely worth testing these predictions, because it would directly address the role of the feedback loop and it would avoid relying solely on “artificially high levels” as mentioned by reviewer 1. One final point on this, however: the PLK1 recruitment in ΔPP2A cells is not artificial. It is PLK1 bound to its native phospho-motif when PP2A is unbound (without any feedback from PP2A, this phospho-site and PLK1 binding increase to the observed maximal levels). The fusion of B56 is admittedly less optimal, but this does still lock the phosphatase-bound state in a set stoichiometry, crucially in the absence of kinase. This is required to assess direct phosphatase effects. These PLK1/PP2A levels may well be higher than observed physiologically on the BUB complex when considering the behavior of all BUBR1 molecules, since we doubt they ever reach 1:1 stoichiometry with either PLK1 or PP2A. However, the feedback loop is operating within individual molecules (figure 1), which may well individually flip between PLK1- and PP2A-bound states. This may occur on certain molecules at specific times. Therefore, locking the PLK1- or PP2A-bound state on all molecules is, in our opinion, still a valid and useful perturbation to assess the function of these two states.

      2) The increased recruitment of BUB1-PLK1/PP2A when MELT numbers are increased [reviewer 2]

      "While in the 12x and 19x mutant conditions there are more molecules of BUBs per Knl1, the overall BUB levels are the same as in wild-type controls. Since the MELT repeat used throughout the paper is a consensus sequence that is likely optimal for BUB binding, it is possible that the phenotypes of the 12x and 19x mutants are explained because of an increase in the affinity of BUBs for Knl1 rather than overall levels. This would also help explain why Knl1 and BUBs are observed at the spindle midzone in the 19x mutant (Fig. S4)"

      The reviewer raises an important issue here, when stating that increasing MELT numbers decreases KNL1 kinetochore recruitment. This has the net effect of normalizing overall BUB1-PLK1/PP2A kinetochore levels, even though BUB1-PLK1/PP2A recruitment per KNL1 molecule is increased. That is why we were careful to state BUB1-PLK1/PP2A were increased “on each KNL1 molecule” and not “on kinetochores” when referring to the effect in the 12x/19x MELT mutant. However, this could easily be misinterpreted so this point will be clarified at revision.

      The issue of why the phenotype is so dependent on kinase/phosphatase level per KNL1 molecules is an important one however, which has puzzled us until now. We think the suggestion to look at turnover by FRAP is a good one, because enhanced binding strength could underlie the phenotype here, and potentially explain the lack of disassembly at anaphase. We will perform these experiments at revision to see if they can clarify the issue.

      3) The link between MELT number and PLK1/PP2A levels [Reviewer 1]

      “My second comment relates to the fact that the two parts of the paper are not directly linked although the authors try to do this. They nicely manipulate the MELT repeats on KNL1 to change the number of Bub complexes. However, they cannot directly link the data to changes in Plk1 and PP2A-B56 levels only as many other things are changing. By increasing MELT numbers Bub complex and Mad1/Mad2 levels increase as well as an example and this makes interpretations complicated. To me these experiments are not addressing the main conclusions of the paper.”

      We do not agree with this overall assessment, but there are two elements to this comment: the effect of modulating MELT number on SAC strength (and its link to PLK1) or on KT-MT stability (and its link to PP2A). We will therefore discuss each separately:

      For SAC regulation, we feel that the data are clear and the interpretations are justified, although we will add new data to support this point after revision. Increasing MELT number causes defects in MELT-BUB dissociation and SAC silencing (4a-c). Importantly, these phenotypes can be completely rescued by inhibiting PLK1 (4d-e). So, we do link the effects of high MELT number to PLK1 activity. Our interpretation is that when MELT numbers are increased the ability of PLK1 to phosphorylate these motifs and maintain the SAC platform is enhanced (when MPS1 is inhibited pharmacologically or upon KT-MT attachment). So, whilst it is true that many factors, such as the kinetochore levels of BUB/MAD1/MAD2, are crucial for the SAC, the ability of PLK1 to maintain these levels (via pMELT-BUB1) is crucial and that changes as MELT number increases. This contributes directly to the observed SAC silencing phenotype, as confirmed by the complete rescue of this phenotype after PLK1 inhibition.

      We did also explore the possibility that increased BUB1 activity could also contribute to SAC strengthening, for example, by enhancing Aurora B recruitment to centromeres. However, BUB1 inhibition did not alter SAC strength or MELT dephosphorylation kinetics. We will add this data after revision.

      We also evaluated the levels of phosphorylated MAD1-pT716, which is important for MCC assembly (Ji et al. 2017, Ji et al. 2018, Faesen, 2017). Our data show that WT and 19xMELT exhibit similar MAD1-pT716 levels during a nocodazole arrest and following MPS1 inhibition. In summary, the main changes we observe are elevated BUB1 levels due to MELT phosphorylation, and increased BUB1 phosphorylation on pT461 (as shown in Figure 4h). All this points towards a localized effect of PLK1 on/around the BUB complex. We will add this data and make this point clearly at revision.

      For KT-MT attachment regulation, we agree that we do not have a similar way to inhibit PP2A-B56 activity to rescue hyperstable microtubule attachment when MELT numbers are high. For this, we would require a way to rapidly inhibit PP2A-B56 activity after attachments have formed, something that is not technically feasible at present. We also cannot say for certain that reduced MELT numbers destabilize microtubule attachments due to lack of PP2A; however, we feel this is the most likely interpretation for the following reasons. The phenotype of removing PP2A from BUBR1 or removing the MELTs from KNL1 (along with all associated factors) is identical: mutant cells have comparable chromosome misalignment due to unattached kinetochores (compare 2F-I with 5A-D). Therefore, the additional factors lost by removing the MELTs cannot be having such a strong impact on KT-MT attachment. The obvious factor that could affect attachment strength is again BUB1, via Aurora B recruitment to centromeres. However, loss of BUB1 (after MELT removal) is predicted to enhance attachment stability (reduced Aurora B) and not decrease it, as we observe. So, whilst we cannot definitively conclude that modulating MELT number affects attachment stability via PP2A, we feel that this is certainly the most likely explanation. We will state this clearly in the revised text.

      Description of analyses that authors prefer not to carry out

      “SAC strength of BubR1 WT, ΔC and B56γ was analysed in the presence of nocodazole + MPS1i. It would be interesting to see what the phenotypes are without MPS1i [Reviewer 1]”

      In the absence of MPS1i, basal MELT phosphorylation increases (ΔC) or decreases (B56γ) as predicted (Figure 2d; compare timepoint 0 in all conditions). This does not cause any change to SAC strength when all kinetochores are unattached in nocodazole (not shown). The sensitized SAC assay (nocodazole + MPS1i) has been used by many groups (originally Santaguida et al, 2011; Saurin et al, 2011), because it reduces SAC signals from all unattached kinetochores, which would otherwise produce a saturated response. In this case, we specifically chose a dose of MPS1 inhibitor that gave a partial SAC response from which we could observe either strengthening or weakening, a key point of the assay. Indeed, this showed that the SAC was strengthened (ΔC) or weakened (B56γ), as predicted (Figure 2E). The only other way to do this, which has been used by some in the literature, is to use a low dose of nocodazole so that not all kinetochores signal to the SAC. We specifically wanted to avoid this situation because then you cannot untangle the effects on SAC and KT-MT attachment stability, which was crucial in our case.

    1. Who Can Name the Bigger Number? by Scott Aaronson. In an old joke, two noblemen vie to name the bigger number. The first, after ruminating for hours, triumphantly announces "Eighty-three!" The second, mightily impressed, replies "You win." A biggest number contest is clearly pointless when the contestants take turns. But what if the contestants write down their numbers simultaneously, neither aware of the other’s? To introduce a talk on "Big Numbers," I invite two audience volunteers to try exactly this. I tell them the rules: You have fifteen seconds. Using standard math notation, English words, or both, name a single whole number—not an infinity—on a blank index card. Be precise enough for any reasonable modern mathematician to determine exactly what number you’ve named, by consulting only your card and, if necessary, the published literature. So contestants can’t say "the number of sand grains in the Sahara," because sand drifts in and out of the Sahara regularly. Nor can they say "my opponent’s number plus one," or "the biggest number anyone’s ever thought of plus one"—again, these are ill-defined, given what our reasonable mathematician has available. Within the rules, the contestant who names the bigger number wins. Are you ready? Get set. Go. The contest’s results are never quite what I’d hope. Once, a seventh-grade boy filled his card with a string of successive 9’s. Like many other big-number tyros, he sought to maximize his number by stuffing a 9 into every place value. Had he chosen easy-to-write 1’s rather than curvaceous 9’s, his number could have been millions of times bigger. He still would have been decimated, though, by the girl he was up against, who wrote a string of 9’s followed by the superscript 999. Aha! An exponential: a number multiplied by itself 999 times. 
Noticing this innovation, I declared the girl’s victory without bothering to count the 9’s on the cards. And yet the girl’s number could have been much bigger still, had she stacked the mighty exponential more than once. Take $9^{9^9}$, for example. This behemoth, equal to $9^{387{,}420{,}489}$, has 369,693,100 digits. By comparison, the number of elementary particles in the observable universe has a meager 85 digits, give or take. Three 9’s, when stacked exponentially, already lift us incomprehensibly beyond all the matter we can observe—by a factor of about $10^{369{,}693{,}015}$. And we’ve said nothing of or . Place value, exponentials, stacked exponentials: each can express boundlessly big numbers, and in this sense they’re all equivalent. But the notational systems differ dramatically in the numbers they can express concisely. That’s what the fifteen-second time limit illustrates. It takes the same amount of time to write 9999, $99^{99}$, and $9^{9^{9^9}}$—yet the first number is quotidian, the second astronomical, and the third hyper-mega astronomical. The key to the biggest number contest is not swift penmanship, but rather a potent paradigm for concisely capturing the gargantuan. Such paradigms are historical rarities. We find a flurry in antiquity, another flurry in the twentieth century, and nothing much in between. But when a new way to express big numbers concisely does emerge, it’s often a byproduct of a major scientific revolution: systematized mathematics, formal logic, computer science. Revolutions this momentous, as any Kuhnian could tell you, only happen under the right social conditions. Thus is the story of big numbers a story of human progress. And herein lies a parallel with another mathematical story. In his remarkable and underappreciated book A History of π, Petr Beckmann argues that the ratio of circumference to diameter is "a quaint little mirror of the history of man." 
In the rare societies where science and reason found refuge—the early Athens of Anaxagoras and Hippias, the Alexandria of Eratosthenes and Euclid, the seventeenth-century England of Newton and Wallis—mathematicians made tremendous strides in calculating π. In Rome and medieval Europe, by contrast, knowledge of π stagnated. Crude approximations such as the Babylonians’ 25/8 held sway. This same pattern holds, I think, for big numbers. Curiosity and openness lead to fascination with big numbers, and to the buoyant view that no quantity, whether of the number of stars in the galaxy or the number of possible bridge hands, is too immense for the mind to enumerate. Conversely, ignorance and irrationality lead to fatalism concerning big numbers. Historian Ilan Vardi cites the ancient Greek term sand-hundred, colloquially meaning zillion; as well as a passage from Pindar’s Olympic Ode II asserting that "sand escapes counting." But sand doesn’t escape counting, as Archimedes recognized in the third century B.C. Here’s how he began The Sand-Reckoner, a sort of pop-science article addressed to the King of Syracuse: There are some ... who think that the number of the sand is infinite in multitude ... again there are some who, without regarding it as infinite, yet think that no number has been named which is great enough to exceed its multitude ... But I will try to show you [numbers that] exceed not only the number of the mass of sand equal in magnitude to the earth ... but also that of a mass equal in magnitude to the universe. This Archimedes proceeded to do, essentially by using the ancient Greek term myriad, meaning ten thousand, as a base for exponentials. Adopting a prescient cosmological model of Aristarchus, in which the "sphere of the fixed stars" is vastly greater than the sphere in which the Earth revolves around the sun, Archimedes obtained an upper bound of $10^{63}$ on the number of sand grains needed to fill the universe. 
(Supposedly $10^{63}$ is the biggest number with a lexicographically standard American name: vigintillion. But the staid vigintillion had better keep vigil lest it be encroached upon by the more whimsically-named googol, or $10^{100}$, and googolplex, or $10^{10^{100}}$.) Vast though it was, of course, $10^{63}$ wasn’t to be enshrined as the all-time biggest number. Six centuries later, Diophantus developed a simpler notation for exponentials, allowing him to surpass . Then, in the Middle Ages, the rise of Arabic numerals and place value made it easy to stack exponentials higher still. But Archimedes’ paradigm for expressing big numbers wasn’t fundamentally surpassed until the twentieth century. And even today, exponentials dominate popular discussion of the immense. Consider, for example, the oft-repeated legend of the Grand Vizier in Persia who invented chess. The King, so the legend goes, was delighted with the new game, and invited the Vizier to name his own reward. The Vizier replied that, being a modest man, he desired only one grain of wheat on the first square of a chessboard, two grains on the second, four on the third, and so on, with twice as many grains on each square as on the last. The innumerate King agreed, not realizing that the total number of grains on all 64 squares would be $2^{64}-1$, or about 18.4 quintillion—equivalent to the world’s present wheat production for 150 years. Fittingly, this same exponential growth is what makes chess itself so difficult. There are only about 35 legal choices for each chess move, but the choices multiply exponentially to yield something like $10^{50}$ possible board positions—too many for even a computer to search exhaustively. That’s why it took until 1997 for a computer, Deep Blue, to defeat the human world chess champion. And in Go, which has a 19-by-19 board and over $10^{150}$ possible positions, even an amateur human can still rout the world’s top-ranked computer programs. Exponential growth plagues computers in other guises as well. 
The traveling salesman problem asks for the shortest route connecting a set of cities, given the distances between each pair of cities. The rub is that the number of possible routes grows exponentially with the number of cities. When there are, say, a hundred cities, there are about $10^{158}$ possible routes, and, although various shortcuts are possible, no known computer algorithm is fundamentally better than checking each route one by one. The traveling salesman problem belongs to a class called NP-complete, which includes hundreds of other problems of practical interest. (NP stands for the technical term ‘Nondeterministic Polynomial-Time.’) It’s known that if there’s an efficient algorithm for any NP-complete problem, then there are efficient algorithms for all of them. Here ‘efficient’ means using an amount of time proportional to at most the problem size raised to some fixed power—for example, the number of cities cubed. It’s conjectured, however, that no efficient algorithm for NP-complete problems exists. Proving this conjecture, called P ≠ NP, has been a great unsolved problem of computer science for thirty years. Although computers will probably never solve NP-complete problems efficiently, there’s more hope for another grail of computer science: replicating human intelligence. The human brain has roughly a hundred billion neurons linked by a hundred trillion synapses. And though the function of an individual neuron is only partially understood, it’s thought that each neuron fires electrical impulses according to relatively simple rules up to a thousand times each second. So what we have is a highly interconnected computer capable of maybe $10^{14}$ operations per second; by comparison, the world’s fastest parallel supercomputer, the 9200-Pentium Pro teraflops machine at Sandia National Labs, can perform $10^{12}$ operations per second. Contrary to popular belief, gray mush is not only hard-wired for intelligence: it surpasses silicon even in raw computational power. 
But this is unlikely to remain true for long. The reason is Moore’s Law, which, in its 1990’s formulation, states that the amount of information storable on a silicon chip grows exponentially, doubling roughly once every two years. Moore’s Law will eventually play out, as microchip components reach the atomic scale and conventional lithography falters. But radical new technologies, such as optical computers, DNA computers, or even quantum computers, could conceivably usurp silicon’s place. Exponential growth in computing power can’t continue forever, but it may continue long enough for computers—at least in processing power—to surpass human brains. To prognosticators of artificial intelligence, Moore’s Law is a glorious herald of exponential growth. But exponentials have a drearier side as well. The human population recently passed six billion and is doubling about once every forty years. At this exponential rate, if an average person weighs seventy kilograms, then by the year 3750 the entire Earth will be composed of human flesh. But before you invest in deodorant, realize that the population will stop increasing long before this—either because of famine, epidemic disease, global warming, mass species extinctions, unbreathable air, or, entering the speculative realm, birth control. It’s not hard to fathom why physicist Albert Bartlett asserted "the greatest shortcoming of the human race" to be "our inability to understand the exponential function." Or why Carl Sagan advised us to "never underestimate an exponential." In his book Billions & Billions, Sagan gave some other depressing consequences of exponential growth. At an inflation rate of five percent a year, a dollar is worth only thirty-seven cents after twenty years. If a uranium nucleus emits two neutrons, both of which collide with other uranium nuclei, causing them to emit two neutrons, and so forth—well, did I mention nuclear holocaust as a possible end to population growth? 
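The exponential arithmetic quoted in this section can be checked with a few lines of Python. This is only a rough sketch: the Earth's mass (about 5.97e24 kg) and the starting year 2000 are assumptions supplied here, not figures from the essay, and the variable names are purely illustrative.

```python
import math

# Wheat on the chessboard: 1 + 2 + 4 + ... over 64 squares equals 2**64 - 1.
wheat_grains = sum(2 ** square for square in range(64))

# Counting every ordering of 100 cities gives 100! routes, a 158-digit number.
routes_100_cities = math.factorial(100)

# Purchasing power of one dollar after 20 years of 5% annual inflation.
dollar_after_20_years = 1 / 1.05 ** 20

# Population doubling every 40 years until humans outweigh the Earth.
EARTH_MASS_KG = 5.97e24          # assumed value, not from the essay
START_YEAR = 2000                # assumed "recently passed six billion"
human_mass_kg = 6e9 * 70         # six billion people at seventy kilograms
doublings = math.log2(EARTH_MASS_KG / human_mass_kg)
year_all_flesh = START_YEAR + 40 * doublings

print(wheat_grains)                    # 18446744073709551615
print(len(str(routes_100_cities)))     # 158
print(round(dollar_after_20_years, 2))
print(round(year_all_flesh))
```

Under these assumptions the dollar comes out near thirty-seven cents and the population deadline lands in the middle of the 38th century, in line with the figures quoted above.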
♦ Exponentials are familiar, relevant, intimately connected to the physical world and to human hopes and fears. Using the notational systems I’ll discuss next, we can concisely name numbers that make exponentials picayune by comparison, that subjectively speaking exceed as much as the latter exceeds 9. But these new systems may seem more abstruse than exponentials. In his essay "On Number Numbness," Douglas Hofstadter leads his readers to the precipice of these systems, but then avers: If we were to continue our discussion just one zillisecond longer, we would find ourselves smack-dab in the middle of the theory of recursive functions and algorithmic complexity, and that would be too abstract. So let’s drop the topic right here. But to drop the topic is to forfeit, not only the biggest number contest, but any hope of understanding how stronger paradigms lead to vaster numbers. And so we arrive in the early twentieth century, when a school of mathematicians called the formalists sought to place all of mathematics on a rigorous axiomatic basis. A key question for the formalists was what the word ‘computable’ means. That is, how do we tell whether a sequence of numbers can be listed by a definite, mechanical procedure? Some mathematicians thought that ‘computable’ coincided with a technical notion called ‘primitive recursive.’ But in 1928 Wilhelm Ackermann disproved them by constructing a sequence of numbers that’s clearly computable, yet grows too quickly to be primitive recursive. Ackermann’s idea was to create an endless procession of arithmetic operations, each more powerful than the last. First comes addition. Second comes multiplication, which we can think of as repeated addition: for example, 5×3 means 5 added to itself 3 times, or 5+5+5 = 15. Third comes exponentiation, which we can think of as repeated multiplication. Fourth comes ... what? Well, we have to invent a weird new operation, for repeated exponentiation. 
The mathematician Rudy Rucker calls it ‘tetration.’ For example, ‘5 tetrated to the 3’ means 5 raised to its own power 3 times, or $5^{5^5}$, a number with 2,185 digits. We can go on. Fifth comes repeated tetration: shall we call it ‘pentation’? Sixth comes repeated pentation: ‘hexation’? The operations continue infinitely, with each one standing on its predecessor to peer even higher into the firmament of big numbers. If each operation were a candy flavor, then the Ackermann sequence would be the sampler pack, mixing one number of each flavor. First in the sequence is 1+1, or (don’t hold your breath) 2. Second is 2×2, or 4. Third is 3 raised to the 3rd power, or 27. Hey, these numbers aren’t so big! Fee. Fi. Fo. Fum. Fourth is 4 tetrated to the 4, or $4^{4^{4^4}}$, which has roughly $10^{154}$ digits. If you’re planning to write this number out, better start now. Fifth is 5 pentated to the 5, or $5^{5^{\cdot^{\cdot^{\cdot^{5}}}}}$ with ‘5 pentated to the 4’ numerals in the stack. This number is too colossal to describe in any ordinary terms. And the numbers just get bigger from there. Wielding the Ackermann sequence, we can clobber unschooled opponents in the biggest-number contest. But we need to be careful, since there are several definitions of the Ackermann sequence, not all identical. Under the fifteen-second time limit, here’s what I might write to avoid ambiguity: A(111)—Ackermann seq—A(1)=1+1, A(2)=2×2, A(3)=$3^3$, etc Recondite as it seems, the Ackermann sequence does have some applications. A problem in an area called Ramsey theory asks for the minimum dimension of a hypercube satisfying a certain property. The true dimension is thought to be 6, but the lowest dimension anyone’s been able to prove is so huge that it can only be expressed using the same ‘weird arithmetic’ that underlies the Ackermann sequence. Indeed, the Guinness Book of World Records once listed this dimension as the biggest number ever used in a mathematical proof. 
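The ladder of operations described above (addition, multiplication, exponentiation, tetration, pentation) can be sketched as a single recursive function. This is a minimal illustration, not a standard library routine: the names `hyper` and `ackermann_term` are hypothetical, the sketch follows the essay's definition of the sequence rather than any of the other Ackermann variants, and it assumes the second operand is at least 1.

```python
def hyper(level: int, a: int, b: int) -> int:
    """Apply the level-th operation to a and b.

    level 1 = addition, 2 = multiplication, 3 = exponentiation,
    4 = tetration, 5 = pentation, and so on. Assumes b >= 1.
    """
    if level == 1:
        return a + b
    if level == 2:
        return a * b
    if level == 3:
        return a ** b
    # Each higher operation repeats the one below it, working from the
    # top of the tower down: a op (a op (... op a)) with b copies of a.
    result = a
    for _ in range(b - 1):
        result = hyper(level - 1, a, result)
    return result


def ackermann_term(n: int) -> int:
    """n-th term of the sequence as the essay defines it: n (op n) n."""
    return hyper(n, n, n)


print([ackermann_term(n) for n in (1, 2, 3)])   # [2, 4, 27]
print(len(str(hyper(4, 5, 3))))                 # 2185 digits in 5 tetrated to the 3
```

Only the first three terms are printed on purpose: the fourth, 4 tetrated to the 4, already has far too many digits to hold in any computer's memory.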
(Another contender for the title once was Skewes’ number, about $10^{10^{10^{34}}}$, which arises in the study of how prime numbers are distributed. The famous mathematician G. H. Hardy quipped that Skewes’ was "the largest number which has ever served any definite purpose in mathematics.") What’s more, Ackermann’s briskly-rising cavalcade performs an occasional cameo in computer science. For example, in the analysis of a data structure called ‘Union-Find,’ a term gets multiplied by the inverse of the Ackermann sequence—meaning, for each whole number X, the first number N such that the Nth Ackermann number is bigger than X. The inverse grows as slowly as Ackermann’s original sequence grows quickly; for all practical purposes, the inverse is at most 4. ♦ Ackermann numbers are pretty big, but they’re not yet big enough. The quest for still bigger numbers takes us back to the formalists. After Ackermann demonstrated that ‘primitive recursive’ isn’t what we mean by ‘computable,’ the question still stood: what do we mean by ‘computable’? In 1936, Alonzo Church and Alan Turing independently answered this question. While Church answered using a logical formalism called the lambda calculus, Turing answered using an idealized computing machine—the Turing machine—that, in essence, is equivalent to every Compaq, Dell, Macintosh, and Cray in the modern world. Turing’s paper describing his machine, "On Computable Numbers," is rightly celebrated as the founding document of computer science. "Computing," said Turing, is normally done by writing certain symbols on paper. We may suppose this paper to be divided into squares like a child’s arithmetic book. In elementary arithmetic the 2-dimensional character of the paper is sometimes used. But such use is always avoidable, and I think it will be agreed that the two-dimensional character of paper is no essential of computation. I assume then that the computation is carried out on one-dimensional paper, on a tape divided into squares. 
Turing continued to explicate his machine using ingenious reasoning from first principles. The tape, said Turing, extends infinitely in both directions, since a theoretical machine ought not be constrained by physical limits on resources. Furthermore, there’s a symbol written on each square of the tape, like the ‘1’s and ‘0’s in a modern computer’s memory. But how are the symbols manipulated? Well, there’s a ‘tape head’ moving back and forth along the tape, examining one square at a time, writing and erasing symbols according to definite rules. The rules are the tape head’s program: change them, and you change what the tape head does. Turing’s august insight was that we can program the tape head to carry out any computation. Turing machines can add, multiply, extract cube roots, sort, search, spell-check, parse, play Tic-Tac-Toe, list the Ackermann sequence. If we represented keyboard input, monitor output, and so forth as symbols on the tape, we could even run Windows on a Turing machine. But there’s a problem. Set a tape head loose on a sequence of symbols, and it might stop eventually, or it might run forever—like the fabled programmer who gets stuck in the shower because the instructions on the shampoo bottle read "lather, rinse, repeat." If the machine’s going to run forever, it’d be nice to know this in advance, so that we don’t spend an eternity waiting for it to finish. But how can we determine, in a finite amount of time, whether something will go on endlessly? If you bet a friend that your watch will never stop ticking, when could you declare victory? But maybe there’s some ingenious program that can examine other programs and tell us, infallibly, whether they’ll ever stop running. We just haven’t thought of it yet. Nope. Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. 
It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.) ¨ "Very nice," you say (or perhaps you say, "not nice at all"). "But what does all this have to do with big numbers?" Aha! The connection wasn’t published until May of 1962. Then, in the Bell System Technical Journal, nestled between pragmatically-minded papers on "Multiport Structures" and "Waveguide Pressure Seals," appeared the modestly titled "On Non-Computable Functions" by Tibor Rado. In this paper, Rado introduced the biggest numbers anyone had ever imagined. His idea was simple. Just as we can classify words by how many letters they contain, we can classify Turing machines by how many rules they have in the tape head. Some machines have only one rule, others have two rules, still others have three rules, and so on. But for each fixed whole number N, just as there are only finitely many distinct words with N letters, so too are there only finitely many distinct machines with N rules. Among these machines, some halt and others run forever when started on a blank tape. Of the ones that halt, asked Rado, what’s the maximum number of steps that any machine takes before it halts? (Actually, Rado asked mainly about the maximum number of symbols any machine can write on the tape before halting. But the maximum number of steps, which Rado called S(n), has the same basic properties and is easier to reason about.) 
Rado called this maximum the Nth "Busy Beaver" number. (Ah yes, the early 1960’s were a more innocent age.) He visualized each Turing machine as a beaver bustling busily along the tape, writing and erasing symbols. The challenge, then, is to find the busiest beaver with exactly N rules, albeit not an infinitely busy one. We can interpret this challenge as one of finding the "most complicated" computer program N bits long: the one that does the most amount of stuff, but not an infinite amount. Now, suppose we knew the Nth Busy Beaver number, which we’ll call BB(N). Then we could decide whether any Turing machine with N rules halts on a blank tape. We’d just have to run the machine: if it halts, fine; but if it doesn’t halt within BB(N) steps, then we know it never will halt, since BB(N) is the maximum number of steps it could make before halting. Similarly, if you knew that all mortals died before age 200, then if Sally lived to be 200, you could conclude that Sally was immortal. So no Turing machine can list the Busy Beaver numbers—for if it could, it could solve the Halting Problem, which we already know is impossible. But here’s a curious fact. Suppose we could name a number greater than the Nth Busy Beaver number BB(N). Call this number D for dam, since like a beaver dam, it’s a roof for the Busy Beaver below. With D in hand, computing BB(N) itself becomes easy: we just need to simulate all the Turing machines with N rules. The ones that haven’t halted within D steps—the ones that bash through the dam’s roof—never will halt. So we can list exactly which machines halt, and among these, the maximum number of steps that any machine takes before it halts is BB(N). Conclusion? The sequence of Busy Beaver numbers, BB(1), BB(2), and so on, grows faster than any computable sequence. Faster than exponentials, stacked exponentials, the Ackermann sequence, you name it. 
Because if a Turing machine could compute a sequence that grows faster than Busy Beaver, then it could use that sequence to obtain the D‘s—the beaver dams. And with those D’s, it could list the Busy Beaver numbers, which (sound familiar?) we already know is impossible. The Busy Beaver sequence is non-computable, solely because it grows stupendously fast—too fast for any computer to keep up with it, even in principle. This means that no computer program could list all the Busy Beavers one by one. It doesn’t mean that specific Busy Beavers need remain eternally unknowable. And in fact, pinning them down has been a computer science pastime ever since Rado published his article. It’s easy to verify that BB(1), the first Busy Beaver number, is 1. That’s because if a one-rule Turing machine doesn’t halt after the very first step, it’ll just keep moving along the tape endlessly. There’s no room for any more complex behavior. With two rules we can do more, and a little grunt work will ascertain that BB(2) is 6. Six steps. What about the third Busy Beaver? In 1965 Rado, together with Shen Lin, proved that BB(3) is 21. The task was an arduous one, requiring human analysis of many machines to prove that they don’t halt—since, remember, there’s no algorithm for listing the Busy Beaver numbers. Next, in 1983, Allan Brady proved that BB(4) is 107. Unimpressed so far? Well, as with the Ackermann sequence, don’t be fooled by the first few numbers. In 1984, A.K. Dewdney devoted a Scientific American column to Busy Beavers, which inspired amateur mathematician George Uhing to build a special-purpose device for simulating Turing machines. The device, which cost Uhing less than $100, found a five-rule machine that runs for 2,133,492 steps before halting—establishing that BB(5) must be at least as high. Then, in 1989, Heiner Marxen and Jürgen Buntrock discovered that BB(5) is at least 47,176,870. 
To this day, BB(5) hasn’t been pinned down precisely, and it could turn out to be much higher still. As for BB(6), Marxen and Buntrock set another record in 1997 by proving that it’s at least 8,690,333,381,690,951. A formidable accomplishment, yet Marxen, Buntrock, and the other Busy Beaver hunters are merely wading along the shores of the unknowable. Humanity may never know the value of BB(6) for certain, let alone that of BB(7) or any higher number in the sequence. Indeed, already the top five and six-rule contenders elude us: we can’t explain how they ‘work’ in human terms. If creativity imbues their design, it’s not because humans put it there. One way to understand this is that even small Turing machines can encode profound mathematical problems. Take Goldbach’s conjecture, that every even number 4 or higher is a sum of two prime numbers: 10=7+3, 18=13+5. The conjecture has resisted proof since 1742. Yet we could design a Turing machine with, oh, let’s say 100 rules, that tests each even number to see whether it’s a sum of two primes, and halts when and if it finds a counterexample to the conjecture. Then knowing BB(100), we could in principle run this machine for BB(100) steps, decide whether it halts, and thereby resolve Goldbach’s conjecture. We need not venture far in the sequence to enter the lair of basilisks. But as Rado stressed, even if we can’t list the Busy Beaver numbers, they’re perfectly well-defined mathematically. If you ever challenge a friend to the biggest number contest, I suggest you write something like this: BB(11111)—Busy Beaver shift #—1, 6, 21, etc If your friend doesn’t know about Turing machines or anything similar, but only about, say, Ackermann numbers, then you’ll win the contest. You’ll still win even if you grant your friend a handicap, and allow him the entire lifetime of the universe to write his number. The key to the biggest number contest is a potent paradigm, and Turing’s theory of computation is potent indeed. 
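The small Busy Beaver values quoted in this section, and the trick of using BB(N) as a cutoff for deciding halting, can be tried out with a tiny Turing machine simulator. This is a sketch under stated assumptions: it reads the essay's "N rules" as the usual N states with a two-symbol tape, it counts the transition into the halt state as a step, the rule tables are the commonly cited two-state and three-state champions rather than anything given in the essay, and `RUNAWAY` is a made-up machine that never halts.

```python
from collections import defaultdict

# (state, symbol read) -> (symbol to write, head movement, next state).
# "H" is the halt state; +1 moves the head right, -1 moves it left.
BB2_CHAMPION = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
BB3_CHAMPION = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "H"),
    ("B", 0): (1, -1, "B"), ("B", 1): (0, +1, "C"),
    ("C", 0): (1, -1, "C"), ("C", 1): (1, -1, "A"),
}
# Hypothetical two-state machine that marches right forever.
RUNAWAY = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "B"),
    ("B", 0): (1, +1, "A"), ("B", 1): (1, +1, "A"),
}


def run(rules, max_steps):
    """Run on a blank tape; return the halting step count, or None if
    the machine is still running after max_steps steps."""
    tape = defaultdict(int)          # blank tape: every square holds 0
    state, position = "A", 0
    for step in range(1, max_steps + 1):
        write, move, state = rules[(state, tape[position])]
        tape[position] = write
        position += move
        if state == "H":
            return step
    return None


def halts_on_blank_tape(rules, busy_beaver_bound):
    """Decide halting for an N-state machine, given BB(N) as the bound:
    a machine still running after BB(N) steps never halts."""
    return run(rules, busy_beaver_bound) is not None


print(run(BB2_CHAMPION, 1000))            # 6
print(run(BB3_CHAMPION, 1000))            # 21
print(halts_on_blank_tape(RUNAWAY, 6))    # False, using BB(2) = 6 as the cutoff
```

The simulator itself is trivial. All the difficulty sits in the `busy_beaver_bound` argument, which no program can supply for every N.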
¨ But what if your friend knows about Turing machines as well? Is there a notational system for big numbers more powerful than even Busy Beavers? Suppose we could endow a Turing machine with a magical ability to solve the Halting Problem. What would we get? We’d get a ‘super Turing machine’: one with abilities beyond those of any ordinary machine. But now, how hard is it to decide whether a super machine halts? Hmm. It turns out that not even super machines can solve this ‘super Halting Problem’, for the same reason that ordinary machines can’t solve the ordinary Halting Problem. To solve the Halting Problem for super machines, we’d need an even more powerful machine: a ‘super duper machine.’ And to solve the Halting Problem for super duper machines, we’d need a ‘super duper pooper machine.’ And so on endlessly. This infinite hierarchy of ever more powerful machines was formalized by the logician Stephen Kleene in 1943 (although he didn’t use the term ‘super duper pooper’). Imagine a novel, which is imbedded in a longer novel, which itself is imbedded in an even longer novel, and so on ad infinitum. Within each novel, the characters can debate the literary merits of any of the sub-novels. But, by analogy with classes of machines that can’t analyze themselves, the characters can never critique the novel that they themselves are in. (This, I think, jibes with our ordinary experience of novels.) To fully understand some reality, we need to go outside of that reality. This is the essence of Kleene’s hierarchy: that to solve the Halting Problem for some class of machines, we need a yet more powerful class of machines. And there’s no escape. Suppose a Turing machine had a magical ability to solve the Halting Problem, and the super Halting Problem, and the super duper Halting Problem, and the super duper pooper Halting Problem, and so on endlessly. Surely this would be the Queen of Turing machines? Not quite. 
As soon as we want to decide whether a ‘Queen of Turing machines’ halts, we need a still more powerful machine: an ‘Empress of Turing machines.’ And Kleene’s hierarchy continues. But how’s this relevant to big numbers? Well, each level of Kleene’s hierarchy generates a faster-growing Busy Beaver sequence than do all the previous levels. Indeed, each level’s sequence grows so rapidly that it can only be computed by a higher level. For example, define $BB_2(N)$ to be the maximum number of steps a super machine with N rules can make before halting. If this super Busy Beaver sequence were computable by super machines, then those machines could solve the super Halting Problem, which we know is impossible. So the super Busy Beaver numbers grow too rapidly to be computed, even if we could compute the ordinary Busy Beaver numbers. You might think that now, in the biggest-number contest, you could obliterate even an opponent who uses the Busy Beaver sequence by writing something like this: $BB_2(11111)$. But not quite. The problem is that I’ve never seen these "higher-level Busy Beavers" defined anywhere, probably because, to people who know computability theory, they’re a fairly obvious extension of the ordinary Busy Beaver numbers. So our reasonable modern mathematician wouldn’t know what number you were naming. If you want to use higher-level Busy Beavers in the biggest number contest, here’s what I suggest. First, publish a paper formalizing the concept in some obscure, low-prestige journal. Then, during the contest, cite the paper on your index card. To exceed higher-level Busy Beavers, we’d presumably need some new computational model surpassing even Turing machines. I can’t imagine what such a model would look like. Yet somehow I doubt that the story of notational systems for big numbers is over. Perhaps someday humans will be able concisely to name numbers that make Busy Beaver 100 seem as puerile and amusingly small as our nobleman’s eighty-three. 
Or if we’ll never name such numbers, perhaps other civilizations will. Is a biggest number contest afoot throughout the galaxy? ¨ You might wonder why we can’t transcend the whole parade of paradigms, and name numbers by a system that encompasses and surpasses them all. Suppose you wrote the following in the biggest number contest: The biggest whole number nameable with 1,000 characters of English text Surely this number exists. Using 1,000 characters, we can name only finitely many numbers, and among these numbers there has to be a biggest. And yet we’ve made no reference to how the number’s named. The English text could invoke Ackermann numbers, or Busy Beavers, or higher-level Busy Beavers, or even some yet more sweeping concept that nobody’s thought of yet. So unless our opponent uses the same ploy, we’ve got him licked. What a brilliant idea! Why didn’t we think of this earlier? Unfortunately it doesn’t work. We might as well have written One plus the biggest whole number nameable with 1,000 characters of English text This number takes at least 1,001 characters to name. Yet we’ve just named it with only 80 characters! Like a snake that swallows itself whole, our colossal number dissolves in a tumult of contradiction. What gives? The paradox I’ve just described was first published by Bertrand Russell, who attributed it to a librarian named G. G. Berry. The Berry Paradox arises not from mathematics, but from the ambiguity inherent in the English language. There’s no surefire way to convert an English phrase into the number it names (or to decide whether it names a number at all), which is why I invoked a "reasonable modern mathematician" in the rules for the biggest number contest. To circumvent the Berry Paradox, we need to name numbers using a precise, mathematical notational system, such as Turing machines—which is exactly the idea behind the Busy Beaver sequence. 
So in short, there’s no wily language trick by which to surpass Archimedes, Ackermann, Turing, and Rado, no royal road to big numbers. You might also wonder why we can’t use infinity in the contest. The answer is, for the same reason why we can’t use a rocket car in a bike race. Infinity is fascinating and elegant, but it’s not a whole number. Nor can we ‘subtract from infinity’ to yield a whole number. Infinity minus 17 is still infinity, whereas infinity minus infinity is undefined: it could be 0, 38, or even infinity again. Actually I should speak of infinities, plural. For in the late nineteenth century, Georg Cantor proved that there are different levels of infinity: for example, the infinity of points on a line is greater than the infinity of whole numbers. What’s more, just as there’s no biggest number, so too is there no biggest infinity. But the quest for big infinities is more abstruse than the quest for big numbers. And it involves, not a succession of paradigms, but essentially one: Cantor’s. ¨ So here we are, at the frontier of big number knowledge. As Euclid’s disciple supposedly asked, "what is the use of all this?" We’ve seen that progress in notational systems for big numbers mirrors progress in broader realms: mathematics, logic, computer science. And yet, though a mirror reflects reality, it doesn’t necessarily influence it. Even within mathematics, big numbers are often considered trivialities, their study an idle amusement with no broader implications. I want to argue a contrary view: that understanding big numbers is a key to understanding the world. Imagine trying to explain the Turing machine to Archimedes. The genius of Syracuse listens patiently as you discuss the papyrus tape extending infinitely in both directions, the time steps, states, input and output sequences. At last he explodes. "Foolishness!" he declares (or the ancient Greek equivalent). "All you’ve given me is an elaborate definition, with no value outside of itself." 
How do you respond? Archimedes has never heard of computers, those cantankerous devices that, twenty-three centuries from his time, will transact the world’s affairs. So you can’t claim practical application. Nor can you appeal to Hilbert and the formalist program, since Archimedes hasn’t heard of those either. But then it hits you: the Busy Beaver sequence. You define the sequence for Archimedes, convince him that BB(1000) is more than his $10^{63}$ grains of sand filling the universe, more even than $10^{63}$ raised to its own power $10^{63}$ times. You defy him to name a bigger number without invoking Turing machines or some equivalent. And as he ponders this challenge, the power of the Turing machine concept dawns on him. Though his intuition may never apprehend the Busy Beaver numbers, his reason compels him to acknowledge their immensity. Big numbers have a way of imbuing abstract notions with reality. Indeed, one could define science as reason’s attempt to compensate for our inability to perceive big numbers. If we could run at 280,000,000 meters per second, there’d be no need for a special theory of relativity: it’d be obvious to everyone that the faster we go, the heavier and squatter we get, and the faster time elapses in the rest of the world. If we could live for 70,000,000 years, there’d be no theory of evolution, and certainly no creationism: we could watch speciation and adaptation with our eyes, instead of painstakingly reconstructing events from fossils and DNA. If we could bake bread at 20,000,000 degrees Kelvin, nuclear fusion would be not the esoteric domain of physicists but ordinary household knowledge. But we can’t do any of these things, and so we have science, to deduce about the gargantuan what we, with our infinitesimal faculties, will never sense. If people fear big numbers, is it any wonder that they fear science as well and turn for solace to the comforting smallness of mysticism? But do people fear big numbers? Certainly they do. 
I’ve met people who don’t know the difference between a million and a billion, and don’t care. We play a lottery with ‘six ways to win!,’ overlooking the twenty million ways to lose. We yawn at six billion tons of carbon dioxide released into the atmosphere each year, and speak of ‘sustainable development’ in the jaws of exponential growth. Such cases, it seems to me, transcend arithmetical ignorance and represent a basic unwillingness to grapple with the immense. Whence the cowering before big numbers, then? Does it have a biological origin? In 1999, a group led by neuropsychologist Stanislas Dehaene reported evidence in Science that two separate brain systems contribute to mathematical thinking. The group trained Russian-English bilinguals to solve a set of problems, including two-digit addition, base-eight addition, cube roots, and logarithms. Some subjects were trained in Russian, others in English. When the subjects were then asked to solve problems approximately—to choose the closer of two estimates—they performed equally well in both languages. But when asked to solve problems exactly, they performed better in the language of their training. What’s more, brain-imaging evidence showed that the subjects’ parietal lobes, involved in spatial reasoning, were more active during approximation problems; while the left inferior frontal lobes, involved in verbal reasoning, were more active during exact calculation problems. Studies of patients with brain lesions paint the same picture: those with parietal lesions sometimes can’t decide whether 9 is closer to 10 or to 5, but remember the multiplication table; whereas those with left-hemispheric lesions sometimes can’t decide whether 2+2 is 3 or 4, but know that the answer is closer to 3 than to 9. Dehaene et al. conjecture that humans represent numbers in two ways. For approximate reckoning we use a ‘mental number line,’ which evolved long ago and which we likely share with other animals. 
But for exact computation we use numerical symbols, which evolved recently and which, being language-dependent, are unique to humans. This hypothesis neatly explains the experiment’s findings: the reason subjects performed better in the language of their training for exact computation but not for approximation problems is that the former call upon the verbally-oriented left inferior frontal lobes, and the latter upon the spatially-oriented parietal lobes. If Dehaene et al.’s hypothesis is correct, then which representation do we use for big numbers? Surely the symbolic one—for nobody’s mental number line could be long enough to contain , 5 pentated to the 5, or BB(1000). And here, I suspect, is the problem. When thinking about 3, 4, or 7, we’re guided by our spatial intuition, honed over millions of years of perceiving 3 gazelles, 4 mates, 7 members of a hostile clan. But when thinking about BB(1000), we have only language, that evolutionary neophyte, to rely upon. The usual neural pathways for representing numbers lead to dead ends. And this, perhaps, is why people are afraid of big numbers. Could early intervention mitigate our big number phobia? What if second-grade math teachers took an hour-long hiatus from stultifying busywork to ask their students, "How do you name really, really big numbers?" And then told them about exponentials and stacked exponentials, tetration and the Ackermann sequence, maybe even Busy Beavers: a cornucopia of numbers vaster than any they’d ever conceived, and ideas stretching the bounds of their imaginations. Who can name the bigger number? Whoever has the deeper paradigm. Are you ready? Get set. Go. References Petr Beckmann, A History of Pi, Golem Press, 1971. Allan H. Brady, "The Determination of the Value of Rado’s Noncomputable Function Sigma(k) for Four-State Turing Machines," Mathematics of Computation, vol. 40, no. 162, April 1983, pp 647- 665. Gregory J. Chaitin, "The Berry Paradox," Complexity, vol. 1, no. 1, 1995, pp. 
26- 30. At http://www.umcs.maine.edu/~chaitin/unm2.html. A.K. Dewdney, The New Turing Omnibus: 66 Excursions in Computer Science, W.H. Freeman, 1993. S. Dehaene and E. Spelke and P. Pinel and R. Stanescu and S. Tsivkin, "Sources of Mathematical Thinking: Behavioral and Brain-Imaging Evidence," Science, vol. 284, no. 5416, May 7, 1999, pp. 970- 974. Douglas Hofstadter, Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, 1985. Chapter 6, "On Number Numbness," pp. 115- 135. Robert Kanigel, The Man Who Knew Infinity: A Life of the Genius Ramanujan, Washington Square Press, 1991. Stephen C. Kleene, "Recursive predicates and quantifiers," Transactions of the American Mathematical Society, vol. 53, 1943, pp. 41- 74. Donald E. Knuth, Selected Papers on Computer Science, CSLI Publications, 1996. Chapter 2, "Mathematics and Computer Science: Coping with Finiteness," pp. 31- 57. Dexter C. Kozen, Automata and Computability, Springer-Verlag, 1997. ———, The Design and Analysis of Algorithms, Springer-Verlag, 1991. Shen Lin and Tibor Rado, "Computer studies of Turing machine problems," Journal of the Association for Computing Machinery, vol. 12, no. 2, April 1965, pp. 196- 212. Heiner Marxen, Busy Beaver, at http://www.drb.insel.de/~heiner/BB/. ——— and Jürgen Buntrock, "Attacking the Busy Beaver 5," Bulletin of the European Association for Theoretical Computer Science, no. 40, February 1990, pp. 247- 251. Tibor Rado, "On Non-Computable Functions," Bell System Technical Journal, vol. XLI, no. 2, May 1962, pp. 877- 884. Rudy Rucker, Infinity and the Mind, Princeton University Press, 1995. Carl Sagan, Billions & Billions, Random House, 1997. Michael Somos, "Busy Beaver Turing Machine." At http://grail.cba.csuohio.edu/~somos/bb.html. Alan Turing, "On computable numbers, with an application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, Series 2, vol. 42, pp. 230- 265, 1936. 
Reprinted in Martin Davis (ed.), The Undecidable, Raven, 1965. Ilan Vardi, "Archimedes, the Sand Reckoner," at http://www.ihes.fr/~ilan/sand_reckoner.ps. Eric W. Weisstein, CRC Concise Encyclopedia of Mathematics, CRC Press, 1999. Entry on "Large Number" at http://www.treasure-troves.com/math/LargeNumber.html.

      Why do we even care about big numbers? Is there any use for them?

    1. Applied Ecology textbook.

      I really appreciate this project overall, as it will mean a lot to scientists who may never appear in a textbook even though they should. Even if they never see it, I think it's awesome that we can help more people get appreciation for work that might otherwise go unrecognized.

    1. Author Response

      Reviewer #1 (Public Review):

      “The synthesis and metabolism of sphingolipid (SL) are involved in a wide range of biological processes. In the present study, the authors investigate the role of SPTLC1, one of the essential subunits of the serine palmitoyl transferase complex, in both physiological and pathophysiological angiogenesis, using inducible endothelial-specific SPTLC1 knockout mice. They found SPTLC1 deficiency in ECs inhibited retinal angiogenesis along with reducing several SL metabolites in plasma, red blood cells, and peripheral organs. In addition, the authors found SPTLC1 EC-KO mice are resistant to APAP-induced liver injury. Overall, the in vivo findings in the present study are of potential interest and the authors have given clear evidence that endothelial SPTLC1 is critical to retinal angiogenesis. However, the underlying mechanisms are completely lacking in the present study. Most of the evidence provided is circumstantial, associative, and indirect.”

      We appreciate the positive comments of the reviewer. We have addressed the reviewer’s concern regarding underlying mechanisms as detailed below.

      “To be specific,

      1. The authors found endothelial SPTLC1 is important to both angiogenesis and the plasma lipid profile. However, the authors did not present the data to demonstrate the relationship between them. The in vivo findings about the phenotype and the plasma lipid profile might be true and unrelated. It would be important to know whether supplementing the reduced lipid induced by SPTLC1 KO could rescue the angiogenesis related phenotype in mice, or, whether the alternative way to inhibit the SL synthesis could mimic the phenotype of KO mice.”

      In the manuscript, we discussed whether S1P is involved, since it is one of the most down-regulated SLs in the plasma and a major regulator of angiogenesis. We think it is unlikely that reduced plasma S1P is responsible for the phenotype. First, the retinal angiogenesis defect in Sptlc1 ECKO mice is the opposite of that in S1pr1 ECKO mice, as we have published previously (PMID: 22975328, PMID: 32059774). Moreover, deletion of sphingosine kinase, the enzyme that produces S1P, in the endothelium does not influence retinal angiogenesis at P6 (Figure 3 Supplement 2 A and B). Loss of the S1P chaperone ApoM (i.e., Apom KO), which results in a 50% reduction of plasma S1P, does not change retinal vascular development (Figure 3 Supplement 2 C and D). Taken together, our results strongly suggest that reduction in plasma S1P is not the cause of the vascular defect in Sptlc1 ECKO retinas.

      Based on our results in the manuscript, loss of SPT enzyme activity in endothelial cells reduced SL species in the endothelial cells and the plasma. Our in vitro and VEGF intraocular injection experiments (new data) suggest that the angiogenic defects seen in Sptlc1 ECKO mice are due to cell-intrinsic defects in VEGF signaling and not to changes in plasma SL levels. We have edited the discussion section to address this issue.

      “2. A major issue is that the present study did not reveal a real downstream target. It is possible that VEGF signaling might be impaired by SPTLC1 knockout as discussed by the authors. However, the authors did not demonstrate this point with data. Including both in vivo and in vitro data to evaluate the effects of SPTLC1 deficiency on VEGF signaling might further strengthen the hypothesis. Besides, with in vitro experiments, the authors might further find the critical metabolite(s) involved in VEGF signaling and angiogenesis.”

      As discussed above, we agree with the reviewer’s critique and have addressed this essential point with new experiments (both in vitro and in vivo) in Figure 5. Our new data show that the SPT pathway supplies the glycosphingolipid GM1, which is needed for efficient VEGF-induced ERK phosphorylation and tip cell formation.

      Reviewer #2 (Public Review):

      “Andrew Kuo et al. investigated the role of endothelial de novo sphingolipids (SL) synthesis using endothelial cell specific SPTLC1 knockout (ECKO) mice. They showed that these mice exhibited low concentration of various SL species in not only ECs but also RBC, circulation, and other non-EC tissues. They also showed that ECKO mice exhibited impaired angiogenesis in normal and oxygen-induced retinopathy models, consistent with the decrease of endothelial proliferation and tip cell formation. They finally revealed that these mice were resistant to acetaminophen-induced acute liver injury in early phase. The experiments were well-designed, and the results were clear and convincing. The authors concluded that endothelial cells were the major source of SL in circulation and various organs (liver and lung) other than retina (and probably brain). The weakness of the current version of the manuscript is that the authors did not elucidate the mechanisms underlying the observed phenomena.

      1) The authors showed impaired angiogenesis in ECKO mice using neonatal retina model. Based on the fact that this phenotype was similar to that in endothelial VEGFR2 deficient mice, they suggested that VEGF responsiveness is altered in ECKO mice. Although this hypothesis is plausible, the authors would need to prove it by evaluating VEGFR signaling (VEGFR phosphorylation, Akt activation etc.) in ECKO mice.”

      We thank the reviewer for positive comments. As for the weakness identified, we have addressed this point by conducting new in vitro and in vivo experiments (detailed above). The new Figure 5 addresses this issue directly.

      “2) The acetaminophen-induced liver injury was reduced in ECKO mice in early phase. However, it is still unclear whether SL production itself affects liver injury. The authors discussed the possibility that gene deficiency increases unconsumed serine resulting in GSH increase, but it is essentially independent to SL. If possible, it would be good if the authors could investigate the effect of SL administration on the liver injury progression.”

      We appreciate the reviewer’s concern about the liver injury model in the Sptlc1 ECKO mice. Our data suggest that SL species supplied from ECs impact the hepatocyte response to stress. Since acetaminophen-induced liver injury is highly dependent on reactive oxygen species, the increased glutathione levels that we found in the Sptlc1 ECKO mice may be involved in the phenotype. However, we are simply considering them as biochemical markers of liver injury. This has been addressed in the discussion.

      “3) This paper showed the impaired cell proliferation in Sptlc1 KO EC mice, and discussed it. Authors described that this phenotype was similar to that of Nos3 KO mice, but its inconsistency with Sptlc2 ECKO adult mice was only justified by a word "isoform-selective function". Authors could quantify eNOS expressions in Sptlc1 KO mice, compare results and then discuss this matter.”

      In Figure 1C, we used eNOS as an EC marker to show purity during our EC isolation process. In fact, we did not observe a change in eNOS expression in Sptlc1 ECKO mice. We also did not detect elevated phospho-eNOS in Sptlc1 ECKO mice, in contrast to Sptlc2 ECKO adult mice (Figure 1 Supplement 4). Additionally, our work in the retina was performed in postnatal gene-deletion pups from P6-P17, which differs from the published Sptlc2 ECKO study. The differences in gene deletion strategy (early postnatal vs. adult) could result in differences in eNOS expression. We have added discussion about this issue.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Ciliates extensively rearrange their somatic genome every time a new somatic nucleus develops from the zygotic germline nucleus. In this manuscript, Feng et al report the sequencing, assembly and annotation of the germline and somatic genomes of Euplotes woodruffi and the germline genome of Tetmemena sp. (whose somatic genome was sequenced and assembled by the same lab in 2015). They present a comparative analysis of developmentally programmed genome rearrangements in these two species and in the model ciliate Oxytricha trifallax. Their major findings are that:

      (i) E. woodruffi and Tetmemena sp. eliminate a smaller fraction of their germline genome (~54%) from their somatic macronucleus (MAC) than O. trifallax (>80%)

      (ii) Transposable elements (TE) represent a smaller fraction of the germline genome (~2%) in the first two ciliates than in O. trifallax (~15%). TEs are mainly located at the boundaries of germline chromosomes and in intergenic regions, but can also be found inside IESs

      (iii) Several thousands of genes are scrambled in the germline genome of all three species

      The authors have also addressed the possible origin of gene scrambling. They report an interesting association with local paralogy and propose a model for the emergence of the odd-even pattern of gene unscrambling between two paralogous copies.

      Major comments:

      1. Based on the statistics presented in Table 1, genome assemblies are of good quality, with a reasonable N50 size of germline (MIC) contigs. It seems, however, that no entire MIC chromosome could be assembled, since no two-telomere contig is mentioned in the list. As proposed by the authors (p.7) the presence of numerous TEs at the boundaries of MIC contigs (Fig S1) may have hindered the assembly of MIC chromosome ends. I would have appreciated to have more information on the "other repeats" (which seem to differ from tandem repeats according to Fig 2) and their location along MIC contigs.

        Subcategories of “other repeats” were included in Table S2 based on Repeatmasker annotations. We have now analyzed the locations of other repeats in MIC contigs and include them in the new Figure S1B. About 30% of “other” transposable elements are present at the boundaries of MIC contigs, which may also hinder the assembly. Notably, 35-45% of “other TEs” are in assembled, intergenic regions.

      The definition of "Internal Eliminated Sequences" (IES) is not clear. The authors make a distinction between IESs and TEs. I understand that IESs are DNA segments that separate two macronuclear-destined sequences (MDS) in the germline genome. Thus they appear to be restricted to those regions that eventually yield gene-sized MAC chromosomes. IESs are eliminated between two pointers that may not be identical on both sides in case of scrambled genes. Some clarification is needed here.

      To illustrate my point: I found the statement "with many TE insertions within IESs, suggesting that TE insertions may have generated IESs" particularly confusing (p. 9 lines 5-6). Does this mean that IESs extend beyond the ends of inserted TEs? The legend of Fig S1 should also be clarified.

      We clarified the text and legend. IESs can extend beyond the ends of inserted TEs, even if the original IES is a decayed TE, due to subsequent sequence evolution at the boundaries or if the original insertion was into an existing IES. David Prescott referred to sequence evolution at the edges of IESs as “pointer sliding” (ref.36).

      p. 10 lines 2-4 and Fig S2: Could the authors explain the difference they make between MDS (in the text) and CDS (in Fig S2)? My understanding is that a CDS is the entire gene coding sequence and may be made of multiple MDSs. If this is correct, the sentence should read "We compared the number of MDSs between single-copy orthologs for single-gene MAC chromosomes across the three species and found that the orthologs have similar CDS lengths".

      Yes, we made the correction.

      p. 12 lines 10-15: the discovery that paralogous MDSs can be found in scrambled genomic loci is interesting. If the two paralogs can be distinguished based on the number of substitutions, it would be informative to go back to individual reads and check whether each of the two copies can be incorporated in the unscrambled CDS (and at which frequency). Would the pointers be compatible with this?

      The paralogous MDSs in the MIC are often not identical. The copy with the highest similarity is assigned as the “preliminary match” by SDRAP (ref. 52), and others are assigned as “additional matches”. To validate SDRAP assignments, we did pairwise BLASTN alignments (“-task megablast”) of paralogous MIC MDSs and their corresponding MAC MDSs. We confirmed that in the three species, the preliminary match has the best or tied-best percent identity (pid) in most cases. Therefore, the MDS assigned as the preliminary match is more likely the paralog incorporated into the MAC chromosome.
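
The validation check described above can be illustrated with a minimal sketch. It assumes the BLASTN hits have already been reduced to one percent-identity value per paralogous MIC copy; the function names, the input layout, and the example values are hypothetical and are not part of SDRAP or BLAST.

```python
def preliminary_is_best(pid_by_copy, preliminary_id):
    """Return True if the preliminary match has the best or tied-best pid.

    pid_by_copy: dict mapping a MIC copy identifier to its percent identity
                 against the corresponding MAC MDS (hypothetical layout).
    preliminary_id: identifier of the copy labeled as the preliminary match.
    """
    best_pid = max(pid_by_copy.values())
    return pid_by_copy[preliminary_id] >= best_pid


def fraction_validated(loci):
    """Fraction of loci whose preliminary match has the best or tied-best pid."""
    if not loci:
        return 0.0
    hits = sum(preliminary_is_best(pids, prelim) for pids, prelim in loci)
    return hits / len(loci)


# Made-up values for illustration only.
example_loci = [
    ({"copy_1": 99.1, "copy_2": 96.4}, "copy_1"),  # preliminary match is best
    ({"copy_1": 98.0, "copy_2": 98.0}, "copy_2"),  # tie counts as validated
    ({"copy_1": 95.0, "copy_2": 97.3}, "copy_1"),  # preliminary match is not best
]
print(fraction_validated(example_loci))
```

Counting ties as validated mirrors the "best or tied-best" wording; a stricter check would require a unique maximum.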

      We used genome assemblies of Euplotes woodruffi, which had the highest Nanopore coverage, to further investigate the frequency of MDS incorporation. We followed the reviewer’s suggestion and called SNP variants on both MAC and MIC genomes. For MAC SNP calling, we used Illumina reads as input for freebayes (ref a). For MIC SNP calling, we used Nanopore reads, instead of Illumina reads, to avoid non-specific short-read mapping on paralogous MDSs and to avoid the presence of any contaminating MAC reads. Variants were called and phased by PEPPER-Margin-DeepVariant (ref b), a new tool published in 2021 in Nature Methods, which has been reported to have similar accuracy to Illumina read variant calling, especially at high read coverage. We used the parameter “--pepper_min_coverage_threshold 20” to call confident variants when at least 20 reads cover the position. Only 92 MIC SNPs in the paralogous MDSs passed all filters of the program. Using this small set of MIC SNPs, we were unfortunately unable to distinguish which paralogous MIC MDS was incorporated into the MAC. Therefore, we cannot infer with what frequency one paralogous MDS is incorporated over another, until they become sufficiently diverged, which is compatible with the model.

      a. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907. 2012 Jul 17.

      b. Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nature methods. 2021 Nov;18(11):1322-32.

      The hypothesis that odd-even scrambled loci have evolved from paralogous genes in E. woodruffi is supported by the existence of paralogous MDSs, length conservation of MDS/IES pairs and sequence similarity between corresponding MDS and IES in a pair. The correlations presented for Oxytricha and Tetmemena are much less convincing (Fig S5D and E). I recommend that the authors are even more cautious in their statement on p.13 ("For Oxytricha and Tememena, the MDS and IES lengths for such MDS/IES pairs also correlate positively, but more moderately").

      Thank you, we rephrased the text.

      p. 15 last paragraph: Why did the authors focus only on TBEs inserted in non-scrambled IESs to look for orthologous TBE insertions? Is there a reason to believe that no recent TBE insertion occurred at other genomic loci? Or was it only for practical reasons? It is also not clear to me whether the authors have considered full-length TBEs or the presence of at least one TBE ORF.

      This analysis was limited for practical reasons, because we identify position conservation of TBEs by aligning protein sequences of MAC genes. We only consider TBEs inserted in non-scrambled IESs in exons. It would be difficult and less meaningful to align completely non-coding MIC-limited regions.

      Partial TBEs are also included if they contain at least one TBE ORF (detected by BLAST).

      Furthermore, TE insertion cannot explain the origin of scrambled IESs, and TEs rarely map to scrambled IESs (Figure S1A), but there is a clear evolutionary model for the origin of nonscrambled IESs from decay of TBEs (ref. 49). Initial purifying selection would act on the TE to maintain its ability to self-excise, whereas we advocate for a different model for the origin of scrambled IESs by decay of paralogous MDSs.

      p. 16: the authors report that some introns of E. woodruffi map "near" Oxytricha/Tetmemena pointers. How near? Based on the information provided by the authors, I don't think this observation necessarily implies that IESs were converted to introns (or reciprocally) during evolution. If this were true, shouldn't at least one intron boundary coincide exactly with a pointer? The authors should clarify this (also in the discussion, on p. 20, top paragraph).

      We used a 20 bp window (~7 amino acids), as described in the Methods, and added that to the Results. Full detail is provided in the Methods section, “Ortholog comparison pipeline and Monte Carlo simulations”. 103 E. woodruffi introns are within 20 bp of the midpoint of Oxytricha/Tetmemena pointers. Among these, 43 intron boundaries overlap an Oxytricha or Tetmemena pointer. We observed 306 cases of precisely matching boundaries between any two species, where the exon junction of one species maps inside the MDS/IES pointer of another species. However, we would expect the boundaries of introns and IESs to coincide so precisely only if they were recent conversions; hence we feel that a window analysis is informative.
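
As a rough illustration of the two criteria in this response (near a pointer midpoint versus overlapping a pointer), the following sketch counts intron boundaries of each kind. It assumes that intron boundaries and pointer intervals have already been mapped to a shared coordinate system; the function name, the data layout, and the example coordinates are hypothetical and do not reproduce the authors' pipeline.

```python
WINDOW_BP = 20  # window size stated in the response


def classify_intron_boundaries(boundaries, pointers, window=WINDOW_BP):
    """Count boundaries near a pointer midpoint and boundaries inside a pointer.

    boundaries: list of integer positions of intron boundaries.
    pointers: list of (start, end) inclusive intervals for pointers.
    Returns (n_near, n_overlap); each boundary is counted at most once
    per category.
    """
    n_near = 0
    n_overlap = 0
    for pos in boundaries:
        is_near = False
        is_overlap = False
        for start, end in pointers:
            midpoint = (start + end) / 2
            if abs(pos - midpoint) <= window:
                is_near = True
            if start <= pos <= end:
                is_overlap = True
        n_near += is_near
        n_overlap += is_overlap
    return n_near, n_overlap


# Made-up coordinates for illustration only.
example_boundaries = [100, 130, 500]
example_pointers = [(98, 103), (120, 124)]
print(classify_intron_boundaries(example_boundaries, example_pointers))
```

In this toy input, two boundaries fall within the window and one of them lies inside a pointer, which shows why the window count is always at least as large as the overlap count when pointers are shorter than the window.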

      p. 19 2nd paragraph: the suggested mechanism explaining the 5' bias of IESs in E. woodruffi genes is unclear. How could germline recombination take place between a MIC chromosome and a MAC reverse transcript or nanochromosome? This would imply that DNA could be imported in the MIC. Is there evidence that this might occur?

      The ability of TEs to invade the MIC demonstrates that even foreign DNA can be incorporated into the MIC. Since MAC DNA is present at high copy number, it offers a potential source for a recombination template that could erase IESs, as could an errant reverse transcript of one of the long noncoding template RNAs. Any of these would be infrequent events that would matter on an evolutionary time scale even if developmentally rare.

      According to Figure 1, no scrambled genes have been reported in Paramecium tetraurelia. Within the frame of the proposed model, this is somewhat unexpected because this ciliate went through several whole genome duplications during evolution and harbors many paralogous gene pairs. Is there a reason why no gene scrambling took place in Paramecium?

      Paramecium uses only TA dinucleotide pointers for IES elimination, unlike the rich diversity of pointers in spirotrichous ciliates. This limitation in its machinery may explain why no scrambled loci have been observed in Paramecium, despite the abundance of paralogs. Our model suggests that local MIC paralogy is associated with the origin of scrambling. But most of the paralogy reported in Paramecium is at the level of whole chromosomes in the MAC (ref. 104) rather than local MIC paralogy.

      Minor comments:

      p. 4 (4th bottom line): To my knowledge, ref #28 presents a draft (incomplete) MIC assembly of the Paramecium genome.

      Thank you, we added reference 29 and adjusted the wording describing the quality of MIC genome draft assemblies.

      p. 7 (last paragraph): "encoding" should be replaced by "carrying"

      Thank you, we made the change.

      p. 10 (2nd paragraph): insert a missing "o" into "nanochromosomes"

      Thank you, corrected.

      p. 10 (same paragraph): the weak 5' bias of IES distribution in Tetmemena should be shown (either as an additional panel in Fig 3 or in a Sup Figure).

      Thank you, we added it as Figure S2C.

      p. 24 2nd paragraph: "a" is missing in "Trinity, which is a software..."

      Thank you, we made the correction.

      CROSS-CONSULTATION COMMENTS

      I agree with most comments of reviewer 3.

      The authors have actually defined "TE" in the introduction (p. 6). Depending on the journal's rules for abbreviation use, it may not be necessary to define it again in the results section

      Reviewer #1 (Significance (Required)):

      Ciliates are unicellular models to study developmentally programmed genome rearrangements at the mechanistic, genome-wide and evolutionary levels. These aspects have so far mostly been addressed in three species: P. tetraurelia and Tetrahymena thermophila on the one hand, the spirotrichous ciliate O. trifallax on the other.

      One new piece of information that can be found in the present manuscript is the assembly and annotation of the germline genome of two novel species: Tetmemena sp, closely related to Oxytricha, and the more distant E. woodruffi. Feng et al establish that, similar to other ciliates, Tetmemena and Euplotes eliminate TEs and other germline-specific sequences during programmed genome rearrangements. They also undergo extensive gene unscrambling, which results in IES removal and MDS reordering to assemble coding sequences.

      A TE origin was discussed previously for Paramecium (Arnaiz et al PLoS Genet; Sellis et al 2021 PLoS Biol) and Tetrahymena IESs (Hamilton et al 2016 eLife). While this may also hold true in spirotrichous ciliates, the present manuscript proposes a completely new evolutionary scenario for IESs from scrambled genes. Here, Feng et al establish that scrambled genes of spirotrichous ciliates tend to be associated with local paralogy. They provide evidence supporting that IESs from scrambled genes may have evolved from paralogous MDSs.

      Although I am more an expert in the molecular mechanisms involved in genome rearrangements, I feel that the work reported here should draw the attention of a broader audience interested in genome dynamics and evolution, beyond the specific field of spirotrichous ciliate biology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Feng et al. provide a solid analysis of the evolution of genome rearrangement in spirotrich ciliates. The authors applied a variety of state-of-the-art sequencing and bioinformatic methods to investigate the intriguing and extremely complex patterns of genome architecture in this protist lineage. Methods (including statistical analyses) are adequate and explained in detail. Results and discussions reflect careful, clever analysis of the data and excellent linkage with the literature. Figures and tables complement the text in a compelling way. I have only minor suggestions:

      Summary: more gradually introduce Spirotrichea and the phylogenetic relationship among the three species analyzed. This would better position the reader to understand the evolutionary context you are working in. Also, it would be helpful to more clearly differentiate novel vs. existing data. A suggestion: "This study focuses on three spirotrich species: two in the family Oxytrichidae (Oxytricha trifallax and Tetmemena sp) and Euplotes woodruffi as an outgroup. To complement existing data, we sequenced, assembled and annotated the germline and somatic genomes of E. woodruffi and the germline genome of Tetmemena sp."

      Thank you, we clarified the summary (abstract).

      Introduction, first paragraph: Replace "The species in this study..." for a more precise statement, such as "The three spirotrich species studied here..."

      Thank you, we have made this statement more precise.

      p. 4: This sentence is unclear: "These useful tools provide partial insight to guide selection of species for full genome sequencing, which allows construction of complete rearrangement maps of a MIC genome onto a MAC genome for a reference species."

      Thank you, we have clarified this sentence.

      p. 8: define TE on first mention.

      Defined on page 6.

      Table 1. Indicate which MIC and MAC data are from this study.

      References are included for published data and a note has been added to indicate data from this study.

      Reviewer #3 (Significance (Required)):

      The present work represents a significant advance in the field of evolutionary genomics. The focus of the paper is on ciliates, an ancient (2 billion-year old) and highly diverse eukaryotic phylum that presents many peculiarities, including sex, nuclear dimorphism, genome rearrangement, high numbers of paralogs and transposons, etc. While some data exist on a few model ciliates of disparate phylogenetic position, this work focuses on two species taxonomically placed in the same family, plus a more distant outgroup within the same class. This gives a novel dimension to this study that goes beyond exploring genome architecture in a single clade. Instead, it allows the exploration of evolutionary trends in genome rearrangement among relatively closely related species. This paper should be of high interest not only for ciliate biologists (like me), but also in relation to comparative genomics of protists/eukaryotes and germ-soma biology. I highly recommend publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Ciliates extensively rearrange their somatic genome every time a new somatic nucleus develops from the zygotic germline nucleus. In this manuscript, Feng et al report the sequencing, assembly and annotation of the germline and somatic genomes of Euplotes woodruffi and the germline genome of Tetmemena sp. (whose somatic genome was sequenced and assembled by the same lab in 2015). They present a comparative analysis of developmentally programmed genome rearrangements in these two species and in the model ciliate Oxytricha trifallax. Their major findings are that:

      (1) E. woodruffi and Tetmemena sp. eliminate a smaller fraction of their germline genome (~54%) from their somatic macronucleus (MAC) than O. trifallax (>80%)

      (2) Transposable elements (TE) represent a smaller fraction of the germline genome (~2%) in the first two ciliates than in O. trifallax (~15%). TEs are mainly located at the boundaries of germline chromosomes and in intergenic regions, but can also be found inside IESs

      (3) Several thousands of genes are scrambled in the germline genome of all three species

      The authors have also addressed the possible origin of gene scrambling. They report an interesting association with local paralogy and propose a model for the emergence of the odd-even pattern of gene unscrambling between two paralogous copies.

      Major comments:

      (1) Based on the statistics presented in Table 1, genome assemblies are of good quality, with a reasonable N50 size of germline (MIC) contigs. It seems, however, that no entire MIC chromosome could be assembled, since no two-telomere contig is mentioned in the list. As proposed by the authors (p.7) the presence of numerous TEs at the boundaries of MIC contigs (Fig S1) may have hindered the assembly of MIC chromosome ends. I would have appreciated to have more information on the "other repeats" (which seem to differ from tandem repeats according to Fig 2) and their location along MIC contigs.

      (2) The definition of "Internal Eliminated Sequences" (IES) is not clear. The authors make a distinction between IESs and TEs. I understand that IESs are DNA segments that separate two macronuclear-destined sequences (MDS) in the germline genome. Thus they appear to be restricted to those regions that eventually yield gene-sized MAC chromosomes. IESs are eliminated between two pointers that may not be identical on both sides in case of scrambled genes. Some clarification is needed here.

      To illustrate my point: I found the statement "with many TE insertions within IESs, suggesting that TE insertions may have generated IESs" particularly confusing (p. 9 lines 5-6). Does this mean that IESs extend beyond the ends of inserted TEs? The legend of Fig S1 should also be clarified.

      (3) p. 10 lines 2-4 and Fig S2: Could the authors explain the difference they make between MDS (in the text) and CDS (in Fig S2)? My understanding is that a CDS is the entire gene coding sequence and may be made of multiple MDSs. If this is correct, the sentence should read "We compared the number of MDSs between single-copy orthologs for single-gene MAC chromosomes across the three species and found that the orthologs have similar CDS lengths".

      (4) p. 12 lines 10-15: the discovery that paralogous MDSs can be found in scrambled genomic loci is interesting. If the two paralogs can be distinguished based on the number of substitutions, it would be informative to go back to individual reads and check whether each of the two copies can be incorporated in the unscrambled CDS (and at which frequency). Would the pointers be compatible with this?

      (5) The hypothesis that odd-even scrambled loci have evolved from paralogous genes in E. woodruffi is supported by the existence of paralogous MDSs, length conservation of MDS/IES pairs and sequence similarity between corresponding MDS and IES in a pair. The correlations presented for Oxytricha and Tetmemena are much less convincing (Fig S5D and E). I recommend that the authors are even more cautious in their statement on p.13 ("For Oxytricha and Tememena, the MDS and IES lengths for such MDS/IES pairs also correlate positively, but more moderately").

      (6) p. 15 last paragraph: Why did the authors focus only on TBEs inserted in non-scrambled IESs to look for orthologous TBE insertions? Is there a reason to believe that no recent TBE insertion occurred at other genomic loci? Or was it only for practical reasons? It is also not clear to me whether the authors have considered full-length TBEs or the presence of at least one TBE ORF.

      (7) p. 16: the authors report that some introns of E. woodruffi map "near" Oxytricha/Tetmemena pointers. How near? Based on the information provided by the authors, I don't think this observation necessarily implies that IESs were converted to introns (or reciprocally) during evolution. If this were true, shouldn't at least one intron boundary coincide exactly with a pointer? The authors should clarify this (also in the discussion, on p. 20, top paragraph).

      (8) p. 19 2nd paragraph: the suggested mechanism explaining the 5' bias of IESs in E. woodruffi genes is unclear. How could germline recombination take place between a MIC chromosome and a MAC reverse transcript or nanochromosome? This would imply that DNA could be imported in the MIC. Is there evidence that this might occur?

      (9) According to Figure 1, no scrambled genes have been reported in Paramecium tetraurelia. Within the frame of the proposed model, this is somewhat unexpected because this ciliate went through several whole genome duplications during evolution and harbors many paralogous gene pairs. Is there a reason why no gene scrambling took place in Paramecium?

      Minor comments:

      • p. 4 (4th bottom line): To my knowledge, ref #28 presents a draft (incomplete) MIC assembly of the Paramecium genome.

      • p. 7 (last paragraph): "encoding" should be replaced by "carrying"

      • p. 10 (2nd paragraph): insert a missing "o" into "nanochromosomes"

      • p. 10 (same paragraph): the weak 5' bias of IES distribution in Tetmemena should be shown (either as an additional panel in Fig 3 or in a Sup Figure).

      • p. 24 2nd paragraph: "a" is missing in "Trinity, which is a software..."

      CROSS-CONSULTATION COMMENTS

      I agree with most comments of reviewer 3.

      The authors have actually defined "TE" in the introduction (p. 6). Depending on the journal's rules for abbreviation use, it may not be necessary to define it again in the results section.

      Significance

      • Ciliates are unicellular models to study developmentally programmed genome rearrangements at the mechanistic, genome-wide and evolutionary levels. These aspects have so far mostly been addressed in three species: P. tetraurelia and Tetrahymena thermophila on the one hand, the spirotrichous ciliate O. trifallax on the other.

      • One new piece of information that can be found in the present manuscript is the assembly and annotation of the germline genome of two novel species: Tetmemena sp, closely related to Oxytricha, and the more distant E. woodruffi. Feng et al establish that, similar to other ciliates, Tetmemena and Euplotes eliminate TEs and other germline-specific sequences during programmed genome rearrangements. They also undergo extensive gene unscrambling, which results in IES removal and MDS reordering to assemble coding sequences.

      • A TE origin was discussed previously for Paramecium (Arnaiz et al PLoS Genet; Sellis et al 2021 PLoS Biol) and Tetrahymena IESs (Hamilton et al 2016 eLife). While this may also hold true in spirotrichous ciliates, the present manuscript proposes a completely new evolutionary scenario for IESs from scrambled genes. Here, Feng et al establish that scrambled genes of spirotrichous ciliates tend to be associated with local paralogy. They provide evidence supporting that IESs from scrambled genes may have evolved from paralogous MDSs.

      • Although I am more an expert in the molecular mechanisms involved in genome rearrangements, I feel that the work reported here should draw the attention of a broader audience interested in genome dynamics and evolution, beyond the specific field of spirotrichous ciliate biology.

    1. Author Response

      Reviewer #1 (Public Review):

      1) While the authors identify the suppressors in known genetic interactors (GIs) of the yeast SEC53, it is worth testing if the compensatory mutations are rewiring the GIs, thereby explaining the lack of comparable compensations observed in reconstituted strains. If altered GIs explain the suppression, then while yeast serves as an excellent tool to perform these assays, the human context of the disease may require a different set of genetic suppressors and, therefore, a different target than the yeast PGM1 ortholog.

      Our data show that pgm1 mutations alone greatly improve growth of sec53-V238M strains. Our data also indicate other pathways of compensation. Whether each of these compensatory mechanisms translate to humans is unknown. However, the observed enrichment of compensatory mutations in genes whose human homologs are associated with Type 1 CDG, suggests that many of these genetic interactions are likely to be conserved.

      Also, do the Sec53 and Pgm1 proteins interact directly in yeast, and are these mutations located at the interaction interface?

      As we mention above, there is no support for a direct physical interaction between Sec53 and Pgm1.

      2) Based on the data obtained between pACT1 and pSEC53-driven expression of the SEC53 mutant alleles, the pattern of suppressors appears to be different. Authors report that the variants expressed from strong pACT1 promoters show more suppressors than those driven by native promoters. Is this a general trend in experimental evolution that slower-growing strains tend to show fewer suppressors? For example, on Page 6, line 154, "compensating for Sec53-F126L dimerization defects are rare or not easily accessible". The statement suggests that the authors did obtain suppressors that compensate for the dimerization defect. At the same time, while rare (also, are the authors suggesting suppression of the dimerization defect, as in better dimerization?), the rate of obtaining suppressors seems to be linked to the severity of the fitness defects of the strains. The lack of suppressors may be a limitation of the evolution experiments. Indeed, later in the manuscript, the authors noticed that while PGM1 suppressors obtained in V238M can also suppress F126L alleles, the suppression was not as efficient. Could it be that evolution experiments in slower-growing strains predominantly enrich suppressors in other pathways (i.e., not in the CDG orthologs) that restore the growth better and compete out the relatively weaker suppressors in PGM1? In fact, the authors report similar effects on Page 7, lines 204-210. These two paragraphs are contradictory and should be explained further.

      All of our sequencing was performed on strains with sec53 under the control of the pACT1 promoter. While we did not identify unique sec53-F126L suppressors, we cannot exclude that sec53-F126L suppressors exist, so we describe them as “rare or not easily accessible”. While it is possible that the slower growth rate of the sec53-F126L allele could impact the likelihood of observing suppressors, we think it is more likely due to the nature of the variant (dimerization defect versus stability defect) rather than growth rate. In other laboratory evolution experiments the same beneficial mutation typically has a greater effect in slower-growing backgrounds (for example: doi.org/10.1126/science.1250939).

      3) Authors report that the LOF of PGM1 compensates for the SEC53 mutations. However, the evolution experiments did not capture any LOFs in PGM1. The fitness comparisons in evolution experiments are different as many different genotypes compete in a mix. Therefore, the fitness assays in a clonal population may not represent these differences well. To test this argument, authors can try to mimic the evolution experiments by mixing two genotypes to check competitive fitness, like the co-culture of pgm1 suppressor obtained via evolution experiments with pgm1Δ.

      Though we did not perform a direct head-to-head competition between a pgm1 suppressor and a pgm1Δ, our data suggest that the pgm1 deletion would outcompete some of the lower-fitness suppressors. In the Discussion we speculate as to why we do not see deletion mutations: “Given that most of the evolved clones containing pgm1 mutations are more fit than the reconstructed strains, it is possible that other evolved mutations interact epistatically only with non-loss-of-function pgm1 mutations.” Though it is beyond the scope of the present manuscript, it would be possible to rerun the evolution experiment in sec53-V238M strains carrying either a pgm1 missense suppressor or a pgm1Δ. Under the hypothesis of additional interacting loci, only the pgm1 missense suppressors would be more likely to acquire additional compensatory mutations.

      Reviewer #3 (Public Review):

      Vignogna et al. used yeast genetics, experimental evolution and biochemistry to tackle human congenital disorders of glycosylation (CDG), a disease mostly caused by mutations in PMM2. They took advantage of the observation that the budding yeast gene SEC53 is almost identical to human PMM2, and used experimental evolution to find interactors of SEC53/PMM2. They found an overrepresentation of mutations in genes corresponding to other human CDG genes, including PGM1. Genetic and biochemical characterizations of the pgm1 mutations were carried out. This work is solid, although the authors did not reveal why reduction of pgm1 activity could compensate for defects of a particular mutant allele of sec53.

      Out of curiosity, if the authors were to simply focus on the preexisting mutations, would they have gotten the materials for most of the experiments in this article? In other words, how important is the experimental evolution?

      The evolution experiment was crucial as the specific pgm1 mutations we identified here have not been reported elsewhere, nor have the orthologous mutations been identified in human PGM1.

      A strain table with full genotypes is needed.

      We added a strain genotype table (Supplemental Dataset 2).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2022-01541

      Corresponding author(s): Hubert Hilbi

      1. General Statements

      Upon infection of eukaryotic host cells, Legionella pneumophila forms a unique compartment, the Legionella-containing vacuole (LCV). While the role of vesicle trafficking pathways for LCV formation has been quite extensively studied, the role of putative membrane contact sites (MCS) between the LCV and the ER has been barely addressed. In our study, we provide a comprehensive analysis of the localization and function of protein and lipid components of LCV-ER MCS in the genetically tractable amoeba Dictyostelium discoideum.

      We would like to thank the 3 reviewers for their thorough and constructive reviews. Overall, the reviewers state that the study is of interest to researchers in the field of Legionella and other intracellular pathogens (Reviewer 2), as well as to cell biologists (Reviewer 3). Reviewer 1 does not ask for additional experiments but is critical about the overall structure of the manuscript and the proteomics approach. As requested by the reviewer, we have substantially restructured the revised manuscript, now clearly outline the hypotheses put forward in the study and streamlined the proteomics data. Reviewer 2 asks for additional experiments to support our model of LCV-ER MCS. In the revised manuscript, we have included additional experiments addressing lipid exchange at the MCS, and we plan to perform further co-localization experiments. Reviewer 3 appreciates the comprehensive LCV proteomics and asks for only minor revisions, which we have incorporated in the revised version of the manuscript. We include below a point-by-point response to all the comments made by the reviewers.

      2. Description of the planned revisions

      Reviewer #2

      Major comment

      1) MCS contain protein complexes or a group of proteins, but the proteins here are studied in isolation and do not support the model shown in Figure 7. Co-localization studies of the putative LCV-ER MCS proteins are critical, especially given that the authors hypothesize the proteins are working together to modulate PI(4)P levels.

      Response: As suggested by the reviewer, we will perform additional co-localization experiments with MCS components. To this end, we will construct mCherry-Vap, and we will co-transfect the parental D. discoideum strain Ax3 with plasmids producing mCherry-Vap and OSBP8-GFP or GFP-OSBP11. Using these dually fluorescence labelled D. discoideum strains, the co-localization of Vap with the OSBPs will be assessed at 1, 2, and 8 h post infection. The data will be presented as fluorescence micrographs, and co-localization of Vap with the OSBPs will be quantified using Pearson’s correlation coefficient and fluorescence intensity profiles. The data will be outlined in the text (l. 258 ff.) and shown in the new Fig. 2 and Fig. S4.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 (Evidence, reproducibility and clarity):

      In the manuscript by Vormittag, et al., the authors perform proteomics identification of proteins associated with the Legionella-containing vacuole (LCV) in the model amoeba Dictyostelium discoideum comparing WT to atlastin knockout mutants. The authors find approximately half the D. discoideum proteome associated with the LCV, but there was enrichment of some proteins on the WT relative to the mutant. They focus on proteins involved in forming membrane contact sites (MCS) that previously were shown to be important for expansion of the Chlamydia-containing vacuole. Most significant are the oxysterol binding proteins (OSBP) and VapA (similar to that seen in Chlamydia). The authors show differential association of these proteins with either the LCV or presumably the ER associated with the LCV. Using a linear scale over 8 days, they show that mutations in some of the MCS reduce yields in two of the OSBP knockout mutants and the growth rate of the vap mutant is slowed but ultimate yield is increased. Using some nice microscopy techniques, they measure LCV size, and the osbK mutant appears particularly small relative to other strains, whereas the osbH mutant generates large vacuoles. This doesn't necessarily correlate with the PI4P quantities on the vacuoles (which are higher in all of them), but I am not totally sure how this is measured, and whether it is PI4P/pixel or PI4P/LCV. In all cases, this was reduced by Sac1 mutation. Surprisingly, even though there was uniform increase in PI4P in each of the mutants, loss of PI4P only affects localization of some of the proteins. Finally, in what seems to be a peripherally related experiment, the authors show that a pair of Legionella translocated effectors are required to maintain PI4P levels, although it is not clear how this is related to the other data in the manuscript.

      It is not clear from the manuscript if the authors are just cataloging things or trying to test a hypothesis. This is an extremely difficult manuscript to read and reconstruct what the authors showed. I really think that the only people who will understand what is written are people who are familiar with the work in Chlamydia starting in 2011 in Engel's and Derre's laboratories, which clearly showed that MCS and most specifically Vap/OSBPs are involved in vacuole expansion. If the authors could rewrite the manuscript along these lines, perhaps comparing their data to the Chlamydia data it would help a lot. Otherwise, I don't think anyone else will understand why they are focusing on these things. I don't recommend new experiments (although re-analyzing data is necessary), but the manuscript has to be taken apart and claims removed, and data be interpreted properly. Otherwise, the manuscript seems like just a clearing house for data.

      Response: Thank you for the concise summary of our data and pointing out the need to restructure the manuscript and to clearly outline the hypotheses underlying the study. According to the reviewer’s suggestions, we have now re-structured the manuscript. In the revised manuscript the story unfolds from the observation that the ER tightly associates with (isolated) LCVs, and the proteomics approach is used as a validation of the presence of MCS proteins at the LCV-ER MCS.

      As suggested by the reviewer, we now highlight the seminal work on Chlamydia by the Engel and Derré laboratories not in the Discussion section (as in the original version of the manuscript) but already in the Introduction section (l. 142-148). We believe that it makes a stronger case to start out an analysis of LCV-ER MCS with a Legionella-specific cell biological finding (LCV-ER association) and an unbiased proteomics approach, as compared to a more derivative and defensive approach starting out with what is known about Chlamydia.

      The reviewer’s comment “This is an extremely difficult manuscript to read” appears overly harsh and conflicts with the positive evaluation of Reviewer #2 and Reviewer #3. Finally, we respectfully disagree with the reviewer’s statement that experiments characterizing L. pneumophila effectors implicated in the formation and function of LCV-ER MCS are peripheral. These experiments significantly contribute to a mechanistic understanding of how L. pneumophila forms and exploits LCV-ER MCS, and they are central for studies on pathogen-host interactions. The studies are analogous to the work on Chlamydia effectors by the Engel and Derré laboratories, but the mode of action of Legionella and Chlamydia effectors is obviously different. Another important distinction of our work to the studies on Chlamydia is the use of the genetically tractable amoeba, D. discoideum, which allows an analysis of LCV-ER MCS by fluorescence microscopy at high spatial resolution.

      Specific comments

      1. The problems start with the first figure, in which the authors state that almost half the D. discoideum proteome is LCV-associated. I doubt that this is correct, and they should base this on some selective criterion. Furthermore in Fig. 1A, they show Venn diagrams for how they whittled this down, but the Supplemental Dataset gives us no clue on how this was done. I can only sit down myself with the dataset and try to figure that out, but that is an unreasonable expectation for the reader. The dataset provided should have a series of sheets, describing how the large protein set was whittled down and how they were sorted, so the reader can evaluate how robust the final results were. To me (at least), if they said: "look we got this surprising result that suggests MCS are involved in promoting LCV formation, and although this is well recognized in Chlamydia but poorly recognized in Legionella", that would be satisfactory to me.

      Response: According to the reviewer’s suggestions, we have now thoroughly re-structured the manuscript. In the revised manuscript the story unfolds from the observation that the ER tightly associates with LCVs in infected cells and with isolated LCVs. The proteomics approach is now used as a validation of the presence of MCS proteins at the LCV-ER MCS and relegated to the Supplementary Information section (former Fig. 1, now Fig. S3).

      For the proteomics analysis, all protein identifications have been filtered for robustness by applying a constant false discovery rate (FDR) of 0.01 at both the protein and the peptide spectrum match (PSM) level, which is a commonly accepted threshold in the field. Moreover, two identified unique peptides were required for protein identification. The parallel application of both filter criteria results in very robust and reliable data sets. This is outlined in the Material and Methods section (l. 683-693).
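      The two filter criteria named above can be sketched as a simple predicate. This is only an illustration: the record type `ProteinHit`, its field names and the example accessions are hypothetical, the per-protein q-value is an assumed stand-in for the protein-level FDR estimate, and PSM-level filtering is assumed to have happened upstream. It is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProteinHit:
    """Hypothetical record for one identified protein."""
    accession: str
    protein_q_value: float   # assumed protein-level FDR estimate (q-value)
    unique_peptides: int     # number of unique peptides supporting the hit


def passes_filters(hit: ProteinHit,
                   max_fdr: float = 0.01,
                   min_unique_peptides: int = 2) -> bool:
    """Keep a hit only if it meets BOTH criteria at the same time."""
    return (hit.protein_q_value <= max_fdr
            and hit.unique_peptides >= min_unique_peptides)


def filter_hits(hits: Iterable[ProteinHit]) -> List[ProteinHit]:
    """Return the hits that survive the combined filter."""
    return [h for h in hits if passes_filters(h)]


# Made-up example records, for illustration only.
example_hits = [
    ProteinHit("P_A", 0.001, 5),   # passes both criteria
    ProteinHit("P_B", 0.050, 7),   # fails the FDR threshold
    ProteinHit("P_C", 0.005, 1),   # fails the unique-peptide requirement
]
kept = [h.accession for h in filter_hits(example_hits)]
print(kept)
```

      Requiring both criteria together is stricter than either one alone, which is the point made in the response.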

      In the data set of LCV-associated proteins, 2,434 D. discoideum proteins have been identified (Table S1). This is 18.5% of the total of 13,126 predicted D. discoideum proteins (UniprotKB) and considerably less than “almost half the D. discoideum proteome”, as stated by the reviewer. Moreover, 1,224 L. pneumophila proteins have been identified (among 3,024 predicted L. pneumophila proteins in the database). This is a reasonable number of proteins identified from an intracellular vacuolar pathogen, given the LCV isolation and proteomics methods applied. We now outline these findings more extensively in the Results section (l. 207-213). Moreover, to render Table S1 more reader-friendly, we added to the datasheet “All data” the datasheets “Dictyostelium”, “Legionella” and “Info”.
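      The proportions quoted above follow directly from the stated counts. A minimal check, using only the numbers given in the response (the helper name `percent_identified` is illustrative):

```python
def percent_identified(identified: int, predicted: int) -> float:
    """Share of the predicted proteome that was identified, in percent."""
    return 100.0 * identified / predicted


# Counts as stated in the response.
dicty_pct = percent_identified(2434, 13126)       # D. discoideum
legionella_pct = percent_identified(1224, 3024)   # L. pneumophila

print(f"D. discoideum: {dicty_pct:.1f}%")         # prints "D. discoideum: 18.5%"
print(f"L. pneumophila: {legionella_pct:.1f}%")   # prints "L. pneumophila: 40.5%"
```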

      The Venn diagram in Fig. S3A (previously Fig. 1A) does not show a subset of proteins “whittled down” from the entire proteomes, but simply summarizes LCV-associated proteins, which were either identified exclusively in the parental strain Ax3 but not in the Δsey1 mutant strain, or only in Δsey1 but not in Ax3, thus identifying possible candidates relevant for the LCV-ER MCS. This information is now outlined more clearly in the text (l. 238-241). Moreover, we now explicitly define in the Material and Methods section (l. 697-704) the “on” and “off” proteins shown in Fig. S3A.

      The overall rationale for the comparative proteomics approach was our previous finding that compared to the D. discoideum parental strain Ax3, the Δsey1 mutant strain accumulates less ER around LCVs (PMID: 28835546, 33583106). This finding suggests that formation of the LCV-ER MCS might be compromised in the Δsey1 mutant strain. This hypothesis is now outlined at the beginning of the Results paragraph (l. 204-207).

      I am clueless regarding how Fig. 6 fits with the rest of the manuscript. If this is about MCS, there is no demonstration these effectors are directly involved in MCS other than the somewhat diffuse argument that there is some correlative connection to PI4P levels, that I am not particularly convinced by.

      Response: The PtdIns(4)P gradient between two different cellular membranes is an intrinsic feature of MCS. To date, a quantification of PtdIns(4)P levels on LCVs in response to the presence or absence of specific L. pneumophila effectors is lacking. Accordingly, we opted for quantifying the PtdIns(4)P levels on LCVs in the presence and absence of an L. pneumophila effector putatively generating PtdIns(4)P on LCVs, the phosphoinositide 4-kinase LepB, or titrating PtdIns(4)P on LCVs, the PtdIns(4)P-binding ubiquitin ligase SidC. To address the concerns of Reviewer 1 and Reviewer 3 (see below), we now outline in detail the rationale to assess the role of LepB and SidC for MCS function (l. 385-387). Importantly, we now also provide data that at LCV-ER MCS PtdIns(4)P/cholesterol lipid exchange is functionally important (new Fig. 6 and Fig. S10). In the revised version of the manuscript, these new data precede the experiments with the L. pneumophila effectors, which should render our choice of effectors more comprehensible to the reader and improve the flow of the manuscript.

      Line 146 and associated paragraph. We don't need a catalog of proteins in narrative. There is more detail in the narrative than there is in the tables and figures, which would be a more appropriate way to present the data.

      Response: As suggested by the reviewer, we summarized the LCV-associated D. discoideum proteins and considerably reduced the list in the text (l. 214-230).

      Line 186. There is nothing wrong with pursuing MCS based on the idea that this was seen before with Chlamydia and you wanted to test if this was a previously unappreciated aspect of Legionella biology. I don't see the rationale based on the proteomics, partly because I don't understand how the proteomics dataset was parsed.

      Response: As suggested by the reviewer, we thoroughly re-structured the manuscript and now highlight the seminal work on Chlamydia by the Engel and Derré laboratories already in the Introduction section (not in the Discussion section as in the original version of the manuscript). We believe that it makes a stronger case to start out an analysis of LCV-ER MCS with a Legionella-specific cell biological finding (LCV-ER association) and an unbiased proteomics approach, as compared to a more derivative and defensive approach starting out with what is known about Chlamydia.

      Figure 3: These growth curves are super-weird. I am not used to looking at 8 days of logarithmic growth in a linear scale and seeing no (apparent) growth for 4 days. Considering all the microscopy data are performed in the first 18 hrs of infection, it’s hard to see how this is related to data at 8 days post infection. If this were plotted in logarithmic scale, as microbiologists are used to doing, then perhaps we could see a connection. Also, in some cases, it might be helpful to calculate a growth rate, because it’s possible the author may now see some effects by comparing logarithmic growth rates.

      Response: We have been characterizing growth of L. pneumophila in D. discoideum in several studies using growth curves with RFU vs. time plotted in linear scale (e.g., Finsel et al., 2013, Cell Host Microbe 14:38; Rothmeier et al., 2013, PloS Pathog 9: e1003598; Swart et al., 2020, mBio 11: e00405-20). The D. discoideum-L. pneumophila infection model is peculiar, since the amoebae do not survive temperatures beyond 26 °C. This is substantially below the optimal growth temperature of L. pneumophila (35-40 °C). This means that, owing to the many genetic tools available, D. discoideum is an excellent model to investigate cell biological aspects of the infection at early time points (ca. 1-18 h p.i.), but the amoebae are not an optimal system to quantify (several rounds of) intracellular growth.
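      For readers unfamiliar with the reviewer's suggestion, a logarithmic growth rate can be estimated as the least-squares slope of ln(RFU) against time. The sketch below is illustrative only: the function names are hypothetical, the RFU values are synthetic (an exact doubling every 24 h) and are not data from the study, and background subtraction is assumed to have been done already.

```python
import math
from typing import Sequence


def log_growth_rate(times_h: Sequence[float], rfu: Sequence[float]) -> float:
    """Least-squares slope of ln(RFU) versus time, in units of 1/h."""
    if len(times_h) != len(rfu) or len(times_h) < 2:
        raise ValueError("need at least two paired time/RFU values")
    logs = [math.log(v) for v in rfu]          # RFU values must be > 0
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def doubling_time_h(rate_per_h: float) -> float:
    """Doubling time implied by an exponential growth rate."""
    return math.log(2) / rate_per_h


# Synthetic example: signal doubles every 24 h (not measured data).
times = [0.0, 24.0, 48.0, 72.0]
signal = [100.0 * 2 ** (t / 24.0) for t in times]

rate = log_growth_rate(times, signal)
print(f"rate = {rate:.4f} per h, doubling time = {doubling_time_h(rate):.1f} h")
```

      Such a slope is only meaningful over the window in which growth is actually exponential, so the time points entering the fit have to be chosen accordingly.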

      Figure 2: The images don't necessarily show what the bar graphs show. In particular, look at OSBP8. That image doesn't make sense to me.

      Response: The individual channels of the merged images in Fig. 1 (formerly Fig. 2) are shown in Fig. S2. By looking at the individual channels, it becomes clear that OSBP8-GFP co-localizes with calnexin-mCherry (overlapping signals), but not with P4C-mCherry or AmtA-mCherry (adjacent signals). Co-localization was quantified in a non-biased manner by Pearson’s correlation coefficient. To further visualize co-localization, we now also provide fluorescence intensity profiles for all confocal micrographs (amended Fig. 1).
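      As a reminder of what the Pearson's correlation coefficient measures in this setting, the sketch below computes it for paired pixel intensities from two channels. The intensity values are invented for illustration (one overlapping pair of signals, one adjacent pair) and the function name is hypothetical; this does not reproduce the image-analysis software used in the study.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equally long intensity lists."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_a * var_b)


# Invented intensities along a line profile (not measured data).
green = [10, 50, 200, 180, 40]
red_overlapping = [12, 55, 190, 170, 35]      # peaks at the same pixels
r_overlap = pearson(green, red_overlapping)

green_adjacent = [0, 0, 200, 200, 0, 0]
red_adjacent = [200, 200, 0, 0, 0, 0]         # peak next to, not on, the green peak
r_adjacent = pearson(green_adjacent, red_adjacent)

print(f"overlapping: r = {r_overlap:.3f}; adjacent: r = {r_adjacent:.3f}")
```

      Overlapping signals give a coefficient close to 1, whereas adjacent signals give a low or negative value, which is the distinction drawn in the response.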

      In summary, I think the authors hit on something that is probably important for Legionella biology, but it’s not clear what they want to show. They are very invested in connecting everything to PI4P levels, which may or may not be correct, but it seems to me that perhaps taking more care in showing the importance of the Vap/OSBP nexus in supporting Legionella growth should be the first priority.

      Response: Given the importance of the PtdIns(4)P gradient for lipid exchange at MCS, we believe it is justified to put considerable emphasis on this lipid. To further substantiate a functional role of PtdIns(4)P at LCV-ER MCS, we now also show that an increase in PtdIns(4)P at the LCV correlates with a decrease of cholesterol (new Fig. 6 and Fig. S10). The inverse correlation of these two lipids is in agreement with the notion that cholesterol is a counter lipid of PtdIns(4)P at LCV-ER MCS.

      It is not clear from the manuscript if the authors are just cataloging things or trying to test a hypothesis.

      Response: In the revised version of the manuscript, we put forward several specific hypotheses, which we then tested in our study (l. 152-155).

      If I understand Fig. 1, only one of the candidates (VapA) was verified as being more enriched in WT relative to atlastin mutants. This argues even more strongly that the authors have to describe their criteria for choosing these candidates.

      Response: As outlined above (specific point 1), we have now re-structured the manuscript according to the reviewer’s suggestions. In the revised manuscript the story unfolds from the observation that the ER tightly associates with LCVs in infected cells and with isolated LCVs. The proteomics approach is now used as a validation of the presence of MCS proteins at the LCV-ER MCS and relegated to the Supplementary Information section (formerly Fig. 1, now Fig. S3). We consider the proteomics approach a powerful hypothesis generator, and the experimental identification of several MCS proteins by proteomics validated the cell biological and bioinformatics insights.

      Reviewer #1 (Significance (Required)):

      As stated above, the manuscript can't decide if it’s about MCS or PI4P, and I would argue strongly that the emphasis on PI4P detracts from the manuscript, as well as its inability to draw connection to previous work that is likely to be important.

      Response: We respectfully disagree with the reviewer on this important point and hold that proteins as well as lipids are crucial functional determinants of MCS. The PtdIns(4)P gradient is a pivotal process for lipid exchange at MCS. Therefore, we believe it is justified to put considerable emphasis on this lipid. In the Introduction section, we now specify several hypotheses on the localization and function of lipids and proteins at LCV-ER MCS (l. 152-155). Moreover, we now also refer to the previous work on Chlamydia MCS in the Introduction section (l. 142-148).

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary of paper and major findings

      Membrane contact sites (MCS) are locations where two membranes are in close proximity (10-80 nm). MCS have a defined protein composition which tethers the membranes together and functions in small molecule and lipid exchange. Typically, MCS contain structural (e.g., tethers) and functional (e.g., lipid exchange) proteins, in addition to proteins which regulate the structure and function of the MCS. In this manuscript, Vormittag et al describe protein components of MCS between the Legionella-containing vacuole (LCV) and the host endoplasmic reticulum (ER) in the amoeba Dictyostelium. Proteomics of isolated LCVs followed by microscopy analysis identified several proteins which localize to either the LCV-associated ER (OSBP8), the LCV (OSBP11), or both (VAP and Sac1). The mammalian homologs of these proteins have been shown to play important roles in ER MCS, with VAP serving a structural role, Sac1 a PI(4)P phosphatase regulating PI(4)P levels, and OSBP8 and OSBP11 lipid transferring proteins. Given the importance of PI(4)P in formation and maintenance of the Legionella-containing vacuole, the authors used dicty mutants to determine the importance of these proteins in bacterial growth, LCV size, and PI(4)P levels on the LCV. While VAP and OSBP11 appear to promote Legionella infection, OSBP8 appears to restrict infection, although all identified MCS components appear to play a role in decreasing PI(4)P shortly after infection. Finally, VAP and OSBP8 localization to the LCV is PI(4)P-dependent. Overall, the authors conclude that these MCS components play a role in modulating PI(4)P levels on the LCV.

      Overall, this is an interesting study further exploring the role of PI(4)P in LCV-ER interactions, and how PI(4)P levels are regulated. The figures are clearly presented, there is an impressive amount of data, and rigor appears to be strong with appropriate replicates and statistical analysis. The phenotypes are often mild, but the authors are careful to not overinterpret the data. While this is an interesting study, additional experiments are necessary to support the overall model and the text needs to put the findings into the larger context.

      Response: We would like to thank the reviewer for this positive and constructive assessment. We performed and planned additional experiments to further strengthen the study and support our model.

      Major comments

      1) MCS contain protein complexes or a group of proteins, but the proteins here are studied in isolation and do not support the model shown in Figure 7. Co-localization studies of the putative LCV-ER MCS proteins are critical, especially given that the authors hypothesize the proteins are working together to modulate PI(4)P levels.

      Response: To further explore the possible interactions between Vap and OSBP proteins, we plan co-localization experiments using D. discoideum strains producing mCherry-Vap and either OSBP8-GFP or GFP-OSBP11, as outlined above (Section 2, new Fig. 2 and Fig. S4).

      Moreover, we included additional data on PtdIns(4)P/cholesterol lipid exchange (Fig. 6 and Fig. S10), which have been incorporated into the model (amended Fig. 8). Based on the available data, we do not postulate direct interactions between Vap and OSBP proteins. The previous model, which now has been amended, might have been misleading in that respect.

      2) The phenotypes are relatively mild, suggesting functional redundancy. Double knockouts, particularly in VAP and OSBP11, may generate a stronger phenotype that better supports the hypothesis and demonstrate the importance during infection.

      Response: Thank you for this interesting suggestion. Please see Section 4 below for our arguments as to why we believe that this intriguing approach is beyond the scope of the current study.

      3) The timing of PI(4)P and MCS protein localization during infection is critical to understanding how MCS might be functioning. Based on Figure 6C, PI(4)P levels decrease on the LCV during infection, but this is not fully explained in the context of what's known in the literature and what is observed in the previous figures. How does localization of different MCS components change during infection, and does this correlate with the changes in growth or LCV size? A better description in the Introduction on LCV-associated PI(4)P levels would be beneficial in orienting the reader to why PI(4)P levels are modulated.

      Response: As suggested by the reviewer, we added to the Introduction section more detail about the kinetics of PtdIns(4)P accumulation on LCVs (l. 65-71), and we discuss the limited spatial resolution of the IFC approach (formerly Fig. 6C, now Fig. 7C; l. 407-408). Importantly, we also provide new data showing that within 2 h p.i. an increase in PtdIns(4)P at the LCV coincides with a decrease of cholesterol (new Fig. 6 and Fig. S10). The new data is put into this context in the Discussion section (l. 449-454).

      4) OSW-1 has other targets besides OSBPs, and depleting Sac1 and Arf1 in A549 cells is not specifically targeting the MCS, as these proteins have other functions. The data in mammalian cells is not convincing and should be removed.

      Response: As suggested by the reviewer, we removed the data on depleting Sac1 in A549 cells (Fig. 3D, and Fig. S6BC). We propose to leave the pharmacological data on inhibition of L. pneumophila replication by OSW-1 in the manuscript, but to clearly point out that OSW-1 has other targets besides OSBPs (l. 297-299).

      Minor comments

      1) Figure 2 is missing details on number of experiments/replicates and statistical analysis.

      Response: Thank you for having noted this oversight. The number of independent experiments and statistical analysis have now been added to Fig. 1 (formerly Fig. 2) (l. 1009-1010).

      2) Can the authors hypothesize why VAP promotes growth early during infection, but appears to restrict growth at later timepoints (Figure 3A)?

      Response: Thank you for raising this intriguing point. The opposite effects of Vap on growth at early and later timepoints during infection might be explained by interactions with antagonistic OSBPs. Vap likely co-localizes with OSBP8 as well as with OSBP11 on the limiting LCV membrane or the ER, respectively (experiment to be performed; Fig. 2 and Fig. S4). The absence of OSBP8 (ΔosbH) or OSBP11 (ΔosbK) causes larger or smaller LCVs, and increased or reduced intracellular replication of L. pneumophila, respectively. Thus, OSBP8 seems to restrict and OSBP11 seems to promote intracellular replication. Accordingly, if Vap affects or interacts with OSBP11 early and with OSBP8 later during infection, the opposite effects of Vap on growth might be explained. These reflections are now outlined in the Discussion section (l. 431-441).

      3) There is a large amount of data, which makes it difficult at times to follow. I suggest adding additional information to table 1, including LCV size and whether or not the protein's localization is PI(4)P-dependent.

      Response: Thank you for this suggestion. As proposed by the reviewer, we added the additional information to Table 1 (PtdIns(4)P-dependency of protein localization, LCV size).

      Reviewer #2 (Significance (Required)):

      Membrane contact sites during bacterial infection are a growing area of research. In Legionella, several papers point to the presence of MCS. Further, PI(4)P is known to be an important component on the LCV. This paper shows that MCS protein members are important in modulating LCV PI(4)P levels. The model as presented is not completely supported by the data as co-localization experiments are needed, along with more detailed analysis of how PI(4)P levels change over infection and the role of these MCS proteins in that process. This study will be of interest to those studying Legionella and other vacuolar pathogens. Area of expertise is on membrane contact sites and lipid biology.

      Response: Thank you very much for the overall positive and constructive evaluation.

      Reviewer #3 (Evidence, reproducibility and clarity):

      The authors perform proteomic analysis of Legionella-containing vacuoles. They observe association of membrane contact site (MCS) proteins including VAP, OSBPs, and Sac1. Functional data indicate that these proteins contribute to PI4P levels on LCVs and their ability to acquire lipid from the ER to enable LCV expansion/stability. Overall, the paper is an important contribution to the field and builds upon a growing appreciation for MCS in establishment of intracellular niches by microbial pathogens. I have only minor comments for the authors' consideration.

      Response: We would like to thank the reviewer for this enthusiastic assessment.

      Minor comments:

      • line 145, "This approach revealed 3658 host or bacterial proteins identified on LCVs...". This number seems high... how does it compare to prior proteomic studies of pathogen-containing vacuoles?

      Response: As outlined above (reviewer 1, point 1), we have now changed the text (l. 207-213): “This approach revealed 2,434 LCV-associated D. discoideum proteins (Table S1), of a total of 13,126 predicted D. discoideum proteins (UniprotKB). Moreover, 1,224 L. pneumophila proteins were identified (among 3,024 predicted L. pneumophila proteins), which is a reasonable number of proteins identified from an intracellular bacterial pathogen within its vacuole with the proteomics methods applied (Herweg et al, 2015; Schmölders et al., 2017).”
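
The counts quoted in this response can be cross-checked with a few lines of arithmetic. The sketch below uses only the four numbers given in the quoted text; the variable names are illustrative and nothing here comes from the manuscript's own analysis.

```python
# Protein counts as quoted in the response above.
dicty_identified = 2434   # LCV-associated D. discoideum proteins
dicty_predicted = 13126   # predicted D. discoideum proteins (UniprotKB)
lp_identified = 1224      # L. pneumophila proteins identified
lp_predicted = 3024       # predicted L. pneumophila proteins

# Sum of both species, and the share of each predicted proteome recovered.
total_identified = dicty_identified + lp_identified
dicty_fraction = dicty_identified / dicty_predicted
lp_fraction = lp_identified / lp_predicted

print(total_identified)         # 3658, the number the reviewer queried
print(f"{dicty_fraction:.1%}")  # 18.5%
print(f"{lp_fraction:.1%}")     # 40.5%
```

On these numbers the 3,658 total is the sum of host and bacterial proteins, and the host share is roughly a fifth of the predicted D. discoideum proteome.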

      • line 160. Can the authors comment on why mitochondrial proteins are observed in their proteomic analysis? Are these non-specific background signals or reflecting relevant organelle contact?

      Response: The dynamics of mitochondrial interactions with LCVs and the effects of L. pneumophila infection on mitochondrial functions have been thoroughly analyzed (PMID: 28867389). This seminal work is now cited in the text (l. 227-230).

      • line 268. It is reported that LCVs are smaller with MCS disruption at 2 and 8 h p.i. Does this also lead to instability or rupture of LCVs? And related to this, why would LCVs be bigger at 16 h with MCS disruption?

      Response: MCS components affect LCV size positively or negatively. E.g., the absence of OSBP8 (ΔosbH) or OSBP11 (ΔosbK) causes larger or smaller LCVs, and increased or reduced intracellular replication of L. pneumophila, respectively. However, as outlined in the Discussion section (l. 442-454), we believe that the relatively small size likely reflects a structural remodeling of the pathogen vacuole rather than a substantial LCV expansion. LCV rupture takes place only very late in the infection cycle (beyond 48 h) and is followed by lysis of the host amoeba (PMID: 34314090).

      • lines 288 and 299 "data not shown" this data should be included in a supplemental figure.

      Response: The data on the localization of GFP-Sac1 and GFP-Sac1_ΔTMD are included in Figs. 1A, 4A, 5AD, S2A, S7A, and S9 (l. 328, l. 339).

      • line 327. The authors choose to focus on the role of LepB and SidC in MCS modulation. The rationale for choosing these two amongst the ca 330 effectors was not given. Were other effectors also examined?

      Response: LepB and SidC were chosen due to their activities producing or titrating PtdIns(4)P, respectively, and their LCV localization. This rationale is now given in the text (l. 385-387). No other effectors were examined up to this point.

      Reviewer #3 (Significance (Required)):

      Comprehensive LCV proteomics of interest to field of cellular microbiology. Studies of MCS broadly relevant to cell biologists.

      Response: Thank you very much for the overall very positive evaluation.

      4. Description of analyses that authors prefer not to carry out

      Reviewer #2

      Major comment

      2) The phenotypes are relatively mild, suggesting functional redundancy. Double knockouts, particularly in VAP and OSBP11, may generate a stronger phenotype that better supports the hypothesis and demonstrate the importance during infection.

      Response: Thank you for raising the important question of functional redundancy. We now outline this concept in the Discussion section (l. 427-429). A further analysis of the genetic and biochemical relationship between Vap and OSBP11 or OSBP8 is without doubt one of the most interesting aspects of further studies on the topic of LCV-ER MCS.

      The construction of a D. discoideum double mutant strain is time consuming and usually takes 1-2 months. Provided that a Vap/OSBP11 double deletion mutant strain is viable and can be generated, it takes another 1-2 months to thoroughly characterize the strain regarding intracellular replication of L. pneumophila (Fig. 3), LCV size (Fig. 4), and PtdIns(4)P score (Fig. 5). Moreover, there is already a large amount of data in the paper (to quote Reviewer #2), and therefore, adding new data might make it even harder to follow the story and focus on the key points. Finally, we believe that the planned colocalization experiments (Reviewer #2, point 1) and the new data on lipid exchange kinetics (new Fig. 6 and Fig. S10) fit the current story more coherently, and thus are more straightforward and informative than the generation and characterization of double mutant strains. For these reasons, we believe that the generation and characterization of D. discoideum double mutant strains is beyond the scope of the current study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In the manuscript by Vormittag, et al., the authors perform proteomics identification of proteins associated with the Legionella-containing vacuole (LCV) in the model amoeba Dictyostelium discoideum comparing WT to atlastin knockout mutants. The authors find approximately half the D. discoideum proteome associated with the LCV, but there was enrichment of some proteins on the WT relative to the mutant. They focus on proteins involved in forming membrane contact sites (MCS) that previously were shown to be important for expansion of the Chlamydia-containing vacuole. Most significant are the oxysterol binding proteins (OSBP) and VapA (similar to that seen in Chlamydia). The authors show differential association of these proteins with either the LCV or presumably the ER associated with the LCV. Using a linear scale over 8 days, they show that mutations in some of the MCS reduce yields in two of the OSBP knockout mutants and the growth rate of the vap mutant is slowed but ultimate yield is increased. Using some nice microscopy techniques, they measure LCV size, and the osbK mutant appears particularly small relative to other strains, whereas the osbH mutant generates large vacuoles. This doesn't necessarily correlate with the PI4P quantities on the vacuoles (which is higher in all of them), but I am not totally sure how this is measured, and whether it is PI4P/pixel or PI4P/LCV. In all cases, this was reduced by Sac1 mutation. Surprisingly, even though there was uniform increase in PI4P in each of the mutants, loss of PI4P only affects localization of some of the proteins. Finally, in what seems to be a peripherally related experiment, the authors show that a pair of Legionella translocated effectors are required to maintain PI4P levels, although it is not clear how this is related to the other data in the manuscript.

      It is not clear from the manuscript if the authors are just cataloging things or trying to test a hypothesis. This is an extremely difficult manuscript to read and reconstruct what the authors showed. I really think that the only people who will understand what is written are people who are familiar with the work in Chlamydia starting in 2011 in Engel's and Derre's laboratories, which clearly showed that MCS and most specifically Vap/OSBPs are involved in vacuole expansion. If the authors could rewrite the manuscript along these lines, perhaps comparing their data to the Chlamydia data, it would help a lot. Otherwise, I don't think anyone else will understand why they are focusing on these things. I don't recommend new experiments (although re-analyzing data is necessary), but the manuscript has to be taken apart and claims removed, and data be interpreted properly. Otherwise, the manuscript seems like just a clearing house for data.

      1. The problems start with the first figure, in which the authors state that almost half the D. discoideum proteome is LCV-associated. I doubt that this is correct, and they should base this on some selective criterion. Furthermore in Fig. 1A, they show Venn diagrams for how they whittled this down, but the Supplemental Dataset gives us no clue on how this was done. I can only sit down myself with the dataset and try to figure that out, but that is an unreasonable expectation for the reader. The dataset provided should have a series of sheets, describing how the large protein set was whittled down and how they were sorted, so the reader can evaluate how robust the final results were. To me (at least), if they said: "look we got this surprising result that suggests MCS are involved in promoting LCV formation, and although this is well recognized in Chlamydia but poorly recognized in Legionella", that would be satisfactory to me.
      2. I am clueless regarding how Fig. 6 fits with the rest of the manuscript. If this is about MCS, there is no demonstration these effectors are directly involved in MCS other than the somewhat diffuse argument that there is some correlative connection to PI4P levels, that I am not particularly convinced by.
      3. Line 146 and associated paragraph. We don't need a catalog of proteins in narrative. There is more detail in the narrative than there is in the tables and figures, which would be a more appropriate way to present the data.
      4. Line 186. There is nothing wrong with pursuing MCS based on the idea that this was seen before with Chlamydia and you wanted to test if this was a previously unappreciated aspect of Legionella biology. I don't see the rationale based on the proteomics, partly because I don't understand how the proteomics dataset was parsed.
      5. Figure 3: These growth curves are super-weird. I am not used to looking at 8 days of logarithmic growth in a linear scale, and seeing no (apparent) growth for 4 days. Considering all the microscopy data are performed in the first 18 hrs of infection, it's hard to see how this is related to data at 8 days post infection. If this were plotted in logarithmic scale, as microbiologists are used to doing, then perhaps we could see a connection. Also, in some cases, it might be helpful to calculate a growth rate, because it's possible the authors may now see some effects by comparing logarithmic growth rates.
      6. Figure 2: The images don't necessarily show what the bar graphs show. In particular, look at OSBP8. That image doesn't make sense to me.
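
The growth-rate comparison raised in point 5 can be sketched as follows. This is a minimal illustration, assuming exponential growth N(t) = N0·exp(μt), so that μ is the slope of ln N against time; the function name and the readings are hypothetical and are not data from the manuscript.

```python
import math

def log_growth_rate(times_h, counts):
    """Least-squares slope of ln(count) against time, in 1/h."""
    logs = [math.log(c) for c in counts]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den

# Hypothetical readings (eight-fold increase every 24 h), for illustration only.
times_h = [0, 24, 48, 72]
counts = [1e3, 8e3, 6.4e4, 5.12e5]

mu = log_growth_rate(times_h, counts)
doubling_time_h = math.log(2) / mu
print(f"growth rate: {mu:.4f} per h, doubling time: {doubling_time_h:.1f} h")
```

Fitting on the log scale gives one rate per strain that can be compared directly, which a linear-scale plot of the same curves tends to hide during the early phase.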

      In summary, I think the authors hit on something that is probably important for Legionella biology, but it's not clear what they want to show. They are very invested in connecting everything to PI4P levels, which may or may not be correct, but it seems to me that perhaps taking more care in showing the importance of the Vap/OSBP nexus in supporting Legionella growth should be the first priority.

      If I understand Fig. 1, only one of the candidates (VapA) was verified as being more enriched in WT relative to atlastin mutants. This argues even more strongly that the authors have to describe their criteria for choosing these candidates.

      Significance

      As stated above, the manuscript can't decide if it's about MCS or PI4P, and I would argue strongly that the emphasis on PI4P detracts from the manuscript, as does its inability to draw connections to previous work that is likely to be important.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This paper examines the formation and repair of micronuclei in non-cancerous cells, specifically in mouse embryonic fibroblasts. This work was performed completely in culture and used a combination of western blot, confocal and superresolution microscopy to assess the contents of micronuclei over a repair period of 5 hours after 2 hours of induction of double strand breaks by treatment with etoposide. The authors found that the bodies colocalised with LC3, Beclin 1 and lysosomes suggestive of autophagy. However no evidence of autophagic flux has been demonstrated.

      Major issues are as follows:

      Figure 2

      A - Any sense of the autophagic flux? LC3B - I and LC3B - II seem to be in equal quantities most of the time. Maybe using the tandem LC3 in this system could provide further insight. Also remove the violin plots from this graph and from G and H, as there are too few data points.

      Thank you for your comment. We have evidence of a functional autophagic flux, since we observed an increasing number of acidic vesicles stained with Lysotracker in response to DNA damage, which were reduced after DNA repair. Some of the micronuclei were also co-stained with Lysotracker, suggesting their lysosomal degradation. We reorganized the data in the revised Figure 2A to better communicate these observations. We reproduce here the dynamics of the Lysotracker stain; please notice the increase in the abundance of acidic vesicles after 2h of DNA damage. Further evidence of the activation of functional autophagy is the dynamic intracellular distribution of both LC3 and BECN1, indicative of autophagy induction. Please notice in the revised Figure 2A that LC3 surrounding vesicles increases after 2h of DNA damage and diminishes when DNA is repaired. BECN1 in control MEFs is highly concentrated inside the nucleus, predominantly at the nucleolus, and after DNA damage it redistributes towards the cytoplasm. Finally, after DNA repair, BECN1 appears highly concentrated at the nucleus again. These dynamic changes correlate with autophagosome formation and successful fusion with lysosomes. In the revised manuscript we removed the violin plot as suggested. Since the elimination of nuclear components occurs in a subset of cells, the role of the autophagic machinery needs to be analyzed cell by cell. We also considered it better to remove the Western blot, as an analysis of the whole population does not provide information relevant to this study.

      • Can you reduce the brightness in the merge image, as I cannot see DAPI nor a convincing Beclin-1/LC3 co-localisation.

      Thank you for the observation. We improved the quality of the images and reorganized Figure 2 to convincingly show BECN1 and LC3 co-localization, together with Lysotracker, in nuclear alterations (buds and micronuclei). We modified the results text accordingly.

      • Although the data is convincing, it would be clearer if the brightness of the merge image was reduced.

      Thank you for your comment. We improved the images shown; these data are now integrated in the new Figure 2A.

      • Is the significant result the difference between 5h R Control si and 5h R Atg7? If so, there is no significant change in micronuclei at the same time point. Can you explain this disconnect? Are the buds being degraded prior to becoming micronuclei?

      That is correct, we found no statistically significant difference in the number of micronuclei formed when silencing Atg7, although there was a trend towards a reduction. To consolidate the role of autophagy in the formation of nuclear buds and micronuclei, we studied Atg4-/- MEFs. We confirmed a statistically significant reduction of bud formation when autophagy is impaired (new Figure 2G). However, we observed that the number of micronuclei increased after 2h of DNA damage in Atg4-/- MEFs, suggesting that autophagy does not contribute to micronuclei formation but to their elimination. Together, our results suggest that the origins of buds and micronuclei are mechanistically different. A difference in the biogenesis of buds and micronuclei has been previously suggested by studies of cells cultured under strong stress conditions that induce DNA amplification, as well as of cells under folic acid deficiency. While interstitial DNA without telomere was more prevalent in buds than in micronuclei, telomeric DNA was more frequently observed in micronuclei (Fenech et al. 2011, Mutagenesis 26:125-132). We agree with the reviewer that not all the buds seem to become micronuclei.

      Figure 3 A - nice microscopy showing the co-localisation of TOP2A and LC3-GFP. I'm interested in DAPI being on some bodies and not others. Do you have any sense of the dynamics of this?

      Thank you for the interesting question. Since removal of nuclear alterations such as nuclear buds and micronuclei is a very dynamic process, the damaged nuclear material we detect in the cytoplasm is at different degradation stages. Nucleases could be degrading DNA in micronuclei. Another possible explanation for the lack of DAPI signal in some micronuclei containing TOP2A and GFP-LC3 is that TOP2A could be expelled from the nucleus with undetectable fragments of DNA or even without DNA, as a renewal process. We believe that nuclear buds can form without extruding DNA in some cases, perhaps to modulate proteostasis in addition to protecting genome stability. In the revised manuscript we discuss this possibility further.

      G - c shows a strand of mostly TOP2B coming from the nucleus. Is there any evidence that this occurs using either confocal microscopy or super resolution approaches. Could you try Z-stack to find these?

      Thank you for the suggestion. We analyzed Z-stack images and tried to observe it also by immunofluorescence. We could detect some tubular signal connecting the nucleolus with a micronucleus containing TOP2B and BECN1 (arrowhead in Fig 3B reproduced below), although we cannot be certain that we are detecting the same nuclear extrusion mechanism by electron microscopy as by immunofluorescence.

      Figure 4 C - is there a significant increase in FBL negative bodies? This would make sense if FBL is being degraded in the micronuclei during the repair process.

      We found that the number of micronuclei without FBL increased with a statistically significant difference by two-way ANOVA followed by Dunnett's multiple comparison test (P=0.463 comparing cells with 2h of DNA damage with control cells; P=0.0017 comparing cells after 5h of DNA repair with untreated cells; n=5). We agree with the reviewer that a possible explanation is that FBL is being degraded in micronuclei during the repair process, although it could also be possible that the nucleolus is less sensitive to Etoposide poisoning, or that nucleolar DDR is mechanistically different.

      • Would it be possible to increase the n of these experiments to confirm either no change in FBL/LC3 co-loc, or evidence of increase?

      Thank you for the suggestion. We repeated the experiment two more times to increase the n to 5. We found no statistical difference in the number of nuclear buds or micronuclei containing both FBL and LC3 during DNA damage and repair. Therefore it seems that the release of nucleolar components is not enhanced by Etoposide-induced DSB, suggesting that nucleolar DDR is a unique response, independent of DDR elsewhere in the genome (reviewed in Nucleic Acids Research, 2020, Vol. 48, No. 17 9449–9461 doi: 10.1093/nar/gkaa713).

      Minor issues:

      Figure 4 and 5 legends are in a different font.

      Thank you. We corrected the font in the current manuscript.

      Reviewer #1 (Significance (Required)):

      There is little specific data on the role of autophagy in clearing micronuclei in cancer cells, so this may be suggestive of a new mechanism that occurs during normal cellular homeostasis. There are known links between lamin A defects and the formation of micronuclei, but not explicitly that the micronuclei are also Lamin A positive. It is likely that analogous processes occur in both cancer and non-cancer, so the impact of these data is not clear to me. This paper may be of interest to researchers interested in nuclear structure and DNA damage, but based on the data presented the significance is limited.

      The significance of the present work lies in the discovery that autophagy is relevant both during physiological DNA damage and in response to an exogenous DNA damaging agent, to extrude damaged DNA, TOP2cc and Fibrillarin from the nucleus. This knowledge is relevant since insufficiencies in autophagy imply a risk of genomic instability, which in turn could drive the cell into a senescent or malignant state. We present data showing that autophagy regulates the dynamic formation and elimination of nuclear buds and micronuclei in a mechanistically differentiated way. While autophagy contributes to nuclear bud formation, it is necessary for micronuclei elimination. Our data suggest that nucleophagy could also be a mechanism to alleviate basal nucleolar stress. As the reviewer noticed, some micronuclei did not have DNA. It is conceivable then that nuclear buds and micronuclei also form for a proteostatic function, not necessarily involving the elimination of damaged DNA. We believe our work contributes to our understanding of the cell, as well as to cancer research. Whether cancerous and normal cells share common mechanisms is relevant when considering the specificity of potential therapeutic approaches.

      I don't have sufficient expertise to evaluate the super resolution microscopy beyond assessing the images.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Peer review of the manuscript with the number RC-2021-01181 by Muciño-Hernandez G et al. at Review Commons and with the title "Nucleophagy contributes to genome stability through TOP2cc and nucleolar components degradation"

      1. Summary Muciño-Hernandez G et al. show in this manuscript that mouse embryonic fibroblasts (MEFs) have basal levels of nuclear buds and micronuclei, which are indicators of genomic DNA damage. These basal levels of nuclear buds and micronuclei in MEFs increased after Etoposide treatment, which is known to induce DNA Double Strand Breaks (DSB). Interestingly, the nuclear buds and micronuclei co-localize with markers for nucleophagy (BECN1 and LC3) and acidic vesicles, suggesting that they are cleared by nucleophagy. The authors propose that basal levels of nucleophagy clear basal levels of genomic DNA damage that occurs as a result of DNA-dependent biological processes in the cell nucleus, thereby contributing to nuclear stability of MEFs under physiological conditions. These basal levels of nucleophagy increase after the action of factors that induce DNA damage and nuclear stress. The concepts proposed by Muciño-Hernandez G et al. are novel, since most of the current published data on nucleophagy related to DNA damage have been obtained under pathological conditions, e.g. implementing cancer cells.

      The authors use in their manuscript various molecular biology techniques to obtain data that support their claims, including Western Blot analysis of protein extracts from MEFs, immunostaining on MEFs and neutral comet assays, complemented with state of the art imaging techniques, such as confocal microscopy, immunoelectron microscopy and super resolution microscopy. The quality of the data is sound. The structure of the manuscript supports the understanding of the reader. However, I would like to suggest several improvements that will help to increase the quality of the manuscript, so that it fits the standards of articles recently published in journals affiliated to Review Commons, such as the Journal of Cell Biology, the EMBO Journal or eLife.

      2. Major comments

      2.1 The authors have to improve the description of the results. Especially the description of those Figure panels containing plots that were generated using data from several experiments has to be improved.

      One example is the description of Figure 1D, which is in lines 137-151 of the current version of the manuscript. Whereas the authors describe in lines 137-147 observations related to representative pictures of confocal microscopy after immunostaining presented in Figure 1D (left), the description of the quantification from 9 independent experiments presented in the plots in Figure 1D (right) comes relatively short in lines 147-150 without mentioning any of the values implemented for creating the plots.

      "Interestingly, while the frequency of nuclear buds gradually increased after DNA damage and during DNA repair, the frequency of micronuclei also increased after DNA damage, but diminished upon DNA repair."

      The other plots presented in the different figure panels across the manuscript are described in a similar manner. I would like to suggest to the authors to improve their manuscript by including during the description of their results the values that were implemented for the generation of the plots presented in the manuscript. For example, in the specific case of Figure 1D above:

      "Interestingly, the percentage of MEFs with nuclear buds gradually increased from XY% (± XY SD) in control non-treated (Ctrl) MEFs to XY% (± XY SD; P=XY) after 2 h Etoposide-induced DSB in MEFs and XY% (± XY SD; P=XY) after DNA repair takes place in MEFs 5 h upon stop of Etoposide treatment (Figure 1D, right). In contrast, the percentage of MEFs with micronuclei significantly increased from XY% (± XY SD) in Ctrl MEFs to XY% (± XY SD; P=XY) after 2 h Etoposide-induced DSB, whereas it was reduced to XY% (± XY SD; P=XY) 5 h after stop of Etoposide treatment (Figure 1D, right)."

      Descriptions of the plots as mentioned above will make the text more intuitive for the reader, and they will make it possible to read the Results Section without switching to the Figure Legends or the Material and Methods Section or to Supplementary Files. Even though the representative pictures from different microscopy techniques presented in the manuscript are of good quality and support the claims of the authors, it is important to mention that the quantifications presented in the plots demonstrate the statistical significance of these representative pictures. Thus, the authors should consistently include in the manuscript during the description of their results all the information (mean values, standard error of the means, P values, n values, etc.) that supports their interpretation of the results and demonstrates the statistical significance of their claims.

      Thank you for your clear and valuable advice. We followed it and in the revised manuscript we included the data in the results section.

      2.2 Following a similar line of argumentation as in the previous point, the authors should provide as Supplementary Material an Excel file containing a statistical summary, including all statistical relevant information from each one of the plots presented in each Figure panel, such as n values, P values, Test implemented, values used for the plots, numbers of experiments, etc. The information could be organized in the Excel file in different data sheets according to the Figure panels, in order that the reader can easily navigate through the data. In the current version of the manuscript, one cannot find the values used for the generation of the plots presented in the manuscript in any of the submitted files.

      Thank you for this suggestion. We have included in Table S1 an Excel file with a data sheet for each Figure panel, containing all the data collected and the statistical analysis performed.

      Minor comments

      3.1 In general, prior studies were appropriately referenced. Only a few references have to be added.

      Line 48: Add to the already included reference "Dobersch et al., 2021" also the reference Singh et al., 2015 PMID 26045162.

      Thank you, we added this reference.

      Line 53: Add the corresponding reference after the word "respectively".

      We added the corresponding reference.

      Line 82: Add the corresponding reference after the word "them".

      We added the corresponding reference.

      Line 125: Add the corresponding reference after the word "cells".

      We added the corresponding reference.

      Line 130: The expression "...by analyzing the recruitment of the phosphorylated histone γH2AX..." is the first time that the authors mention in the manuscript the DNA damage marker γH2AX. I suggest that it is better introduced as " ... by analyzing the recruitment of the DNA damage marker γH2AX (histone variant H2A.X phosphorylated at serine 139, Rogakou EP, et al., 1998, PMID 9488723) to DSB sites."

      Thank you very much for your suggestion. In the revised manuscript we corrected the text as suggested.

      Line 199: Add the corresponding reference after the word "formation".

      We added the corresponding reference.

      Line 205: Add the corresponding reference after the word "cells".

      We added the corresponding reference.

      3.2 The use of the English language is appropriate throughout the manuscript. However, there are minor errors in punctuation and in the use of prepositions, as well as typos. I will list some of them below. However, I would like to recommend that the manuscript be corrected by a native English speaker.

      Thank you for your careful review of our manuscript. We corrected all the errors listed. A colleague proficient in English has reviewed the revised manuscript.

      Line 41: "...and reproductive systems; genome instability also..." the semicolon can be replaced by a period.

      Line 43: "Since early in development DNA is under constant endogenous..." between "development" and "DNA" there should be a comma.

      The sentence in lines 53-55 has to be rephrased.

      Lines 62-63: the expression "...throughout life." should be substituted.

      Line 70: The abbreviation "rDNA" has to be explained the first time that it is used.

      Lines 81-82: For scientists who are not specialized in the field of nucleophagy, it has to be explained how the integrity of the genome is threatened by micronuclei and nuclei-derived material.

      √ Lines 106-110: The sentence is long. It would be easier to understand for the reader if this sentence is divided into two sentences.

      Lines 121-122: The subtitle should be rephrased.

      Lines 132-138: The sentence is long. It would be easier to understand for the reader if this sentence is divided into two sentences, e.g. with a period before the word "hence".

      Lines 143-144: "... in a subpopulation of healthy, untreated cells...". The interpretation of "healthy" might be subjective. I would like to suggest substituting in the complete manuscript the word "healthy" by "control".

      Line 163: The abbreviation for γH2AX was already introduced in line 130.

      Line 182: A comma after "cell lines" is missing.

      Line 183: delete "either".

      √ Lines 190-194: The sentence is long. It would be easier to understand for the reader if this sentence is divided into two sentences, e.g. with a period after the word "decreased" in line 191.

      Line 218: I assume that instead of "bus", it should be "buds".

      Line 220: I assume that instead of "iRNA", it should be "siRNA". In addition, it is the first time that the abbreviation is used. Thus, I suggest introducing it as "...was silenced by specific small interfering RNA (siRNA) previous to ..."

      Line 327: delete the word "chronic".

      Line 344: I assume that instead of "(figures 4C)", it should be "(Figure 4D)".

      3.3 The structure of the Figures is ok for the peer review process and it might be optimized during editing of the manuscript. Nevertheless, I would like to suggest to the authors to increase the lettering size throughout all the figures. It will make the figures more intuitive.

      Thank you for the suggestion. We increased the font size of the figures.

      Reviewer #2 (Significance (Required)):

      Significance

      The work presented by Muciño-Hernandez G et al. will clearly be a significant contribution to the scientific community working on autophagy, DNA damage repair and cancer, among others. It will be of interest to a broad spectrum of scientists, as I will elaborate in the following lines. The authors propose that MEFs have basal levels of genomic DNA damage under physiological conditions, which are cleared by basal levels of nucleophagy. On one hand, these findings are in line with various publications demonstrating that DNA-dependent biological processes in the cell nucleus, such as transcription, replication, recombination, and repair, involve intermediates with DNA breaks that may compromise the integrity of DNA. Thus, there must be mechanisms that ensure the integrity of the genome during these processes under physiological conditions; one of them seems to be nucleophagy. This perspective might explain the fact that proteins and histone modifications that were initially characterized during DNA repair also play a role during transcription, recombination, and replication. For example, phosphorylated H2AX at S139 (γH2AX) is often used as a marker for DNA-DSB [PMID 9488723]. However, accumulating evidence suggests additional functions of this histone modification [PMIDs 19377486; 22628289; 23382544]. In addition, McManus et al. [PMID 16030261] analyzed the dynamics of γH2AX in normal growing mammalian cells and found γH2AX in all phases of cell cycle with a maximum during M phase, suggesting that γH2AX may contribute to the fidelity of the mitotic process, even in the absence of ectopic-induced DNA damage. Further, Singh et al [PMID 26045162] and Dobersch et al [PMID 33594057] report that γH2AX plays a role in transcriptional activation in response to TGFB-signaling.
Moreover, classical DNA-repair complexes have been linked to DNA demethylation and transcriptional activation [PMIDs 17268471; 28512237; 25901318], and DNA-DSB is known to induce ectopic transcription that is essential for repair, supporting a tight mechanistic correlation between transcription, DNA damage, and repair [PMID 24207023]. Perhaps the authors might consider introducing several of the aspects and the citations written above into the Discussion section of the revised version of their manuscript. On the other hand, most of the published data related to nucleophagy have been obtained from cancer cells. Muciño-Hernandez G et al. obtained their data using MEFs to demonstrate that the proposed mechanisms also take place under non-pathological conditions, which is one of the novel aspects of the present work.

      I hope that my suggestions help the authors to improve their manuscript, thereby reaching the standards of manuscripts recently published in journals affiliated to Review Commons AND increasing the impact of their contribution to the scientific community.

      Thank you very much for your suggestions. They helped us to present now a much-improved manuscript. We hope the revised work is now suitable for publication in the Journal of Cell Science.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Muciño-Hernández and colleagues suggest that basal formation of nuclear buds and micronuclei increases in primary mouse embryonic fibroblasts following etoposide-induced double strand breaks (DSBs). The study combines the use of biochemical methodologies with confocal and super resolution microscopy in an effort to explore the contribution of nucleophagy to genome stability. The authors provide evidence that autophagy is induced upon etoposide treatment. They detected GFP-LC3 and BECN1 signals in nuclear buds and micronuclei even in untreated control and to a higher extent in etoposide-treated cells. Then, the authors examined whether nucleophagy is required for the removal of nuclear buds and micronuclei, by treating fibroblasts with control and Atg7 siRNA. The authors claim that the percentage of cells with micronuclei or nuclear buds decreases upon Atg7 knockdown, suggesting that components of the autophagy machinery induce the formation of these nuclear abnormalities. Moreover, Type II DNA Topoisomerases (TOP2A and TOP2B) and the ribosomal protein fibrillarin were detected in nuclear buds and micronuclei in fibroblasts treated or not with etoposide. Again in this case, GFP-LC3 was detected in fibrillarin-containing nuclear alterations. Based on these observations, the authors suggest that nucleophagy contributes to the elimination of chromosomal fragments or nucleolar bodies exiting the nucleus under DNA damage-inducing conditions. Specifically, they propose a key role for nucleophagy in maintaining genome stability by eliminating Type II DNA Topoisomerase cleavage complex (TOP2cc) and nucleolar components such as fibrillarin.

      While it seems that there is a relationship between nuclear-extruded TOP2 with endogenous BECN1 and GFP-LC3 suggesting autophagic engagement, inconsistencies of fluorescent images between different figures indicate possible technical problems/limitations (please see specific comments, below), compromising authors' claims. LC3 immunoblotting and GFP-LC3 localization results appear over-interpreted (comments below). Neither TOP2 nor Fibrillarin have been shown to be actual autophagic substrates. Also, the link between genomic stability, micronuclei formation and autophagy has been previously reported (Zhao et al., PMID: 33752561).

      An additional major concern relates to nucleophagy being a selective type of autophagy. As such it requires efficient recognition and sequestration of the nuclear material destined to be degraded. Cargo specificity is mediated by receptor proteins, but no evidence for such receptors is provided in this study. Moreover, there is no real mechanistic insight on how nucleophagy mediates genome stability and how this can be interpreted in terms of cell survival under physiological and stress conditions. In other words, the biological significance of the findings presented has not been addressed.

      Specific comments are summarized below:

      The authors suggest that autophagy is induced after etoposide treatment and during the DNA repair process. However, the Western blot presented in Fig. 2A is not convincing and quantification does not support a significant autophagy induction in any of these cases. Autophagy appears to be induced 1h after etoposide removal, as evidenced by the LC3II/LC3I increase (Fig. 2A and S2A). Nevertheless, all these changes should be more rigorously assessed.

      Thank you for the observation. We removed the analysis of LC3II/LC3I by Western blot in the revised manuscript because basal and induced elimination of nuclear components by the autophagic machinery occurs only in a subset of cells. It needs to be analyzed cell by cell. Pooling together all the cells dilutes the observation. Nevertheless, the dynamic intracellular distribution of both LC3 and BECN1 indicates autophagy induction. Please notice in revised Figure 2A that LC3 surrounding vesicles increases after 2h of DNA damage and diminishes when DNA is repaired. BECN1 in control MEFs is highly concentrated inside the nucleus, at the nucleolus, as it co-localized with Fibrillarin (new Figure 4E), and after DNA damage it redistributes towards the cytoplasm. Finally, after DNA repair, BECN1 appears highly concentrated at the nucleus again. Further evidence of a functional autophagic flux is the observation of an increasing number of acidic vesicles stained with Lysotracker in response to DNA damage, which were reduced after DNA repair. Some of the micronuclei were also co-stained with Lysotracker, suggesting their lysosomal degradation.

      Line 190 and Fig. 2A: It is totally unclear whether "autophagy activation" takes place during the two waves described. There is no LC3B-I to LC3B-II conversion to initially suggest "autophagy activation". It rather suggests that autophagy is stalled. Fig. 2F shows that GFP-LC3 is strongly fluorescent into the lysotracker-stained lysosomes, further pointing to possible functional or technical problems.

      As pointed out by reviewer 1, the images presented in original Figure 2F were over-exposed. In the current version we replaced those images with new images of better quality. We also reorganized the presentation of the data, and in revised Figure 2A we present images in which a co-localization of BECN1 with LC3, with or without Lysotracker signal, can be observed more convincingly in nuclear buds and micronuclei. We also performed immunolocalization of endogenous LC3 (new Figure 2D) to rule out a possible misinterpretation of GFP-LC3 aggregates. As explained before, we removed original Figure 2A.

      Fig. 2B and Sup. Fig. 2B: BECN1 staining looks problematic. There is extreme BECN1 accumulation in the nucleus. Are those nuclear patterns of endogenous BECN1 and GFP-LC3 normal (see also minor comment 6 and 7)? Is there literature supporting such a distribution?

      Yes, BECN1 localization in the nucleus has been documented during development and in response to DNA damage stimuli such as ionizing radiation, with a function related to DNA repair that is alternative to autophagosome formation (Fei Xu, et al. 2017, Scientific Reports | 7:45385 | DOI: 10.1038/srep45385). In the current manuscript we also detected endogenous LC3, to avoid a possible artifact with GFP-LC3 expression. We observed endogenous LC3 also localized in the nucleus (new Figure 2D).

      It is hard to imagine how BECN1 is implicated in a (here hypothetical) nuclear lamina degradation event driven by LC3-lamin B1 direct interaction (Dou et al., 2015). BECN1 is a component upstream of LC3 and is a subunit of the PI3K complex catalyzing the local PI3P generation. The above should cause recruitment of the downstream autophagic machinery. Other subunits of the same complex or downstream effectors should be identified at the same spots to support the authors' claims.

      Our proposal that BECN1 is contributing to nucleophagy is supported by its co-localization with LC3 and Lysotracker stained vesicles (new figure 2A), as well as with TOP2 (Figure 3A-C). We appreciate the interesting idea of the reviewer; we certainly did not analyze the presence of BECN1 interacting partners. We agree that further studies analyzing their localization could complement our current findings. Supporting our work, others have observed UVRAG in the nucleus, specifically in centromeric regions, and it also has a role in DNA repair through its interaction with DNA-PK (Dev Cell. 2012 May 15; 22(5): 1001–1016. doi: 10.1016/j.devcel.2011.12.027). Given the anti-tumorigenic role of several autophagic molecules, it is tempting to speculate that several of them could have triple roles in the nucleus: directly interacting with DNA repair machinery, eliminating unrepairable damaged DNA and preventing excessive protein accumulation in the nucleus. Further experiments are necessary to test this hypothesis, but are beyond the scope of the present manuscript.

      U, 2h D and 5h R images of whole cells are necessary. The authors should also provide representative images of cells under different conditions i.e. control, etoposide-treatment and during DNA repair. Along similar lines, untreated control cells are not included in Fig. 2E and F. These images are needed for a better comparison between normal and DNA damage-inducing conditions.

      The reviewer is right. In the revised Figure 2 we included representative images of control cells, Etoposide-treated cells and cells during DNA repair. Images of whole cells are now shown in supplementary Figure 2S.

      The authors state that autophagy is required for nuclear buds and micronuclei formation. However, the data shown in Fig. 2G and H are hardly convincing given that the statistical difference between cells treated with control and Atg7 siRNA is not strong (for example, *p˂0.5, 5h after etoposide removal). To provide further support to this notion, they should use cells from autophagy defective mutants and examine the appearance of nuclear abnormalities across different conditions compared to control cells.

      We agree with the reviewer and followed his/her suggestion. We established a collaboration with Dr. Sandra Cabrera, who kindly shared with us Atg4b-/- mice from which we isolated MEFs to compare side by side with WT MEFs the appearance of nuclear abnormalities. We confirmed a statistically significant reduction in the formation of nuclear buds in both conditions: silencing the expression of Atg7 by siRNA and in Atg4b-/- MEFs, suggesting that the autophagic machinery contributes to bud formation (new Figure 2F-G). Interestingly, we observed a different result when analyzing micronuclei. While we found no statistically significant difference in the percentage of cells with micronuclei when silencing the expression of Atg7 by siRNA, we found a statistically significant increase in cells with micronuclei in Atg4b-/- MEFs (new Figure 2F-G). This apparently discrepant result suggests that nuclear buds and micronuclei have a different mechanistic origin. A difference in the biogenesis of buds and micronuclei has been previously suggested by studies of cells cultured under strong stress conditions that induce DNA amplification, as well as of cells under folic acid deficiency. While interstitial DNA without telomere was more prevalent in buds than in micronuclei, telomeric DNA was more frequently observed in micronuclei (Fenech et al. 2011, Mutagenesis 26:125-132).

      Lines 223-228: The role of autophagic machinery in the formation of nuclear buds is not supported and furthermore hard to conceptualize. How are the components of autophagy implicated during nuclear bud and micronuclei formation? Colocalization of autophagic proteins might mean that autophagy is engaged at some point after or during the above formation. The causal, mechanistic and temporal aspects of the above budding and nucleophagic events need experimental support and/or more accurate interpretation.

      We agree with the reviewer, and we now express our interpretation with more caution. The role of autophagic machinery in the formation of nuclear buds is supported by the following findings: a) the localization of LC3 and BECN1 in nuclear buds; b) the inhibition of Atg7 expression by specific siRNAs reduced the number of cells with buds; and c) Atg4b-/- MEFs had a reduced number of cells with buds (new Figure 2G). How the components of autophagic machinery are implicated in nuclear bud formation is an interesting question and deserves further investigation, beyond the scope of the present manuscript.

      The authors claim that nucleophagy eliminates topoisomerase cleavage complex because TOP2A and TOP2B appear to more extensively co-localize with GFP-LC3 and BECN1 after etoposide-induced DSBs. However, the quantification presented in Fig. 3D-F to support this statement does not, in general, show a statistically significant difference in fibroblasts across different conditions (normal, etoposide treatment, etoposide removal).

      Autophagic elimination of TOP2 protein is supported by the following findings: 1) both BECN1 and LC3 were detected in micronuclei in acidic vesicles (labeled with Lysotracker), which is indicative of the autolysosomal nature of the cytoplasmic compartment containing TOP2 (Figure 2A); 2) TOP2B was found by electron microscopy in some cells exiting the nucleus surrounded by LC3 (Figure 3G); 3) TOP2B accumulated in cells lacking ATG4, as expected if it is degraded by autophagy (Figure 3H).

      Why would BECLIN colocalise with TOP2B in Figure 3g, given that beclin is involved in the initiation process?

      We think that BECN1 is involved in functions additional to the initiation process of bud formation. For example, it has been shown by others that BECN1 interacts with TOP2 (Dev Cell. 2012 May 15; 22(5): 1001–1016. doi: 10.1016/j.devcel.2011.12.027). It could be working as an autophagic receptor targeting TOP2cc to buds and micronuclei. We are aware that further studies are necessary to test this hypothesis, but they are beyond the scope of this manuscript.

      Fig. 4A and B: There is no enrichment of GFP-LC3 in "the nuclear alterations containing Fibrillarin" as stated in lines 341-343 comparing to the rest of the cellular GFP fluorescence.

      It is true that there is not a local enrichment of GFP-LC3 like that normally reported as LC3 puncta in response to autophagy induction by starvation, for example. Nevertheless, we are confident of the specificity of the observation, as not every nuclear alteration was found to contain GFP-LC3. We detected GFP-LC3 in 72% (mean ± 3.61 SD) of the nuclear alterations containing Fibrillarin in untreated cells, in 65.7% (mean ± 1.97 SD) of cells with 2 h of DNA damage and in 90.33% (mean ± 6.36 SD) after 5 h of DNA repair (in 5 independent experiments).
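
      As a minimal sketch of how a "mean ± SD" summary of this kind could be computed from per-experiment percentages, consider the following. The replicate values are hypothetical placeholders, not the authors' data, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def summarize(percentages):
    """Return (mean, sample SD) for a list of per-experiment percentages."""
    if len(percentages) < 2:
        raise ValueError("at least two independent experiments are required")
    return mean(percentages), stdev(percentages)


# Hypothetical percentages from five independent experiments (illustration only).
replicates = [70.0, 75.0, 68.0, 74.0, 73.0]
m, sd = summarize(replicates)
print(f"{m:.2f}% (mean ± {sd:.2f} SD, n={len(replicates)})")
```

      Reporting n alongside the mean and SD lets the reader judge how much weight each percentage can bear.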

      Moreover, there is no statistical significance in Fig. 4C and D measurements limiting the safety of authors' conclusions in lines 341-346.

      We agree with the reviewer's observation. We repeated these experiments two more times and did not find a statistically significant difference in the percentage of cells with nuclear lesions containing Fibrillarin and GFP-LC3 after DNA damage or after DNA repair. These results suggest that nucleolar DDR is a particular response, independent of DDR elsewhere in the genome, as has been suggested (reviewed in Nucleic Acids Research, 2020, Vol. 48, No. 17 9449–9461; doi: 10.1093/nar/gkaa713). An alternative is that the release of nucleolar components is not enhanced by Etoposide at the dose and time used in this work.

      Lines 368-370: As discussed by the authors and reported in previous publication (Xu et al., 2017), "BECN1 interacts directly with TOP2B, which leads to the activation of DNA repair proteins, and the formation of NR and DNA-PK repair complexes", independent of its role in autophagy. Currently, there are no rigorous findings supporting the contribution of BECN1 (as a functional constituent of the core autophagic machinery) to nuclear damaged material extrusion (lines 382-384).

      We agree with the reviewer that we did not perform an assay to demonstrate that BECN1 is contributing to TOP2 nuclear extrusion as a functional constituent of the core autophagic machinery. Nevertheless, the following data support the proposal of an autophagic elimination of TOP2cc: 1) TOP2B was detected in micronuclei containing BECN1 (Figure 3B); 2) BECN1 was found in micronuclei containing LC3 and in an acidic vesicle (labeled with Lysotracker), indicative of the autolysosomal nature of the compartment (Figure 2A); 3) TOP2 was found in some cells exiting the nucleus surrounded by LC3 (Figure 3G); 4) TOP2 accumulated in cells lacking ATG4, suggesting its autophagic degradation (Figure 3H).

      Lines 435-441 and Fig. 5: The current findings do not support the proposed model. It is hard to support and conceptualize the statement "proteasome and nucelophagy function in a dynamic way inside the nucleus".

      The reviewer is right. We made a mistake by integrating an interpretation within the summary of the actual findings of this work. We corrected the text in the current version.

      In Fig. 5, LC3 appears to decorate inner nuclear membrane and probably to interact with some of the other proteins depicted, which is misleading.

      We agree with the reviewer. We removed the scheme in the current manuscript.

      Beclin-1 appears to interact with Fibrillarin (Nucleolus).

      This is correct. We observed by immunofluorescence a co-localization of BECN1 with Fibrillarin (new Figure E), and demonstrated by co-immunoprecipitation that they are constituents of a complex (new Figure F).

      Most of the differences in Sup. Fig. 3 lack statistical significance compromising the authors' claims.

      We agree with the reviewer. Performing a separate statistical analysis of the percentage of cells with nuclear buds or micronuclei did not provide further information. We eliminated this analysis in the current version.

      Many conclusions are drawn by colocalisation-immunofluorescence analysis. Co-immunoprecipitation experiments should also be performed to show that TOP2B and fibrillarin interact with LC3/autophagic machinery.

      Thank you for your suggestion. We performed immunoprecipitation analysis and confirmed an interaction of Fibrillarin with BECN1; this result is now presented in Figure 4F. We found no co-immunoprecipitation of LC3 with either Fibrillarin or TOP2A, nor of TOP2B with BECN1.

      Additionally, colocalisation analysis should be performed using tools such as Pearson's correlation and is an initial indication of nucleophagy. In the case of fibrillarin, immunofluorescence images do not indicate colocalisation, they need to be repeated.
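
      For readers unfamiliar with the metric, the Pearson's correlation coefficient requested here can be sketched as follows. This is only an illustrative implementation on flattened pixel intensities from two channels; the function name and the toy intensity values are hypothetical, and in practice background subtraction and a region-of-interest mask would normally be applied first.

```python
from math import sqrt


def pearson_coefficient(channel_a, channel_b):
    """Pearson's r between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined r")
    return cov / sqrt(var_a * var_b)


# Hypothetical intensities for the same pixels in two channels (illustration only).
lc3_pixels = [10, 40, 35, 80, 5]
fibrillarin_pixels = [12, 38, 30, 90, 8]
r = pearson_coefficient(lc3_pixels, fibrillarin_pixels)
```

      Values of r close to 1 indicate that the two signals rise and fall together across pixels; values near 0 indicate no linear relationship.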

      The transport of Fibrillarin out of the nucleus by micronuclei formation and its autophagic degradation implies that both proteins are contained in the same vesicular compartment; it does not necessarily require a direct interaction of Fibrillarin with LC3. Therefore, a co-localization detected by Pearson's analysis is not a necessary confirmation of the nucleophagic degradation of Fibrillarin. Actually, Fibrillarin does not seem to interact with LC3, since we could not detect both proteins by co-immunoprecipitation. Nevertheless, we observed a nucleolar localization of BECN1 overlapping with Fibrillarin (new Figure 4E), and we confirmed by co-immunoprecipitation the presence of both BECN1 and Fibrillarin in a complex (new Figure 4F). Following the reviewer's advice, we repeated the analysis of Fibrillarin immunolocalization two more times. We corroborated its localization in micronuclei and nuclear buds in 5.86% (mean ± 5.03 SD) of untreated cells, indicating a basal level of nucleolar material exclusion from the nucleus. Interestingly, the percentage of cells with Fibrillarin in nuclear alterations did not increase with statistical significance with Etoposide treatment. At 2 h of DNA damage we observed only a slight increase to 6.8% (mean ± 4.03 SD) of cells having nuclear buds and micronuclei with Fibrillarin, while the number of cells with nuclear lesions increased to 30.6% (mean ± 4.2 SD). Similarly, the proportion of cells having Fibrillarin in nuclear lesions after 5 h of DNA repair increased only to 7.66% (mean ± 6.08 SD), while the total number of cells having nuclear buds and micronuclei increased to 38.42% (mean ± 9.3 SD). These results suggest that nucleolar components are constantly sent out of the nucleus as a homeostatic process, and not significantly in response to Etoposide-induced DSB.

      Measurement of LC3/fibrillarin positive puncta should be performed, under basal conditions, genotoxic, and nucleolar stress under control and Atg7 knockdown conditions.

      Since we observed no statistically significant change in the number of micronuclei with Fibrillarin under Etoposide-induced DSB or DNA repair, we did not perform the suggested experiment.

      Moreover, if nuclear proteins described are substrates of autophagy, then their levels would decrease upon autophagic induction i.e. starvation or in this case DNA damage and nucleolar stress. Thus, western blot analysis of relative protein levels can be performed.

      Thank you for the suggestion. Since only 5% of the cells have micronuclei with Fibrillarin, and this proportion did not increase significantly in response to DNA damage, it is unlikely that a population analysis (such as a Western blot) would detect a difference in the amount of Fibrillarin in response to autophagy manipulation. Nevertheless, we compared Fibrillarin abundance by Western blot in WT MEFs vs. Atg4-/- MEFs untreated (U), treated for 2 h with Etoposide (D) and after 5 h of DNA repair (5), shown in the top panel of the following figure. As expected, we found no statistically significant difference as determined by two-way ANOVA followed by Sidak's multiple comparisons test (n=3). Adjusted P values are shown for each comparison (left graph).

      On the other hand, since the percentage of cells with TOP2B in micronuclei and nuclear buds increased in response to DNA damage and during DNA repair, it was possible to detect a statistically significant accumulation of TOP2B in cells lacking ATG4 after 5h of DNA repair (bottom panel and right graph in the figure above). This observation is now included in new Figure 3H. Supporting our finding, TOP2A is reduced in cancerous cells grown under glucose deprivation (Alchanati, I., et al. 2009. PLoS One. 4:e8104).
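
      As a brief illustration of the Sidak-adjusted P values mentioned in this response, the single-step Šidák formula is p_adj = 1 - (1 - p)^m for m comparisons. The sketch below applies that formula to hypothetical unadjusted P values; it is an assumption that this simple form matches what the authors' statistics software reports, since such software derives the unadjusted values from the ANOVA error term first.

```python
def sidak_adjust(p_values):
    """Single-step Sidak adjustment: p_adj = 1 - (1 - p) ** m for m comparisons."""
    m = len(p_values)
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("P values must lie between 0 and 1")
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical unadjusted P values for three comparisons (illustration only).
raw_p = [0.01, 0.02, 0.5]
adjusted_p = sidak_adjust(raw_p)
```

      Each adjusted value is at least as large as its unadjusted counterpart, which is why a comparison can lose significance once multiple testing is taken into account.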

      Endogenous LC3 nuclear buds should also be detected to verify nucleophagy as GFP-LC3 has been shown to aggregate, causing artifacts under certain conditions.

      We agree with the reviewer. We detected endogenous LC3 by immunofluorescence. This result is now included in Figure 2D.

      Minor comments

      In the Discussion section, the paragraph focused on the role of the ubiquitin-proteasome system is not substantiated by the data presented in the manuscript. Along similar lines, formation of aggresomes following etoposide treatment and their subsequent removal has not been monitored.

      We apologize for the confusion. We corrected the text to clearly distinguish our own findings from the published data that we attempt to relate to them.

      Western blots of better quality should be provided with assigned markers of protein size.

      The Western blots shown have markers of protein size.

      There are several language errors in the text that need to be corrected. Several sentences are too long and confusing or must be re-phrased. For example, see the lines: 123-125, 209-210,212, 218,221-222.

      We apologize for our language errors. We corrected all errors indicated and asked colleagues proficient in English to review our text.

      Fig. 1B. Place "μm" into parenthesis.

      Sup. Fig. 1B: Replace "gH2AX" with "γH2AX".

      Fig. 1D: Separate DAPI and γH2AX channel images would be informative.

      We now show also separated channels.

      Fig. 2E: Enlarged separate DAPI, GFP-LC3 and lamin A/C channel images would be informative.

      We now show also separated channels.

      Line 218: Replace "bus" with "buds".

      Fig. 2B, 2E, 2F, 3A and probably Sup. Fig. 2B represent MEFs treated for 2h with etoposide. The pattern of GFP-LC3 in 2B looks extensively nuclear and almost absent from cytoplasm.

      We confirmed our finding detecting endogenous LC3.

      In addition, Fig. 2B and 3B represent MEFs treated for 2h with Etoposide. The pattern of endogenous BECN1 in Fig. 2B looks extensively nuclear and almost absent from cytoplasm. In Fig. 3B the pattern is notably different.

      BECN1 pattern of distribution is rather similar, predominantly in the nucleolus. We demonstrate it further by detecting BECN1 overlapping localization with Fibrillarin (new Figure 4E) and co-immunoprecipitation (new Figure 4F).

      Sup. Fig. 2C: Index box is not properly aligned.

      Thank you. We reviewed the alignment of each index box and reorganized the figure in the revised manuscript to add the whole blots of the new experiments we performed to analyze MEFs Atg4-/-.

      Lines 154, 343 and 837: Replace "DBS" with "DSB".

      Thank you, we corrected these typos.

      Fig. 4 panels are not clearly cited in the text.

      We apologize. We have checked that they are now clearly cited.

      Line 220: siRNA

      Thank you, we corrected the text.

      Lines 373-374: References "Lenain et al., 2015" and "Li et al., 2019" are missing.

      Thank you for noticing it, we added the missing references. We use EndNote X9, we did not expect it to fail.

      Lines 400-401 and 407: Probably the second "Latonen, 2011" reference needs "et al".

      It is correct. We now cite this paper properly.

      Line 427: Do authors refer to Fig. 1E rather than Fig. 2B?

      Yes, we are sorry for this mistake. Thank you for pointing it out.

      Line 434: Correct "clearance" spelling.

      Thank you, we corrected it.

      Reviewer #3 (Significance (Required)):

      The authors suggest that nucleophagy contributes to the elimination of chromosomal fragments or nucleolar bodies exiting the nucleus under DNA damage -inducing conditions. Specifically, they propose a key role for nucleophagy in maintaining genome stability by eliminating Type II DNA Topoisomerase cleavage complex (TOP2cc) and nucleolar components such as fibrillarin.

      However, neither TOP2 nor Fibrillarin have been shown to be actual autophagic substrates. Also, the link between genomic stability, micronuclei formation and autophagy has been previously reported (Zhao et al., PMID: 33752561).

      We found nuclear buds and micronuclei with markers of different stages of the autophagic pathway, suggesting an active role of autophagy proteins in bud formation and micronuclei removal. We detected TOP2 and Fibrillarin in micronuclei and propose their elimination by nucleophagy based on the following findings: 1) both BECN1 and LC3 were detected in micronuclei in acidic vesicles (labeled with Lysotracker), which is indicative of autolysosomes (Figure 2A); 2) TOP2B was found by electron microscopy in some cells exiting the nucleus surrounded by LC3 (Figure 3G); 3) TOP2B accumulated in cells lacking ATG4, as expected if it is degraded by autophagy (Figure 3H); 4) BECN1 shows dynamic cytoplasmic-nuclear trafficking in response to DNA damage; 5) BECN1 co-localized with Fibrillarin in the nucleolus, and both proteins were co-immunoprecipitated.

      The link between genomic stability, micronuclei formation and autophagy has previously been reported only in cancerous cells. Considering that physiological DNA damage occurs constantly in the cell, basal nucleophagy is potentially fundamental to keeping cells healthy.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses an important question: whether the conduction velocity in white matter tracts is related to individual differences in memory performance. The authors use novel MRI techniques to estimate the "g-ratio" in vivo in humans - the ratio of the inner axon relative to the inner axon plus its outer myelin sheath. They find that autobiographical recall is positively related to the g-ratio in a specific white matter tract (the parahippocampal cingulum bundle) in a population of 217 healthy adults. This main finding is extended by showing that better memory is associated with larger inner axon diameters and lower neurite dispersion, which suggests more coherently organised neurites. The authors also argue that their results show that the magnetic resonance (MR) g-ratio can reveal novel insights into individual differences in cognition and how the human brain processes information.

      The study is exploratory in nature and the analyses were not pre-registered. The technique has not been used before to associate cognitive performance with MR estimates of conduction velocity in candidate white matter tracts. It is therefore unknown how strong any associations are likely to be and what sort of sample size might be needed to observe them. Nevertheless, if the technique proves to be reliable, then it certainly offers a valuable new tool to understand individual differences in cognitive abilities. However, brain structure to behavior associations are notoriously variable across studies and have been argued to require very large sample sizes to obtain reproducible results.

      We respectfully disagree that the study was exploratory. We had distinct aims and hypotheses from the outset. Our prime interest is in autobiographical memory, the hippocampus and its connectivity. This motivated our focus on three specific white matter tracts. We also planned from the time of study design to examine the MR g-ratio, and even contributed to refining the pre-processing pipeline for this approach, as reported in a previous paper (Clark et al., 2021, Frontiers in Neuroscience). Moreover, in the current manuscript we outlined well thought through possible outcomes and declared specific predictions.

      Regarding pre-registration, due to the scope of this work, the experiment was planned eight years ago, and data collection commenced seven years ago. At that time, formal pre-registration was not common practice. However, it has been a long-standing feature of our Centre that proposed studies and their analysis plans undergo rigorous internal peer review, including presentation to the whole Centre, before data acquisition can commence. The proposal for the research under consideration here was presented on 26th September, 2014.

      As noted in our response to the Editors’ Public Evaluation Summary above, someone has to be the first to report a novel result, and we believe that the depth and transparency of our approach permits confidence in the findings. Not least, and to reprise, because we employed the most widely-used and best-validated method of testing autobiographical memory recall that is currently available – Levine’s Autobiographical Interview. Our primary analyses were performed using the behavioural outcome measure from this test, the results of which were directly compared to those from a closely-matched control measure to test whether significantly larger effects were observed for our variable of interest. The potential for false positives was further reduced by extracting microstructure data from hypothesised tracts of interest (instead of performing whole brain voxel-wise analyses), with statistical correction performed on all structure-behaviour analyses. Moreover, we performed partial correlations with age, gender, scanner and number of voxels in a region of interest (ROI) as covariates. Complementary investigations were also conducted using other commonly-reported measures, providing supporting evidence. We report all analyses (and provide all the source data), including those finding no relationships. The consistent results throughout were associations between autobiographical memory recall ability and the microstructure of the parahippocampal cingulum bundle only. Moreover, thanks to the excellent suggestions of the Reviewers, the revised version reports additional analyses that allow us to further corroborate and interpret our findings.

      Our sample of 217 participants allowed for sufficient power to identify medium effect sizes when conducting correlation analyses at alpha levels of 0.01 and when comparing correlations at alpha levels of 0.05 (Cohen, 1992, Psychological Bulletin). While it has recently been suggested that thousands of participants are required in order to investigate brain structure-behaviour associations (Marek et al., 2022, Nature), other, more sophisticated, analyses suggest that samples of ~200 participants can be sufficient, in line with our estimates (Cecchetti and Handjaras, https://psyarxiv.com/c8xwe; DeYoung et al., https://psyarxiv.com/sfnmk). Given that our study was principled, well-controlled, analysed appropriately and produced very specific and consistent findings, we are confident that the findings are robust.
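
      The power statement above can be checked with a rough calculation. The sketch below is not the authors' analysis: it assumes a simple two-sided test of a Pearson correlation, uses the Fisher z approximation, and takes r = 0.30 as Cohen's conventional "medium" effect. The function name `correlation_power` is hypothetical, and the covariates used in the partial correlations are ignored.

```python
import math
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float) -> float:
    """Approximate two-sided power to detect a Pearson correlation r.

    Assumption: Fisher z approximation, i.e. atanh(sample r) is roughly
    normal with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Illustrative values: n = 217 and alpha = 0.01 come from the response;
# r = 0.30 is the conventional "medium" effect size, assumed here.
power = correlation_power(r=0.30, n=217, alpha=0.01)
print(f"approximate power: {power:.2f}")
```

      Under these assumptions the estimated power is comfortably above the conventional 0.8 threshold, which is in line with the claim that the sample is sufficient for medium effects. Smaller effects would need a separate calculation.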

      The authors decided to analyse performance on a single task - the Autobiographical Memory Interview - and identified three candidate white matter tracts that connect the hippocampal region with other brain regions. While it is clear why these three tracts were chosen, it is less obvious why the authors chose to investigate associations with the Autobiographical Memory Interview and not other memory tests that were part of the battery of tests administered to the participants. It is reasonable to assume that something as general as the conduction velocity of a white matter tract would have an effect on memory ability across a range of tasks, so to single out one seems an unnecessarily narrow focus.

      Our main interest over many years, and hence the focus of this study, is autobiographical memory recall because it directly relates to how people function in real life. As noted above, autobiographical experiences occur in dynamic, multisensory, multidimensional, non-linear, ever-changing contexts; they involve actively engaging with the environment and other people; they are embodied; they span milliseconds to decades. Many of these features cannot be captured by laboratory-based episodic memory tests. This issue is increasingly being discussed (for example, see recent reviews by Nastase et al., 2020, NeuroImage; Mobbs et al., 2021, Neuron; Miller et al., 2022, Current Biology). It is further laid bare in McDermott et al.’s (2009, Neuropsychologia) meta-analysis of functional MRI studies which showed that laboratory-based and autobiographical memory retrieval tasks differ substantially in terms of their neural substrates. Consequently, we were not surprised to find that when we analysed laboratory-based memory test performance, there were no correlations with the MR g-ratio. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer for making the excellent suggestion to include these additional data, analyses and discussion points.

      The results of the study are interesting and highlight a key role of the parahippocampal cingulum bundle in autobiographical memory recall. The results are corrected for multiple comparisons across the three fiber tracts of interest and the recall of "external details" provides a nice control compared to the "internal details" which are the measure of interest. The main findings are extended to show that it is likely to be an increase in axon diameter and an increase in neurite coherency that characterize those individuals with better autobiographical recall. Despite these positives, it remains unclear whether memory recall, in general, is better in people with higher g-ratios in this tract (as implied in the Abstract), or if this effect is specific to scores on the Autobiographical Memory Interview.

      Our interest is in autobiographical memory, and so we employed the most widely-used and best-validated method of testing autobiographical memory recall that is currently available – Levine’s Autobiographical Interview. Not only does this test include a control measure, external details (as noted by the Reviewer), but we had independent raters score the autobiographical memory descriptions, and found that the inter-class correlation coefficients were very high (see Materials and Methods). Despite using this current, gold standard approach, at the request of the Reviewer we have now analysed data from eight additional laboratory-based memory tests. These are standard memory tests that are often used in neuropsychological studies: testing recall - the immediate and delayed recall of the Logical Memory subtest of the Wechsler Memory Scale IV, the immediate and delayed recall of the Rey Auditory Verbal Learning Test, the delayed recall of the Rey–Osterrieth Complex Figure; testing recognition memory - the Warrington Recognition Memory Tests for Words and Faces; testing semantic memory - the “Dead or Alive Test”. While these tests can assess some aspects of memory recall, they cannot be regarded simply as proxies for autobiographical memory recall, for the reasons we outlined in our response to the previous point. They do not capture key aspects of autobiographical memories. It is therefore all the more interesting that we found no associations between these laboratory-based memory tasks and the MR g-ratio of the parahippocampal cingulum bundle, in contrast to the relationship identified with autobiographical memory recall ability. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer once again for making the excellent suggestion to include these additional data, analyses and discussion points.

      Reviewer #2 (Public Review):

      In this study, Clark and colleagues tackle a very intriguing question: how differences in autobiographical recall abilities reflect in the human brain structure and function? To answer this question, they interviewed a large cohort of subjects and proceeded to acquire MRI data, specifically diffusion-weighted imaging and magnetization transfer data, to estimate the g-ratio, a measure of myelination deeply linked to conduction velocity. Looking at three specific white matter pathways of interest, all interconnecting the hippocampus with other brain structures, they studied the relationship between the g-ratio and the autobiographical recall abilities, together with many more measures from MRI. They found a significant positive association between the g-ratio of the parahippocampal cingulum bundle and the number of inner details from the interviews. These results can provide new potential directions to further study the underlying neural features beyond memory.

      I think that this is a very interesting article, it is well written, the methods are extensively explained, and the appendix provides further details for more expert readers. The authors put an effort into providing a comprehensive context in the introduction and in the discussion, and as a result, the paper seems overall quite suitable for both general and specialistic readerships.

      Thank you.

      The main issue I can currently see in the paper is that the mentioned relationship between g-ratio and recall abilities is then used to infer that better recall abilities are associated with higher conduction velocity and larger axons. The authors' line of reasoning is that given the hypothesized association, the increase in the g-ratio implies increases in myelin and axonal diameter. Despite this scenario being indeed possible given the current result, an increased g-ratio may also not indicate higher conduction velocity. In fact, the first potential inference would be that, without having any information on the axon size, the quantity of myelin can indeed be lower and as a result, the conduction velocity would decrease. I understand that the authors expected higher conduction velocity associated with better autobiographical memory recall, but it is hard to see any experimental outcome that could have disproved this hypothesis: from the possible scenarios depicted in the introduction, any change in the g-ratio (and even not any change at all) could indicate higher conduction velocity. What would be then needed to corroborate one of these scenarios is some independent or complementary measure, which unfortunately is missing.

      The mentioned issue does not mean that the paper loses relevance - I think that it should focus on the very practical result, a change in myelination and microstructure, and discuss what are the potential implications, including the one that currently dominates the discussion section.

      Thank you for these comments and the opportunity to provide further clarification.

      First, we have now provided additional background information regarding the relationship between the MR g-ratio and conduction velocity. We explicitly note that while finding a significant relationship between the MR g-ratio and autobiographical memory recall suggests the existence of an association between autobiographical memory recall and parahippocampal cingulum bundle conduction velocity, it cannot speak to the direction of this association.

      Second, we have further noted that interpretation of the parahippocampal cingulum bundle MR g-ratio in relation to the underlying microstructure requires knowledge, or an assumption, about whether the associated change in conduction velocity is faster or slower. Given that faster conduction velocity is thought to promote better cognition (e.g. Brancucci, 2012; Dicke and Roth, 2016; Miller, 1994; Reed and Jensen, 1992), we interpreted our MR g-ratio findings under the assumption of faster conduction velocity, and now explicitly note in several places in the revised manuscript that this is an assumption.

      Third, we thank the Reviewer for the excellent suggestion that a complementary measure could help to further inform the findings. Consequently, we now also include additional analyses examining the relationship between the extent of myelination and autobiographical memory recall ability. This is possible using the magnetisation transfer saturation maps, which are optimised to assess myelination. Given our assumption of faster conduction velocity when interpreting our positive MR g-ratio correlations, no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall would be expected. On the other hand, if conduction velocity is actually decreasing, then a negative correlation between magnetisation transfer saturation values and autobiographical memory recall ability would be observed. In fact, we found no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall. This suggests that myelin was not associated with autobiographical memory recall ability, supporting our assumption that relationships with the MR g-ratio were indicative of faster, rather than slower, conduction velocity.

      We have now added these new data, analyses and discussion points to the revised manuscript.
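
      The ambiguity discussed in this exchange, namely that a higher g-ratio alone does not fix the direction of the change in conduction velocity, can be illustrated with a small numerical sketch. It is not taken from the manuscript. It assumes the classical approximation attributed to Rushton (1951), in which velocity is proportional to inner axon diameter times sqrt(-ln g). The diameters, myelin thicknesses and function names are hypothetical and in arbitrary units.

```python
import math


def g_ratio(axon_diameter: float, myelin_thickness: float) -> float:
    """Inner axon diameter divided by total fibre diameter (axon plus myelin)."""
    return axon_diameter / (axon_diameter + 2 * myelin_thickness)


def relative_velocity(axon_diameter: float, g: float) -> float:
    """Relative conduction velocity under the assumed Rushton-style scaling."""
    return axon_diameter * math.sqrt(-math.log(g))


# Hypothetical baseline fibre (arbitrary units).
g_base = g_ratio(axon_diameter=1.0, myelin_thickness=0.20)
v_base = relative_velocity(1.0, g_base)

# Scenario A: larger axon, unchanged myelin thickness.
g_big_axon = g_ratio(axon_diameter=1.25, myelin_thickness=0.20)
v_big_axon = relative_velocity(1.25, g_big_axon)

# Scenario B: unchanged axon, thinner myelin chosen to give the same g-ratio as A.
g_thin_myelin = g_ratio(axon_diameter=1.0, myelin_thickness=0.16)
v_thin_myelin = relative_velocity(1.0, g_thin_myelin)

for label, g, v in [
    ("baseline", g_base, v_base),
    ("larger axon", g_big_axon, v_big_axon),
    ("thinner myelin", g_thin_myelin, v_thin_myelin),
]:
    print(f"{label:15s} g = {g:.3f}  relative velocity = {v:.3f}")
```

      Both scenarios raise the g-ratio by the same amount, yet under this assumed scaling the larger axon conducts faster than baseline and the thinner myelin conducts slower. This is why a complementary myelin-sensitive measure, such as magnetisation transfer saturation, helps to distinguish between them.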

      It would also be helpful to include some paragraphs on both interpretation and methodological issues when it comes to MRI-based microstructural imaging, which at the moment is lacking. This would provide a better picture of the results for a more general readership.

      We agree, and additional consideration of interpretational and methodological limitations has now been included in the manuscript.

      As one of the first works using an MRI-based microstructural measure of myelin, the g-ratio, to study cognition in a large cohort of subjects, I think this work will be a needed and significant step towards merging the neuroscience and MRI physics community - the methodology presented here is robust and could be used in many other applications.

      Thank you.

      Reviewer #3 (Public Review):

      The manuscript adds useful information about how structural properties of the brain are related to individual differences in autobiographical memory. A novel metric is used to assess features of white matter in tracts that are important for information exchange between the hippocampus and other brain regions. In one of these, the parahippocampal bundle, a relationship between the MR g-ratio and autobiographical memory recall is identified. This represents new and interesting information. The authors interpret the results in line with the theory that speed of signal transmission is important for cognitive function.

      Thank you for this positive summary.

    1. Author Response

      Reviewer #1 (Public Review):

      Rasicci et al. have developed a FRET biosensor that is designed to light up when cardiac myosin folds. This structure is extremely important to understand, and its link to the super-relaxed (SRX) state has not been fully shown. Their study provides a comprehensive review of the literature and provides compelling data that the 15 heptad+leucine zipper+GFP construct does function well and that the DCM mutant E525K has a similar IVM velocity despite a reduced ATPase compared with HMM. They rely on the ionic strength-dependent changes in the rate of MantATP release to argue that the E525K mutation stabilizes the 'interacting heads motif' (IHM) state, which makes logical sense.

      Strengths:

      Well written and comprehensive.

      Utilizes the appropriate fluorescence-based sensor for measuring the folding of the myosin structure. Provides a detailed range of techniques to support the premise of the study

      Weaknesses:

      Over-interpretation of the outcomes from this study means that the IHM and SRX are the same. Similar studies, e.g. Anderson 2018 and Chu 2021 support the opposite view that IHM and SRX are not necessarily the same, Anderson (and Rohde 2018) point out that S1 has some element of a reduced ATPase, this clearly cannot be due to folding of the molecule. Also, mavacamten was used in these studies to show that even S1 is inhibited suggesting that SRX and IHM are not connected. This is not to say that with enough supporting evidence that these observations cannot be over-ridden, it is just not clear that there is enough in this study to support this conclusion.

      We have revised our discussion to emphasize that our results support a model in which the SRX state is enhanced by formation of the IHM, but given the S1 and 2HP data the IHM may not be required for populating the SRX biochemical state (see page 8).

      I felt that the authors passed over the recent Chu 2021 paper too quickly, the Thomas group used a FRET sensor as well and provides a direct comparison as a technique, but with opposite conclusions. They also have supporting data in Rohde 2018 that their constructs were less ionic strength sensitive. It would be useful to understand what the authors think about this.

      We have discussed the Rohde and Chu papers in more detail in the discussion (see page 8). The Rohde paper used proteolytically prepared HMM and S1. Rohde found 20% SRX at all KCl concentrations in S1, while HMM shifted from 50% to 20% SRX in low and high salt conditions, respectively. Our results differ in terms of the absolute fraction of the SRX state, but the trend is similar in that S1 is salt-insensitive and HMM is salt-sensitive. The difference could be due to proteolytic HMM, which is a longer construct, and proteolytic S1, which is prone to internal cleavage that can impact ATPase activity. Another difference could be the mixed isoform of mantATP used in previous studies and the single isoform of mantATP used in our study (see page 5).

      Reviewer #2 (Public Review):

      The paper by Rasicci et al. examines the impact of the DCM mutation E525K in beta-cardiac myosin on its function and regulation by autoinhibition. The role of the auto-inhibited state of beta-cardiac myosin in fine-tuning cardiac contractility is an active and exciting area of current research related to muscle biology and cardiomyopathies. Several studies in the past have linked the destabilization of the autoinhibited, super-relaxed (SRX) state of myosin to the pathogenesis of hypertrophic cardiomyopathy. This timely study provides one of the first few examples where the hypocontractile phenotype of a DCM mutation has been linked to the stabilization of the SRX state.

      One of the strengths here is the utilization of a wide variety of both pre-existing and novel biochemical and biophysical assays for the study. The authors have characterized a new two-headed long-tailed myosin construct containing 15-heptad repeats of the proximal S2 (15HPZ), which they show allows myosin to form the SRX state in vitro using single ATP turnover assays. The authors go on to compare the E525K and WT proteins using the 15HPZ myosin construct in terms of their steady-state actin-activated ATPase activity, in-vitro actin-sliding velocity and single ATP turnover measurements. These assays reveal that the predominant effect of this mutation is the stabilization of the SRX state which is maintained even at 150 mM salt concentration where the WT SRX is largely disrupted. This is an important observation because DCM mutations so far have been believed to only affect the force-generating capacity of myosin.

      One of the biggest strengths of this study is the attempt to develop a FRET-based approach to directly ask if the biochemical SRX state here correlates well with the structural IHM state, which is an important unresolved question in the field. The authors have designed a FRET pair (C-terminal GFP and Cy3ATP bound to the active site) that is sensitive to the relative position of the heads and the tail, allowing them to distinguish between the low-FRET closed IHM conformation and the no-FRET open conformation. Remarkably, the authors show that the salt dependence of the FRET efficiency values closely follows their results from the salt dependence of the percent SRX for both WT and E525K proteins. The authors then attempt to substantiate their FRET results by a direct visual analysis of the conformational states populated by both WT and E525K proteins at low salt using negative staining EM analysis. The authors have optimized conditions to allow the deposition of the IHM state on grids without adding the small molecule mavacamten, which was found to be necessary in an earlier study to visualize the closed state using EM. The authors conclude that the SRX state correlates well with the IHM state and that the E525K mutation indeed stabilizes the folded-back conformation of myosin.

      This study significantly strengthens the previously illustrated correlation between the SRX and IHM states and provides methodological advances (especially visualization of the IHM state by negative EM in the absence of cross-linking agents) that will be very useful to the field going forward. The observation that a DCM mutation can lead to stabilization of the folded back state is a novel insight that should spark interest in the field to test how broadly this applies to other DCM mutations. The conclusions of the paper are mostly supported by the data; however, some clarifications and qualifications are needed.

      Weaknesses:

      The extremely low enzymatic activity of the M2β 15HPZ myosins as compared to the WT S1 control (which is a historical control not assayed in parallel with the 15HPZ proteins), is concerning for the low protein quality of the 15HPZ myosins. The authors attribute the low kcat to the high proportion of SRX population in their ensembles. However, the DRX rates reported for the WT and E525K 15HPZ proteins in the single ATP turnover assay are ~3-4 fold lower than those of their S1 counterparts. These rates reflect basal turnover of ATP in the open state and thus should not be affected by the presence of the S2 tail, which leads to concerns about the 15HPZ protein activity. In addition, the very high percentage of stuck filaments in the in vitro motility assay for the 15HPZ constructs (despite the use of dark actin) is also concerning for significant amounts of enzymatically inactive protein.

      We thank the reviewer for pointing out the differences in the S1 and HMM DRX rates. We performed additional single turnover measurements with S1, adding two sets of measurements from one additional preparation (N=3), and we demonstrate that there is a significant increase in the DRX rates of WT S1 compared to WT HMM (see pages 4-5, Table 3, Figure 3- figure supplement 3). A faster rate in S1 was also reported in Rohde et al. 2018. Indeed, the DRX rates of E525K S1 are significantly higher than WT in S1, which we also now report in the results (see page 5, Figure 3 – figure supplement 3). We addressed the concerns about 15HPZ activity by performing NH4+ ATPase assays to demonstrate that the number of active heads was similar in S1 and 15HPZ HMM (see page 4). It is possible that the higher percentage of stuck filaments in the HMM motility is due to myosin heads in the IHM state on the motility surface, which generate a drag force by non-specifically interacting with actin, but further study is necessary to examine this question.

      The authors assert that the E525K mutation represents a new mechanism by which DCM-causing mutations lead to decreased contractility - by stabilizing the sequestered state rather than affecting motor function. However, there is no evaluation of the motor function (actin-activated ATPase activity or in vitro motility) of the E525K S1, which would reveal the effects of the mutation without confounding effects due to the sequestering of heads. Interestingly, in the single ATP turnover assay, the DRX rate of the E525K S1 is >2-fold higher than the WT control, suggesting that the mutation may have effects beyond stabilization of the SRX state. The conclusion that the E525K mutation's effect on myosin function is mediated via stabilization of the SRX state would be strengthened if the effects of the mutation on the motor domain alone were also known.

      We thank the reviewer for this suggestion. We performed actin-activated ATPase assays with WT and E525K S1 and found that E525K increases kcat and lowers KATPase, demonstrating enhanced intrinsic motor activity in the mutant S1 construct (see page 4, Figure 2B). This adds an interesting dimension to the manuscript because we report a mutant that enhances the intrinsic motor activity but stabilizes the SRX/IHM (see Discussion page 10). We did not perform in vitro motility, because this assay depends on the surface attachment strategy, and we would like to compare all constructs with the same attachment strategy using a C-terminal GFP tag (mutant and WT S1 and 15HPZ HMM). Therefore, we are making the S1 construct with a C-terminal GFP tag for this purpose, to be examined in a future study.

      While the authors show strong qualitative correlations between the SRX and IHM states using single ATP turnover, FRET, and EM experiments, attempts to quantitatively compare the fraction of heads in the IHM state using the various experimental approaches is problematic. For example, the R0 value of the FRET pair used here doesn't allow precise measurement of the distances being probed here to be made, but the distances are reported and compared to predicted distances. The authors report that the R0 for their FRET pair is 63 Å. Surprisingly the authors go on to use the steady-state FRET efficiency values to determine the average D-A distance (Fig 5B) which is 100 Å when all heads are in the IHM configuration and becomes larger than that when heads open. R0 of 63 Å allows a precise distance measurement to be made in the 31.5-94.5 Å range which corresponds to 0.5-1.5 R0. It is therefore technically incorrect to use the steady-state FRET efficiency values to determine the D-A distance here. Besides, there are several unknown factors here like orientation factor (κ2) which further complicate these calculations. Similarly, the quantification of IHM state molecules from the negative stain EM experiments is significantly hampered by the disruptive effect of the grid surface on the structure of the IHM state. The authors find that limiting the contact time with the grid to ~ 5s is necessary to preserve the IHM state.

      Despite that, only ~15% WT molecules were seen in the IHM state at low salt (Fig. 6B). In contrast, ~56% E525K molecules were in the IHM state. Both these proteins have similar SRX proportions (Fig. 3C) and similar FRET efficiency values (Fig. 5A) at this salt concentration. This mismatch highlights the problem arising due to not having a measure of the populations from the FRET data. It is not clear if the hugely different proportions of the IHM state in EM experiments are indicative of the relative stability of this state in the two proteins or a random difference in the electrostatic interactions of WT vs mutant with the grid. These experiments do not provide a correct idea of the %IHM in the two proteins. In the absence of any IHM population measurement, it is important to proceed with caution when quantitatively correlating the SRX and IHM.

      We thank the reviewer for pointing out that measuring precise distances by FRET can be difficult. We agree that the low FRET efficiency makes precise distance determination even more challenging. However, FRET is quite good at measuring a change in distance given a specific donor-acceptor pair. We feel our FRET biosensor clearly demonstrates FRET efficiencies that are salt-insensitive in E525K but show a clear decrease at higher salt concentrations in WT. In order to compare the trend in the predicted FRET, based on the single turnover measurements, with the actual FRET, we thought it was important to plot the two together on the same graph. We understand that this could have given the misleading impression that we were reporting actual distances. We have now plotted the FRET efficiency instead of distance as a function of KCl concentration (Figure 5B), to prevent any confusion with reporting distances. In addition, we have emphasized that the data are plotted to allow for a comparison of the trend in the single turnover and FRET data (see pages 6 and 10, and Figure 5B).
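      To illustrate why distance estimates become imprecise at low FRET efficiency, the sketch below uses the standard Förster relation E = 1 / (1 + (r/R0)^6). The response does not state this relation explicitly, so it is assumed here. The R0 of 63 Å and the ~100 Å distance come from the reviewer's comment; the function names and the ±0.01 efficiency error are illustrative assumptions only.

```python
def fret_efficiency(r, r0):
    """Förster relation: transfer efficiency at donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(e, r0):
    """Inverse of the Förster relation: distance implied by efficiency e."""
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0 = 63.0  # Förster radius in Å, as quoted in the reviewer's comment

if __name__ == "__main__":
    # Efficiency at the edges of the 0.5-1.5 R0 range and at 100 Å.
    for r in (0.5 * R0, R0, 1.5 * R0, 100.0):
        print(f"r = {r:6.1f} Å  ->  E = {fret_efficiency(r, R0):.3f}")

    # Hypothetical +/-0.01 measurement error in E around the 100 Å case:
    # a small efficiency error maps to a comparatively large distance error.
    e_100 = fret_efficiency(100.0, R0)
    for e in (e_100 - 0.01, e_100, e_100 + 0.01):
        print(f"E = {e:.3f}  ->  r = {fret_distance(e, R0):6.1f} Å")
```

      Because the efficiency curve is nearly flat beyond about 1.5 R0, comparing efficiencies directly, as in the revised Figure 5B, avoids amplifying measurement error into the distance estimate.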

      We agree that it is important to proceed with caution when comparing the EM to the FRET and single turnover data. The EM does not give a quantitative estimate of the fraction of IHM molecules, due to the disruptive effect of the grid surface on protein conformation. However, it does provide direct (though qualitative) evidence that the conformation underlying SRX and enhanced FRET is the IHM, and it is consistent with our interpretation that the E525K mutation enhances FRET and SRX by stabilizing the IHM. To consolidate this result, we have now performed EM experiments with a total of 3 preparations of WT and mutant (see pages 6-7 and Figure 6D). We find that while there is variability from experiment to experiment, likely because the grid surface is slightly different each time the experiment is performed, in all cases there was a ~4-fold higher fraction of folded molecules in the mutant. Since each WT/mutant experimental pair was studied in parallel, using identically prepared grids, the results provide further evidence that the mutant stabilizes the IHM. However, we agree that a quantitative, direct visual correlation of the SRX and IHM is not possible based on the current EM data.

      Finally, the utility of the methods described in the paper to the field would be greatly enhanced if they were described in more detail. As currently written, it would be difficult for others to replicate these experiments.

      Thank you for the comment. We have made significant changes in the methods to clarify the details of the experiments (see pages 11-14). In addition, we have added details to the results and figure legends.

    1. Author Response

      Reviewer #1 (Public Review):

      “This study investigates the dynamics of brain network connectivity during sustained experimental pain in healthy human participants. To this end, capsaicin was applied to the tongues of two cohorts of participants (discovery cohort, N=48; replication cohort, N=74). This procedure resulted in pain for several minutes. During sustained pain, pain avoidance/intensity ratings and fMRI scans were obtained. The analyses (i) compare the pain state with a resting state, (ii) assess the dynamics of brain networks during sustained pain, and (iii) aim to predict pain based on the dynamics of brain networks. To this end, the analyses focus on community structures of time-evolving networks. The results show that sustained pain is associated with the emergence of a brain network including somatomotor, frontoparietal, basal ganglia and thalamic brain areas. The somatomotor area of the tongue is particularly involved in that network while this area is decoupled from other parts of the somatomotor cortex. Moreover, the network configuration changes over time with the frontoparietal network decoupling from the somatomotor network. Frontoparietal-cerebellar connections were predictive of decreases of pain. Together, the findings provide novel and convincing insights into the dynamics of brain network during sustained pain.

      Strengths

      • The brain mechanisms of sustained pain is a timely and relevant topic with potential clinical implications.

      • Assessing the dynamics of sustained pain and relating it to the dynamics of brain networks is a timely and promising approach to further the understanding of the brain mechanisms of pain.

      • The study includes discovery and replication cohorts and pursues a cutting-edge analysis strategy.

      • The manuscript is very well-written and the results are visualized in an exemplary manner including a graphical outline and summary of the findings.”

      We thank the reviewer for the thoughtful summarization and evaluation of our study.

      “Weaknesses

      • It remains unclear whether the changes of brain networks over time simply reflect the duration of sustained pain or whether they essentially reflect different levels of pain intensity/avoidance.”

      We appreciate the editor and reviewer’s comment on this issue. With the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because the delivery of oral capsaicin commonly induces initial bursting and then a gradual decrease of pain over time. That is, the pain duration is correlated with the pain intensity in our task.

      However, when we examined the time-course of the ratings at the individual level (as shown in Figure S2), the time duration explained 53.7% of the rating variance, R² = 0.537 ± 0.315 (mean ± standard deviation). In addition, if we constrain the beta coefficient of the time duration to be negative (i.e., ratings should decrease over time), the explained variance decreases to 48.2%, R² = 0.482 ± 0.457, leaving enough variance unexplained by time (i.e., greater than 50%) for examining the distinct effects of time duration and ratings on the patterns of functional brain reorganization.
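      As a rough illustration of the per-participant calculation described above, the sketch below regresses a rating time-course on elapsed time and reports R². It is not the authors' code: the function name is hypothetical, and handling the negative-slope constraint by falling back to an intercept-only fit (R² = 0) when the unconstrained slope is positive is an assumption about one way such a constraint could be applied.

```python
import numpy as np


def time_r2(ratings, constrain_negative=False):
    """R^2 of a linear fit of ratings on elapsed time (sample index).

    If constrain_negative is True and the least-squares slope is positive,
    the best non-positive slope is zero, so the fit reduces to the mean.
    Assumes the ratings are not constant (non-zero total variance).
    """
    ratings = np.asarray(ratings, dtype=float)
    t = np.arange(len(ratings), dtype=float)
    slope, intercept = np.polyfit(t, ratings, 1)
    if constrain_negative and slope > 0:
        slope, intercept = 0.0, ratings.mean()
    pred = slope * t + intercept
    ss_res = np.sum((ratings - pred) ** 2)
    ss_tot = np.sum((ratings - ratings.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up rating time-course: early burst, then gradual decrease.
    demo = [0.2, 0.9, 0.8, 0.7, 0.55, 0.5, 0.35, 0.3, 0.2, 0.1]
    r2 = time_r2(demo, constrain_negative=True)
    print(f"R^2 = {r2:.3f}, variance not explained by time = {1 - r2:.3f}")
```

      With the group mean reported above (R² = 0.482), the share of rating variance not explained by time is 1 − 0.482 = 0.518.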

      Indeed, the two main analyses included in the manuscript—consensus community detection and predictive modeling—were designed to examine those two aspects of the task, i.e., time duration and pain avoidance ratings, respectively. First, through the consensus community detection analysis, we examined the community structure that changes over time, i.e., across the early, middle, and late periods (as shown in Figure 3). We then developed predictive models of pain avoidance ratings in the second main analysis (as shown in Figure 5).

      Though it is still a caveat that we cannot fully dissociate the effects of time duration versus pain ratings, we could interpret the first set of results to be more about time duration, while the second set of results is more about pain ratings.

      We have now added a description of the implication of predictive modeling for isolating the effects of pain ratings. In addition, we added a discussion of this caveat of the current experimental design and of relevant future directions.

      Revisions to the main manuscript:

      p. 25: Moreover, developing models to directly predict the pain ratings is helpful to complement the group-level analysis, because the changes in consensus community structure over the early, middle, and late periods only indirectly reflect the different levels of pain.

      p. 27: This study also had some limitations. First, with the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because the delivery of oral capsaicin commonly induces initial bursting and then a gradual decrease of pain over time. Though we aimed to model the effects of pain duration and pain avoidance ratings with our two primary analyses, i.e., consensus community detection and predictive modeling, we cannot fully dissociate the impact of time duration versus pain ratings.

      “• Although the manuscript is very well-written it might benefit from an even clearer and simpler explanation of what the consensus community structure and the underlying module allegiance measure assesses.”

      Thank you for the suggestion. We have now added additional (but simple) descriptions of the module allegiance and consensus community detection methods.

      Revisions to the main manuscript:

      pp. 8-9: Here, the consensus community means the group-level representative structures of the distinct community partitions of individuals. To determine the consensus community across different individuals and times, we first obtained the module allegiance (Bassett et al., 2011) from the community assignment of each individual. Module allegiance assesses how much a pair of nodes is likely to be affiliated with the same community label, and is defined as a matrix T whose element Tij is 1 when nodes i and j are assigned to the same community and 0 when assigned to different communities. This conversion of the categorical community assignments to the continuous module allegiance values allows group-level summarization of different community structures of individuals.

      p. 14: Here, high module allegiance indicates the voxels of two regions are likely to be in the same community affiliation, and vice versa.
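      The definition of module allegiance quoted above can be sketched in a few lines. The binary matrix follows the stated definition (Tij is 1 when nodes i and j share a community label, 0 otherwise). Averaging the matrices across individuals to obtain a continuous group-level summary is an assumption about how the summarization is done, and the function names are hypothetical.

```python
import numpy as np


def module_allegiance(labels):
    """Binary allegiance matrix T for one community assignment.

    T[i, j] is 1.0 when nodes i and j carry the same community label
    and 0.0 otherwise.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def mean_allegiance(partitions):
    """Average allegiance over several partitions (e.g., individuals).

    Each entry becomes the fraction of partitions in which the two
    nodes were assigned to the same community.
    """
    return np.mean([module_allegiance(p) for p in partitions], axis=0)


if __name__ == "__main__":
    # Two made-up partitions of three nodes. The label values differ in
    # meaning between partitions, but allegiance only compares labels
    # within a partition, so no label matching is needed.
    partitions = [[0, 0, 1], [0, 1, 1]]
    print(mean_allegiance(partitions))
```

      Because allegiance depends only on whether two nodes share a label, it sidesteps the problem that community labels are arbitrary and not comparable across individuals.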

      “• The added value of the assessment of the dynamics of brain networks remains unclear. Specifically, it is unclear whether the current analysis of brain networks dynamics allows for a clearer distinction between and prediction of pain and no-pain states than other measures of static or dynamic brain activity or static measures of brain connectivity.”

      The main goal (and thus, the added value) of the current study was to provide a “mechanistic” understanding of the brain processes of sustained pain, rather than the “prediction.” Even though we included the results from the predictive modeling, as in Figures 4-6, our focus was more on the interpretation of the model to quantitatively examine the functional changes in the brain, not on the maximization of the prediction performance.

      Indeed, maximizing the prediction performance was the main goal of our previous study (Lee et al., 2021), in which we developed a predictive model of sustained pain based on the patterns of dynamic functional connectivity. The model showed better prediction performances compared to the current study, but it was challenging to interpret the model because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      In this sense, the multi-layer community detection method has several advantages for achieving our goal. First, the community detection analysis allows us to summarize the complex, high-dimensional whole-brain connectivity patterns into neurobiologically interpretable subsystems. Second, the multi-layer community detection method allows us to study the temporal changes in community structure by connecting the same nodes across different time points.

      We have now added a description of the rationale behind the choice of the multi-layer community detection analysis over the conventional functional connectivity methods, and of the added value of our study.

      Revisions to the main manuscript:

      p. 3: In this study, we examined the reconfiguration of whole-brain functional networks underlying the natural fluctuation in sustained pain to provide a mechanistic understanding of the brain responses to sustained pain.

      p. 7: In this study, we used this approach to examine the temporal changes of brain network structures during sustained pain, which cannot be done with conventional functional connectivity-based analyses (Lee et al., 2021).

      p. 27: However, the previous model provides a limited level of mechanistic understanding because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      Reviewer #2 (Public Review):

      “The Authors J-J Lee et al. investigated cortical and subcortical brain networks and their organization in communities over time during evoked tonic pain. The paper is well-written, and the findings are interesting and relevant for the field. Interestingly, other than confirming well known phenomena (e.g., segregation within the primary somatomotor cortex) the Authors identified an emerging "pain supersystem" during the initial increase of pain, in which subcortical and frontoparietal regions, usually more segregated, showed more interactions with the primary somatomotor cortex. Decrease of pain was instead associated with a reconfiguration of the networks that sees subcortical and frontoparietal regions connected with areas of the cerebellum. The main novelty of the proposed analysis lies in the resulting high performance of the classifier, which shows how this interesting link between the frontoparietal network and subcortical regions with the cerebellum is predictive of pain decrease. In summary, the main strengths of the present manuscript are:

      • Inclusion of subcortical regions: most of the recent papers using the Schaefer parcellation in ~200 brain areas1 do not consider subcortical areas, ignoring possible relevant responses and behaviors of those regions. Not only did the Authors smartly address this issue, but most of their results showed how subcortical regions played a key role in the networks' reconfiguration over time during evoked sustained pain.

      • Robust classification results: high accuracy obtained on training dataset (internal validation), using a leave-one-out approach, and on the available independent test dataset (external validation) of relatively large sample size (N=74).

      • Clarity in the description of aim and sub-aims and exhaustive presentation of the obtained results helped by appropriate illustrations and figures (I suggest less wording in some of them).

      • Availability of continuous behavioral outcome (track ball).”

      We appreciate the reviewer’s summary and positive evaluations.

      “Even though the results are mostly consistent with previous literature, some of the results need to be discussed in relationship to recently published papers on the same topic, and some of the non-standard methodological procedures need to be justified by adding appropriate citations (or more detailed comments). The Authors do not touch upon the concept of temporal summation of pain, historically associated with tonic pain, even though the study aims at a better understanding of brain mechanisms in chronic pain populations (chronic pain patients often exhibit increased temporal summation of pain2). I would suggest starting from the paper recently published by Cheng et al., which also shares most of the methodological pipeline3, to highlight similarities and novelties and deepen the comparison with the associated literature.”

      We thank the reviewer and editor for the comment on this important topic. Temporal summation of pain refers to a progressively increasing sensation of pain during prolonged noxious stimulation (Price, Hu, Dubner, & Gracely, 1977), and has been suggested as a hallmark of chronic pain disorders including fibromyalgia (Cheng et al., 2022; Price et al., 2002). In a recent study by Cheng et al. (2022), the authors induced tonic pain using constantly high cuff pressure and examined whether the participants experienced increased pain in the late period compared to the early period of pain. In contrast, in our experimental paradigm, the capsaicin liquid initially delivered into the oral cavity is gradually cleared by saliva, and thus overall pain intensity decreased over time rather than increased (Figure 1B). Therefore, the temporal summation of pain may occur in a limited period (e.g., the early period of the run), but it is difficult to examine its effect systematically in our study.

      However, it is notable that Cheng et al.’s results overlap with our findings. For example, Cheng et al. reported the intra-network segregation within the somatomotor network and the inter-network integration between the somatomotor and other networks during the temporal summation of pressure pain in patients with fibromyalgia, which were similar to the findings we reported in Figure S9 and Figure 4. Although it is unclear whether these results reflect the temporal summation of pain, these network-level features shared across the two studies are likely to be an essential component of the sustained pain processes in the brain.

      We have now added a comment on the temporal summation of pain in the main manuscript.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      We thank the reviewer and editor for the information about this recent publication. Cheng et al. (2022) was not published at the time we wrote the manuscript, and we were surprised that Cheng et al. shares many aspects with our study, e.g., both used multilayer community detection and also reported similar findings, as described above.

      However, there were some differences between the two studies as well.

      First, the focus of our study was on the brain dynamics during the natural time-course of sustained pain from its initiation to remission in healthy participants, whereas the focus of Cheng et al. was on the temporal summation of pain (TSP) and the enhanced TSP in patients with fibromyalgia. Because of this difference in research focus, our study and Cheng et al. provide many nonoverlapping results and insights. For example, our study paid particular attention to the coping mechanisms of the brain (e.g., the network-level changes in the subcortical and frontoparietal network regions) and the brain systems that are correlated with the natural decrease of pain (e.g., the cerebellum in Figure 5). In contrast, Cheng et al. (2022) identified the brain connectivity and network features important for the increased TSP in fibromyalgia patients.

      Second, our great interest was in identifying and visualizing the fine-grained spatiotemporal patterns of functional brain network changes over the period of sustained pain. To utilize fine-grained brain activity information, we conducted our main analyses at a voxel-level resolution and on the native brain space, such as in Figures 2-3 and Figures S5, S7, and S8. With this fine-grained spatiotemporal mapping, we were able to identify small, but important voxel-level dynamics.

      We have now cited Cheng et al. (2022) in multiple places and revised the manuscript accordingly.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Here are the main significant weaknesses of the study:

      • The data analysis is entirely conducted on young healthy subjects. This is not a limitation per se, but the conclusion about offering new insights into understanding mechanisms at the basis of chronic pain is too far from the results. Centralization of pain is very different from summation and habituation, especially if all the subjects in the study consistently rated increased and decreased pain in the same way (it never happens in chronic pain patients). A similar pipeline has been actually applied to chronic pain patients (fibromyalgia and chronic back pain)3,4. Discussing the results of the present paper in relationship to those, could offer a more robust way to connect the Authors' results to networks behavior in pathological brains.”

      We are grateful for the opportunity to discuss the clinical implication of our study. First of all, we agree with the reviewer and editor that we cannot make a definitive claim about chronic pain with the current study, and thus, we revised the last sentence of the abstract to tone down our claim.

      Revisions to the main manuscript (p. 2, in the abstract):

      This study provides new insights into how multiple brain systems dynamically interact to construct and modulate pain experience, advancing our mechanistic understanding of sustained pain.

      However, as we noted above in E-4, some of our findings were consistent with the findings from a previous clinical study (Cheng et al., 2022), suggesting the potential to generalize our study to clinical pain conditions. In addition, we previously reported that a predictive model of sustained pain derived from healthy participants performed better at predicting the pain severity of chronic pain patients than the model derived directly from chronic pain patients (Lee et al., 2021), highlighting the advantage of the “component process approach.”

      The component process approach aims to develop brain-based biomarkers for basic component processes first, which can then serve as intermediate features for the modeling of multiple clinical conditions (Woo, Chang, Lindquist, & Wager, 2017). This has been one of the core ideas of the Research Domain Criteria (RDoC) (Insel et al., 2010) and the Hierarchical Taxonomy of Psychopathology (HiTOP) (Kotov et al., 2017). If the clinical pain of a patient group is modeled as a whole, it becomes unclear what is being modeled because of the multidimensional and heterogeneous nature of clinical pain (Melzack, 1999) as well as other co-occurring health conditions (e.g., mental health issues, medication use, etc.). The component process approach, in contrast, can specify which components are being modeled and are relatively free from heterogeneity and comorbidity issues by experimentally manipulating the specific component of interest in healthy participants.

      The current study was conducted on healthy young adults based on the component process approach. We used oral capsaicin to experimentally induce sustained pain, which unfolds over protracted time periods and has been suggested to reflect some of the essential features of clinical pain (Rainville, Feine, Bushnell, & Duncan, 1992; Stohler & Kowalski, 1999). Therefore, the detailed characterization of the brain processes of sustained pain will be able to serve as an intermediate feature of multiple clinical conditions in future studies.

      We have now added a discussion of the clinical generalizability issue to the discussion section.

      Revisions to the main manuscript:

      pp. 25-26: An interesting future direction would be to examine whether the current results can be generalized to clinical pain. Experimental tonic pain has been known to share similar characteristics with clinical pain (Rainville et al., 1992; Stohler & Kowalski, 1999). In addition, in a recent study, we showed that an fMRI connectivity-based signature for capsaicin-induced orofacial tonic pain can be generalized to chronic back pain (Lee et al., 2021). Therefore, a detailed characterization of the brain responses to sustained pain has the potential to provide useful information about clinical pain.

      p. 26: Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Vice versa, the behavioral measure used to assess evoked pain perception (avoidance ratings), has been developed for chronic pain patients and never validated on healthy controls5. It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects6,7.”

      We acknowledge that pain avoidance measures are not fully validated in the healthy population. Nevertheless, we used this measure in this study for the following two main reasons that outweigh the limitations.

      First, a pain avoidance rating provides an integrative measure that can reflect the multi-dimensional aspects of sustained pain. One of the essential functions of pain is to help avoid harmful situations and promote survival, and the avoidance motivation induced by pain comprises not only sensory-discriminative components but also cognitive components, including learning, valuation, and context (Melzack, 1999). According to the fear-avoidance model (Vlaeyen & Linton, 2012), if the pain-induced avoidance motivation is not resolved for a long time and is maladaptively associated with innocuous environments, chronic pain is likely to develop, suggesting the importance and clinical relevance of pain avoidance measures. In addition, our experimental design is particularly suitable for the use of avoidance ratings because the oral capsaicin stimulation is accompanied by the urge to avoid the painful sensation, but the sensation cannot be resolved immediately, similar to chronic pain. Moreover, capsaicin is sometimes experienced as intense but less aversive (or even appetitive), e.g., by people who crave spicy food (Stevenson & Yeomans, 1993). In this case, avoidance ratings can provide a more reasonable measure of pain than intensity ratings.

      Second, the avoidance measure provides a common scale on which we can compare different types of aversive experiences, allowing us to conduct specificity tests for a predictive model of pain. For example, a recent study successfully compared the brain representations of two types of pain and two types of aversive, but non-painful experiences (e.g., aversive auditory and visual experiences) using the same avoidance measure (Ceko, Kragel, Woo, Lopez-Sola, & Wager, 2022). These comparisons were possible because the avoidance measure provided one common scale for all the aversive experiences regardless of their types of stimuli.

      To provide a better justification for the use of the avoidance measure, we now included the specificity test results of our pain predictive models. More specifically, we tested our module allegiance-based SVM and PCR models of pain on the aversive taste and aversive odor conditions (Figure S13).

      Despite these advantages, the use of avoidance rating without thorough validation is a limitation of the current study, and thus future studies need to examine the psychometric properties of the avoidance rating, e.g., examining the relationship among pain intensity, unpleasantness, and avoidance measures. However, the current study showed that the predictive models derived with pain avoidance rating (Study 1) could be used to predict the pain intensity rating (Study 2). In addition, the overall time-course of pain avoidance ratings in Study 1 was similar to the time-course of pain intensity ratings in Study 2, providing some supporting evidence for the convergent validity of the pain avoidance measure.

      As to the following comment, “It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects,” there are pieces of evidence supporting that the low between-individual variability of ratings is due to the characteristics of our experimental design, not to the fact that we used the avoidance measure. As we discussed in more detail in our response to E-1, our experimental procedure based on capsaicin liquid commonly induces the initial burst of painful sensation and the subsequent gradual relief for most of the participants (Figure 1B, left). A similar time-course pattern of ratings was observed in Study 2 (Figure 1B, right), which used the pain “intensity” rating, not the pain avoidance rating. In addition, previous studies with a similar experimental design (i.e., intra-oral capsaicin application) (Berry & Simons, 2020; Lu, Baad-Hansen, List, Zhang, & Svensson, 2013; Ngom, Dubray, Woda, & Dallel, 2001) also showed a similar time-course of pain ratings with low between-individual variability regardless of the rating types (e.g., VAS or irritation intensity), confirming that this observation is not unique to the pain avoidance rating.

      We have now added descriptions of the small between-individual variability of pain ratings and the use of avoidance ratings.

      Revisions to the main manuscript:

      pp. 5-7: Note that the overall trend of pain ratings over time was similar across participants because of the characteristics of our experimental design, which has also been observed in previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001). However, also note that each individual’s time-course of pain ratings was not entirely the same (Figures S2 and S3).

      p. 26: However, there are also differences between the characteristics of capsaicin-induced tonic pain versus clinical pain. For example, clinical pain continuously fluctuates over time in an idiosyncratic pattern (Apkarian, Krauss, Fredrickson, & Szeverenyi, 2001), whereas capsaicin-induced tonic pain showed a similar time-course pattern across the participants—i.e., increasing rapidly and then decreasing gradually (Figure 1B). This typical time-course of pain ratings has been reported in previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001).

      pp. 26-27: Note that Study 1 used a pain avoidance measure that is not yet fully validated in healthy participants. However, we chose to use the pain avoidance measure, which can provide integrative information on the multi-dimensional aspects of pain (Melzack, 1999; Waddell, Newton, Henderson, Somerville, & Main, 1993). It also has a clinical implication considering that the maladaptive associations of pain avoidance to innocuous environments have been suggested as a putative mechanism of transition to chronic pain (Vlaeyen & Linton, 2012). Lastly, the avoidance measure can provide a common scale across different modalities of aversive experience, allowing us to compare their distinct brain representations (Ceko et al., 2022) or test the specificity of their predictive models (Lee et al., 2021) (Figure S13). Although the psychometric properties of the pain avoidance measure should be a topic of future investigation, we expect that the pain avoidance measure would have a high level of convergent validity with pain intensity given the observed similarity between pain avoidance (Study 1) and pain intensity (Study 2) in their temporal profiles. The generalizability of our PCR model across Studies 1 and 2 also supports this speculation. However, there would also be situations in which pain avoidance is dissociated from pain intensity. For example, capsaicin can be experienced to be intense but less aversive or even appetitive in some contexts, such as cravings for spicy food (Stevenson & Yeomans, 1993). In addition, the gradual rise of avoidance ratings during the late period of the control condition in Study 1 would not be observed if the intensity measure was used. Future studies need to examine the relationship between pain avoidance and the other pain assessments and the advantage of using the pain avoidance measure.

      “• The dynamic measure employed by the Authors is better described by the term "windowed functional connectivity". It is often considered a measure of dynamic functional connectivity and it gives information about fluctuations of the connectivity patterns over time. Nevertheless, the entire focus of the paper, including the title, is on dynamic networks, which inaccurately leads one to think of time-varying measures with higher temporal resolution (either updating for every acquired time point, as the Authors did in their previous publication on the same dataset [4], or sliding windows involving weighting or tapering [8,9]). This allows one to follow network reorganization over time without averaging 2-min intervals in which several different brain mechanisms might play an important role [3,10,11]. In summary, the assumption of constant response throughout 2-min periods of tonic pain and the use of Pearson correlations do not mirror the idea of dynamic analysis expressed by the Authors in title and introduction. I would suggest removing "dynamic" from the title, reducing the emphasis on this concept, addressing possible confounds introduced by the choice of long windows and rephrasing the aim of the study in terms of brain network reconfiguration over the main phases of tonic pain experience.”

      We have now removed the word ‘dynamic’ from many places in the manuscript, including the title. In addition, we added a brief discussion of why we chose long, non-overlapping windows for the connectivity calculation.

      Revisions to the main manuscript (p. 8):

      Although the long duration of the time window without overlaps may obscure the fine-grained temporal dynamics in functional connectivity patterns, we chose to use this long time window based on previous literature (Bassett et al., 2011; Robinson, Atlas, & Wager, 2015), which also used long time windows to obtain more reliable estimates of network structures and their transitions.
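
      As an illustration of the windowing idea only (not the authors' pipeline), the sketch below computes a Pearson correlation between two time series inside consecutive, non-overlapping windows. The function names, the toy signals, and the window length in samples are assumptions made for this example; the manuscript's actual window corresponds to 2 minutes of data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def windowed_connectivity(ts_a, ts_b, window_len):
    """Correlate two time series within consecutive, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = len(ts_a) // window_len
    return [
        pearson(ts_a[w * window_len:(w + 1) * window_len],
                ts_b[w * window_len:(w + 1) * window_len])
        for w in range(n_windows)
    ]

# Toy signals: positively coupled in the first window, negatively in the second.
a = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
b = [0, 2, 4, 6, 8, 10, 5, 4, 3, 2, 1, 0]
fc = windowed_connectivity(a, b, window_len=6)
```

      Each window yields a single connectivity estimate, so any change that happens inside a window is averaged out, which is the trade-off between reliability and temporal resolution discussed above.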

      “• Procedure chosen for evoking sustained pain. To the best of my knowledge, capsaicin sauce on the tongue is not a validated tonic pain procedure. In favor of this argument is the absence of inter-subject variability in the behavioral results shown in the paper, very unusual for response to painful stimulations. The procedure is well described by the Authors, and some precautions, like letting the liquid dry before the start of the scan, have helped reduce confounds. Despite this, the measures in figure 1B suggest that the intensity of the painful stimulation is not constant as expected for sustained pain (probably the effect washes out with the saliva). In this case, the first six-minute interval requires particular attention because it encapsulates the real tonic pain phase, and the following ones require more appropriate labels. Ideally the Authors should cite previous studies showing that tongue evoked pain elicits a very specific behavioral response (summation, habituation/decrease of pain, absence of pain perception). If those works are missing, this response needs to be treated as a finding rather than an obvious point.”

      We addressed this comment. Moreover, we found previous studies that experimentally induced tonic pain through the application of capsaicin on the tongue (Berry & Simons, 2020; Boudreau, Wang, Svensson, Sessle, & Arendt-Nielsen, 2009; Green, 1991; Ngom et al., 2001), suggesting that our experimental procedure is in line with previous literature.

      Reviewer #3 (Public Review):

      “In their manuscript, Lee and colleagues explore the dynamics of the functional community structure of the brain (as measured with fMRI) during sustained experimental pain and provide several potentially highly valuable insights into, and evaluate the predictive capacity of, the underlying dynamic processes. The applied methodology is novel but, at the same time, straightforward and has solid foundations. The findings are very interesting and, potentially, of high scientific impact as they may significantly push the boundaries of our understanding of the dynamic neural processes during sustained pain, with a (somewhat limited) potential for clinical translation.

      However (Major Issue 1), after reading the current manuscript version, not all of my doubts have been dissolved regarding the specificity of the results to pain. Moreover (Major Issue 2), some of the results (specifically, those related to the group level analysis of community differences) do not seem to be underpinned with a proper statistical inference in the current version of the manuscript and, therefore, their presentation and discussion may not be proportional to the degree of evidence. Next to these Major Issues (detailed below), some other, minor clarifications might also be needed before publication. These are detailed below or in the private part of the review ("Recommendations for the authors").

      Despite these issues, this is, in general, a high quality work with a high level of novelty and - after addressing the issues - it has a very high potential for becoming an important contribution (and a very interesting read) to the pain-research community and beyond.”

      We appreciate the reviewer’s thoughtful comments. We have revised the manuscript to address the Reviewer’s major concerns, as described below.

      “Major Issue 1:

      The main issue with the manuscript is that it remains somewhat unclear, how specific the results are to pain.

      Differences between the control resting state and the capsaicin trials might be - at least partially - driven by other factors, like:

      • motion artifacts

      • saliency, attention, anxiety, etc.

      Differences between stages over the time-course might, additionally, be driven by scanner drifts (to which the applied approach might be less sensitive, but the possibility is still there) or other gradual processes, e.g. shifts in arousal, attention shifts, alertness, etc.

      All the above factors might emerge as confounding bias in both of the predictive models.

      This problem should be thoroughly discussed, and at least the following extra analyses are recommended, in order to attenuate concerns related to the overall specificity and neurobiological validity of the results:

      • reporting of, and testing for motion estimates (mean, max, median framewise displacement or anything similar)

      • examining whether these factors might, at least partially, drive the predictive models.

      • e.g. applying the PCR model on the resting state data and verifying that the predicted timecourse is flat (no inverse U-shape, which is characteristic of all capsaicin trials).

      Not using the additional sessions (bitter taste, aversive odor, phasic heat) feels like a missed opportunity, as they could also be very helpful in addressing this issue.”

      We thank the reviewer for this comment on the important issue regarding the specificity of our results and the potential influences of noise. The effects of head motion and physiological confounds are particularly relevant to pain studies because pain involves substantial physiological changes and often causes head motion. To address the related concerns of specificity, we conducted additional analyses assessing the independence of our predictive models (i.e., SVM and PCR models) from head movement and physiology variables and the specificity of our models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1.

      First, we examined the overall changes of framewise displacement (FD) (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012), heart rate (HR), and respiratory rate (RR) in the capsaicin condition (Figure S11). For the univariate comparison between the capsaicin vs. control conditions (Figure S11A), the results showed that, as expected, the capsaicin condition caused significant changes in head motion and autonomic responses. The mean FD and HR were significantly higher, and the RR was lower in the capsaicin condition compared to the control condition (FD: t(47) = 5.30, P = 2.98 × 10^-6; HR: t(43) = 4.98, P = 1.10 × 10^-5; RR: t(43) = -1.91, P = 0.063, paired t-test). In addition, the increased motion and autonomic responses were more prominent in the early period of pain (Figure S11B). The 10-binned (2 mins per time-bin) FD and HR showed a decreasing trend while the RR showed an increasing trend over time in the capsaicin condition. The comparisons between the early (1-3 bins, 0-6 min) vs. late (8-10 bins, 14-20 min) periods of the capsaicin condition showed significant differences both for FD and HR (FD: t(47) = 6.45, P = 8.12 × 10^-8; HR: t(43) = 6.52, P = 6.41 × 10^-8; RR: t(43) = -1.61, P = 0.11, paired t-test). These results suggest that while participants were experiencing capsaicin tonic pain, particularly during the early period, head motion and heart rate were increased, while breathing was slowed down. Note that we needed to exclude 4 participants’ data in this analysis due to technical issues with the physiological data acquisition.
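
      For readers unfamiliar with these measures, the following is a minimal sketch of how framewise displacement in the sense of Power et al. (2012) can be computed, averaged into time-bins, and compared with a paired t statistic. It is not the authors' code. It assumes six realignment parameters per volume with translations in mm and rotations in radians, converts rotations to mm on a sphere of 50 mm radius, and uses hypothetical function names and toy numbers.

```python
import math
import statistics

def framewise_displacement(motion, radius_mm=50.0):
    """FD per volume from rows of [tx, ty, tz, rx, ry, rz].

    Assumes translations in mm and rotations in radians; rotations are
    converted to arc length on a sphere of radius `radius_mm`.
    The first volume has no predecessor, so its FD is set to 0.
    """
    fd = [0.0]
    for prev, curr in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rot = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], curr[3:]))
        fd.append(trans + rot)
    return fd

def bin_means(values, n_bins):
    """Average a series within n_bins consecutive, equally sized bins."""
    size = len(values) // n_bins
    return [sum(values[i * size:(i + 1) * size]) / size for i in range(n_bins)]

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    d = [a - b for a, b in zip(x, y)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

# Toy realignment parameters for four volumes.
motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.002, 0.0, 0.0],
    [0.1, 0.2, 0.0, 0.002, 0.0, 0.0],
]
fd = framewise_displacement(motion)
fd_bins = bin_means(fd, n_bins=2)
t_stat, dof = paired_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

      The degrees of freedom returned by `paired_t` equal the number of participants minus one, which is why the FD tests above are reported with 47 and the HR and RR tests with 43.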

      Next, we examined whether the changes in head motion and physiological responses influenced our predictive model performance (Figure S12). We first regressed out the mean FD, HR, and RR (concatenated across conditions and participants as we trained the SVM model) from the predicted values of the SVM model with leave-one-subject-out cross-validation (2 conditions × 44 participants = 88) and then calculated the classification accuracy again (Figure S12A). The results showed that the SVM model showed a reduced, but still significant classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10^-7, binomial test, two-tailed). We also did the same analysis for the PCR model (10 time-bins × 44 participants = 440) and the PCR model also showed a significant prediction performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed, mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) (Figure S12B). These results suggest that our SVM and PCR models capture unique variance in tonic pain above and beyond the head movement and physiological changes.
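
      The control analysis described above can be sketched as three steps: residualize the model outputs against the confounds, run a forced-choice comparison within participants, and test the resulting accuracy with an exact binomial test. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the regression is an ordinary least-squares fit with an intercept, and the count of 39 correct out of 44 is inferred from the reported 89% rather than stated in the text.

```python
from math import comb

def residualize(y, confounds):
    """Residuals of y after OLS regression on an intercept plus confounds.

    `confounds` is a list of columns, each as long as y. Collinear
    confounds are not handled and would cause a division by zero.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in confounds]
    k = len(cols)
    # Augmented normal equations: (X'X | X'y).
    aug = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           + [sum(cols[i][t] * y[t] for t in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return [y[t] - sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]

def forced_choice_accuracy(scores_a, scores_b):
    """Count and fraction of participants whose A score exceeds their B score."""
    correct = sum(a > b for a, b in zip(scores_a, scores_b))
    return correct, correct / len(scores_a)

def binomial_two_tailed(k, n):
    """Exact two-tailed binomial P-value under a chance level of 0.5."""
    tail = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

resid = residualize([1.0, 2.0, 4.0], [[0.0, 1.0, 2.0]])
n_correct, acc = forced_choice_accuracy([1.0, 2.0, 3.0], [0.0, 3.0, 1.0])
p = binomial_two_tailed(39, 44)  # 39/44 is an inferred count, about 89%
```

      With 39 of 44 participants classified correctly, the exact two-tailed P-value is about 1.4 × 10^-7, close to the value reported above.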

      Lastly, we examined the specificity of our predictive models to pain, by testing the models on the non-painful but aversive conditions including the bitter taste (induced by quinine) and aversive odor (induced by fermented skate) conditions (Figure S13). All the model responses were obtained using leave-one-participant-out cross-validation. The results showed that the overall model responses of the SVM model for the bitter taste and aversive odor conditions were higher than those for the control condition but lower than those for the capsaicin condition (Figure S13A). Classification accuracies for comparing capsaicin vs. bitter taste and capsaicin vs. aversive odor were both significant (for capsaicin vs. bitter taste, accuracy = 79%, P = 6.17 × 10^-5, binomial test, two-tailed, Figure S13C; for capsaicin vs. aversive odor, accuracy = 83%, P = 3.31 × 10^-6, binomial test, two-tailed, Figure S13E), supporting the specificity of our SVM model of pain. Similarly, the model responses of the PCR model for the bitter taste and aversive odor conditions were lower than those for the capsaicin condition, and their temporal trajectories were less steep and fluctuating compared to the capsaicin condition (Figure S13B). The time-course of the model responses for the control condition was flatter than all other conditions and did not show the inverted U-shape. Furthermore, the model responses of the bitter taste and aversive odor conditions did not show significant correlations with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of the PCR model to pain.
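
      The "mean prediction-outcome correlation" with a bootstrap test can be illustrated as follows. The exact bootstrap procedure is not described in this response, so the sketch assumes one common variant: compute one correlation per participant, resample participants with replacement, and take twice the smaller tail proportion of bootstrap means on either side of zero. Function names, the number of resamples, and the toy data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_within_subject_r(predicted, observed):
    """Correlate predictions with ratings per participant, then average."""
    rs = [pearson(p, o) for p, o in zip(predicted, observed)]
    return sum(rs) / len(rs), rs

def bootstrap_p(values, n_boot=5000, seed=0):
    """Two-tailed bootstrap P-value for the mean of `values` against zero.

    Assumed variant: resample participants with replacement and double the
    smaller tail proportion. Resolution is limited to 1 / n_boot.
    """
    rng = random.Random(seed)
    n = len(values)
    means = [sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)]
    below = sum(m <= 0 for m in means) / n_boot
    above = sum(m >= 0 for m in means) / n_boot
    return min(1.0, 2 * min(below, above))

# Two toy participants: one perfectly tracked, one perfectly anti-tracked.
mean_r, rs = mean_within_subject_r([[1, 2, 3], [1, 2, 3]], [[2, 4, 6], [3, 2, 1]])
p_consistent = bootstrap_p([0.2, 0.3, 0.4, 0.25])
p_null = bootstrap_p([-0.3, 0.3, -0.1, 0.1])
```

      Because the correlation is computed within each participant before averaging, a model can only score well if it tracks the time-course of ratings inside individuals, not merely differences between individuals.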

      Overall, we have provided evidence that our models can predict pain ratings above and beyond the head motion and physiological changes and that the models are more responsive to pain compared to non-painful aversive conditions.

      We have now added descriptions of the specificity tests to the main manuscript and also to the Supplementary Information.

      Revisions to the main manuscript (p. 20):

      Specificity of the module allegiance-based predictive models To examine whether the predictive models were specific to pain and the prediction performances were not influenced by confounding variables such as head motion and physiological changes, we conducted additional analyses as shown in Figures S11-13. The SVM and PCR models showed significant prediction performances even after controlling for head motion (i.e., framewise displacement) and physiological responses (i.e., heart rate and respiratory rate) (Figures S11 and S12) and did not respond to the non-painful but aversive conditions including the bitter taste and aversive odor conditions (Figure S13), supporting the specificity of our predictive models to pain. For details, please see Supplementary Results.

      Revisions to the Supplementary Information (pp. 2-4):

      Specificity analysis (Figures S11-13) To examine whether the predictive models (i.e., SVM and PCR models) were specific to pain and not influenced by confounding noise, we conducted an additional specificity analysis assessing the independence of the models from head movement and physiology variables and the specificity of our models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1. First, we examined the overall changes of framewise displacement (FD) (Power et al., 2012), heart rate (HR), and respiratory rate (RR) in sustained pain (Figure S11). For the univariate comparison between the capsaicin vs. control conditions (Figure S11A), the results showed that, as expected, the capsaicin condition caused significant changes in motion and autonomic responses. The mean FD and HR were significantly higher, and the RR was lower in the capsaicin condition compared to the control condition (FD: t(47) = 5.30, P = 2.98 × 10^-6; HR: t(43) = 4.98, P = 1.10 × 10^-5; RR: t(43) = -1.91, P = 0.063, paired t-test). For the temporal changes of movement and physiology variables (Figure S11B), the results showed that the increased motion and autonomic responses were more prominent in the early period of pain. The 10-binned (2 mins per time-bin) FD and HR showed a decreasing trend while the RR showed an increasing trend over time in the capsaicin condition. Additional univariate comparisons between the early (1-3 bins, 0-6 min) vs. late (8-10 bins, 14-20 min) periods of the capsaicin condition showed that differences were significant for FD and HR (FD: t(47) = 6.45, P = 8.12 × 10^-8; HR: t(43) = 6.52, P = 6.41 × 10^-8; RR: t(43) = -1.61, P = 0.11, paired t-test). This suggests that while participants were experiencing tonic pain, particularly in the early period, motion and heart rate were increased but breathing was slowed. Note that we needed to exclude 4 participants’ data due to technical issues with physiological data acquisition.
Next, we examined whether the head movement and physiological responses were the main driver of our predictive models (Figure S12). For all the original signature responses from the SVM model (2 conditions × 44 participants = 88), we regressed out the mean FD, HR, and RR (concatenated across conditions and participants as the SVM model was trained) and calculated the classification accuracy (Figure S12A). Although the signature responses were controlled for movement and physiology variables, the SVM model still showed a high classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10^-7, binomial test, two-tailed). Similarly, for all the original signature responses from the PCR model (10 time-bins × 44 participants = 440), we regressed out the 10-binned FD, HR, and RR (concatenated across time-bins and participants as the PCR model was trained) and calculated the within-individual prediction-outcome correlation (Figure S12B). Again, the PCR model showed a significantly high predictive performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed, mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) while controlling for movement and physiology variables. These results suggest that our SVM and PCR models capture unique variance in tonic pain above and beyond the head movement and physiological changes. Lastly, we examined the specificity of our predictive models to pain, by testing the models on the non-painful but tonic aversive conditions including bitter taste (induced by quinine) and aversive odor (induced by fermented skate) (Figure S13). All the signature responses were obtained using leave-one-participant-out cross-validation. The results showed that the overall signature responses of the SVM model for the bitter taste and aversive odor conditions were higher than those for the control condition, but lower than those for the capsaicin condition (Figure S13A).
Classification accuracies for capsaicin vs. bitter taste and capsaicin vs. aversive odor were both significantly high (capsaicin vs. bitter taste: accuracy = 79%, P = 6.17 × 10^-5, binomial test, two-tailed, Figure S13C; capsaicin vs. aversive odor: accuracy = 83%, P = 3.31 × 10^-6, binomial test, two-tailed, Figure S13E), suggesting the specificity of the SVM model to pain. Similarly, the temporal trajectories of the signature responses of the PCR model for the bitter taste and aversive odor conditions did not overlap with that of the capsaicin condition (Figure S13B). Furthermore, the signature responses of the bitter taste and aversive odor conditions did not have a significant relationship with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of the PCR model to pain. Overall, we have provided evidence that the module allegiance-based models can predict pain ratings above and beyond the movement and physiological changes, and are more responsive to pain compared to non-painful aversive conditions, which suggests the specificity of our results to pain.

      “Major Issue 2:

      Another important issue with the manuscript is the (apparent) lack of statistical inference when analyzing the differences in the group-level consensus community structures (both when comparing capsaicin to control and when analysing changes over the time-course of the capsaicin-challenge).

      Although I agree that the observed changes seem biologically plausible and fit very well to previous results, without proper statistical inference we can't determine, how likely such differences are to emerge just by chance.

      This makes all results on Figs. 2 and 3, and points 1, 4 and 5 in the discussion partially or fully speculative or weakly underpinned, comprising a large proportion of the current version of the manuscript.

      Let me note, that this issue only affects part of the results and the remaining - more solid - results may already provide a substantial scientific contribution (which might already be sufficient to be eligible for publication in eLife, in my opinion).

      Therefore I see two main ways of handling Major Issue 2:

      • enhancing (or clarifying potential misunderstandings regarding) the methodology (see my concrete, and hopefully feasible, suggestions in the "private part" of the review),

      • de-weighting the presentation and the discussion of the related results.

      I believe there are many ways to test the significance of these differences. I highlight two possible, permutation testing-based ideas.

      Idea 1: permuting the labels ctr-capsaicin, or early-mid-late, repeating the analysis, constructing the proper null distribution of e.g. the community size changes and obtain the p-values. Idea 2: "trace back" communities to the individual level and do (nonparametric) statistical inference there.”

      We appreciate this important comment. We did not conduct statistical inference when comparing the group-level consensus community affiliations of the different conditions (Figure 2) or different phases (Figure 3) because of the difficulty in matching the community affiliation values of the networks to be compared.

      For example, let us assume that 800 out of 1,000 voxels of community #1 and 1,000 out of 4,000 voxels of community #2 in the control condition are commonly affiliated with the same community #3 in the capsaicin condition. To compare the community affiliation between the two conditions, we should first match the community label of the capsaicin condition (i.e., #3) to that of the control condition (i.e., #1 or #2), and here a dilemma occurs: if we prioritize the proportion of overlapping voxels for the matching, the common community should be labeled as #1, whereas if we prioritize the number of overlapping voxels, the label of the common community should be #2. Although both choices look reasonable, neither is a perfect solution.
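
      The dilemma in this example can be made concrete with a few lines of arithmetic. The variable names below are hypothetical; the numbers are those given in the example.

```python
# Voxels of each control community that fall into capsaicin community #3.
overlap = {1: 800, 2: 1000}
# Total size of each control community.
size = {1: 1000, 2: 4000}

# Criterion A: match to the control community with the largest overlap proportion.
proportion = {c: overlap[c] / size[c] for c in overlap}  # {1: 0.8, 2: 0.25}
match_by_proportion = max(proportion, key=proportion.get)

# Criterion B: match to the control community with the largest overlap count.
match_by_count = max(overlap, key=overlap.get)
```

      The two criteria return different labels (#1 by proportion, #2 by count), so any label-based comparison depends on which criterion is chosen.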

      As in the example above, it is impossible to exactly match the community affiliations of different networks. We must choose an imperfect criterion for the matching procedure, which essentially affects the comparison of network structure. This was the main reason that we limited our results of Figures 2-3 to a qualitative description based on visual inspection. Moreover, the group-level consensus community structures in Figures 2-3 are not a simple group statistic like a sample mean; they were obtained from multiple steps of analyses including permutation-based thresholding and unsupervised clustering, which could further complicate the interpretation of statistical tests.

      Alternatively, there is a slightly different but more rigorous approach to the comparisons of the community structures, which is the Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). Instead of directly using the community labels, this method converts the community label of each voxel into a list of module allegiance values between the seed voxel and all the voxels of the brain (i.e., 1 if the seed and target voxels have the same community label and 0 otherwise). This allows quantitative comparisons of voxel-level community profiles between different conditions without arbitrary matching of the community labels. We adopted this Phi-test for our analyses to examine whether the regional community affiliation pattern is significantly different between (i) the capsaicin vs. control conditions and (ii) the early vs. late periods of pain (Figure S6), which correspond to the main findings of Figures 2 and 3 in our manuscript, respectively.

      More specifically, to compare the group-level consensus community structures between the capsaicin vs. control conditions and the early vs. late periods, we first obtained a seed-based module allegiance map for each voxel (i.e., using each voxel as a seed). Then, we calculated a correlation coefficient of the module allegiance values between two different conditions for each voxel. This correlation coefficient can serve as an estimate of the voxel-level similarity of the consensus community profile. Because module allegiance is a binary variable, these correlation values are Phi coefficients. A small Phi coefficient means that the spatial pattern of brain regions that have the same community affiliation as the given voxel is different between the two conditions. For example, if a voxel is connected to the somatomotor-dominant community during the capsaicin condition and the default-mode-dominant community during the control condition, the brain regions that have the same community label as the voxel will be very different, and thus the Phi coefficient will become small. Moreover, the Phi coefficient can be small even if a voxel is affiliated with the same (matched) community label in both conditions, when the spatial pattern of that community is different between conditions.
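
      A minimal sketch of the seed-based Phi coefficient described above is given below. It is an illustration under simple assumptions, not the authors' code: community structures are plain lists of labels (one per voxel), function names are hypothetical, and the seed voxel's allegiance with itself is included in the vector.

```python
import math

def allegiance_vector(labels, seed):
    """1 where a voxel shares the seed's community label, 0 otherwise."""
    return [1 if lab == labels[seed] else 0 for lab in labels]

def phi_coefficient(u, v):
    """Pearson correlation of two binary vectors (the Phi coefficient)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return float("nan")  # undefined when one community spans all voxels
    return cov / math.sqrt(var_u * var_v)

def seed_phi(labels_a, labels_b, seed):
    """Similarity of a seed voxel's community profile across two partitions."""
    return phi_coefficient(allegiance_vector(labels_a, seed),
                           allegiance_vector(labels_b, seed))

control = [1, 1, 1, 2, 2, 2]
relabeled = [5, 5, 5, 9, 9, 9]   # same partition, different label names
capsaicin = [1, 1, 2, 2, 2, 2]   # voxel 2 has switched community

phi_same = seed_phi(control, relabeled, seed=2)
phi_switched = seed_phi(control, capsaicin, seed=2)
phi_stable = seed_phi(control, capsaicin, seed=0)
```

      Renaming the communities leaves the coefficient at 1, which is why no label matching is needed, whereas the voxel that switched community gets a strongly negative value. The stable seed (voxel 0) also scores below 1 because its community lost a member, matching the last point of the paragraph above.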

      To calculate the statistical significance of the Phi coefficient, we conducted permutation tests, in which we randomly shuffled the condition labels in each participant and obtained the group-level consensus community structure for each shuffled condition. Then, we calculated the voxel-level correlations of the module allegiance values between the two shuffled conditions. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficients, and calculated the proportion of null samples that have a smaller Phi coefficient (i.e., a more dissimilar regional community structure) than the non-shuffled original data.
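
      The shuffling logic of this permutation test can be sketched as follows. In the sketch, `similarity` is a hypothetical placeholder for the full pipeline (group-level consensus community detection followed by the Phi coefficient), which is not reproduced here; the toy similarity used in the example is only a stand-in so that the code runs.

```python
import random

def permutation_p_value(data_a, data_b, similarity, n_perm=1000, seed=0):
    """One-tailed P-value: share of shuffles less similar than the observed data.

    data_a[i] and data_b[i] are participant i's data in the two conditions.
    Each shuffle swaps the two condition labels within a participant with
    probability 0.5, then recomputes the group-level similarity.
    """
    rng = random.Random(seed)
    observed = similarity(data_a, data_b)
    n_smaller = 0
    for _ in range(n_perm):
        shuf_a, shuf_b = [], []
        for a, b in zip(data_a, data_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        if similarity(shuf_a, shuf_b) < observed:
            n_smaller += 1
    return n_smaller / n_perm

def toy_similarity(group_a, group_b):
    """Stand-in statistic: higher when the two group means are closer."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return -abs(mean_a - mean_b)

# Conditions that clearly differ: no shuffle is less similar than the data.
p_different = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4], toy_similarity)
# Conditions with identical group means: most shuffles are less similar.
p_alike = permutation_p_value([1, 2, 3, 4], [2, 1, 4, 3], toy_similarity)
```

      A small P-value therefore indicates that the observed conditions are more dissimilar than almost all label-shuffled versions, which is the one-tailed hypothesis described above.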

      Results showed that there are multiple voxels with statistical significance (permutation tests with 1,000 iterations, one-tailed) in the areas where the community affiliations of the two contrasting conditions were different (Figure S6). For example, the frontoparietal and subcortical regions for the capsaicin vs. control (cf. Figure 2), and the frontoparietal, subcortical, brainstem, and cerebellar regions for the early vs. late period of pain (cf. Figure 3) contain voxels that survived after thresholding with FDR-corrected q < 0.05, suggesting the robustness of our main results.
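
      The text does not state which FDR procedure was used. Assuming the widely used Benjamini-Hochberg step-up procedure, thresholding voxel-wise P-values at q < 0.05 could look like the sketch below; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests that survive Benjamini-Hochberg FDR control at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose P-value is within its step-up threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    # Every test ranked at or below that rank is declared significant.
    keep = [False] * m
    for idx in order[:max_rank]:
        keep[idx] = True
    return keep

mask_sparse = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
mask_dense = benjamini_hochberg([0.01, 0.02, 0.03, 0.04])
```

      In the first example only the smallest P-value survives, while in the second all four do, because the threshold for each rank grows with the number of small P-values below it.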

      In particular, the somatomotor and insular cortices showed statistical significance in the permutation test, and this may reflect large changes in other areas that connect to the somatomotor and insular cortices across different conditions. Statistical significance was also observed in the visual cortex, which was unexpected. Our interpretation is that the spatial distribution of the visual network community is highly stable across conditions, and thus the null distribution from permutation formed a very narrow distribution of Phi coefficients. Therefore, a small change in the community structure could achieve statistical significance.

      We have now added descriptions of the permutation tests.

      Revisions to the main manuscript:

      p. 9: Permutation tests confirmed that the community assignment in the frontoparietal and subcortical regions showed significant changes between the capsaicin versus control conditions (Figure S6A).

      p. 13: Permutation tests further confirmed that the community assignment in the frontoparietal, subcortical, and brainstem regions showed significant changes between the early versus late period of pain (Figure S6B).

      pp. 36-37: Permutation tests for regional differences in community structures. To test the statistical significance of the voxel-level difference of consensus community structures (Figures 2 and 3), we performed the following Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). First, for each given voxel, we compared the community label of the voxel to the community labels of all the voxels, generating a list of voxel-seed module allegiance values that allow quantitative comparison of voxel-level community profiles (e.g., [1, 0, 1, 1, 0, 0, ...], in which each element is equal to 1 if the seed and target voxels were assigned to the same community and 0 otherwise). Next, a correlation coefficient was calculated between the module allegiance values of the two different brain community structures (i.e., capsaicin versus control, and early versus late). This correlation coefficient is an estimate of the regional similarity of community profiles (here, the correlation coefficient is a Phi coefficient because module allegiance is a binary variable). To estimate the statistical significance of the Phi coefficient, we performed permutation tests, in which we randomly shuffled the labels and then obtained the group-level consensus community structures from the shuffled data. Then, the Phi coefficient between the module allegiance values of the two shuffled consensus community structures was calculated. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficient for each voxel. Lastly, we examined the probability of observing a smaller Phi coefficient (i.e., a more dissimilar community profile) than the one from the non-shuffled original data, which corresponds to the P-value of the permutation test. All the P-values were one-tailed as the hypothesis of this permutation test is unidirectional.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Avar et al report on the development of a high-throughput method to screen modifiers of prion replication in cell lines using a genome-wide siRNA library. They identified a number of hits and further studied one candidate, the ribonucleoprotein Hnrnpk. The authors convincingly show the interest of their method. However, the claims that the ribonucleoprotein Hnrnpk impacts prion propagation need to be more quantitatively and statistically substantiated.

      1. A large part of the manuscript is dedicated to the validation of the high-throughput assay (called QUIPPER). QUIPPER is performed in 384-well plates and provides great technological improvement. It works with different prion-permissive cell lines and different prion strains. QUIPPER is an antibody-FRET-based assay that detects a specific population of PrPSc that resists phospholipase C (PIPLC) treatment. Historically, PIPLC has been shown to cleave cell surface PrPC while preserving PrPSc (which is endocytic or inaccessible). I would recommend that the authors quantify the proportion of PIPLC-resistant PrPSc (PrPPIPLC) versus total PrPSc in their different models. First, PrPPIPLC proportion may be cell and strain dependent. Second and most importantly, as siRNA effects are studied using PrPPIPLC as readout, it is crucial to know if this form is a bona fide surrogate of PrPSc and infectivity or only a specific, subcellular, potentially minor form of PrPSc. This is particularly important as the effects of Hnrnpk knock-down in QUIPPER and western blot sound discordant; in QUIPPER, the effects are strong (> 5-fold) while by western blot, the effects are much more modest (

      We addressed this issue in several ways. First, we quantified the proportion of PIPLC-resistant PrP (PrPPLC) versus PrPSc in two different models (Fig. 1B and D). Second, we directly compared the residual infectivity of cells treated with PK or PIPLC (Figure 1C), using the standard scrapie cell assay. The results show that infectivity is retained upon PIPLC treatment. In addition, we assessed the 161 hits obtained via QUIPPER using PrPSc as a readout (Fig. 3B).

      To provide further data on the robustness of our PIPLC-based readout, we have performed western blotting of infected and uninfected cells upon PIPLC treatment and assessed the band patterns following PIPLC administration. This figure is now incorporated in the manuscript as Supp. Fig. 1C and demonstrates that upon PIPLC digestion of NBH and RML infected CAD5 and GT-1/7 cells, PrP is barely detectable in the non-infected cells, whereas it is detectable in the prion-infected ones. The blots also show that the PIPLC-resistant PrP (PrPPLC) is resistant to PK digestion. These new data, together with those provided in Fig. 1B and Figure 1C, show that PrPPLC is equivalent to PrPSc in terms of PK resistance and infectivity.

      The reviewer pointed out a discordance between western blotting and QUIPPER. Although it is not clearly stated, we think the reviewer may be inferring this discordance from Fig. 3D. We would like to point out that Fig. 3D does not report fold changes, as the reviewer suggests, but Z-scores, which express each measurement as a number of standard deviations from the mean and therefore do not allow fold changes to be inferred. We quantified the effect of NT and HNRNPK targeting siRNAs on prion levels (Fig. 4A) and saw a three-fold change. We believe that the quantifications provided in the new version of the manuscript alleviate the concerns regarding any discordance.
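The distinction between Z-scores and fold changes can be illustrated with a minimal sketch. The plate values below are hypothetical placeholders in arbitrary units, not data from the manuscript, and the function names are chosen only for this illustration. The same 1.1-fold signal yields a large Z-score on a plate with little well-to-well spread and a small one on a noisy plate, so a Z-score alone does not determine the fold change.

```python
import statistics


def z_scores(values):
    """Express each value as its distance from the sample mean in units of SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]


def fold_change(value, reference):
    """Ratio of a measurement to a reference measurement."""
    return value / reference


# Hypothetical readouts (arbitrary units); the last well is the "hit" in both plates.
tight = [100, 101, 99, 100, 102, 98, 110]   # little well-to-well spread
noisy = [100, 120, 80, 100, 140, 60, 110]   # large well-to-well spread

# Identical fold change of the hit relative to the first well ...
print(fold_change(tight[-1], tight[0]), fold_change(noisy[-1], noisy[0]))
# ... but very different Z-scores for that same hit.
print(z_scores(tight)[-1], z_scores(noisy)[-1])
```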

      Technically, this is quite easy as it necessitates, after PIPLC treatment, the quantification of PrPSc in the supernatant versus PrPSc in the cell pellet. In Fig. 1C, the authors show that PrPPIPLC is infectious in a cell-scrapie assay. Using this approach, they could also quantify the infectivity of these species relative to the total infectivity content.

      We addressed this in Supplementary Fig. 1C, as described above. Supplementary Fig. 1C shows the similarity of the PrP species measured via QUIPPER to those obtained by canonical PK digestion: when PIPLC digestion is followed by PK treatment, we still detect PrPSc. Therefore, the experiment demonstrates that PrPPLC is similar in nature to PrPSc. The difference between the PK-digested lanes (lanes 3&4) and the PIPLC-treated then PK-digested lanes (lanes 7&8) is the PrPSc that is released into the media following PIPLC digestion.


      • The authors identified a list of prion modifiers candidate. Surprisingly, the authors did not perform a pathways analysis to identify potential pathways that could impact prion propagation.*

      Despite extensive efforts, there were no pathways that were enriched in our 40 hits, which is mentioned in the discussion part of the manuscript. Two analyses (for the 161 candidates and 40 hits) are now added to Supplementary Fig. 3C and pasted below.


      • The authors then studied in more details one hit, the ribonucleoprotein Hnrnpk. They studied the impact of Hnrnpk knock-down on PrPC and PrPres levels in different cell lines. These data (Fig 4 and Fig S4) lack quantitative (on a higher number of wells) and statistical analyses. The western blot that are shown suggest that PrPC levels are slightly increased by the siRNA and that the increase in PrPres levels is modest, barely significant given the western blot method. Same comment after PSA treatment, at least in PG127-infected hovS cells.*

      We quantified the western blots for all figures mentioned by the reviewers throughout the manuscript. The quantifications are incorporated into the manuscript in Fig. 4A, Fig. 4B, Supplementary Fig. 4A, Supplementary Fig. 4C, Supplementary Fig. 4D, Supplementary Fig. 4F, and Supplementary Fig. 4G.

      Additionally, statistical analyses have been incorporated into the manuscript in these figures: Fig. 4C, Fig. 4D, Fig. 4E, Fig. 4F, Fig. 4G, Fig. 4H, and Supplementary Fig. 4F. The analyses and the quantitative data demonstrate that the effect of Hnrnpk downregulation and PSA treatment on prion levels is significant. Moreover, we also addressed the regulation of prions via HNRNPK using vacuoles as a read-out, as well as with a different mode of regulating HNRNPK expression using shRNAs. All these results point to HNRNPK as a true modulator of PrPSc.

      In Figure 4A and B, the use of POM1 and/or POM2 to detect PrPC / PrPres is confusing. POM2 is supposed to detect mostly full-length PrPC (Fig 4A top panel), but more than 3 glycoforms are detected. In Fig 4B, POM1 is used for PrPC but because it has a central epitope, it detects both PrPC and PrPSc.

      Both antibodies are able to recognize both PrPC and PrPSc as it has been shown in many publications from the Aguzzi lab as well as other labs in the field. https://pubmed.ncbi.nlm.nih.gov/19060956/

      Note also in Fig 4B, that DMSO alone seems to impact PrPC levels in PG127-infected hovS cells. This advocates again for a more quantitative analysis.

      We have quantified the western blots using the DMSO control as standard value. As DMSO was used to dilute PSA, this should take into account potential effects coming from DMSO (Fig. 4D, Fig. 4F, Fig. 4H and Supplementary Fig. 4F).

      • Psammaplysene A (PSA) is a pharmacological Hnrnpk binder. The authors used this molecule to further demonstrate that Hnrnpk is involved in prion propagation. I disagree with the author's conclusion that "PSA effect does seem to be limited when HNRNPK shRNAs are applied". In Fig S4D, 1µM PSA seems do decrease PrPres levels at similar levels whether the shRNA is applied or not. Again quantification and statistical analyses from several independent experiments would help supporting the authors conclusions.*

      We assessed this point carefully by quantifying the western blots (Fig. 4H) and providing statistical data (Student’s t-test) from three experiments. Because the PSA-induced decrease in prion levels is threefold smaller when Hnrnpk is downregulated than when it is not, we concluded that the effect of PSA arises through Hnrnpk. However, we cannot conclusively delineate the effect of PSA, because complete ablation of Hnrnpk is not possible owing to the essentiality of Hnrnpk. This has now been added to the discussion portion of our manuscript.

      • The authors finally tested PSA on organotypic brain slices (in that case, they provide statistical results) and on flies infected with ovine PG137 prions. PSA administration significantly reduced the locomotor deficits prion-infected flies. The authors quantified the effects of PSA on prion accumulation in flies. Because the overall levels were not detectable by immunoblot, they used a cell-free assay termed RT-QuIC to address prion seeding activity in fly heads. I have specific comments about these experiments:
      • Maybe I missed it, but I could not find which recombinant PrP is used in RT-QuIC assay.*

      This information is provided in the M&M section of the manuscript. The relevant passage on P25 reads as follows (HaPrP23-231 refers to hamster PrP):

      The reaction buffer of the RT-QuIC consisted of 1 mM EDTA (Life Technologies), 10 μM thioflavin T, 170 mM NaCl, and 1× PBS (incl. 130 mM NaCl) and HaPrP23-231 filtered using 100-kD centrifugal filters (Pall Nanosep OD100C34) at a concentration of 0.1 mg/ml.

      In addition, we added this information to the main text as well.

      - This is important as recombinant PrP self-polymerize after a period of time and here the authors have left the RT-QuIC assay running for unusually long period of times (RT-QuIC are stopped after 24h-48h).

      For prions, long RT-QuIC experiments are often performed (also see: https://pubmed.ncbi.nlm.nih.gov/32598380/, https://journals.asm.org/doi/10.1128/mBio.02451-14, https://www.nature.com/articles/s41598-021-84527-9, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3458796/ and others).

      In addition, this is controlled for in all experiments performed in the lab, as the prion-negative sample containing the same RT-QuIC substrate does not become positive after the entire duration of the assay (Fig. 5D).

      - Instead of titrating prion seeding activity by endpoint titration, the authors quantified PSA activity by measuring the effect on another parameter of the RT-QuIC, the length of the lag phase before the conversion reaction is visible. While this is an interesting criterion, reduction of seeding activity must be shown to unequivocally demonstrate that PSA has delayed prion pathogenesis in flies.

      Based on the data presented in the manuscript, we assessed prion pathogenesis in flies using a well-established climbing assay, demonstrating that treatment with PSA significantly improves locomotor behavior, which has been shown to be directly linked to prion levels and is known to have even greater sensitivity than the traditional mouse bioassay (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5998032/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6113635/, https://link.springer.com/article/10.1007/s00441-022-03586-0). The RT-QuIC shown here serves as a secondary read-out to the climbing assay; lag-time quantification is used routinely in RT-QuIC (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3893511/, https://www.nature.com/articles/s41598-017-10922-w, https://journals.asm.org/doi/10.1128/mBio.02451-14, https://www.nature.com/articles/s41598-021-87295-8). Our results highlight the agreement between the two complementary read-outs.
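To make the lag-time read-out concrete, the following is a minimal sketch of how a lag phase can be extracted from a thioflavin T fluorescence trace. The threshold rule (baseline mean plus a multiple of the baseline standard deviation), the parameter defaults, and all traces are assumptions made for this illustration; they are not the criteria or data used in the manuscript.

```python
def lag_time(times_h, fluorescence, n_baseline=4, n_sd=5.0):
    """Return the first time point at which the signal exceeds
    baseline mean + n_sd * baseline SD, or None if it never does.

    The threshold rule is an assumed, commonly used style of criterion,
    not necessarily the one applied in the manuscript.
    """
    baseline = fluorescence[:n_baseline]
    mean = sum(baseline) / len(baseline)
    variance = sum((x - mean) ** 2 for x in baseline) / (len(baseline) - 1)
    threshold = mean + n_sd * variance ** 0.5
    for t, f in zip(times_h, fluorescence):
        if f > threshold:
            return t
    return None


# Hypothetical traces sampled every 5 h (arbitrary fluorescence units).
times = list(range(0, 55, 5))
seeded = [10, 11, 10, 12, 11, 40, 120, 200, 230, 240, 245]
treated = [10, 11, 10, 12, 11, 11, 12, 45, 130, 210, 240]
negative = [10, 11, 10, 12, 11, 10, 11, 12, 11, 10, 11]

for name, trace in [("seeded", seeded), ("treated", treated), ("negative", negative)]:
    print(name, lag_time(times, trace))
```

In this sketch a longer lag time for the treated trace corresponds to a delayed conversion reaction, and a trace that never crosses the threshold is reported as `None`, which is how a prion-negative control would appear.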

      - Can the authors exclude any interfering effect of PSA on the RT-QuIC reaction, given the amount of material used to seed the reaction (1:20 diluted head homogenates)?

      We do not know how much PSA has reached the Drosophila brain; therefore, the experiment suggested by the reviewer cannot be tied to a 1:20 dilution. However, the concern of the reviewer is valid, and we therefore performed a spiking experiment of a prion-positive sample using 1 µM PSA (the highest amount used to treat cells, for which we saw a strong prion-reducing effect). We did not see any interference of PSA with the RT-QuIC signal. This has been incorporated into Figure 5D.

      • could the authors comment on the fact that HNRNPK knock-out is not possible and that their siRNA and shRNA are not affecting the cell viability?*

      To select hits during the screening process, we apply a viability filter, excluding siRNAs that reduce viability by more than 50% when compared to the non-targeting control siRNA (Supplementary Fig. 1F). For GT-1/7 cells, we do not see any effect of siRNA treatment on viability after 96h. However, as downregulation of HNRNPK worsens the cytopathological vacuolation in the hovS model, as shown in Supp. Fig 4A, we do see an effect on cell fitness using both siRNA and shRNA. In addition, as knocking down HNRNPK will not lead to its complete loss, the remaining levels might be enough to sustain viability. Moreover, the longest knockdown experiment we performed lasted 7 days; we cannot exclude that longer exposure would have an impact on viability, but this question is beyond the scope of the paper.
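The viability filter described above can be sketched as follows. The siRNA names and viability values are hypothetical placeholders, and the function is an illustration of the stated 50% rule rather than the pipeline actually used for the screen.

```python
def passes_viability_filter(viability, control_viability, max_reduction=0.5):
    """Keep a siRNA unless it reduces viability by more than `max_reduction`
    relative to the non-targeting control."""
    return viability >= control_viability * (1.0 - max_reduction)


# Hypothetical viability readouts, normalized to the non-targeting control.
control = 1.0
readouts = {"siRNA_A": 0.95, "siRNA_B": 0.48, "siRNA_C": 0.50}

kept = [name for name, v in readouts.items()
        if passes_viability_filter(v, control)]
print(kept)
```

A siRNA at exactly 50% of control viability is retained here, because the rule excludes only reductions of more than 50%.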

      • In the discussion the authors do not discuss how Hnrnpk could impact prion propagation. This may deserve a comment as this protein is present in the nucleus. As PrPC has been also identified in this compartment, can this specific form be involved in prion pathogenesis?*

      We additionally elaborated in the discussion on potential ways in which Hnrnpk might impact prion propagation, including potential nuclear PrPSc and our data obtained from the sequencing efforts shown in Fig. 4I. In addition, we investigated how some functional targets of Hnrnpk are affected by PSA, which is now added to Supp. Fig 4G.

      Reviewer #1 (Significance (Required)):

      The QUIPPER method is a great conceptual and technological approach that could be applied to genome-wide analyses and screening for therapeutic molecules.

      * The study will interest a general audience interested in neurodegenerative diseases linked to protein misfolding. There are commonalities in pathways and modifiers of the conversion. Further PrP has emerged as a receptor for alpha-synuclein (Parkinson disease) and A-beta peptides (Alzheimer's disease).

      Expertise key words: prion diseases - prion pathogenesis in cell models*

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Prions are protein-based infectious agents that underlie neurodegenerative disease. For prion diseases (e.g., mad cow disease), the infectious agent is the cellular prion protein (PrPc). It exists in a normal conformation and carries out its normal cellular function. However, when it becomes misfolded and aggregates it can adopt an altered conformation, referred to as the prion conformation, or PrPSc. PrPSc aggregates can template the conversion of other PrPc molecules into the PrPSc form. In this way the prions can propagate from one cell to the next and throughout an organism. Prion diseases are truly devastating and identifying ways of stopping prion propagation is of great interest. In this manuscript by Aguzzi and colleagues, the authors designed a way to screen for prion propagation modifiers in mammalian cells. They built a highly sensitive readout of PrPSc propagation and adapted it to a 384-well plate format in adherent cells. They then used this to perform a genomewide siRNA screen, looking for genes that increased or decreased PrPSc propagation when knocked down.

      * They identified nearly 1,200 modulators of prion propagation and then subjected them to various validations and filtering to focus on only those hits that affected PrPSc but not PrPc (though hits that affect levels of PrPc could certainly be interesting). All this led to 40 genes (20 that increased and 20 that decreased prion propagation.*

      * Among these 40, the authors focused on one hit, hnRNPK, an essential RNA-binding protein with diverse cellular functions. They provide evidence that reducing levels of hnRNPK leads to increase prion levels.*

      * They next move to a marine compound called Psammaplysene A (PSA), which had previously been shown to have some neuroprotective properties and to be able to bind to hnRNPK. Because of the latter observation, the authors test if PSA can affect prion levels. They show that indeed treatment of their cell line prion infection model, or an organotypic slice model, or a fly model with PSA is sufficient to decrease prion levels.*

      * The authors propose that PSA works to reduce prion levels by increasing the activity of hnRNPK and that this also implies a role of RNA (because hnRNPK is an RNA-binding protein) in prion propagation. * In a nutshell, in my opinion the design and execution of this genomewide screen is ingenious and has yielded a treasure trove of potential prion modifiers. The ability to distinguish between modifiers of Prpc and PrpSc is super powerful. However, the follow-up and focus on hnRNPK and its connections (which seem tenuous) to the marine compound PSA are incomplete and raise more questions than answers. In its present form, it is hard to assess the potential significance of hnRNPK in prion propagation. I have some comments and suggestions for the authors to consider.

      * 1.To my eye, Fig. 4A looks like Hnrnpk siRNA leads to slightly increased levels of PrPc (detected with POM2 antibody) and this could explain the increase in PrPSc levels. Can the authors assess Prnp RNA levels and the effects of their siRNAs on Prnp expression? It would also be useful to provide quantification of immunoblots if possible.*

      We quantified the western blots as mentioned in our response to reviewer 1. The quantifications are now provided for Fig. 4A and Supplementary Fig. 4A, showing that the increase in prion levels is much stronger than that of PrPC. These confirm the results from the screen as seen in Fig. 3D. In addition, we would again like to point out that the use of shRNAs to knock down HNRNPK did not yield the aforementioned increase in PrPC levels, as evident from Supplementary Fig. 4D, which demonstrates a decrease of PrPC despite increasing PrPSc levels. Moreover, we show quantification of RNA levels upon downregulation of Hnrnpk and with PSA, which shows that downregulation of Hnrnpk via siRNAs indeed increases Prnp mRNA levels and that PSA does not change RNA levels of either Hnrnpk or Prnp (Fig. 4C).

      • In Supplemental Fig. 4B it also looks like knocking down Hnrnpk results in decreased PrPc levels in this experiment and its not clear how robust the increase in PrPSc levels are. Quantification of these experiments, if possible, would be helpful.*

      Please see the response above. We now provide quantification for all western blots.

      • The authors treat with PSA, which is supposed to bind to Hnrnpk. They state that this treatment does not affect PrPc levels but to my eye Supplemental Fig. 4C looks like highest doses of PSA cause a decrease in PrPc levels. Quantification of the immunoblots would also be useful here.*

      Please see the response above. We now provide quantification for all western blots and added a sentence to the manuscript.

      • The authors use Hnrnpk knockdown along with PSA to test if the effects of PSA depend on Hnrnpk. They see PSA decreases PrPSc levels and that this is, to my eye, only slightly attenuated by Hnrnpk reduction. I interpret these results slightly different than the authors. To me, it seems that this result indicates that PSA's effects are (mostly) independent of Hnrnpk.*

      Addressed in point 4 from reviewer one.

      • In the original paper identifying PSA and hnRNPK physical interaction, RNA-binding was important. In the authors' assays, does Hnrnpk's effect on prions depend on RNA-binding? Specific mutations to the RNA-binding domains can be made to assess this.*

      This is a very interesting point. We did try to obtain data to address it; however, due to the essentiality as well as tight control of Hnrnpk expression, we were not able to express different forms of Hnrnpk and acquire conclusive data. How Hnrnpk might affect prion propagation is therefore being pursued in the scope of another publication.

      • The genetic interaction in the vacuolation phenotype between Prnp and Hnrnpk that the authors report is very interesting (Supplemental Fig. 4A). It seems like this system and phenotype could be useful for the authors in exploring mechanisms by which HnrnpK is functioning.*


      We absolutely agree with the reviewer’s comment. As mentioned above, a second publication is underway to investigate the mechanisms of Hnrnpk’s antiprion function, which is not in the scope of this study.

      • The authors propose that PSA increases activity of Hnrnpk but does it change any Hnrnpk RNA targets from their RNA sequencing? Some functional readout of Hnrnpk function would be useful here to test this hypothesis.*

      Although we do suspect RNA binding has an important role in the anti-prion function of Hnrnpk, we cannot exclude other modalities through which Hnrnpk might function, such as DNA binding and protein-protein interactions. Therefore, to answer this question, a considerable effort exploring each of these potential modalities with regard to the anti-prion function of Hnrnpk would be needed. This extensive effort is beyond the scope of the manuscript at hand. However, we investigated the effect of PSA on some known functional targets of Hnrnpk (as suggested by the reviewer) from our sequencing efforts and added this analysis as Supplementary Fig. 4H to the manuscript. These results suggest that PSA leads to an increase of the expression of DNA targets of Hnrnpk, potentially suggesting a modality of action. Moreover, we amended the discussion with regard to potential pathways that might be yielding the effect seen, as evidenced by the RNAseq data.

      • In the Introduction, the authors mention two yeast papers in introducing the concept of using unicellular model organisms to perform modifier screens. The first paper (Outeiro and Lindquist, 2003) is a classic but does not contain a yeast screen. The other one does include a loss of function screen in yeast (for polyQ toxicity modifiers) but those results seems to be due to loss of the [RNQ+] prion from certain deletion strains instead of from specific roles of modifier genes, so that paper might not be the best exemplar of yeast modifier screens.*

      We sincerely thank the reviewer for their careful reading of the manuscript. The portion that refers to these papers as screens was amended, and two new citations for appropriate yeast screens were added to the manuscript.

      • The authors asked if any of their hits from their screen had human genetics connections to neurodegeneration. They mention one of their hits Dock3 right after saying that no hit reached statistical significance after multiple testing corrections. This seems a bit misleading since any time one makes a list of anything there will always be, by definition, one at the top of the list.*

      We amended the wording to improve clarity of the manuscript.

      • The authors perform RNA sequencing on prion infected cells that either had Hnrnpk siRNA or PSA and since these two treatments had opposite effects they looked for genes that went in the corresponding directions. They didn't find anything significant when looking for genes downregulated by Hnrnpk siRNA and upregulated by PSA. They did find glucose metabolism genes when looking in the opposite direction. The significance of this finding is unclear and the authors do not expand on it.*

      Addressed in point 7 of reviewers 1 and 2, we expanded the discussion portion of the manuscript with regards to these results.

      • To me, the data with PSA seem more robust than the Hnrnpk data and it seems that the authors are trying to perhaps over-fit them together. It is possible that PSA affects prion levels independent of Hnrnpk function. This would not dampen my enthusiasm at all for this finding and could be of interest to those in the prion field, in which the search for anti-prion compounds is of great interest.*

      Upon statistical analysis of the result in Fig 4H, we see a three-fold decrease of PSA activity upon HNRNPK downregulation, suggesting PSA activity might be linked to HNRNPK. However, the reviewer's point is well taken, and we emphasized the value of understanding the function of PSA, or of mimicking its effect, as a potential therapy in the future.

      ***Cross-commenting:**

      All three reviewers seem to appreciate the novelty and impact of the new QUIPPER method the authors have developed to discover modifiers of prion propagation. All three reviewers also seem to be somewhat less convinced by the connection to hnRNPK, including how the compound PSA's anti-prion effects involve hnRNPK (or not).*

      * In my opinion, this manuscript presents important and novel work and a really ingenious new method to study prion propagation, which will be broadly useful to the prion field. I feel that the hnRNPK data could be strengthened, especially with more quantitative analyses. The PSA treatment data are compelling but it seems that the effects might be independent of hnRNPK and that the authors are trying to force a connection which might not be there.*

      * Reviewer #2 (Significance (Required)):*

      * *** Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. ****

      I have expertise in neurodegenerative disease, protein misfolding, yeast modifier screens, CRISPR modifier screens in human cells, and RNA-binding proteins. I have general knowledge about prions, including PrP, but I am not a prion expert.*

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors conducted an arrayed RNAi-based genome-wide high-throughput screening of all protein-coding modifier genes that affect prion propagation in cultured cells (murine and human cell lines) using a novel quantitative high throughput QUIPPER assay that they developed. They identified 1191 genes, of which 40 selectively affect PrPSc. Half of the 40 genes seem to inhibit PrPSc (limiter) whereas the other half do the opposite (stabilizers). One of the strong limiters is Hnrnpk, is an essential small heterogeneous nuclear ribonucleoprotein that has been implicated in a few protein misfolding diseases. The biological relevance of the findings is demonstrated by the detection of previously reported modifier genes as well as thorough verification of Hnrnpk as an effective prion limiter that seems to be independent of the two prion strains or host species (mouse and human cell lines as well as Drosophila).

      * The manuscript is very well written, the approach is novel, very well verified, and effective, the data are solid, and the main conclusions convincing.*

      * Two issues need to be discussed.*

      * Major comments:*

      * First, some genes encoding proteins involved in PrP processing, such as ADAM10 and ADAM8, are known to affect PrPC levels, but they are not among the modifier genes identified. Based on Table 2, ADAM8 expression is very low in the GT-1/7 cells. This points to one of the caveats of the RNAi screening approach in that potential roles of low expressing genes in the cell lines used could be missed. Although it is beyond the scope of this manuscript, it would be helpful to add discussions on complimentary screening enhancing gene expression and the use of more cell lines that will allow identification of more modifiers.*

      We thank the reviewer for raising this point. The screen is indeed less sensitive for genes that are expressed at low levels in the cell line in question. As CRISPR-based technologies advance and are adapted for use in combination with prions, we see their value for this purpose. We added a sentence to the discussion about gene activation as a future alternative for performing a complementary screen.

      Second, the statement that PSA's anti-prion effect potentially arises through enhancing the activity of HNRNPK makes sense, but it is also possible that PSA can directly inhibit prion replication as well. It would be helpful to calculate the percentage of reduction in PrPSc by PSA treatment and the percentages compared between shNT and shHNRNK cells.

      We thank the reviewer for the careful reading of the manuscript. This point was addressed in our response to reviewer 1, point 4. In addition, if PSA is added to the RT-QuIC, it does not prevent aggregate formation, indicating that PSA is unlikely to directly inhibit prion replication, but rather depends on a cellular host-intrinsic molecule for its activity. However, we also elaborate further in the discussion section of our manuscript on other potential mechanisms by which Hnrnpk and PSA might regulate prion levels.
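The comparison suggested by the reviewer, the percentage reduction in PrPSc by PSA in shNT versus shHNRNPK cells, amounts to simple arithmetic on normalized band intensities. The sketch below uses hypothetical placeholder intensities, not values from Fig. 4H, and the function name is chosen only for this illustration.

```python
def percent_reduction(untreated, treated):
    """Percentage by which `treated` is lower than `untreated`."""
    return 100.0 * (untreated - treated) / untreated


# Hypothetical normalized PrPSc band intensities (DMSO control vs. PSA).
sh_nt = {"dmso": 1.0, "psa": 0.5}
sh_hnrnpk = {"dmso": 2.0, "psa": 1.5}

reduction_nt = percent_reduction(sh_nt["dmso"], sh_nt["psa"])
reduction_kd = percent_reduction(sh_hnrnpk["dmso"], sh_hnrnpk["psa"])

# Ratio of the two reductions: how much weaker the PSA effect is after knockdown.
print(reduction_nt, reduction_kd, reduction_nt / reduction_kd)
```

Normalizing each PSA-treated value to its own DMSO control, as done here, keeps a difference in baseline PrPSc between the two cell populations from being mistaken for a difference in PSA efficacy.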

      Minor comments:

      * First, Figure 1C shows that the relative intensity for RML CAD5 cell lysate infected cells is less than with PIPLC treated or PK treated, which seems to be the opposite of what is expected, because PIPLC or PK treatment should not increase infectivity. Please explain.*

      We agree with the reviewer that the results were surprising. For the practicality of the screen, we wanted to show that the treatment does not eliminate the infectious species, which we were able to demonstrate. However, the increase in infectivity could stem from many different factors; for example, the amount or duration of PK treatment might not harm but rather expose the infectious species, or PIPLC might remove cell surface molecules that could prevent infection of cells. As there is a plethora of possible scenarios and this was not relevant for the study at hand, we did not go into further detail.

      Second, in Fig S1 e, the labels are too small to read. In Fig 3D, it would be easier to match the stabilizer or limiter genes with the corresponding Z score dots if the genes with a negative Z scores are labelled on the left side while genes with positive Z scores be labelled on the right side.

      We amended the figures as per the reviewer’s suggestion.

      Third, The following sentence on page 11 is confusing: "20 out of these 40 candidates reduce prion propagation upon silencing, and 20 candidates enhanced prion propagation, and henceforward are called stabilizers or limiters, respectively (Fig. 3D-E, Supplementary Table 1)." Did the author mean to say "....and 20 candidates enhanced prion propagation upon silencing, and hence..."?

      We reworded the sentence according to the reviewer’s comment.

      * Fourth, In the subheading "Hnrnpk expression limits of prion propagation in mouse and human cells", "of" should be deleted.*

      We addressed this in the main manuscript file.

      ***Cross-commenting:**

      I agree with Reviewer #2's assessment that more quantification will be helpful and the link between the effect of PSA treatment and hnRNPK can be strengthened. I want to stress that the knockdown data clearly shows the involvement of hnRNPK as a prion limiter in cultured cells. The question on PSA does affect the interpretation of the ex vivo and in vivo data.*

      * The blot in Fig. S4c seems to show some decrease in PrPC levels in NBH-treated GT-1/7 cells. This blot needs to be quantified to confirm whether the PrPC level is changed by PSA treatments. Whether PSA directly inhibits prion replication can be relatively easily assessed in RT-QuIC reactions. Alternative to the use of PSA, RNAi-mediated hnRNPK knockdown can also be done on cultured tissue slices or in brain, but this will require a lot more time and efforts and may be too much to ask for in this manuscript.*

      Quantifications for blots were added throughout the manuscript and the text was amended accordingly, and all the points mentioned have been addressed throughout this response letter.

      Reviewer #3 (Significance (Required)):

      * The findings are novel and very significant. They identified a large number of modifier genes, and established a solid foundation for future studies on prion modifier genes to study prion replication and pathogenesis and for novel therapies against prions and potentially some other protein misfolding diseases. HNRNPK seems to be good target for therapeutic intervention and PSA may be a good candidate for prion treatment. The novel QUIPPER assay can be used to screen for anti-prion compounds and potentially adapted to study other misfolding proteins associated with cells.*

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      All the Reviewer’s comments are reproduced below, with our responses interspersed in [[brackets]]. Citations from the revised manuscript are included in “quotation marks”. The website accepts input only as plain text. Consequently, we had to transform the mathematical expressions into plain text. We apologize for the reduced readability.

      Reviewer #1

      1) The authors state that: "the conductance density mediated by the expression of the mutant was 2.5 times smaller than the wild type, although we transfected the same amount of plasmid DNA (Fig. 2E). Assuming that protein expression is independent of the mutation, the observation suggested that the unitary proton flux ratio RC of wild type to mutant channel was equal to 2.5" (lines 82‐85).

      Macroscopic conductance (G) depends on channel number (N), microscopic or unitary conductance (γ), and open probability (PO) by G=N γ PO. The authors assume that the level of WT and D174A mutant protein expression on plasma membrane, which determines N, are equal; however, this critical assumption does not appear to have been tested.
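The relation G = N γ PO can be written out for the two constructs to show which assumptions the inference rests on. The subscripts WT and D174A are notation introduced here for illustration; the value 2.5 is the conductance-density ratio quoted from the manuscript.

```latex
\[
\frac{G_{\mathrm{WT}}}{G_{\mathrm{D174A}}}
  = \frac{N_{\mathrm{WT}}}{N_{\mathrm{D174A}}}
    \cdot \frac{\gamma_{\mathrm{WT}}}{\gamma_{\mathrm{D174A}}}
    \cdot \frac{P_{O,\mathrm{WT}}}{P_{O,\mathrm{D174A}}}
  = 2.5
\]
```

The unitary ratio γ_WT/γ_D174A equals 2.5 only if both the channel-number ratio and the open-probability ratio are 1. A 2.5-fold lower surface expression of the mutant with identical γ and PO would give the same macroscopic result.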

      The fact that conductance density (nS/pF) is plotted in Fig. 2E does not alter this caveat because this procedure normalizes the data only for cell surface area (i.e., size). The authors conclude that "The conductance density relationship (Fig. 2E) compares the maximal conduction of both constructs; this is the fully open channel (open probability ≈ 1)"(lines 87‐88). However, neither raw currents nor G‐V data are shown. Typically, currents measured at large, near‐saturating PO are used to compare the relative conductances of WT and mutant ion channels. The currents shown in Fig. 2A and 2B exhibit prominent 'droop' at even modest depolarizing potentials (+10 mV for D174A and +30 mV for WT), indicating that the proton gradient has been substantially perturbed by the flow of ge depolarizing voltages needed to drive channels to near‐maximal PO. Furthermore, there is no evidence that maximal PO itself is also not different in WT and D174A channels. Indeed, maximal PO for native Hv1 channels measured using variance analysis is reported to be significantly smaller than 1.0, and assuming that PO = 1.0 for either WT or D174A is therefore not well supported. Maximal PO could be altered by the D174A mutation, which has a clear and strong effect on channel gating evidenced by the large (‐70 mV) negative shift in threshold potential reported both here and previously in the literature. Effects of mutations on maximal PO due to altered gating behavior could be separate and distinct from any change in plasma membrane channel number (N). Lastly, because D174A channels have a much higher PO than WT at 0 mV, the mutant will necessarily conduct inward proton currents at the physiological resting membrane potential (RMP) in tsa‐201 cells (perhaps ‐30 mV?). Inwardly directed proton currents will therefore cause intracellular acidification under resting conditions.

      The constitutive acid load in cells expressing D174A, but not WT, is likely to have a variety of physiological consequences, including decreased protein expression or plasma membrane targeting of D174A. There is evidence that another constitutively open Hv1 mutant (R205H) also generates a smaller macroscopic conductance than WT, and this phenomenon is likely to result from decreased cell surface expression. To conclude that the microscopic conductances of WT and D174A are unequal, the authors must demonstrate that N is not different. The authors' conclusion that D174A "conducts protons at a lower rate" (line 89) is therefore not well supported by the experimental data.

      [[

      We toned down our conclusions from the experiments to accommodate the reviewer's criticism: (page 4): " Consequently, the mutant channel is nearly fully open (Fig. 2D), readily seen when the membrane potential is 0 mV and external voltage is absent. The high open probability of the D174 mutant under symmetrical pH conditions is readily seen in the tail current amplitude reaching a quasi-saturation (Fig. 2A). The resulting outward currents have a higher amplitude in the wild-type (Fig. 2A+B). Interestingly, the conductance density mediated by the expression of the mutant was 2.5 times smaller than the wild type, although we transfected the same amount of plasmid DNA (Fig. 2E). Our observation suggests a reduced flux through the mutant if we assume that protein abundance in the plasma membrane is independent of the mutation."]]

      2) The authors indirectly measure apparent proton flux rates (λD) in LUVs containing WT and D174A mutant Hv1 channels using a fluorescence‐based approach, and conclude that λD is 2.4 times smaller for D174A than WT. However, the method for estimating λD is not performed under voltage clamp, and the driving force for proton current is neither known nor measured.

      [[

      The reviewer is mistaken. The method for estimating λD is performed under voltage clamp, and the driving force for proton current is known.

      Page 6: “To obtain λD, we encapsulated c_k^i=150 mM KCl in the HV1 containing large unilamellar vesicles (LUVs) and exposed these vesicles to a buffer with a K+ concentration c_k^o= 3 mM. The addition of valinomycin facilitated K+ efflux, thereby inducing a membrane potential, ψ. ψ constituted the driving force for H+ uptake. It can be calculated according to the Goldman equation:

      ψ = -RT/F ln ((c_k^i+(P_H/P_K ) c_H^i)/(c_k^o+(P_H/P_K ) c_H^o ))

      (1)

      The ratio of the HV1-mediated proton permeability P_H to the valinomycin-mediated potassium permeability P_K is always smaller than 0.04. We base our conclusion on the observation that the CCCP-mediated proton permeability represents an upper limit for P_H since CCCP always induces a faster vesicular proton uptake than HV1 (Fig. 3). Accordingly, the maximum value of P_H/P_K can be estimated as the ratio of valinomycin to CCCP conductivities. The respective values are equal to 1.6 × 10^(-3) Ω^(-1) cm^(-2) [1] and 4 × 10^(-6) Ω^(-1) cm^(-2) [2]. At pH 7.5, we find c_H^o=10^(-7.5) M, i.e., c_k^o ≫ (P_H/P_K ) c_H^o. Similarly, c_k^i ≫ (P_H/P_K ) c_H^i for a broad range of intravesicular pH. With these simplifications, Eq. 1 transforms into the Nernst equation yielding:

      ψ = -RT/F ln ((c_k^i)/(c_k^o)) = -100 mV

      (2)

      ψ of such size may decrease intravesicular pH by nearly two units. Such acidification does not violate c_k^i ≫ (P_H/P_K ) c_H^i so that ψ remains constant throughout the experiment. That is, the vesicle experiments proceed under voltage clamp conditions. The simple explanation is that, due to the small proton concentration and the limited buffer capacity, the K+ conductance exceeds H+ conductance under all conditions. The conclusion is in line with simulations (32), confirming that the membrane potential is driven very near the Nernst potential for K+.”]]
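
      The argument that Eq. 1 collapses to Eq. 2 can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the Goldman expression with the concentrations quoted above, uses the stated upper bound P_H/P_K = 0.04, and assumes a temperature of 23 °C and an intravesicular acidification of two pH units (both assumptions made here for illustration). The function name and variable names are hypothetical.

```python
import math

R = 8.314      # gas constant, J mol^-1 K^-1
F = 96485.0    # Faraday constant, C mol^-1

def goldman_potential(c_k_in, c_k_out, c_h_in, c_h_out, ph_over_pk, temp_k=296.15):
    """Eq. 1: psi = -(RT/F) ln((c_k_in + r c_h_in) / (c_k_out + r c_h_out)), in volts.

    temp_k = 296.15 K (23 degrees C) is an assumption made for this sketch.
    """
    numerator = c_k_in + ph_over_pk * c_h_in
    denominator = c_k_out + ph_over_pk * c_h_out
    return -(R * temp_k / F) * math.log(numerator / denominator)

c_k_in, c_k_out = 0.150, 0.003   # mol/L, from the quoted passage
c_h_out = 10 ** -7.5             # mol/L at pH 7.5
ratio = 0.04                     # stated upper bound for P_H/P_K

# Pure Nernst case (Eq. 2): proton terms switched off.
psi_nernst = goldman_potential(c_k_in, c_k_out, 0.0, 0.0, 0.0)
# Start of the experiment: symmetrical pH 7.5.
psi_start = goldman_potential(c_k_in, c_k_out, 10 ** -7.5, c_h_out, ratio)
# After an assumed acidification of the vesicle interior by two pH units.
psi_acidified = goldman_potential(c_k_in, c_k_out, 10 ** -5.5, c_h_out, ratio)

for label, psi in [("Nernst", psi_nernst), ("start", psi_start), ("acidified", psi_acidified)]:
    print(f"{label:>10}: {psi * 1e3:.3f} mV")
```

      Under these assumptions the three values agree to well within a microvolt, which is the sense in which the response describes the vesicle potential as clamped near the K+ Nernst potential.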

      The authors state that "Transmembrane voltage constituted the driving force for proton uptake into LUVs (Figure M). It resulted from facilitated K+ efflux out of the vesicles (30)", (lines 261‐262), but this voltage is unknown and not likely to equal the Nernst equilibrium potential for K+ once Hv1 channels begin to open.

      [[

      The reviewer is mistaken. The voltage is known (see the equations above). The opening of the HV1 channels does not alter the potential because c_k^o ≫ (P_H/P_K ) c_H^o and c_k^i ≫ (P_H/P_K ) c_H^i for a broad range of intravesicular pH (see above).]]

      Once Hv1 channels begin to open, changes in intra‐lumenal pH (pHi) will necessarily occur during the experiment. Such changes are likely exacerbated by a) the low proton buffering capacity of the system (5 mM HEPES) and b) the absence of any counter‐charge pathway to balance the effect of proton charge movement on the membrane potential.

      [[

      Vesicle acidification occurs. It signifies the presence of functional proton channels. Nevertheless, the membrane potential does not change (see Equation 1 above). The statement b) is not correct because the outward K+ movement counters the inward-directed proton charge movement.]]

      Given the small volume of LUVs, even a relatively modest difference in either membrane potential or pHi could substantially alter the driving force for proton movement. Together, these factors are highly likely to result in a rapid and potentially large change in the driving force for proton flux.

      [[

      As outlined above, membrane potential stays invariant. Vesicle acidification changes the driving force for proton flux. The steady state is reached when the electrochemical potentials for protons on the two sides of the membrane are equal to each other.]]

      Driving force changes may also be different for WT and D174A because their relative PO may be different under the experimental conditions used here. Because D174A activates at much more negative voltages, it is likely to open more quickly and to a higher PO than WT at early times after depolarization is initiated by addition of valinomycin (Fig. 3A). This fact will likely result in a larger initial inward current being carried by D174A than WT channels. The result would be a more rapid acidification of LUVs by D174A.

      [[

      The reviewer is mistaken. Assuming a transport rate of 20,000 potassium ions per second (G. Stark, B. Ketterer, R. Benz and P. Läuger; Biophys. J. 1971 Vol. 11 Pages 981-981) and a membrane capacity of 1 μF cm-2, it takes valinomycin about 10 ms to drive the vesicular potential to near Nernst values. Activation of the proton channel is at least 10 times slower. Thus, both mutant channel and wild type channel may open at roughly the same instant. The driving force is sufficient to open both channels to the same probability.]]

      The experimental data in Fig. 3A are consistent with the expectation that the proton gradient and driving force more rapidly approach equilibrium for D174A than WT channels: the apparent rate of AMCA fluorescence change is slower in D174A. Although the authors correctly interpret the experimental data to mean that the apparent λD is slower for D174A, they do not rule out the artifactual explanation for the measured differences. Indeed, the observation in Fig. 3A that AMCA fluorescence change eventually reaches a plateau and is not affected by CCCP means that the proton gradient has become exhausted during the experiment, and directly demonstrates that the proton driving force is uncontrolled under the current experimental conditions.

      [[

      The reviewer's interpretation of our results is flawed. Instead of becoming exhausted, the proton gradient builds up during the experiment. Initially, extravesicular and intravesicular pH values are equal to each other. Valinomycin-mediated K+ efflux results in a membrane potential that drives Hv1-mediated H+ influx.

      Page 8: “The number NC of reconstituted HV1 dimers per vesicle determines the acidification rate λ, i.e., the time that elapses before reaching the steady state. The final intraluminal pH is independent of NC. Similarly, CCCP addition in the steady state does not change the intraluminal pH of HV1-containing vesicles. But CCCP will affect the intraluminal pH of vesicles deprived of HV1 since H+ background permeability is too small to allow vesicle acidification within the time allotted for the experiment. Consequently, only HV1-free vesicles will acidify upon CCCP addition. That is, CCCP addition allows estimating the fraction of vesicles deprived of HV1.”]]

      In contrast to the authors' statement that "Our experiments with the purified and reconstituted channels corroborated the conclusion (Fig. 3A)", (lines 92‐93) it is not clear that unitary proton flux rates/unitary conductances are actually different in WT and D174A.

      [[

      The reviewer is mistaken. Since we measured under voltage clamp conditions, ensured rapid installment of the membrane potential, and selected a potential large enough to allow for the same open probability of wild-type and mutant channels, the measured transport rates, λ, are valid. Moreover, we determined the number of HV1 channels per vesicle and thus calculated the transport rate of an individual channel, λD. Since λD is different for WT and D174A, the unitary proton flux rates/unitary conductances are actually different in the wild type and mutant.]]

      3) The presumed differences in unitary conductances (i.e., 'transport rate') between WT and D174A are used to estimate Arrhenius activation energies (Ea): "The difference in measures transport rates allows a rough estimation of the Arrhenius activation energy Ea for HV1‐mediated proton flow. It amounts to 40 kJ/mol for the wild type and 23 kJ for the mutant. Thus, Ea exceeds the corresponding 15 kJ/mol barrier measured for gramicidin A (32, 33)" (lines 128‐130). The method for determining Ea in the current work is not well‐described. In Ref. 32, the authors estimate Arrhenius activation energy (Ea = 20 kJ/mol) for gramicidin D (not gramicidin A) from the slope of a line fit to measurements of currents at various temperatures. Here, the authors measure AMCA fluorescence decay rates at 4 °C and 23 °C and observe a similar temperature‐dependent difference in WT and D174A (Fig. S2). Given that the data indicate that WT and D174A are similarly temperature‐dependent, it is unclear how the authors arrive at different Ea values. The authors' conclusion that "The increment in Ea suggests that the transport mechanism may be different from a pure Grotthuss type, where the proton uses an uninterrupted water wire to cross the membrane" (lines 131‐133) therefore does not appear to be well‐supported.

      [[

      We removed both the calculation and discussion of activation energies. Knowledge and discussion of activation energies distract from the scope of the manuscript. We show the experiments at different temperatures solely to demonstrate that Hv1 and D174A facilitate proton transport at a decreased temperature where the background conductivity of the lipid bilayer to water is small.]]

      4) The authors report no difference in water permeability in WT vs. D174A (Fig. 5 and S1) and interpret the results to mean that proton currents are not associated with measurable bulk water flow. A similar conclusion was reached for native Hv1 channels using deuterium substitution (DeCoursey & Cherny, 1997).

      [[

      The comment of the reviewer is misleading:

      • Equal water permeabilities of WT and D174A would not exclude an association between proton currents and water flow. Accordingly, our manuscript does not contain the stipulated interpretation.
      • DeCoursey & Cherny (1997) did not evaluate bulk water flow through proton channels. They compared D+ and H+ currents across the plasma membrane of rat alveolar epithelial cells. Page 2: “Comparing deuterium ion and proton currents through the plasma membrane of rat alveolar epithelial cells, DeCoursey & Cherny (22) found an isotope effect exceeding that for hydrogen bond cleavage in bulk water. It suggested the involvement of an amino acid side chain in proton conduction (22). Alternatively, altered properties of confined water could have been responsible for the higher isotope effect.”]]

      However, the absence of bulk water flow does not itself rule out the possibility that 'trapped' waters within the Hv1 pore do not themselves carry the measured proton current. If intra‐pore water molecules are tethered by hydrogen bonds with protein atoms, they may not move when Hv1 channels open.

      [[

      The reviewer’s comment contains one misinterpretation and one unfounded statement:

      1. We never stated that 'trapped' waters within the Hv1 pore do not themselves carry the measured proton current. On the contrary, we envisioned the trapped waters delivering the protons to one or more titratable amino acid side chains and accepting the protons from them.
      2. The reviewer’s view that intra‐pore water molecules tethered by hydrogen bonds with protein atoms may not move when Hv1 channels open is a misconception. Page 12 bottom: “The contrasting opinion that instead of a channel obstruction hydrogen bonds may immobilize the pore water (19) is not convincing. First, the lifetime of a hydrogen bond is in the ps range while HV1’s mean open time exceeds 100 ms (41). Thus, hydrogen bonds may break more than 10^11 times during the open state, rendering them unfit for tethering intraluminal water molecules. Second, the effect of hydrogen bonds between water molecules and pore residues is limited to decreased water mobility in narrow channels (23). Their number, N_H, allows for predicting p_f (26). Specifically, every H-bond donating or receiving pore-lining residue contributes an average increment ΔΔG‡ of 0.1 kcal/mol to the Gibbs free energy of activation ΔG‡ (24). Equation (13) allows the calculation of ΔG‡:

      ΔG‡ = N_H ΔΔG‡ + ΔΔG‡_i (13)

      where ΔΔG‡_i = 2 kcal/mol (24). Since N_H = 6 (Fig. S1) in the open HV1 conformation, Eq. 13 predicts ΔG‡ = 2.6 kcal/mol. Eq. (14) allows calculating HV1’s p_f from this value (42):

      p_f = ν_0 v_w exp(-ΔG‡/RT) (14)

      where v_w = 3 × 10^(-23) cm^3 is the volume of one water molecule and ν_0 is the universal attempt frequency, ν_0 = k_B∙T/h ≈ 6.2 × 10^12 s^(-1) at room temperature (k_B is Boltzmann’s and h is Planck’s constant).”]]
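
      The predicted unitary water permeability follows directly from the two quoted relations. The sketch below is an illustration rather than the authors' calculation: it evaluates ΔG‡ for N_H = 6 and then p_f, taking room temperature as 298.15 K (an assumption made here) and using hypothetical function names.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1

def activation_free_energy(n_h, ddg_per_bond=0.1, ddg_i=2.0):
    """Delta G (kcal/mol) = N_H * 0.1 kcal/mol + 2 kcal/mol, as quoted above."""
    return n_h * ddg_per_bond + ddg_i

def unitary_water_permeability(dg_kcal, temp_k=298.15, v_w=3e-23):
    """p_f (cm^3/s) = nu_0 * v_w * exp(-Delta G / RT), with nu_0 = k_B T / h.

    temp_k = 298.15 K is an assumed value for 'room temperature'.
    """
    nu_0 = K_B * temp_k / H   # attempt frequency, s^-1
    return nu_0 * v_w * math.exp(-dg_kcal / (R_KCAL * temp_k))

dg = activation_free_energy(6)          # N_H = 6 for the open conformation
p_f = unitary_water_permeability(dg)

print(f"Delta G = {dg:.1f} kcal/mol")
print(f"p_f     = {p_f:.2e} cm^3/s")
```

      With these inputs the estimate lands in the 10^(-12) cm^3 s^(-1) range, the order of magnitude against which the measured water permeability is compared.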

      Proton transfer through a hydrogen‐bonded network of waters requires only that the electronic structure of the network be rearranged during proton transfer; water is not required. As in the previous study (DeCoursey & Cherny, 1997), the lack of water flux reported here seems to reinforce the notion that H+ moves separately from its waters of hydration (i.e., hydronium, H3O+, is not the permeant species) and does not necessarily imply information about the mechanism of proton transfer (i.e., side chain ionization vs. Grotthuss‐type transfer in a water‐wire).

      [[

      The reviewer is mixing two unrelated issues. Of course, proton transport may be separated from mass transfer. Yet, charge transfer may or may not include one or several titratable amino acid side chains. If side chain ionization is not involved in proton transfer, a water wire must exist that connects the aqueous solutions on both sides of the membrane. In this case, an osmotic gradient will drive water molecules through the open channel. Since we did not observe such water flux, we conclude that the water wire is interrupted by at least one side chain. Thus, our experiments imply information about the mechanism of proton transfer.]]

      The authors state that: 1) "every H‐bond donating or receiving pore‐lining residue would have contributed an increment ΔΔ𝐺‡ of 0.1 kcal/mol to the Gibbs free energy of activation Δ𝐺‡ (25)" (lines 145‐147), and 2) calculating NH from this Δ𝐺‡ allows estimation of the channel's unitary water permeability (Eqn. 2). Although hydrogen bonding patterns will undoubtedly alter the free energy for channel activation, this is not the same free energy change as that for proton transfer.

      [[

      The reviewer's remark is in line with the previous and the current versions of our manuscript.]]

      Hv1 gating involves conformational changes that are both voltage and Δ pH-dependent, and the D174A mutation is known to alter the voltage dependence of gating (Fig. 2 and previous studies). The effect of D174A on Hv1 unitary conductance, however, is speculated but not unambiguous (see above).

      [[

      Our experiments unambiguously demonstrate the effect of D174A on Hv1 unitary conductance. The interpretation of the experiments is straightforward – there is no speculation involved. The contrasting opinion of the reviewer rests on his misinterpretations of (i) our measurements of proton transport rate λD for wild-type and mutant (see above) and (ii) the CCCP effect (see above).]]

      In the absence of definitive experimental data showing differences in the unitary conductance of WT vs. D174A, the authors' assumption that water permeability would be strongly temperature‐dependent (lines 154‐160) seems premature and their ensuing conclusion tenuous: "pore residues interrupt the HV1 spanning water wire, trapping the water molecules inside the HV1 channel. In contrast to water, protons cross the pore by hopping from one acidic residue to another through one or more bridging water molecules (Fig. 6)" (lines 161‐164).

      [[

      The reviewer chooses to misinterpret our lines. We did not assert that water permeability through the Hv1 channel would be strongly temperature‐dependent. We referred to the well-known fact that there is a strong temperature dependence of lipid bilayer water permeability - in contrast to the tiny effect of temperature on the water permeation across aqueous channels.

      Page 11, bottom: “Considering the stark dependence of the activation energy for background water flow across lipid bilayers (24), we repeated the experiments at a decreased temperature of 4°C. Thanks to the low background water permeability at 4°C, even tiny contributions of HV1 to Pf should be detectable. Yet, the channels did not contribute to the water flow through the vesicular membrane even though channel water permeability but weakly depends on temperature (24).”]]

      Furthermore, the authors calculate the number of hydrogen bonds (NH) that pore waters could form with pore lining residues based on an X‐ray structure of a chimeric proton channel protein (pdb: 3WKV) that: a) manifests discontinuous transmembrane water density and is known to represent a non‐conductive conformation, b) contains residues from Ci‐VSP in the critical S2‐S3 linker that form part of the proton transfer pathway, and c) exhibits structural features (i.e., highly conserved ionizable residues such as D185 and R205, which like D174 are reported to dramatically alter Hv1 gating, are packed into a solvent‐free crevice) that are inconsistent with physiological function. Given that all Hv1 ionizable mutant combinations tested so far (with the sole exception of D112V; other nonionizable substitutions at D112 are tolerated) remain functional (Musset, Smith et al., 2011, Ramsey, Mokrab et al., 2010), the identities of water‐interacting residues are speculative.

      [[

      We replaced the X‐ray structure of the chimeric proton channel protein with the AlphaFold structure. We now provide views of the open and closed conformations in the Supplement based on the homology structure (13). Microsecond-long molecular dynamics simulations have optimized the latter.

      The experimental observation of mutants’ functionality (with the sole exception of D112V) supports our view that proton transfer occurs through a hydrogen‐bonded network of waters that is only once (at D112) interrupted by an amino acid side chain. The nature of the amino acids interacting with the proton transferring water molecules is of little importance.]]

      Interpreting differences in the calculated NH based on pdb: 3WKV therefore seems unlikely to reveal fundamentally important insights into Hv1 function. The authors' conclusion that "The observation rules out the formation of an uninterrupted water chain spanning the open channel from the aqueous solution at one side of the membrane to the other. NH would have governed water mobility if such a water wire had formed (24)" (lines 143‐145) therefore does not appear to be strongly supported.

      [[

      We did not base our conclusion of an obstructed water pathway on the analysis of structural models. In contrast, the conclusion is the result of our experiments. The structural models permitted the prediction of the expected water permeability. Depending on the model and the channel conformation, we find NH values between six and 16. All of these NH values translate into water permeabilities exceeding gramicidin’s water permeability. Thus, we would have been able to detect the water flux through an unobstructed proton channel.]]

      Reviewer #2:

      Summary: Voltage‐gated proton channels are peculiar members of the voltage‐gated ion channel family because they lack a canonical pore. Instead, protons permeate through their voltage‐sensing domain. The mechanisms of proton permeation in Hv1 channels are still unclear, with currently two competing hypotheses: (i) hopping through titratable residues within the protein; or (ii) a Grotthuss mechanism involving proton jumping through a continuous water wire. So far, these hypotheses have only been tackled by computation. The authors therefore aimed to experimentally test the two hypotheses. To do so, the authors measured the transport rates of protons and water through wild‐type and mutant D174A Hv1 reconstituted in lipid vesicles. Overall, the presented data are convincing and support their conclusion that proton conduction through the channel is not solely mediated by water transport. However, there are several aspects of the paper that I did not understand and that would require clarification.

      [[

      We thank the reviewer for the positive evaluation.]]

      Major comments: My major concern is about the relevance of using the D174A mutant. The authors explain at the beginning of the paper that Hv1‐D174A is open at 0 mV, which allows measuring proton flux in systems in which voltage cannot be controlled. However, it seems from the proton flux experiments that wild‐type Hv1 can conduct protons perfectly well in the used experimental paradigm. So why test a mutant? It is actually not clear why wild‐type Hv1 can conduct protons in the proton conduction assay.

      [[

      We introduced the D174A mutation to measure water flux in a setting where the membrane potential is zero. We only performed the proton flux measurements to show that our reconstituted HV1 channels are functional. HV1 can conduct protons because we establish a transmembrane potential in the proton conduction assay. That is, only initially, extravesicular and intravesicular pH values are equal. Valinomycin addition results in a K+ efflux that, in turn, generates a membrane potential. This potential drives the HV1-mediated H+ influx.]]

      The authors should clearly state the trans‐membrane potential created by the K+ gradient across the vesicle, as well as the pH inside and outside the vesicle, and relate these conditions to their electrophysiology data to give us an idea of the open probability of wild‐type Hv1 in the conditions used in the proton conduction assays. This is critical to be able to compare the relative rates of proton transport between the wild‐type and the mutant.

      [[Page 6, bottom:

      " ...we encapsulated c_k^i=150 mM KCl in the HV1 containing large unilamellar vesicles (LUVs) and exposed these vesicles to a buffer with a K+ concentration c_k^o= 3 mM. The addition of valinomycin facilitated K+ efflux, thereby inducing a membrane potential, ψ. ψ constituted the driving force for H+ uptake. It can be calculated according to the Goldman equation:

      ψ = -RT/F ln ((c_k^i+(P_H/P_K ) c_H^i)/(c_k^o+(P_H/P_K ) c_H^o ))

      (1)

      The ratio of the HV1-mediated proton permeability P_H to the valinomycin-mediated potassium permeability P_K is always smaller than 0.04. We base our conclusion on the observation that the CCCP-mediated proton permeability represents an upper limit for P_H since CCCP always induces a faster vesicular proton uptake than HV1 (Fig. 3). Accordingly, the maximum value of P_H/P_K can be estimated as the ratio of valinomycin to CCCP conductivities. The respective values are equal to 1.6 × 10^(-3) Ω^(-1) cm^(-2) [1] and 4 × 10^(-6) Ω^(-1) cm^(-2) [2]. At pH 7.5, we find c_H^o=10^(-7.5) M, i.e., c_k^o ≫ (P_H/P_K ) c_H^o. Similarly, c_k^i ≫ (P_H/P_K ) c_H^i for a broad range of intravesicular pH. With these simplifications, Eq. 1 transforms into the Nernst equation yielding:

      ψ = -RT/F ln ((c_k^i)/(c_k^o)) = -100 mV

      (2)

      ψ of such size may decrease intravesicular pH by nearly two units.

      Such acidification does not violate c_k^i ≫ (P_H/P_K ) c_H^i so that ψ remains constant throughout the experiment. That is, the vesicle experiments proceed under voltage clamp conditions. The simple explanation is that, due to the small proton concentration and the limited buffer capacity, the K+ conductance exceeds H+ conductance under all conditions. The conclusion is in line with simulations (32), confirming that the membrane potential is driven very near the Nernst potential for K+.”]]

      Similarly, the buffers and pH used for the water transport assay are not explicitly mentioned. Are they the same as for the proton transport assay or are the buffers inside and outside the vesicle symmetrical?

      [[

      We added the information about buffers and pH used to the legend. Except for 150 mM sucrose, the internal and external solutions were identical: 150 mM KCl, 5 mM HEPES (pH 7.5), and 0.5 mM EGTA.]]

      Finally, in the introduction the authors base their assumptions about water transport on an X‐ray structure of Hv1 in a closed conformation (3WKV). I do not think it is relevant to study permeation, which in theory should only happen in an open state. If the authors want to make assumptions about the number of hydrogen bonds in the pore and how many water molecules are in the pore (and I don't think they need to do it), they should rather base their assumptions on the computational models of Hv1 open state.

      [[

      We thank the reviewer for the advice. We added a figure to the Supplement. It shows Hv1 models from long-timescale molecular dynamics simulations (Geragotelis et al, Proc Natl Acad Sci U S A 2020 Vol. 117 Issue 24 Pages 13490-13498). The open structure reveals NH=6. We used this value for our calculations.]]

      Minor comments:

      1) Figure 6: the authors should specify that the model of proton conduction through Hv1 is just an assumption. The structural features of the Hv1 open state are indeed unknown.

      [[We modified the figure based on the simulation results of Geragotelis et al. We indicated in the legend that the scheme is based on HV1 homology models.]]

      2) Page 9, lines 170‐171 "Drastically prolonged tail current kinetics might reflect a decreased voltage‐dependence of the deactivation in the D174 mutant". Or rather the prolonged kinetics reflect the stabilization of the open state by the mutation (as stated by the authors just after).

      [[Page 14:

      “Drastically prolonged tail current kinetics might reflect (i) a decreased voltage dependence of the deactivation in the D174A mutant or (ii) a stabilized open state (14).”]]

      3) Supplementary figures are displayed in an odd fashion. Figure S3 should be placed before Figures S1 and S2.

      [[We added two more Supplementary Figures and displayed them in the order in which the text mentions them.]]

      4) In Figure 2, displaying the current trace corresponding to the 0 mV voltage step would improve readability of the figure, by showing that Hv1‐D174A mutants conduct protons at 0 mV and not wt Hv1.

      [[

      We show the current trace corresponding to the 0 mV voltage step for the D174A mutant in panel A and the trace for the wild-type in panel B of Fig. 2.]]

      5) Figure 2 legend "Pronounced inward H+ currents activate negatively to the reversal potential (here ‐70 mV)". I think the authors mean "Here 0 mV", ‐70 mV is the threshold potential. Panel (c), I guess the EH vs Vrev plot is for D174A mutants but it is not mentioned in the legend

      [[

      We corrected the legend. “Pronounced inward H+ currents activate negatively (here – 70 mV) to reversal potential (here – 8 mV), indicating a high open probability of the D174A mutant at 0 mV.” And “Comparison of calculated Nernst potential for protons (EH) and measured reversal potential (Vrev) for the D174A mutant.”]]

      6) Page 4, line 89: the fact that D174A conducts protons at a lower rate is, at this point, based on a lot of assumptions. I would just correct the last sentence by saying "Thus, D174A, while opening with less depolarization, seems to conduct protons at a lower rate"

      [[We toned down our statement and inserted a phrase very close to the one suggested.

      Page 5: “Our observation suggests a reduced flux through the mutant if we assume that the protein expression level is independent of the mutation.”]]

      7) Page 6, line 107. The word "therefore" is not necessary

      [[ok]]

      8) Page 7, line 128: "of" in "measures of transport" is missing

      [[We deleted the paragraph.]]

      9) Page 12, lines 261‐262: "Figure M" ??

      [[“Inset of Figure 3A”]]

      CROSS‐CONSULTATION COMMENTS I agree with the two other reviewers' comments. I think our reviews more or less raise the same weaknesses in the study.

      Significance

      This paper addresses a single question with a clearly defined experimental paradigm. Once the issues are addressed, the paper should be of considerable significance to the field of voltage‐gated ion channels, since the nature of proton conduction in Hv1 was not known. It could help explain ion conduction in some channelopathies involving ion conduction through the voltage‐sensing domain. The audience is mainly the voltage‐gated ion channel community, as well as the community studying membrane permeation mechanisms. My field of expertise is in ion channel structure‐function and pharmacology. I have little expertise in the described proton and water flow assays. Therefore, I do not have sufficient expertise to evaluate the detailed experimental protocol that led to the measurements.

      Reviewer #3:

      Summary: This study addresses a fundamental question about the mechanism of proton conduction in the voltage gated proton channel Hv1 i.e., whether protons hop through an uninterrupted water wire, or move by other means involving titratable channel residues. The authors argue that an uninterrupted water wire entails a certain rate of water movement through the open channel, which they estimate to be around 10‐12 cm3s‐1 based on a structural model of Hv1 and previous work on other channels. They then measure water permeability of LUVs containing a purified Hv1 mutant expected to be open at 0 mV via light scattering, and proton flux using a pH sensitive fluorescent dye. They calculate a water permeability much lower than predicted and conclude that the water in the conduction pathway does not form an uninterrupted water wire. The manuscript is written clearly, and the experimental measurements are convincing.

      [[We thank the reviewer for the positive evaluation.]]

      There are nonetheless some ambiguities in the way the formation of water wires is discussed.

      Major comments: A protein like Hv1 is larger and more complex than small peptides like gramicidin. In this context, transient water wires, frequently interrupted by titratable residues, or by steric hindrance from hydrophobic sidechains etc. are likely. Can the authors provide an estimate for the maximum frequency and lifetime of uninterrupted proton wires compatible with their measurements? This would be helpful to evaluate whether short‐lived uninterrupted water wires could contribute significantly to proton conduction or not. Trapping usually implies restricted movement. So, for how long do water molecules need to stay inside the channel in order to be considered trapped? Are the water molecules really trapped or simply forming broken wires?

      [[Page 13, bottom:

      “The question arises whether the obstacle in the water pathway is permanent. HV1’s titratable residues or steric hindrance from fluctuating sidechains may frequently interrupt otherwise intact water wires. Yet, our calculations (Eqs. 7 – 11) show that proton diffusion from the bulk solution to the pore mouth is the transport limiting step. Undoubtedly, transient closure would have caused a detectable pore resistance because part of the protons arriving at the pore mouth could not enter the pore. If the pore was closed longer than one ps, an arriving H+ may diffuse out of the capture zone and vanish into the bulk:

      t_c=(r_0^2)/6D = 10^(-16)/(6 × 8.65 × 10^(-5) ) s = 2 × 10^(-13) s

      (16)

      where tc denotes the time a proton requires to diffuse a distance equal to the capture radius r0. Since transient closures would give rise to a pore resistance that was not detected experimentally, they must be ruled out. The observation agrees well with noise experiments, in which the observed Lorentzian time constants, albeit smaller than the time constants for H+ current activation, were larger than 0.1 s (41).”

      We provided the calculations showing the diffusion limitations on page 9:

      “…we show that the transport limiting step is H+ diffusion to the pore (access resistance) and not transport through the pore. Therefore, we first calculate the maximum current Imax permitted by diffusion for a constantly open pore (35):

      I_max = 2π F r_0 D_H c_H

      (7)

      where F, r0, DH, and cH are Faraday's constant, the capture radius, the H+ diffusion constant, and the H+ concentration, respectively. The only unknown parameter is r0. Taking the gA estimate r0 = 0.87 Å (36), disregarding buffer effects and assuming DH = 8.65 × 10^(-5) cm^2 s^(-1), we find:

      I_max = 2π × (9.6 × 10^4 A s/mol) × 0.87 × 10^(-8) cm × 8.65 × 10^(-5) cm^2 s^(-1) × (4 × 10^(-7.5) mol)/(1000 cm^3)

      (8)

      I_max=5.6 × 10^(-17) A

      (9)

      Eq. 8 considers that the approximately 25 % charged lipids in the bilayer induce an increase in surface proton concentration, i.e. it accounts for a surface potential of roughly -40 mV in 150 mM salt. The maximal unitary rate would then be equal to:

      q_max = (5.6 × 10^(-17) C/s)/(1.6 × 10^(-19) C) = 348 s^(-1)

      (10)

      Here we used the r0 value determined for gA (36). Acidic moieties at the entrance of HV1 and proton surface migration along the lipid bilayer could serve to increase that value (37, 38). The observation suggests transport limitations by poor proton availability. Calculation of the channel resistance, Rch (35), confirms the hypothesis:

      R_ch = R_pore+R_access =[l_ch+(π a_ch)/2] ρ/(π a_ch^2 )

      (11)

      where R_pore is the resistance of the pore proper and R_access is the access resistance. Assuming a channel radius, a_ch, of 0.15 nm, a length, l_ch, of 4 nm and solution resistivity (H+ as the sole conducted ion at bulk pH of 7.5 and a surface potential of -40 mV), ρ, of 2 × 10^5 Ω cm, we find R_ch = 4 × 10^13 Ω. Thus, the resulting current, Iρ, that we may expect for the vesicular membrane potential of 100 mV is equal to 3 × 10^(-15) A. Accordingly, Iρ exceeds Imax by more than one order of magnitude. Consequently, we may safely conclude that HV1 conductance is limited by proton availability under our conditions.”]]
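The arithmetic quoted in Eqs. 7 to 10 and 16 can be re-traced with a short script. This is a minimal sketch rather than the authors' own calculation: the variable names are illustrative, all inputs are the values quoted in the response above, and R_ch is taken as the stated 4 × 10^13 Ω instead of being recomputed from Eq. 11.

```python
import math

# Inputs as quoted in the response (variable names are illustrative).
F = 9.6e4                    # Faraday constant, A s / mol (value used in Eq. 8)
r0 = 0.87e-8                 # capture radius from the gA estimate, cm
D_H = 8.65e-5                # H+ diffusion constant, cm^2 / s
c_H = 4 * 10**-7.5 / 1000    # surface H+ concentration, mol / cm^3 (Eq. 8)
e = 1.6e-19                  # elementary charge, C (value used in Eq. 10)

# Eq. 7: diffusion-limited maximum current for a constantly open pore.
I_max = 2 * math.pi * F * r0 * D_H * c_H

# Eq. 10: maximal unitary transport rate.
q_max = I_max / e

# Eq. 16: time for a proton to diffuse one capture radius,
# using r0^2 rounded to 1e-16 cm^2 as in the quoted equation.
t_c = 1e-16 / (6 * D_H)

# Current expected from the stated channel resistance at 100 mV.
R_ch = 4e13                  # ohm, as stated in the response (assumed, not recomputed)
I_rho = 0.1 / R_ch

print(f"I_max = {I_max:.2e} A")
print(f"q_max = {q_max:.0f} 1/s")
print(f"t_c   = {t_c:.2e} s")
print(f"I_rho / I_max = {I_rho / I_max:.0f}")
```

With unrounded intermediates, the script gives I_max ≈ 5.7 × 10^(-17) A and q_max ≈ 3.6 × 10^2 s^(-1), which agree with the quoted 5.6 × 10^(-17) A and 348 s^(-1) to within rounding. It also gives t_c ≈ 1.9 × 10^(-13) s and a ratio Iρ/Imax of roughly 44, consistent with "more than one order of magnitude".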

      The main conclusion of the paper rests on the negative results from the water permeability assay of Fig. 5. It is recommended to include a positive control (e.g., with gramicidin A), run under the same conditions and with a similar number of channels per LUV, to show what the results should look like in case of significant water permeability.

      [[We included the gramicidin measurements (Fig. 6) as requested.]]

      Figure 6 shows a simplified scheme of proton transport with trapped water molecules in Hv1. Panel A represents a resting state (nonconductive); panel B represents an open state (conductive), favored by the D174A mutation. So, what makes B conductive and A nonconductive? Is it the presence of two salt bridges in B vs. three salt bridges in A? This should be clarified.

      [[

      We modified the figure based on the simulation results of Geragotelis et al. We indicate with arrows the parts of the channel where the proton is free to move and mark with crosses the sites with insurmountable energy barriers.

      Legend to the figure (now Fig. 8): “In the region of the selectivity filter adjacent to D112, the channel is too narrow to let water molecules pass (see also Fig. S1). Yet, the proton may bypass the electrostatic barrier of the open channel at D112 (18), i.e., jump between the two neighboring water molecules. Removal of D174 shifts the voltage sensitivity so that most channels are already open at a transmembrane potential of 0 mV. B) The closed channel. It neither allows water nor proton transport. In its new location, D112 provides an insurmountable electrostatic barrier to proton passage.”]]

      Minor comments: The interpretation of Fig. 2E strongly depends on the assumption that the D174A mutation does not alter membrane trafficking. It is recommended to check the validity of this assumption, e.g., by colocalization with a plasma membrane marker. Images of SDS‐PAGE results for the studied Hv1 proteins should be provided to show preparation purity.

      [[

      We toned down the interpretation of Fig. 2E. As it stands now, Fig. 2 shows that the mutant (i) is functional and (ii) has a high open probability at 0 mV. These conclusions are independent of membrane trafficking. We included images of SDS-PAGE results for the studied HV1 proteins in the Supplement.]]

      CROSS‐CONSULTATION COMMENTS

      I agree with the comments from the other two reviewers. My major point is that refuting major water permeability in Hv1 is not the same thing as refuting that protons can be conducted by transient water wires, unless it is proved that the transient water wires cannot sustain enough proton movement to account for the single channel conductance.

      Reviewer #3 (Significance (Required)):

      The Hv1 channel plays important roles in the human body, including the immune, respiratory, and reproductive systems. Despite recent advances in understanding the mechanism of proton conduction by Hv1, whether or not protons hop within a continuous water wire in the open channel is a subject of debate (DeCoursey J. Physiol. 2017, Bennett & Ramsey J. Physiol. 2017). This work provides important insights on the debate by refuting the existence of a water wire that can sustain large water permeability. The findings reported here will be of interest to ion channel biophysicists like this reviewer, but also to biologists studying cellular pH homeostasis and the pathophysiology of Hv1.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      August 17, 2022

      RE: Review Commons Refereed Preprint #RC-2022-01442

      Dear Editor of the EMBO Journal,

      Please find our updated manuscript and response to the reviewers’ comments. We appreciate the effort that the reviewers have put into the evaluation of our manuscript.

      We are happy with the potential importance the reviewers see in the study:

      Reviewer 1: The finding that ubiquitination occurs inside mitochondria would be an important conceptual advance, which would open new perspectives both for ubiquitination and mitochondrial biology research.

      Reviewer 2: This work would represent a significant/exceptional discovery if supported by compelling data.

      Reviewer 3: the results are interesting and very important, as mentioned in the major comments section…

      With regard to the major comments raised by the reviewers, you will find below our specific response point by point with explanations and suggested novel experiments (highlighted in yellow). In summary we suggest the following actions to fully support our model:

      • We will perform α-complementation with ubiquitin (lacking the GG motif) fused at its C-terminus to the short fragment of β-galactosidase (α). Blue colonies with ωm will indicate import.
      • Figure S2, now added to the manuscript, shows detection of ubiquitinated proteins and mono-ubiquitin in extracts of mitochondria pre-treated with trypsin.
      • A bio-archives address of our other manuscript will be provided.
      • The use of α-complementation for protein localization was developed by us 15 years ago and since then has been used by us and other groups verifying its use as a screening tool. One point is clear, ωm or ωc do not leak into other subcellular compartments. Nevertheless, in the research of specific genes validation is important. Yes!!! ωm and ωc are exclusively located in mitochondria or the cytosol respectively.
      • We will highly purify mitochondria on gradients and treat them with protease.
      • We cannot be sure that we will be able to detect a protein with ubiquitin modifying activity which functions solely on certain proteins in mitochondria, so publication cannot rely on this.
      • Repeat mass spectrometry with careful editing will be undertaken as suggested by the reviewer.
      • We will attempt to perform protease protection assays in the presence of specific detergents.

      Before tackling the very tough revision, we would like to know if EMBO Journal would positively consider acceptance of our manuscript based on the review and planned revision.

      Prof. Ophry Pines Microbiology & Molecular Genetics Hebrew University of Jerusalem Jerusalem 91220 Israel


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Zhang et al. investigate whether ubiquitination occurs inside mitochondria of the budding yeast S. cerevisiae. They first observe thanks to a sensitive complementation assay that several components of the yeast ubiquitination (and deubiquitination) machinery can localize inside mitochondria. To be able to specifically probe ubiquitin conjugates assembled inside mitochondria they fused HA-tagged ubiquitin to a mitochondrial targeting sequence. Using this construct, they demonstrate that ubiquitin conjugates can be assembled in mitochondria. A series of elegant experiments demonstrates that the pattern of ubiquitin conjugates depends on the mitochondrial localization and the activity of the ubiquitin conjugating enzyme Rad6. Altogether, these results convincingly demonstrate that ubiquitination can occur inside yeast mitochondria when ubiquitin is intentionally targeted inside this organelle. It however remains unclear whether mitochondrial ubiquitination occurs in endogenous conditions (without targeting ubiquitin into this compartment) and whether it affects mitochondrial functions.

      Response: Regarding the question whether mitochondrial ubiquitination occurs in endogenous conditions, we feel that this is obvious based on our results. We detect numerous ubiquitination related enzymes (E1, E2, E3, DUB) eclipsed in mitochondria but none of the proteasome subunits. As pointed out by the reviewer “these results convincingly demonstrate that ubiquitination can occur inside yeast mitochondria”. With that said, additional data will be incorporated into the manuscript as suggested by the reviewer and can be seen below.

      Major comments:

      1) The materials and methods section is lacking important information (western blot protocol, details of antibodies, strains, plasmids...). It is thus difficult to evaluate how several experiments were performed and how their design (e.g. the promoters chosen to express tagged proteins) could impact the interpretation of the results. This is a major issue that needs to be corrected. The main text should also explicitly indicate whether tagged proteins used in the alpha-complementation assay are overexpressed or not.

      Response: The materials and methods section will be updated accordingly.

      2) Despite the previous comment, the data presented in the manuscript convincingly demonstrate that multiple components of the ubiquitination machinery can localize within mitochondria and that ubiquitin conjugates can be assembled in mitochondria when ubiquitin is modified to be intentionally targeted into this compartment. However, little data is shown to support the hypothesis that ubiquitin conjugates can be assembled in mitochondria when ubiquitin is not fused to a mitochondrial targeting sequence. Thus, in my opinion, the evidence presented in the current manuscript is not sufficient to conclude that ubiquitin conjugates are assembled in mitochondria in endogenous conditions (as this is done implicitly). Additional evidence is needed to draw this conclusion (see some experimental suggestions hereafter). Without further evidence, the speculative aspects of the claim that "ubiquitination occurs in the mitochondrial matrix" should be discussed explicitly.

      Response: See the discussion above why we are confident that ubiquitination occurs in mitochondria. Our major problem with ubiquitin and the ubiquitination enzymes is that they are eclipsed in mitochondria. We propose as suggested by the reviewer (item 4 of his review) to perform α-complementation with ubiquitin fused at its C-terminus to the short fragment of β-galactosidase (α). Blue colonies with ωm will indicate import.

      3) The authors used a mass spectrometry approach to identify mitochondrial ubiquitination substrates. However, they have not yet succeeded in identifying a substrate whose modification is specifically regulated by a given component of the mitochondrial ubiquitination machinery. They have also not identified a phenotype or process impacted by mitochondrial ubiquitination. Thus, at this stage, the biological consequences of mitochondrial ubiquitination remain elusive.

      Response: We have not identified a substrate whose modification is dependent on a given component of the mitochondrial ubiquitination machinery, even though we have tried. Again, the problem is low levels of these proteins eclipsed in mitochondria. Even when we do find a protein that is ubiquitinated (e.g. Aco1) its ubiquitination is not exclusively dependent on Rad6. Thus, different ubiquitin enzymes may have the same substrates.

      4) The authors have not directly investigated whether ubiquitin itself (without a mitochondrial targeting sequence) localizes in mitochondria. I encourage them to address this question since it would provide an important piece of evidence suggesting that mitochondrial ubiquitination can occur in endogenous conditions. This could be done using the alpha-complementation assay and the results could be presented within Figure 1. Ideally this experiment should be performed without overexpressing ubiquitin. Note that if the authors decide to use a C-terminally tagged form of ubiquitin for this experiment, the GG motif of ubiquitin should be mutated to avoid cleavage of the alpha tag by cellular DUBs. This form of ubiquitin will not be conjugatable, but this is not an issue for this experiment since its aim is to determine whether ubiquitin can be targeted to mitochondria, not to probe conjugates.

      Response: We will perform experiments as suggested by the reviewer including ubiquitin fused at its C-terminus to the short fragment of β-galactosidase (α), see item 2. We have previously made a PreSu9-Ubi lacking a GG motif but now will look at a different combination of this and other constructs.

      5) In the top panels of Figure 2 and S1, free ubiquitin is well detectable in the total and cytosolic fractions. It is however not clear to me whether it is also detectable in the concentrated mitochondrial fraction. If yes and if it would be resistant to trypsin digestion, it would provide additional evidence that endogenous ubiquitin can be targeted to the mitochondrial matrix (see previous comment).

      Response: See Item 6.

      6) The data shown in the top panel of Figure 2 and S1 also suggest that free ubiquitin is less concentrated in mitochondria than in the cytosol (since it is more difficult to detect in the concentrated mitochondrial fraction than in the cytosolic fraction, see previous comment). It is thus possible that the use of preSu9-HA-Ubi (or preFum1-HA-Ubi) lead to an artificially high intra-mitochondrial concentration of free ubiquitin. As the concentration of free ubiquitin is known to impact ubiquitination processes, I encourage the authors to compare the relative levels of free ubiquitin present in the mitochondrial fraction prepared from WT and preSu9-HA-Ubi (or preFum1-HA-Ubi) expressing cells. If free ubiquitin is detectable in mitochondrial fractions and resistant to trypsin (see previous comment), this could be done by repeating the experiment shown in Figure 3B and probing the blot with an antibody that recognizes free ubiquitin.

      Response to 5 and 6: Detection of ubiquitin in mitochondria is extremely difficult even when mitochondria are 15-fold concentrated versus the cytosol and when HA-Ubi is overexpressed. Thus, ubiquitin is eclipsed in mitochondria. Nevertheless, the figure below, which was not part of the submitted manuscript but was performed in parallel with experiments done early on, shows detection of very weak bands of free ubiquitin in extracts of mitochondria pre-treated with trypsin.

      Endogenous ubiquitination pattern in mitochondria of Δrad6 cells is restored to normal by Rad6-α. WT or Δrad6 cells containing a Rad6-α construct or an empty plasmid were subjected to subcellular fractionation. Mitochondrial fractions, with or without trypsin treatment, were probed for ubiquitin by WB. Aco1 is a mitochondrial matrix protein, and Tom70 is a mitochondrial outer membrane (MOM) protein facing the cytosol.

      7) I strongly encourage the authors to provide more data indicating that "ubiquitination occurs in mitochondria" by performing experiments that do not rely on the use of the preSu9-HA-Ubi or other forms of ubiquitin that are intentionally targeted to mitochondria. For instance, they could analyse the pattern of HA-Ubi conjugates of trypsin digested mitochondrial fractions prepared from wt, rad6-delta, and rad6-delta complemented with preSu9-Rad6-alpha-SL17. Note that if trypsin digested mitochondrial fractions are too contaminated by ubiquitinated proteins present outside mitochondria to perform this experiment, the authors may use the unspecific DUB Usp2 as an alternative protease to strip ubiquitinated proteins from the mitochondria periphery.

      Response: Concentrated mitochondrial extracts from WT and Δrad6 cells, untreated or treated with trypsin, were probed with anti-ubiquitin antibodies (Figure above). A band corresponding to free ubiquitin can be detected in extracts of mitochondria treated with trypsin, but it is very weak and at the limit of detection.

      Minor comments:

      1) Overall, the manuscript is well organized and easy to follow. The text is clearly written; the figures are well annotated.

      2) The authors should provide full images of all the blots with anti-ubiquitin and anti-HA antibodies so that one can see the bands corresponding to free ubiquitin (or free HA-Ubi). For instance, in Figure 3B, it is not possible to see the presence (or absence) of the band corresponding to free HA-Ubi because the very bottom of the image is cut.

      3) The authors should indicate whether the MTS of Su9 (and Fum1) are expected to be cleaved after import of preSu9-HA-Ubi (and preFum1-HA-Ubi) in mitochondria. They should also label on the corresponding immunoblots the presence (or absence) of the band corresponding to the free preSu9-HA-Ubi (and preFum1-HA-Ubi) (or HA-Ubi if the MTS is expected to be cleaved from these constructs).

      4) In Figure 3B, the ubiquitin conjugates produced with preSu9-HA-Ubi and preFum1-HA-Ubi have different migration patterns. I think this should be explicitly mentioned and discussed. Could it be due to the presence of lysine residues in the Su9 or Fum1 MTS that could lead to the assembly of artificial ubiquitin chains?

      5) The authors indicate that "endogenous Rad6 [...] is expressed at very low levels and can hardly be detected in the mitochondrial fraction by WB (Figure S5)". I did not manage to observe the band corresponding to endogenous Rad6 in the mitochondrial fraction in the pdf. The authors should provide a more contrasted or better quality image.

      CROSS-CONSULTATION COMMENTS I agree with reviewer 2 that proper validation of the complementation assay is crucial for this manuscript. I was myself wondering whether it uses endogenously tagged proteins or whether it is based on an overexpression system. I imagine this information will be detailed in the manuscript in preparation mentioned by the authors. I am therefore wondering whether it would be possible to ask the authors to provide the draft of this manuscript (or at least the validation part).

      Response: A bio-archives address of our other manuscript will be provided upon resubmission. Other issues are addressed in the response to Reviewer 2.

      I agree with most comments of reviewer 3. Regarding the hypothesis that preSu9-HA-Ubi could form aggregates on the cytosolic surface of the mitochondria, I think that the results presented on Figure 7B rather argue against it (since they indicate that Rad6 localized inside mitochondria can restore the pattern of ubiquitin conjugates). That's why (in my opinion) the major question the authors now need to address is whether intra-mitochondrial ubiquitination occurs in endogenous conditions (i.e. without forcing ubiquitin into this compartment and without E2 or E3 overexpression).

      Response: See response to the other reviewers

      Reviewer #1 (Significance (Required)):

      The finding that ubiquitination occurs inside mitochondria would be an important conceptual advance, which would open new perspectives both for ubiquitination and mitochondrial biology research. However, the significance of the current manuscript is limited because the presented evidence relies heavily on the use of artificial conditions (ubiquitin tagged with a mitochondrial-targeting sequence) that may trigger irrelevant ubiquitination events. The significance would be much higher if the authors provided further evidence indicating that intra-mitochondrial ubiquitination occurs in endogenous conditions and/or if they had identified a mitochondrial process specifically impacted by mitochondrial ubiquitination.

      Expertise of the reviewer: Ubiquitination, Yeast biology, protein-protein interactions. No specific expertise in mitochondrial biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In the manuscript by Yu et al., the authors test the concept that certain proteins are unevenly distributed within distinct cell compartments. Due to this localization discrepancy, protein detection in some subcellular compartments can be "eclipsed" by a predominant subset of specific protein localizing in another cell compartment their actual distribution. Therefore, tiny amounts of physiologically relevant proteins could be biologically relevant. Still, their function in some locations can be overlooked (or eclipsed) because of the high expression level of the same protein in another subcellular compartment(s). Although, this concept is not particularly novel. For example, it is already known that many different proteins can localize to distinct cellular locations (e.g., permanent mitochondrial and peroxisomal localization of many proteins or transient localization of particular proteins to separate cell compartments). The authors apply a yeast system and an α-complementation assay to test further the role of such eclipsed proteins in mitochondrial biology. Specifically, they focus on the ubiquitin (Ub, or as abbreviated incorrectly in this manuscript; Ubi) conjugation pathway, components of which have never been convincingly shown to localize inside the mitochondria. This work proposes that certain ubiquitination events can occur inside yeast mitochondria. This work would represent a significant/exceptional discovery if supported by compelling data. However, the major problem with this work is that the conclusions are based on the ectopic expression of distinct proteins. This approach is not failproof in precise protein expression/delivery to the specific subcellular locations and is likely to result in a non-specific localization. Thus, the problem of eclipsed proteins is addressed by the methodology that may lead to the artificial generation of eclipsed overexpressed proteins. 
      A more effective approach would be if the authors found a way to study this issue with endogenous proteins. The need for overexpression of mitochondria-targeted ubiquitin makes it challenging to reconcile the physiological role of these findings. In addition, some critical technical issues and omissions further reduce the potential impact of this work (see Specific comments below). For example, strong evidence of mitochondria fraction purity and additional evidence that all the essential constructs used in this work are not misdirected to a different compartment are needed.

      Response: “Although, this concept is not particularly novel” is a very disappointing remark by the reviewer!! While dual targeting of proteins has been known for many, many years, how widespread the phenomenon is was unknown, and it was thought to be negligible. We have been leaders for the last 30 years in the field of dual targeting and distribution, and in particular the distribution of single translation products. We coined the terms “echoforms” and “eclipsed distribution” and developed methods to detect and screen for dual targeting. The concept of eclipsed distribution, and in particular eclipsed targeting to mitochondria, is very new and is leading to a novel perception of the mitochondrial proteome (see MS submission). While the reviewer appears to be an expert on ubiquitination, we are experts on dual targeting.

      • Ub was abbreviated incorrectly in this manuscript as Ubi. Response: This will be corrected.

      Other comments will be referred to in the response to Specific comments.

      Specific comments 1. The authors should demonstrate beyond doubt that the ω components of their assay (ω-C, which supposedly stays in the cytosol-ONLY and the ω-M component, which seemingly remains in the mitochondria-ONLY) are in the compartment that the authors claim. These two proteins are transfected into yeast cells and overexpressed. Therefore it is possible that they leak to other, not intended, subcellular compartments. The authors assume that ω-M and ω-C are exclusively located either in the mitochondria or the cytosol. However, this should be shown as validation of the assay. The indicated reference from 2005 (Ref.13) and others are irrelevant since assays have variations and are often researcher/lab dependent. This validation is very important since a misallocation of the overexpressed ω-M or ω-C, leaking into other subcellular compartments, may cause misdetection of the α-constructs.

      Response: The use of α-complementation for protein localization was developed by us 15 years ago and since then has been used by us and other groups verifying its use as a screening tool. One point is clear, ωm or ωc do not leak into other subcellular compartments. Nevertheless, in the research of specific genes validation is important. Yes!!! ωm and ωc are exclusively located in mitochondria or the cytosol respectively.

      It is not surprising that Ub conjugates are detected in mitochondrial fractions. It could be due to ubiquitination of the OMM (coming from the cytosol) or perhaps since the subcellular fractions were not pure mitochondria free from contamination (the likely culprit could be the ER). The mitochondrial fractions in this work were obtained by 10,000 g separation between cytosolic and mitochondrial crude fractions. Indeed, these 10,000 g crude fractions are highly impure with membranes from other compartments (i.e., microsomes, lysosomes, and so on). Therefore, more sophisticated purification methods should be used. In addition, the authors should also test these fractions for non-mitochondrial proteins from other membrane organelles.

      Response: We agree with the reviewer and therefore will take the following approaches:

      i) We will treat isolated mitochondria with protease in order to remove adhering proteins and digest OMM proteins (see attached figure).
      ii) We will highly purify mitochondria on gradients; this will be straightforward since we are now employing such methods in other projects in the lab.
      iii) Matrix protein enrichment (by mass spec) is associated with IP for preSu9-HA-Ub conjugates which is three-fold higher than for HA-Ub. In any case, the fact that we identify conjugates of proteins not known to be mitochondrial strongly supports our thesis.

      Figure 2. Coomassie blue staining does not show any signal in the "M" fraction. It can be interpreted that the authors do not get any mitochondria there, and therefore the lack of Ub signal is due to the absence of the protein in the samples. Using the same amount of protein from each fraction would probably reduce the necessity of 15x enrichment.

      Response: The Coomassie blue staining does show a signal in the "M" fraction which is weak yet when a 15x enrichment is run, the protein level by Coomassie blue staining is similar to the cytosolic fraction.

      Figure 3. It is puzzling why the HA-UBQ presence is so strong in the crude mitochondrial fraction, but the preSu9-HA-Ub signal (mito-matrix) is comparatively weak. These data suggest that the crude mito-fraction could be highly contaminated with OTHER membranes. On the other hand, the preSu9-HA-UBQ signal is no more than 1-5% of the total mitochondrial signal. The high enrichment of the HA-Ubi in both cytosols and the mitochondria could indicate the OMM ubiquitination or (again) contamination by other compartments. The constructs with MTS are detected in the mitochondria. However, the localization of tagged MTS-Ubi in a non-targeted compartment (e.g., cytosol) should be excluded by additional exposure times. Because the manuscript talks about eclipsed proteins, this is important.

      Response: The HA-Ub is strong in the mitochondrial fraction, in the absence of trypsin, but is very weak in the presence of the protease indicating that most of the ubiquitinated proteins are externally attached to mitochondria. In contrast, PreSu9-HA-Ub is imported into the mitochondrial matrix and is protected from trypsin. This manuscript refers to “eclipsed in mitochondria” (not the cytosol) and this is true for ubiquitination enzymes as well as for ubiquitin.

      Figure 3C-E. These data indeed suggest that the Ub-conjugates could be formed inside the mitochondria. However, the above-discussed possibility that other than mitochondria compartments co-sediment in the 10,000g fractions makes the data interpretation highly challenging.

      Response: We will highly purify mitochondria on gradients and this will be straightforward since we are now employing such methods in other projects in the lab.

      Figure 4. Unsurprisingly, mitochondrial targeting of Ub leads to detecting some co-immunoprecipitating mitochondrial proteins. However, these data do not support the notion that Ub conjugation machinery acts inside the mitochondria and that the target proteins are indeed conjugated with Ub (the interaction with Ub is not equal to being conjugated). At the minimum, the authors should provide a validation that some of the detected mitochondrial matrix proteins are indeed ubiquitinated. To this end, purified mitochondria could be used for the candidate protein IP under denaturing conditions and then blotted for the candidate protein and Ub.

      Response: As shown in Table S2 and Figure S7, forms of Ilv5, a mitochondrial protein, are ubiquitinated in WT and Δrad6 cells. These modified forms of Ilv5 can be eluted from mitochondrial extracts of WT and Δrad6 cells. However, the ubiquitination of Ilv5 is neither dependent on nor affected by the Δrad6 mutation. We cannot be sure that we will be able to detect a protein with ubiquitin-modifying activity that functions solely on certain proteins in mitochondria.

      Figure 5. The knock-out of the E2 Rad6 causes a change in the mitochondria ubiquitination pattern. This is an interesting observation, but again it does not prove that the change in the mitochondrial ubiquitination is due to the activity of Rad6 inside of the mitochondria, as opposed to ubiquitination of the OMM proteins or contaminating fractions. One also wonders why overexpression of mitochondria-targeted Ub would be necessary to detect the ubiquitination if this process was physiologically relevant, especially given that detecting endogenous Ub is not challenging. Furthermore, the apparent increase in ubiquitination in E2 mutant cells (Fig. 5) should also be addressed in more detail. Finally, data from one WB is shown, and quantification of several independent experiments should also be provided.

      Response: We show in the manuscript that a hybrid Rad6 is exclusively targeted to mitochondria (Su9 MTS), while unimported molecules are degraded (SL17 degron). This hybrid Rad6 can restore the WT ubiquitin pattern, while a Rad6 active-site mutant cannot.

      Figure 6. Can the authors provide Western blot data showing the expression of Rad6? Furthermore, quantifying these rescue experiments is necessary to make this conclusion more solid.

      Response: Even though we did not succeed in making good Rad6 antisera, we can clearly detect Rad6-α fusion proteins (Figure 7B).

      Figure 7. The authors found that preSu9-Rad6-α have problems being imported into the mitochondria matrix; therefore, they rebuild it as a preSu9-Rad6-α-SL17 protein. SL17 is a degron that targets the cytosolic protein (not imported into the mitochondria) to the proteasome and degraded (Figs. 7A-B-C). These issues could be a red flag for the rest of the manuscript, suggesting that other constructs (that were not critically evaluated for their localization in this work) could leak to different cellular compartments.

      Response: The wording used by the reviewer is particularly disturbing, since the current understanding of eukaryotic cell biology does not accept “leaking” of proteins to different cellular compartments. One would not want DNases, RNases, proteases, etc. leaking from one compartment to another. The localization of proteins to different cellular compartments involves very precise signals on the proteins, and specific cellular components, such as translocases, are required to target proteins to their exact destination. This is true for Rad6: it contains an MTS-like sequence which, when removed, blocks import of the protein into mitochondria. According to our analysis, Rad6 is an eclipsed dual-targeted protein, so it is no surprise that it is found in two compartments, and the SL17 degron resolves the problem.

      The manuscript needs to be carefully edited: some references are not in the correct format, and there are issues with figure labels.

      Response: Careful editing will be undertaken as suggested by the reviewer.

      CROSS-CONSULTATION COMMENTS I agree with a great summary by reviewer 1. This discovery should be validated by top-quality data.

      Reviewer #2 (Significance (Required)):

      In the manuscript by Yu et al., the authors test the concept that certain proteins are unevenly distributed within distinct cell compartments. Due to this localization discrepancy, protein detection in some subcellular compartments can be "eclipsed" by a predominant subset of the same protein localizing in another cell compartment, masking their actual distribution. Therefore, tiny amounts of a protein could be biologically relevant, yet their function in some locations can be overlooked (or eclipsed) because of the high expression level of the same protein in another subcellular compartment(s). This concept is not particularly novel, however. For example, it is already known that many different proteins can localize to distinct cellular locations (e.g., permanent mitochondrial and peroxisomal localization of many proteins or transient localization of particular proteins to separate cell compartments). The authors apply a yeast system and an α-complementation assay to test further the role of such eclipsed proteins in mitochondrial biology. Specifically, they focus on the ubiquitin (Ub or, as abbreviated incorrectly in this manuscript, Ubi) conjugation pathway, components of which have never been convincingly shown to localize inside the mitochondria. This work proposes that certain ubiquitination events can occur inside yeast mitochondria. This work would represent a significant/exceptional discovery if supported by compelling data. However, the major problem with this work is that the conclusions are based on the ectopic expression of distinct proteins. This approach is not failproof in precise protein expression/delivery to the specific subcellular locations and is likely to result in a non-specific localization. Thus, the problem of eclipsed proteins is addressed by the methodology that may lead to the artificial generation of eclipsed overexpressed proteins. 
A more effective approach would be for the authors to find a way to study this issue with endogenous proteins. The need for overexpression of mitochondria-targeted ubiquitin makes it challenging to reconcile the physiological role of these findings. In addition, some critical technical issues and omissions further reduce the potential impact of this work (see Specific comments above). For example, strong evidence of mitochondria fraction purity and additional evidence that all the essential constructs used in this work are not misdirected to a different compartment are needed.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this study, the authors detected a set of components of a ubiquitination system in the mitochondrial matrix in budding yeast using the subcellular compartment-dependent α-complementation assay. The authors detected the conjugates of mitochondrial targeting signal sequence-directed HA-Ub (preSu9-HA-Ub) in the mitochondrial matrix. The immunoprecipitates of the preSu9-HA-Ubi conjugates were highly enriched for the mitochondrial matrix proteins. Subsequently, the authors focused on the Rad6 E2 ubiquitin conjugating enzyme in the mitochondrial matrix and evaluated its inactivation-altered ubiquitination pattern in the organelle. The authors conclude that ubiquitination occurs in the mitochondrial matrix because of the eclipsed targeted components of the ubiquitination machinery.

      Major comments: The authors argued that the proteins that were modified with preSu9-HA-Ubi, which was forced to be imported into the mitochondria, are present in the mitochondrial matrix, because these species are resistant to trypsin digestion. However, it is possible that they formed severe aggregates on the cytosolic surface of the mitochondria and hence were resistant to the protease. In other words, a small amount of proteins that were not imported into the mitochondria could be deposited on the cytosolic surface of the mitochondria, where they were modified with preSu9-HA-Ubi by cytosolic Rad6. To confirm that the preSu9-HA-Ubi-modified proteins are really present in the mitochondrial matrix, the authors should perform the protease protection assay in the presence of an appropriate detergent (Figure 3D). In addition, subcellular fractionation of the organelle by density gradient centrifugation, indirect immunofluorescence microscopic analysis of the preSu9-HA-Ubi conjugates, and/or experiments on the in vitro import of preSu9-HA-Ubi and Rad6 into the mitochondria would strongly support the authors' conclusion. Another experiment that might support the authors' conclusion would be to test whether the band pattern for the preSu9-HA-Ubi conjugates changes when mitochondrial import is impaired.

      Response: We will attempt to perform 1) Protease protection assay in the presence of a detergent (Figure 3D). 2) Subcellular fractionation of the organelle by density gradient centrifugation. 3) In vitro import of Rad6 into the mitochondria.

      Minor comments: In Figure 3B, the molecular weight distributions of the preSu9-HA-Ubi conjugates and those of the preFum-HA-Ubi conjugates are different. Is there any reason for this difference?

      In Figure 3E, the position of "-" (MG132) for lane 1 is not correct.

      In Figure 6A: The band pattern for preSu9-HA-Ubi (lane 13) in the rad6-delta cells expressing Ubc8-alpha is different from that of the wild-type cells expressing Ubc8-alpha (lane 12) as well as that obtained from the rad6-delta cells harboring empty plasmids (lane 9). Is there any explanation for this observation?

      In Figure 7B and S6: The level of preSu9-Rad6-alpha-SL17 in the rad6-delta cells is always lower than that in the wild-type cells (compare lanes 13 and 10 in Figure 7B, and lanes 13 and 12 in Figure S6). Is there any explanation for this observation? The protease protection assay (with detergent control) is needed to fully confirm that preSu9-Rad6-alpha-SL17 is present in the mitochondria.

      In Figure S7, the authors presented the matrix proteins, Ilv5 and Aco1, detected in the preSu9-HA-Ubi IPed samples and described this observation in the main text. However, the authors also showed the blots for Idh1 and Fum1, which were also pulled down with preSu9-HA-Ubi from the WT cells more than from the rad6-delta cells. Is this correct? If so, please elucidate this observation in the main text.

      Figure 8D and 8E are not cited in the main text. Although there are no explanations for these figures in the main text, it looks like Rad6-deltaN11-alpha resides in the mitochondrial fraction. However, the alpha-complementation assay suggests that it resides in the cytosol. Please explain this discrepancy.

      First page of the discussion section, item 6): E2 Rad6, but not E3 Rad6?

      Figure S7: HA-Ub (cytosolic form) control is needed in addition to the empty vector control.

      Figure S7, left panel: There is an unnecessary line break in "Hsp60" and "Ilv5."

      Figure S7, right panel: There is an unnecessary line break in "Hsp60."

      CROSS-CONSULTATION COMMENTS I agree with comments of reviewer 1 and 2. -Validation of the complementation assay. -I also think that it is important to address whether intra-mitochondrial ubiquitination can be observed with endogenous level of ubiquitin. If even a small amount of preSu9-HA-Ub is mistargeted to the cytosol, proteins at the cytosolic side of mitochondrial outer membrane could be ubiquitinated and detected in the mitochondrial fraction. -Preparation of mitochondria with more sophisticated purification methods (i.e. high resolution density gradient) would be needed to separate mitochondria from ER and other organelles. -More information is needed in the materials and methods section.

      Reviewer #3 (Significance (Required)): Significance Although the results are interesting and very important, as mentioned in the major comments section, additional experiments are needed to support their model. However, researchers working on the mitochondrial biology and ubiquitin systems might be interested in and influenced by the reported findings.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a public browser in which users can easily investigate associations between PGSs for a wide range of traits, and a large set of metabolites measured by the Nightingale platform in UKBB. This browser can potentially be used for identifying novel biomarkers for disease traits or, alternatively, for identifying novel causal pathways for traits of interest.

      Overall I have no major technical concerns about the study, but I would encourage the authors to revisit whether they can find a more compelling example that can better showcase the work that they have done. I understand that this is partly a resource paper but I think the resource itself can have more impact if the paper provides a clearer use-case for how it can drive novel biological insight.

      Many thanks for your comments. We have undertaken a new application of bi-directional Mendelian randomization to demonstrate how users may use this approach to disentangle whether associations in our atlas likely reflect either causes or consequences of PGS traits/diseases. This example is described on page 9:

      ‘For example, we applied Mendelian randomization (MR) to further evaluate associations highlighted in our atlas with triglyceride-rich very low density lipoprotein (VLDL) particles. For instance, both VLDL particle average diameter size and concentration were associated with the PGS for body mass index (BMI) (Beta=0.04, 95% CI=0.033 to 0.046, P<1x10-300 & Beta=0.012, 95% CI=0.006 to 0.019, P=2.7x10-4 respectively) and coronary heart disease (CHD) (Beta=0.026, 95% CI=0.019 to 0.032, P<1x10-300 & Beta=0.035, 95% CI=0.028 to 0.042, P<1x10-300 respectively). Conducting bi-directional MR suggested that the associations with average diameter of VLDL particles are likely attributed to a consequence of BMI and CHD liability as opposed to the size of VLDL particles having a causal influence on these outcomes (Supplementary Table 6). In contrast, MR analyses suggested that the concentration of VLDL particles increases risk of CHD (Beta=1.28 per 1-SD change in VLDL particle concentration, 95% CI=1.25 to 1.65, P=2.8x10-7) which may explain associations between the CHD PGS and this metabolic trait within our atlas.’

      and discussed in the discussion on page 21:

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentrations are likely attributed to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles that are important in terms of the aetiology of CHD risk as opposed to their actual size. We envisage that findings from our atlas, as well as other ongoing efforts which leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’
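
      To make the idea of an MR effect estimate concrete, a minimal sketch follows. It assumes the fixed-effect inverse-variance weighted (IVW) estimator, a common choice for two-sample MR; the excerpt above does not state which estimator was used, so this is an illustration only. The function name `ivw_estimate` and the summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the effect of exposure on outcome.

    Each list holds per-variant summary statistics: the variant-exposure
    association, the variant-outcome association, and the standard error
    of the variant-outcome association.
    """
    # Weighted regression of outcome betas on exposure betas through the origin.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


if __name__ == "__main__":
    # Hypothetical instruments, not values from the atlas.
    bx = [0.1, 0.2]
    by = [0.05, 0.1]
    se = [0.01, 0.01]
    print(ivw_estimate(bx, by, se))
```

      In a bi-directional analysis the same calculation is run twice with the roles of exposure and outcome swapped, each time using instruments selected for the trait treated as the exposure.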

      PGS construction: It's unclear how well the PGS work. Should the reader prefer the stringent or lenient PGS? Perhaps there could be some validation with traits that have decent sample sizes in UKBB. Was there any filtering to remove traits with few GWS hits, low sample sizes, or low SNP heritability as these are unlikely to produce useful PGSs?

      An example of validation was previously included for the chronic kidney disease PGS and its association with circulating creatinine, although this has now been removed due to the feedback you provided in your comments below. However, we have now provided the weights for all of the PGS included in our web atlas should users want to use these scores for prediction purposes (page 7):

      ‘The specific weights for clumped variants used in all PGS can be found at https://tinyurl.com/PGSweights.’

      On page 8 we have mentioned that in this work we have used a more lenient threshold to facilitate endeavours in a ‘reverse gear Mendelian randomization’ framework. However, the option to use the more stringent threshold remains an option for users interested in this as an alternative:

      ‘In this paper, we have discussed findings using PGS that were derived using the more lenient criteria (i.e., P<0.05 & r2<0.1), although all findings based on both thresholds can be found in the web atlas.’

      ‘Specifically, we believe our findings can facilitate a ‘reverse gear Mendelian randomization’ approach to disentangle whether associations likely reflect metabolic traits acting as a cause or consequence of disease risk (Holmes and Davey Smith, 2019) as illustrated using triglyceride-rich very low density lipoprotein (VLDL) particles in the next section.’

      We have not filtered based on other criteria such as the number of SNPs, given that certain scores, despite only being constructed using few SNPs, may still prove useful to users. For example, our score for ‘Drinks per day’ based on the more stringent threshold (i.e. P<5x10-8) consists of only 6 SNPs. However, one of these is rs1229984, a missense variant located at the alcohol dehydrogenase ADH1B gene region and known to be a strong predictor of alcohol use (e.g. https://pubmed.ncbi.nlm.nih.gov/31745073/).
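
      As an illustration of how the choice of P-value threshold changes a score, the sketch below computes a PGS as a weighted sum of allele dosages. It is a simplified example under stated assumptions, not the pipeline used for the atlas: the variants are assumed to have been clumped already (e.g. r2<0.1), and the variant identifiers, weights, P-values and the function name `polygenic_score` are all hypothetical.

```python
def polygenic_score(dosages, variants, p_threshold):
    """Sum of (effect-size weight x allele dosage) over variants passing the threshold.

    dosages: dict mapping variant id -> number of effect alleles (0, 1 or 2).
    variants: list of dicts with keys "id", "beta" (weight) and "p" (GWAS P-value).
    """
    return sum(v["beta"] * dosages[v["id"]]
               for v in variants
               if v["p"] < p_threshold)


# Hypothetical, already-clumped variants.
VARIANTS = [
    {"id": "snpA", "beta": 0.5, "p": 1e-9},
    {"id": "snpB", "beta": 0.25, "p": 0.01},
    {"id": "snpC", "beta": -0.125, "p": 0.2},
]

# Hypothetical genotype of one individual.
DOSAGES = {"snpA": 2, "snpB": 1, "snpC": 2}

if __name__ == "__main__":
    print("lenient  (P<0.05):  ", polygenic_score(DOSAGES, VARIANTS, 0.05))
    print("stringent (P<5e-8): ", polygenic_score(DOSAGES, VARIANTS, 5e-8))
```

      With these made-up inputs, the lenient score includes two variants and the stringent score only one, which shows why a score built from few SNPs can still be dominated by a single strong variant.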

      Reviewer #2 (Public Review):

      The authors set out to create an atlas of associations between phenome-wide polygenic scores and circulating lipids, fatty acids, and metabolites. To do so, they utilize GWAS from 129 traits available in the OpenGWAS database to derive polygenic (risk) scores (PGS) along with the recently released NMR metabolomics data containing 249 biomarkers (and ratios) in ~120,000 UK Biobank participants. The authors create a publicly available web portal containing PGS to NMR biomarker associations:

      http://mrcieu.mrsoftware.org/metabolites_PGS_atlas/.

      The strength of this study is in the comprehensive nature of the atlas, containing associations for 129 traits phenome-wide, the large sample size of the UK Biobank NMR data, and the use of PGS for prioritising molecular traits for follow-up experiments, which is an emerging area of interest (International Common Disease Alliance, 2020; Ritchie et al., 2021a). To our knowledge this study is the first to explore this for circulating metabolites.

      In its current form the atlas has several limitations, which should be straightforward to address. Notably, results in the current atlas may be confounded by (1) technical variation in the NMR data (Ritchie et al., 2021b), and (2) major biological determinants of biomarker concentrations, including body mass index, fasting time, and statin usage.

      Firstly, thank you for the suggestion to use your ‘ukbnmr’ R package to help remove technical variation from the UK Biobank NMR metabolite data. We have applied it to remove outliers and variation in the individual data due to (1) the duration between sample preparation and sample measurement, (2) the position of samples on shipment plates, and (3) the different equipment (spectrometers) used. This meant that we needed to re-run our entire analysis pipeline for this project from scratch on the updated dataset. Results do not appear to have changed drastically; nonetheless, we have updated results from all downstream analyses in our online web atlas using this updated dataset provided by ‘ukbnmr’.

      Secondly, the reviewer is correct that biological factors, such as body mass index (BMI) and statin usage, are indeed strongly correlated with metabolite levels. However, we are not able to adjust for such biological factors directly in our analyses, given that they are potential colliders in the causal relationship between diseases/traits and metabolites. Statin usage may be caused by both a high genetic liability to coronary artery disease and abnormal lipoprotein lipid levels. Likewise, obesity (and changes in BMI) may result from a high genetic predisposition to cardiometabolic disorders and disrupted metabolism. Thus, adjusting for statin usage and BMI will induce collider bias (https://jamanetwork.com/journals/jama/fullarticle/2790247), which creates spurious associations between the disease/trait PGS and metabolites.

      To better illustrate this issue, we have added additional text on page 14 to justify this study design decision as well as added a new figure (Figure 3) to help demonstrate this clearly to the readers. Fasting time on the other hand we believe is unlikely to act as a collider and was adjusted as a covariate in all linear regression models in this work. This is mentioned on page 25.
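
      The collider argument can also be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the analysis performed for the atlas: the PGS and the metabolite are simulated as independent standard normal variables (so the true association is zero), and a hypothetical collider, standing in for something like statin usage or BMI, is simulated as their sum plus noise. The function names and all parameter values are illustrative.

```python
import random


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(n=20000, seed=1):
    """Return (unadjusted, adjusted) regression slopes of metabolite on PGS."""
    rng = random.Random(seed)
    pgs = [rng.gauss(0, 1) for _ in range(n)]
    metabolite = [rng.gauss(0, 1) for _ in range(n)]  # independent of the PGS
    # The collider is caused by both the PGS and the metabolite.
    collider = [p + m + rng.gauss(0, 1) for p, m in zip(pgs, metabolite)]

    # Simple regression: metabolite ~ PGS.
    unadjusted = cov(pgs, metabolite) / cov(pgs, pgs)

    # Two-predictor regression: metabolite ~ PGS + collider,
    # solved from the 2x2 normal equations.
    s_pp, s_cc, s_pc = cov(pgs, pgs), cov(collider, collider), cov(pgs, collider)
    s_pm, s_cm = cov(pgs, metabolite), cov(collider, metabolite)
    det = s_pp * s_cc - s_pc ** 2
    adjusted = (s_cc * s_pm - s_pc * s_cm) / det
    return unadjusted, adjusted


if __name__ == "__main__":
    unadjusted, adjusted = simulate()
    print(f"unadjusted slope: {unadjusted:.3f}")
    print(f"slope after adjusting for the collider: {adjusted:.3f}")
```

      Under these assumptions the unadjusted slope is close to zero, whereas the collider-adjusted slope is close to -0.5, an association created purely by the adjustment.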

      …Further, association results for two (of the 129) PGSs, systolic blood pressure (SBP) and diastolic blood pressure (DBP), are invalid (vastly inflated) as the GWASs used to construct these PGSs included UK Biobank samples.

      Many thanks for your suggestion. We have now removed the SBP and DBP PGS from our atlas due to overlapping samples in UKB. Furthermore, our colleagues at the University of Bristol have notified us that the Glioma GWAS data obtained from the OpenGWAS platform was uploaded with incorrect effect alleles. This PGS has also been removed from the atlas. Additionally, we removed the Alzheimer’s disease (without APOE) PGS because the pleiotropic effect of lipid-associated genes is now systematically examined using lipid-gene-excluded PGS.

      To demonstrate how one might use these PGS to NMR biomarker associations to prioritise (or deprioritise) findings for follow-up, the authors select a biomarker of interest, glycoprotein acetyls (GlycA), to perform bi-directional Mendelian randomization to orient the direction of causal effects between GlycA and traits of associated PGS. However, the conclusions of this analysis are hampered by the heterogeneous nature of the GlycA biomarker, which captures the levels of five proteins in circulation (Otvos et al., 2015; Ritchie et al., 2019), making it a difficult target to appropriately instrument for Mendelian randomization analysis. This, however, does not detract from the broader point the authors make: that PGS can help prioritize molecular traits for experimental follow-up.

      We have now conducted further sensitivity analyses to evaluate the genetically predicted effects of each of the five proteins in the reference you have provided. This is discussed on page 11:

      ‘We also conducted further sensitivity analyses given that the NMR signal of GlycA is a composite signal contributed by the glycan N-acetylglucosamine residues on five acute-phase proteins, including alpha1-acid glycoprotein, haptoglobin, alpha1-antitrypsin, alpha1-antichymotrypsin, and transferrin (Otvos et al., 2015). Using cis-acting plasma protein (where possible) and expression quantitative trait loci (pQTLs and eQTLs) as instrumental variables for these proteins (Supplementary Table 12) did not provide convincing evidence that they play a role in disease risk for associations between PGS and GlycA (Supplementary Table 13). The only effect estimate robust to multiple testing was found for higher genetically predicted alpha1-antitrypsin levels on gamma glutamyl transferase (GGT) levels (Beta=0.05 SD change in GGT per 1 SD increase in protein levels, 95% CI=0.03 to 0.07, FDR=3.6x10-3), although this was not replicated when using estimates of genetic associations with GGT levels from a larger GWAS conducted in the UK Biobank data (Beta=1.6x10-3, 95% CI=-6.9 x10-3 to 0.01, P=0.71). For details of pleiotropy robust analysis and replication results see Supplementary Table 14.’

      There are also several important limitations to the study which cannot be addressed, which the authors discuss appropriately in the paper. First, the NMR data does not provide a comprehensive view of the metabolome - it is heavily focused on lipids and fatty acids. Many small metabolites in circulation cannot be measured by NMR spectroscopy, and further insights must wait for data from molecular profiling efforts planned or underway in UK Biobank (e.g. mass spectrometry). Second, the authors restricted analysis to participants of European ancestries. This a pragmatic analysis choice given (1) the PGSs were derived from GWAS performed in European ancestries, (2) PGS associations are particularly susceptible to confounding from genetic stratification and differences in environment, and (3) the very small sample sizes for which NMR data is currently available in UK Biobank participants. Finally, although a large sample size, UK Biobank is not a random sample of the population: healthy adults are over-represented, meaning PGS to metabolite associations may be different in disease cases or less healthy individuals.

      Overall this study has strong potential, with straightforward to address limitations, and the resulting atlas will provide a useful characterisation of the relationships between NMR biomarkers and polygenic predisposition to various traits and diseases, which can be used by domain experts to prioritise biomarkers or traits for experimental follow-up.

      Reviewer #3 (Public Review):

      Fang et al. created an atlas for associations between the genetic liability of common risk factors or complex disorders and the abundance of small molecules as well as the characteristics of major apolipoproteins in blood. The whole study is well executed, and the statistical framework is sound. A clear strength of the study is the large array of common risk factors and diseases analyzed by means of polygenic risk scores (PGS). Further, the development of an open access platform with appealing graphical display of study results is another strength of the work. Such a reference catalog can help to identify novel biomarkers for diseases and possible causative mechanisms. The authors further show how such a systematic investigation can also help to distinguish cause from consequence. For example, an inflammatory molecule readily measured by the NMR platform and strongly associated in observational studies is likely to be a consequence rather than a cause of common complex diseases.

      However, in its current form, the study suffers from some weakness that would need to be addressed to improve the applicability of the 'atlas'. This includes a distinction of locus-specific versus real polygenic effects, that is, to what extent are findings for a PGS driven by strong single genetic variants that have been shown to have dramatic impact on small molecule concentrations in blood.

      Thank you for your suggestions to help refine our work. In line with this comment, we have 1) repeated all analyses after applying the ‘ukbnmr’ R package, as recommended by Reviewer #2, to remove technical variation and outliers, and 2) conducted sensitivity analyses to remove an established list of lipid gene loci from PGS construction. Full results can be interrogated in the web atlas to evaluate whether PGS associations may be driven by locus-specific effects at these regions, which may be particularly informative given the representation of lipoprotein lipid metabolites on the NMR panel. Findings are reported on page 19:

      ‘The polygenic nature of complex traits means that the inclusion of highly weighted pleiotropic genetic variants in PGS may introduce bias into genetic associations within our atlas. To provide insight into this issue, we constructed PGS excluding variants within the regions of the genome which encode the genes for 14 major regulators of NMR lipoprotein lipids signals which captured 75% of the gene-metabolite associations in the Finnish Metabolic Syndrome In Men (METSIM) cohort (Gallois et al., 2019). For details of these genes see Supplementary Table 5.

      For PGS with these lipid loci excluded, anthropometric traits such as waist-to-hip ratio (N=209), waist circumference (N=206) and body mass index (N=205) still provided strong evidence of association with the majority of metabolic measurements on the NMR panel based on multiple testing corrections. Elsewhere however, the Alzheimer’s disease PGS, which was associated with 60 metabolic traits robust to P<0.05/19 in the initial analysis including these lipid loci (Supplementary Table 17), provided no convincing evidence of association with the 249 circulating metabolites after excluding the lipid loci based on the same multiple testing threshold (Supplementary Table 18). Further inspection suggested that the likely explanation for this attenuation of evidence was variants located within the APOE locus, which are recognised to exert their influence on phenotypic traits via horizontally pleiotropic pathways (Ferguson et al., 2020).’

      …Further, it is unclear how much NMR spectroscopy adds over and above established clinical biomarkers, such as LDL-cholesterol or total triglycerides. This is in particular important, since the authors do not adequately distinguish between small molecules, such as amino acids, and characteristics of lipoprotein particles, e.g., the cholesterol content of VLDL, LDL or HDL particles, the latter presenting the vast majority of measures provided by the NMR platform. Finally, the study would benefit from more intriguing or novel examples, how such an atlas could help to identify novel biomarkers or potential causal metabolites, or lipoprotein measures other than the long-established markers named in the manuscript, such as creatinine or lipoproteins.

      To address these comments, we have added a new example focusing on the granular measures of VLDL particles provided by the NMR data (in addition to the examples listed at the start of this response-to-reviewers document). As the reviewer points out, such granularity is one of the strengths of the measures generated by this platform over long-established biomarkers (page 21):

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentrations are likely attributed to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles that are important in terms of the aetiology of CHD risk as opposed to their actual size. We envisage that findings from our atlas, as well as other ongoing efforts which leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1:


      1) The authors could consider qualifying the observations as preliminary, as no mechanistic data or longer-term pathophysiology is investigated. Indeed, the latter is well beyond the current scope and may require generation of cell-type specific STING ki mice.

      Thank you for the comment. We have qualified our observations as preliminary (line 662). Indeed, generating cell-type specific STING ki mice is part of our future plans.

      2) The authors consistently write "NF-kB/inflammasomes" - these two pathways (although related) are quite distinct and should not be lumped together in such a way.

      Thank you for this important note. We have now corrected the text (for example, see the section headings in the Results section, lines 338 and 415).

      3) Line 79: "NRLP3" should be corrected to NLRP3.

      Line 210: The age of "adult mice" in weeks should be stated in the text and figure legend.

      Thank you, corrected.

      4) Line 262: In Figure 3B and D the images look very different and there is no indication of what a positive inclusion is. This should be indicated on the image.

      Thank you for the suggestion. We replaced the corresponding panels with new images, in which the nuclei are shown in blue and the Thioflavin S staining in magenta pseudo-color (current Figure 3E). We marked the outline of Thioflavin S positive cells in yellow. An inset showing a magnification of some neurons with inclusions is also presented.

      5) Line 280: The data of Ifi44 should also be mentioned in the text.

      Thank you. We performed new experiments to show the gene expression changes in the striatum and in the substantia nigra. The majority of the gene expression data from the cortex has therefore been moved to the supplementary material (supplemental Figures 3 and 5) and is not discussed in detail in the current manuscript.

      6) Line 290: Figure 4, Examining IL-1B and Caspase-1 transcripts is not a readout of

      inflammasome activation. pro-IL1B is upregulated in response to NFkB activity. Inflammasome

      activation is commonly examined in other methods e.g. via ASC puncta formation (imaging

      based), active IL-1B secretion (ELISA), Caspase-1 and IL-1B cleavage via western blot

      Thank you for this suggestion, we performed new experiments and added the data as Figure 6.

      1) We performed Western blot analysis to detect IL-1β cleavage and NLRP3 proteins from the striatum (Figure 6C-E). 2) We quantified the number of ASC puncta within microglia and astroglia from striatal sections (Figure 6F-I). 3) We also measured the protein levels of several additional immune mediators in the striatum of STING ki and KO animals (supplemental Figure 6, summary heatmap is on Figure 7A).

      7) Line 310: The NF-kB subunit examined should be stated (p65?). Furthermore, IRF3

      translocation might be a better readout for STING activation.

      We did indeed detect the p65 subunit of NF-kB (the antibody is listed in the supplemental Table), and this is now also indicated in the text (line 366). We also performed subcellular fractionation and quantified IRF3 in the nuclear and cytoplasmic fractions. These data are now shown in Figure 6A, B.

      8) Discussion: Given the findings here suggest a strong role for NF-kB, a short discussion

      of IFN vs non-IFN responses from STING should be included. There have been a number of

      seminal papers demonstrating the importance of non-IFN STING responses of late as well as

      much evidence from SAVI mice to suggest some non-IFN driven pathologies.

      Thank you for the suggestion. The data on inflammasomes were given a separate section in the results (from line 395). In the discussion, from line 535 we discuss the IFN dependent response and from line 548 we discuss the non-IFN driven pathways.

      9) Discussion: Is there any evidence from the human SAVI patients of neuroinflammation

      etc. This should be mentioned either way in the discussion

      Thank you for this comment. Neurological symptoms are not a core feature of human SAVI disease. However, some patients suffer from various neurological symptoms, e.g. calcification of the basal ganglia, spastic diplegia and episodes of seizure (Fremond et al., 2021). We inserted a short text in the Discussion (lines 532-534).

      10) Discussion: There is a large body of work demonstrating STING-induced cell death in

      numerous cell types. Despite this it is not mentioned nor discussed but should be. It could

      represent how dopaminergic neurons are lost in the STING ki mice.

      Thank you for pointing out the gap in our discussion. We added additional text in lines 604-618.

      11) The resolution/quality of some of the imaging is not great but this may be due to PDF

      Compression

      Thank you, we have now uploaded the figures at higher resolution.

      Reviewer #2:


      1) The authors base their conclusions (line 215-216) on the neuroinflammatory status of

      their mice strongly on an assessment of the Iba1 and GFAP-positive area fraction. Increase of

      Iba1 and GFAP areas does not necessarily correlate with an increased cytokine production and

      release by the cells. Therefore, in addition the measurement of cytokine mRNAs it would be

      necessary to measure cytokines also on protein level (see also #4 and #5).

      Thank you for this suggestion, we measured the protein levels of several immune mediators with LEGENDplex™ assay from the striatum, and the new data are included as Figure 7A and supplemental Figure 6.

      2) In the same context: Is the increase of Iba1 and GFAP- covered area due to increased

      proliferation of microglia and astrocytes or due to increased expression of these markers in

      activated glia? How is the number of Iba1/GFAP-positive cells affected?

      We quantified the number of glial cells in the striatum and in the substantia nigra of adult STING WT and STING ki mice. In parallel with the higher immunoreactivity for the corresponding markers, we also detected an increased number of cells. The quantifications are now included in supplemental Figure 1.

      3) Nowadays we know that microglia and astrocytes can exist in a variety of activated

      states which can be either beneficial or detrimental. An analysis of disease-associated

      microglial markers (Keren-Shaul et al. 2017) would give a good picture of the state microglia are

      in.

      Thank you for the suggestion. In addition to the panel of immune modulators at the protein level (supplemental Figure 6), we performed qPCR analysis of an additional “M1” marker (Nos2) and additional “M2” markers (Il4, Fizz2, Ym1) (Gong et al., 2019). The data are included in Figure 7A and shown in supplemental Figure 6. The findings are described from line 431.

      4) It also would be of interest to determine which cell type is responsible for the observed

      neurodegeneration. Which cytokines are released by microglia or astrocytes upon STING

      activation? Even in vitro experiments would help here to get a more profound understanding.

      We agree with the suggestion. However, further in vitro experiments are beyond the scope of this study and will be the basis of a future project.

      5) In line 273 the authors describe that STING is known to activate NFkB and the

      inflammasome. As proof that this is also occurring in their mouse, they perform qPCR analysis

      of whole brain IL-1b, TNF-a and Casp1 expression. While this analysis indicates that there is

      indeed an increased mRNA production of proinflammatory cytokines in the brains of STING ki

      mice, it does not give any indication whether the inflammasome is active or not. The inflammasome is a protein complex largely regulated on protein level. Meaning an assessment

      of the cleavage of Caspase 1 on protein level or the presence of cleaved IL-1b in comparison to

      uncleaved Pro-IL-1b by Western Blot as well as a staining for the number of inflammasomes

      would be required to draw these conclusions.

      Thank you for the suggestion. We performed additional experiments: 1) Western blot to detect pro-IL1b and IL1b and NLRP3 proteins from the striatum (Figure 6C-E), and 2) we quantified the number of ASC puncta within microglia and astroglia from striatal sections (Figure 6F-I).

      6) To conclude that NFkb/inflammasome pathway is the most active/crucial in astrocytes

      (line 354) a staining for ASC inflammasomes would be of importance, especially as astrocytes

      normally do not express NLRP3.

      Thank you for this comment. We stained brain sections for ASC specks and for microglia (Iba1) and astroglia (GFAP) markers (Figure 6F-I). Although the number of ASC specks in astroglia was lower than in microglia, we still found a substantial number of ASC specks in astroglia in the brains of STING ki animals.

      7) As already shown for ALS (Yu et al., 2020) and Parkin KO (Sliter et al. 2018), the authors want to

      further assess the relevance of the STING pathway to PD (line 27-28). Therefore, an in-depth analysis of

      key PD hallmarks beyond phosphorylated a-synuclein, loss of TH-stained neurons and dopamine

      reduction is needed. In the discussion the authors

      hypothesize that autophagy (line 467) may be linked to the observed phenotype. Therefore,

      assessment of autophagy/mitophagy as well as mitochondrial dysfunction and mtDNA should

      be analysed. In the same line of thought it would be important to know if and how the observed

      dopamine reduction effects mouse behaviour, thus mice should be subjected to the Rotarod or

      pole or beam walk test.

      Thank you for these suggestions. In the work by Yu et al. and Sliter et al., the STING pathway was shown to mediate neurodegeneration resulting from TDP-43 pathology and mitochondrial damage. Our work is complementary in that it investigates the effects of constitutive activation of STING. We have therefore focused on the signaling pathways downstream of STING. As mentioned above, the most important next step will be to separate the contributions of neuronal and glial cells by generating cell type specific STING activation. Of course, it will be interesting to see at a later time point whether STING activation feeds back. We also speculate that STING activation may cause TDP-43 pathology. Yet, this will be part of a future study. To acknowledge that the pathology is not specific to alpha-synuclein, we added a short statement from line 634.

      With respect to the comprehensive analysis of the PD phenotype, our work includes the

      classical parameters of TH neuron number, TH fiber density, dopamine concentration and

      synuclein pathology. With respect to mouse behavior, we note that the STING ki mice have severe inflammation in the lung, kidney and other (peripheral) organs, reduced body weight and reduced lifespan (Luksch et al., 2019; Motwani et al., 2019; Siedel et al., 2020). Motor deficits therefore cannot be attributed to dopamine neuron degeneration and for this reason were not included (stated in the Discussion, lines 624-625). In order to expand the description of the PD phenotype, we have now included measurements of cytosolic reactive oxygen species, mitochondrial oxygen species and nitric oxide, which result from inflammation and are known to affect dopaminergic neurons (new Figure 8).

      Reviewer #3:


      1) The method for quantification of TH-positive cells is not sufficient. They just described

      how they stained every fifth sections but did not mention how they count. This is a critical point

      and they should carefully provide information more than just referring their previous paper.

      Counting of dopaminergic neurons and quantification of fibers was described in a dedicated section of the methods. This section has now been expanded (from line 154).

      2) It is not persuasive that they did not investigate local inflammation in SN. They

      presented increased microglia and astrocytes in the striatum but not analyzed these cells in SN

      Indeed, we measured neuroinflammation in the substantia nigra as well, however, although increased in STING ki mice, it was less pronounced than neuroinflammation in the striatum. We now include the quantification of area fraction as well as cell number counting of microglia and astroglia in the substantia nigra of STING WT and STING ki animals (supplemental Figure 1), and also the expression of inflammatory mediators in Figure 4.

      3) In Figure 3, they analyzed alpha-synuclein phosphorylation and beta-sheet structure in

      the striatum. This is funny from the aspect of Parkinson's disease, which dominantly affects SN.

      They should perform similar experiments with SN samples. In a different aspect, the aggregates

      detected by Thio S may not be alpha-synuclein and could be tau, TDP43 or other substances.

      Phospho-synuclein of course does not mean aggregation, so they can consider electron

      microscopy.

      We agree with the reviewer. To complement our data, we therefore performed a solubility assay on both the striatum and the substantia nigra to quantify the ratio of alpha-synuclein in the Triton X-100 soluble and insoluble fractions (Figure 3C, D), as described previously (Szegő et al., 2022; Szegő et al., 2019). Additionally, we quantified phosphorylated alpha-synuclein in the substantia nigra as well (Figure 3A, B).

      We also agree with the reviewer that Thioflavin S-positive inclusions may contain other beta-sheet-forming proteins, and we noted this from line 634.

      4) Figure 5, pSTAT3 increased in Iba1-negative cells, which seem neurons from the size of

      nuclei. First, the authors should investigate the identity of pSTAT3-positive cells with GFAP and

      MAP2. If pSTAT3 is actually increased in neurons, what does it mean in the pathology? For

      instance, in viral infection, STAT3 activation triggers suicide of neurons to prevent further

      proliferation of viral particles in neurons. Is it homologous or other function?

      Thank you for this suggestion. The brain sections were stained for Iba1 and GFAP. pSTAT3 nuclear staining was indeed increased in non-glial cells, which, based on their morphology, we believe to be neurons. However, detailed characterization of the signal is beyond the scope of this (preliminary) study.

      5) In Figure 6 and overall, cell types in which the activation of three signaling pathways,

      were mixed up and hard to understand the actual situation in the brain.

      In our model, STING is activated in all cells. Consequently, we cannot determine the origin of immune mediators found elevated in the STING ki mice. This will require cell type specific STING activation. In order to react to the reviewer’s comment and be clearer, we have added more details about the brain region and age of mice used for each analysis also in the figures.

      6) In the method section, the original paper for generation of heterozygous STING N153S

      KI mice should be Warner et al, JEM 2017.

      We used a STING N153S ki mouse strain that was independently generated at the Technical University Dresden (Luksch et al., 2019).

      7) NF-κB stains seem located in cytoplasm in Figure 5B.

      We agree: especially in the young STING ki mice, cytoplasmic NF-kB staining is increased

      compared to STING WT mice. To quantify nuclear translocation, however, we counted the

      number of those cells where NF-kB signal was overlapping with the nuclear Hoechst staining.

      8) In Figure 4 and 6, why the authors evaluate gene expressions in frontal cortex instead of

      SN or striatum.

      As noted in several comments, we show here that the STING-induced pathology involves dopaminergic neurons, but we believe that it is not specific to the dopaminergic system given that STING-ki is ubiquitously expressed. For practical reasons, we had used cortical samples for the expression analysis. For consistency, we have now performed additional qPCR measurements from the striatum and from the substantia nigra and included them as new Figure 4 and supplemental Figure 6N-Q. The previous data from the cortex were moved to supplemental Figures 3 and 5. Additionally, we measured the levels of several inflammatory modulators in the striatum of STING ki and KO animals (Figure 7A and supplemental Figure 6A-M).

      9) In some groups (Sting-ki;ifnar1-/- in Fig 6C, 6E), the values were separated to two

      groups, which makes readers to doubt on soundness of their genotyping.

      Our genotyping protocol is highly standardized, and the genotypes of the animals were correctly assigned. Here we provide an example of gel images showing the products of the PCR reactions for the STING N153S allele (Figure 1a), STING WT allele (Figure 1b), Ifnara WT allele (Figure 1c) and lack of Ifnara allele (Figure 1d) of the same animals. We note that a bimodal distribution of phenotypes is often observed in Ifnar-/- mice.

    1. Author Response

      Reviewer #2 (Public Review):

      This work will be of potential interest to biologists studying aging. While transposable elements have been reported to have higher expression as organisms age, it was previously unclear if their expression can exacerbate aging phenotypes or if they are a byproduct of aging. The authors present evidence in this manuscript that artificially increasing transposable element expression during the whole Drosophila life cycle can worsen aging phenotypes.

      Strengths

      The authors provide direct evidence that expression of their gypsy construct across the whole life of animals decreases fly lifespan (Figure 4), and that this outcome is dependent on reverse transcriptase (Figure 6).

      Monitoring TE mobilization can be difficult in general and is often expensive when using a sequencing approach. The authors accurately monitor gypsy mobilization from their ectopic copy through qPCR and sequencing.

      Weaknesses

      Experiment design, data interpretation, and story structure:

      The current model proposes that TE increases activity in aged animals and potentially contributes to the aging process. However, this paper artificially drives gypsy activation throughout the whole fly life cycle. Under this design, TE may already bring deleterious effects from early developmental stages or young age, thus ultimately shortening their life cycle. To truly test the function of TE during the aging process, the authors need to temporally control gypsy expression and only express their construct in aged animals.

      Figure 1: I am not sure I got any convincing messages from this figure. First, flies at 30 days of age should not be considered as old. Second, the authors try to claim that TE expression increased with aged FOXO mutants. However, there is no data to show the comparison between aged wild-type and FOXO mutants (panel e is young wt vs young FOXO null). Meanwhile, Figure 1 has nothing to do with Gypsy. How could this figure fit into the story?

      It is clear that we did not do a good job explaining this section. First, we did not mean to imply that the 30-day flies are old. They are simply older than the 5-day flies. The 30-day timepoint was chosen to match previous experiments and data sets in the literature. It was also chosen to minimize any survivor bias that could occur by doing the assay in very old flies. We have clarified this in the text and figures.

      Second, it is the number of transposons that show an increase in expression in the dFOXO null animals that we mean to highlight (18 for dFOXO vs only 2 for wDAH). Panel e is meant to illustrate that the transposon landscapes, even in young flies, differ by genotype, making a direct transposon-to-transposon comparison impossible. We have added text to clarify these points.

      Third, we also do not mean to imply that anything here is specific for gypsy. The work going forward in the paper uses gypsy as a tool because it is one of the better understood retrotransposons, there existed a validated active clone of the transposon and it has already been implicated in aging in the fly. We took gypsy as a model retrotransposon. We have added text to clarify here.

      Figure 3: While the data presented in this Figure is sound, it is unclear how this data fits into the overall narrative that transposon activity drives aging.

      Figure 3 is a continuation of the characterization of our ectopic gypsy. We wanted to rule out that there is a “hotspot” of insertion that would account for any phenotypes we observe. We find no hotspot in the males we use for analysis suggesting it is the act of transposition, not a specific target gene that is important. We have added to the text to clarify the motivation for these experiments.

      Figure 5: It is interesting to see the copies of gypsy are not increased after 5 days. Does gypsy still mobilize after this young age? If yes, the authors should observe increased gypsy gDNA in later time points, unless the cells having gypsy new insertions keep dying. The authors should specifically check tissues with low cell turnover (such as brain) or high cell turnover (such as gut).

      Reviewer 2 makes a great observation. In fact, using primer pairs that specifically detect the ectopic gypsy, we consistently see insertion numbers go down in very old animals (figure 5a&b). With our current understanding of retrotransposition, we should not be able to see loss of insertions unless the host cells are being lost from the analysis. We favor the idea that the reviewer suggests; that the cells that have high levels of insertion are dying and disappearing from the analysis. We think this is also reflected in the bias for intergenic or intronic sequences in our insertion mapping of figure 3. In an attempt to address this question we did measure insertions in heads versus bodies. In male flies aged 14 days there was no difference in the average number of insertions (although the variability was greater in heads). This data is reported in Supplemental Figure 6a.

      Figure 8: Using Ubiquitin GAL4 to drive both gypsy and FOXO expression could dilute the expression of each individual gene. Thus, it is possible the rescue effect seen by expressing FOXO in addition to gypsy may just be due to lower gypsy expression. Including qPCR data showing gypsy expression levels in Ubi>gypsy, UAS FOXO flies compared to Ubi>gypsy flies would be helpful.

      We included this data in Figure 2b and 8c. Unfortunately, we did not clearly direct the reader to compare the values. Comparing Figure 2b with Figure 8c shows that the RNA level of the ectopic gypsy is comparable in both cases, and perhaps even slightly higher in the UAS-FOXO case. We have added a sentence to make this clear.

      It is unclear if FOXO can rescue TE-specific aging phenotypes. While it appears that FOXO overexpression rescues the decrease in lifespan caused by gypsy expression, the authors did not test if FOXO overexpression could rescue the effects of gypsy in the paraquat resistance assays or rhythmicity experiments.

      We include in this revision data showing dFOXO overexpression rescues the paraquat resistance and lowers the levels of overall insertions in the animals.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors of the paper provide new evidence of how prefrontal cortex of mutant mice used as a disease model of schizophrenia differs from wild type littermates. By analyzing local network dynamics at the level of specific cell type, authors shed new light on the circuit mechanisms that underlie changes in network dynamics in these mice.

      The claims in the submitted manuscript are supported by the data. I have a few comments and questions that need to be clarified.

      We thank the reviewer for highlighting the novelty of our work and its relevance (…shed new light on the circuit mechanisms that underlie changes in network dynamics in these mice…) for the field and the validity of our data (….claims in the submitted manuscript are supported by the data).

      1) Average firing rates

      Authors claim that they saw a significant reduction in interneuron firing rates in Disc1 mutant mice compared to control mice Fig.1c. However, the difference could be general and not interneuron specific. Due to the high firing rates of interneurons, the statistical test will work better on interneurons than on pyramidal cells as pyramidal cells average firing rates are lower. What I suggest to do is to take interneuron cells that fire at a lower rate (lower 33% for example ) and compare the control and Disc1 groups. Also I would suggest to take pyramidal cells that have higher firing rates (upper 33% for example) and compare firing rates across the same groups. One would like to see if these differences are not due to changes in firing rates per se.

      We thank the reviewer for pointing out this important aspect. In our original analysis, we did not take into account that additional differences in the PYR population might be present but ‘masked’ by the overall lower firing rate of that neuronal population. As suggested by the reviewer, we separately considered the firing rate of the ‘top 33%’ of the PYR population, which did not significantly differ between genotypes (p=0.958, n=209 control and 245 Disc1 PYRs, Welch’s test). As suggested, we moreover considered the ‘bottom 33%’ of INT firing rates, for which the significantly lower rates of Disc1-mutant INTs remained (control: 4.2 ± 0.6 Hz vs. Disc1: 1.8 ± 0.2 Hz, n=26 and 34 neurons, p=0.013, Mann-Whitney U-test). Since only a few INTs were recorded per session in some cases (ranges: Disc1: 2-12/session; control: 2-19/session), we performed this analysis on the basis of individual cells (see also our reassessment of the main statistical comparisons in response to #1 by reviewer 2 and #4 by reviewer 3). These data are now reported in the new Fig. 1 – figure supplement 3 and referred to in line 76 ff. (line 72 ff. without tracked changes) of the revised manuscript.
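      For readers who want to see the shape of such a tercile comparison, the following is a minimal sketch and not the analysis code used for the manuscript. It assumes that the "top" and "bottom 33%" are taken within each genotype after sorting cells by firing rate; the function names and the firing rates are invented for illustration. Only the test statistics are computed; p-values would normally come from a statistics library.

```python
from statistics import mean, variance

def tercile(rates, which):
    """Return the lowest ("bottom") or highest ("top") third of firing rates (Hz)."""
    ordered = sorted(rates)
    k = len(ordered) // 3  # size of one third, rounded down
    return ordered[:k] if which == "bottom" else ordered[len(ordered) - k:]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical firing rates (Hz), made up for this example only.
control_int = [3.0, 5.0, 4.0, 12.0, 20.0, 15.0, 9.0, 30.0, 25.0]
disc1_int = [1.0, 2.0, 1.5, 8.0, 10.0, 6.0, 14.0, 18.0, 11.0]

low_control = tercile(control_int, "bottom")
low_disc1 = tercile(disc1_int, "bottom")
u = mann_whitney_u(low_control, low_disc1)
t, df = welch_t(low_control, low_disc1)
print(low_control, low_disc1, u, round(t, 3), round(df, 3))
```

      Restricting each group to its own tercile keeps the comparison within a matched firing-rate range, which is the point of the reviewer's suggestion.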

      2) Optogenetic tagging

      Authors indicate that light triggered and spontaneous spike waveform are similar Fig.1d. This is nice, but would be better to see all the tagged neurons. I would suggest showing all optically tagged neurons spike features. Authors can impose with a different color spike features of tagged neurons in Fig.1a. I suspect that since all PVI are narrow spiking and they must fall into the area of blue colored cells in Fig.1a.

      Following the reviewer's suggestions, we included the average waveforms with and without light for all opto-tagged PVIs in the revised Fig. 1f. Moreover, we included the kinetic features of opto-tagged PVIs in Fig. 1a (red dots), and separately for control and Disc1-mutant mice in the new Figure 1-figure supplement 2. As predicted by the reviewer, the PVIs indeed cluster with the other putative INTs. We would moreover like to point to our new analysis in response to #2 of reviewer 2 addressing the spike kinetics of optotagged PVIs versus untagged putative INTs, which are similar in their trough-to-peak duration and asymmetry index. These data are shown in the novel Fig. 1 – figure supplement 2.

      3) It was not clear why authors assessed only firing rates in last 25ms (line 348-349). If they have a clear justification for this they should provide it. But why not use the latency of the first spike also as an additional metric. A well tagged cell will respond to light pulse with short latency (within 5 ms). My concern is that non PVI cells may increase firing rate after 25ms of stimulation of PVI cells due to disinhibition.

      Despite the latency to the first spike being frequently used as a method to detect ChR2-positive neurons, the laser stimulation produced significant photoartefacts in our hands. We were therefore concerned that spikes that happen shortly after the onset of the light pulse might be missed, and hence the latency to the first spike might be misinterpreted. Selecting a later time point in the stimulation interval allowed us to assess the firing rate during light application without the interference by artefacts. Nevertheless, we fully agree with the reviewer’s concern that ChR2-negative non-PVIs might increase their rate due to disinhibition, and that these neurons might thus be falsely classified as PVIs. However, we are confident that that is not the case. First, optotagged PVIs cluster well within the population of electrophysiologically identified INTs (see our response to your first remark on ‘optogenetic tagging’) and were indistinguishable from this population in terms of spike kinetics (see our response to #2 of reviewer 2 and the new Fig. 1 – figure supplement 2), suggesting that no disinhibited PYRs were included in the optotagged sample of cells. Second, we performed an additional analysis to address the time course of firing rate changes in optotagged PVIs. We computed smoothed spike trains (convolved with a 5 ms SD Gaussian kernel), and extracted the average firing rate of each optogenetically identified PVI centered on the onset of the light pulses. This analysis revealed a rapid increase in firing rate upon light delivery, arguing against disinhibitory network effects. These new data are now shown in the new Fig. 1 – figure supplement 5 and reported in line 89 (85 without tracked changes) of the revised manuscript.
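      The time-course analysis described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code used for the manuscript: spikes are binned at 1 ms around each light onset, averaged across pulses, and then smoothed with a Gaussian kernel of 5 ms SD (by linearity this is equivalent to smoothing first and averaging afterwards, apart from edge effects). The function names, the window length and the example spike times are hypothetical.

```python
import math

def gaussian_kernel(sd_ms, bin_ms=1.0, n_sd=4):
    """Discrete Gaussian kernel, truncated at n_sd SDs and normalised to unit sum."""
    half = int(round(n_sd * sd_ms / bin_ms))
    w = [math.exp(-0.5 * (i * bin_ms / sd_ms) ** 2) for i in range(-half, half + 1)]
    total = sum(w)
    return [x / total for x in w]

def light_triggered_rate(spikes_ms, light_onsets_ms, pre_ms=50, post_ms=50, sd_ms=5.0):
    """Average smoothed firing rate (Hz) in 1 ms bins from -pre_ms to +post_ms around light onset."""
    n_bins = pre_ms + post_ms
    counts = [0.0] * n_bins
    for onset in light_onsets_ms:
        for s in spikes_ms:
            b = int(math.floor(s - onset)) + pre_ms
            if 0 <= b < n_bins:
                counts[b] += 1.0
    # spikes per 1 ms bin per light pulse, converted to Hz
    rate = [c / len(light_onsets_ms) * 1000.0 for c in counts]
    kernel = gaussian_kernel(sd_ms)
    half = len(kernel) // 2
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n_bins:
                acc += k * rate[idx]
        smoothed.append(acc)
    return smoothed

# Hypothetical unit that fires once, 3 ms after each light onset.
onsets = [100.0, 300.0, 500.0]
spikes = [o + 3.0 for o in onsets]
psth = light_triggered_rate(spikes, onsets)
peak_bin = psth.index(max(psth))
print("peak at", peak_bin - 50, "ms relative to light onset")
```

      A directly light-driven cell would be expected to show its rate increase within a few milliseconds of onset, whereas a disinhibited cell would lag behind.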

      4) Spike cross-correlations

      The authors show that spike transmission probability from PYR to PVI is reduced in Disc1 mice compared to the controls Fig.2d and Fig.2e, but what happens to PVI to PYR spike transmission probability? Is it different in those groups? Answering this question is important since the authors discuss this topic in line 185-193.

      Inhibitory synaptic interactions are indeed detectable by spike-train cross-correlation. However, we find these to be harder to quantitatively interpret than excitatory connections. Those interactions are not visible as spike transmission but rather as a reduction in spike transmission. Reliable estimates of the reduction in spike rate of postsynaptic PYRs require very large spike numbers of postsynaptic neurons that need to be sampled. For instance, Senzai et al., 2019 (Neuron 101: 500-513.e5) identified inhibitory interactions in continuous recordings lasting up to 68 h. Since we did not explicitly design our experiments to investigate inhibitory interactions, our recordings were substantially shorter than the required length. Using the method of Senzai et al., 2019 to identify inhibitory interactions, we detected only 5 INT-INT interactions (in the pooled Disc1-mutant and control data set). This low number does not allow the quantification of potentially reduced spike transmission. Thus, attempts to quantify inhibitory interactions properly would require a substantial amount of additional long-duration recordings. While the point raised by the reviewer is highly relevant and should be investigated in future, we think that given the extensive amount of experimentation needed to address this question, it is beyond the scope of the current manuscript.

      5) Authors could try to link oscillations with spike transmission probabilities. On line 180 authors discuss that lower synchrony between PVI might be responsible for observed reduction in gamma power in Disc1 mutant mice. With the available data authors could test this hypothesis. They can look at spike cross correlations in their pool of INT and PVI (if they have pairs of PVI recorded in the same session) population.

      We thank the reviewer for this excellent suggestion! We computed the cross-correlations for all simultaneously recorded putative INTs and quantified the baseline-subtracted mean cross-correlation within 10 ms around zero time lag. This analysis revealed weaker cross-correlation in Disc1-mutant mice (p=0.026, Mann-Whitney U test, tested on averages from n=7 control and Disc1 mice with at least 2 INTs recorded simultaneously), suggestive of reduced synchronization of putative INTs at short time lags. These new data are now included in the new Fig. 4 and reported in line 201 ff. (185 ff. without tracked changes) of the revised manuscript.
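      As an illustration of this kind of quantification, a minimal sketch is given below. It is not the code used for the manuscript, and it makes two assumptions that are not stated in the response: "within 10 ms around zero time lag" is read as lags from -10 to +10 ms, and the baseline is taken as the mean of the correlogram at lags of 30 ms or more. All function names, parameters and spike times are hypothetical.

```python
def cross_correlogram(ref_ms, target_ms, max_lag_ms=50):
    """Count target spikes at each integer lag (ms) relative to every reference spike."""
    counts = [0] * (2 * max_lag_ms + 1)
    for r in ref_ms:
        for t in target_ms:
            lag = int(round(t - r))
            if abs(lag) <= max_lag_ms:
                counts[lag + max_lag_ms] += 1
    return counts

def zero_lag_excess(counts, max_lag_ms=50, centre_ms=10, baseline_from_ms=30):
    """Mean count within +/- centre_ms of zero lag minus mean count at |lag| >= baseline_from_ms."""
    lags = range(-max_lag_ms, max_lag_ms + 1)
    centre = [c for lag, c in zip(lags, counts) if abs(lag) <= centre_ms]
    base = [c for lag, c in zip(lags, counts) if abs(lag) >= baseline_from_ms]
    return sum(centre) / len(centre) - sum(base) / len(base)

# Hypothetical spike times (ms): one partner fires near-synchronously, one with a 40 ms delay.
ref = [100, 200, 300, 400]
synchronous = [101, 202, 299, 401]
delayed = [140, 240, 340, 440]

sync_excess = zero_lag_excess(cross_correlogram(ref, synchronous))
delayed_excess = zero_lag_excess(cross_correlogram(ref, delayed))
print(sync_excess, delayed_excess)
```

      A pair that fires together at short lags yields a positive value, whereas a pair whose spikes fall outside the central window yields a value at or below zero.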

      6) An alternative way to link oscillations with lower spike transmission probabilities in PYR-PVI pairs is to use synchrony triggered LFP analysis. One could take all time points when PVI and PYR cells fired acausal spikes within a 2 ms window and look at the LFP around this time point. Then take the average of the synchrony-triggered LFP and look at the power spectrum.

      The proposal to link spike transmission with LFP power is indeed intriguing. As suggested by the reviewer, we extracted the 60-90 Hz-filtered LFPs triggered by INT spikes that followed a spike in a presynaptic PYR by <2 ms and measured the average gamma amplitude in a window of 20 ms around the INT spike. This analysis revealed comparable gamma amplitudes in Disc1 compared to control pairs. This finding suggests that local PYR-INT loops are still capable of producing gamma oscillations, and that the gamma oscillation defect of Disc1 mice is likely not caused by such a local defect. To investigate the relationship between INT spike timing and gamma oscillations more generally, we further extracted gamma amplitudes of spike-triggered LFPs using all available spikes of the INTs. Moreover, we compared the data to gamma amplitudes measured at randomly selected time points. ANOVA followed by Tukey tests performed on the level of mouse averages indicated that INT spiking-associated gamma amplitudes were significantly larger than those obtained from random time points in wild type mice (p=0.001), whereas the same was not true for Disc1-mutant mice (p=0.591). Furthermore, this analysis revealed significantly reduced spike-triggered high gamma amplitudes in Disc1-mutant compared to control mice (p=0.011). While these results argue against a driving role of local connection alterations in gamma defects, they generally confirm the impaired synchrony of INT spiking relative to gamma oscillation that we observed in our analysis of phase coupling. These data are now shown in the new Fig. 4, which summarizes all new analyses regarding gamma oscillations and phase-coupling, and in figure 4 – figure supplement 2. The new results are described in the main text of the revised manuscript in line 188 ff. (172 ff. without tracked changes).

      Considering the reduced short time scale synchronization of INTs (see our new results in response to the reviewer’s point #5) and the reduced gamma amplitude of INT spike-triggered LFPs, it is possible that impaired synchronization among prefrontal INTs contributes to the observed reduction in gamma power of Disc1-mutant mice (thereby, essentially, reflecting impaired INT gamma (ING)). Additionally, reduced long-range excitatory drive maintaining local gamma oscillations might be a contributing factor. For example, recent work showed that high gamma oscillations in the mPFC occur synchronized with gamma oscillations in the olfactory bulb (Karalis & Sirota, 2022, Nat Commun 13:467). It remains to be investigated whether local INTs are rhythmically driven by input from the olfactory bulb (in a multi-synaptic pathway including olfactory cortex) and to what extent this afferent, gamma-maintaining drive might be altered in Disc1-mutant mice. While the current data set does not allow a systematic evaluation of these possibilities, they should be further explored in future experiments.

      7) Cell assembly analysis

      The authors used 10ms for testing synchronization among pairs of PYR neurons in Fig.4a but 25ms for analysis of assembly dynamics. I think the authors justified why they used 25ms bin size, but it was not clear why they used 10ms? Could the authors clarify the reasons behind this decision?

      The synchronization analysis was originally applied to PYRs converging on a common postsynaptic INT. English et al. (Neuron 95:505-520, 2017) systematically tested the effect of presynaptic cooperativity on spike transmission in the hippocampus (their Fig. 5). Their analysis revealed a maximum in cooperativity at ~10 ms. To maximize the sensitivity of our approach, we thus focused on 10 ms for this analysis. However, we agree that using the same time window as for assembly extraction is a reasonable proposal, in particular since we find no difference in the synchronization of identified presynaptic PYRs (Fig. 3e of the revised manuscript). Thus, we have recomputed cross-correlations using a 25 ms bin size. To further improve the analysis, we restricted it to neurons with at least 1000 spikes and simplified the quantification of excess spiking by using the ‘coincident_spikes’ function of the Python package neuronpy.utils.spiketrain. Excess synchrony is now estimated by quantifying the number of coincident spikes between a reference and a comparison spike train detected in a 25 ms time window, normalized by the number of coincidences expected by chance (2 * frequency of comparison train * synchrony window * number of spikes in the reference train).
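
      The normalization described above can be sketched in plain Python. This is an illustrative re-implementation, not the neuronpy function itself: the helper names are hypothetical, and it assumes that the synchrony window is applied on both sides of each reference spike, which is what the factor 2 in the chance term suggests.

```python
from bisect import bisect_left, bisect_right


def count_coincident(ref, comp, window):
    """Count reference spikes that have at least one comparison spike
    within +/- window (same time units as the spike times)."""
    comp = sorted(comp)
    n = 0
    for t in ref:
        i = bisect_left(comp, t - window)
        j = bisect_right(comp, t + window)
        if j > i:  # at least one comparison spike inside the window
            n += 1
    return n


def excess_synchrony(ref, comp, window, duration):
    """Coincident spikes divided by the number expected by chance:
    2 * rate of comparison train * window * number of reference spikes."""
    comp_rate = len(comp) / duration
    expected = 2.0 * comp_rate * window * len(ref)
    return count_coincident(ref, comp, window) / expected


# Toy spike trains in seconds (hypothetical values).
ref_train = [1.0, 2.0, 3.0, 4.0]
comp_train = [1.01, 2.5, 3.02, 5.0]
n_coincident = count_coincident(ref_train, comp_train, window=0.025)
ratio = excess_synchrony(ref_train, comp_train, window=0.025, duration=10.0)
```

      A ratio above 1 indicates more coincident spikes than expected from the firing rate of the comparison train alone. Values would then be averaged per mouse before comparing genotypes.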

      By using this improved analysis with a 25 ms time window, we could replicate our original finding of enhanced synchronization of PYR spiking. However, when we averaged the data on the basis of individual mice as suggested in #1 of reviewer 2 and #4 of reviewer 3, we could not observe this effect (irrespective of whether we used the new, coincident spikes-based analysis or the original excess synchrony analysis at either 10 or 25 ms synchrony window). This result is now stated in line 215 ff. (199 ff. without tracked changes) of the revised manuscript.

      Reviewer #2 (Public Review):

      This is an interesting paper, in which the authors assessed spiking and network deficits in a well-established mouse model of schizophrenia. This mouse model carries a genetic deletion of the Disrupted-in-schizophrenia-1 (Disc1) gene, which is highly penetrant in the human condition. The authors combined behavioral analyses with state-of-the-art electrophysiological recordings in vivo, coupled to optogenetic tagging, to study a subnetwork formed by a major inhibitory neuron subclass (the parvalbumin (PV)-expressing interneuron) and principal excitatory pyramidal neurons in the medial prefrontal cortex. This work indicates reduced firing rates of PV cells in Disc1-KO mice, likely due to reduced coupling with pyramidal neurons, leading to alterations in local network activity. Indeed, the authors found that Disc1-KO mice exhibited reduced levels of gamma oscillations and somewhat hypersynchronous networks.

      Taking advantage of novel techniques and analytical strategies, the manuscript provides rich, novel insight into the neurobiology of a mouse model of this severe psychiatric condition. The data is of high quality, the findings interesting and the manuscript is well written.

      Overall, the results support the authors' conclusions, although some additional analyses are necessary to corroborate their interpretations.

      Although the paper does not give information on how PV cell dysfunctions are engaged during cognitive tasks, this study can be considered an important first step in advancing our knowledge of the basic dysfunctions of cortical networks in this model of schizophrenia.

      We thank the reviewer for praising the ‘high quality’ of our work, and the ‘rich, novel insights’ on the neurobiology of a mouse model of a psychiatric disorder.

      1) The major findings stem from the analysis of the spiking activity of individual neurons recorded using either silicon probes or arrays of tetrodes. Both techniques allow simultaneous recording of many neurons from a single animal; therefore, from a statistical point of view neurons recorded from one animal are pseudo replicas and cannot be considered as independent measurements. Throughout the manuscript, the authors perform two-sample tests on the pooled data from all recorded neurons to compare differences between genotypes; therefore, artifactually increasing the power of statistical tests. Comparisons between genotypes should be performed using each mouse as an independent measurement.

      To be able to compare the data on the basis of mouse averages, we performed additional recordings, which resulted in a final data set of 9 Disc1 and 7 control mice. We recomputed the main results of this study based on mouse averages. First, consistent with our original cell-by-cell analysis, we found significantly reduced firing rates of putative INTs but not of PYRs (line 72 (69 without tracked changes)). Moreover, we confirmed our results on decreased spike transmission probability at PYR-INT connections (line 121 (107 without tracked changes)), decreased spike transmission in the resonance window (line 163 (147 without tracked changes)), reduced high gamma power (line 173 ff. (157 ff. without tracked changes)), lower phase-coupling of INT spikes to high gamma oscillations (line 178 (162 without tracked changes)), and reduced strength of assembly activations in Disc1 compared to control mice (line 229 ff. (211 ff. without tracked changes)). Similarly, we performed new analyses on INT-INT synchronization and INT spike-triggered gamma amplitudes (as requested by reviewer 1 #5 & 6), which showed significant effects on the level of mouse averages (line 188 ff. (line 172 without tracked changes)). Second, our original finding on significant differences in the synchronization of individual PYR-PYR pairs could not be reproduced on the level of individual mice. This is reported in line 215 (199 without tracked changes) of the revised manuscript. Finally, the analyses based on optogenetically identified PVIs did not allow comparison by mouse averages due to the low number of experiments (n=3 mice each). Given that the vast majority of our conclusions are based on electrophysiologically identified INTs, with optogenetic identification experiments being only confirmatory in nature, and that performing additional experiments for optogenetic identification of PVIs would be very laborious, we report the results of these analyses as comparisons between neurons or connected pairs. This is clearly stated in the respective sections throughout the revised manuscript. We hope that the reviewer can agree with our decision.
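
      The difference between pooling neurons and comparing mouse averages can be made concrete with a short sketch. It is only an illustration under stated assumptions: the firing rates and mouse labels are invented, and the permutation test is a generic stand-in rather than the statistical test used in the manuscript.

```python
import random
from collections import defaultdict
from statistics import mean


def mouse_averages(records):
    """Collapse per-neuron values to one mean per mouse.

    records: iterable of (mouse_id, value) pairs.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {m: mean(v) for m, v in by_mouse.items()}


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of group means,
    with each list entry treated as one independent animal."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented firing rates (Hz) of putative INTs, labelled by mouse.
control = [("c1", 10), ("c1", 12), ("c2", 11), ("c3", 9), ("c3", 13), ("c3", 11)]
mutant = [("d1", 5), ("d1", 7), ("d2", 6), ("d3", 4), ("d3", 8), ("d3", 6)]

control_means = mouse_averages(control)
mutant_means = mouse_averages(mutant)
p_value = permutation_test(list(control_means.values()),
                           list(mutant_means.values()))
```

      With three animals per group the smallest attainable two-sided p-value of an exact permutation test is 2/20 = 0.1, which illustrates why comparisons by mouse average are not feasible for very small cohorts such as the opto-tagging experiments.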

      2) The superficial layers of the mPFC are difficult to reach with a vertical approach of the probes due to the presence of a large blood vessel located medially in the frontal dura. Therefore, the authors are most likely reaching mPFC deep layers where PYR neurons produce fast spikes at high rates. If this is the case, this would make it difficult to sort the spiking of PYR from that of INs based on the spike kinetics and rate. The authors used opto-tagging of PVIs in a set of experiments. It would be reassuring to confirm that the spike waveform and kinetics that they extracted from PVIs are similar to those they assigned as INTs in their experiments with no opto-tagging. Identified PVIs should be statistically different from putative PYRs (not responding to light). Although opto-tagging of PVIs can solve this issue, the amount of cells isolated remains low and the number of animals is not stated. Opto-tagged cells are subsequently used for further analyses but the statistical value of those remain unclear. Since the entire interpretation of the rest of the results depend on this result, this must be clarified.

      As correctly pointed out by the reviewer, we indeed targeted deep layers of the mPFC (~0.4 mm lateral of the midline; see also the histological information about the recording sites that is now included in Figure 1 – figure supplement 1), where higher spike rates are expected compared to superficial layers. To assess whether this might have influenced the identification of putative INTs, we separately plotted the duration and asymmetry index used to classify the neurons into PYRs and putative INTs for Disc1 and control mice. This analysis yielded well-separated clusters in both cases. In addition, as suggested by the reviewer, we compared the kinetic properties (spike duration and asymmetry index) and rates of PYRs, putative INTs, and optotagged PVIs. In both genotypes, ANOVA followed by Tukey post-hoc testing revealed significant differences between the PYRs and both groups of INTs, both for rate (smaller in PYRs) and kinetic properties (longer spikes of PYRs), while we found no difference between putative INTs and PVIs. These results thus suggest that the method used to identify INTs works reliably. These new data are now shown in the revised Fig. 1a and the new Figure 1 – figure supplement 2 and mentioned in line 89 ff. (85 without tracked changes) of the revised manuscript.

      We agree that the number of experiments using PVI opto-tagging is low (n=3 mice per genotype; this information is now included in the main text in line 93 ff. (88 ff. without tracked changes)). However, our analysis of spike transmission probability using the population of untagged putative fast-spiking INTs revealed similar results as for the sample of optogenetically identified PVIs. We view the PVI optotagging experiment as an additional confirmation that the difference in firing rate and spike transmission likely did not arise from sampling different INT types in Disc1 and control mice, as pointed out in line 80 (76 without tracked changes) of the revised manuscript. The limitation of the low number of PVIs in our study is critically discussed in the revised discussion in line 249 ff. (229 without tracked changes).

      3) Proportion of gamma-coupled neurons. The authors mention the use of pairwise phase consistency (PPC). PPC is a good method to measure phase coupling independent of differences in firing rates. However, it is not entirely clear how PPC is used to measure the extent of phase locking. In the methods, the authors mention that they ran the PPC analysis after determining significant phase locking with Rayleigh's test. Moreover, they provide PPC values for high gamma oscillations but not for other frequency ranges. Perhaps it would be better to test significant coupling of all units by nonrandom spike-phase distributions crossing a confidence interval, estimated by Monte Carlo methods from an independent surrogate data set. These can be obtained by randomly jittering each spike time. Indeed, PPC values estimated by the authors for high gamma are higher for PYR than INT (Fig. 1- Fig. Suppl 4 b). This is at odds with previously published observations in V1 (e.g. Perrenoud et al., PLoS Biol. 2016 PMID: 26890123). Given the existing reports of reduced excitatory transmission in DISC-1 mice, phase locking of PYR to other frequency bands might be affected.

      Following the reviewer’s suggestion, we have revised our phase-coupling analysis. First, Perrenoud et al. (2016) show that gamma oscillations occur in short bursts of high power. To better reflect the coupling of putative INTs to those transient gamma events, we restricted the phase-coupling analysis to epochs within the largest quintile of gamma amplitude (assessed by the envelope of the gamma-filtered signal obtained by Hilbert transformation). Second, instead of the Rayleigh test, we obtained for each unit randomized spike trains by shuffling the inter-spike intervals (500 iterations). Significant phase locking was then determined by testing whether two consecutive bins of the phase histogram exceeded the 95th percentile of the random distribution. This analysis was performed separately for the low (20-40 Hz) and high gamma bands (60-90 Hz) for both putative INTs and PYRs. Third, the depth of phase coupling was assessed by PPC for all significantly phase-coupled neurons. While this metric is more robust against changes in spike rates than traditional measures, it is still not completely independent of them. Perrenoud et al., for instance, showed using spike sub-sampling that the reliability in estimating PPC depends on spike rate (with >1000 spikes being optimal). However, the PYRs in our data set fired on average fewer than 1000 spikes during high gamma events (mean Disc1: 657 ± 32, mean control: 840 ± 43). To better account for the effect of rate dependence, we restricted the analysis to neurons with >250 spikes. To further limit the potential impact of different spike counts across neurons, we used random subsampling with a fixed spike number of 250 (100 iterations per cell), computed PPC in each iteration, and averaged over the PPC estimates per cell. Finally, in response to the reviewer’s point 1, the results of all neurons (PYR and INT separately) were then averaged for each mouse.
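
      Two steps of this procedure, the inter-spike interval shuffling and the spike-count-matched PPC estimate, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the spike phases are assumed to have been extracted already, and PPC is computed as the mean cosine of the phase difference over all distinct spike pairs, using the identity that the sum over pairs equals the squared resultant length minus the number of spikes.

```python
import cmath
import math
import random


def ppc(phases):
    """Pairwise phase consistency: mean of cos(theta_j - theta_k)
    over all distinct pairs of spike phases (in radians)."""
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant = sum(cmath.exp(1j * p) for p in phases)
    return (abs(resultant) ** 2 - n) / (n * (n - 1))


def subsampled_ppc(phases, n_spikes=250, n_iter=100, seed=0):
    """Average PPC over random subsamples of a fixed spike count.
    Returns None for units that do not exceed n_spikes spikes."""
    if len(phases) <= n_spikes:
        return None
    rng = random.Random(seed)
    estimates = [ppc(rng.sample(phases, n_spikes)) for _ in range(n_iter)]
    return sum(estimates) / n_iter


def shuffle_isis(spike_times, rng):
    """Surrogate spike train: same inter-spike intervals, random order."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    rng.shuffle(isis)
    surrogate = [spike_times[0]]
    for isi in isis:
        surrogate.append(surrogate[-1] + isi)
    return surrogate


# Toy checks with hypothetical phases and spike times.
locked = [0.3] * 300                                    # perfectly phase-locked unit
uniform = [2 * math.pi * k / 100 for k in range(100)]   # evenly spread phases
ppc_locked = subsampled_ppc(locked)
ppc_uniform = ppc(uniform)
surrogate = shuffle_isis([0.0, 0.1, 0.4, 0.45, 1.0], random.Random(1))
```

      For each surrogate train, the phase histogram would be recomputed from the LFP phase at the surrogate spike times, and the 95th percentile across surrogates would serve as the significance threshold per bin. That step is omitted here because it requires the recorded LFP.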

      Consistent with our original analysis, we found a significantly reduced proportion of phase-coupled INTs but unaltered PPC of significantly coupled INTs to the high gamma band. Moreover, we observed no significant effects for low gamma oscillations or for the phase-coupling of PYRs to either low or high gamma bands. These results are now shown in the new Fig. 4 and the new Figure 4 – figure supplement 1, and are described in line 170 ff. (154 without tracked changes) of the revised manuscript. In addition, we provide a detailed explanation of the revised phase-coupling analysis, including a formal description of how PPC is computed, in the Methods section of the revised manuscript in line 524 ff. (486 without tracked changes).

      Using the revised phase-coupling analysis, we observed comparable PPC values of significantly coupled PYRs (0.013) and INTs (0.014) to high gamma in control mice. While the improved analysis thus resolved the paradoxical finding of lower PPC in INTs, we did not observe weaker phase-coupling of PYRs as reported in Perrenoud et al. (2016). A possible explanation for this discrepancy might be genuine differences in gamma coupling of the PYR population between visual cortex (Perrenoud et al., 2016) and the prefrontal cortex (our study), which will require further investigation in future.

      Reviewer #3 (Public Review):

      In the present study, the authors aim to assess network activity alterations in the prefrontal cortex of mice with a deletion variant in the schizophrenia susceptibility gene DISC1 ("DISC1 mutants"). Using silicon probe in vivo recordings from the prefrontal cortex, they find that mutant mice show reduced firing rates of fast-spiking interneurons, reduced spike transmission efficacy from pyramidal cells to interneurons, and enhanced synchronization and activation of cell assemblies. The authors conclude that "interneuron pathology is linked with the abnormal coordination of pyramidal cells, which might relate to impaired cognition in schizophrenia."

      The cellular and circuit basis of psychiatric disorders has received strong interest in the recent past. In particular, alterations of the "excitation-inhibition balance" in cortical circuits has been the focus of extensive scrutiny (reviewed in pmid 22251963). Specifically, in both human samples as well as in mouse models, disruption of interneuron development and function have been implicated in the pathogenesis of schizophrenia. In the DISC1 mouse model, studies have reported disrupted interneuron development (e.g. pmid 23631734, 27244370), reduced numbers of GABAergic neurons (e.g. pmid 18945897), reduced inhibition from GABAergic neurons ex vivo (e.g. pmid 32029441), and reduced firing rates of fast-spiking neurons in vivo in the basal forebrain (pmid 34143365).

      The present manuscript makes a potentially important contribution to this question by probing the microcircuitry of the prefrontal cortex in vivo in the DISC1 mouse model of schizophrenia. It goes beyond previous work in assessing circuit dynamics in vivo in more detail, albeit with indirect methods. The experiments and analysis have generally carefully been performed, though the statistical analysis raises some questions. The advances made by the present work compared to previous studies could be delineated more clearly.

      We thank the reviewer for praising the analysis of our data ‘…have generally carefully been performed..’ and the ‘important contribution’ of our work to the field.

  2. www.janeausten.pludhlab.org www.janeausten.pludhlab.org
    1. we women never mean to have anybody. It is a thing of course among us, that every man is refused, till he offers

      See also Emma "A woman may not marry a man merely because she is asked, or because he is attached to her" (chapter 7) and Mansfield Park "I think it ought not to be set down as certain that a man must be acceptable to every woman he may happen to like himself" (Chapter 35)

    1. Author Response

      Reviewer #1 (Public Review):

      Neural stem cells express cascades of transcription factors that are important for generating the diversity of neurons in the brain of flies and mammals. In flies, nothing is known about whether the transcription factor cascades are built from direct gene regulation, e.g. factor A binding to enhancers in gene B to activate its expression. Here, Xin and Ray show that one temporal factor, Slp1/2, is regulated transcriptionally via two molecularly defined enhancers that directly bind two other transcription factors in the cascade as well as integrating Notch signaling. This is a major step forward for the field, and provides a model for subsequent studies on other temporal transcription factor cascades.

      Thanks for the positive comments!

      Reviewer #2 (Public Review):

      The manuscript addresses an important question concerning the mechanisms regulating temporal transitions in Drosophila neural progenitors called neuroblasts. Here, they concentrate on a specific transition between the transcription factors Ey and Slp1/2 that are sequentially expressed within a cascade involving at least 6 temporal transcription factors. Using a combination of new transgenes, bioinformatics and genome-wide profiling of transcription factor binding sites (Dam-ID), they functionally characterize two enhancers of the Slp1/2 genes that are active during this transition. This led to the identification of the Notch pathway as an important facilitator of the transition. They also show that Notch signaling requires cell cycle progression and that Slp1/2 is a direct target of Ey, validating the importance of transcriptional cross-regulatory interactions among the temporal transcription factors to trigger progression.

      In my opinion, the study is very interesting, representing the first careful analysis of enhancers involved in temporal transitions in neural progenitors, and leading to new insights into the mechanisms promoting temporal progression.

      Thanks for the positive comments!

      Reviewer #3 (Public Review):

      In this manuscript, the authors present data to suggest that transcriptional activation of the Slp1/2 temporal factors in the medulla neuroblasts of the developing Drosophila optic lobe is dependent on two enhancer elements. The authors concluded that these two enhancers were able to be activated by Ey and Scro, two other factors identified to be involved in the temporal cascade of the medulla NB. The authors show that cell cycle progression is necessary for Notch signaling, and that Notch signaling activates and sustains the temporal transcription factor cascade. The authors use GFP reporter assays to correlate the enhancer activity to Slp1/2 expression and used DamID to show in-vivo binding of Su(H) and Ey to the enhancer fragments.

      I agree with the authors that it is important to define the mechanisms by which Notch, cell cycle control and these temporal transcription factors function through their cis-regulatory elements to establish this self-propagating cascade to generate diverse cell types during neurogenesis. However, the findings in this study offer limited new insights toward reaching this goal for a myriad of reasons. First, studies in invertebrate and vertebrate neurogenesis have agreed on the conceptual framework that transcriptional control plays a key role in regulating the generation of diverse cell types. The data showing the patterns of slp1/2 transcript simply reaffirm the proposed model as well as recently published single-cell transcriptomic analyses of fly optic lobe neuroblasts. Second, it remains unclear how physiologically relevant the enhancer analyses presented in this study are to the regulation of Slp1/2 expression, as the data can only suggest that they act redundantly to each other. It is also troubling to see that mutating binding sites of a single transcription factor appears to completely abolish enhancer activity while Slp1/2 protein expression is delayed in mutant clonal analyses. Third, the authors do not offer any explanation for how Notch signaling contributes to the timing of Slp1/2 expression, considering that Notch signaling should be active during the entire life of the neuroblast based on canonical Notch target gene expression. What role do Ey and Scro play in this timely enhancer activation, as both appear to be necessary to activate the enhancers along with Notch? Fourth, many studies including the Okamoto et al., 2016 study cited in this study have contributed to our appreciation of the role of proper cell cycle control in promoting generation of diverse neurons in vertebrate neurogenesis. It is unclear to me if findings from the current study contribute to significant advancement on this regulatory link.

      Thanks for raising these concerns. Here are our responses:

      First, we agree that there have been great advances in this field including classical studies in the ventral nerve cord, recent studies on type II lineages and medulla including our own scRNA-seq study of medulla neuroblasts. These studies have revealed the sequential expression of transcription factors in neuroblasts of different ages, and proposed that these transcription factors form a transcriptional cascade based on the cross-regulations among them. However, these cross-regulations were based on mutant phenotypes, and in most cases, the cis-regulatory elements of these TTFs have not been characterized, and it hasn’t been studied whether these cross-regulations are direct or not. Little is known about exactly how the timing of the transition is regulated and coordinated with cell-cycle control. We have addressed these questions and identified two enhancer elements for slp1/2, and demonstrated that the previous TTF Ey, another TTF Scro, and Notch signaling directly regulate slp expression. Further we demonstrated that Notch signaling is dependent on cell cycle progression in neuroblasts, and supplying Notch signaling rescues the delay in Slp expression in cell cycle mutants. We believe this study has provided important insights in this field and is another step forward.

      Second, now we provide evidence that deletion of both enhancers specifically abolished Slp1 and Slp2 expression in medulla neuroblasts.

      Regarding the concerns about binding site mutation:

      1) Ey: With loss of Ey, Slp is completely lost. The Ey binding site mutation phenotype is consistent with loss of Ey phenotype.

      2) Su(H): For the u8772 250bp enhancer, mutating all four predicted Su(H) binding sites did abolish the reporter expression. During the revision, we generated another construct, in which we mutated the two predicted Su(H) binding sites that are perfect matches to the consensus, and found that this dramatically reduced the reporter expression. For the d5778 850bp enhancer, mutation of Su(H) binding sites caused strong glial expression, which prevented us from precisely assessing the neuroblast expression. Thanks to the excellent advice from reviewer #3, we used repo-Gal4 and GFP-RNAi to remove the glial expression. This approach turned out to be very informative. We found that mutation of four or six out of six predicted Su(H) binding sites actually did not decrease the reporter expression in neuroblasts, suggesting that Notch signaling does not activate the d5778 850bp enhancer through these binding sites. However, we think this explains why this enhancer drives a delayed expression compared to the 220bp enhancer and the endogenous Slp. In addition, this also explains why, with loss of Notch signaling, endogenous Slp expression is only delayed but not completely lost: although the 220bp enhancer-driven expression is completely lost, the d5778 850bp enhancer still directs a delayed expression of Slp, and this expression is not dependent on Notch signaling.

      3) Scro: Mutation of Scro binding sites caused a decreased expression level of the reporter, consistent with the scro RNAi phenotype on Slp, which is a decreased expression level.

      Third, regarding how Notch signaling, which is active during the entire neuroblast life, can act to activate Slp expression at a specific time: we tested genetic interactions between Ey, Scro, and Notch in the regulation of Slp expression. We found that with loss of Ey, supplying constitutively active Notch or Scro is not sufficient to rescue Slp expression. Thus Ey, as the previous TTF, may be required to prime the slp locus, so that Notch signaling and Scro can act to further activate Slp expression. Therefore, Notch signaling requires Ey to specifically further activate Slp at the correct time. We have added these experimental results and discussion.

      Fourth, the Okamoto et al., 2016 study actually concluded that cell cycle progression is not required for the temporal progression. In their experimental setup, they supplied Notch to maintain the undifferentiated status of cortical neural progenitors when they blocked cell cycle progression. They observed that temporal transition still happened, and they concluded that cell cycle progression is not required for temporal transitions. However, they didn’t consider the possibility that Notch signaling, which is itself dependent on cell cycle progression, actually rescued the possible phenotype caused by arrest of cell cycle progression. Our result demonstrated that in the Drosophila medulla, supplying Notch signaling can rescue the delay in the transition to the Slp stage in cell-cycle-arrested neuroblasts, and further showed that the mechanism is by direct transcriptional regulation. We believe that publication of our results will be informative to the vertebrate study, prompting vertebrate researchers to reconsider the role of cell cycle progression and Notch signaling in temporal progression.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2022-01528

      Corresponding author(s): Elena Taverna and Tanja Vogel

      1. General Statements [optional]

      We thank the reviewers for the comments and points they raised. We think what we have been asked is a doable task for us and we are confident we will manage to address all points in a satisfactory manner.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Reviewer’s comment: The manuscript investigated the role of DOT1L during neurogenesis especially focusing on the earlier commitment from APs. Using tissue culture method with single-cell tracing, they found that the inhibition of DOT1L results in delamination of APs, and promotes neuronal differentiation. Furthermore, using single cell RNA-seq, they seek possible mechanisms and changes in cellular state, and found a new cellular state as a transient state. Among differentially expressed genes, they focused on microcephaly-related genes, and found possible links between epigenetic changes led by DOT1L inhibition and epigenetic inhibition by PRC2. Based on these findings, they suggested that DOT1L could regulate neural fate commitment through epigenetic regulation. Overall, it is well written and possible links from epigenetic to metabolic regulation are interesting. However, there are several issues across the manuscript.

      Response to Reviewer and planned revision:

      We thank reviewer 1 for her/his comments and constructive criticism.

      We hope the revision plan will address the points raised by the reviewer in a satisfactory manner.

      Major issues:

      Reviewer’s comment: 1) It is not clear whether the degree of H3K79 methylation (or other histones) changes during development, and whether DOT1L is responsible for those changes. It is necessary to show the changes in histone modifications as well as the levels of DOT1L from APs to BPs and neurons, and to what extent the treatment of EPZ change the degree of histone methylation.

      Response to Reviewer and planned revision:

      • As for the level of DOT1L protein: we tried several commercially available antibodies, but they do not work in the mouse, even after multiple attempts and optimization. So, unfortunately, we will not be able to provide this piece of information.

      • As for the level of DOT1L mRNA: we will provide information regarding the DOT1L mRNA level in APs, BPs and neurons by using scRNAseq data from E12, E14, E16 WT cerebral cortex.

      • As for the levels of H3K79 methylation, we did not intend to claim that the histone methylation is responsible for the reported fate transition. We will edit the text to avoid any possible confusion. If it is deemed necessary to address the point raised by the reviewer, we have 3 options, which we list here in order of priority and ease of execution from our side.

      • immunofluorescence with an Ab against H3K79me2 using CON and EPZ-treated hemispheres.

      • FACS sort APs, BPs and neurons from CON and EPZ-treated hemispheres, followed by immunoblot for H3K79me2 to assess the H3K79me2 levels. As for the FACS sorting, we will use a combinatorial sorting in the lab on either a TUBB3-GFP or a GFP-reporter line using EOMES-driven mouse lines. This strategy has already been employed in the lab by Florio et al., 2015 and we will use it with minor modifications.

      • scCut&Tag for H3K79me2 from CON and EPZ-treated hemispheres. This option entails a collaboration with the Gonzalo Castelo-Branco lab in Sweden and might therefore require additional time to be established and carried out.

      Reviewer’s comment:

      Furthermore, the study mainly used pharmacological bath application. DOT1L has anti-mitotic effect, thus it is not clear whether the effect is coming from the inhibition of transmethylation activity.

      Response to Reviewer and planned revision:

      In previous work, we used a genetic model (the DOT1L KO mouse) that showed microcephaly (Franz et al. 2019). In this study, we wanted to fill a gap in knowledge by understanding whether the DOT1L effect is mediated by its enzymatic activity. For this reason, we chose pharmacological inhibition with EPZ, whose effect on DOT1L activity has been extensively reported and documented in the literature (EPZ is a drug currently in phase 3 clinical studies).

      The stringent focus of this study on pharmacological inhibition is thus a step toward understanding which specific roles DOT1L can play, either as a scaffold or as an enzyme.

      Here, we concentrate on the enzymatic function; the scaffolding function is beyond the scope of this specific study. We can further discuss and elaborate on the rationale behind this choice in the revised manuscript.

      Reviewer’s comment:

      In addition, the study assumed that the effect of EPZ is cell autonomous. However, if EPZ treatment can change the metabolic state in a cell, it would be possible that observed effects was non-cell autonomous. It would be important to address if this effect is coming in a cell-autonomous manner by other means using focal shRNA-KD by IUE.

      Response to Reviewer and planned revision:

      We did not claim that the effect of EPZ is cell autonomous. We are in fact open on this point, as we consider both explanations to be potentially valid. We will edit the text to avoid any possible confusion about what we do and do not assume.

      As a general consideration, it is entirely possible that the effects are non-cell autonomous. We will comment and elaborate on that in the revised manuscript.

      If the reviewer/journal considers this a point that must be addressed experimentally, then we will proceed as follows:

      • DOT1L shRNA-KD via in utero electroporation, followed by either
      • in situ hybridization for ASNS to check if ASNS transcript is increased upon DOT1L shRNA-KD compared to CON
      • FACS sorting of the positive electroporated cells (CON and DOT1L shRNA-KD), followed by qPCR to assess the levels of ASNS
      • If the reviewer wants us to check for a more downstream effect on fate, then we will immunostain DOT1L shRNA-KD and CON samples with TUBB3 and/or TBR1 antibodies (as already done in the present version of the manuscript).

      Reviewer’s comment: 2) The possible changes in cell division and differentiation were found by very nice single-cell tracing system. However, changes in division modes occurring in targeted APs such as angles of mitotic division and the expression of mitotic markers were not addressed. These information is critical information to understand mechanisms underlying observed phenotype, delamination, differentiation and fate commitment.

      Response to Reviewer and planned revision:

      Effects of DOT1L manipulation on the mitotic spindle were observed in a previous paper using the DOT1L KO mouse (Franz et al. 2019). Because the present experiments rely on pharmacological inhibition, we will address this point by quantifying the spindle angle in CON and EPZ-treated cortical hemispheres.

      We will stain with DAPI to visualize the DNA/chromosomes and with phalloidin (a filamentous actin counterstain), which labels the cell cortex and therefore allows a precise visualization of the apical surface and of the cell contour.

      Of note, the protocols we are referring to are already established in the lab, based on published work from the Huttner lab (Taverna et al, 2012; Kosodo et al, 2005).
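
      As an illustration only, the geometric part of a spindle-angle quantification could be sketched as follows. This is a minimal sketch, not the lab's protocol: it assumes 2D image coordinates for the two spindle poles (or two points along the axis perpendicular to the DAPI-stained metaphase plate) and for two points traced along the phalloidin-stained apical surface. The function and parameter names are hypothetical.

```python
import math

def spindle_angle_deg(pole_a, pole_b, apical_p, apical_q):
    """Acute angle (0-90 degrees) between the spindle axis and the apical surface.

    pole_a, pole_b     -- (x, y) coordinates of the two spindle poles (assumed input)
    apical_p, apical_q -- (x, y) coordinates of two points on the apical surface
    """
    # Direction vectors of the spindle axis and of the local apical surface.
    sx, sy = pole_b[0] - pole_a[0], pole_b[1] - pole_a[1]
    ax, ay = apical_q[0] - apical_p[0], apical_q[1] - apical_p[1]
    norm_s = math.hypot(sx, sy)
    norm_a = math.hypot(ax, ay)
    if norm_s == 0 or norm_a == 0:
        raise ValueError("the two points defining an axis must be distinct")
    # The absolute value of the dot product folds the result into 0-90 degrees,
    # because neither axis has an intrinsic direction.
    cos_theta = abs(sx * ax + sy * ay) / (norm_s * norm_a)
    cos_theta = min(1.0, cos_theta)  # guard against rounding slightly above 1
    return math.degrees(math.acos(cos_theta))
```

      If the cleavage plane is taken to be perpendicular to the spindle axis, its angle relative to the apical surface is 90 degrees minus the value returned here.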

      Reviewer’s comment: 3) The scRNA-seq analysis indicated interesting results, but was not fully clear to explain the observed results in histology. In fact, in single cell RNA-seq, the author claimed that cells in TTS are increased after EPZ treatment, which are more similar to APs. However, in histological data, they found that EPZ treatment increased neuronal differentiation. These data conflicts, thus I wonder whether "neurons" from histology data are actually neurons? Using several other markers simultaneously, it would be important to check the cellular state in histology upon the inhibition/KD of DOT1L.

      Response to Reviewer and planned revision:

      The reviewer’s comment is valid, and we indeed found that TTS cells are in an intermediate state between APs and neurons in terms of transcriptional profile. This is the reason why we called this cell cluster the transient transcriptional state (TTS).

      We plan to address this point by staining for TBR1 and/or CTIP2 in CON and EPZ-treated hemispheres, and to extend this with EOMES and SOX2 co-staining.

      Minor issues:

      Reviewer’s comment: Figure 1 - It is not clear delaminated cells are APs, BPs or some transient cells (Sox2+ Tubb3+??). It is important to use several cell type-specific and cell cycle markers simultaneously to characterize cell-type specific identity of the analysed cells by staining. These applied to Fig1B,D,E,F,G,as well as Fig2,3.

      Response to Reviewer and planned revision:

      We will address this point by using a combinatorial staining scheme for several fate markers such as TUBB3, EOMES and SOX2, as suggested by the reviewer.

      Reviewer’s comment: - Please provide higher magnification images of labelled cells (Fig 1H)

      Response to Reviewer and planned revision:

      In the revised manuscript, we will provide higher magnification for the staining.

      Reviewer’s comment: - Please provide clarification on the criteria of Tis21-GFP+ signal thresholding.

      Response to Reviewer and planned revision:

      In the revised manuscript, we will provide a clarification on the criteria of Tis21-GFP+ signal thresholding.

      Reviewer’s comment: - Splitting the GFP signal between ventricular and abventricular does not convincingly support the "more basal and/or differentiated" states after EPZ treatment.

      Response to Reviewer and planned revision:

      We will provide a clarification regarding this point.

      Reviewer’s comment: - Please explain the presence of Tis21-GFP+ cells at the apical VZ.

      Response to Reviewer and planned revision:

      The presence of Tis21-GFP+ cells at the apical VZ has been extensively reported in the literature, starting with the first paper by Haubensak et al. on the generation of the Tis21-GFP line. In a nutshell, Tis21-GFP+ cells are present throughout the VZ (and therefore also in its apical portion) because neurogenic, Tis21-GFP-positive cells undergo mitosis at the apical surface. Indeed, the Tis21-GFP signal has been extensively used by the Huttner lab and collaborators to score apical neurogenic mitoses. In addition, since APs undergo interkinetic nuclear migration, it follows that Tis21-GFP+ nuclei are present throughout the entire VZ.

      In the revised manuscript, we will explain this point and cite additional literature.

      Reviewer’s comment: - Order the legends in same order as the bars.

      Response to Reviewer and planned revision:

      We will follow the reviewer’s recommendation and order the legends accordingly.

      Reviewer’s comment: Figure 2 -Fig 2B) The difference between CON and EPZ apical contacts is not clear and does not match with the graph in Fig 2E.

      Response to Reviewer and planned revision:

      We will explain Fig. 2B in more detail and provide additional images in the revised manuscript.

      Reviewer’s comment: -Supp Fig 2 - are these injected slices cultured in control conditions? Please include this in the text and figure/figure legend

      Response to Reviewer and planned revision:

      In the revised manuscript, the text will be changed to address this point and provide clearer info.

      Reviewer’s comment: Fig 2C) The EPZ-treated DxA555+ cells exhibit morphological change of cell shape. Is this phenotype? please comment on the image shown for EPZ treatment panel.

      Response to Reviewer and planned revision:

      We thank the reviewer for having raised this point.

      The change in morphology might be a consequence of delamination and/or of a change in cell fate. In the revised manuscript, we will comment on this very relevant point in more detail and expand the discussion accordingly.

      Reviewer’s comment: Fig 2F - 2G) Data presented on EOMES+ and TUBB3+ % are counterintuitive. The authors claimed that TUBB3+ cells are increased and neuronal differentiation is promoted. However, no changes in EOMES+ are observed. What is the explanation? Did the author check the double positive cells? These could be TSS cells?

      Response to Reviewer and planned revision:

      We thank the reviewer for having raised this point.

      As envisioned by the reviewer, we suspect that the counterintuitive data might be due to TTS cells, which, based on our scRNA-seq data, simultaneously express several cell-type-specific markers. It is possible that, since the EPZ treatment lasts 24 h, cells (such as those in the TTS cluster) do not have enough time to completely eliminate the EOMES protein. If that were the case, we would expect to still detect EOMES immunoreactivity, as we indeed do.

      To address this point, we will:

      • analyze the scRNA-seq data and check the extent of co-expression of Eomes and Tubb3 mRNAs in the TTS population.
      • check for EOMES and TUBB3 double-positive cells in the microinjection experiment.

      Reviewer’s comment: Figure 2 and Figure 3) the number of pairs analyzed for EPZ is twice as that of Con for comparison of the parameters taken into account. Please include n of each graph in the figure legend of the specific panel if not the same for all panels in that figure (i.e. for figure 3)

      Response to Reviewer and planned revision:

      We will revise the text accordingly.

      Reviewer’s comment:

      Figure 3) The data indicated that the number of daughter cell pairs in EPZ samples is almost double than Control. Is this the phenotype? More numbers of daughter cells in EPZ treated samples were observed from the same number of injections? or the number of injected cells were different?

      Response to Reviewer and planned revision:

      Due to technical reasons, we indeed performed a higher number of injections in EPZ-treated slices. We think this is the main reason behind the difference in number.

      If the reason were to be biological, one would expect to see the same trend in IUE experiments, but this is actually not the case. This does suggest/corroborate the idea that the reason behind the difference is mainly technical.

      Reviewer’s comment: Figure 4)

      • Please clarify if the single cell transcriptomic analysis has been performed only once, and if yes, how statistical testing to compare the cell proportion is carried out with only one batch. Fig 4G)

      Response to Reviewer and planned revision:

      As for the scRNAseq on microinjected cells:

      The scRNA-seq analysis was done once, using cells pooled from 3 different microinjection experiments performed on 3 different days.

      As for the scRNAseq on IUE cells:

      The scRNA-seq analysis was done once, using cells pooled from 2-3 different IUE experiments performed on 3 different days.

      For all scRNAseq experiments the statistical testing is achieved by intrasample comparisons according to established bioinformatics pipelines. We will better explain this point in the revised manuscript.

      Reviewer’s comment: Figure 4 and 5) - Figures are not supportive of the statement regarding APs' neurogenic potential upon DOT1L inhibition. TSS transcriptomic profile resembles more progenitors than neurons. Please comment on TSS neurogenic capacity taking into account the provided GO and RNAseq.

      Response to Reviewer and planned revision:

      We thank Reviewer 1 for raising this point. It is indeed true that TTS cells resemble APs more than neurons (as indicated in Fig. S5B, C). We took this to indicate that these cells are transient and therefore still maintain some AP features. Interestingly, TTS cells downregulate cell division markers, suggesting a restriction of proliferative potential, as one would expect for cells with an increased neurogenic potential. We will discuss this point in the revised manuscript.

      Reviewer’s comment: - Please provide GO analysis for APs and BPs.

      Response to Reviewer and planned revision:

      Following the reviewer’s suggestion, we will incorporate a more careful and in-depth analysis in the revised version of the manuscript.

      Reviewer’s comment: - Reconstruct figure 5A by listing genes in the same order in both Con and EPZ and prioritize EPZ-Con differences instead of cell-cell differences.

      Response to Reviewer and planned revision:

      We will revise Figure 5A based on the reviewer’s comment.

      Reviewer’s comment:

      Moreover, the presented genes in the heatmap is not the same in two conditions (i.e. NEUROG1 is present in EPZ but absent in Con). Please justify.

      Response to Reviewer and planned revision:

      This observation reflects the different activities of transcription factor networks in the control and EPZ conditions. The gene lists are not expected to be identical, because the cell states are altered and different TFs are expressed and active in the various cell types upon treatment. We will justify this point in the revised manuscript.

      Reviewer’s comment: Fig 5D)

      • Please explain why binding of EZH2 on the promoter of Asns is strongly reduced in comparison to a mild significant reduction of H3K79me/H3K27me3 in EPZ compared to Control.

      Response to Reviewer and planned revision:

      Several explanations are possible:

      First, the variation can be due to batch effects.

      Second, the acute reduction of EZH2 might not be directly accompanied by a reduced histone mark, which is reduced either by cell division or by demethylases. The two processes of getting rid of the mark might be slower than the reduction of EZH2 presence at the respective site.

      Based on the reviewer’s comment, we will explain this point in the revised manuscript.

      Reviewer’s comment:

      Also, is the change directly mediated by DOT1L?

      Please test whether DOT1L can bind the promoter of Asns.

      Response to Reviewer and planned revision:

      To address this relevant issue we will proceed with the following protocol:

      • electroporate a tagged version of DOT1L into ESCs
      • select ESCs and differentiate them into NPC_48h.
      • treat NPC with DMSO (Con) or EPZ
      • harvest CON and EPZ-treated NPC
      • perform ChIP-qPCR for DOT1L at the Asns promoter

      Reviewer’s comment: Please provide the expression patterns of DOT1L and Asns during neuronal differentiation.

      Response to Reviewer and planned revision:

      As for Dot1l

      Dot1l expression was shown in Franz et al 2019, by ISH from E12.5 to E18.5.

      As for Asns

      We will provide E14.5 in situ staining of Asns in the developing mouse brain from the GenePaint database (see Figure below).

      We will also show immunostainings for ASNS at mid-neurogenesis, provided that an antibody against ASNS works in the mouse.

      Other General comments:

      Reviewer’s comment: Please Indicate VZ, SVZ and CP on the side of the pictures/ with dot lines in the pictures both for primary figures and supplementary.

      Response to Reviewer and planned revision:

      We will revise the figures accordingly.

      Reviewer’s comment: - The Results and figures sometimes do not support the statement made by the authors

      Response to Reviewer and planned revision:

      We will carefully check on this and eliminate any overinterpretation or non-supported statements from the text.

      Reviewer’s comment: - Schemes are not informative/explanatory enough, i.e. time windows of treatment and sample collection, culture conditions details.

      Response to Reviewer and planned revision:

      We will revise the schemes to include more details. In particular, we plan to add a supplementary figure with a detailed visual description of the protocol, to match the detailed description presented in the materials and methods.

      Reviewer’s comment: - A more extensive characterization of TTS cells in terms of differentiation progression and integration would be enlightening

      Response to Reviewer and planned revision:

      In general, we are facing two main challenges while studying the TTS population: one is the lack of a specific marker gene for TTS, the other is the relatively small size of the TTS subpopulation.

      For these reasons, our ability to carry on an in-depth analysis of this cell state is limited.

      Considering the reviewer’s comment, in the revised manuscript we will expand the analysis and characterization of the differentiation potential of TTS cells using RNA velocity trajectory analysis.

      We can also expand the discussion on this point.

      Reviewer’s comment: - Picture quality can be improved, provide high magnification images.

      Response to Reviewer and planned revision:

      We will revise the figures to include higher magnification images.

      Reviewer #1 (Significance (Required)):

      Reviewer’s comment: The study could be important for the specific field in neural development. It aims to understand mutations in respective genes and brain malformation. If the link between epigenetic and metabolic changes is clearly shown, it will be interesting. However, the current manuscript is still rather descriptive, and clear mechanistic insights were not provided. The study have potentials and additional data will strength the value of study.

      Response to Reviewer and planned revision:

      We will address the direct impact of DOT1L and H3K79me2 on the Asns gene locus during the revision (see the rationale of the experimental strategy also in the revision plan above). We hope we will thus provide a mechanistic link between epigenetics and altered metabolome.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Reviewer’s comment: Appiah et al. present a concise manuscript that provides details and possible mechanisms of their previous work (Franz et al., 2019; Ferrari et al., 2020). The study uses diverse lines of investigation to arrive at most conclusions. However, as interesting as the data is, we find that at the present state, it is not sufficient to prove that, indeed, the asparagine metabolism is regulated by DOTL1/PRC2 crosstalk. The neurogenic shift presented in the first part of the paper is not comprehensive and, therefore, not very convincing. The quality of images provided in the main and supplementary data is less than ideal. Additional data analysis and interpretation of the scRNA seq data may be needed. The authors finally conclude with rescue experiments done in culture and in-vivo, which we believe is the stand-out part of this study. Overall the manuscript has some interesting observations that are often over-interpreted with less supporting data. The manuscript reads well but requires additional data and changes in the claims/interpretation to be suited for publication.

      Response to Reviewer and planned revision:

      In the revised manuscript, we hope to address the comments and concerns raised by the reviewer in a satisfactory manner.

      Comments

      Reviewer’s comment: 1) Abstract: Is this statement correct: "DOT1L inhibition led to increased neurogenesis driven by a shift from asymmetric self-renewing to symmetric neurogenic divisions of APs". AP undergoes symmetric division for self-renewal and asymmetric neurogenic divisions.

      Response to Reviewer and planned revision:

      Based on the current literature (cit. Huttner and Kriegstein), APs undergo:

      • symmetric proliferative division at early stages of neurogenesis
      • asymmetric self-renewing division, generating an AP and a BP, at mid-neurogenesis. This division is also described as neurogenic, as it produces a BP, which is a step further than an AP in terms of neurogenic potential.
      • symmetric consumptive division at late neurogenesis

      To avoid any possible confusion, we will rephrase the sentence to include the adjective “consumptive” and specify the composition of the progeny.

      In the revised manuscript, the sentence will read as follows:

      "DOT1L inhibition led to increased neurogenesis driven by a shift of APs from asymmetric self-renewing (generating one AP and one BP) to symmetric consumptive divisions (generating two neurons)"

      Reviewer’s comment: All the data is based on treatments with EPZ (DOTL1 inhibitor), yet no information is shown to support its targeted activity in this system. A proof of principle in the chosen experimental system is missing; for instance, examining the activity or protein level of DOTL1 and decreased methylation of the target(s) is essential.

      Response to Reviewer and planned revision:

      EPZ is a well characterized drug, that has been used previously in our lab and by others as well.

      As for our lab, the information regarding the inhibitor, its activity and efficiency in inhibiting DOT1L towards H3K79me2 was shown in Franz et al. Supplementary Fig. S6 D, E.

      In the present manuscript, an additional confirmation that EPZ targets DOT1L in regard to its H3K79me2 activity is shown in Fig. 5D.

      We will refer to this information more explicitly in the revised manuscript.

      Reviewer’s comment: 2) Figure 1: The scoring of centrosomes and cilia is insufficient to conclude delamination and increase in basal fates. The effect could be on ciliogenesis or centrosome tethering to the apical end-feet of the AP, and other possible explanations for this observation also exist. The images are too small; larger images or graphic representations could be helpful in addition to the data.

      Response to Reviewer and planned revision:

      We did not intend to claim that the change in centrosome location demonstrates delamination, but only that it suggests delamination. This criterion has been extensively used as a proxy for delamination by several labs working on the cell biology of neurogenesis, such as the Huttner and Götz labs. If the issue persists, we can rephrase the text referring to Figure 1 more cautiously to highlight that the data only suggest delamination.

      Reviewer’s comment:

      To make a statement regarding delamination, I would like to see either the dynamics of delamination (organotypic slices images), staining with BP markers, or morphological changes of AP (staining that will reveal loss of adherence) or comparable data to support the observation. In my opinion Supp. Figure 1 is insufficient; the single image is not convincing; I would like to see 3D reconstruction and better-quality images.

      Response to Reviewer and planned revision:

      We can certainly provide better images and co-stain with relevant markers.

      We think that embarking on live imaging is beyond the scope of the manuscript, as we are not studying the dynamics of delamination per se.

      Reviewer’s comment: Tis21 data (1H), again of low quality, is only a single piece of evidence and the conclusion "suggesting that the acquisition of a basal fate was paralleled by a switch to neurogenesis" is premature. I think other cell cycle exit reporters, Fucci markers, pHis, BrdU, NeuroD, or Tbr2 reporters (Li et al., 2020, (Haydar and Sestan labs)) to name a few, are necessary to establish the conclusions. The authors should show other markers such as PAX6, EOMES, or other upper-layer markers upon cell cycle exit in the SVZ/CP. These additional experiments will assist in cell fate analysis.

      Response to Reviewer and planned revision:

      We completely understand the points raised by the reviewer, and we plan to address them by co-staining with PAX6/SOX2, PH3 and/or EOMES.

      We think establishing the Fucci or EOMES mouse system is beyond the scope of the manuscript. In addition, given the present setting of all labs involved, it would be logistically unattainable (see also comments in the section below).

      We think the planned co-staining scheme will be informative enough to satisfactorily address the concerns raised by the reviewer.

      Reviewer’s comment: 2) Figure 2: The microinjection experiments are elegant; the images, however, do not complement the experiment. The images of the microinjected cells seem not to be reconstructed from z-stacked optical slices, so often, processes are not continuous (panel B, for example); therefore, it is not clear if an apical process is indeed missing or just not seen.

      Response to Reviewer and planned revision:

      The mentioned images are reconstructed from continuous Z-stacks, as we always do given the type of data. We can provide better reconstructions and/or additional images.

      Reviewer’s comment:

      The data analysis should include other parameters; BrdU staining could have given information on cell cycle exit, PAX6, SOX2, and EOMES on the location of the cells in the VZ/sVZ. The quality of images showing EOMES and TUBB3 staining is so low that it makes the reader doubt the validity of the quantifications. "Taken together, these data suggest that the inhibition of DOT1L might favor the acquisition of a neuronal over BP cell fate" This interpretation should be subjected to more investigations. It is possible that this treatment just accelerates the AP-> BP -> Neuronal fate. The author's claim needs to be backed by additional experiments or be changed.

      Response to Reviewer and planned revision:

      To address this point, we will include in the revised manuscript staining and co-staining with PAX6, SOX2 (see also response above) and provide a BrdU labeling experiment.

      Reviewer’s comment: 3) Figure 3: The experiment concept and its performance are impressive, yet the data is insufficient. The images in A that are supposed to be representative show two cells; their location is not clear, and the expression of GFP is not clear; in fact, both pairs seem to be GFP negative (not clear what is the threshold for background). Staining with anti-GFP and a second method to follow neurogenesis is necessary.

      Response to Reviewer and planned revision:

      We did use different staining methods and schemes to follow neurogenesis. As specified above, we will deepen our analysis by using additional markers, such as TBR1.

      Reviewer’s comment: 4) On page 9, lines 8-10, the authors claim that their number of cells was "sufficient" for single-cell analysis; the numbers are

      Response to Reviewer and planned revision:

      In the revised manuscript, we will include an analysis of how many cells are needed to identify clusters of 6 cell types in this paradigm, based for example on the algorithms developed in Treppner et al. 2021.

      Reviewer’s comment: 5) The authors use Seurat and RaceID without their appropriate citations in the first mention during the results. The authors also stop immediately after DEG analysis along with clustering. The authors could analyze their RNA-seq data with a trajectory; to say the least, the identification/characterization of TTS and neurons as Neurons I, II, and III are insufficient. There could be multiple ways to show the "fate" of cells in the isolated FACS, which the authors have missed.

      Response to Reviewer and planned revision:

      We will include the respective citations in the revised manuscript. We already provide differentiation trajectories, but we will include other methods, such as scVelo or FateID, to extend the trajectory analyses. We kindly ask the reviewer to also refer to the comments above regarding the characterization of the TTS cluster, as part of our effort to provide a better picture of the different clusters.

      Reviewer’s comment: 6) The authors detected candidates like Fgfr3, Nr2f1, Ofd1, and Mme as part of their treated (different approaches) datasets (from their DEG analysis). They correctly cite Huang et al., 2020 but fail to give us a sense of the consequences of these gene dysregulations. The authors can also validate if these proteins are expressed in their treated cells.

      Response to Reviewer and planned revision:

      In the revised manuscript we will comment on the function of the four genes mentioned.

      In addition, we will validate the expression of these genes at the protein and transcript level through immunostaining (provided that the antibodies work in our system) or smFISH, respectively.

      Reviewer’s comment: 7) The authors list a few GO terms (page 10, lines 1-10) and associate them with reduced proliferation; they must cite relevant studies. The authors can also add supplementary data showing which genes in their data correspond to these GO terms.

      Response to Reviewer and planned revision:

      We thank the reviewer for pointing out the missing citations.

      We of course agree on the need to add them, and we will do so in the revised manuscript.

      Reviewer’s comment: 8) On Page 11, lines 3-7, the authors describe their method to arrive at the 17 targets with TF activity from the previous analysis. Can the authors describe the method used to correlate the two? The reviewer understands this could be MEME analysis or analysis of earlier datasets of Ferrari et al. 2020. But it must be explicitly stated, and a few examples in supplementary need to be exemplified as this analysis is key to discovering the three metabolic genes.

      Response to Reviewer and planned revision:

      In the revised manuscript, we will clarify the exact analysis that resulted in the identification of the 17 target genes. This analysis used a specific tool for gene network analysis and is based on our scRNA-seq data alone, not on the Ferrari et al. 2020 data set.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      n/a

      4. Description of analyses that authors prefer not to carry out

      Reviewer’s comment: Tis21 data (1H), again of low quality, is only a single piece of evidence and the conclusion "suggesting that the acquisition of a basal fate was paralleled by a switch to neurogenesis" is premature. I think other cell cycle exit reporters, Fucci markers, pHis, BrdU, NeuroD, or Tbr2 reporters (Li et al., 2020, (Haydar and Sestan labs)) to name a few, are necessary to establish the conclusions. The authors should show other markers such as PAX6, EOMES, or other upper-layer markers upon cell cycle exit in the SVZ/CP. These additional experiments will assist in cell fate analysis.

      Response to Reviewer and planned revision:

      As pointed out above, we think establishing the Fucci or EOMES mouse system is beyond the scope of the manuscript, as it will not provide more information than what we will obtain from systematic and extensive co-staining experiments. In addition, all labs involved are facing logistic issues (animal house not ready yet, construction works etc.) that make importing the lines and setting up the colony unattainable for the next 6-10 months. If the reviewer and/or the editorial board think this is a major point compromising the entire revision, we kindly ask them to contact us again so that we can discuss the issue and arrive at a shared conclusion.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Appiah et al. present a concise manuscript that provides details and possible mechanisms of their previous work (Franz et al., 2019; Ferrari et al., 2020). The study uses diverse lines of investigation to arrive at most conclusions. However, as interesting as the data is, we find that in its present state, it is not sufficient to prove that asparagine metabolism is indeed regulated by DOT1L/PRC2 crosstalk. The neurogenic shift presented in the first part of the paper is not comprehensive and, therefore, not very convincing. The quality of images provided in the main and supplementary data is less than ideal. Additional data analysis and interpretation of the scRNA-seq data may be needed. The authors finally conclude with rescue experiments done in culture and in vivo, which we believe is the stand-out part of this study. Overall, the manuscript has some interesting observations that are often over-interpreted with limited supporting data. The manuscript reads well but requires additional data and changes in the claims/interpretation to be suited for publication.

      Comments

      1. Abstract: Is this statement correct: "DOT1L inhibition led to increased neurogenesis driven by a shift from asymmetric self-renewing to symmetric neurogenic divisions of APs"? APs undergo symmetric divisions for self-renewal and asymmetric divisions for neurogenesis.

      All the data is based on treatments with EPZ (DOT1L inhibitor), yet no information is shown to support its targeted activity in this system. A proof of principle in the chosen experimental system is missing; for instance, examining the activity or protein level of DOT1L and decreased methylation of the target(s) is essential.

      2. Figure 1: The scoring of centrosomes and cilia is insufficient to conclude delamination and increase in basal fates. The effect could be on ciliogenesis or centrosome tethering to the apical end-feet of the AP, and other possible explanations for this observation also exist. The images are too small; larger images or graphic representations could be helpful in addition to the data.

      To make a statement regarding delamination, I would like to see either the dynamics of delamination (organotypic slices images), staining with BP markers, or morphological changes of AP (staining that will reveal loss of adherence) or comparable data to support the observation. In my opinion Supp. Figure 1 is insufficient; the single image is not convincing; I would like to see 3D reconstruction and better quality images.

      Tis21 data (1H), again of low quality, is only a single piece of evidence and the conclusion "suggesting that the acquisition of a basal fate was paralleled by a switch to neurogenesis" is premature. I think other cell cycle exit reporters, Fucci markers, pHis, BrdU, NeuroD, or Tbr2 reporters (Li et al., 2020, (Haydar and Sestan labs)) to name a few, are necessary to establish the conclusions. The authors should show other markers such as PAX6, EOMES, or other upper-layer markers upon cell cycle exit in the SVZ/CP. These additional experiments will assist in cell fate analysis.

      2. Figure 2: The microinjection experiments are elegant; the images, however, do not complement the experiment. The images of the microinjected cells seem not to be reconstructed from z-stacked optical slices, so often, processes are not continuous (panel B, for example); therefore, it is not clear if an apical process is indeed missing or just not seen. The data analysis should include other parameters; BrdU staining could have given information on cell cycle exit, PAX6, SOX2, and EOMES on the location of the cells in the VZ/sVZ. The quality of images showing EOMES and TUBB3 staining is so low that it makes the reader doubt the validity of the quantifications.

      "Taken together, these data suggest that the inhibition of DOT1L might favor the acquisition of a neuronal over BP cell fate". This interpretation should be subjected to more investigations. It is possible that this treatment just accelerates the AP -> BP -> neuronal fate. The authors' claim needs to be backed by additional experiments or be changed.

      3. Figure 3: The experiment concept and its performance are impressive, yet the data is insufficient. The images in A that are supposed to be representative show two cells; their location is not clear, and the expression of GFP is not clear; in fact, both pairs seem to be GFP negative (not clear what is the threshold for background). Staining with anti-GFP and a second method to follow neurogenesis is necessary.

      4. On page 9, lines 8-10, the authors claim that their number of cells was "sufficient" for single-cell analysis; the numbers are <500 for all samples. The authors need to justify this statement or cite articles that carefully analyze the number required for such a conclusion.

      5. The authors use Seurat and RaceID without their appropriate citations in the first mention during the results. The authors also stop immediately after DEG analysis along with clustering. The authors could analyze their RNA-seq data with a trajectory; to say the least, the identification/characterization of TTS and neurons as Neurons I, II, and III is insufficient. There could be multiple ways to show the "fate" of cells in the isolated FACS, which the authors have missed.

      6. The authors detected candidates like Fgfr3, Nr2f1, Ofd1, and Mme as part of their treated (different approaches) datasets (from their DEG analysis). They correctly cite Huang et al., 2020 but fail to give us a sense of the consequences of these gene dysregulations. The authors can also validate if these proteins are expressed in their treated cells.

      7. The authors list a few GO terms (page 10, lines 1-10) and associate them with reduced proliferation; they must cite relevant studies. The authors can also add supplementary data showing which genes in their data correspond to these GO terms.

      8. On page 11, lines 3-7, the authors describe their method to arrive at the 17 targets with TF activity from the previous analysis. Can the authors describe the method used to correlate the two? The reviewer understands this could be MEME analysis or analysis of earlier datasets of Ferrari et al. 2020. But it must be explicitly stated, and a few examples need to be given in the supplementary data, as this analysis is key to discovering the three metabolic genes.

      Significance

      Appiah et al. present a concise manuscript that provides details and possible mechanisms of their previous work (Franz et al., 2019; Ferrari et al., 2020). The study uses diverse lines of investigation to arrive at most conclusions. However, as interesting as the data is, we find that in its present state, it is not sufficient to prove that asparagine metabolism is indeed regulated by DOT1L/PRC2 crosstalk. The neurogenic shift presented in the first part of the paper is not comprehensive and, therefore, not very convincing. The quality of images provided in the main and supplementary data is less than ideal. Additional data analysis and interpretation of the scRNA-seq data may be needed. The authors finally conclude with rescue experiments done in culture and in vivo, which we believe is the stand-out part of this study.

      Overall, the manuscript has some interesting observations that are often over-interpreted with limited supporting data. The manuscript reads well but requires additional data and changes in the claims/interpretation to be suited for publication.

    1. I like to think of thoughts as streaming information, so I don’t need to tag and categorize them as we do with batched data. Instead, using time as an index and sticky notes to mark slices of info solves most of my use cases. Graph notebooks like Obsidian think of information as batched data. So you have a set of notes (samples) that you try to aggregate, categorize, and connect. Sure there’s a use case for that: I can’t imagine a company wiki presented as streaming info! But I don’t think it aids me in how I usually think. When thinking with pen and paper, I prefer managing streamed information first, then converting it into batched information later— a blog post, documentation, etc.

      There's an interesting dichotomy between streaming information and batched data here, but it isn't well delineated and doesn't add much to the discussion as a result. Perhaps distilling it down may help? There's a kernel of something useful here, but it isn't immediately apparent.

      Relation to stock and flow or the idea of the garden and the stream?

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this study, the authors consider the problem of inferring transcription dynamics from smFISH data. They distinguish between two important experimental situations. The first one considers measurements of mature mRNAs, while the second one considers measurements of nascent mRNA through fluorescent probes targeting PP7 stem loops. The former problem has been previously dealt with extensively, but less work has been done in the context of the latter. The inference approaches are based on maximum likelihood estimation, from which point estimates for promoter-switching and transcription rates are obtained. The study focuses on steady state measurements only. The authors perform several analyses using synthetic data to understand the limitations of both approaches. They find that inference from nascent mRNA is more reliable than inference from mature mRNA distributions. Moreover, they show that accounting for different cell-cycle stages (G1 vs G2) is important and that pooling measurements across the cell-cycle can lead to quantitatively and even qualitatively different inferences. Both approaches are then used to analyze transcription in an experimental system in yeast, for which they find evidence of gene dosage compensation. I consider this an interesting and relevant study, which will appeal to the systems- and computational biology community. The paper is well written and the (computational) methods are described in detail. The experimental description is quite minimal and could profit from further details / explanations. I have several technical criticisms and questions, which I believe should be addressed before publication. Since I am a theorist, I will comment predominantly on the statistical / computational aspects.

      Major comments/questions:

      -A key reference that is missing is Fritzsch et al. Mol Syst Biol (2018). In this work, the authors have used nascent mRNA distributions and autocorrelations (obtained from live-imaging) to infer promoter- and transcription dynamics. I believe this work should be appropriately cited and discussed.

      Synthetic case study:

      -Inference and point estimates. The authors use a maximum-likelihood framework to extract point estimates of the parameters. Subsequently, relative absolute differences are used to assess the accuracy of the inference. However, as far as I have understood, this is performed for only a single simulated dataset, for each considered parameter configuration. The resulting metric, however, does not really capture the inference accuracy, since it is based on a single (random) realization of the MLE. I would recommend at least repeating the inference multiple times for different realizations of the simulated dataset (per parameter configuration) to get a better feeling for the distribution of the MLE (e.g., its bias / variance). Alternatively, identifiability analyses based on the Fisher information could be performed for (some of) the different parameter configurations, although this may be computationally more demanding.

      -It would be useful to include confidence intervals based on profile likelihoods also for the synthetic case study, in particular for the 6 reported datasets. I would also find it helpful to see comprehensive profile likelihood plots for the key results / parameter inferences in the supplement. This would also provide useful insights into the identifiability of the parameters.
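
      The repeated-inference check suggested under "Inference and point estimates" can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a plain Poisson count model (for which the MLE of the rate is the sample mean) in place of the promoter-switching model, and the names `sample_poisson`, `mle_rate`, and `replicate_study`, as well as all parameter values, are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mle_rate(counts):
    """For i.i.d. Poisson counts, the maximum-likelihood rate is the sample mean."""
    return sum(counts) / len(counts)


def replicate_study(true_rate, n_cells, n_replicates, seed=0):
    """Simulate many independent datasets and summarise the spread of the MLEs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        # One synthetic dataset: mRNA counts for n_cells cells.
        counts = [sample_poisson(true_rate, rng) for _ in range(n_cells)]
        estimates.append(mle_rate(counts))
    bias = statistics.mean(estimates) - true_rate   # systematic error of the estimator
    variance = statistics.variance(estimates)       # spread across realizations
    return estimates, bias, variance


estimates, bias, variance = replicate_study(true_rate=5.0, n_cells=200, n_replicates=300)
print(f"bias = {bias:.3f}, variance = {variance:.4f}")
```

      With a single realization only one entry of `estimates` would be available, so neither the bias nor the variance of the estimator could be assessed; the same loop would apply to any model once `mle_rate` is swapped for a numerical likelihood maximization.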

      Experimental case study:

      -Validation against live-cell data. In the simulation of the autocorrelation function, what was the ratio of cells initialized in G1 / G2, respectively? I'd expect this to have direct influence on the simulated ACF. Moreover, a linear fit is used to correct for "non-stationary effects" in the ACF that supposedly stem from cell-cycle dynamics. First, I don't think this terminology is really accurate, since non-stationarity would lead to an ACF that depends on two parameters (tau_1 and tau_2). I suppose the goal of the linear correction is to remove slow / static population heterogeneity? If yes, wouldn't it be easier / more direct to also change the simulations to non-synchronized cell-cycles? In this case, they should also display the very slow / static components as displayed in the data, which would eliminate the need for the post-hoc correction. I was also wondering whether other statistics (e.g., mean, variance, distributions) match between the simulations and the live-cell experiment? This could provide further validation of the inferred parameters.

      -If I understood correctly, the signal intensity of the measured transcription spot is normalized by the median cytoplasmic spot brightness. Since the normalized intensity of a single complete transcript is 1, the cumulative intensity should give a lower bound on the nascent mRNAs. The histograms in Fig. 4b show intensity values in the range of 30, which would mean that at least 30 transcripts contribute to the transcription spot. The total number of nucleoplasmic and cytoplasmic mRNA, however, is in the range of 10 (Fig. 3a). I am probably missing something but how can we reconcile these numbers? The authors mention that the brightest spot just counts for one transcript, but argue that this has negligible influence on mature RNA counts. Could this be a possible explanation for the mismatch?
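
      To illustrate the point about the linear ACF correction raised under "Validation against live-cell data": the sketch below uses assumed toy dynamics (a static cell-specific level plus AR(1) fluctuations, with hypothetical function names and parameter values). It shows how static cell-to-cell heterogeneity adds a lag-independent offset to an autocovariance pooled across cells, and how that offset largely disappears when each trace is centred on its own mean. It is not the simulation used in the manuscript.

```python
import math
import random


def simulate_cells(n_cells, n_steps, level_sd, phi, seed=0):
    """Each trace is a constant cell-specific level plus a stationary AR(1) process."""
    rng = random.Random(seed)
    stationary_sd = 1.0 / math.sqrt(1.0 - phi * phi)
    traces = []
    for _ in range(n_cells):
        level = rng.gauss(10.0, level_sd)      # static heterogeneity between cells
        y = rng.gauss(0.0, stationary_sd)      # start the AR(1) part at stationarity
        trace = []
        for _ in range(n_steps):
            y = phi * y + rng.gauss(0.0, 1.0)
            trace.append(level + y)
        traces.append(trace)
    return traces


def autocovariance(traces, lag, per_cell_mean):
    """Autocovariance at one lag, pooled over cells.

    per_cell_mean=False centres every trace on the population mean;
    per_cell_mean=True centres each trace on its own time average.
    """
    grand_mean = sum(sum(tr) for tr in traces) / sum(len(tr) for tr in traces)
    total, n_pairs = 0.0, 0
    for trace in traces:
        mu = sum(trace) / len(trace) if per_cell_mean else grand_mean
        for t in range(len(trace) - lag):
            total += (trace[t] - mu) * (trace[t + lag] - mu)
            n_pairs += 1
    return total / n_pairs


traces = simulate_cells(n_cells=200, n_steps=400, level_sd=2.0, phi=0.8)
pooled = autocovariance(traces, lag=50, per_cell_mean=False)
centred = autocovariance(traces, lag=50, per_cell_mean=True)
print(f"lag-50 autocovariance: pooled = {pooled:.2f}, per-cell centred = {centred:.2f}")
```

      At lag 50 the AR(1) correlations have decayed, so what remains in the pooled estimate is roughly the variance of the cell-specific levels. In this toy setting a post-hoc baseline correction and explicit modelling of the heterogeneity address the same component.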

      Minor comments:

      -In the experimental case study, the authors argue that the "correct" inference result is the one that accounts for cell-cycle stage, while the other one is termed "incorrect". I find this terminology too strong, since every estimate is subject to uncertainty.

      -Page 2: "... in a asynchronous population" -> "... in an asynchronous population"

      -Page 7: "...parameters sets 3 and 4" -> "...parameter sets 3 and 4"

      -Figures 5a and 6a: parameter names and units should go on the y-axis.

      Significance

      Quantifying kinetic parameters from incomplete and noisy experimental data is a core problem in systems biology. I therefore consider this manuscript to be very relevant to this field. The contribution of this manuscript is largely methodological, although its potential usefulness is demonstrated using experimental data in yeast.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In recent years, the field has investigated crosstalk between cGMP and cAMP signaling (PMID: 29030485), lipid and cGMP signaling (PMID: 30742070), and calcium and cGMP signaling (PMID: 26933036, 26933037). In contrast to the Plasmodium field, which has benefited from proteomic experiments (ex: PMID 24594931, 26149123, 31075098, 30794532), second messenger crosstalk in T. gondii has been probed predominantly through genetic and pharmacological perturbations. The present manuscript compares the features of A23187- and BIPPO-stimulated phosphoproteomes at a snapshot in time. This is similar to a dataset generated by two of the authors in 2014 (PMID: 24945436), except that it now includes one BIPPO timepoint. The sub-minute phosphoproteomic timecourse following A23187 treatment in WT and ∆cdpk3 parasites is novel and would seem like a useful resource.

      CDPK3-dependent sites were detected on adenylate cyclase, PI-PLC, guanylate cyclase, PDE1, and DGK1. This motivated study of lipid and cNMP levels following A23187 treatment. The four PDEs determined to have A23187-dependent phosphosites were characterized, including the two PDEs with CDPK3-dependent phosphorylation, which were found to be cGMP-specific. However, cGMP levels do not seem to differ in a CDPK3- or A23187-dependent manner. Instead, cAMP levels are elevated in ∆cdpk3 parasites. This would seem to implicate a feedback loop between CDPK3, the adenylyl cyclase, and PKA/PKG: CDPK3 activity reduces adenylyl cyclase activity, which reduces PKA activity, which increases PKG activity. The authors don't pursue this direction, and instead characterize PDE2, which does not have CDPK3-dependent phosphosites and seems out of place in the study.

      Response:

      We agree with reviewer 1 that a feedback loop between CDPK3, the adenylyl cyclase and PKA/PKG is certainly one of several possibilities (and we acknowledge this in the manuscript).

      We felt, however, that given the observation that A23187 and BIPPO treatment leads to phosphorylation of numerous PDEs (hinting at the presence of a Ca2+-regulated feedback loop), it was entirely relevant to study these in greater detail. Coupled with the A23187 egress assay on ΔPDE2 parasites, our findings suggest that PDE2 plays an important role in this signalling loop (an entirely novel finding). While PDE2 appears to exert its effects in a CDPK3-independent manner (indeed suggesting that CDPK3 might exert its effects on cAMP levels in a different fashion), this does not detract from the important finding that PDE2 is one of the (likely numerous) components that is regulated in a Ca2+-dependent feedback loop to regulate egress.

      We have modified our writing to better reflect the fact that our decision to pursue study of the PDEs was not solely CDPK3-centric.

      While we feel that our reasoning for studying the PDEs is solid, we appreciate that further clarification on the putative CDPK3-Adenylate cyclase link would make it easier for the reader to follow the rationale.

      We have not studied the direct link between CDPK3 and the Adenylate Cyclase β in more detail, as ACβ alone was shown to not play a major role in regulating lytic growth (Jia et al., 2017).

      **MAJOR COMMENTS**

      1. Some of the key conclusions are not convincing.

      The data presented in Figure 6E, F, and G and discussed in lines 647-679 are incongruent. In Figure 6E, the plaques in the PDE2+RAP image are hardly visible; how can it be that the plaques were accurately counted and determined not to differ from vehicle-treated parasites?

      Are the images in 6E truly representative? Was the order of PDE1 and PDE2 switched? The cited publication by Moss et al. 2021 (preprint) is not in agreement with this study, as stated. That preprint determined that parasites depleted of PDE2 had significantly reduced plaque number and plaque size (>95% reduction); and parasites depleted of PDE1 had a substantially reduced plaque size but a less substantial reduction in plaque number.

      Response:

      The plaques for PDE2+RAP were counted using a microscope since they are difficult to see by eye. We thank the reviewer for detecting our incorrect reference to Moss et al. (2021). This has been corrected in the text. We confirm, however, that the images in 6E are representative of what we observed and do indeed differ from what was seen by Moss et al. We have acknowledged this clearly in the text.

      The differences cannot easily be explained other than by the different genetic systems used. Further studies of the individual PDEs will likely illuminate their role in invasion/ growth, but we feel this would be beyond the scope of this study.

      Unfortunately, the length of time required for PDE depletion (72h) is incompatible with most T. gondii cellular assays (typically performed within one lytic cycle, 40-48h). Although the authors performed the assays 3 days after initial RAP treatment, is there evidence that non-excised parasites don't grow out of the population? This should be straightforward to test: treat, wait 3 days, infect onto monolayers, wait 24-48h, fix, and stain with anti-YFP and an anti-Toxoplasma counterstain. The proportion of the parasite population that had excised the PDE at the time of the cellular assays will then be known, and the reader will have a sense of how complete the observed phenotypes are. As a reader, I will regard the phenotypes with some level of skepticism due to the long depletion time, especially since a panel of PDE rapid knockdown strains (depletion in

      Response:

      1. Cellular assays using KO parasites are commonly performed at the point at which protein depletion is detected. Both our western blots and plaque assay results demonstrate that, at the point of assay, there is no substantial outgrowth of non-excised parasites. The original manuscript also includes PCRs performed at the 72 hr time point (See Fig. 6B) to support this.
      2. We appreciate the reviewer’s comment re the panel of PDE KD strains. The reviewer notes that there are substantial limitations to conditional KO systems, which similarly apply to KD systems - there are notable pros and cons to each approach. When designing our strategy (pre-publication of Moss et al., 2022), we made a deliberate decision to use conditional KO strains in light of the fact that residual protein levels in KD systems can cause significant problems, particularly for membrane proteins (all of the investigated PDEs have a transmembrane domain). Tagging of proteins with the degradation domain can have further issues, leading to protein mis-localisation, which we have experienced with several unrelated proteins in the lab.

        The authors should qualify some of their claims as preliminary or speculative, or remove them altogether.

      The claims in lines 240-260 are confusing. It seems likely that the two drug treatments have at least topological distinctions in the signaling modules, given that cGMP-triggered calcium release is thought to occur at internal stores, whereas A23187-mediated calcium influx likely occurs first at the parasite plasma membrane. The authors' proposed alternative, that treatment-specific phosphosite behavior arises from experimental limitations and "mis-alignment", is unsatisfying for the following reasons: (1) From the outset, the authors chose different time frames to compare the two treatments (15s for BIPPO vs. 50s for A23187); (2) the experiment comprises a single time point, so it does not seem appropriate to compare the kinetics of phosphoregulation. There is still value in pointing out which phosphosites appear treatment-specific under the chosen thresholds, but further claims on the basis of this single-timepoint experiment are too speculative. Lines 264-267 and 281-284 should also be tempered.

      Relatedly, graphing of the data in Figure 1G (accompanying the main text mentioned above) was confusing. Why is one axis a ratio, and the other log10 intensity? What does log10 intensity tell you without reference to the DMSO intensity? Wouldn't you want the L2FC(A23187) vs. L2FC(BIPPO) comparisons? Could you use different point colors to highlight these cases on plot 1E? Additionally, could you use a pseudocount to include peptides only identified in one treatment condition on the plot in 1E? (Especially since these sites are mentioned in lines 272-278 but are not on the plot)

      Response:

      1. The kinetics of the responses to A23187 and BIPPO are very different. This is why the treatment timings are purposely different: they were selected to align the pathways at a point where calcium levels peak just prior to calcium re-uptake. We make no mention of kinetic comparisons, and merely demonstrate that at the chosen timepoints, overall signalling correlation is very high. The observation that most of the sites that behave differently between conditions sit remarkably close to the threshold for differential regulation (in the treatment condition where they are not DR - see Fig. 1G) led us to speculate that many of these sites are likely on the cusp of differential regulation. While it is entirely possible that some of these differences are, in fact, treatment specific (and we clearly acknowledge this in the text), we simply state that we cannot confidently discern clear signalling features that allow us to distinguish between the two treatments. We feel that this is an entirely relevant observation given the observed preponderance of both A23187 and BIPPO-dependent DR phosphosites on proteins in the PKG signalling pathway (as current models place this upstream of Ca2+ release).
      2. Log10 intensity only serves to spread the data for easier visualisation. The only comparison being made relates to the LFCs. Fig. 1Gi shows the LFC scores (x axis) for all sites regulated following A23187 treatment (for which peptides were also identified in BIPPO treatment). On this plot we have highlighted the sites that are differentially regulated following BIPPO but not A23187 treatment (with red showing the DRup and blue showing the DRdown sites). This demonstrates that many of the sites that are regulated following BIPPO but not A23187 treatment cluster close to the threshold for differential regulation in the A23187 dataset - suggesting that many of these sites are likely on the cusp of differential regulation. Fig. 1Gii shows the reverse. While we could highlight the above-mentioned sites on the plot in Fig. 1E, we do not feel that it would demonstrate our point as clearly.

      We feel that including a pseudocount on Fig. 1E for peptides lacking quantification in one treatment condition would be visually misleading as the direct correlation being made in Fig. 1E is BIPPO vs A23187 treatment. The sites mentioned in lines 272-278 in the original manuscript (now lines 268-276) are available in the supplement tables.

      3. Additional experiments would be essential to support the main claims of the paper.

      Genetic validation is necessary for the experiments performed with the PKA inhibitor H89. H89 is nonspecific even in mammalian systems (PMID: 18523239) and in this manuscript it was used at a high concentration (50 µM). The heterodimeric architecture of PKA in apicomplexans dramatically differs from the heterotetrameric enzymes characterized in metazoans (PMID: 29263246), so we don't know what the IC50 of the inhibitor is, or whether it inhibits competitively. Two inducible knockdown strains exist for PKA C1 (PMID: 29030485, 30208022). The authors could request one of these strains and construct a ∆cdpk3 in that genetic background, as was done for the PDE2 cKO strain. Estimated time: 3-4 weeks to generate strain, 2 weeks to repeat assays.

      Response:

      1. While we appreciate that H89 is not 100% specific for PKA, this is not our only line of evidence that cAMP levels are altered. We demonstrate that cAMP levels are elevated in CDPK3 KO parasites – further substantiating our finding.

      The H89 concentration used in our experiment is in keeping with/lower than the concentrations used in other Toxoplasma publications (Jia et al., 2017), and both the Toxoplasma and Plasmodium fields have shown convincingly that H89 treatment phenocopies cKD/cKO of PKA (see Jia et al., 2017; Flueck et al., 2019).

      While we agree that the genetic validation suggested by reviewer 1 would serve to further support our findings (though it would not provide further novel insights), the suggested time frame for experimental execution was not realistic. Line shipment, strain generation, subcloning and genetic validation would take substantially longer than 3-4 weeks.

      cGMP levels are found to not increase with A23187 treatment, which is at odds with a previous study (lines 524-560). The text proposes that the differences could arise from the choice of buffer: this study used an intracellular-like Endo buffer (no added calcium, high potassium), whereas Stewart et al. 2017 used an extracellular-like buffer (DMEM, which also contains mM calcium and low potassium). An alternative explanation is that 60 s of A23187 treatment does not achieve a comparable amount of calcium flux as 15 s of BIPPO treatment, and a calcium-dependent effect on cGMP levels, were it to exist, could not be observed at the final timepoint in the assay. The experiments used to determine the kinetics of calcium flux following BIPPO and A23187 treatments (Fig. 1B, C) were calibrated using Ringer's buffer, which is more similar to an extracellular buffer (mM calcium, low potassium). In this buffer, A23187 treatment would likely stimulate calcium entry from across the parasite plasma membrane, as well as across the membranes of parasite intracellular calcium stores. By contrast, A23187 treatment in Endo buffer (low calcium) would likely only stimulate calcium release from intracellular stores, not calcium entry, since the calcium concentration outside of the parasite is low. Because calcium entry no longer contributes to calcium flux arising from A23187 treatment, it is possible that the calcium fluxes of A23187-treated parasites at 60 s are "behind" BIPPO-treated parasites at 15 s. The researchers could control these experiments by *either* (i) performing the cNMP measurements on parasites resuspended in the same buffer used in Figure 1B, C (Ringer's) or (ii) measuring calcium flux of extracellular parasites in Endo buffer with BIPPO and A23187 to determine the "alignment" of calcium levels, as was done with intracellular parasites in Figure 1C. No new strains would have to be generated and the assays have already been established in the manuscript. 
Estimated time to perform control experiments with replicates: 2 weeks. This seems like an important control, because the interpretation of this experiment shifts the focus of the paper from feedback between calcium and cGMP signaling, which had motivated the initial phosphoproteomics comparisons, to calcium and cAMP signaling. Further, the lipidomics experiments were performed in an extracellular-like buffer, DMEM, so it's unclear why dramatically different buffers were used for the lipidomics and cNMP measurements.

      Response:

      While the initial calibration experiments to measure calcium flux were indeed performed in Ringer’s buffer, the parasites were intracellular. We therefore chose to measure cNMP concentrations of extracellular parasites syringe lysed in Endo buffer, which is better at mimicking intracellular conditions than any other described buffer.

      As the reviewer suggested, we measured the calcium flux of extracellular parasites in Endo buffer upon stimulation with either A23187 or BIPPO.

      We found that peak calcium response to BIPPO in Endo buffer was similar to that of intracellular parasites (~15 seconds post treatment) (see Supp Fig. 6A). Upon treatment with A23187, extracellular parasites in Endo buffer had a much faster response compared to their intracellular counterparts, with peak flux measured at ~25 seconds post treatment (see Supp Fig. 6B). This does indeed suggest that extracellular parasites in Endo buffer respond differently to A23187 compared to their intracellular counterparts. However, peak calcium response is still occurring within the experimental time course and is not being missed, as the reviewer worries. Moreover, since we are able to detect increased cAMP levels in A23187-treated parasites, Ca2+ flux appears sufficient to alter cNMP signalling.

      We did notice, however, that the intensity of the calcium flux was much weaker in Endo buffer compared to intracellular parasites (see Supp Fig. 6B). We found that this was due to the lack of host-derived Ca2+, since supplementation of Endo buffer with 1 µM CaCl2 restored the intensity of the calcium response to match that of intracellular parasites (see Supp Fig. 6C). We therefore decided to repeat our cGMP measurements, this time using extracellular parasites in Endo buffer supplemented with 1 µM CaCl2. However, we found no differences in cGMP levels in the response to ionophore under these conditions (now Supp Fig. 6D) compared to the previous experiments, so the conclusions from the previous data do not change.

      As for the lipidomics experiments, we chose to use DMEM so that our dataset could be compared with other published lipidomic datasets (Katris et al., 2020; Dass et al., 2021) where DMEM was also used as a buffer when measuring global lipid profiles of parasites.

      We now acknowledge in the paper that Endo buffer has its shortcomings, and that this could be the reason why we do not detect changes in cGMP concentrations. We do, however, believe that Endo buffer is the best alternative to intracellular parasites, a choice supported by its consistent use in numerous publications studying Toxoplasma signalling (McCoy et al., 2012; Stewart et al., 2017).

      Additional information is required to support the claim that PDE2 has a moderate egress defect (lines 681-687). T. gondii egress is MOI-dependent (PMID: 29030485). Although the parasite strains were used at the same MOI, there is no guarantee that the parasites successfully invaded and replicated. If parasites lacking PDE2 are defective in invasion or replication, the MOI is effectively decreased, which could explain the egress delay. Could the authors compare the MOIs (number of vacuoles per host cell nuclei) of the vehicle and RAP-treated parasites at t = 0 treatment duration to give the reader a sense of whether the MOIs are comparable?

      Response:

      Since PDE2 KO parasites have a substantial growth defect, we did notice that starting MOIs were consistently lower for the RAP-treated samples compared to the DMSO-treated samples. However, this was also the case for PDE1 KO parasites where we did not see an egress delay. We also found that the egress delay was still evident for ∆CDPK3 parasites, despite having higher starting MOIs than WT parasites in our experiments. Therefore there does not appear to be a link between starting MOIs and the egress delay.

      To be sure of our results, we also performed egress assays where we co-infected HFFs with mCherry-expressing WT parasites (WT ∆UPRT) and either GFP-expressing PDE2 cKO parasites (treated with DMSO or RAP) or ∆CDPK3 parasites. This recapitulated our previous findings, confirming that the deletion of PDE2 leads to a delay in A23187-mediated egress.

      4. A few references are missing to ensure reproducibility.

      The manuscript states that the kinetic lipidomics experiments were performed with established methods, but the cited publication (line 497) is a preprint. These are therefore not peer reviewed and should be described in greater detail in this manuscript, including any relevant validation.

      Response:

      We thank the reviewer for pointing this out. We have included a more detailed description of the methods in the materials and methods section so that the experiment is reproducible, as per the reviewer’s suggestion. We decided to still mention the bioRxiv preprint since we thought it was appropriate for the reader to be informed of ongoing developments in the field.

      Please cite the release of the T. gondii proteomes used for spectrum matching (lines 972-973).

      Response:

      We have included this as per the reviewer’s suggestion.

      Please include the TMT labeling scheme so the analysis may be reproduced from the raw files.

      Response:

      We have included this as per the reviewer’s suggestion in Supp Fig. 3A.

      5. Statistical analyses should be reviewed as follows:

      Have the authors examined the possibility that some changes in phosphopeptide abundance reflect changes in protein abundance? This may be particularly relevant for comparisons involving the ∆cdpk3 strain. Did the authors collect paired unenriched proteomes from the experiments performed? Alternatively, there may be enriched peptides that did not change in abundance for many of the proteins that appear dynamically phosphorylated.

      Response:

      We did not collect unenriched proteomes from the experiments performed (although we did perform unenriched mixing checks to ensure equal loading between samples), and believe that this wasn’t a necessity for the following reasons:

      1. For within-line treatment analyses, treatment timings are so short (a maximum of 15-50s in the single timepoint experiment) that it would be unlikely to detect substantial changes in protein abundance. Moreover, these unlikely events would affect all phosphosites across a protein, and therefore be detectable.

      2. In our CDPK3 dependency timecourse experiments, we normalise both the WT and ∆CDPK3 strain to 0s, and measure signalling progression over time. Therefore, any differences at timepoints other than 0s do not originate from basal differences. We also see a consistent increase/decrease in phosphosite detection across the sub-minute timecourse, further confirming that the observed changes are truly down to dynamic changes in phosphorylation and not protein levels.

      3. In the single timepoint CDPK3 dependency analyses (44 regulated sites identified, Data S2), we acknowledge that there could be some risk of altered starting protein abundance between lines. However, if protein abundance were responsible for the changes in phosphosite detection, we would expect all phosphosites across the protein to shift, and we do not observe this. Moreover, when we look at these CDPK3 dependent proteins and compare their phosphosite abundance in untreated WT and ∆CDPK3 lines, we find that for each protein, either all or the majority of phosphosites detected are unchanged (highlighting that there is no substantial difference in this protein’s abundance between lines). Where there are phosphosite differences between lines, these are only ever on single sites on a protein while most other sites are unchanged, implying that these are changes to basal phosphorylation states and not protein levels.
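
      The normalisation to 0s described above can be illustrated with a minimal sketch. This is not our actual analysis pipeline: it assumes log2 ratios relative to the 0s timepoint, and the function name and intensity values are invented purely for illustration.

```python
import math

def normalise_to_zero(intensities):
    """Return log2 fold changes relative to the 0 s timepoint.

    `intensities` maps timepoint (seconds) -> reporter intensity for one
    phosphosite in one strain. Hypothetical helper, not the published pipeline.
    """
    baseline = intensities[0]
    return {t: math.log2(v / baseline) for t, v in sorted(intensities.items())}

# Invented values: the knockout line starts from a 2-fold higher basal level
# but responds to treatment with the same relative kinetics as WT.
wt = {0: 100.0, 15: 200.0, 30: 400.0, 60: 400.0}
ko = {0: 200.0, 15: 400.0, 30: 800.0, 60: 800.0}

wt_norm = normalise_to_zero(wt)
ko_norm = normalise_to_zero(ko)

# After normalisation the basal offset is gone and the two profiles coincide.
print(wt_norm == ko_norm)
```

      In this toy case the two-fold basal offset between lines disappears after normalisation, so any remaining divergence between strains at later timepoints would reflect a difference in signalling progression rather than in starting abundance.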

      It seems like for Figs. 3B and S5 the maximum number of clusters modeled was selected. Could the authors provide a rationale for the number of clusters selected, since it appears many of the clusters have similar profiles.

      Response:

      The number of clusters is chosen automatically by the Mclust algorithm as the value that maximizes the Bayesian Information Criterion (BIC). BIC in effect balances gains in model fit (increasing log-likelihood) against increasing the number of parameters (i.e. number of clusters).
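
      As a minimal sketch of this selection rule (not a re-implementation of Mclust), the snippet below uses the mclust sign convention, BIC = 2 log L - k log n, in which larger values are better. The log-likelihoods, parameter counts and function names are invented for illustration.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """BIC in the mclust sign convention, where larger is better."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

def select_n_clusters(fits, n_obs):
    """Pick the cluster count whose fitted model has the highest BIC.

    `fits` maps number of clusters -> (log-likelihood, free parameters).
    """
    scores = {g: bic(ll, k, n_obs) for g, (ll, k) in fits.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Invented fit summaries for 500 observations: the fit improves sharply up to
# 3 clusters, after which the extra parameters outweigh the gain.
fits = {1: (-1200.0, 2), 2: (-1050.0, 5), 3: (-1030.0, 8), 4: (-1028.0, 11)}
best_g, scores = select_n_clusters(fits, n_obs=500)
print(best_g)
```

      With these invented numbers the fourth cluster raises the log-likelihood only slightly, so the parameter penalty dominates and the three-cluster model is retained.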

      Please include figure panel(s) relating to gene ontology. Relevant information for readers to make conclusions includes p-value, fold-enrichment or gene ratio, and some sort of metric of the frequency of the GO term in the surveyed data set. See PMID: 33053376 Fig. 7 and PMID: 29724925 Fig. 6 for examples of enrichment summaries. Additionally, in the methods, specify (i) the background set, (ii) the method used for multiple test correction, (iii) the criteria constituting "enrichment", (iv) how the T. gondii genome was integrated into the analysis, (v) the class of GO terms (molecular function, biological process, or cellular component), (vi) any additional information required to reproduce the results (for example, settings modified from default).

      Response:

      We have included the additional information requested in the materials and methods.

      We purposely did not include GO figure panels as our analyses are being done across many clusters, making it very difficult to display this information cohesively. We have included all data in Tables S2-S5. These tables include all the relevant information on p-value, enrichment status, ratio in study/ratio in population, class of GO terms etc.
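
      For readers unfamiliar with the quantities in these tables, the sketch below shows how a ratio in study, a ratio in population and a p-value relate for a single GO term. It assumes a one-sided hypergeometric test, which is a common choice but is an assumption here, and it omits multiple test correction. All counts and names are invented.

```python
from math import comb

def enrichment(study_hits, study_size, pop_hits, pop_size):
    """Fold enrichment and one-sided hypergeometric p-value for one GO term."""
    ratio_study = study_hits / study_size
    ratio_pop = pop_hits / pop_size
    fold = ratio_study / ratio_pop
    # P(X >= study_hits) when drawing study_size proteins without replacement
    # from a population in which pop_hits proteins carry the term.
    upper = min(study_size, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return fold, tail / comb(pop_size, study_size)

# Invented counts: 8 of 40 proteins in a cluster carry the term,
# against 50 of 2000 proteins in the background set.
fold, p_value = enrichment(study_hits=8, study_size=40, pop_hits=50, pop_size=2000)
print(round(fold, 2), p_value)
```

      Here the term is eight times more frequent in the cluster than in the background, and the tail probability is the chance of seeing at least that many annotated proteins in a random draw of the same size.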

      The presentation of the lipidomics experiments in Figure 4A-C is confusing. First, the ∆cdpk3/WT ratio removes information about the process in WT parasites, and it's unclear why the scale centers on 100 and not 1. Second, the data in Figure S6 suggests a more modest effect than that represented in Fig. 4; is this due to day to day variability? How do the authors justify pairing WT and mutant samples as they did to generate the ratios?

      Response:

      This is a common strategy used by many metabolomics experts (Bailey et al., 2015; Dass et al., 2021; Lunghi et al., 2022). We had originally chosen to represent the data as a ratio since this form of representation helps get rid of the variability that arises between experiments and allows us to see very clear patterns which would otherwise go unnoticed. This variability arises from the amount of lipids in each sample which varies between parasites in a dish, the batch of FBS and DMEM used, and the solutions and even room temperature used to extract lipids on a given day.

      However, we agree with the reviewer that depicting the data in Figure 4A-C as a ratio of ∆CDPK3/WT parasites can be confusing, so we have now changed the graphs, plotting WT and ∆CDPK3 levels instead, and have moved the ratio of ∆CDPK3/WT to Supplementary Figure 5.

      The significance test seems to be performed on the difference between the WT and ∆cdpk3 strains, but not relative to the DMSO treatment? Wouldn't you want to perform a repeated measures ANOVA to determine (i) if lipid levels change over time and (ii) if this trend differs in WT vs. mutant strain?

      Response:

      The reviewer correctly points out that ANOVA is often used for time courses, but we must point out that it is not always strictly appropriate since it can overlook the purpose of the individual experiment design, which in this case is 1) to investigate the role of CDPK3 compared to the WT parental strain, and 2) specifically to find the exact point at which DAG begins to change after stimulus, to match the proteomics time course.

      Our data is clearly biased towards earlier time points: we have 0, 5, 10, 30 and 45 seconds, where DAG levels are mostly unchanged, compared to the single 60-second timepoint, which shows a significant difference in DAG using our method of statistical comparison by paired two-tailed t-test. It would therefore be unwise to use ANOVA when we really want to see when the A23187 stimulus takes effect, which appears to be after the 45 second mark. Analysing the data by ANOVA would likely provide a false negative result, where the result is non-significant but there is clearly more DAG in WT than ∆CDPK3 after 60 seconds. T-tests are commonly used when comparing the same cell lines grown in the same conditions with a test/treatment, and in this case the test/treatment is CDPK3 present or absent (Lentini et al., 2020).
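
      To make the per-timepoint comparison concrete, the following sketch computes a paired t statistic for matched WT and ∆CDPK3 replicates at a single timepoint. The mol% values are invented and are not our data; the p-value lookup is left out because it needs a t-distribution table or a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean paired difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Invented DAG mol% values for four paired replicates at one timepoint.
wt_dag = [5.0, 6.0, 7.0, 8.0]
ko_dag = [4.0, 4.0, 6.0, 6.0]
t_stat, dof = paired_t_statistic(wt_dag, ko_dag)
print(round(t_stat, 3), dof)
```

      Pairing uses the within-experiment difference for each replicate, so day-to-day variation in absolute lipid amounts shared by both lines cancels out. For 3 degrees of freedom the two-tailed 5% critical value is 3.182, so a statistic above that threshold in absolute value would be called significant.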

      In the main text, it would be preferable to see the data presented as the proteomics experiments were in Figure 4B and 4C, with fold changes relative to the DMSO (t = 0) treatment, separately for WT and ∆cdpk3 parasites.

      Response:

      We have now changed the way that we represent the data, plotting %mol instead of the ratio.

      Signaling lipids constitute small percentages of the overall pool (e.g. PMID: 26962945), so one might not necessarily expect to observe large changes in lipid abundance when signaling pathways are modulated. Is there any positive control that the authors could include to give readers a sense of the dynamic range? Maybe the DGK1 mutant (PMID: 26962945)?

      Response:

      DGK1 is perhaps not a good example because the DGK1 KO parasites effectively “melt” from a lack of plasma membrane integrity (Bullen et al., 2016), so this would likely be technically challenging. We don’t see the added value in including an additional mutant control since we can already see the dynamic change over time from no difference (0 seconds) to a significant difference (60 seconds) between WT and ∆CDPK3 for DAG and most other lipids. Moreover, the sub-minute timecourses clearly show the presence or absence of changes at each stage: when the A23187 is added (0-5 seconds), when the parasites acclimatise and the A23187 takes effect (10-30 seconds), and when the parasite lipid response becomes visible by lipidomics (45-60+ seconds).

      Figure 4E: are the differences in [cAMP] with DMSO treatment and A23187 treatment different at any of the timepoints in the WT strain? The comparison seems to be WT/∆cdpk3 at each timepoint. Does the text (lines 562-568) need to be modified accordingly?

      Response:

      In WT (and ∆CDPK3) parasites, [cAMP] is significantly changed at 5s of A23187 treatment (relative to DMSO). We have modified our figures to include this analysis. The existing text accurately reflects this.

      Figure 6I: is the difference between PDE2 cKO/∆cdpk3 + DMSO or RAP significant?

      Response:

      In our original manuscript, there was no statistical difference in [cAMP] between PDE2cKO/∆CDPK3+DMSO and PDE2cKO/∆CDPK3+DMSO+RAP, likely due to the variation between biological replicates. To overcome the issues in variability between replicates, we have now included more biological replicates (n=7). This has led to a significant difference in [cAMP] between PDE2cKO/∆CDPK3 DMSO- and RAP-treated parasites and between PDE2cKO DMSO- and RAP-treated parasites (now Fig. 6I).

      **MINOR COMMENTS**

      1. The following references should be added or amended:

      Lines 83-85: in the cited publication, relative phosphopeptide abundances of an overexpressed dominant-negative, constitutively inactive PKA mutant were compared to an overexpressed wild-type mutant. In this experimental setup, one would hypothesize that targets of PKA should be down-regulated (inactive/WT ratios). However, the mentioned phosphopeptide of PDE2 was found to be up-regulated, suggesting that it is not a direct target of PKA.

      Response:

      We thank the reviewer for spotting this error; we have now modified our wording.

      Cite TGGT1_305050, referenced as calmodulin in line 458, as TgELC2 (PMID: 26374117).

      Response:

      We have included this as per the reviewer’s suggestion.

      Cite TGGT1_295850 as apical annuli protein 2 (AAP2, PMID: 31470470).

      Response:

      We have included this as per the reviewer’s suggestion.

      Cite TGGT1_270865 (adenylyl cyclase beta, Acβ) as PMID: 29030485, 30449726.

      Response:

      We have included this as per the reviewer’s suggestion.

      Cite TGGT1_254370 (guanylyl cyclase, GC) as PMID: 30449726, 30742070.

      Response:

      We have included this as per the reviewer’s suggestion.

      Note that Lourido, Tang and David Sibley, 2012 observed that treatment with zaprinast (a PDE inhibitor) could overcome CDPK3 inhibition. The target(s) of zaprinast have not been determined and may differ from those of BIPPO (in identity and IC50). The cited study also used modified CDPK3 and CDPK1 alleles, rather than ∆cdpk3 and intact cdpk1 as used in this manuscript. That is to say, the signaling backgrounds of the parasite strains deviate in ways that are not controlled.

      Response:

      While it is true that zaprinast targets have not been unequivocally identified, zaprinast-induced egress is widely thought to be the result of PKG activation, a conclusion that is further supported by the finding that Compound 1 completely blocks zaprinast-induced egress (Lourido, Tang and David Sibley, 2012). Similarly, BIPPO-induced egress is inhibited by chemical inhibition of PKG by Compound 1 and Compound 2 (Jia et al., 2017). Moreover, like zaprinast, BIPPO has been clearly shown to partially overcome the ∆CDPK3 egress delay (Stewart et al., 2017).

      2. The following comments refer to the figures and legends:

      Part of the legend text for 1G is included under 1H.

      Response:

      This has been corrected.

      Figure 1H: The legend mentions that some dots are blue, but they appear green. Please ensure that color choices conform to journal accessibility guidelines. See the following article about visualization for colorblind readers: https://www.ascb.org/science-news/how-to-make-scientific-figures-accessible-to-readers-with-color-blindness/ . Avoid using red and green false-colored images; replace red with a magenta lookup table. Multi-colored images are only helpful for the merged image; otherwise, we discern grayscale better. Applies to Figures 1B, 5C, 6D. (Aside: anti-CAP seems an odd choice of counterstain; the variation in the staining, esp. at the apical cap, is distracting.)

      Response:

      We thank reviewer #1 for bringing this to our attention, and have modified our colour usage for all IFAs and Figures 1H and 3E.

      We chose CAP staining as the antibody is available in the laboratory and stains both the apical end (which has been shown to contain several proteins important for signalling as well as PDE9) and the parasite periphery, the location of CDPK3.

      Figure 1B: When showing a single fluorophore, please use grayscale and include an intensity scale bar, since relative values are being compared.

      Response:

      We have modified this as per the reviewer’s suggestion.

      Figure 1C: it is difficult to compare the kinetics of the calcium response when the curves are plotted separately. Since the scales are the same, could the two treatments be plotted on the same axes, with different colors? Additionally, according to the legend, a red line seems to be missing in this panel.

      Response:

      Fig1C is not intended to compare kinetics, merely to show peak calcium release in each separate treatment condition. We have removed mention of a red line in the figure legend.

      Figure 2A: Either Figure S4 can be moved to accompany Figure 2A, or Figure 2A could be moved to the supplemental.

      Response:

      Figure S4 has now been incorporated into Figure 2.

      Reviewer #1 (Significance (Required)):

      This manuscript would interest researchers studying signaling pathways in protozoan parasites, especially apicomplexans, as CDPK3 and PKG orthologs exist across the phylum. To my knowledge, it is the first study that has proposed a mechanism by which a calcium effector regulates cAMP levels in T. gondii. Unfortunately, the experiments fall short of testing this mechanism.

      Response:

      We thank reviewer #1 for their comments, but disagree with their assessment that the key points of the manuscript “fall short of experimental testing”.

      1. We demonstrate that, following both BIPPO and A23187 treatment, there is differential phosphorylation of numerous components traditionally believed to sit upstream of PKG activation (as well as several components within the PKG signalling pathway itself).
      2. We show that some of these sites are CDPK3 dependent, and that deletion of CDPK3 leads to changes in lipid signalling and an elevation in levels of cAMP (dysregulation of which is known to alter PKG signalling).
      3. We show that pre-treatment with a PKA inhibitor is able to largely rescue this phenotype.
      4. We demonstrate that a cAMP-specific PDE is phosphorylated following A23187 treatment (i.e. Ca2+ flux)
      5. We show that this cAMP specific PDE plays a role in A23187-mediated egress.
      6. While the latter PDE may not be directly regulated by CDPK3, these findings suggest that there are likely several Ca2+-dependent kinases that contribute to this feedback loop.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      In this manuscript, Dominicus et al investigate the elusive role of calcium-dependent kinase 3 during the egress of Toxoplasma gondii. Multiple functions have already been proposed for this kinase by this group including the regulation of basal calcium levels (24945436) or of a tyrosine transporter (30402958). However, one of the most puzzling phenotypes of CDPK3 deficient tachyzoites is a marked delay in egress when parasites are stimulated with a calcium ionophore that is rescued with phosphodiesterase (PDE) inhibitors. Crosstalk between cAMP, cGMP, lipid and calcium signalling has been previously described to be important in regulating egress (26933036, 23149386, 29030485) but the role of CDPK3 in Toxoplasma is still poorly understood.

      Here the authors first take an elegant phosphoproteomic approach to identify pathways differentially regulated upon treatment with either a PDE inhibitor (BIPPO) or a calcium ionophore (A23187) in WT and CDPK3-KO parasites. Not much difference is observed between BIPPO and A23187 stimulation, which is interpreted by the authors as regulation through a feedback loop.

      The authors then investigate the effect of CDPK3 deletion on lipid, cGMP and cAMP levels. They identify major changes in DAG, phospholipid, FFA and TAG levels as well as differences in cAMP levels, but not in cGMP levels. Chemical inhibition of PKA leads to a similar egress timing in CDPK3-KO and WT parasites upon A23187 stimulation.

      As four PDEs appeared differentially regulated in the CDPK3-KO line upon A23187, the authors investigate the requirement of the 4 PDEs in cAMP levels. They show diverse localisation of the PDEs with specificities of PDE1, 7 and 9 for cGMP and of PDE2 for cAMP. They further show that PDE1, 7 and 9 are sensitive to BIPPO. Finally, using a conditional deletion system, they show that PDE1 and 2 are important for the lytic cycle of Toxoplasma and that PDE2 shows a slightly delayed egress following A23187 stimulation.

      **Major comments:**

      -Are the key conclusions convincing?

      The title is supported by the findings presented in this study. However, I am not sure I understand why the authors imply a positive feedback loop. This should be clarified in the discussion of the results.

      Response:

      We believe in a positive feedback loop as, upon A23187 treatment (resulting in a calcium flux), ΔCDPK3 parasites are able to egress, albeit in a delayed manner. This egress delay is substantially, but not completely, alleviated upon treatment with BIPPO (a PDE inhibitor known to activate the PKG signalling pathway). In conjunction with our phosphoproteomic data (where we see phosphorylation of numerous pathway components upstream of PKG upon BIPPO and A23187 treatment - both in a CDPK3 dependent and independent manner), these observations suggest that calcium-regulated proteins (CDPK3 among them) feed into the PKG pathway. As deletion of CDPK3 delays egress, it is reasonable to postulate that this feedback is one that amplifies egress signalling (i.e. is positive).

      The phosphoproteome analysis seems very strong and will be of interest for many groups working on egress. However, the key conclusion, i.e. that a substrate overlap between PKG and CDPK3 is unlikely to explain the CDPK3 phenotype, seems premature to me in the absence of robustly identified substrates for both kinases.

      Response:

      We certainly do not fully exclude the possibility of a substrate overlap but do lean more heavily towards a feedback loop given (a) the inability to clearly detect treatment-specific signalling profiles and (b) the phospho targets observed in the A23187 and BIPPO phosphoproteomes. We have further clarified our reasoning, and overall tempered our language in the manuscript as per the reviewer’s suggestion.

      I am not sure there is a clear key conclusion from the lipidomic analysis and how it is used by the authors to build their model up. Major changes are observed but how could this be linked with CDPK3, particularly if cGMP levels are not affected?

      Response:

      Our phosphoproteomic analyses identify several CDPK3-dependent phospho sites on phospholipid signalling components (DGK1 & PI-PLC), suggesting that there is indeed altered signalling downstream of PKG. To test whether these lead to a measurable phenotype, we performed the lipidomics analysis. We did not pursue this arm of the signalling pathway any further as we postulated that the changes in the lipid signalling pathway were less likely to play a role in the feedback loop. Nevertheless, we felt that it was worthwhile to include these findings in our manuscript as they support the conclusions drawn from the phosphoproteomics - namely that lipid signalling is perturbed in CDPK3 mutants. We, or others, may follow up on this in future.

      We agree with the reviewer that it is surprising that cGMP levels remain unchanged in our experiments when we treat with A23187. Given the measurable difference in cAMP levels between WT and ΔCDPK3 parasites, we postulate that CDPK3 directly or indirectly downregulates levels of cAMP. This would, in turn, alter activity of the cAMP-dependent protein kinase PKAc. Jia et al. (2017) have shown a clear dependency on PKG for parasites to egress upon PKAc depletion, but were also unable to reliably demonstrate cGMP accumulation in intracellular parasites. Similarly, their hypothesis that dysregulated cGMP-specific PDE activity results in altered cGMP levels has not been proven (the PDE hypothesised to be involved has since been shown to be cAMP-specific).

      While it is possible that our collective inability to observe elevated cGMP levels is explained by the sensitivity limits of the assay, it is similarly possible that cAMP-mediated signalling is exerting its effects on the PKG signalling pathway in a cGMP-independent manner.

      The evidence that CDPK3 is involved in cAMP homeostasis seems strong. However, the analysis of PKA inhibition is a bit less clear. The way the data is presented makes it difficult to see whether the treatment is accelerating egress of CDPK3-KO parasites or affecting both WT and CDPK3-KO lines, including both the speed and extent of egress. This is important for the interpretation of the experiment.

      Response:

      Fig. 4F shows that there is a significant amount of premature egress in both WT and ∆CDPK3 parasites following 2 hrs of H89 pre-treatment (consistent with previous reports that downregulation of cAMP signalling stimulates premature egress). When we subsequently investigated A23187-induced egress rates of the remaining intracellular H89 pre-treated parasites (Fig. 4Gi-ii) we found that the ∆CDPK3 egress delay was largely rescued. We have moved Fig. 4F to the supplement (now Supp Fig. 5E) in order to avoid confusion between the distinct analyses shown in 4F (pre-treatment analyses) and 4G (egress experiment). These experiments provided a hint that cAMP signalling is affected, which we then validate by measuring elevated cAMP levels in CDPK3 mutant parasites.

      The biochemical characterisation of the four PDEs is interesting and seems well performed. However, PDE1 was previously shown to hydrolyse both cAMP and cGMP (https://doi.org/10.1101/2021.09.21.461320) which raises some questions about the experimental set up. Could the authors possibly discuss why they do not observe similar selectivity? Could other PDEs in the immunoprecipitate mask PDE activity? In line with this question, it is not clear what % of "hydrolytic activity (%)" means and how it was calculated.

      The experiments describing the selectivity of BIPPO for PDE1, 7 and 9 as well as the biological requirement of the four tested PDEs are convincing.

      Response:

      We believe that the disagreement between our findings and those published by Moss and colleagues is due to the differences in experimental conditions. We performed our assays at room temperature for 1 hour with higher starting cAMP concentrations (1 uM) compared to them. They performed their assays at 37ºC for 2 hours with 10-fold lower starting cAMP concentrations (0.1 uM). We have now repeated this set of experiments using the Moss et al. conditions, and find that PDEs 1, 7 and 9 can be dual specific, while PDE2 is cAMP-specific, thereby recapitulating their findings (now included in the revised manuscript under Supp Fig. 7B). However, we also now performed a timecourse PDE assay using our original conditions and show that the cAMP hydrolytic activity for PDE1 can only be detected following 4 hours of incubation, compared to cGMP activity that can be detected as early as 30 minutes, suggesting that it possesses predominantly cGMP activity (see Supp Fig. 7C). We therefore believe that our experimental setup is more stringent, because if one starts with a lower level of substrate and incubates for longer and at a higher temperature, even minor dual activity could make a substantial difference in cAMP levels. Our data suggests that the cAMP hydrolytic activity of PDEs 1, 7 and 9 is substantially lower than the cGMP hydrolytic activity that they display.

      We have also included a clear description of how % hydrolytic activity was calculated in the methods section.

      -Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The claim that CDPK3 affects cAMP levels seems strong; however, the exact links between CDPK3 activity, lipid, cGMP and cAMP signalling remain unclear and it may be important to clearly state this.

      Response:

      We have modified our wording in the text to more clearly describe our current hypothesis and reasoning.

      -Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      I think that the manuscript contains a significant amount of experiments that are of interest to scientists working on Toxoplasma egress. Requesting experiments to identify the functional link between above-mentioned pathways would be out of the scope for this work although it would considerably increase the impact of this manuscript. For example, would it be possible to test whether the CDPK3-KO line is more or less sensitive to PKG specific inhibition upon A23187 induction?

      -Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The above-mentioned experiment is not trivial as no specific inhibitors of PKG are available. Ensuring specificity of the investigated phenotype would require the generation of a resistant line, which would require significant work.

      Response: We agree that this would be an interesting experiment to further substantiate our findings. As indicated by the reviewer, however, the lack of specific inhibitors of PKG means a resistant line would likely be required to ensure specificity.

      -Are the data and the methods presented in such a way that they can be reproduced?

      It is not clear how the % of hydrolytic activity of the PDE has been calculated.

      Response: We have included a clearer description of how % hydrolytic activity was calculated in the methods section.

      -Are the experiments adequately replicated and statistical analysis adequate?

      This seems to be performed to high standards.

      **Minor comments:**

      -Specific experimental issues that are easily addressable.

      I do not have any comments related to minor experimental issues.

      -Are prior studies referenced appropriately?

      Most of the studies relevant for this work are cited. It is however not clear to me why some important players of the "PKG pathway" are not indicated in Fig 1H and Fig 3E, including for example UGO or SPARK.

      Response:

      We have modified Fig 1H and 3E to include all key players involved in the PKG pathway.

      -Are the text and figures clear and accurate?

      While all the data shown here is impressive and well analysed, I find it difficult to read the manuscript and establish links between sections of the papers. The phosphoproteome analysis is interesting and is used to orientate the reader towards a feedback mechanism rather than a substrate overlap. But why do the authors later focus on PDEs and not on AC or CNBD, as in the end, if I understand well, there is no evidence showing a link between CDPK3-dependent phosphorylation and PDE activity upon A23187 stimulation?

      Response:

      We thank reviewer #2 for their constructive feedback on the flow of the manuscript.

      Our key findings from the phosphoproteomics study were that 1) BIPPO and A23187 treatment trigger near-identical signalling pathways, 2) both A23187 and BIPPO treatment lead to phosphorylation of numerous components both upstream and downstream of PKG signalling (hinting at the presence of a Ca2+-regulated feedback loop), and 3) several of the abovementioned components are phosphorylated in a CDPK3-dependent manner.

      While several avenues of study could have been pursued from this point onwards, we chose to focus on the feedback loop in a broader sense as its existence has important implications for our general understanding of the signalling pathways that govern egress.

      We reasoned that, given the differential phosphorylation of 4 PDEs following A23187 and BIPPO treatment (none of which had been studied in detail previously), it was relevant to study these in greater detail.

      Coupled with the A23187 egress assay on PDE2 knockout parasites, our findings suggest that PDE2 plays a role in the abovementioned Ca2+ signalling loop. While PDE2 may not exert its effects in a CDPK3-dependent manner (and CDPK3 may, therefore, alter cAMP levels in a different fashion), this does not detract from the important finding that PDE2 is one of the (likely numerous) components that is regulated in a Ca2+-dependent feedback loop to facilitate rapid egress.

      We have modified our wording to better reflect our rationale for studying the PDEs irrespective of their CDPK3 phosphorylation status.

      While we feel that our reasoning for studying the PDEs is solid, we do appreciate that further clarification of the putative CDPK3-adenylate cyclase link would elevate the manuscript substantially. However, given the data indicating that ACβ is not the sole regulator of egress, this is likely a non-trivial task that would require substantial work.

      It is also unclear how the authors link CDPK3-dependent elevated cAMP levels with the elevated basal calcium levels they previously described. This is particularly difficult to reconcile particularly in a PKG independent manner.

      Response:

      We previously postulated that elevated Ca2+ levels allowed ΔCDPK3 mutants to overcome a complete egress defect, potentially by activating other CDPKs (e.g. CDPK1). It is similarly plausible that elevated Ca2+ levels in ΔCDPK3 parasites may lead to elevated cAMP levels in order to prevent premature egress.

      As noted in our previous responses, we acknowledge that our inability to detect cGMP is surprising. However, given the clarity of our cAMP findings, and the phosphoproteomic evidence to suggest that various components in the PKG signalling pathway are affected, we postulate that we are either unable to reliably detect cGMP due to sensitivity issues, or that cAMP is exerting its regulation on the PKG pathway in a cGMP-independent manner. As noted previously, while the link between cAMP and PKG signalling has been demonstrated by Jia et al., it is not entirely clear how this is mediated.

      The presentation of the lipidomic analysis is also not really clear to me. Why do the authors show the global changes in phospholipids and not a more detailed analysis?

      Response:

      We performed a detailed phospholipid profile of WT and ∆CDPK3 parasites under normal culture conditions. However, due to the sheer quantity of parasites required for this detailed analysis, we were unable to measure individual phospholipid species in our A23187 timecourse. We therefore opted to measure global changes following A23187 stimulation.

      As the authors focus on the PI-PLC pathway, could they detail the dynamics of phosphoinositides? I understand that lipid levels are affected in the mutant but I am not sure to understand how the authors interpret these massive changes in relationship with the function of CDPK3 and the observed phenotypes.

      Response:

      Our phosphoproteomic analyses identified several CDPK3-dependent phospho sites on phospholipid signalling components (DGK1 & PI-PLC), suggesting that (in keeping with all of our other data), there is altered signalling downstream of PKG. To test whether these changes lead to a measurable phenotype, we performed the lipidomics analysis. Following stimulation with A23187, we found a delayed production of DAG in ∆CDPK3 parasites compared to WT parasites. Since DAG is required for the production of PA, which in turn is required for microneme secretion, our finding can explain why microneme secretion is delayed in ∆CDPK3 parasites, as previously reported (Lourido, Tang and David Sibley, 2012; McCoy et al., 2012).

      We did not follow this arm of the signalling pathway any further as we postulated that the changes in the lipid signalling pathway were less likely to play a role in the feedback loop. Nevertheless, we felt that it was worthwhile to include these findings in our manuscript as they support the conclusions drawn from the phosphoproteomics - namely that lipid signalling is perturbed in CDPK3 mutants. We, or others, may follow up on this in future.

      Finally, the characterisation of the PDEs is an impressive piece of work but the functional link with CDPK3 is relatively unclear. It would also be important to clearly discuss the differences with previous results presented in this preprint: https://doi.org/10.1101/2021.09.21.461320.

      My understanding is while the authors aim at investigating the role of CDPK3 in A23187 induced egress, the main finding related to CDPK3 is a defect in cAMP homeostasis that is not linked to A23187. Similarly, the requirements of PDE2 in cAMP homeostasis and egress is indirectly linked to CDPK3. Altogether I think that important results are presented here but divided into three main and distinct sections: the phosphoproteomic survey, the lipidomic and cAMP level investigation, and the characterisation of the four PDEs. However, the link between each section is relatively weak and the way the results are presented is somehow misleading or confusing.

      Response:

      As mentioned in a previous response, we chose to study PDEs in greater detail because of our observation that both A23187 and BIPPO treatments lead to their phosphorylation (hinting at the presence of a Ca2+-regulated feedback loop). We were particularly intrigued to study the cAMP-specific PDE, as our data from CDPK3 KO parasites suggested that cAMP may play a role in the Ca2+ feedback mechanism. As PDE2 may not be directly regulated by CDPK3, Ca2+ appears to exert its feedback effects in numerous ways. We have modified our wording to better reflect our rationale for studying the PDEs irrespective of their CDPK3 phosphorylation status.

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This is a very long manuscript written for specialists of this signalling pathway and I would suggest the authors to emphasise more the important results and also clearly state where links are still missing. This is obviously a complex pathway and one cannot elucidate it easily in a single manuscript.

      Response:

      We have included an additional summary in our conclusions to better illustrate our findings and clarify any missing links.

      Reviewer #2 (Significance (Required)):

      -Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This is a technically remarkable paper using a broad range of analyses performed to a high standard.

      -Place the work in the context of the existing literature (provide references, where appropriate).

      The cross-talk between cAMP, cGMP and calcium signalling is well described in Toxoplasma and related parasites. Here the authors show that, in Toxoplasma, CDPK3 is part of this complex signalling network. One of the most important finding within this context is the role of CDPK3 in cAMP homeostasis. With this in mind, I would change the last sentence of the abstract to "In summary we uncover a feedback loop that enhances signalling during egress and links CDPK3 with several signalling pathways together."

      Response:

      In light of feedback received from several reviewers, we have made our wording less CDPK3 centric - as our findings relate in part to CDPK3 and, in a broader sense, to a Ca2+ driven feedback loop.

      The genetic and biochemical analyses of the four PDEs are remarkable and highlight consistencies and inconsistencies with recently published work that would be important to discuss and will be of interest for the field.

      Response: We thank reviewer #2 and agree that the PDE findings are of significant importance to the field.

      While I understand the studied signalling pathway is complex, I think it would be important to better describe the current model of the authors. In the discussion, the authors indicate that "the published data is not currently supported by a model that fits most experimental results." I would suggest to clarify this statement and discuss whether their work helps to reunite, correct or improve previous models.

      Response: We have expanded on the abovementioned statement to clarify that the presence of a feedback loop is a major pillar of knowledge required for the complete interpretation of existing signalling data.

      Could the authors also speculate about a potential role of PDE/CDPK3 in host cell invasion as cAMP signalling has been shown to be important for this process (30208022 and 29030485)?

      Response: Existing literature (Jia et al., 2017) suggests that perturbations to cAMP signalling play a very minor role in invasion, since parasites in which either ACα or ACβ is deleted show no impairment in invasion levels. We currently do not have substantial data on invasion, and are not sure that pursuing this is valuable given the minor phenotypes observed in other studies.

      -State what audience might be interested in and influenced by the reported findings.

      This paper is of great interest to groups working on the regulation of egress in Toxoplasma gondii and other related apicomplexan pathogens.

      -Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am working on the cell biology of apicomplexan parasites.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Dominicus et al aimed to identify the intersecting components of calcium, cyclic nucleotides (cAMP, cGMP) and lipid signaling through phosphoproteomic, knockout and biochemical assays in an intracellular parasite, Toxoplasma gondii, particularly when its acutely-infectious tachyzoite stage exits the host cells. A series of experimental strategies were applied to identify potential substrates of calcium-dependent protein kinase 3 (CDPK3), which has previously been reported to control the tachyzoite egress. According to earlier studies (PMID: 23226109, 24945436, 5418062, 26544049, 30402958), CDPK3 regulated the parasite exit through multiple phosphorylation events. Here, authors identified differentially-regulated (DR) phosphorylation sites by comparing the parasite samples after treatment with a calcium ionophore (A23187) and a PDE inhibitor (BIPPO), both of which are known to induce artificial egress (induced egress as opposed to natural egress). When the ΔCDPK3 mutant was treated with A23187, its delayed egress phenotype did not change, whereas BIPPO restored the egress to the level of the parental (termed as WT) strain, probably by activating PKG.

      The gene ontology enrichment of the up-regulated clusters revealed many probable CDPK3-dependent DR sites involved in cyclic nucleotide signaling (PDE1, PDE2, PDE7, PDE9, guanylate and adenylate cyclases, cyclic nucleotide-binding protein or CNBP) as well as lipid signaling (PI-PLC, DGK1). Authors suggest lipid signaling as one of the factors altered in the CDPK3 mutant, albeit lipidomics (PC, PI, PS, PT, PA, PE, SM) showed no significant change in phospholipids. To reveal how the four PDEs indicated above contribute to the cAMP and cGMP-mediated egress, they examined their biological significance by knockout/knockdown and enzyme activity assays. Authors claim that PDE1,7,9 proteins are cGMP-specific while PDE2 is cAMP-specific, and BIPPO treatment can inhibit PDE1-cGMP and PDE7-cGMP, but not PDE9-cGMP. Given the complexity, the manuscript is well structured, and most experiments were carefully designed. Undoubtedly, there is a significant amount of work that underlies this manuscript; however, from a conceptual viewpoint, the manuscript does not offer significant advancement over the current knowledge without functional validation of phosphoproteomics data (see below). A large body of work preceding this manuscript has indicated the crosstalk of cAMP, cGMP, calcium and lipid signaling cascades. This work provides a further refinement of the existing model. In a methodical sense, the work uses established assays, some of which require revisiting to reach robust conclusions and avoid misinterpretation. The article is quite interesting from a throughput screening point of view, but it clearly lacks the appropriate endorsement of the hits. The authors accept that identifying the phosphorylation of a protein does not imply a functional role, which is a major drawback as there is no experimental support for any phosphorylation site of the protein identified through phosphoproteomics. In terms of the mechanism, it is not clear whether and how lipid turnover and cAMP-PKA signaling control the egress phenotype (lack of a validated model at the end of this study).

      Response:

      We thank reviewer #3 for their comments, but respectfully disagree with their assessment that the work presented does not advance current knowledge.

      1. We demonstrate that, following both BIPPO and A23187 treatment, there is differential phosphorylation of numerous components traditionally believed to sit upstream of PKG activation (as well as numerous components within the PKG signalling pathway itself). While it may have been inferred from previous studies that A23187 and BIPPO signalling intersect, this has never been unequivocally demonstrated, nor has a feedback loop ever been shown.

      2. We provide a novel A23187-driven phosphoproteome timecourse that further bolsters the model of a Ca2+-driven feedback loop.

      3. We show that deletion of CDPK3 leads to a delay in DAG production upon stimulation with A23187.

      4. We show that some of the abovementioned sites are CDPK3-dependent, and that deletion of CDPK3 leads to elevated levels of cAMP (dysregulation of which is known to alter PKG signalling).

      5. We show that pre-treatment with a PKA inhibitor is able to largely rescue this phenotype.

      6. We demonstrate that a cAMP-specific PDE is phosphorylated following A23187 treatment (i.e. Ca2+ flux).

      7. We show that this cAMP-specific PDE plays a role in egress.

      While the latter PDE may not be directly regulated by CDPK3, these findings suggest that there are likely several Ca2+-dependent kinases that contribute to this feedback loop.

      We also firmly disagree with the reviewer’s assertion that without phosphosite characterisation, we have no support for our model. Following treatment with A23187 (and BIPPO), we clearly show broad, systemic changes (both CDPK3 dependent and independent) across signalling pathways previously deemed to sit upstream of calcium flux. Given the vast number of proteins involved in these signalling pathways, and the multitude of differentially regulated phosphosites identified on each of them, it is highly likely that the signalling effects we observe are combinatorial. Accordingly, we believe that mutating individual sites on individual proteins would be a very costly endeavour which is unlikely to substantially advance our understanding of signalling during egress. Moreover, introducing multiple point mutations in a given protein to ablate phosphorylation may lead to protein misfolding and would therefore not be informative. One of the key aims of this study was to assess how egress signalling pathways are interconnected, and we believe we have been able to show strong support for a Ca2+-driven feedback mechanism in which both CDPK3 and PDE2 play a role through the regulation of cAMP.

      While we agree with the reviewer’s statement that a large body of work preceding this manuscript has indicated the crosstalk of cAMP, cGMP, calcium and lipid signalling cascades, a feedback loop has not previously been shown. We believe that this finding is absolutely central to facilitate the complete interpretation of existing signalling data. Furthermore, no previous studies have gone to this level of detail in either proteomics or lipidomics to analyse the calcium signal pathway in any apicomplexan parasite. We argue that the novelty in our manuscript is that it is a carefully orchestrated study that advances our understanding of the signalling network over time with subcellular precision. The kinetics of signalling is not well understood and we believe that our study is likely the first to include both proteomic and lipidomic analyses over a timecourse during the acute lytic cycle stage of the disease. In doing so, we found evidence for a feedback loop that controls the signalling network spatiotemporally, and we characterise elements of this feedback in the same study.

      **Major Comments:**

      Based on the findings reported here there is little doubt that BIPPO and A23187-induced signaling intersect with each other, as very much expected from previous studies. The authors selected the 50s and 15s post-treatment timing of A23187 and BIPPO, respectively for collecting phosphoproteomics samples. At these time points, which were shown to peak cytosolic Ca2+, parasites were still intracellular (Line #171). How did authors make sure to stimulate the entire signaling cascade adequately, particularly when parasites do not egress within the selected time window? There is significant variability between phosphosite intensities of replicates (Line #186), which may also be attributed to insufficient triggers for the egress across independent experiments. This work must be supported by in vitro egress assays with the chosen incubation periods of BIPPO and ionophore treatment (show the induced % egress of tachyzoites in the 50s and 15s).

      Response:

      1. We appreciate that the reviewer acknowledges that our data clearly show that BIPPO and A23187-induced signalling intersect. While this may have been expected from previous studies, it has not previously been shown, and is therefore valuable to the field. Specifically, the fact that A23187 treatment leads to phosphorylation of targets normally deemed to sit upstream of calcium release is entirely novel and adds a substantial layer of information to our understanding of how these signalling pathways work together.

      2. Treatments were purposely selected to align pathways to a point where calcium levels peak just prior to calcium reuptake. At these chosen timepoints, we clearly show that overall signalling correlation is very high. We know from our egress assays using identical treatment concentrations (Fig. 2C) that the stimulations used are sufficient to result in complete egress. We are simply comparing signalling pathways at points prior to egress.

      3. As mentioned in point 2, we show convincingly that the treatments used are sufficient to trigger complete egress. As detailed clearly in the text, we believe that these variations in intensities between replicates are due to slight differences in timing between experiments (this is inevitable given the very rapid progression of signalling, and the difficulty of replicating exact sub-minute treatment timings). We demonstrate that the reporter intensities associated with DR sites correlate well across replicates (Supp Fig. 3C), suggesting that despite some replicate variability, the overall trends across replicates are very much consistent. This allows us to confidently average scores to provide values that are representative of a site’s phosphorylation state at the timepoint of interest.

      4. The reviewer’s suggestion that we should demonstrate % egress at the 50s and 15s treatment timepoints is unnecessary: we state clearly in the text that parasites have not egressed at these timepoints. Our egress assays (Fig. 2C) further support this.

      The authors discuss that CDPK3 controls the cAMP level and PKA through activation of one or more yet-to-be-identified PDE(s). cAMP could probably also be regulated by an adenylate cyclase, ACbeta that was found to have CDPK3-dependent phosphorylation sites. If CDPK3 is indeed a regulator of cAMP through the activation of PDEs or ACbeta, it would be expected that the deletion of CDPK3 would perturb the cAMP level, resulting in dysregulation of PKAc1 subunit, which in turn would dysregulate cGMP-specific PDEs (PMID: 29030485) and thereby PKG. All these connections need to be explained more clearly with experimental support (what is positively and what is negatively regulated by CDPK3).

      Response:

      1. We do not firmly state that CDPK3 regulates cAMP by phosphorylation of a PDE - this is one of the possibilities addressed. We acknowledge the possibility that this could also be via the adenylate cyclase (see line 792).

      2. PMID: 29030485 clearly demonstrates a link between cAMP signalling and PKG signalling, but does not demonstrate how this is mediated. The authors postulate that a cGMP-specific PDE is dysregulated given their observation that PDE2 is differentially phosphorylated in a constitutively inactive PKA mutant; however, this was not validated experimentally. We and others (Moss et al., 2022), however, demonstrate that PDE2 is cAMP-specific. This suggests that the model built by PMID: 29030485 requires revisiting. We acknowledge clearly in the text that Jia et al. have shown a link between cAMP and PKG signalling, and hypothesise that CDPK3’s modulation of cAMP levels may affect this (this is in keeping with our phosphoproteomic data).

      Moreover, the egress defect is not due to a low influx of calcium in the cytosol because when the ionophore A23187 was added to the CDPK3 mutant, its phenotype was not recovered. Rather, the defect may be due to the low or null activity of PKG that would activate PI4K to generate IP3 and DAG. The latter would be used as a substrate by DGK to generate PA that is involved in the secretion of micronemes and Toxoplasma egress. In this context, authors should evaluate the role of CDPK3 in the secretion of micronemes that is directly related to the egress of the parasite.

      Response:

      1. We agree with the reviewer on their point about calcium influx, and have already acknowledged in the text that the feedback loop does not control release of Ca2+ from internal stores as disruption of CDPK3 does not lead to a delay in Ca2+

      2. We agree, and clearly address in the text, that the egress defect could be due to altered PKG/phospholipid pathway signalling.

      3. Lourido, Tang and David Sibley (2012) and McCoy et al. (2012) have both previously shown that microneme secretion is regulated by CDPK3. We therefore do not deem it necessary to repeat this experiment, but have made clearer mention of their findings in our writing.

      When the Δcdpk3 mutant with BIPPO treatment was evaluated, it was observed that the parasite recovered the egress phenotype. It is concluded that CDPK3 could probably regulate the activity of cGMP-specific PDEs. CDPK3 could (in)activate them, or it could act on other proteins indirectly regulating the activity of these PDEs. Upon inactivation of PDEs, an increase in the cGMP level would activate PKG, which will, in turn, promote egress. From the data, it is not clear whether any phosphorylation by CDPK3 would activate or inactivate PDEs, and if so, then how (directly or indirectly). To reach unambiguous interpretation, authors should perform additional assays.

      Response:

      As mentioned previously, given the abundance of differentially regulated phosphosites, we do not believe that mutating individual sites on individual proteins is a worthwhile or realistic pursuit.

      We clearly show systematic A23187-mediated phosphorylation of key signalling components in the PKA/PKG/PI-PLC/phospholipid signalling cascade, and demonstrate that several of these are CDPK3-dependent. We demonstrate that CDPK3 alters cAMP levels (and that the ∆CDPK3 egress delay in A23187 treated parasites is largely rescued following pre-treatment with a PKA inhibitor). We similarly demonstrate that A23187 treatment leads to phosphorylation of numerous PDEs, including the cAMP specific PDE2, and show that PDE2 knockout parasites show an egress delay following A23187 treatment. While PDE2 may not be directly regulated by CDPK3 (suggesting other Ca2+ kinases are also involved), these findings collectively demonstrate the existence of a calcium-regulated feedback loop, in which CDPK3 and PDE2 play a role (by regulating cAMP).

      We acknowledge that we have not untangled every element of this feedback loop, and do not believe that it would be realistic to do so in a single study given the number of sites phosphorylated and pathways involved. We do believe, however, that we have shown clearly that the feedback loop exists - this in itself is entirely novel, and of significant importance to the field.

      On a similar note, a possible experiment that can be done to improve the work would be to treat the CDPK3 mutant with BIPPO in conjunction with a calcium chelator (BAPTA-AM) to reveal, which proteins are phosphorylated prior to activation of the calcium-mediated cascades?

      Response:

      We agree that this would be an interesting experiment to carry out but would involve significant work. This could be pursued in another paper or project but is beyond the scope of this work.

      The manuscript claims that PDE1, PDE7, PDE9 are cGMP specific, and BIPPO inhibits only cGMP-specific PDEs. All assays are performed with 1-10 micromolar cAMP and cGMP for 1h. There is no data showing the time, protein and substrate dependence. Given the suboptimal enzyme assays, authors should re-do them as suggested here. (1) Repeat the pulldown assay with a higher number of parasites (50-100 million) and measure the protein concentration. (2) Set up the PDE assay with saturating amount of cAMP and cGMP, which is critical if the PDE1,7,9 have a higher Km Value for cAMP (means lower affinity) compared to cGMP. An adequate amount of substrate and protein allows the reaction to reach the Vmax. Once you have re-determined the substrate specificity (revise Fig 5D), you should retest BIPPO (Fig 5E) in the presence of cAMP and cGMP. It is very likely that you would find the same result as PDE9 and PfPDEβ (BIPPO can inhibit both cAMP and cGMP-specific PDE), as described previously

      Response:

      We have repeated our assay using the exact same conditions outlined by Moss et al. This involved using a similar number of parasites, a longer incubation time of 2 hours at a higher temperature (37ºC) and a lower starting concentration of cAMP (0.1 µM). We demonstrate that we are able to recapitulate both the Moss et al. and Vo et al. findings (see Supp Fig. 7B). However, we noticed that these reactions were not carried out with saturating cAMP/cGMP concentrations, since all reactions had reached 100% completion by the end of the assay, at which point all substrate was hydrolysed. Based on our original assay, as well as the new PDE1 timecourse that we have performed (Supp Fig. 7C), we therefore believe that PDEs 1, 7 and 9 display predominantly cGMP hydrolysing activity, with moderate cAMP hydrolysing activity.

      We also repeated the BIPPO inhibition assay using the Moss et al. conditions, and still observe that the cGMP activity of PDE1 is the most potently inhibited of all 4 PDEs. We also see moderate inhibition of the cAMP activities of PDE1 and PDE9, suggesting that cAMP hydrolytic activity can also be inhibited. Interestingly, the cGMP hydrolytic activities of PDEs 7 & 9, which were previously inhibited using our original assay conditions, no longer appear to be inhibited. This is likely due to the longer incubation time, which masks the reduced activities of these two PDEs following treatment with BIPPO.
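      The masking effect described above can be illustrated with a minimal sketch. It assumes simple first-order substrate decay (substrate well below Km) and uses made-up rate constants and a made-up residual activity; none of these values come from our assays, and the function names are purely illustrative. It only shows why an end-point readout taken after a long incubation can understate a partial inhibition.

```python
import math


def fraction_hydrolysed(rate_per_hour: float, hours: float) -> float:
    """Fraction of substrate consumed, assuming simple first-order decay."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def apparent_inhibition(rate_per_hour: float, residual_activity: float, hours: float) -> float:
    """Inhibition inferred from an end-point readout.

    residual_activity is the fraction of enzyme activity remaining with
    inhibitor (0.3 means 70% true inhibition). All values are hypothetical.
    """
    without_inhibitor = fraction_hydrolysed(rate_per_hour, hours)
    with_inhibitor = fraction_hydrolysed(rate_per_hour * residual_activity, hours)
    return 1.0 - with_inhibitor / without_inhibitor


# Hypothetical enzyme: rate constant of 5 per hour, 70% truly inhibited.
short_assay = apparent_inhibition(5.0, 0.3, hours=0.2)
long_assay = apparent_inhibition(5.0, 0.3, hours=2.0)
print(f"apparent inhibition, short incubation: {short_assay:.2f}")
print(f"apparent inhibition, long incubation:  {long_assay:.2f}")
```

      Under these assumptions the same partially inhibited enzyme appears strongly inhibited at the short timepoint and almost uninhibited at the long one, because both reactions approach complete hydrolysis.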

      The authors did not identify any PKG substrate, which is quite surprising as cAMP signaling itself could impact cGMP. Authors should show if they were able to observe enhanced cGMP levels in BIPPO-treated sample (which is expected to stimulate cGMP-specific PDEs). The author mention their inability to measure cGMP level but have they analyzed cGMP in the positive control (BIPPO-treated parasite line)? Why have they focused only on CDPK3 mutant, whereas in their phosphoproteomic data they could see other CDPKs too? It could be that other CDPK-mediated signaling differs and need PKA/PKG for activation.

      In the title, the authors have mentioned that there is a positive feedback loop between calcium release, cyclic nucleotide and lipid signaling, which is quite an extrapolation as there is no clear experimental data supporting such a positive feedback loop so the author should change the title of the paper.

      Response:

      1. As addressed in our previous response to the reviewer, PMID: 29030485 demonstrates clearly a link between cAMP signalling and PKG signalling, but does not confirm how this is mediated. The authors surmise that a cGMP-specific PDE is dysregulated (although the PDE hypothesised to be involved has since been shown to be cAMP-specific), but are similarly unable to detect changes in cGMP levels. This suggests that their model may be incomplete.

      2. The BIPPO treatment experiment suggested by the reviewer was already included in the original manuscript (see Fig. 4D in original manuscript, now Fig. 4E). With BIPPO treatment we are able to detect changes in cGMP levels.

      3. We did not deem it to be within the scope of this study to examine every other CDPK. We chose to study CDPK3, as its egress phenotype was of particular interest given its partial rescue following BIPPO treatment. We reasoned that its study may lead us to identify the signalling pathway that links BIPPO and A23187-induced signalling.

      4. As addressed in greater detail in our response to reviewer #2, the fact that the feedback loop appears to stimulate egress implies that it is positive.

      **Minor Comments:**

      Materials & Methods

      Explanation of parameters is not clear (Line #360-367). Phosphoproteomics with A23187 (8 micromolar) treatment in CDPK3-KO and WT, for 15, 30 and 60s at 37°C incubation with DMSO control. Simultaneously passing the DR and CDPK3 dependency thresholds: CDPK3-dependent phosphorylation

      Response: We have modified the wording to make this clearer as per the reviewer’s suggestion.

      Line #368: At which WT-A23187 timepoint did the authors identify 2408 DR-up phosphosites (15s, 30s or 60s)? Or consistently in all? It should be clarified?

      Response: As already stated in the manuscript (see line 366 in original manuscript, now line 1047), phosphorylation sites were considered differentially regulated if at any given timepoint their log2FC surpassed the DR threshold.
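      As a minimal sketch of this "any timepoint" rule: the threshold value, site names and log2FC values below are hypothetical placeholders chosen for illustration, not the values used in the manuscript.

```python
from typing import Dict

# Hypothetical cut-off for illustration; the manuscript defines the real DR threshold.
DR_UP_THRESHOLD = 1.0


def is_dr_up(log2fc_by_timepoint: Dict[str, float], threshold: float = DR_UP_THRESHOLD) -> bool:
    """True if the site's log2FC exceeds the threshold at any timepoint."""
    return any(fc > threshold for fc in log2fc_by_timepoint.values())


# Made-up example sites with log2FC values at the 15 s, 30 s and 60 s timepoints.
example_sites = {
    "site_A": {"15s": 0.2, "30s": 1.4, "60s": 0.9},
    "site_B": {"15s": 0.1, "30s": 0.3, "60s": 0.2},
}
dr_up = [name for name, fcs in example_sites.items() if is_dr_up(fcs)]
print(dr_up)
```

      A site therefore counts as DR-up as soon as one timepoint passes the cut-off, which is why the total is not tied to a single timepoint.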

      A23187 treatment of the CDPK3-KO mutant significantly increased the cAMP levels at 5 sec post-treatment, but BIPPO did not show any change. The authors concluded that BIPPO presumably does not inhibit cAMP-specific PDEs. However, the dual-specific PDEs are known to be inhibited by BIPPO, as shown recently (https://www.biorxiv.org/content/10.1101/2021.09.21.461320v1). Authors do confirm that BIPPO-treatment can inhibit hydrolytic activity of PfPDEbeta for cAMP as well as cGMP (Line #612). Besides, it was shown in Fig 5E that BIPPO can partially though not significantly block cAMP-specific PDE2. The statements and data conflict each other under different subtitles and need to be reconciled. Elevation of basal cAMP level in the CDPK3 mutant indicates the perturbation of cAMP signaling, however BIPPO data requires additional supportive experiments to conclude its relation with cAMP or dual-specific PDE.

      Response:

      1. The manuscript to which the reviewer refers does not use BIPPO in any of their experiments. They show that continuous treatment with zaprinast blocks parasite growth in a plaque assay, but do not test whether zaprinast specifically blocks the activity of any of the PDEs.

      Having repeated the PDE assay using the Moss et al. conditions (as outlined above), we are now able to recapitulate their findings, showing that PDEs 1, 7 and 9 can display dual hydrolytic activity while PDE2 is cAMP specific. As explained further above, we believe that our original experimental conditions are more stringent than the Moss *et al.* conditions. To confirm this, we also performed an additional experiment, incubating PDE1 for varying amounts of time using our original conditions (1 uM cAMP or 10 uM cGMP, at room temperature). This revealed that PDE1 is much more efficient at hydrolysing cGMP, and only begins to display cAMP hydrolysing capacity after 4 hours of incubation.

      We also measured the inhibitory capacity of BIPPO on the PDEs using the Moss *et al.* conditions. With the longer incubation time, BIPPO appears unable to inhibit PDEs 7 and 9, while under the more stringent conditions it was able to inhibit both PDEs. We reasoned that since BIPPO is unable to inhibit these PDEs fully, the residual activity over the longer incubation period would compensate for the inhibition, eventually leading to 100% hydrolysis of the cNMPs. We also see that while the cGMP hydrolysing capacity of PDE1 is completely inhibited, its cAMP hydrolysing capacity is only partially inhibited. These findings and the fact that PDE2 is not inhibited by BIPPO are in line with our experiments where we measured [cAMP] and showed that treatment with BIPPO did not lead to alterations in [cAMP].
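
The argument that residual activity can mask partial inhibition over a long incubation can be illustrated with a toy calculation. The sketch below assumes simple first-order substrate decay, which is a simplification and not the model used in the manuscript; the rate constants, incubation times and the 30% residual activity are hypothetical numbers chosen only to show the effect.

```python
import math


def fraction_hydrolysed(rate_per_hour, hours, residual_activity=1.0):
    """Fraction of substrate consumed under assumed first-order decay."""
    return 1.0 - math.exp(-rate_per_hour * residual_activity * hours)


RESIDUAL = 0.3  # hypothetical: the inhibitor leaves 30% of enzyme activity

# Hypothetical (rate, time) pairs standing in for a stringent and a
# permissive assay; neither is taken from the manuscript.
for label, rate, hours in [("stringent", 1.0, 1.0), ("permissive", 10.0, 2.0)]:
    untreated = fraction_hydrolysed(rate, hours)
    inhibited = fraction_hydrolysed(rate, hours, RESIDUAL)
    print(f"{label}: untreated {untreated:.2f}, inhibited {inhibited:.2f}")
```

With these assumed numbers, the inhibited and untreated reactions both approach complete hydrolysis in the permissive assay, whereas the stringent assay still shows a clear difference between them.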

      The method used to determine the substrate specificity of PDE 1,2,7 and 9 resulted in the hydrolytic activity of PDE2 towards cAMP, while the remaining 3 were determined as cGMP-specific. However, PDE1 and PDE9 have been reported as being dual-specific (Moss et al, 2021; Vo et al, 2020), which questions the reliability of the preferred method to characterize substrate specificity by the authors. It is also suggested to use another ELISA-based kit to double check the results.

      Response:

      As outlined above, we have repeated the assay using the conditions described by Moss et al. (lower starting concentrations of cAMP, 2 hour incubation period at 37ºC) and find that we are able to recapitulate the results of both Moss et al. and Vo et al. However, under the Moss et al. conditions the PDEs hydrolysed 100% of the cyclic nucleotide, suggesting that these conditions are less stringent than the ones we used originally, with higher starting concentrations of cAMP and incubation for only 1 hour at room temperature. With enzymatic assays it is always important to perform them at saturating conditions (as already suggested by the reviewer) and therefore we believe that our original conditions are more stringent than the Moss et al. conditions.

      Line #607-608: Authors found PDE9 less sensitive to BIPPO-treatment and concluded PDE2 as refractory to BIPPO inhibition; however, the reduction level of activity seems similar as seen in PDE9-BIPPO treated sample? This strong statement should be replaced with a mild explanation.

      **Response:** We have tempered our wording as per the reviewer’s suggestion.

      Figures and legends:

      The introductory model in Fig S1 is difficult to understand and ambiguous despite having it discussed in the text. For example, CDPK1 is placed, but only mentioned at the beginning, and the role of other CDPKs is not clear. In addition, the arrows in IP3 and PKG are confusing. The location of guanylate and adenylate cyclase is wrong, and so on... The figure should include only the egress-related signaling components to curate it. The illustration of host cell in orange color must be at the right side of the figure in connection with the apical pole of the parasite (not on the top). Figure legend should also be rearranged accordingly and citations of the underlying components should be included (see below).

      **Response:** We have modified Supp Fig. 1 as per the suggestions of reviewers #2 and #3. We have now modified the localisations of the proteins and have also removed the lines showing the cross talk between pathways. We have also highlighted to the reader that this is only a model and may not represent the true localisations of the proteins, despite our best efforts.

      In Figure 5D, would you please provide the western blot analysis of samples before and after pulling down to demonstrate the success of your immunoprecipitation assay. Mention the protein concentration in your PDE enzyme assay. Please refer to the M&M comments above to re-do the enzyme assays.

      Response:

      We have now included western blots for the pull downs of PDEs 1, 2, 7 and 9 (Supp Fig. 7A). We chose not to measure protein concentrations of samples since all experiments were performed using the same starting parasite numbers, and we do not see large differences in activities between biological replicates of the PDEs.

      Figure legend 1C: Line #194: There is no red-dotted line shown in graph! Correct it!

      **Response:** We have modified this.

      Figure 4Gi-ii: Shouldn't it be labelled i: H89-treatment and ii: A23187, respectively instead of DMSO and H89? (based on the text Line #579).

      **Response:** Our labelling of Fig. 4Gi-ii is correct as panel i parasites were pre-treated with DMSO, while panel ii parasites were pre-treated with H89. Subsequent egress assays on both parasites were then performed using A23187.

      We have modified the figures to include mention of A23187 on the X axis, and modified the figure legend to clarify pre-treatment was performed with DMSO and H89 respectively.

      Bibliography:

      Line #57 and 58: Citations must be selected properly! Carruthers and Sibley 1999 revealed the impact of Ca2+ on the microneme secretion within the context of host cell attachment and invasion, not egress as indicated in the manuscript! Similar case is also valid for the reference Wiersma et al 2004; since the roles of cyclic nucleotides were suggested for motility and invasion. Also notable is the fact that several citations describing the localization, regulation and physiological importance of cAMP and cGMP signaling mediators (PMID: 30449726, 31235476, 30992368, 32191852, 25555060, 29030485) are either completely omitted or not appropriately cited in the introduction and discussion sections.

      Response:

      We have modified the citations as per the reviewer’s suggestions. We now cite Endo et al., 1987 for the first use of A23187 as an egress trigger, and Lourido, Tang and David Sibley, 2012 for the role of cGMP signalling in egress. We also cite all the GC papers when we make first mention of the GC. We have also removed the Howard et al., 2015 citation (PMID: 25555060) when referring to the fact that BIPPO/zaprinast can rescue the egress delay of ∆CDPK3 parasites.

      Grammar/Language

      Line #31: After "cAMP levels" use comma

      Response:

      We have modified this.

      36: Sentence is not clear. Does conditional deletion of all four PDEs support their important roles? If so, the role in egress of the parasite?

      Response:

      We have clarified our wording as per the reviewer’s suggestion. We state that PDEs 1 and 2 display an important role in growth since deletion of either of these PDEs leads to reduced plaque growth. We have not investigated exactly which stage of the lytic cycle is affected.

      40: "is a group involving" instead of "are"

      Response:

      We found no mention of “a group involving” in our original manuscript at line 40 or anywhere else in the manuscript, so we are unsure what the reviewer is referring to.

      108: isn't it "discharge of Ca++ from organelle stores to cytosol"?

      **Response:** We thank the reviewer for spotting this error. We have now modified this sentence.

      120: "was" instead of "were"

      **Response:** Since the situation we are referencing is hypothetical, ‘were’ is the correct form.

      Reviewer #3 (Significance (Required)):

      There is a significant amount of work that underlies this manuscript; however, from a conceptual viewpoint, the manuscript does not offer significant advancement over the current knowledge without functional validation of phosphoproteomics data. In terms of the mechanism, it is not clear whether and how lipid turnover and cAMP-PKA signaling control the egress phenotype (lack of a validated model at the end of this study). In a methodical sense, the work uses established assays, some of which require revisiting to reach robust conclusions and avoid misinterpretation.

      Compare to existing published knowledge

      A large body of work preceding this manuscript has indicated the crosstalk of cAMP, cGMP, calcium and lipid signaling cascades. This work provides a further refinement of the existing model. The article is quite interesting from a throughput screening point of view, but it clearly lacks the appropriate endorsement of the hits.

      Response:

      Please refer to our first response to reviewer #3 for our full rebuttal to these points. We respectfully disagree with the assessment that the work presented does not advance current knowledge.

      Audience

      Field specific (Apicomplexan Parasitology)

      Expertise

      Molecular Parasitology

      References

      Bailey, A. P. et al. (2015) ‘Antioxidant Role for Lipid Droplets in a Stem Cell Niche of Drosophila’, Cell. The Authors, 163(2), pp. 340–353. doi: 10.1016/j.cell.2015.09.020.

      Bullen, H. E. et al. (2016) ‘Phosphatidic Acid-Mediated Signaling Regulates Microneme Secretion in Toxoplasma’, Cell Host & Microbe, pp. 349–360. doi: 10.1016/j.chom.2016.02.006.

      Dass, S. et al. (2021) ‘Toxoplasma LIPIN is essential in channeling host lipid fluxes through membrane biogenesis and lipid storage’, Nature Communications. Springer US, 12(1). doi: 10.1038/s41467-021-22956-w.

      Endo, T. et al. (1987) ‘Effects of Extracellular Potassium on Acid Release and Motility Initiation in Toxoplasma gondii’, The Journal of Protozoology, 34(3), pp. 291–295. doi: 10.1111/j.1550-7408.1987.tb03177.x.

      Flueck, C. et al. (2019) ‘Phosphodiesterase beta is the master regulator of cAMP signalling during malaria parasite invasion’, PLoS Biology. doi: 10.1371/journal.pbio.3000154.

      Howard, B. L. et al. (2015) ‘Identification of potent phosphodiesterase inhibitors that demonstrate cyclic nucleotide-dependent functions in apicomplexan parasites’, ACS Chemical Biology, 10(4), pp. 1145–1154. doi: 10.1021/cb501004q.

      Jia, Y. et al. (2017) ‘Crosstalk between PKA and PKG controls pH-dependent host cell egress of Toxoplasma gondii’, The EMBO Journal, 36(21), pp. 3250–3267. doi: 10.15252/embj.201796794.

      Katris, N. J. et al. (2020) ‘Rapid kinetics of lipid second messengers controlled by a cGMP signalling network coordinates apical complex functions in Toxoplasma tachyzoites’, bioRxiv. doi: 10.1101/2020.06.19.160341.

      Lentini, J. M. et al. (2020) ‘DALRD3 encodes a protein mutated in epileptic encephalopathy that targets arginine tRNAs for 3-methylcytosine modification’, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-16321-6.

      Lourido, S., Tang, K. and David Sibley, L. (2012) ‘Distinct signalling pathways control Toxoplasma egress and host-cell invasion’, EMBO Journal. Nature Publishing Group, 31(24), pp. 4524–4534. doi: 10.1038/emboj.2012.299.

      Lunghi, M. et al. (2022) ‘Pantothenate biosynthesis is critical for chronic infection by the neurotropic parasite Toxoplasma gondii’, Nature Communications. Springer US, 13(1). doi: 10.1038/s41467-022-27996-4.

      McCoy, J. M. et al. (2012) ‘TgCDPK3 Regulates Calcium-Dependent Egress of Toxoplasma gondii from Host Cells’, PLoS Pathogens, 8(12). doi: 10.1371/journal.ppat.1003066.

      Moss, W. J. et al. (2022) ‘Functional Analysis of the Expanded Phosphodiesterase Gene Family in Toxoplasma gondii Tachyzoites’, mSphere. American Society for Microbiology, 7(1). doi: 10.1128/msphere.00793-21.

      Stewart, R. J. et al. (2017) ‘Analysis of Ca2+ mediated signaling regulating Toxoplasma infectivity reveals complex relationships between key molecules’, Cellular Microbiology, 19(4). doi: 10.1111/cmi.12685.

      Vo, K. C. et al. (2020) ‘The protozoan parasite Toxoplasma gondii encodes a gamut of phosphodiesterases during its lytic cycle in human cells’, Computational and Structural Biotechnology Journal. The Author(s), 18, pp. 3861–3876. doi: 10.1016/j.csbj.2020.11.024.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      In this manuscript, Dominicus et al investigate the elusive role of calcium-dependent kinase 3 during the egress of Toxoplasma gondii. Multiple functions have already been proposed for this kinase by this group including the regulation of basal calcium levels (24945436) or of a tyrosine transporter (30402958). However, one of the most puzzling phenotypes of CDPK3 deficient tachyzoites is a marked delay in egress when parasites are stimulated with a calcium ionophore that is rescued with phosphodiesterase (PDE) inhibitors. Crosstalk between cAMP, cGMP, lipid and calcium signalling has been previously described to be important in regulating egress (26933036, 23149386, 29030485) but the role of CDPK3 in Toxoplasma is still poorly understood.

      Here the authors first take an elegant phosphoproteomic approach to identify pathways differentially regulated upon treatment with either a PDE inhibitor (BIPPO) or a calcium ionophore (A23187) in WT and CDPK3-KO parasites. Not much difference is observed between BIPPO and A23187 stimulation, which is interpreted by the authors as regulation through a feedback loop. The authors then investigate the effect of CDPK3 deletion on lipid, cGMP and cAMP levels. They identify major changes in DAG, phospholipid, FFA, and TAG levels as well as differences in cAMP levels but not in cGMP levels. Chemical inhibition of PKA leads to a similar egress timing in CDPK3-KO and WT parasites upon A23187 stimulation.

      As four PDEs appeared differentially regulated in the CDPK3-KO line upon A23187, the authors investigate the requirement of the 4 PDEs in cAMP levels. They show diverse localisation of the PDEs with specificities of PDE1, 7 and 9 for cGMP and of PDE2 for cAMP. They further show that PDE1, 7 and 9 are sensitive to BIPPO. Finally, using a conditional deletion system, they show that PDE1 and 2 are important for the lytic cycle of Toxoplasma and that PDE2 shows a slightly delayed egress following A23187 stimulation.

      Major comments:

      -Are the key conclusions convincing?

      The title is supported by the findings presented in this study. However, I am not sure I understand why the authors imply a positive feedback loop. This should be clarified in the discussion of the results. The phosphoproteome analysis seems very strong and will be of interest for many groups working on egress. However, the key conclusion, i.e. that a substrate overlap between PKG and CDPK3 is unlikely to explain the CDPK3 phenotype, seems premature to me in the absence of robustly identified substrates for both kinases.

      I am not sure there is a clear key conclusion from the lipidomic analysis and how it is used by the authors to build their model up. Major changes are observed but how could this be linked with CDPK3, particularly if cGMP levels are not affected?

      The evidence that CDPK3 is involved in cAMP homeostasis seems strong. However, the analysis of PKA inhibition is a bit less clear. The way the data is presented makes it difficult to see whether the treatment is accelerating egress of CDPK3-KO parasites or affecting both WT and CDPK3-KO lines, including both the speed and extent of egress. This is important for the interpretation of the experiment.

      The biochemical characterisation of the four PDE is interesting and seems well performed. However, PDE1 was previously shown to hydrolyse both cAMP and cGMP (https://doi.org/10.1101/2021.09.21.461320) which raises some questions about the experimental set up. Could the authors possibly discuss why they do not observe similar selectivity? Could other PDEs in the immunoprecipitate mask PDE activity? In line with this question, it is not clear what % of "hydrolytic activity (%)" means and how it was calculated. The experiments describing the selectivity of BIPPO for PDE1, 7 and 9 as well as the biological requirement of the four tested PDEs are convincing.

      -Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The claim that CDPK3 affects cAMP levels seems strong however the exact links between CDPK3 activity, lipid, cGMP and cAMP signalling remain unclear and it may be important to clearly state this.

      -Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      I think that the manuscript contains a significant amount of experiments that are of interest to scientists working on Toxoplasma egress. Requesting experiments to identify the functional link between above-mentioned pathways would be out of the scope for this work although it would considerably increase the impact of this manuscript. For example, would it be possible to test whether the CDPK3-KO line is more or less sensitive to PKG specific inhibition upon A23187-induced egress?

      -Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The above-mentioned experiment is not trivial as no specific inhibitors of PKG are available. Ensuring specificity of the investigated phenotype would require the generation of a resistant line, which would require significant work.

      -Are the data and the methods presented in such a way that they can be reproduced?

      It is not clear how the % of hydrolytic activity of the PDE has been calculated.

      -Are the experiments adequately replicated and statistical analysis adequate?

      This seems to be performed to high standards.

      Minor comments:

      -Specific experimental issues that are easily addressable.

      I do not have any comments related to minor experimental issues.

      -Are prior studies referenced appropriately?

      Most of the studies relevant for this work are cited. It is however not clear to me why some important players of the "PKG pathway" are not indicated in Fig 1H and Fig 3E, including for example UGO or SPARK.

      -Are the text and figures clear and accurate?

      While all the data shown here is impressive and well analysed, I find it difficult to read the manuscript and establish links between sections of the paper. The phosphoproteome analysis is interesting and is used to orientate the reader towards a feedback mechanism rather than a substrate overlap. But why do the authors later focus on PDEs and not on AC or CNBD, as in the end, if I understand well, there is no evidence showing a link between CDPK3-dependent phosphorylation and PDE activity upon A23187 stimulation? It is also unclear how the authors link CDPK3-dependent elevated cAMP levels with the elevated basal calcium levels they previously described. This is difficult to reconcile, particularly in a PKG-independent manner.

      The presentation of the lipidomic analysis is also not really clear to me. Why do the authors show the global changes in phospholipids and not a more detailed analysis? As the authors focus on the PI-PLC pathway, could they detail the dynamics of phosphoinositides? I understand that lipid levels are affected in the mutant but I am not sure to understand how the authors interpret these massive changes in relationship with the function of CDPK3 and the observed phenotypes.

      Finally, the characterisation of the PDEs is an impressive piece of work but the functional link with CDPK3 is relatively unclear. It would also be important to clearly discuss the differences with previous results presented in this preprint: https://doi.org/10.1101/2021.09.21.461320. My understanding is that while the authors aim at investigating the role of CDPK3 in A23187 induced egress, the main finding related to CDPK3 is a defect in cAMP homeostasis that is not linked to A23187. Similarly, the requirement of PDE2 in cAMP homeostasis and egress is only indirectly linked to CDPK3. Altogether I think that important results are presented here but divided into three main and distinct sections: the phosphoproteomic survey, the lipidomic and cAMP level investigation, and the characterisation of the four PDEs. However, the link between each section is relatively weak and the way the results are presented is somewhat misleading or confusing.

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      This is a very long manuscript written for specialists of this signalling pathway and I would suggest the authors to emphasise more the important results and also clearly state where links are still missing. This is obviously a complex pathway and one cannot elucidate it easily in a single manuscript.

      Significance

      -Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      This is a technically remarkable paper using a broad range of analyses performed to a high standard.

      -Place the work in the context of the existing literature (provide references, where appropriate).

      The cross-talk between cAMP, cGMP and calcium signalling is well described in Toxoplasma and related parasites. Here the authors show that, in Toxoplasma, CDPK3 is part of this complex signalling network. One of the most important findings within this context is the role of CDPK3 in cAMP homeostasis. With this in mind, I would change the last sentence of the abstract to "In summary we uncover a feedback loop that enhances signalling during egress and links CDPK3 with several signalling pathways together."

      The genetic and biochemical analyses of the four PDEs are remarkable and highlight consistencies and inconsistencies with recently published work that would be important to discuss and will be of interest for the field.

      While I understand the studied signalling pathway is complex, I think it would be important to better describe the current model of the authors. In the discussion, the authors indicate that "the published data is not currently supported by a model that fits most experimental results." I would suggest to clarify this statement and discuss whether their work helps to reunite, correct or improve previous models.

      Could the authors also speculate about a potential role of PDE/CDPK3 in host cell invasion as cAMP signalling has been shown to be important for this process (30208022 and 29030485)?

      -State what audience might be interested in and influenced by the reported findings.

      This paper is of great interest to groups working on the regulation of egress in Toxoplasma gondii and other related apicomplexan pathogens.

      -Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am working on the cell biology of apicomplexan parasites.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2022-01384

      Corresponding author(s): Mary O’Riordan and Basel Abuaita

      1. General Statements [optional]

      We appreciated the positive feedback and helpful suggestions from the reviewers that pointed to a need for more clarity regarding the central focus of the study. Our goal was to take an unbiased approach to evaluating the role of neutrophils during S. Typhimurium (STM) infection of human intestinal epithelial cells (IEC), using human intestinal organoids as a model. An abundance of data point to important inflammatory roles for neutrophils during STM infection of human intestine but the critical mechanisms involved have not been fully elucidated. New data now included in the revised manuscript provide strong support for human PMN-derived IL1-beta as a driver of epithelial cell shedding in STM-infected HIOs, consistent with known differences in local inflammation between human and mouse infection, and this is the focus of the current study. Our data did not support a significant role for human neutrophils in controlling luminal bacterial numbers, but instead the primary human PMNs robustly stimulated epithelial cell responses that led to decreased intraepithelial bacteria. Several recent studies have suggested that caspase-1 is not a critical inflammasome component during STM infection of IEC, which instead use non-canonical inflammasomes, including caspases-4 and -11. Our data point to a human neutrophil-intrinsic function for caspase-1 and IL1-beta that contributes to the inflammatory tone of the intestinal milieu early in STM infection.

      2. Point-by-point description of the revisions

      Reviewer #1

      Major comments:

      Some important links are missing to fully support the mechanistic model proposed:

      1- PMN activity

      The authors may strengthen their evidence of PMN activities presented in lines 135 to 143 and in Fig.S2 and S3. In particular, the authors claim that PMNs form NETs in PMN-HIOs but the evidence displayed is limited. In fact, Fig S2 shows the same condition and same staining as Fig 1B but the MPO-positive structures are different. Clarification in the text or the figure would be welcome. Besides, as the authors insist on the relevance of NETs in the discussion, it seems that a clear demonstration and characterization of these structures in the PMN-HIO model would highly benefit the manuscript.

      While we commented on NETs in our original manuscript, our conclusions do not rely on the presence or absence of NETs. We have therefore removed the NET data and the reference to NETs. While NETs are potentially interesting in the context of intestinal infection, we understand the reviewer's concern about NETs and anticipate that a more quantitative characterization of NETs may be challenging given the structure and variability of the PMN-HIOs.

      Regarding the analyses of the culture supernatants (Fig.S3), only 3 out of the 5 displayed datasets are commented on in the text. The data obtained for BD2 and N-Gal should be either commented or removed from the figure. The author further suggests that Elafin expression in presence of PMN may restrict PMNs' ability to kill Salmonella. Repeating the experiment displayed in Fig S1 in the presence of Elafin as well as in the presence of the supernatant extracted from HIOs and PMN-HIOs would clarify the potential inhibition of PMN killing capacity in the PMN-HIO model.

      We have now added a sentence on the antimicrobials BD2 and N-GAL to the text (line 135-136). Elafin is one of many molecules that could potentially affect the ability of PMNs to kill Salmonella. We repeated the experiments in S3 Fig with recombinant Elafin. There was a very weak effect on killing in the presence of Elafin; however, Elafin can also kill Salmonella directly, complicating interpretation of these experiments. We have now added a sentence in the Discussion to speculate that Elafin is one example of how the epithelium may inhibit the ability of PMNs to kill (line 366-372). These data are not central to our main conclusions and are only intended to provide context to the reader about possible explanations for why PMNs can kill Salmonella directly, but do not significantly alter total bacterial numbers in the HIO model.

      The author proposed that infected and uninfected cells are extracted from the epithelium due to PMN activation, suggesting that Salmonella infection of epithelial cells is only indirectly involved in cell shedding. This is an interesting hypothesis that could be tested by measuring cell shedding in a non-infected but PMN-activated (for instance with PMA) PMN-HIO model. This would clarify further the role of PMN in controlling epithelial response to the infection.

      We tested this possibility by microinjecting LPS into the lumen of PMN-HIOs (S6 Fig). There was significantly less TUNEL+ signal in LPS-injected PMN-HIOs compared to STM-infected PMN-HIOs, suggesting that active Salmonella infection is required for shedding of both infected and uninfected cells in the presence of PMNs.

      2- Specificity of RNA-seq profiling:

      The authors analyzed the transcriptomic profiling of PMN-HIOs and HIOs infected or not. While these experiments bring to light an interesting difference in inflammasome/cell death transcriptomic programs at the scale of the co-culture model, it is not possible to conclude from which cell type these transcriptomic shifts emerge. To clarify this, the authors stain the co-culture for ASC and observe that ASC-positive cells are PMNs. They conclude that PMNs are most likely the primary site of caspase-1 dependent production of IL1. While their model is theoretically consistent, more direct proofs are necessary to conclude on the cell-type specific transcriptomic program during infection of PMN-HIO and could be obtained by FACS sorting of the cells prior to RNA-seq, for instance using MPO to detect PMNs and E-cadherin to detect epithelial cells.

      We now provide evidence that pretreating PMNs with an irreversible Caspase-1 inhibitor before co-culturing with STM-infected HIOs prevented accumulation of luminal TUNEL+ cells (Fig 6B,C). Additionally, IL-1β treatment in the absence of PMNs recapitulated the cell death phenotype of the infected PMN-HIOs (Fig 6D,E) suggesting Caspase-1 activity in PMNs and IL-1β production are necessary for epithelial cell death in the PMN-HIOs.

      3- Roles of cytokine

      After showing an increased expression/release of IL1 and IL1RA in infected PMN-HIOs, the authors move on to testing the role of caspases on cell shedding. Yet, they do not test the impact of IL1 and IL1RA on cell shedding. As, according to their proposed model, IL1 is acting upstream of caspase-1 to promote cell shedding, testing cell shedding in infected PMN-HIOs in the presence of an IL1 inhibitor would clarify that link. The author also proposed that the decrease of IL33 in PMN-HIOs compared to HIOs could be due to PMN processing, which would give an additional role to PMNs in controlling the epithelial response to infection. In the context of this manuscript, it would be highly relevant to test this hypothesis by measuring the rate of cleaved IL-33.

      We now provide data to address these questions about IL-1 signaling. HIOs were microinjected with recombinant IL-1β during STM infection and PMN-HIOs were also treated with IL1RA during STM infection. Cell shedding was measured under these conditions in Fig. 6D-F. Cell shedding was dependent on IL-1 signaling and the model has been updated to reflect this.

      We also concentrated supernatants from STM-infected HIOs and PMN-HIOs, probed for cleaved IL33 via western blot and did see some cleavage. However, without being able to block this process it is not possible to conclude what role cleaved IL33 has during infection in the PMN-HIO and IL-1β seems to be sufficient to drive the cell shedding phenotype. Since the status of IL33 is not central to our conclusions, we have removed these data from the manuscript.

      4- Roles of caspase

      The interpretations of the role of Caspases in restricting bacterial burden are unclear and should be revised (see also minor comment). It appears that both Caspase-1 and Caspase-3 are necessary for efficient cell shedding (Fig4B), Caspase-1 (but not Caspase-3) decreases intraluminal bacterial burden (Fig4C) and Caspase-3 (but not Caspase-1) decreases epithelium-associated bacteria (Fig4D). To reconcile these observations with the hypothesis that cell shedding is responsible for the decrease of intraluminal and epithelium-associated bacterial burden, one may propose that caspase-3 (but not caspase-1) induces cell shedding of mainly non-infected cells (possibly bacteria-associated) and caspase-1 (but not caspase-3) induces cell shedding of infected cells. This could be tested by measuring the % of infected extruded cells upon caspase inhibitor treatments. In addition, these data do not allow one to propose that Caspase-3 activation happens downstream of Caspase-1 as suggested by the authors in their abstract figure.

      It is difficult to accurately quantify the percent infected cells that are extruded since both infected and uninfected cells are extruded into a luminal space full of bacteria, which may associate with uninfected cells post-extrusion. However, we did observe cells positive for cleaved Caspase-3 when HIOs were treated with IL-1β leading us to infer that Caspase-1 mediated cytokine signaling through IL1R can trigger downstream Caspase-3 activation (Fig. 6G). We have expanded the Discussion to talk about differing roles of Caspases on bacterial burden and association with the epithelium (lines 374-397).

      Minor comments:

      The majority of the points listed below can be addressed with further analyses of pre-acquired data sets:

      Fig1E/1F/4D: each green dot is not likely to be an individual bacterium but rather a cluster of bacteria (based on their size). So the y-axis in Fig 1E and Fig4D should not be #STM.

      Y-axis labels have been changed to #STM objects.

      Fig2A: Variations in organoid size and epithelial thickness can be observed between figures. In particular, in Fig 2A, the HIO seems much younger than the other ones displayed in the manuscript.

      There is considerable natural variability between HIOs and between batches, a phenomenon observed by many HIO researchers (Hofer et al. Nature Reviews Materials 2021). HIOs were all treated the same way prior to infection, and based on our extensive observations, epithelial thickness does not correlate significantly with a particular experimental condition, as we now show in S10 Fig.

      Line 176 to 178, the authors mentioned the TUNEL+ cells in the mesenchyme but rule out the possibility that this phenotype could be infection or PMN-dependent because it is observed in the different conditions. As the picture displayed in Fig2A suggests high differences in the number of TUNEL+ cells in the mesenchyme under the 4 tested conditions, the authors should still quantify this phenomenon (possibly in the supplementary).

      This is likely an artifact of culturing and not due to the infection or PMNs. There is variability between HIO batches in the amount of TUNEL signal in mesenchymal cells (for example HIOs in Fig 4A and 5A have very low or no TUNEL positivity in the mesenchyme).

      "DAPI" should be written in blue.

      This has been corrected.

      Fig2C: Could the authors comment on the % of E-cadherin cells that are also TUNEL+? Is it 100%?

      On average about 75% of TUNEL+ cells are E-cadherin+. We think that this may be an underestimate because E-cadherin staining intensity decreases in many cells after shedding. This is commented on in the text (lines 178-179).

      Fig 2D: The point made on lines 182 to 186 that HIOs contain TUNEL + cells retained in the epithelial lining in the absence of PMNs is not very strongly supported by Fig 2D. Quantification of the number of intraepithelial TUNEL+ cells in the 4 compared conditions would make a more solid case.

      We quantified TUNEL intensity in epithelial cells retained in the monolayer (S7 Fig). We do note that there is some variability in this phenotype that correlates with different batches of HIOs.

      Fig2E: This experiment should be completed with a quantification of the percentage of TUNEL+ cells that are also cleaved caspase3-positive. The data, as currently displayed, do not prove that the cells negative for cleaved caspase 3 are apoptotic cells and thus do not support the sentence "suggesting that multiple forms of cell death were occurring in the PMN-HIO" (line 194).

      Cells negative for cleaved Caspase-3 that are TUNEL+ may be undergoing some other form of cell death that is not Caspase-3 dependent, such as necrosis. This possibility is consistent with the decreased TUNEL signal observed upon inhibition of Caspase-4 (Fig 5A,B). However, we have reworded our conclusion to identify more clearly what the data indicate, and where we are drawing inferences.

      Fig3A: "IL1RN" should be changed for "IL1RA (IL1RN)" for consistency with Fig 3B.

      The heatmap shows gene expression data so IL1RN is more consistent with the gene nomenclature. However, we have added an asterisk to the label on the heatmap, along with a sentence in the figure legend to elucidate.

      Fig 4C: The authors should provide the percentage of infected cells rather than the number of bacteria per cell (this information can be included in supplementary).

      Percent infected cells has been moved to Fig 4C and the number of bacteria per cell has been moved to Fig 4D.

      FigS2: The different thicknesses of the epithelial layer observed between PBS and STM panels suggest a difference in scale. This may be double-checked by the authors.

      The images are scaled similarly – as noted earlier (S10 Fig), there is considerable natural variability between HIOs that is not correlated with any experimental condition in this study.

      Lines 197-199: the authors claimed that uninfected cells may be observed in the cell lumen. This seems hard to observe/conclude at this resolution. The authors may show a non-infected cell at higher magnification.

      We have added higher-magnification images; uninfected cells are indicated with white arrows in S8 Fig.

      Discussion: Some important points should be added to the discussion. In particular, what is the fate of intracellular salmonellae after cell shedding? Can the bacterium survive cell apoptosis and burst out of the cell to re-infect the epithelium or be transferred to phagocytic cells during the clearance of intraluminal apoptotic cells? Previous studies showed that cytosolic hyper-replication could fuel cell shedding. The importance of bacterial load in PMN-induced cell shedding could be discussed.

      We have expanded the discussion to elaborate on what may happen to shed cells. One useful feature of the HIOs is that the enclosed lumen allows us to capture the cells to fully measure the extent of cell shedding; however, in the intestine, where there is flow, these cells would be washed away and could help to reduce bacterial load in the intestine. This point is now made in lines 386-388 in the discussion.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Major concerns

      1) The authors show that only ~5% of the neutrophils have migrated to the lumen, which is a barely noticeable increase compared to PBS treated organoids. Does this reflect that the mucosal layer of the organoids might not produce neutrophil chemoattractants and that neutrophil recruitment during Salmonella is a bystander effect from a different cell type?

      This number indicates that PMNs are ~5% of total cells in the PMN-HIO (including epithelial and mesenchymal cells) during Salmonella infection (not that only 5% of PMNs added migrated). Moreover, PMNs were added to a well containing multiple HIOs. We also show that HIOs do produce neutrophil chemoattractants during infection (S1 Fig).

      2) How quickly are neutrophils recruited to the HIOs? The authors show one time point of 8 hours. Related to the relatively low number of neutrophils seen in their HIOs, is this perhaps a result of the time point they chose? Will they see more neutrophils recruited if they go longer?

      It is likely that 5% of total cells in the PMN-HIO represents a significant recruitment of PMNs, and our data clearly indicate a marked effect on the infected epithelium. PMNs can cause substantial tissue damage, and their recruitment and activation is known to be tightly regulated. Due to the short-lived nature of human PMNs it would be difficult to extend this experiment to later timepoints. We have experimentally characterized PMN migration at 24h and by that time, most of the PMNs that we observe are non-viable, thus we focused our studies earlier.

      3) The authors show that PMNs did not kill STM in their organoids, but they do in pure culture. Is this simply because of the low levels of neutrophils present in their HIOs, which would result in lower concentrations of antimicrobials being produced in the HIO lumen? If the authors are able to get higher levels of neutrophils in their HIOs, would they see increased bacterial killing?

      Neutrophils have both inflammatory signaling and microbicidal functions. For example, Cho, et al (PLoS Pathogens 2012) find that neutrophil-derived IL-1 beta is sufficient to support abscess formation in the innate immune response to Staphylococcus aureus soft tissue infection. Similarly, a recent study showed that activation of neutrophils by keratinocyte defensins in a S. aureus skin infection led to neutrophil IL1 beta and CXCL2 release that amplified antibacterial defenses (Dong, et al Immunity 2022). Moreover, in the native environment of the gut with extensive microbiome colonization, direct neutrophil microbicidal activity might be less effective against infection than signaling. Recruitment of higher levels of neutrophils in vivo or in the HIO might exacerbate damage of the epithelial barrier. In the discussion, we speculate there may be proteins, like Elafin, that are upregulated during infection and inhibit some neutrophil functions as a trade-off to control host tissue damage. We reason that our data strongly support an inflammatory signaling role for neutrophils to promote innate immune responses of the intestinal epithelium.

      4) Related to the above point, if the authors treat their HIOs with known neutrophil chemoattractants, can they increase the number of neutrophils that migrate into their organoids?

      High levels of chemoattractants are already being produced in the HIO in response to infection (S1 Fig). The most effective number of neutrophils in the context of intestinal infection may not be the highest number, given that neutrophils can cause tissue damage. Since we see a marked phenotype with the neutrophils that are recruited, we propose that this PMN-HIO model reveals important inflammatory signaling roles for PMNs to promote intestinal epithelial immune function.

      5) The authors speculate that Salmonella may "employ specific mechanisms to overcome PMN effector functions in the HIO luminal environment". Are any such mechanisms known? If so, the authors could test this hypothesis by repeating these experiments with Salmonella mutants in which these mechanisms are ablated. In this case, they should see increased killing of Salmonella by PMNs in the HIO lumen.

      The focus of this study was to test how PMNs contribute to the host response against wildtype Salmonella. In the PMN-HIO model, we find that neutrophils direct a robust epithelial cell extrusion response and affect intracellular bacterial numbers, but that Salmonella luminal colonization is not affected by PMNs. Thus, our data point to an important inflammatory role for neutrophils in the infected intestine. Indeed, reliance on direct bactericidal mechanisms in the intestinal lumen, which in vivo would be colonized with the microbiota, might be a losing strategy for neutrophils, which would be hugely outnumbered.

      6) Furthermore, there is no information on the activation status of the neutrophils. How does the surface expression of CD16, CD62L, CD66 and CD11b look between the migrated and non-migrated and between infected and uninfected controls? Did the neutrophils degranulate? Are they CD63+ or are the high levels of NGAL and S100 proteins an effect of lysis? The authors should also be careful in claiming that there is NETosis as the image in the supplement looks more like an artifact than actual NETs.

      Our new findings suggest that IL-1 production by PMNs is the biggest factor in driving the cell death phenotype. We have also added a figure with CD63 staining. We were able to visualize some localization of CD63 to the cell surface of PMNs, consistent with degranulation (S4 Fig).

      7) Why does ASC translocate to the nucleus? Is the IL-1b cleavage mediated through Caspase-1 or Caspase-11? The neutrophils stained positive in the lumen appear to be intact, does this mean that pyroptosis does not occur, or does the IL-1b come from cells that did not migrate through the mucosal membrane? Staining for IL-1 and the different caspases might help resolve this question.

      ASC does not appear to be translocating to the nucleus. In Fig 3D the green signal (ASC) is primarily excluded from the DAPI-stained area. In this human model, Caspase-11 is not present, and inhibition of Caspase-1 is sufficient to block the cell shedding phenotype (Fig. 5A,B and Fig. 6B,C). We are unable to distinguish whether IL-1 is being produced by intact PMNs or PMNs that are undergoing pyroptosis. Unfortunately, there are not suitable antibodies for fixed immunofluorescence staining for cleaved Caspase-1, and as a secreted protein, IL-1 beta likely will not remain localized with the producer cell.

      8) The authors comment that there is substantial TUNEL staining in the mesenchyme independent of STM or PMNs; however, there is no explanation for why this happens. Does this have any downstream effects on the neutrophils that don't migrate towards the lumen?

      TUNEL positivity in the mesenchyme is likely an artifact of culturing and we have noted this in the text (lines 169-172). The extent of TUNEL+ mesenchymal cells appears to be dependent on the batch of HIOs as not all HIOs exhibit this phenotype (for example Figs 4A and 6B). In contrast, the extent of TUNEL+ luminal cells is significantly dependent on the presence of PMNs and Salmonella.

      Minor comments

      1) The authors should remove that MPO is neutrophil-specific, monocytes are known to have higher MPO expression than neutrophils.

      In this controlled co-culture system there are no monocytes, therefore we have modified our text to indicate that MPO is used as a neutrophil marker in the PMN-HIOs (line 161).

      2) If the authors performed flow cytometry as they say, they should provide the flow plots and the gating strategy they used in the supplement.

      Representative flow plots for the data presented in Fig 1A are now included in S2 Fig. The data shown in Fig 1A and S2 Fig are not gated.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Major Comments

      1. Overall the study is convincing and it is well-conducted. This reviewer found it surprising that the PMNs did not alter the total levels of STM in the HIOs as neutrophils are expected to control the infection. Can the authors elaborate on what happens to STM in the lumen if the intraepithelial numbers are reduced? It would be convincing if the authors can extend the infection timeline to see if the neutrophils are capable of killing luminal STM.

      One of the limitations of the HIO model is the lack of flow in the lumen. It is likely that shed cells would be removed from the body following extrusion in vivo. In the HIOs, since the cells are trapped in the lumen, Salmonella could then reinvade and so this phenotype might be even stronger in a model that incorporates flow. We have added this point to the discussion (lines 387-390). Due to the short-lived nature of PMNs, it is difficult to extend the infection beyond 8h. While in vitro experiments with just neutrophils and STM as we and others have performed might set the expectation that neutrophils would alter luminal bacterial levels, there is little to no direct evidence that neutrophil bactericidal activity is critical in the context of the intestinal environment (vs. releasing ROS or inflammatory signals that may have complex indirect effects). Indeed, an advantage of the HIO model is that we are able to test the function of neutrophils in a multi-component system, but one that is still sufficiently simplified that we can do some mechanistic analysis.

      2- It would be powerful to conduct the caspase inhibition on neutrophils prior to HIO co-culturing to convincingly show that the effects of caspase inhibition affect neutrophils, which in turn affect the epithelium, disrupting the epithelial load of STM.

      We appreciated this suggestion. We pretreated the PMNs with a Caspase-1 inhibitor for 1h prior to co-culture with infected HIOs. We found that this was sufficient to block TUNEL cell accumulation in the lumen of infected PMN-HIOs. These results are now presented in Fig 6B,C.

      3- While other caspases are well-established to be involved in Salmonella-related cell death and epithelial shedding, why did the authors pick caspase 3 but not caspase 4/5 to show activation in Fig 2?

      We have now also tested the role of Caspase-4 on cell shedding using z-LEVD-fmk inhibitor. Consistent with prior published studies, we found that Caspase-4 inhibition reduced the accumulation of TUNEL-positive cells in the PMN-HIO lumen. These results are presented in Fig 5. There are no detectable levels of Caspase-5 in the HIOs (S9 Fig).

      Minor comments

      Fig 1C It is not clear how the total bacterial burden was determined. Please include details such as the timepoint and sufficient details of the technique both in the results section and the legend.

      These details have been added in the figure legend (lines 605-607). Briefly, HIOs were washed with PBS and homogenized in PBS at 8hpi. CFU/HIO were enumerated by serial dilution and plating on LB agar.

      • Fig S2. Authors claim that the PMNs form NETs in the lumen; however, the marker used in the immunostaining is MPO. Although NETting is seen in the images, MPO staining is not sufficient to claim these are NETs. Additional staining is required to show if the neutrophils in the lumen are intact or formed NETs.

      As noted in response to Reviewer #1, although we commented on NETs in our original manuscript, our conclusions do not rely on the presence or absence of NETs and our new data implicate PMN IL-1 as necessary and sufficient for the cell shedding phenotype. We have therefore removed the NET data and the reference to NETs. While NETs are potentially interesting in the context of intestinal infection, we understand the reviewer's concern about NETs and anticipate that a more quantitative characterization of NETs may be challenging given the structure and variability of the PMN-HIOs.

    1. ABSTRACT

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.66), and the reviews have been published under the same license. These are as follows.

      Reviewer 1. Linzhou Li

      Are the data and metadata consistent with relevant minimum information or reporting standards? No. Geographic location (country and/or sea, region, latitude and longitude) is missing, as well as environmental context.

      Is there sufficient data validation and statistical analyses of data quality? No. The genome size and gene number of Dendrobium hybrid cultivar ‘Emma White’ differ greatly from the published Dendrobium genomes (e.g. Zhang et al. Scientific Reports 2016, Zhang et al. Horticulture Research 2021, Han et al. Genome Biology and Evolution 2020...). Specifically, the authors assembled a smaller genome and predicted a larger number of genes compared with the previous studies. Therefore, I strongly suspect that the assembled genome is incomplete and fragmented, resulting in more fragmented genes.

      Is the validation suitable for this type of data? No. There's not enough raw data (~24Gb) to assemble a 600Mb (or ~1.2Gb from the previous study) genome. I highly recommend the authors get more raw data and do a genome survey.
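
      The coverage concern above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes that "~24Gb" means roughly 24 gigabases of raw sequence (the review does not say whether this refers to bases or file size), and the function name `expected_coverage` is hypothetical.

```python
def expected_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


RAW_BASES = 24e9  # assumption: "~24Gb" read as 24 gigabases

# Depth under the two genome sizes mentioned in the review.
cov_600mb = expected_coverage(RAW_BASES, 600e6)   # assembled size (~600 Mb)
cov_1200mb = expected_coverage(RAW_BASES, 1.2e9)  # size from the previous study (~1.2 Gb)

print(f"{cov_600mb:.0f}x for 600 Mb, {cov_1200mb:.0f}x for 1.2 Gb")
```

      Under that assumption, the average depth would be about 40x for a 600 Mb genome and about 20x for a 1.2 Gb genome.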

      Additional Comments: The Complete BUSCOs only account for 16.6% which is quite low. The authors explain that the large loss of BUSCOs is due to the fact that the mutant genome has a lot of specific sequences, but these genes are very conserved in plants and should not be easily mutated.

      Reviewer 2. Stephanie Chen

      Is the language of sufficient quality? No. Most of the manuscript is written in a sufficient quality, but there are certain parts that require revision to improve readability. Please see detailed comments on the Word document.

      Are all data available and do they match the descriptions in the paper? No. The SRA link is coming up as a permission error, but I assume it will be released once the paper is available. There is no information on where to access the annotation file.

      Is the data acquisition clear, complete and methodologically sound? No. The contiguity (635,396 contigs, N50 of 1,620 bp) and completeness (16.60%) of the genome are quite low and this may limit its downstream uses. It would be good to incorporate some long reads or increased sequencing coverage to improve your genome. There are a number of chromosome-level Dendrobium genomes that are available (e.g. D. chrysotoxum and D. huoshanense) and scaffolding off these may be attempted to improve the assembly. Scaffolding from existing assemblies may be a good option if generating more sequencing reads is not feasible.
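
      For readers unfamiliar with the contiguity statistic cited above, the following is a minimal sketch of how N50 is conventionally computed from a list of contig lengths. The contig lengths in the example are made up for illustration and are not taken from the manuscript; the function name `n50` is likewise hypothetical.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


toy_contigs = [100, 200, 300, 400, 500]  # hypothetical lengths in bp
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

      A low N50 relative to typical gene length, as in the figures the reviewer quotes, means that many genes are likely to be split across contigs.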

      Is there sufficient detail in the methods and data-processing steps to allow reproduction? No. Some details on the DNA extraction and library preparation steps are missing. In the methods section, there are also missing details for multiple programs in terms of the version and parameters (e.g. BUSCO version and database used, QUAST version, AUGUSTUS version, details on adapter removal and trimming). It is mentioned 'similarity score and description of each gene was filtered out using in-house pipeline'. The script and details of the pipeline are not provided; please add a reference or details in the manuscript e.g. link to GitHub repository.

      Is there sufficient data validation and statistical analyses of data quality? No. The reporting and interpretation of BUSCO results ('BUSCO version 5.2.2 analysis reveals 913 (56.57%) single-copy orthologs doesn’t match with any data bases indicates the unique and possible uncharacterized sequences in mutant genome of Dendrobium hybrid cultivar') need to be revisited. There needs to be additional validation of the gene annotation (e.g. BUSCO, comparison with existing Dendrobium annotations) and also some validation of the genome size (e.g. GenomeScope and comparison with reported flow cytometry measures).

      Is the validation suitable for this type of data? Yes. The type of validation in the manuscript (BUSCO) is suitable to assess genome completeness, but reporting and discussion of the results needs to be revised. Some additional validation is also needed (see box above).

      Additional Comments: In this manuscript, the authors provide a draft genome of a gamma-ray induced mutant of a Dendrobium hybrid cultivar using Illumina sequencing that will assist with future breeding efforts and studies. However, I am not convinced of the genome's usefulness in its current form. There are some methods that need to be described in more detail to be reproducible. Revisions will also help improve the readability of the manuscript. As page and line numbers are not provided on the manuscript, please find additional comments directly added to manuscript file attached.

      https://gigabyte-review.rivervalleytechnologies.com/download-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvRFIvMzA2L1Jldmlld19TQ185Njc2XzA1MDIyMl9HaWdhYnl0ZV9HYW1tYSBXR1MgZGF0YW5vdGUgKDEpLmRvY3g~

      Re-review: Thank you to the authors for addressing the previous comments on the manuscript. I generally find the revisions satisfactory, although I have some follow-up comments. The addition of details on the genetic origin of the Dendrobium ‘Emma White’ hybrid cultivar and requested details on bioinformatic tool versions/parameters have strengthened the manuscript. The authors have not followed up on the suggestion to improve the genome via scaffolding, but provide an explanation that existing chromosome-level assemblies/sequencing data of Dendrobium species are not suitable as they are not related to the hybrid cultivar the authors studied, implying that they are highly diverged and scaffolding would not meaningfully improve the genome. Given this information, I think the Dendrobium ‘Emma White’ hybrid cultivar genome can still be useful for orchid breeding efforts despite low contiguity and completeness. However, I do not agree with the authors’ point: “Third, we used low coverage genome analysis with short reads of gamma mutant Dendrobium hybrid cultivar, as it was the first case study and obtained SRA, genome assembly and TSA accessions from NCBI. The genome assemblies of Dendrobium species from earlier studies used both long reads and short reads in their study. Construction of scaffolding from such database species using our contigs may be skewed and shall give unreliable data based on above points mentioned. Hence, I opinioned that suggestion given by Reviwer 2 on scaffolding suggestion may not be correct point.” Even if different types of sequencing technologies were used in comparison to the Emma White genome, the availability of a contiguous closely related reference genome would still be useful for reference-guided scaffolding of the draft genome as well as comparative analyses. Lines 107-109: Reorder sentence to make the order of the steps clear i.e. adapter removal and quality filtering before assembly with MaSuRCA.
Also, on the MaSuRCA GitHub (https://github.com/alekseyzimin/masurca), it says “Avoid using third party tools to pre-process the Illumina data before providing it to MaSuRCA, unless you are absolutely sure you know exactly what the preprocessing tool does. Do not do any trimming, cleaning or error correction. This will likely deteriorate the assembly.” Did the authors find that the pre-processing meaningfully improved the quality of the assembly, compared to if the raw reads were input straight into the assembler? Please justify the preprocessing of reads. Suggest to reword lines 137-139 “BUSCO version 5.2.2 analysis reveals 913 (56.57%) single-copy orthologs doesn’t match with any data bases indicates the impact from evolutionary development of hybrid cultivars and influence of gamma radiation. It is because, the genome of ‘Emma White’ hybrid cultivar of Dendrobium derived from five unique different species is complex genome and continuously hybridized repeatedly 11 times over a period of 68 years with selection process for economic trait improvement” to make the explanation clearer and also to include the number and/or percentage of complete BUSCOs. This was flagged in the previous comments, but not fully resolved and would benefit from revisiting the interpretation of BUSCO results. There are a large number of missing BUSCOs in your assembly, likely related to low contiguity (as well as radiation which is mentioned). Can you discuss if/how this may be a limitation for using this genome in further studies? You suggest that the BUSCOs are not found in the assembly due to many rounds of trait selection and radiation. It is possible that some of the BUSCOs are indeed missing from the particular plant sequenced, but how can you be certain that this is due to the breeding history and radiation applied as implied in the text, and not low genome contiguity? 
Some papers which characterised gamma irradiation-induced mutations in plants (DOIs: 10.1093/jrr/rraa059, 10.1186/s12864-019-6182-3, 10.1534/g3.119.400555) indicate that it is unlikely as many as 913 BUSCO genes have been affected. Even with stronger doses of radiation than used on the orchid, the number of mutations/genes affected is much lower. The genus name needs to be consistently italicised throughout the manuscript.

      Re-re-review: Thank you to the authors for addressing the previous comments on the manuscript. The authors have followed up on the suggestion to scaffold the genome by using the published Dendrobium huoshanense genome to scaffold their draft genome using RagTag. This is an appropriate tool to use and has improved the contiguity of the draft assembly which is good to see. In the methods, the version of RagTag is missing, as are the parameters used to run the program. Please also specify the RagTag utilities used (correct, scaffold, patch and/or merge). The authors have added genome statistics for two other orchids and the scaffolded assembly in Table 1; however, they have not added BUSCO results for their scaffolded assembly in Table 2. Also, can the authors provide a comment on whether the low BUSCO values may be related to the fragmented assembly as brought up in the previous round of review? It will be interesting to see if BUSCO has improved with the scaffolding. BUSCO results for the other two species, D. catenatum and D. huoshanense, would also be a good point of comparison and this is relatively simple and quick to add. The authors could consider concatenating Table 1 and 2 in this case. The draft assembly has improved, and the authors should report numbers on the final version of the assembly presented in the paper (i.e. the scaffolded assembly) in terms of the analysis they have run. In the results and discussion section, it appears some of the statistics (e.g. 96,529 genes, 216,232 SSRs) still refer to the first draft assembly. The authors have clarified that raw reads were used as input into MaSuRCA (line 111) and have now included the necessary detail for the input and parameterisation of the program. Line 157-159: “Taxonomical analysis of mutant Dendrobium at raw sequence data also revealed limited synteny with its closest Dendrobium catenatum species at below 9% and genetically heterogeneous with outcrossing nature”.
Details of how this analysis was done are missing from the methods. It may be more appropriate to perform synteny analysis at the genome level and compare the published D. catenatum genome with the scaffolded Dendrobium hybrid genome.

      Editor's comment: Additional Editorial Board assessment and feedback was received during the review process.

    1. Reviewer #1 (Public Review):

      The present study aims to define the main immune cell subsets found in the hemolymph of the white shrimp, P. vannamei. This is significant because this species is heavily farmed around the world to meet the demand of the human consumption market. Yet, farmed shrimp suffer from infectious diseases and therefore we need to understand how their immune system works to design strategies that decrease infection losses.

      Classification of crustacean (and other invertebrates) hemocytes is difficult due to the lack of antibodies to use traditional flow cytometry approaches. Furthermore, hemocyte purification is not easy, cells die and clump, again precluding flow cytometry studies. Thus, the majority of what we know about shrimp hemocytes is based on morphological classification. This study contributes significantly to advancing our knowledge of shrimp Immunobiology by defining hemocyte subsets based on their transcriptional profiles.

      Another strength of the paper is that some functional in vivo assays (phagocytosis) are presented in an attempt to validate the single-cell data. The authors try to frame their question with a more evolutionary angle, such as whether the macrophage-like cell is the evolutionary precursor of human macrophages. I think that this question is not really answerable because the evolution of innate immune systems may have diverged in many branches of the metazoan tree of life. The authors, however, identify gene markers that are conserved in macrophages from shrimp and humans, and that is a fair conclusion. There are some methodological caveats to the study, and the manuscript needs to be heavily edited to improve the language as well as to increase the depth of the interpretation.

      In summary, there are interesting findings in this manuscript but the manuscript needs to be significantly improved so that its quality and impact are elevated.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Dr Riley and colleagues reports a novel link between the molecular clock operative in skeletal muscle and titin mRNA, which encodes an essential regulator of sarcomere length and muscular strength. Surprisingly, this clock-mediated regulation of titin occurs at the level of splicing, as demonstrated by SDS-VAGE analyses of skeletal muscle from muscle-specific Bmal1KO mice compared to their Bmal1wt counterparts. Concomitant with the switch in the predominant titin isoform, skeletal muscle of muscle-specific Bmal1KO mice exhibited irregular sarcomere length. Moreover, the authors show that this shift in titin splicing is causal for such sarcomere length irregularity and for altered sarcomere length in muscle from mice with compromised clock function. Importantly, the authors provide compelling evidence that Rbm20, which encodes an RNA-binding protein that mediates splicing of titin, is cooperatively regulated by the Bmal1-Clock heterodimer and MyoD via an enhancer element in intron 1 of Rbm20, thus identifying Rbm20 as a novel direct clock-regulated gene in skeletal muscle. Strikingly, rescue of Rbm20 in muscle-specific Bmal1KO animals results in rescue of the titin splicing pattern and protein size, suggesting that Rbm20 mediates the regulatory effect of Bmal1 on titin splicing and represents a mechanistic link between the clock and a regulator of sarcomere length and regularity.

      We thank reviewer 1 for the very kind comments. We agree that the circadian regulation of titin in any capacity is surprising. We are excited about the implications of our work for cardiac muscle and its therapeutic potential in human skeletal muscle.

      Reviewer #2 (Public Review):

      In this work the authors investigated whether deleting the BMAL1 gene, an integral component of the cellular clock that drives the circadian rhythms of cells, affects the giant protein titin. They report that deleting BMAL1 in skeletal muscle alters the splicing of titin and that this might underlie an increase in sarcomere length dispersion. They show that the effect is through the titin splicing factor RBM20. This work has high novelty and has the potential to add to our understanding of muscle physiology. It is unclear whether splicing of skeletal muscle titin indeed undergoes a circadian rhythm. This could be easily checked using protein gels or RNA seq in muscle samples collected at different times of the day.

      We appreciate the question and recognize that our original manuscript did not clearly outline that the circadian clock regulates both rhythmic and non-rhythmic gene expression. In this study, the target of the muscle clock is expression of Rbm20 mRNA which is not a rhythmically expressed gene in muscle. This has now been addressed in the manuscript.

      Based on the estimated titin turnover and incorporation rates of titin (Cadar et al., 2014), we do not believe that skeletal muscle titin splicing undergoes a circadian rhythm. However, we believe our data highlights the growing recognition of the molecular clock in regulating non-rhythmic processes. We have added data from a chronic phase advance model of circadian disruption with wildtype mice and identify that disrupted circadian rhythms are sufficient to change Rbm20 expression in skeletal muscle (Figure 5).

      This work would be more convincing if the sarcomere length dispersion was investigated in greater detail. Showing this in one muscle type only (TA), in muscles fixed at one length only, and not showing sarcomere length dispersion in the rescue experiment of Figure 6, is rather limited.

      We agree that our analysis of sarcomere length dispersion across joint angles would be interesting but we think it is beyond the scope of this study. As noted above, the premise of this study emerged from our early work in which we found that skeletal muscle from 2 different genetic mouse models of circadian disruption, Bmal1 KO mice as well as the Clock mutant mice, exhibit decreased maximum specific force with significant disruptions to sarcomere structure (Andrews et al., PNAS, 107 (44) 19090-19095 2010). The primary focus of this study was to address the mechanistic link between the muscle circadian clock, its transcriptional targets with a focus on sarcomere structure and our first clue was with the expression of titin isoforms. We included analysis of sarcomere length as an outcome measure because it is a fundamental feature of skeletal muscle, it has links to mechanical function and it is a structure that can be modified by titin spliceforms.

      A small increase in sarcomere length variation as suggested in Figure 2 is unlikely to have a great functional consequence. If it were, how can muscles that express naturally long titin isoforms (soleus, EDL, diaphragm, etc), function well?

      We did not intend to suggest that we see an increase in sarcomere length in Figure 2 and have clarified the figure and text accordingly. The change we see is related to the variability of sarcomere length; we do not see any change in the average sarcomere length. The topic of titin spliceform specialization and the contribution to sarcomere structure and function across different muscle groups (soleus vs. EDL vs. Diaphragm) is a really interesting question but beyond the scope of this study.

      Reviewer #3 (Public Review):

      This manuscript uses an inducible and skeletal muscle-specific Bmal1 knockout mouse model (iMSBmal1-/-) that was published previously by the same group. In this study, the authors utilized the same mouse model to further investigate the effect of the core molecular clock gene Bmal1 on isoform switching of the giant sarcomeric protein titin and on the sarcomere length change resulting from titin isoform switching. Lance A. Riley et al. found that iMSBmal1-/- mouse TA muscle expressed more of the longer titin isoform, due to additional exon inclusion in Ttn mRNA, compared to iMSBmal1+/+ mice. They observed that sarcomere length did not change significantly but was more variable in iMSBmal1-/- muscle compared to iMSBmal1+/+ muscle. In addition, they identified significant exon inclusion in the proximal Ig region, so they measured the length of the proximal Ig domain and confirmed that it was significantly longer in iMSBmal1-/- muscle. Subsequently, they experimentally generated a shorter titin in C2C12 myotubes and observed that the shorter titin led to a shorter sarcomere length. Since RBM20 is a major regulator of Ttn splicing, they determined the RBM20 expression level and found that it was significantly lower in iMSBmal1-/- muscle. The reduced RBM20 expression was regulated by the molecular clock-controlled transcription factor MyoD1. By performing a rescue experiment in vivo, the authors found that rescue of RBM20 in iMSBmal1-/- TA muscle restored titin isoform expression; however, they did not measure whether sarcomere length was restored. These data provide new information that the molecular cascades in the circadian clock mechanism regulate RBM20 expression and downstream titin isoform switching and sarcomere length change. Although the conclusion of this manuscript is mostly supported by the data, some aspects of the experimental design and data analysis need to be clarified and extended.

      Strengths:

      This paper links the circadian rhythms to skeletal muscle structure and function through a new molecular cascade: the core clock component Bmal1-transcription factor MyoD1-RBM20 expression-titin isoform switching-sarcomere length change.

      Utilization of muscle-specific Bmal1 knockout mice could rule out confounding factors from the molecular clock in other cell types.

      The authors performed the RNA sequencing and label free LC-MS analyses to determine the exon inclusion and exclusion through a side-by-side comparison which is a new approach to identify individual alternative spliced exons via both mRNA level and protein level.

      We agree that the side-by-side analyses of RNAseq and LC-MS data are novel and provide a foundation for others wanting to study both titin mRNA and protein. In this version, we have expanded this work to include samples from our Rbm20 rescue model (Figure 6). Similar to our approach in the muscle-specific Bmal1 knockout model, these results confirm our RNA-seq results and indicate that LC-MS is a suitable method to measure titin protein isoforms. We note that while more work is needed to confirm the broad utility of the LC-MS approach, it may be a suitable alternative to RNA-seq for measuring region-specific, and possibly exon-specific, changes in titin isoform expression.

      Weaknesses:

      Both RBM20 expression and titin isoform expression vary in different skeletal muscles. The authors only detected their expression in TA muscle. It is not clear why the authors only chose TA muscle.

      The reviewer, like Reviewer 2, raises a good point about muscle specificity, as this is a significant challenge for research in the field of skeletal muscle. As we noted above, our primary focus was on the TA because our goal was to study the molecular links between the muscle circadian clock and titin expression, with inclusion of analysis of a structural outcome, sarcomere length variability. This muscle is well suited for the combination of approaches employed. We recognize the limits of using a single muscle, but we note that the ChIPseq data that provided the initial clues that CLOCK and BMAL1 bind to a site within intron 1 of the Rbm20 gene came from gastrocnemius and not TA muscle samples. Our targeted ChIP-PCR confirms that CLOCK and BMAL1 bind to the same intron 1 location in TA muscle samples. In addition, we have included data from quadriceps and TA muscles in our chronic jet lag model, in which we use an environmental manipulation to disrupt the muscle clocks. We believe that the edits to the text and inclusion of these data strengthen our findings and extend them to other muscles through circadian disruption and not only a genetic knockout model.

      The sarcomere length data are self-contradictory. The authors stated in Line 149 that sarcomere length was not significantly changed in muscle-specific KO mice; however, in Line 163, the measurements showed that it was significantly longer in muscle-specific KO muscle. The significance is also indicated in Figures 2C and 3B.

      We apologize for the miscommunication. The significance indicated in Figure 2C refers to the significant difference in variability of sarcomere length and not a significant difference in sarcomere length. The significance indicated in Figure 3B refers both to a slightly but significantly longer sarcomere length than control and to a significant difference in sarcomere length variability. To make this difference clear, we have changed the symbol for significantly different variability from * to # in both Figures 2C and 3B. We hope this clarifies our findings.

      Manipulating titin size using U7 snRNPs to link it to changes in sarcomere length, and overexpressing RBM20 to switch titin size, are concepts that have already been proven. These data do not directly support the impact of muscle-specific Bmal1 KO on Ttn splicing and RBM20 expression.

      We agree that the use of U7 snRNPs does not directly support the impact of muscle specific Bmal1 KO on titin splicing and RBM20 expression; however, that was not the goal of this set of experiments. Several papers have recently indicated titin’s role as a sarcomeric ruler (Tonino 2017, Brynnel 2018), but none of them have investigated the proximal Ig domain that we identified as regulated by the circadian clock disruption. Because of this, we thought it necessary to show this region specifically contributes to sarcomere length using our cell culture model. Further, we think this point strengthens our study as it suggests that in the absence of a clock effect, altering the proximal Ig domain of titin directly alters sarcomere length adding to the growing evidence base that titin acts as a sarcomeric ruler. We have edited the text of the results and the discussion to clarify this point.

      There is no evidence to show if interrupted circadian rhythms in mice change RBM20 expression and ttn splicing, which is critical to validate the concept that circadian rhythms are linked to Ttn splicing through RBM20.

      We recognize this concern and have performed a new study in which we used a model of chronic jet lag in normal adult C57BL6 mice as a model to disrupt the muscle clock (Wolff, Duncan and Esser, JAP 2013). This new data has been added in Figure 5 and shows that by altering the lights on: lights off schedule every 4 days for 8 weeks, mimicking repeated jet lag, we disrupt Rbm20 expression in TA and gastrocnemius muscle (note, this is new data for both the muscle and clock fields). Concomitant with changes in clock gene expression we reported in 2013, we found that mRNA expression of Rbm20 is altered as well. These findings confirm that normal muscle clock disruption is sufficient to alter expression of Rbm20.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      This is already a full revision, not a revision plan. All points were carefully addressed. TMF

      July 28, 2022

      RE: Review Commons Refereed Preprint #RC-2022-01555

      Dear Dr. Fuchs,

      Thank you for sending your manuscript entitled "Dissecting the invasion of Galleria mellonella by Yersinia enterocolitica reveals metabolic adaptations and a role of a phage lysis cassette in insect killing" to Review Commons. We have now completed the peer review of the manuscript. Please find the full set of reports below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Saenger et al. concentrate on the pathophysiological details of insect larvae infection by Yersinia enterocolitica. The authors studied the colonisation, proliferation, tissue invasion, and killing activity of the bacteria in Galleria mellonella larvae. Their study provides valuable evidence for the biological relevance of Tc toxins and a neighboring holin-endolysin cassette during establishment of Y. enterocolitica infection in Galleria mellonella larvae through the oral route. The findings of the authors provide important novel insights, that can be used for the development of Tc toxins as biopesticides.

      In general, this is a nice study. The data and the methods are presented well so that they can be reproduced, and the key conclusions are convincing.

      Unfortunately, the manuscript is sloppily written in some places, including grammatical and formatting errors. Citations regarding the structure and mechanism of action of Tc toxins are arbitrarily chosen, often taking the wrong ones and important aspects are left out. I highly recommend that the authors read the review of Roderer and Raunser 2019 that nicely describes and summarizes the molecular mechanism of Tc toxins.

      Answer: We have now improved the writing of the manuscript and corrected several errors and typos. In particular, the review by Roderer and Raunser, as well as other literature in the field, is now considered and cited in the text.

      The abstract ends with a speculation: "Suggesting that this dual lysis cassette is an example for a phage-related function that has been adapted for the release of a bacterial toxin" - this is likely true, but not proven in this work. What if it is used for the release of something else like extracellular DNA needed for biofilm formation (see https://doi.org/10.1038/ncomms11220)?

      Answer: This sentence was carefully written as a hypothesis strengthened by the data obtained in our study. Experimental evidence for this assumption is the strong correlation of toxin and HE cassette phenotypes of mutants (see abstract), the highly conserved localisation of the cassette within Tc loci of distinct bacterial genera (see discussion for literature), and the synchronous regulation of both the toxin and the lysis genes (manuscript in preparation). Moreover, strain W22703 is unable to form biofilms in contact with invertebrates (Spanier et al., AEM 2010). Therefore, also in accordance with other reviewers, we would like to keep this statement in the text. However, to address this interesting point, we now mention the finding of Turnbull et al. in the discussion (see last paragraph).

      In addition to that, several outstanding issues must be addressed:

      1. Line 45 3-D structural analysis of the tripartite Tc suggests a 4:1:1 stoichiometry of the A, B and C subunits, with the A subunit forming a cage-like pentamer that associates with a tightly bound 1:1 sub-complex of B and C. This is wrong. The stoichiometry is 5:1:1 and the structure is not a cage. The statement was taken from citation 3. However, citation 3 should not be used, since the stoichiometry as well as the structure that was determined there is wrong. Use Landsberg et al. 2012 PNAS, Gatsogiannis et al. 2013 Nature instead.

      Answer: We apologize for misunderstanding the literature. Reference Lee et al. was removed here, and the two papers plus Meusch et al. (Nature, 2014) are now cited. The stoichiometry was corrected, “cage” was removed.

      "Few bacteria are known to successfully colonize and infect invertebrates" - needs a reference.

      Answer: This was modified to “Several bacteria…”, and we cite the recent paper by Weber and Fuchs (in press) that in Table 7g lists more than 40 bacterial species pathogenic towards insects.

      "Their oral insecticidal activity is comparable to that of the Bacillus thuringiensis- (Bt)- toxin" - reference missing.

      Answer: The reference is now cited (Bowen et al., Science 1998). Please see the last paragraph of the paper.

      "Type a, type b and type c" subunits is not usual for the literature. Please use TcA, TcB, TcC. A-, B-, and C-components should be abbreviated as TcA, TcB and TcC respectively in order to be in line with recent literature on the topic.

      Answer: This was corrected accordingly.

      Is TccC an ADP-ribosyltransferase or does it have a different biochemical activity?

      Answer: This is unknown with respect to the Tc of Y. enterocolitica. In the introduction, we now refer to P. luminescens and do not further attribute such a function to the TcC of Y. enterocolitica. In the abstract, we replaced “ADP-ribosylating” with “toxic”.

      "The toxic and highly variable carboxyl-terminus of TccC that has recently been demonstrated to ADP-ribosylate actin and Rho-GTPases" - this is only certain for TccC3 and TccC5 from P. luminescens. There are many such C-termini, called HVRs which have not had their activities determined yet, see here: https://doi.org/10.1371/journal.ppat.1009102

      Answer: We agree and cite this article. See also the response to comment 5 above.

      "is probably followed by receptor-mediated endocytosis" - more recent references exist for the receptor binding of Tc toxins.

      Answer: We added two references pointing to glycans as receptors of the Tc (line 52).

      "A pH decrease then triggers the injection of a translocation channel formed by the pentameric TcaA subunits into the endosomal vacuole, followed by the subsequent release of the BC subcomplex into the cytosol of the target cell" - this again is incorrect. Please read the above mentioned review and correct this passage accordingly.

      Answer: We agree. This phrase was rewritten to “The attachment of the Tc to the host cell membrane is either followed by receptor-mediated endocytosis or release of the ADP-ribosyltransferase into the target cell (Landsberg, 2011; Sheets, 2011; Meusch, 2014). In a pH-dependent manner, the TcA translocation channel is injected into the membrane of the host cell. Conformational changes then allow the toxic component to be released into the translocation channel of TcA and from there into the cytosol (Meusch, 2014; Roderer, 2019).” (Lines 51-56)

      What is meant by "environmental Yersinia species"?

      Answer: This was corrected to “…and in Y. mollaretii.”

      In the relevant W22703 pathogenicity island sequence (https://www.ncbi.nlm.nih.gov/nuccore/AJ920332) previously submitted by the same group, something odd is going on with the TcA component: it appears to be split into three polypeptides (tcaA, tcaB1, tcaB2). In the manuscript you state TcA is made up from only tcaA and tcaB. Could you please address this?

      Answer: Shotgun sequencing was performed 15 years ago, and mapping revealed a frameshift within tcaB that resulted in the split annotation of tcaB. Even if this frameshift is not the result of a sequencing error, it obviously does not result in Tc inactivation. As this frameshift was not identified in most other Tc-PAI of yersiniae, we assume our statement to be correct.

      "And their products were recently shown to act as a holin and an endolysin, respectively" - missing reference.

      Answer: The reference is now cited (Springer et al., JB 2018).

      "Its Tc proteins are produced at environmental temperatures, but silenced at 37°C." versus "Remarkably, HolY and ElyY lyse Y. enterocolitica at body temperature, but not at 15°C". Please address the issue that HolY/ElyY lyse the bacteria at temperatures where Tc proteins are not produced.

      Answer: In the absence of in vitro conditions activating the HE gene cassette, we used the pBAD system to artificially overexpress the two genes and showed cell lysis at 37°C, but not at 15°C (Springer et al., JB, 2018). This finding points to a lack of cell lysis as a prerequisite for Tc release and strengthens the hypothesis of a new secretion system, as now corroborated in the last paragraph of the discussion. To avoid confusing readers, the sentence was removed from the manuscript.

      "Nematodes, which are easily maintained in the laboratory without raising ethical issues, have successfully been used to identify virulence-related genes in a broad set of bacterial pathogens" - what is the relevance of this for the current manuscript?

      Answer: Invertebrates are introduced here as infection models. Nematodes are mentioned here for two reasons: yersiniae are nematocidal due to the Tc, and their immune system is less elaborated than that of G. mellonella, thus explaining its preferred use as insect model. We shortened the sentence by deleting the phrase in commas.

      Fig. 1C - no description is given for the labels 1-8.

      Answer: This is given below figures 1E-H. The labels are valid for all figure panels to ease reading.

      "The hemolymph of these cadavers was found full of Y. enterocolitica cells" - injected CFUs are provided here, but not final CFUs in the cadavers (although referred to in a later section). Please address this.

      Answer: These were preliminary experiments to identify the optimal infection dose. Hemolymph content was plated, but cell numbers in the hemolymph were not enumerated. This sentence therefore now reads: “…and the hemolymph of these cadavers contained Y. enterocolitica cells.” (lines 113-114).

      What is the inducing agent used for pACYC-tcaA and pACYC-HE? Why would "slight leakiness of the pBAD-promoter" make pBAD-tccC non-inducible? Were colonies taken from the cadavers to verify that the bacteria still contained these plasmids?

      Answer: Within pACYC, the genes tcaA and hlyY/elyY (HE) are under control of their own promoters, as indicated in Table S2. In general, pACYC vectors are often and successfully used for complementation due to their medium copy number.

      This now reads “Due to the slight leakiness of the pBAD-promoter, arabinose was not added to further induce tccC transcription.” (lines 133-134).

      The presence of the plasmids in vivo was confirmed by periodic plating on selective and non-selective plates, not revealing differences in cell numbers.

      Can the authors please address the TD50 of 1.83 days for W22703 ΔHE/pACYC-HE versus 3.67 days for WT bacteria? This would mean that the former kill larvae twice as fast as usual. I would not call this "did not significantly differ in their insecticidal activity".

      Answer: This statement is indeed not very intuitive given the variations of the TD50-values. However, the significance here (and elsewhere in the text) is based on a statistical calculation. For the Kaplan-Meier-plot, we used an application (K.T.Bogen, Advances in Molecular Toxicology, 2016; Exponent Health Sciences, Oakland, CA, United States; Johann Kummermehr, Klaus-Rüdiger Trott, Stem Cells, 1997; Academic Press, London, San Diego) based on all data of a graph. However, to consider this point and to not confuse the readers, the phrase was modified to “…did not significantly differ in their insecticidal activity from that of the parental strain W22703 after one week, demonstrating…” (lines 135-138).

      Fig. 2 is missing survival data for larvae infected with tcaA, HE, and tccC KO bacteria.

      Answer: These data are shown and are equal to the LB-control, e. g. the survival rate of larvae infected with strains W22703 lacking HE, tcaA, or tccC were 100%.

      "And a slight colouring of some of the larvae from one h p.i. on (data not shown)" - best show the data or remove this statement.

      Answer: Although we observed this phenomenon regularly, monitoring and documentation cannot be provided and would not substantially strengthen the manuscript. We therefore deleted this phrase.

      The infection of larvae by W22703 ΔtccC/pBAD-tccC is missing, the other bacterial variants are present. Please address this.

      Answer: Infections with W22703 ΔtccC are not shown so as not to overload the figure; please see the panel below. W22703 ΔtccC/pBAD-tccC infections have not been documented by photos. Figure legend 3 now reads “Infections with W22703 ΔtccC and ΔtccC/pBAD-tccC are not shown.”

      "initially proliferated from an application dose of 4.0 × 10^5 CFU and 4.0 × 10^5 CFU, respectively, to 2.2 × 10^6 CFU and 2.8 × 10^6 CFU, but could not be detected from day three on. This finding strongly suggests that TcaA is involved in adherence to epithelial cells and thus in midgut colonization". Please address the "initially proliferated" (which day post-infection?), their elimination from the larvae (how, why?), why the tccC KO bacteria were more virulent than tcaA KO bacteria, and where the suggestion about TcaA involvement specifically in adherence comes from.

      Answer: “initially proliferated” was rewritten to “proliferated within the first day p.i.”. (line 163)

      Elimination: This now reads “…was completely absent six days p.i., probably due to passage through the gut followed by excretion”. (lines 161-162)

      In our view, the tccC knockout mutant is not more virulent than W22703 ΔtcaA (see Fig. 2), but replicates during the first day post infection, whereas the cell numbers of the tcaA KO mutant strongly decrease already within the first 24 h p.i. This prompted us to speculate that Tc is involved in two infection steps, e.g. adherence and hemocyte inactivation. For clarity, this sentence was modified to: “This discrepancy suggests that TcaA is involved in adherence to epithelial cells and thus in midgut colonization, without requiring TccC.” (lines 165-166)

      In Fig. 4, the CFUs for W22703 ΔtccC/pBAD-tccC are essentially the same as for the other rescued KOs and WT, while in the text a point about weaker growth is made. Is this justified? Also, even though the CFU data is present here, data on infection of larvae by W22703 ΔtccC/pBAD-tccC is missing unlike the other bacterial variants. Please explain.

      Answer: We agree that this part of the results is misleading. We want to stress that the complementation very well restores the phenotype of the wildtype. The weaker growth of ΔtccC may be due to the distinct vector system used here. This part was therefore shortened and rephrased to: “When larvae were infected with 4.0 × 10^5 CFU of the ΔtcaA and ΔHE mutants, and with 1.4 × 10^6 CFU of strain W22703 ΔtccC/pBAD-tccC, all of which carry the deleted genes on recombinant plasmids, the bacterial burden at days one to six p.i. increased approximately to that of the parental strain W22703 applied with 9.0 × 10^5 CFU, indicating a successful complementation of the gene deletions.” (lines 166-170).

      Missing data on W22703 ΔtccC/pBAD-tccC infection in Fig. 3: please see the answer to point 20 above.

      Fig. 6b - The presence of an anti-RFP signal is not obvious in any of the bottom row images. The top row images are missing the same kind of annotation provided for Fig. 6a, without which non-histologists will find understanding the figure difficult.

      Answer: The anti-RFP signal is visible only on the left photo of the bottom panel, and not in the other three photos as explained in the text. We understand that the signals are not very strong, but they are visible on the screen.

      "In the absence of the lysis cassette, however, TcaA::Rfp was not detected despite the presence of W22703 ΔHE tcaA::rfp cells." + "To test whether or not the promoter of the lysis cassette is active in vivo, we infected G. mellonella larvae with strain W22703 PHE::rfp. Although Y. enterocolitica cells densely proliferated within the hemolymph (FIG. 6B), no staining signal that would point to the presence of TcaA was obtained, possibly due to no or weak PHE activity." Does this mean that without HE, tcaA does not express?

      Answer: No, we performed Western Blots showing that TcaA is detected in cells lacking HE. Therefore, a negative feedback regulation (e. g. increasing intracellular amounts of TcaA repress its own transcription) can be excluded. This is also in line with the low transcriptional activity of the lysis cassette in vivo (new Fig. S1B).

      "These data suggest that the HE cassette is responsible for the extracellular activity of the insecticidal Tc." Please explain how the preceding paragraph leads to this conclusion.

      Answer: This was poorly written and now reads “…for the transport…” (line 224).

      "As expected, bacterial cells, e.g. Y. enterocolitica, are visible in the hemolymph obtained from W22703-infected animals, but not in all other preparations." - which figure are the authors referring to?

      Answer: We have indeed identified, but not immunostained, bacterial cells in those preparations, but they are not visible in Fig. 7. This sentence was removed. However, the presence of W22703, but not its tc-PAIYe-mutants, in the hemolymph is demonstrated in Fig. 6A.

      "To delineate the transcriptional profile of Y. enterocolitica during infection of G. mellonella, we applied immunomagnetic separation to isolate Y. enterocolitica from the larvae 12 h and 24 h after infection" - do the authors store the bacteria for up to 24 h at 4 °C, as indicated in the methods section?

      Answer: Yes, the probes were stabilized with RNAlater and then stored up to 24 h to synchronize all samples of one experiment.

      "The endolysin located within Tc-PAIYe was significantly up-regulated after 24 h, but not after 12 h, pointing to its possible role in the release of the Tc" - I could not find the endolysin in Table S1. Could the authors mark it clearly? Also, why is the holin also not upregulated?

      Answer: The endolysin gene is lacking in Table S1 due to its FC=1.02. We now added a table to Fig. S1 that shows the FC values of all genes from Tc-PAIYe. The FC value of the holin gene is 0.87, pointing to only very slight transcription of this lysis gene, as discussed, which prevents cell death.
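      To illustrate why genes with a fold change close to 1 drop out of a table of differentially expressed genes, here is a minimal sketch. The two FC values are taken from the answer above; the two-fold cutoff and the function name are assumptions made for illustration only, not the criteria actually used for Table S1.

```python
import math

def is_differentially_expressed(fold_change, min_fold=2.0):
    """Return True if the fold change is at least `min_fold` up or down.

    `min_fold` is a hypothetical cutoff chosen for illustration.
    """
    return abs(math.log2(fold_change)) >= math.log2(min_fold)

# FC values quoted in the answer above.
tc_pai_genes = {"endolysin": 1.02, "holin": 0.87}

reported = {gene: is_differentially_expressed(fc)
            for gene, fc in tc_pai_genes.items()}
print(reported)
```

      Under this assumed cutoff, neither gene would be listed, which matches the statement that the endolysin gene is absent from Table S1.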

      "This is in line with the fact that a T3SS is lacking in strain W22703" - Is a complete genomic sequence available for this strain, so readers could validate this statement?

      Answer: The genome sequence is available, and the reference is now cited (line 358). The common virulence plasmid of yersiniae, pYV, which encodes the T3SS, is missing in this strain. We do not mention here the presence of a second, but probably incomplete, chromosomally encoded T3SS in strain W22703, so as not to overload the manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This is a very, very nice study as it actually describes the role of different Tc toxin components in a model infection system using an important bacterium- really for the first time in a properly controlled manner. The mutants lacking either the syringe (AB) or the bullet (C) make 'sense' for a loss of function perspective. The description of the phage cassette in loss of function is also interesting and could do with some more speculation? For example, some groups of Photorhabdus bacteria release their oral toxicity (Tc's) into their bacterial supernatants- whereas in others it remains cell associated. The likely role of this phage cassette in this process should be discussed (is cell suicide required for release?).

      Answer: We now discuss the possible role of the lysis cassette in more detail, including the possibility that a subpopulation commits cell suicide (see lines 375-396).

      Reviewer #2 (Significance (Required)):

      This is highly significant finding as despite all of the very elegant structural studies done on these important toxins there is still very little work in vivo. These studies clearly show the role of the different components of these ABC toxins in vivo. It should be published with priority.

      Congratulations to the authors.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors analyze the phases of infection of Galleria mellonella by Yersinia enterocolitica following forced oral feeding. They study different phases of infection, including survival within the gut and invasion of the hemolymph. By analyzing differences in the genes up- and down regulated, they show that for example transporters for food sources from the hemocoel are regulated for making those sources available for the bacteria.

      Major comments: This is an interesting paper demonstrating genes of Y. enterocolitica dependent for colonization, growth and crossing of the epithelial gut barrier in G. mellonella.

      Major points which have to be addressed:

      Introduction: line 54: the BC subcomplex is not released into the cytosol! It is only the hypervariable region (enzymatic part) which enters the cytosol. This has to be corrected.

      Answer: This has been corrected accordingly.

      Fig.2/3: Why have different CFU been used for the distinct bacterial strains? This does not allow a direct comparison of their toxicity. For me the dead larvae shown in Fig. 3 are not represented in Fig 2 (data are not concordant), because of the loss before day one depicted in Fig. 2: The curves should be normalized to the same starting point (should be 100 %)?

      Answer: We would like to stress here that infection doses are hard to reproduce if frozen and diluted stocks are used. We decided on overnight cultures to better mimic natural conditions and controlled each culture for its viable cell numbers by plating. Moreover, we chose the infection doses in a conservative manner, e.g. the number of mutant cells was higher than that of the parental strain.

      The data of Fig. 3 are concordant with Fig. 2 for two reasons: First, this experiment was performed in replicates with a total of 36 larvae per strain (see Fig. 2 legend), so that representative photos are shown. Second, larvae were considered dead if they failed to respond to touch, and many larvae without strong signs of melanisation were already dead.

      We analysed the algorithm of the Kaplan-Meier plot. All graphs start at 100%; this is now mentioned in the legend. There are no data between day 0 and day 1, and a stepwise graph is essential for this plot.
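      The stepwise shape can be made concrete with a minimal sketch of the Kaplan-Meier product-limit estimate. The group size of 36 larvae follows the Fig. 2 legend mentioned above; the daily death counts and the function name are hypothetical and serve only to show that the curve starts at 100% and changes only at observation days.

```python
from collections import Counter

def kaplan_meier(death_days, censored_days=()):
    """Return step points [(day, surviving_fraction)], starting at (0, 1.0).

    death_days: day of death for each larva that died.
    censored_days: last observation day for each larva still alive.
    """
    deaths = Counter(death_days)
    censored = Counter(censored_days)
    at_risk = len(death_days) + len(censored_days)
    survival = 1.0
    curve = [(0, 1.0)]  # every curve starts at 100 %
    for day in sorted(set(deaths) | set(censored)):
        d = deaths.get(day, 0)
        if d:
            # Product-limit update: fraction of at-risk larvae surviving this day.
            survival *= (at_risk - d) / at_risk
            curve.append((day, survival))
        at_risk -= d + censored.get(day, 0)
    return curve

# Hypothetical counts for one strain: 36 larvae, monitored for nine days.
death_days = [1] * 18 + [2] * 6 + [3] * 3
censored_days = [9] * 9
curve = kaplan_meier(death_days, censored_days)
for day, fraction in curve:
    print(f"day {day}: {fraction:.1%}")
```

      Because larvae are scored once per day, the estimate is constant between day 0 and day 1 and drops only at day 1, which is why the plot is drawn as a step function.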

      Fig. 3: Why is the strain W22703 delta tccC/pBAD - tccC missing in the data set?

      Answer: Infections with W22703 ΔtccC are not shown so as not to overload the figure; please see the panel below. W22703 ΔtccC/pBAD-tccC infections have not been documented by photos. Figure legend 4 now reads “Infections with W22703 ΔtccC and ΔtccC/pBAD-tccC are not shown.”

      Minor: line 221: "the" is doubled

      Answer: This has been corrected accordingly.

      Reviewer #3 (Significance (Required)):

      The manuscript shows the use of G. mellonella as a straightforward method to study gene functions of pathogenic bacteria, a significant knowledge for scientists of the field.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary: Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate).

      Answer: There are already three sections that summarize the results and the methods applied, namely the abstract, the last paragraph of the introduction, and the conclusion following the discussion. In our view, a further summary would overload the manuscript. Nevertheless, depending on the journal in which the manuscript will be published, an additional authors' summary could be provided.

      Outlines proposed role of lysis cassette in oral infection of Galleria as a model insect for host pathogen interaction, data which is fortified through use of histology and RNAseq.

      Introduction could extend to additional background eg Aleniz et al and other entomopathogen transcriptome data, more so other studies using Yersinia and Galleria as a model (refer references provided in the below comments)

      Answer: We again carefully screened PubMed for studies in the field and added a few papers. However, in vivo transcriptome analyses are still rare, as indicated by the lack of such investigations with the highly relevant entomopathogen Photorhabdus luminescens. The literature suggested by the reviewer is now cited in the introduction and the discussion (see below for details).

      The strength of the paper lies in understanding the progression of the disease in the insect host as mentioned L316-317 and clearance of the bacteria via in TcaA mutant

      Major comments: - Are the key conclusions convincing? Yes for mode of action Fig 5 could have additional panels -this is a strength of the paper

      Answer: We agree that this time course is a strength of the paper, and we carefully selected representative photos. Several more could be shown, but in our view they would be illustrative rather than providing substantial additional value.

      Fig 6 legend could better describe the observed insect components

      Answer: The insect components are now indicated in Fig. 6B and in Fig. 5.

      Figure 7 may be lost in PDF conversion -the figure appears un resolved? are there more high resolution photos

      Answer: Fig. 7 was present in the merged PDF provided by the publisher. We used the photos with the best resolution.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? the data provided is in places rudimentary (i.e. validation of the role of the lysis cassette in virulence) and could be bolstered with the construction and use of a lysis translational reporter etc I was left unsure how the HE::rfp and TcA::rfp constructs were made. I had assumed red florescent protein however it appears an antibody is used. This needs to be clarified as I then found it hard to interpret the results.

      Answer: The transcriptional PHE::rfp fusion is mentioned in the results section, but immunostaining failed, probably due to a very low promoter activity (line 223). This is well in line with the transcriptome data. Please see below for a detailed answer on how the HE::rfp and tcaA::rfp fusions were constructed. We applied the RFP antibody for two reasons: first, fluorescence microscopy did not reveal clear red fluorescence in the tissue sections, and second, a TcaA antibody failed to meet the quality criteria for this purpose.

      It appears (l. 114-125) that there may be enough data to derive LD50 values and/or an LT value at a fixed dose; if so, reporting these data would be of interest. It may also explain why a 10e5 dose was selected for subsequent experiments.

      Answer: This is an interesting point. The LD50 (dose of cells that kills 50% of all larvae) is usually not calculated in publications in this field of research, because its calculation requires a very large separate data set that cannot be used to answer the questions addressed here. Such a data set is not available. We published the dose-dependent toxicity of Y. enterocolitica W22703 upon subcutaneous injection, and from these data, we determined an LD50 for this strain of approximately 2 × 10⁴ cells. The paper is cited in our manuscript. The 10⁵ dose was selected due to our preliminary work and the reproducibility of the experimental phenotypes.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Use of lysis the reporter - discuss commonalties of the in host transcriptome with other Yersinia Galleria systems eg Paulson etc al (refer below). Are there any thoughts on the host range of this Yersinia and can this be placed in a pathogen host evolutionary context?

      Answer: Paulson et al. are now cited twice in the text. The host range of Yersinia enterocolitica has not been investigated to our knowledge. However, its nematocidal activity has been described by Spanier et al., and larvae of Manduca sexta, the tobacco hornworm, are also killed by W22703 (see references). Moreover, there are two copies of tccC in the genome of strain W22703 encoding the cytotoxic Tc subunit with its hypervariable C-terminus that is assumed to contribute to host specificity. This is discussed in great detail by Song et al. (see references).

      Evolution: Yes, this has been addressed by Waterfield et al. 2004 (see references) where insects are hypothesized as a source of emerging pathogens. We placed our findings in the context of this article in lines 91-94 and 305-310.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Yes

      • Are the data and the methods presented in such a way that they can be reproduced? yes but I think some vector construction methodology is missing e.g. ::rfp (refer above)

      Answer: The plasmids used to construct the two strains W22703 tcaA::rfp and W22703 PHE::rfp are listed in Table S2. References for details are given (Starke et al., 2013; Starke and Fuchs, 2014). Briefly, we used a suicide vector (pUTs) carrying the gene encoding the red fluorescent protein (RFP). This vector replicates in E. coli helper strains such as SM10, but not in Y. enterocolitica. Strain SM10 is now listed in Table 2. Following conjugation, the construct is chromosomally inserted upon recombination via the fragments cloned into the plasmid. In the case of tcaA, we cloned the 3´-end of the gene to generate a translational fusion, and in the case of HE its promoter, resulting in a transcriptional fusion with the reporter RFP.

      Fig 2 I am a little lost mortality seems quick on day 0 is this a result of aberrant injection damage mortality or are the authors observing a different effect across mutants through the initial 24 hours? If data available could this time plot be extended out 0-24 hours. The dash used for W222703 tcaA /TccC look similar can a different symbol be used.

      Answer: The reviewer is right that the mortality is high on the first day. However, monitoring larvae for up to nine days is standard in the literature. No data are available for a better resolution of the first 24 h, which, however, were investigated in more detail in the time course of Fig. 5. Moreover, we observed changes in motility and colouring of some of the larvae from one hour p.i. onwards (data not shown). Aberrant injection damage was avoided, and damaged larvae or larvae that did not completely take up the infection solution were not further considered in the experiment. This is mentioned in lines 107-109.

      A different symbol is now used for W22703 ΔtccC/pBAD-tccC.

      • Are the experiments adequately replicated and statistical analysis adequate? Yes

      Minor comments: - Specific experimental issues that are easily addressable. - Are prior studies referenced appropriately? Other entomopathogenic transcriptome studies could be compared to and or cross referenced (I have provided references in the response

      Answer: Repetition of our answer above: We again carefully screened PubMed for studies in the field and added a few papers. However, in vivo transcriptome analyses are still rare, as indicated by the lack of such investigations with the highly relevant entomopathogen Photorhabdus luminescens. The literature suggested by the reviewer is now cited in the introduction and the discussion (see below for details).

      I am unsure on the use of immuno pulldown and efficiency of recovering the Yersinia using this method as opposed to direct sequencing total RNA has this method been used in other systems,

      Answer: Isolating RNA from in vivo samples of infected insects encounters two challenges: first, possible contamination with commensal bacteria, and second, an excessive amount of host RNA that reduces the number of sequence reads. This might be the reason for the relatively low sequencing depth found in related papers in the field of in vivo transcriptomics. We overcame these problems by immunomagnetic separation (IMS), which is easily applicable and enriches the samples with respect to Yersinia cells; this is now mentioned in the results. We also cite a study (Prax et al.) in which we established the IMS protocol.

      • Are the text and figures clear and accurate? Yes though in places better naming of insect components could be listed

      Answer: This was done, see above.

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      As listed above potential use of reporters and or comparison and transcriptome analysis to other systems and an evolutionary pathogen host context (refer comments above) would strengthen the manuscript

      Answer: Please see answer to comments above. We explained the use of the reporter fusions, and put the transcriptome analysis into the context of related studies.

      Minor comments as per below When first mentioned good to state the larval instar used

      Answer: We used larvae of instar 5-6 according to Jorjao et al. (2018), this is now mentioned and cited in the M&M section, line 434.

      l 78 lon protease? what type? this is an important SOS protease affecting many regulatory systems please clarify

      Answer: This is a Lon A endopeptidase, and its function for the temperature-dependent activity of the lysis cassette has been described (Springer et al. 2021, see references). Its relevance for the thermodependent regulation of Yersinia virulence has been documented by Herbst et al. (PMID: 19468295) and Jackson et al. (https://doi.org/10.1111/j.1365-2958.2004.04353.x).

      l103-113 an description of the elemental tract which is depicted, perhaps this could be placed in the Fig. 1 figure legend

      Answer: We agree and substantially shortened the first paragraph of the results. Relevant aspects are now mentioned in Figure legend 2, redundancies with the figure legend were removed.

      l 133 use of the word larvae in place of the word animals might be more appropriate

      Answer: This was corrected accordingly.

      l 133 clarify delta HE mutant description when first mentioned

      Answer: The abbreviation HE is now introduced in the introduction in line 74.

      Lines 220-234 hard to follow, mainly as I am unsure how the strains are constructed; perhaps clarify what rfp is and how it was made. :: denotes an insertion, but yet then they seek to detect TcaA? I could not find the methodology on its or HE::rfp construction

      Answer: The plasmids used to construct the two strains W22703 tcaA::rfp and W22703 PHE::rfp are listed in Table S2. References for details are given (Starke et al., 2013; Starke et al., 2014). Briefly, we used a suicide vector (pUTs) carrying the gene encoding the red fluorescent protein (RFP). Following conjugation, the construct is chromosomally inserted upon recombination via the fragments cloned into the plasmid. In the case of tcaA, we cloned the 3´-end of the gene to generate a translational fusion, and in the case of HE its promoter, resulting in a transcriptional fusion with the reporter RFP.

      Please see above why we used RFP-antibodies to detect TcaA.

      l247 immuno-magnetic separation to isolate Yersinia - is there an efficiency behind this method, might be good to mention (I am unfamiliar with this technique)

      Answer: We here repeat our answer to the point above: Isolating RNA from in vivo samples of infected insects encounters two challenges: first, possible contamination with commensal bacteria, and second, an excessive amount of host RNA that reduces the number of sequence reads. This might be the reason for the relatively low sequencing depth found in related papers in the field of in vivo transcriptomics. We overcame these problems by immunomagnetic separation (IMS), which is easily applicable and enriches the samples with respect to Yersinia cells; this is now mentioned in the results. We also cite a study (Prax et al.) in which we established the IMS protocol.

      l313 alludes to role of Tca in hemoceol which contradicts an earlier statements in l 130 please clarify

      Answer: The reviewer is right. The sentence in former line 130 (now lines 123-124) was corrected to “…suggesting that the Tc plays a main role in the initial phases of infection”. This statement does not exclude its activity towards hemocytes. Moreover, subcutaneous infection is very artificial and was therefore replaced by oral application in our study to mimic natural routes of infection. This is now elaborated in more detail in the discussion (Lines 305-310).

      For clarity table 1 could colour highlight (different colours) tc and lysis genes

      Answer: We now added a table to Fig. S1 that shows the FC values of all genes from Tc-PAIYe.

      CROSS-CONSULTATION COMMENTS I am in agreement with all points of reviewer 1 who has a clear understanding on Tc toxin composition TcA pentamer etc. Being familiar to the field I regret I did not pick up on these errors

      Answer: This has been corrected according to R1.

      Point 13 agree and should possibly bring in other researchers who have used Galleria as a model. It also needs to be kept in mind that the target host for many Tcs has yet to be determined hence the importance of oral activity of this isolate

      Answer: This has been corrected according to R1.

      I am similarly in agreement with comments of reviewer 3

      Reviewer 4 I over looked the LT50 data -- apologies but agree with reviewer 1 where WT should be the more potent strain --I still think if possible LD50 for WT would be of value more so to define its oral activity

      Answer: We repeat our answer from above. This is an interesting point. The LD50 (dose of cells that kills 50% of all larvae) is usually not calculated in publications in this field of research, because its calculation requires a very large separate data set that cannot be used to answer the questions addressed here. Such a data set is not available. We published the dose-dependent toxicity of Y. enterocolitica W22703 upon subcutaneous injection, and from these data, we determined an LD50 for this strain of approximately 2 × 10⁴ cells. The paper is cited in our manuscript. The 10⁵ dose was selected due to our preliminary work and the reproducibility of the experimental phenotypes.

      Reviewer #4 (Significance (Required)):

      SECTION B - Significance ========================

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Extends from work of Fuchs - research group Extends from work of Palmer et al on lysis cassettes as potential T10SS Extends from work off Vesga Pseudomonas and Paulson Yersinia(refs provided below) on insect transcriptomics

      Of interest and possibly understated is the oral activity of enterocolitica in the insect host as mentioned L316-317 and how this might relate to the lifestyle/evolution of this microbe further elaboration here would be of interest

      Answer: We agree that this is an important aspect. Therefore, we added the following sentences here: “In contrast to subcutaneous injection in the use of insect larvae as model for bacterial virulence properties towards mammals, oral application mimics natural routes of infection that in particular take place during the bioconversion of animal cadavers by bacteria, fungi, and larvae (Carter, 2007). Together with the broad cytocidal host spectrum of bacterial toxins (Mendoza-Almanza, 2020), investigation of yet neglected natural infections of invertebrates will contribute to a better understanding of microbial pathogenicity (Waterfield, 2004).” (lines 305-310)

      • Place the work in the context of the existing literature (provide references, where appropriate).

      Relevant Transcriptome papers which could be referred to in the discussion i.e. are similar genes in play or is their a point of difference? https://doi.org/10.1093/g3journal/jkaa024;https://doi.org/10.1038/s41396-020-0729-9; https://doi.org/10.1099/mic.0.000311

      Answer: Paulson et al. mainly address virulence factors, whereas metabolism is not uncovered. We now cite similarities with respect to hemolysis and iron scavenging. The focus of Vesga et al. is on the interaction of a plant pathogen with wheat and two insect hosts, including their transcriptome. Although metabolic details are missing, there is an interesting overlap with the paper by Vesga et al. (hemocoel as permissive environment for proliferation) and a difference (upregulation of chitinases was not observed) that are now cited in the discussion. The Alenzi paper mainly investigated the general virulence of Y. enterocolitica strain. We cite its finding on the importance of motility, thus confirming our transcriptome analysis.

      • State what audience might be interested in and influenced by the reported findings. The oral activity of enterocolitica towards Galleria of interest and an evolutionary context insect vs mammalian activity in the discussion could be provided. Potential role of TcaA in gut association For the targeted journal I feel additional technical data is required and a broader context to other global systems (bacterial species) provided

      Answer: All points were addressed carefully and in detail. We refer to our answers to points detailed above.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Reviewers expertise entomopathogens, their toxins and pathogen ecology
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank both reviewers for their constructive comments. Please see our point-by-point response to all the comments.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      The authors of this manuscript confirm data found by others by determining replication kinetics of the ancestral B.6 SARS-CoV-2 virus, Delta and Omicron BA.1 and BA.2 in Calu-3 cells. The authors quantify barrier integrity between variants and interferon induction to conclude that Delta is more cytopathic and induced less interferon than Omicron, possibly leading to its increased pathogenesis. In addition the authors identify CuCl2 and FeSO4 as potential antivirals.

      Major comments

      Reviewer comment: The authors argue that Omicron's slower replication on Calu-3 cells correlates with mild disease; however, many publications show that Omicron replicates more efficiently/rapidly in primary human airway cultures:
      • Hui et al. (Nature, 2022) doi: https://doi.org/10.1038/s41586-022-04479-6
      • Peacock et al. (bioRxiv) doi: https://doi.org/10.1101/2021.12.31.474653
      • Lamers et al. (bioRxiv) doi: https://doi.org/10.1101/2022.01.19.476898

      Response: Previous reports, including the citations indicated by the reviewer, have shown that the Omicron variant replicates at lower levels in lung tissue as compared to cells of bronchial origin or the upper respiratory tract. In fact, the Omicron variant was shown not to productively infect alveolar type II cells at all. Omicron replication was severely compromised in Calu-3 cells grown in 96-well plates (https://doi.org/10.1080/22221751.2021.2023329), which is consistent with our observations.

      Reviewer comment: Can the authors explain why air-liquid grown Calu-3 cells appear to display similar viral titers for Omicron and Delta at 24 and 36 h.p.i (Figure 5B), however lower viral replication in Figure 3B? If the cells in Figure 3B are submerged, then the authors should identify why ALI grown Calu-3 cells are more susceptible to Omicron.

      Response: Cells were grown in plastic multi-well plates for the growth curve experiments shown in Figure 3. The cells in this condition are not polarized, and the virus titers are the total amount of virus released into the culture supernatant. The infection in Figure 5 was performed under air-liquid culture conditions with polarized cells. Therefore, the virus titers are only from the basolateral chamber. The outcomes of Figure 3 and Figure 5 are not comparable due to these technical differences. We will add this explanation in the results section.

      Reviewer comment: The authors suggest that Delta disrupts epithelial barrier integrity to a larger extent compared to B.6 and Omicron, however this may be due to fewer infected cells (despite equal viral titers, the nucleocapsid staining in Figure 2 and 5C suggests fewer infected cells). Have the authors imaged B.6 or Omicron at a later timepoint (or normalized virus input for equal infected cells) to determine barrier integrity when the amount of infected cells is equal? Alternatively, the authors should discuss this as a possible limitation of their study, especially since they argue this is a major reason why Delta has a growth advantage (lines 345 to 349).

      Response: We performed confocal imaging of transwells from the air-liquid interface model using a 20X objective and have obtained data showing that the percentage of infected cells is similar between the Omicron and Delta variants. We will include these data in the revised manuscript. In an in vitro system, once the infection has set in, the infected cells eventually die and the TEER reaches background levels. We are proposing a delay in disruption of barrier integrity, most probably due to the lower cytopathogenicity of the Omicron variant. As per the reviewer’s suggestion, we will discuss the possible limitations of the models and provide additional interpretations.

      Minor comments

      A) Reviewer comment: Line 118: Implications of this sentence are too strong. The authors have not shown the causality of Ct values and transmission, therefore they should reword the sentence: "indicating a high viral burden in patients during this period resulting in increased transmission of the virus among the contacts" to "likely attributing to increased transmission..."

      Response: We will correct this.

      B) Reviewer comment: Line 289: The authors suggest that infection with the Omicron variant generated higher levels of antibodies to the Delta variant, however these individuals are already vaccinated and elicit cross-neutralizing antibodies against Delta even before their Omicron infection. Therefore the Delta response is boosted and the Omicron response is essentially a primary response since vaccination elicits almost no cross-protection in itself. Therefore the authors should compare primary Delta infected individuals to primary Omicron infected individuals to determine cross-protection levels.

      Response: We agree with the reviewer’s argument. Please note that the two vaccines used in India are against the ancestral virus (inactivated) or the spike protein expressed by the adenovirus vector backbone. As over 90% of the population in India have been fully vaccinated with these two vaccines and a majority of them may also have been infected with delta variant and now with omicron, it is practically impossible to compare primary delta cases vs primary omicron cases at this stage. As part of another study in mid 2021, after the second wave of COVID-19 infections due to the Delta variant in India, we randomly selected 55 samples which had a detectable FRNT50 value for the delta variant, to test for their ability to neutralize the Omicron variant. Only twenty of the 55 samples had detectable levels of neutralizing antibodies against the Omicron variant. By assigning a FRNT50 value of 10 for the samples which had no detectable levels of antibodies in the starting dilution (1:20) of the assay, we obtained a GMT of 22.5 (95% CI: 16, 31) for these 55 samples. This value was 20-fold lower than the GMT of Delta variant which was 404 (95% CI:248, 658). This clearly indicates that even during the peak of delta wave, there were barely any cross-reactive antibodies to the Omicron variant. This study was recently published [NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-31170-1]. It would be interesting to eventually compare the antibody responses in reinfections with other sub-lineages of Omicron variant which is beyond the scope of our manuscript. We will add this description in the results and discussion section of the revised manuscript.
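      The way non-detectable samples enter the geometric mean titer can be sketched as follows. The imputed value of 10 and the starting dilution of 1:20 come from the response above; the individual titers and the function name are hypothetical, chosen only to show the calculation, and are not data from the study.

```python
import math

def geometric_mean_titer(titers, detection_limit=20.0, imputed_value=10.0):
    """Geometric mean of FRNT50 titers.

    Samples with no detectable titer (None, or below the starting
    dilution) are assigned `imputed_value` before averaging.
    """
    adjusted = [
        t if (t is not None and t >= detection_limit) else imputed_value
        for t in titers
    ]
    # Average on the log scale, then transform back.
    return math.exp(sum(math.log(t) for t in adjusted) / len(adjusted))

# Hypothetical example: two non-detectable samples and two detectable ones.
example_titers = [None, None, 40.0, 160.0]
gmt = geometric_mean_titer(example_titers)
print(f"GMT = {gmt:.1f}")
```

      Imputing half the starting dilution keeps non-responders in the average instead of dropping them, so the GMT reflects the whole cohort rather than only the samples with measurable neutralization.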

      *C) __Reviewer comment: __There appears to be no reference to Figure 6G, however this reference is most likely missing from line 306. *

      Response: Thank you for bringing this to our notice. We will insert the reference to Figure 6G.

      *D) __Reviewer comment: __Line 359-362: The authors suggest that waning antibody titers increase susceptibility to new variants of concern, however their cohort already possessed very low antibody titers against Omicron a month after vaccination (Figure 7F) suggesting they could be equally susceptible to Omicron 1 and 6 months after vaccination. *

      Response: Please note that nine out of 15 samples had FRNT50 values above the level of detection after vaccination in June 2021. The number of samples positive for Omicron antibodies fell to six out of 15 by Dec 2021, suggesting that relatively more people were without protective antibodies for the Omicron variant by Dec 2021. Around 70% of the population was seropositive by Aug 2021 (https://doi.org/10.1016/j.ijid.2021.12.353) and most adults in India received both doses of their vaccine after June 2021, which would have boosted the humoral and cellular response to SARS-CoV-2. This is corroborated in a recently published report, where we showed that 36 out of 55 previously infected subjects had neutralizing antibodies for the Omicron variant after receiving a single dose of inactivated vaccine. Therefore, in the context of hybrid immunity in India, we speculate that waning antibody titers could have played a significant role in the emergence and spread of the Omicron variant, in addition to the ability of the Omicron variant to escape neutralization, replicate more efficiently in the upper respiratory tract, etc. The fact that booster doses of vaccines developed against the ancestral virus/viral protein were capable of increasing the level of neutralizing antibodies to the Omicron variant suggests that a level of antibodies above a certain threshold may play a significant role in protecting against the Omicron variant.

      Reviewer #1 (Significance (Required)):

      • __Reviewer comment: __Many of the conclusions based on replication and barrier integrity may not represent the situation in primary human tissues and do not explain the rapid spread of Omicron. In addition, interferon induction has already been described for these variants and this finding is not novel. The manuscript's most interesting and novel finding is the role of CuCl2 and FeSO4 as antivirals. It would be interesting to test these salts in primary human airway cultures. *

      Response: The study was conducted in the months of Jan-March 2022 and the first version of the results was uploaded on a preprint server in March 2022. The process of journals handling the manuscript and obtaining reviews is not under our control. We cannot argue to defend the comments on novelty when the Omicron variant is barely six months old and new variants continue to emerge. The deluge of publications should not result in reviewers branding most of the efforts as not novel or insignificant. We have been trying for three months to obtain primary cells but the distributors are unable to supply them. We will continue to try to obtain cells from one source or another. Transwells are back-ordered with expected delivery dates in three months. Meanwhile, we now have HBEC3-KT cells, which are normal human bronchial epithelial cells immortalized with CDK4 and hTERT. We will perform the inhibition experiments in these cell lines to convince the reviewers.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *In the manuscript entitled "BA.1 and BA.2 sub-lineages of Omicron variant have comparable replication kinetics and susceptibility to neutralization by antibodies" the authors assess the kinetics of growth of SARS-CoV-2 variants in Calu-3 cells and their effects on epithelial junction, and the interferon response. The authors also analyze the capacity of metal salts to block SARS CoV-2 replication in Calu-3 cells. Finally, the authors characterize the ability of vaccinated and/or COVID-19 patients to develop neutralizing antibodies to different variants using FRNT and specific binding assays (ELISA). *

      • The paper largely confirms several previous reports on the replication capacity and interferon responses of the different variants. Although the title and abstract focus on the Omicron sub-lineages, the paper is mostly focused on comparing original CoV2, with Kappa, Delta and Omicron. *
      • Figures 1-5 compare the replication kinetics, interferon responses, and epithelial barrier disruption of Kappa, Delta and the original Omicron (B.1.1.529) to the original B6 variant. On a separate note, Figure 7 shows the ability of metal salts (especially iron, copper, and zinc) to block viral RNA-dependent RNA polymerase activity (RdRp) in vitro. The authors also show the effect on virus replication in Calu-3 cells (Delta and Omicron B.1.1.529 only). The data mainly focus on the variants, the Delta and the Omicron (BA.1.1.529 and not the BA.1 and BA.2 sub-lineages) except in Fig 6A, B, G. *

      • __Reviewer comment: __Most importantly, a major limitation of the paper is that when human samples are analyzed, the authors assume that the patients have been infected with a specific variant according to the "peak" of infection, but sequencing is never performed. When neutralization and binding of antibodies are analyzed, the information on the patients is unclear - for example, were the patients exposed to Delta or Omicron or any of their sub-lineages? What was the vaccination status of SARS CoV-2 positive patients? And why non-tested individuals showing symptoms were included in the study (lines 302-304)? *

      Response: We thank the reviewer for the comments. Over 90% of the population in India is vaccinated. All the participants of the study were vaccinated in 2021. The participants were enrolled into the study almost 4 weeks after recovery from illness. We enrolled participants who reported having had fever or COVID-19-like symptoms in the preceding weeks, with or without confirmed RT-PCR test results. Testing is an individual and voluntary choice now. Therefore, it would be difficult to find RT-PCR confirmed cases. Our assumption about exposure is based on a nationwide sequencing effort of thousands of samples every week, and this approach is reliable and credible. As indicated in the text and in the supplementary figure, Omicron lineages BA.1 followed by BA.2 were the circulating virus lineages since Jan 2022 in India.

      *__Reviewer comment: __The authors show that BA.1 and BA.2 have similar replication kinetics in Calu-3 cells and induce similar neutralizing antibodies in the patients tested. However, there is a large disconnection with the rest of the paper that is mostly focused on Kappa, Delta, and Omicron B.1.1.529. Also, no comparisons between these variants and BA.1 or BA.2 have been shown. Similarly, a large assumption in the paper is that the patients who tested positive for COVID-19 have had "natural Omicron infection" (lines 36-37; lines 307-311) when it could be any other variants or Omicron sub-lineages as well. *

      Response: Please note that the B.1.1.529 which was used at the beginning of the study is the BA.1 sub-lineage which has been compared with Kappa and Delta variants. BA.2 emerged at later stages and therefore we have compared the kinetics and neutralization titer between BA.1 and BA.2. It is unreasonable to expect to repeat all the comparisons with BA.2 considering the cost and challenges of working in a BSL-3 environment. The initial version of this data was uploaded on preprint server in March 2022 when only two sub-lineages of Omicron namely BA.1 and BA.2 existed. Our data from the national SARS-CoV-2 sequencing consortium clearly shows that there were no other sub-lineages circulating at that time.

      Reviewer #2 (Significance (Required)):

      *__Reviewer comment: __In light of the fact that most of the paper does not look at the subvariants BA.1 and BA.2 of Omicron- either the authors compare BA.1 and BA.2 more comprehensively with Omicron B.1.1.529 or rewrite the conclusions and claims of the current paper. Similar to the experiments comparing B6 with Kappa, Delta and Omicron, Omicron B.1.1.529 should be compared similarly to BA.1 and BA.2 in a separate figure. In any case, the novelty compared to other papers -also cited by the authors- remains limited. *

      Response: We will revise the conclusions and claims of the paper as per the suggestions. Please see our response to reviewer 1 with regards to the novelty of our observations. The B.1.1.529 variant was later classified as the BA.1 variant. Our study was uploaded on the preprint server in March 2022 and the entire review process has taken four months. It is unfair to now demand comparison of BA.2 with Kappa or Delta variant which does not add any additional value to our observations.

      *__Reviewer comment: __In addition to the concerns mentioned above, there are more pressing variants circulating right now, such as BA.4 and BA.5. These variants are not referred in the paper. It might be beyond the scope of the paper, but including more analyses with BA.1, BA.2 (as the ones done with B.1.1.529) and adding some key data with BA.3, BA.4, BA.5 might substantially increase the relevance and importance of the paper. *

      Response: Please see our comments above. Our efforts are continuing in this direction to further look at antibody responses and replication kinetics of newer variants which have emerged recently. However, the scarcity of positive clinical samples and the lower probability of getting samples that would be suitable for virus isolation are the challenges we are dealing with. We think testing newer variants which have emerged during the review process is certainly valuable but is extremely difficult under the current circumstances. We will have to apply for import permits to obtain these sub-lineages, or enrol patients with symptoms and keep testing them to isolate and culture the virus and obtain the whole genome sequence. We will have to establish neutralization assays with newer sub-variants to test in parallel with other Omicron lineages. All this is beyond the scope of our manuscript and will take a few months of paperwork and experimentation.

    1. Reviewer #3 (Public Review):

      The present study aims to elucidate posterior cingulate cortex (PCC) function with both single-unit and population-level depth electrodes. The results clearly show that the dorsal PCC (dPCC) is involved in executive functions (search and add), but that it also contains neurons that are selective for episodic memory (past and future) and rest conditions. With this impressive study design, the authors are able to reconcile discrepancies between human and primate studies. Furthermore, the derived conclusion that PCC function is more diverse than merely its participation in the DMN is of great importance for the field. Thus, I believe that this work will have a great impact on how we think about the PCC, by (1) emphasizing its participation in executive processes and (2) providing evidence of distinct single-unit response profiles that do not manifest on a population level.

      The main strength of this work is the combination of population-level measurements that clearly show the participation of dPCC in executive processes with microelectrode single-unit measurements and an unsupervised hierarchical clustering approach that allows for the identification of 4 distinct SU response profiles within the dPCC. In addition, the population-level electrodes mostly engaged in executive function cluster around an fMRI meta-analysis peak related to executive processing derived from neurosynth, providing a bridge to human fMRI research.

      Nevertheless, there is one concern regarding the data collected within the ventral PCC (vPCC) in this study and the way the authors integrated it into their conclusions.

      Specifically, the conclusion that "Together, they [the findings] inform a view of PCC as a heterogeneous region composed of dorsal and ventral subregions specializing in executive and episodic processing respectively" may not be completely supported by the data. The dPCC macroelectrode data does clearly show a functional specialization in executive processing, but does the data from vPCC presented in this manuscript also support the claim? While taking a closer look at the vPCC data, several inconsistencies stood out: First, the total number of vPCC electrodes was much smaller (6 vs 29 microelectrodes and one microwire probe that was not analyzed). Second, it is not clear which of the presented electrodes in figure 3 were considered to be ventral. From comparing figure 3 with the dorsal/ventral split displayed in figure 1B, it seems as if only one electrode was unambiguously placed in vPCC. Third, BBG statistics of these 6 electrodes are not presented, thus the claim that they show vPCC functional specialization is not statistically supported.

    1. Author Response

      Reviewer #1 (Public Review):

      Jones et al. investigated the relationship between scale free neural dynamics and scale free behavioral dynamics in mice. An extensive prior literature has documented scale free events in both cortical activity and animal behavior, but the possibility of a direct correspondence between the two has not been established. To test this link, the authors took advantage of previously published recordings of calcium events in thousands of neurons in mouse visual cortex and simultaneous behavioral data. They find that scale free-ness in spontaneous behavior co-occurs with scale free neuronal dynamics. The authors show that scale free neural activity emerges from subsets of the larger population - the larger population contains anticorrelated subsets that cancel out one another's contribution to population-level events. The authors propose an updated model of the critical brain hypothesis that accounts for the obscuring impact of large populations on nested subsets that generate scale free activity. The possibility that scale free activity, and specifically criticality, may serve as a unifying theory of brain organization has suffered from a lack of high-resolution connection between observations of neuronal statistics and brain function. By bridging theory, neural data, and behavioral dynamics, these data add a valuable contribution to fields interested in cortical dynamics and spontaneous behavior, and specifically to the intersection of statistical physics and neuroscience.

      Strengths:

      This paper is notably well written and thorough.

      The authors have taken a cutting-edge, high-density dataset and propose a data-driven revision to the status-quo theory of criticality. More specifically, due to the observed anticorrelated dynamics of large populations of neurons (which doesn't fit with traditional theories of criticality), the authors present a clever new model that reveals critical dynamics nested within the summary population behavior.

      The conclusions are supported by the data.

      Avalanching in subsets of neurons makes a lot of sense - this observation supports the idea that multiple, independent, ongoing processes coexist in intertwined subsets of larger networks. Even if this is wrong, it's supported well by the current data and offers a plausible framework on which scale free dynamics might emerge when considered at the levels of millions or billions of neurons.

      The authors present a new algorithm for power law fitting that circumvents issues in the KS test that is the basis of most work in the field.

      Weaknesses:

      This paper is technically sound and does not have major flaws, in my opinion. However, I would like to see a detailed and thoughtful reflection on the role that 3 Hz Ca imaging might play in the conclusions that the authors derive. While the dataset in question offers many neurons, this approach is, from other perspectives, impoverished - calcium intrinsically misses spikes, a 3 Hz sampling rate is two orders of magnitude slower than an action potential, and the recordings are relatively short for amassing substantial observations of low probability (large) avalanches. The authors carefully point out that other studies fail to account for some of the novel observations that are central to their conclusions. My speculative concern is that some of this disconnect may reflect optophysiological constraints. One argument against this is that a truly scale free system should be observable at any temporal or spatial scale and still give rise to the same sets of power laws. This quickly falls apart when applied to biological systems which are neither infinite in time nor space. As a result, the severe mismatch between the spatial resolution (single cell) and the temporal resolution (3 Hz) of the dataset, combined with filtering intrinsic to calcium imaging, raises the possibility that the conclusions are influenced by the methods. Ultimately, I'm pointing to an observer effect, and I do not think this disqualifies or undermines the novelty or potential value of this work. I would simply encourage the authors to consider this carefully in the discussion.

      R1a: We quite agree with the reviewer that reconciling different scales of measurement is an important and interesting question. One clue comes from Stringer et al’s original paper (2019 Science). They analyzed time-resolved spike data (from Neuropixel recordings) alongside the Ca imaging data we analyzed here. They showed that if the ephys spike data was analyzed with coarse time resolution (300 ms time bins, analogous to the Ca imaging data), then the anticorrelated activity became apparent (50/50 positive/negative loadings of PC1). When analyzed at faster time scales, anticorrelations were not apparent (mostly positive loadings of PC1). This interesting point was shown in their Supplementary Fig 12.

      This finding suggests that our findings about anticorrelated neural groups may be relevant only at coarse time scales. Moreover, this point suggests that avalanche statistics may differ when analyzed at very different time scales, because the cancelation of anticorrelated groups may not be an important factor at faster timescales.

      In our revised manuscript, we explored this point further by analyzing spike data from Stringer et al 2019. We focused on the spikes recorded from one local population (one Neuropixel probe). We first took the spike times of ~300 neurons and convolved them with a fast rise/slow fall kernel, like a typical Ca transient. Then we downsampled to a 3 Hz sample rate. Next, we deconvolved using the same methods as those used by Stringer et al (OASIS nonnegative deconvolution). And finally, we z-scored the resulting activity, as we did with the Ca imaging data. With this Ca-like signal in hand, we analyzed avalanches in four ways and compared the results. The four ways were: 1) the original time-resolved spikes (5 ms resolution), 2) the original spikes binned at 330 ms time resolution, 3) the full population of slow Ca-like signal, and 4) a correlated subset of neurons from the slow Ca-like signal. Based on the results of this new analysis (now in Figs S3 and S4), we found several interesting points that help reconcile potential differences between fast ephys and slow Ca signals:

      1. In agreement with Sup Fig 12 from Stringer et al, anticorrelations are minimal in the fast, time-resolved spike data, but can be dominant in the slow, Ca-like signal.

      2. Avalanche size distributions of spikes at fast timescales can exhibit a nice power law, consistent with previous results with exponents near -2 (e.g. Ma et al Neuron 2019, Fontenele et al PRL 2019). But, the same data at slow time scales exhibited poor power-laws when the entire population was considered together.

      3. The slow time scale data could exhibit a better power law if subsets of neurons were considered, just like our main findings based on Ca imaging. This point was the same using coarse time-binned spike data and the slow Ca-like signals, which gives us some confidence that deconvolution does not miss too many spikes.

      In our opinion, a more thorough understanding of how scale-free dynamics differs across timescales will require a whole other paper, but we think these new results in our Figs S3 and S4 provide some reassurance that our results can be reconciled with previous work on scale free neural activity at faster timescales.
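
      The construction of the Ca-like signal described in R1a can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the kernel time constants, the block-averaging downsampler, and the function name are assumptions, and the OASIS deconvolution step is deliberately omitted.

```python
import numpy as np

def ca_like_signal(spike_counts, dt=0.005, tau_rise=0.05, tau_decay=1.0, out_rate=3.0):
    """Turn binned spikes into a slow, z-scored Ca-like signal.

    spike_counts : array (n_neurons, n_bins), spike counts in bins of `dt` seconds.
    tau_rise, tau_decay : assumed time constants (s) of a fast-rise/slow-fall kernel.
    out_rate : approximate output sample rate in Hz.
    """
    spike_counts = np.asarray(spike_counts, dtype=float)
    n_neurons, n_bins = spike_counts.shape

    # Fast-rise / slow-fall kernel (difference of exponentials), unit area.
    t = np.arange(0.0, 5.0 * tau_decay, dt)
    kernel = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)
    kernel /= kernel.sum()

    # Causal convolution: keep only the first n_bins samples of the full output.
    conv = np.array([np.convolve(row, kernel)[:n_bins] for row in spike_counts])

    # Downsample by averaging non-overlapping blocks (about 3 Hz for dt = 5 ms).
    block = int(round(1.0 / (out_rate * dt)))
    n_out = n_bins // block
    slow = conv[:, : n_out * block].reshape(n_neurons, n_out, block).mean(axis=2)

    # (Deconvolution would go here; it is skipped in this sketch.)

    # Z-score each neuron; guard against silent neurons with zero variance.
    mu = slow.mean(axis=1, keepdims=True)
    sd = slow.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (slow - mu) / sd

# Hypothetical input: 5 neurons, 30 s of random spiking at 5 ms resolution.
rng = np.random.default_rng(1)
demo_counts = rng.poisson(0.05, size=(5, 6000))
demo_signal = ca_like_signal(demo_counts)
```

      Block averaging is used here only because it is the simplest way to reach a coarse sample rate; any anti-aliased resampling would serve the same illustrative purpose.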

      Reviewer #2 (Public Review):

      The overall goal of the paper is to link spontaneous neural activity and certain aspects of spontaneous behavior using a publicly available dataset in which 10,000 neurons in mouse visual cortex were imaged at 3 Hz with single-cell resolution. Through careful analysis of the degree to which bouts of behavior and bouts of neural activity are described (or not) by power-law distributions, the authors largely achieve these goals. More specifically, the key findings are that (a) the size of bouts of whisking, running, eye movements, and pupil dilation are often well-fit by a power-law distribution over several decades, (b) subsets of neurons that are highly correlated with one of these behavioral metrics will also exhibit power-law distributed event sizes, (c) neuron clusters that are uncorrelated with behavior tend to not be scale-free, (d) crackling relationships are generally not found (i.e. size with duration exponent (if there is scaling) was not predicted by size power-law and duration power-law), (e) bouts of behavior could be linked to bouts of neural activity. In the second portion of the paper, the authors develop a computational model with sets of correlated and anti-correlated neurons, which can be accomplished under a relatively small subset of connection architectures: out of the hundreds of thousands of networks simulated, only 31 generated scale-free subsets/non-scale-free population/anti correlated e-cells/anti-correlated i-cells in agreement with the experimental recordings.

      The data analysis is careful and rigorous, especially in the attention to fitting power laws, determining how many decades of scaling are observed, and acknowledging when a power-law fit is not justified. In my view, there are two weaknesses of the paper, related to how the results connect to past work and to the set-up and conclusions drawn from the computational modeling, and I discuss those in detail below. While my comments are extensive, this is due to high interest. I do think that the authors make an important connection between scale-free distributions of neural activity and behavior, and that their use of computational modeling generates some interesting mechanistic hypotheses to explore in future work.

      My first general reservation is in the relationship to past work and the overall novelty. The authors state in the introduction, "according to the prevailing view, scale-free ongoing neural activity is interpreted as 'background' activity, not directly linked to behavior." It would be helpful to have some specific references here, as several recent papers (including the Stringer et al. 2019 paper from which these data were taken, but also papers from McCormick lab and (Anne) Churchland lab) showed a correlation between spontaneous activity and spontaneous facial behaviors. To my knowledge, the sorts of fidgety behavior analyzed in this paper have not been shown to be scale-free, and so (a) is a new result, but once we know this, it seems that (e) follows because we fully expect some neurons to correlate with some behavior.

      R2a: We agree with the reviewer that our original introductory, motivating arguments needed improvement. We have now rewritten the last 2 paragraphs of the introduction. We hope we have now laid out our argument more clearly, with more appropriate supporting citations. In brief, the logic is this:

      1. Previous theory, modeling, and experiments on the topic of scale-free neural activity suggest that this phenomenon is an autonomous, internally generated thing, independent of anything the body is doing.

      2. Relatively new experiments (including those by Churchland’s lab and McCormick’s lab: Stringer 2019; Salkoff 2020; Clancy 2019; Musall 2019) suggest a different picture with a link between spontaneous behaviors and ongoing cortical activity, but these studies did not address any questions about scale-free-ness.

      3. Moreover, these new experiments show that behavioral variables only manage to explain about 10-30% of ongoing activity.

      4. Is this behaviorally-explainable 10-30% scale-free, or do the scale-free aspects of cortical dynamics fall within the other 70-90%? Our goal is to find out.

      Digging a bit more on this issue, I would argue that results (b) and (c) also follow. By selecting subsets of neurons with very high cross-correlation, an effective latent variable has emerged. For example, the activity rasters of these subsets are similar to a population in which each neuron fires with the same time-varying rate (i.e., a heterogeneous Poisson process). Such models have been previously shown to be able to generate power-law distributed event sizes (see, eg., Touboul and Destexhe, 2017; also work by Priesemann). With this in mind, if you select from the entire population a set of neurons whose activity is effectively determined by a latent variable, do you not expect power laws in size distributions?

      Our understanding is that not all Poisson processes with a time-varying rate will result in a power law. It is essential that the fluctuations in rate themselves be power-law distributed. As a clear example of how this breaks down, consider a Poisson rate that varies according to a sine wave with fixed period and amplitude. In this case, the avalanche size distribution is definitely not scale-free; it would have a clear typical scale. Another point of view on this comes from some of the simplest models used to study criticality, e.g. all-to-all connected probabilistic binary neurons (like in Shew et al 2009 J Neurosci). These models do generate spiking with a time-varying Poisson rate whether they are at criticality or away from criticality. But only when the synaptic strength is tuned to criticality is the time-varying rate going to generate power-law distributed avalanches. I think the Priesemann & Shriki paper made this point as well.
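
      The sine-wave counterexample can be checked with a small simulation. The sketch below is illustrative only: the rate amplitude, the period, the median threshold, and the definition of avalanche size as summed activity above threshold are all assumptions chosen for simplicity, not the analysis used in the paper.

```python
import numpy as np

def avalanche_sizes(activity, threshold):
    """Sizes of contiguous excursions above `threshold`.

    Size is defined here (an assumption) as the summed activity in excess
    of the threshold over one uninterrupted run of above-threshold bins.
    """
    sizes = []
    current = 0.0
    for a in activity:
        if a > threshold:
            current += a - threshold
        elif current > 0:
            sizes.append(current)
            current = 0.0
    if current > 0:
        sizes.append(current)
    return np.array(sizes)

# Population count per time bin drawn from a Poisson process whose rate
# follows a sine wave with fixed period and amplitude (100 full cycles).
rng = np.random.default_rng(0)
n_bins, period = 30000, 300
t = np.arange(n_bins)
rate = 50.0 * (1.0 + 0.9 * np.sin(2.0 * np.pi * t / period))
counts = rng.poisson(rate).astype(float)

sizes = avalanche_sizes(counts, threshold=np.median(counts))

# The large events should cluster near the area of one half-cycle above
# threshold instead of spreading over many decades as a power law would.
large = sizes[sizes > 2000]
```

      Apart from small noise-driven events near the threshold crossings, each cycle contributes one large event of nearly the same size, which is the typical scale referred to above.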

      My second reservation has to do with the generality of the conclusions drawn from the mechanistic model. One of the connectivity motifs identified appears to be i+ to e- and i- to e+, where potentially i+/i- are SOM and VIP (or really any specific inhibitory type) cells. The specific connections to subsets of excitatory cells appear to be important (based on the solid lines in Figure 8). This seems surprising: is there any experimental support for excitatory cells to preferentially receive inhibition from either SOM or VIP, but not both?

      R2b: There is indeed direct experimental support for the competitive relationship between SOM, VIP, and functionally distinct groups of excitatory neurons. This was shown in the paper by Josh Trachtenberg’s group: Garcia-Junco-Clemente et al 2017. An inhibitory pull-push circuit in frontal cortex. Nat Neurosci 20:389–392. However, we emphasize that we also showed (lower left motif in Fig 8G) that a simpler model with only one inhibitory group is sufficient to explain the anticorrelations and scale-free dynamics we observe. We opted to highlight the model with two inhibitory groups since it can also account for the Garcia-Junco-Clemente et al results.

      In the section where we describe the model, we state, “We considered two inhibitory groups, instead of just one, to account for previous reports of anticorrelations between VIP and SOM inhibitory neurons in addition to anticorrelations between groups of excitatory neurons (Garcia-Junco-Clemente et al., 2017).”

      More broadly, I wonder if the neat diagrams drawn here are misleading. The sample raster, showing what appears to be the full simulation, certainly captures the correlated/anti-correlated pattern of the 100 cells most correlated with a seed cell and 100 cells most anti-correlated with it, but it does not contain the 11,000 cells in between with zero to moderate levels of correlation.

      R2c: We agree that our original model has several limitations and that one of the most obvious features lacking in our model is asynchronous neurons (The limitations are now discussed more openly in the last paragraph of the model subsection). In the data from the Garcia-Junco-Clemente et al paper above there are many asynchronous neurons as well. To ameliorate this limitation, we have now created a modified model that now accounts for asynchronous neurons together with the competing anticorrelated neurons (now shown and described in Fig S9). We put this modified model in supplementary material and kept the simpler, original model in the main findings of our work, because the original model provides a simpler account of the features of the data we focused on in our work – i.e. anticorrelated scale-free fluctuations. The addition of the asynchronous population does not substantially change the behavior of the two anticorrelated groups in the original model.

      We probably expect that the full covariance matrix has similar structure from any seed (see Meshulam et al. 2019, PRL, for an analysis of scaling of coarse-grained activity covariance), and this suggests multiple cross-over inhibition constraints, which seem like they could be hard to satisfy.

      R2d: We agree that it remains an outstanding challenge to create a model that reproduces the full complexity of the covariance matrix. We feel that this challenge is beyond the scope of this paper, which is already arguably squeezing quite a lot into one manuscript (one reviewer already suggested removing figures!).

      We added a paragraph at the end of the subsection about the model to emphasize this limitation of the model as well as other limitations. This new paragraph says:

      While our model offers a simple explanation of anticorrelated scale-free dynamics, its simplicity comes with limitations. Perhaps the most obvious limitation of our model is that it does not include neurons with weak correlations to both e+ and e- (those neurons in the middle of the correlation spectrum shown in Fig 7B). In Fig S9, we show that our model can be modified in a simple way to include asynchronous neurons. Another limitation is that we assumed that all non-zero synaptic connections were equal in weight. We loosen this assumption allowing for variable weights in Fig S9, without changing the basic features of anticorrelated scale-free fluctuations. Future work might improve our model further by accounting for neurons with intermediate correlations.

      The motifs identified in Fig. 8 likely exist, but I am left with many questions of what we learned about connectivity rules that would account for the full distribution of correlations. Would starting with an Erdos-Renyi network with slight over-representation of these motifs be sufficient? How important is the assumption of homogeneous connection weights from each pool - would allowing connection weights with some dispersion change the results?

      R2e: First, we emphasize that our specific goal with our model was to identify a possible mechanism for the anticorrelated scale-free fluctuations that played the key role in our analyses. We agree that this is not a complete account of all correlations, but this was not the goal of our work. Nonetheless, our new modified model in Fig S9 now accounts for additional neurons with weak correlations. However, we think that future theoretical/modeling work will be required to better account for the intermediate correlations that are also present in the experimental data.

      We confirmed that an Erdos-Renyi network of E and I neurons can produce scale-free dynamics, but cannot produce substantial anticorrelated dynamics (Fig 8G, top right motif). Additionally, the parameter space study we performed with our model in Fig 8 showed that if the interactions between the two excitatory groups exceed a certain tipping point density, then the model behavior switches to behavior expected from an Erdos-Renyi network (Fig 8F). Finally, we have now confirmed that some non-uniformity of synaptic weights does not change the main results (Fig S9). In the model presented in Fig S9, the value of each non-zero connection weight was drawn from a uniform distribution [0,0.01] or [-0.01,0] for excitatory and inhibitory connections, respectively. All of these facts are described in the model subsection of the paper results.
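      As an illustration of the weight-drawing rule described in this reply, the following is a minimal sketch, not the authors' model code. The function name, the connection probability `p_connect`, the group sizes, and the removal of self-connections are all assumptions made for the example; only the uniform ranges [0, 0.01] and [-0.01, 0] come from the text above.

```python
import numpy as np

def random_weight_matrix(n_exc, n_inh, p_connect, w_max=0.01, seed=0):
    """Sparse random weights: column j holds the outgoing weights of neuron j."""
    rng = np.random.default_rng(seed)
    n = n_exc + n_inh
    # Each possible connection exists with probability p_connect (assumed value).
    mask = rng.random((n, n)) < p_connect
    # Magnitudes of non-zero weights are uniform on [0, w_max].
    magnitudes = rng.uniform(0.0, w_max, size=(n, n))
    # Presynaptic sign: +1 for excitatory columns, -1 for inhibitory columns.
    signs = np.concatenate([np.ones(n_exc), -np.ones(n_inh)])
    weights = mask * magnitudes * signs[np.newaxis, :]
    np.fill_diagonal(weights, 0.0)  # no self-connections (an assumption)
    return weights

# Hypothetical network size, chosen only for the example.
W = random_weight_matrix(n_exc=80, n_inh=20, p_connect=0.1)
```

      Setting every non-zero magnitude to a single constant instead of drawing it from `rng.uniform` would recover the equal-weight assumption mentioned earlier in the reply.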

      As a whole, this paper has the potential to make an impact on how large-scale neural and behavioral recordings are analyzed and interpreted, which is of high interest to a large contingent of the field.

      Reviewer #3 (Public Review):

      The primary goal of this work is to link scale-free dynamics, as measured by the distributions of event sizes and durations, of behavioral events and neuronal populations. The work uses recordings from Stringer et al. and focuses on identifying scale-free models by fitting the log-log distribution of event sizes. Specifically, the authors take averages of correlated neural sub-populations and compute the scale-free characterization. Importantly, neither the full population average nor random uncorrelated subsets exhibited scale-free dynamics, only correlated subsets. The authors then work to relate the characterization of the neuronal activity to specific behavioral variables by testing the scale-free characteristics as a function of correlation with behavior. To explain their experimental observation, the authors turn to classic e-i network constructions as models of activity that could produce the observed data. The authors hypothesize that a winner-take-all e-i network can reproduce the activity profiles and therefore might be a viable candidate for further study. While the paper is well written, I find that there are a significant number of potential issues that should be clarified. I have three main concerns: 1) the data processing seems to have the potential to distort features that may be important for this analysis (including missed detections and dynamic range); 2) the analysis jumps right to e-i network interactions, while there seems to be a much simpler and more general explanation that could describe their observations (which has to do with the way they are averaging neurons); and 3) the relationship between the neural and behavioral data could be further clarified by accounting for the lop-sidedness of the data statistics. I have included more details about my concerns below.

      Main points:

      1) Limits of calcium imaging: There is a large uncertainty that is not accounted for in dealing with smaller events. In particular, there are a number of studies now, using both paired electrophysiology and imaging [R1] and biophysical simulations [R2], that show that small neural events are often not visible in the calcium signal. Moreover, this problem may be exacerbated by the fact that the imaging is at 3 Hz, much lower than the more typical 10-30 Hz imaging speeds. The effects of this missing data should be accounted for, as they could be a potential source of large errors in estimating the neural activity distributions.

      R3a: We appreciate the concern here and agree that event size statistics could in principle be biased in some systematic way because of spikes missed during deconvolution of Ca signals. To directly test this possibility, we performed a new analysis of spike data recorded with high time resolution electrophysiology. We began with a forward-modeling process to create a low-time-resolution, Ca-like signal, using the same deconvolution algorithm (OASIS) that was used to generate the data we analyzed in our work here. In agreement with the reviewer’s concern, we found that spikes were sometimes missed, but the loss was not extreme and did not impact the neural event size statistics in a significant way compared to the ground truth we obtained directly from the original spike data (with no loss of spikes). This new work is now described in a new paragraph at the end of the subsection of results related to Fig 3 and in a new Fig S3. The new paragraph says…

      Two concerns with the data analyzed here are that it was sampled at a slow time scale (3 Hz frame rate) and that the deconvolution methods used to obtain the data from the raw GCaMP6s Ca imaging signals are likely to miss some activity (Huang et al., 2021). Since our analysis of neural events hinges on summing up activity across neurons, could it be that the missed activity creates systematic biases in our observed event size statistics? To address this question, we analyzed some time-resolved spike data (Neuropixels recording from Stringer et al 2019). Starting from the spike data, we created a slow signal, similar to the one we analyzed here, by convolving with a Ca-transient, downsampling, deconvolving, and z-scoring (Fig S3). We compared neural event size distributions to “ground truth” based on the original spike data (with no loss of spikes) and found that the neural event size distributions were very similar, with the same exponent and same power-law range (Fig S3). Thus, we conclude that our reported neural event size distributions are reliable.

      However, although loss of spikes did not impact the event size distributions much, the time-scale of measurement did matter. As discussed above and shown in Fig S4, changing from 5 ms time resolution to 330 ms time resolution does change the exponent and the range of the power law. However, in the test data set we worked with, the existence of a power law was robust across time scales.
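      The forward-modeling steps named in this reply (convolve with a Ca transient, downsample, z-score) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the exponential kernel, its time constant `tau`, the block-averaging downsampler, and the synthetic Poisson spikes are all assumptions, and the deconvolution step (OASIS in the reply) is deliberately left out rather than guessed at.

```python
import numpy as np

def forward_model(spike_counts, dt=0.005, frame_dt=0.33, tau=1.0):
    """Turn fine-binned spike counts into a slow, z-scored Ca-like trace."""
    # Exponential-decay kernel standing in for a Ca transient (tau is assumed).
    t = np.arange(0.0, 5.0 * tau, dt)
    kernel = np.exp(-t / tau)
    # Causal convolution, truncated to the original length.
    calcium = np.convolve(spike_counts, kernel)[: len(spike_counts)]
    # Downsample by averaging all fine bins that fall in one imaging frame.
    bins_per_frame = int(round(frame_dt / dt))
    n_frames = len(calcium) // bins_per_frame
    frames = calcium[: n_frames * bins_per_frame].reshape(n_frames, bins_per_frame)
    slow = frames.mean(axis=1)
    # z-score the slow trace.
    return (slow - slow.mean()) / slow.std()

rng = np.random.default_rng(0)
spikes = rng.poisson(0.02, size=60_000)  # 300 s of synthetic spikes in 5 ms bins
trace = forward_model(spikes)
```

      The defaults `dt=0.005` and `frame_dt=0.33` mirror the 5 ms and 330 ms time resolutions mentioned in the reply; a deconvolution step would be applied to `slow` before z-scoring in a fuller version.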

      2) Correlations and power-laws in subsets. I have a number of concerns with how neurons are selected and partitioned to achieve scale-free dynamics. 2a) First, it's unclear why the averaging is required in the first place. This operation projects the entire population down in an incredibly lossy way and removes much of the complexity of the population activity.

      R3b: Our population averaging approach is motivated by theoretical predictions and previous work. According to established theoretical accounts of scale-free population events (i.e. non-equilibrium critical phenomena in neural systems) such population-summed event sizes should have power law statistics if the system is near a critical point. This approach has been used in many previous studies of scale-free neural activity (e.g. all of those cited in the introduction in relation to scale-free neuronal avalanches). One of the main results of our study is that the existing theories and models of critical dynamics in neural systems fail to account for small subsets of neurons with scale-free activity amid a larger population that does not conform to these statistics. We could not make this conclusion if we did not test the predictions of those existing theories and models.

      2b) Second, the authors state that it is highly curious that subsets of the population exhibit power laws while the entire population does not. While the discussion and hypothesizing about different e-i interactions is interesting I believe that there's a discussion to be had on a much more basic level of whether there are topology independent explanations, such as basic distributions of correlations between neurons that can explain the subnetwork averaging. Specifically, if the correlation to any given neuron falls off, e.g., with an exponential falloff (i.e., a Gaussian Process type covariance between neurons), it seems that similar effects should hold. This type of effect can be easily tested by generating null distributions using code bases such as [R3]. I believe that this is an important point, since local (broadly defined) correlations of neurons implying the observed subnetwork behavior means that many mechanisms that have local correlations but don't cluster in any meaningful way could also be responsible for the local averaging effect.

      R3c: We appreciate the reviewer’s effort, trying out some code to generate a statistical model. We agree that we could create such a statistical model that describes the observed distribution of pairwise correlations among neurons. For instance, it would be trivial to directly measure the covariance matrix, mean activities, and autocorrelations of the experimental data, which would, of course, provide a very good statistical description of the data. It would also be simple to generate more approximate statistical descriptions of the data, using multivariate gaussians, similar to the code suggested by the reviewer. However, we emphasize that this would not meet the goal of our modeling effort, which is mechanistic, not statistical. The aim of our model was to identify a possible biophysical mechanism from which certain observed statistical features of the data emerge. We feel that a statistical model is not a suitable strategy to meet this aim. Nonetheless, we agree with the reviewer that clusters with sharp boundaries (like the distinction between e+ and e- in our model) are not necessary to reproduce the cancelation of anticorrelated neurons. In other words, we agree that sharp boundaries of the e+ and e- groups of our model are not crucial ingredients to match our observations.

      2c) In general, the discussion of "two networks" seems like it relies on the correlation plot of Figure 7B. The decay away from the peak correlation is sharp, but there does not seem to be significant clustering in the anti-correlation population, instead a very slow decay away from zero. The authors do not show evidence of clustering in the neurons, nor any biophysical reason why e and i neurons are present in the imaging data.

      R3d: First a small reminder: As stated in the paper, the data here is only showing activity of excitatory neurons. Inhibitory neurons are certainly present in V1, but they are not recorded in this data set. Thus we interpret our e+ and e- groups as two subsets of anticorrelated excitatory neurons, like those we observed in the experimental data. We agree that our simplified model treats the anticorrelated subsets as if they are clustered, but this clustering is certainly not required for any of the data analyses of experimental data. We expect that our model could be improved to allow for a less sharp boundary between e+ and e- groups, but we leave that for future work, because it is not essential to most of the results in the paper. This limitation of the model is now stated clearly in the last paragraph of the model subsection.

      The alternative explanation (as mentioned in (b)) is that there is a more continuous set of correlations among the neurons with the same result. In fact, I tested this myself using [R3] to generate some data with the desired statistics, and the distribution of events seems to also describe this same observation. Obviously, the full test would need to use the same event identification code, and so I believe that it is quite important that the authors consider the much more generic explanation for the sub-network averaging effect.

      R3e: As discussed above, we respectfully disagree that a statistical model is an acceptable replacement for a mechanistic model, since we are seeking to understand possible biophysical mechanisms. A statistical model is agnostic about mechanisms. We have nothing against statistical models, but in this case, they would not serve our goals.

      To emphasize our point about the inadequacy of a statistical model for our goals, consider the following argument. Imagine we directly computed the mean activities, covariance matrix, and autocorrelations of all 10000 neurons from the real data. Then, we would have in hand an excellent statistical model of the data. We could then create a surrogate data set by drawing random numbers from a multivariate gaussian with the same statistical description (e.g. using code like that offered by reviewer 3). This would, by construction, result in the same numbers of correlated and anticorrelated surrogate neurons. But what would this tell us about the biophysical mechanisms that might underlie these observations? Nothing, in our opinion.
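      The surrogate construction described in this argument can be sketched in a few lines. This is a hedged illustration, not the code offered by the reviewer or used by the authors: the function name and the small synthetic `toy_data` are assumptions, and the sketch matches only the means and covariance (samples are drawn independently in time, so autocorrelations are not reproduced).

```python
import numpy as np

def gaussian_surrogate(data, n_samples, seed=0):
    """Draw surrogate activity matching the mean and covariance of `data`.

    data: array of shape (time points, neurons).
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Toy stand-in for recorded activity: 5 correlated "neurons" (hypothetical data).
rng = np.random.default_rng(1)
mixing = rng.normal(size=(5, 5))
toy_data = rng.normal(size=(2000, 5)) @ mixing
surrogate = gaussian_surrogate(toy_data, n_samples=20_000)
cov_data = np.cov(toy_data, rowvar=False)
cov_surr = np.cov(surrogate, rowvar=False)
```

      Comparing `cov_surr` with `cov_data` shows that the surrogate reproduces the pairwise statistics by construction, which is the sense in which such a model describes the data without pointing to a mechanism.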

      2d) Another important aspect here is how single neurons behave. I didn't catch if single neurons were stated to exhibit a power law. If they do, then that would help in that there are different limiting behaviors to the averaging that pass through the observed stated numbers. If not, then there is an additional oddity that one must average neurons at all to obtain a power law.

      R3f: We understand that our approach may seem odd from the point of view of a central-limit-theorem-type argument. However, as mentioned above (reply R3b) and in our paper, there is a well-established history of theory and corresponding experimental tests for power-law distributed population events in neural systems near criticality. The prediction from theory is that the population summed activity will have power-law distributed events or fluctuations. That is the prediction that motivates our approach. In these theories, it is certainly not necessary that individual neurons have power-law fluctuations on their own. In most previous theories, it is necessary to consider the collective activity of many neurons before the power-law statistics become apparent, because each individual neuron contributes only a small part to the emergent, collective fluctuations. This phenomenon does not require that each individual neuron have power-law fluctuations.

      At the risk of being pedantic, we feel obliged to point out that one cannot understand the peculiar scale-free statistics that occur at criticality by considering the behavior of individual elements of the system; hence the notion that critical phenomena are “emergent”. This important fact is not trivial and is, for example, why there was a Nobel prize awarded in physics for developing theoretical understanding of critical phenomena.

      3) There is something that seems off about the range of $\beta$ values inferred with the ranges of $\tau$ and $\alpha$. With $\tau$ in [0.9, 1.1], the denominator $1-\tau$ is in [-0.1, 0.1], which the authors state means that $\beta$ (found to be in [2, 2.4]) is not near $\beta_{crackling} = (\alpha-1)/(1-\tau)$. It seems as if the opposite is true: the range of possible values of $\beta_{crackling}$ is huge due to the denominator, and so $\beta$ is in the range of possible $\beta_{crackling}$ almost vacuously. Was this statement just poorly worded?

      R3g: The point here is that the theory of crackling noise predicts that the fit value of beta should be equal to (1-alpha)/(1-tau). In other words, a confirmation of the theory would have all the points on the unity line in the rightmost panels of Fig 9D and 9E, not scattered by more than an order of magnitude around the unity line. (We now state this explicitly in the text where Fig 9 is discussed.) Broad scatter around the unity line means the theory prediction did not hold. This is well established in previous studies of scale-free brain dynamics and crackling noise theory (see for example Ma et al Neuron 2019, Shew et al Nature Physics 2015, Friedman et al PRL 2012). A clearer single example of the failure of the theory to predict beta is shown in Fig 5A, B, and C.
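      For reference, the relation quoted in this reply can be written out explicitly. The block below is a sketch that assumes the usual convention in which tau is the event-size exponent, alpha is the event-duration exponent, and beta relates mean event size to duration; the symbols S (size) and T (duration) are introduced here for illustration and do not appear in the exchange above.

```latex
\[
  P(S) \sim S^{-\tau}, \qquad
  P(T) \sim T^{-\alpha}, \qquad
  \langle S \rangle (T) \sim T^{\beta}
\]
\[
  \beta_{\mathrm{crackling}} = \frac{1-\alpha}{1-\tau} = \frac{\alpha-1}{\tau-1}
\]
% Mean-field check: \tau = 3/2 and \alpha = 2 give
% \beta_{crackling} = (2 - 1)/(3/2 - 1) = 2.
```

      Because the denominator approaches zero as tau approaches 1, the predicted value is very sensitive to small changes in tau in that regime, which is why the comparison is made point by point against the unity line rather than by comparing ranges.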

      4) Connection between brain and behavior:

      4a) It is not clear if there is more to what the authors are trying to say with the specifics of the scale free fits for behavior. From what I can see those results are used to motivate the neural studies, but aside from that the details of those ranges don't seem to come up again.

      R3h: The reviewer is correct: the primary point in Fig 2 is that scale-free behavioral statistics often exist. Beyond this point about existence, reporting of the specific exponents and ranges is just standard practice for this kind of analysis; a natural question to ask after claiming that we find scale-free behavior is “what are the exponents and ranges”. We would be remiss not to report those numbers.

      4b) The primary connection between neuronal and behavioral activity seems to be Figure 4. The distribution of points in these plots seems to be very lopsided, in that some plots have large ranges with few-to-no data points. It would be very helpful to get a sense of the distribution of points, which is a bit hard to see given the overlapping points and superimposed lines.

      R3i: We agree that this whitespace in the figure panels is somewhat awkward, but we chose to keep the horizontal axis the same for all panels of Fig 4B, because this shows that not all behaviors and not all animals had the same range of behavioral correlations. We felt that hiding this would be a bit misleading, so we kept the white space.

      4c) Neural activity correlated with some behavior variables can sometimes be the most active subset of neurons. This could potentially skew the maximum sizes of events and give behaviorally correlated subsets an unfair advantage in terms of the scale-free range.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Scalabrino et al. show persistent cone-mediated RGC signaling despite changes in cone morphology and density with rod degeneration in the CNGB1 mouse model of retinitis pigmentosa. The authors use a linear-nonlinear receptive field model to measure functional changes (spatial and temporal filters and gain) across the RGC populations with space-time separable receptive fields. At mesopic and photopic conditions, receptive field changes were minor until rod death exceeded 50%, while response gain decreased with photoreceptor degeneration. Using information theory, the authors evaluated the fidelity of RGC signaling and demonstrated that mutual information decreased with rod loss, but cone-mediated RGC signaling was relatively stable and was more robust for natural movies than for artificial stimuli. This work reveals the preservation of cone function and a robustness in encoding natural movies across degeneration. This manuscript is the first demonstration of using information theory to evaluate the effects of neural degeneration on sensory coding. The study uses a systematic evaluation of rod and cone function in this model of rod degeneration to make the following findings: (1) cone function persists for 5-7 months, (2) spatial and temporal changes to the ganglion cell receptive fields were not monotonic with time, (3) mutual information between spikes and photopic stimuli remained relatively constant up to 3-5 months, and (4) information rates were higher for natural movies than for checkerboard noise stimuli.

      The strengths of this paper include the following:

      A systematic evaluation of potentially confusing data. The authors do an excellent job of organizing the results in terms of light levels and time points. The results themselves are confusing and difficult to draw across metrics, but the data are presented as clearly as possible. The work is especially well executed and presented.

      The insight that cone responses remain relatively stable despite rod loss. The study clearly demonstrates that despite cone loss and morphological changes, cone-mediated responses remain robust and functional.

      The application of information theory to degeneration is the first of its kind and the study clearly shows the utility of the metric.

      The results are thoughtfully interpreted.

      We thank the reviewer for these comments.

      The weaknesses of this study include the following:

      The inability to follow the same ganglion cell types over time is a major weakness that could confound the interpretation in terms of whether the changes are happening from artifacts of the recording method or from dynamic changes in the pooled population of ganglion cells. Is there even a single cell class, for example the ON-OFF direction-selective ganglion cells, that this group has so well quantified on the MEA, that the study could track over time, in addition to examining the pooled population changes over time? Tracking a single cell type for each of the metrics would make the population data more convincing or could clearly show that not all ganglion cells follow the population trend.

      As suggested by the reviewer, we have added a cell type that is tracked through all the analyses: ON brisk sustained RGCs. Example receptive field mosaics, temporal receptive fields, and spike train autocorrelation functions for WT and 4M Cngb1neo/neo animals are shown in Figure 2-figure supplement 1E-F. These RGCs follow the trends displayed by the larger populations of RGCs in each analysis. We chose this cell type because they are readily identified by their spike train autocorrelation functions compared to other RGC types and they have approximately space-time separable receptive fields (RFs). There are many text changes associated with adding an analysis of the ON brisk sustained RGCs (see lines 202-207; 227-229; 264-267, etc.).

      We chose not to focus on direction selective RGCs because we are analyzing the spatial and temporal RFs of RGCs in Figures 3-5 and direction-selective RGCs do not have space-time separable RFs (see example in Figure 2C-D). Thus, those cells could not be used to track those receptive field properties across degeneration. Also, we did not collect responses to drifting gratings or bar responses across a range of speeds or contrasts, so we are unable to reliably distinguish the different types of direction-selective RGCs (e.g., ON vs ON-OFF) from these data.

      While the non-monotonic changes are interesting, they are also difficult to make sense of. Can the authors speculate in the Discussion about what underlying mechanisms could give rise to the non-monotonic changes? In the absence of potential mechanisms, the concern of recording artifacts arises.

      Thank you for raising this point. We have added some speculation for the cause of these non-monotonic changes in the Discussion (lines 455-462). “While we do not know why non-monotonic changes are occurring for some RF properties, they largely occurred in the 3-5M range. During this time, there is a transient decrease in the rate of rod death (4-5M) and cone death begins (Figure 1). Consequently, there may be complex changes to retinal circuitry as the retina reacts to a temporary stabilization in rod numbers and an acceleration in cone death. Intracellular studies of the light-driven synaptic currents impinging onto bipolar cells and RGCs during this time will be important for understanding the origin of these non-monotonic changes in RF properties.”

      The mutual information calculation seems to be correlated with the spike rate despite the argument made in Fig 10E-F. Can the authors show this directly by calculating the bits per spike in Figures 8 and 9? Of all the metrics, the gain function and the mutual information seem to be more consistent with each other. Can the authors demonstrate or refute a connection between the spike rate and information rates?

      We added a supplementary figure to each of the information figures (see figure supplement 1 for each of Figures 8-10) showing that the trends hold after dividing the information rate by the spike rate. Certainly, changing spike rates are contributing, but there are also clear changes in the bits/spike plots (Figure 8-figure supplement 1D; Figure 9-figure supplement 1D; Figure 10-figure supplement 1D).
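      The normalization described in this reply, dividing the information rate by the spike rate, amounts to the following. This is a minimal sketch with hypothetical numbers; the function name and the choice to mark cells with zero spike rate as NaN are assumptions, not details taken from the manuscript.

```python
import numpy as np

def bits_per_spike(info_rate_bits_per_s, spike_rate_hz):
    """Normalize an information rate (bits/s) by the mean spike rate (spikes/s)."""
    info = np.asarray(info_rate_bits_per_s, dtype=float)
    rate = np.asarray(spike_rate_hz, dtype=float)
    # Cells with zero spike rate have undefined bits/spike; mark them as NaN.
    out = np.full(info.shape, np.nan)
    np.divide(info, rate, out=out, where=rate > 0)
    return out

# Hypothetical values for three cells, used only to exercise the function.
example = bits_per_spike([20.0, 10.0, 0.0], [10.0, 4.0, 0.0])
```

      If bits/spike changes across conditions while the spike rate also changes, the change in information rate cannot be attributed to spike rate alone, which is the point the supplementary panels are meant to address.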

      Can the authors provide an explanation for why the mutual information calculation remains stable despite lower SNR and lower gain, especially after the contributions of oscillations have been ruled out?

      The mutual information depends more strongly on the precision of spiking (both in terms of time and spike number within a small time bin) than the mean spike rate (averaged over the stimulus). Diminishing the total number of spikes (because of reduced gain) will have a relatively small effect on the information rate if the spike trains continue to exhibit low variability (high precision). Indeed, spike generation by RGCs is distinctly sub-Poisson (Berry, Warland, and Meister 1997), indicating it can exhibit relatively high information rates even when spike rates are relatively low. We clarified this in Results at lines 493-496.
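      One common way to quantify the "sub-Poisson" variability mentioned in this reply is the Fano factor (variance of the spike count divided by its mean), which is 1 for a Poisson process and below 1 for more regular spiking. The sketch below is illustrative only: the synthetic counts are assumptions, and the manuscript does not state that this particular measure was used.

```python
import numpy as np

def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across repeated trials."""
    counts = np.asarray(spike_counts, dtype=float)
    return counts.var() / counts.mean()

rng = np.random.default_rng(0)
# Poisson counts with mean 5: Fano factor close to 1.
poisson_counts = rng.poisson(5.0, size=10_000)
# Binomial counts with the same mean but lower variance: Fano factor close to 0.5.
regular_counts = rng.binomial(10, 0.5, size=10_000)
```

      Both synthetic populations have the same mean count, so the lower Fano factor of `regular_counts` reflects precision rather than rate, which parallels the argument that information can stay high when gain and spike rate fall.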

      Lack of age-matched WT controls to accompany the different time points. It is known that photoreceptor degeneration can occur naturally in WT mice. Though the authors have used controls pooled from across the ages used in the CNGB1 mutants, it would be informative to know if there are age-dependent changes in any of the metrics for WT mice.

      WT recordings were pooled from retinas from littermate control mice between 2 and 7 months of age (n=3 at 2M; n=1 each at 4M, 6M, and 7M). We have added data points from individual retinal recordings to the figure supplements for Figures 2-6 and 8-10 to illustrate the consistency between these recordings, which allowed us to confidently pool the results.

      Can the authors elaborate on why cone function persists despite the rod loss and morphological changes? This is unique for other models of rod loss and is worth extra discussion.

      This is something we are also very interested in, but outside the scope of this study. The Sampath Lab (co-author and collaborator) has data from single cell recordings in late stage rd10 retinas that show abnormal cone signaling (and structure similar to the 7M Cngb1neo/neo cones), yet relatively normal cone bipolar cell and horizontal cell responses. Thus, somehow there is either compensation or a high level of redundancy in the transmission of signals from cones to 2nd-order neurons that makes the responses of the 2nd-order neurons robust to deteriorating cone function. These results suggest our observations in Cngb1neo/neo mice are not unique to this model of RP. Future experiments are needed to understand how this compensation is occurring.

      Reviewer #2 (Public Review):

      In this study, the authors assess the decline of retinal function in a mouse model of slow photoreceptor degeneration, the Cngb1neo/neo. Rod loss occurs between 1-7 months and complete cone loss occurs by 8-9 months. The authors characterize cone loss in the first 7 months and find that 70% of cones are still there at 7 months, though their outer segments are highly degraded. They then use MEA recordings to characterize retinal function using a variety of measures. First, they use spike-triggered averaging to determine the spatial and temporal receptive fields, restricting this analysis to RGCs that have separable spatial and temporal receptive fields. They find that both rod and cone receptive fields are surprisingly intact over the first 5 months, identifying primarily a reduction in contrast response functions (and a reduction in the number of rods that are light responsive, though this is not quantified). Second, they show that oscillatory activity does not appear until after photoreceptors are completely deteriorated, in sharp contrast to other PR degeneration models (e.g. rd10) in which oscillatory activity appears while there are still light-evoked responses. Third, they use information theory to assess the reliability of signaling. When examining the 10% of RGCs with the highest information rates, they see a significant decrease at mesopic light levels, while information rates were mostly stable at photopic light levels. Finally, they showed that at photopic light levels, the mutant retinas conveyed more information about natural movies than a repeating checkerboard, and this was maintained across light levels.

      My primary question is whether this represents a significant advance. There have been many studies regarding the changing retinal circuits in various rodent models of photoreceptor degeneration. The authors make a few arguments regarding the uniqueness of this study.

      One is that this is a novel analysis that is not limited to particular cell types but rather characterizes the retina as a "whole". But this point is also its weakness. First, one cannot speak to the retina as a "whole" since they state that there is a reduction in the number of light-responsive cells across degeneration, yet they do not quantify it. This seems incredibly important to know because, even presuming the remaining cells have perfect receptive field structure, if only 10% of cells are left, assessing the receptive fields of only the remaining cells is clearly not a characterization of the retention of visual function.

      We never claim that we have assessed the “retina as a whole”. We do state that we are measuring certain features of RGC signaling that reflect the “net changes” induced by photoreceptor degeneration (e.g., changes in photoreceptor function, retinal rewiring, homeostatic mechanisms, etc.) on those features. In fact, we are explicit that we are only measuring certain RF properties in certain RGC types, such as the linear spatial and temporal RFs in cells with space-time separable RFs: Figure 2 makes this point explicitly. We do not measure changes in direction-selectivity, object motion sensitivity, orientation selectivity, edge detection, looming detection, luminance encoding, chromatic opponency, contrast adaptation, motion reversal signaling, etc., because doing so would produce a manuscript with at least one figure for every RGC type (e.g., 45 figures). This would clearly be an unreasonable amount for a single study.

      We agree with the Reviewer that explicitly quantifying the number of light responsive RGCs is important, and we now include this information as a function of degeneration time point in Figure 2-figure supplement 1. Under photopic conditions, this fraction is quite stable until 5M and then begins to deteriorate. We also observe a decrease in the number of RGCs with space-time separable RFs at 5M (Figure 2F), suggesting (but not proving) that these RGCs are representative of changes across all RGCs. We also described these results in the Results (lines 167-174).

      Second, it is hard to assess whether this mouse model is better than existing models for human disease. Their phenotype is different than the rat model of this same disease. It also shows a lack of oscillatory activity that is apparent in rd models.

      We are not making the claim that this model is better than other models. Each model has value. However, because the degeneration in this model is relatively slow, it may be more representative of changes that occur in slower forms of human retinal degeneration (emphasis on “may be”). This is a discussion point, not something that we are aiming to prove. We also believe the utility of a model depends on the questions being asked. In this case, we aimed to track changes over time during photoreceptor loss to better understand the extent to which retinal output is impaired.

      Also, retinitis pigmentosa is a heterogeneous disease with a spectrum of phenotypes that may or may not be genotype specific. A patient with a PDE6B mutation presents with differing phenotypes than a patient with a CNGB1 mutation, despite both having an RP diagnosis. It is a fallacy to assume a mouse is the exact same as a human, just as it is incorrect to assume clinical presentations are identical for all patients for one broad disease that is known to have a diverse set of underlying causes. Studying a range of models is thus essential to understanding the disease. Given that mutations causing RP have different impacts on retinal signaling, we believe it is important to contextualize findings to their mutation. We make this point in Discussion: Comparison to previous studies of RGC signaling in retinitis pigmentosa (beginning on line 436).

      Finally, the model we study does not lack oscillatory activity; it simply arises later than in rd1 or rd10 mice and does so only after all the photoreceptors have died (Figure 7). To our knowledge, it is not clear when or even if RGCs exhibit oscillations in human patients with RP. We discuss why oscillations might arise at different time points in different genetic models of RP in lines 555-570.

      Reviewer #3 (Public Review):

      In the manuscript by Scalabrino et al. a rigorous characterization of the functionality of retinal ganglion cells in a mouse model of rod photoreceptor degeneration is presented. The authors analyzed the degeneration of cone photoreceptors, which is known to be linked to rod degeneration. Based on the time course of cone degeneration they investigated the functional properties of retinal ganglion cells aged between 1 month and seven months.

      The most interesting finding is robust preservation of functional properties, as reflected in little changes of the receptive fields (spatial and temporal characteristics) or signaling fidelity/information rate. In contrast to other mouse models, the present one shows no oscillatory activity until a complete loss of cone photoreceptors occurred at an age of nine months.

      Although the receptive fields of retinal ganglion cells remain nearly intact, the number of ganglion cells with identifiable receptive fields decreases significantly with age (Fig.2F). Could the authors comment, if this might imply a "patchy" vision?

      Visual field loss is a predominant clinical observation in patients with retinitis pigmentosa, including those with Cngb1 mutations. We connect to this observation in the Discussion at lines 521-529: “At the latest stages of photoreceptor degeneration in the Cngb1neo/neo mice (5-7M), we did observe a decrease in the fraction of RGCs with spike rates that were strongly modulated by checkerboard noise (Supplemental Figure 2). It is possible these RGCs were losing their light response completely, or that changes in their light response properties made them relatively unresponsive to checkerboard noise. If the former, it is possible that light responsive RGCs are becoming sparser at the later stages of degeneration which may result in inhomogeneous, or “patchy”, visual sensitivity described by RP patients (see reviews by Hull et al., 2017; Nassisi et al., 2021).”

      Reviewer #4 (Public Review):

      Scalabrino et al. report the remarkable persistence of cone-driven retinal ganglion cell responses in a mouse model of retinitis pigmentosa (i.e., Cngb1 KO mice). The authors first map the time course of primary rod and secondary cone degeneration in Cngb1 KO mice. Approximately 30% of rods are gone at one month (1M), and all rods are lost by 7M in Cngb1 KO retinas. The cone morphology changes progressively as rods degenerate; cone outer segments shrink and are largely absent by 5M. Cones die between 8-9M. Scalabrino et al. next perform multielectrode array recordings from wild-type and Cngb1 KO retinas from 1M to 5M in mesopic and photopic stimulus conditions. They find that spatiotemporal receptive fields remain relatively stable in the face of photoreceptor degeneration, whereas contrast gain gradually decreases. Oscillatory spontaneous ganglion cell activity emerges late (~9M) in Cngb1 KO mice compared to other retinal degeneration models. Finally, the authors analyze mutual information between stimuli (white noise and naturalistic movies) and ganglion cell spike trains and find that the encoding of the most informative ganglion cells is preserved relatively late into photoreceptor degeneration and that information rates decline less in photopic vs. mesopic conditions and for naturalistic movies vs. white noise stimuli.

      Overall, this is an exciting study that shows remarkable preservation of cone-driven ganglion cell light responses in advanced stages of a retinitis pigmentosa model when most rods have died, and cone morphologies are dramatically altered. The results are presented clearly in the text and figures and are scholarly discussed. Nonetheless, the authors should address a few specific comments to clarify and better support some of the conclusions they draw.

      Specific comments:

      1) In describing the results on information encoding, the authors write and show data (panels A of Figures 8-10) that suggest that most ganglion cells, even in recordings from wild-type retinas, respond unreliably to white noise stimuli and naturalistic movies. Why does such a large fraction of cells have such low repeat reliability? Does this reflect unreliable spike detection and sorting, poor cell or tissue health, or true variability in the responses of healthy retinal ganglion cells? The latter does not seem to align with results from patch-clamp recordings targeted to specific ganglion cell types. The limited repeat reliability also raises questions about how well the linear-nonlinear model, which the authors use to compare responses between wild-type and Cngb1 KO mice of different ages, predicts the responses of these cells. Comparing model parameters (receptive field size, temporal filtering, and contrast sensitivity) between genotypes and ages only makes sense if the model is a good description in the acquired datasets.

      We agree with the reviewer that this is an important point to be clear about. In Figures 8-10 some RGCs exhibit high repeatability, some exhibit low repeatability as quantified by their information rates. The reviewer is concerned about those cells with low repeatability and the ability of capturing their responses with an LN model. This is a valid concern, but to be clear, we are not fitting an LN model to cells with low information rates. In Figures 3-6, where an LN model is being used to estimate the spatial and temporal components of the RFs, we are fitting a subset of all the RGCs: those with space-time separable RFs (see Figure 2). Those particular cells exhibit high information rates and highly reproducible responses, and an LN model captures ~60% of the explainable variance in the spike rate (see Figure 2-figure supplement 1A-B; also see lines 157-151). This is typical for LN models that approximately predict the responses of RGCs to checkerboard noise. Thus, we think the LN model reasonably captures the responses of cells for which we use the LN model. The information rate estimates include these cells as well as other cells that are not well described by an LN model. Note, the LN model is not used to calculate the mutual information rates. We have added text in the Results (lines 324-327) to clarify this.
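To make the two quantities in this reply concrete, here is a minimal sketch of a linear-nonlinear (LN) cascade and of a variance-explained score. It is an illustration only, not the authors' analysis code: the function names, the softplus nonlinearity, and the `gain`/`offset` parameters are assumptions. It also computes plain explained variance; the "explainable variance" quoted above additionally normalizes by trial-to-trial noise, which this sketch omits.

```python
import math


def ln_model_rate(stimulus, temporal_filter, gain=1.0, offset=0.0):
    """Predict a firing rate with a linear-nonlinear (LN) cascade.

    Hypothetical helper: the softplus nonlinearity and the gain/offset
    parameters are illustrative assumptions, not the paper's choices.
    """
    rates = []
    for t in range(len(stimulus)):
        # Linear stage: weighted sum of the most recent stimulus values
        # (filter index 0 weights the current frame).
        drive = 0.0
        for k, weight in enumerate(temporal_filter):
            if t - k >= 0:
                drive += weight * stimulus[t - k]
        # Nonlinear stage: softplus keeps the predicted rate non-negative.
        rates.append(math.log1p(math.exp(gain * drive + offset)))
    return rates


def fraction_variance_explained(observed, predicted):
    """Return 1 - SS_residual / SS_total for an observed rate trace."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total
```

A score of 1.0 means the prediction matches the observed rate exactly, and a score of 0.0 means it does no better than predicting the mean rate.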

      In addition, the information rates we estimated in mouse are consistent with past studies from guinea pig (Koch et al, 2004 and Koch et al, 2006). We think cells with very low repeatability are not well driven by checkerboard noise or the particular 10s natural movies we showed. We have updated the example neurons to better reflect the reliability of the cells near the median of the MI distributions in Figures 8-10.
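For readers unfamiliar with the quantity being discussed, the following is a minimal sketch of a plug-in mutual information estimate between a discretized stimulus and a discretized response (for example, spike counts per time bin). The estimator used in the manuscript is not described in this reply, so this is a generic textbook form; the function name and the discretization are assumptions. Plug-in estimates are biased upward for small samples, so this is not a substitute for a bias-corrected method.

```python
import math
from collections import Counter


def mutual_information_bits(stimuli, responses):
    """Plug-in estimate of I(S; R) in bits from paired discrete samples.

    Hypothetical helper for illustration; `stimuli` and `responses` are
    equal-length sequences of hashable labels (e.g. stimulus IDs and
    spike counts per bin).
    """
    n = len(stimuli)
    joint_counts = Counter(zip(stimuli, responses))
    stim_counts = Counter(stimuli)
    resp_counts = Counter(responses)
    mi = 0.0
    for (s, r), count in joint_counts.items():
        p_joint = count / n
        # p(s, r) / (p(s) * p(r)), written in terms of raw counts.
        ratio = count * n / (stim_counts[s] * resp_counts[r])
        mi += p_joint * math.log2(ratio)
    return mi
```

Dividing the result by the bin duration in seconds would convert it to an information rate in bits per second.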

      2) The authors should, maybe in figure supplements and parts of the main figures, break results down by recordings. Inter-experimental variability has been well documented (e.g., Shah et al. Neuron 2022, Zhao et al Sci Rep 2020), and it would be reassuring to see that the conclusions drawn by the authors are supported by statistics in which n = number of recordings (e.g., there is a somewhat difficult to explain broadening of temporal filters in 4M Cngb1 KO retinas that recover by 5M).

      We agree that inter-experiment variability can be large and is important to control for. We now show all the analyses broken down by experiment in Supplemental Figures (2, 3, 4, 5, 6, 8, 9, and 10) for each analysis. None of the trends we describe or highlight in the manuscript were driven by inter-experiment variability.

      3) At different points in their manuscript, the authors conclude that their results "suggest that homeostatic mechanisms in the retina serve to compensate for deteriorating photoreceptors" (or similar). I think that this may well be the case. However, in its present form, the study provides no evidence that retinal circuits in Cngb1 KO mice change to preserve function compared to the alternative that the observed stability is evidence for functional redundancy or resilience in retinal circuits (as they are) without the need for adjustments. Distinguishing between these alternatives would be conceptually important. For example, Care et al. Cell Rep 2019 and Care et al. Cell Rep 2020 used partial stimulation to activate fewer photoreceptors and compare light responses in downstream neurons to those in retinas with fewer photoreceptors. Other studies have directly observed changes in circuit wiring in models of retinal degeneration. If the authors cannot provide experimental evidence for homeostatic changes, it would be good to reflect this in the interpretation and discussion.

      The reviewer raises a terrific point and potential alternative interpretation. We agree. We have not been able to identify an equivalent analysis to that in Care et al. 2019 that we can run that will cleanly distinguish between these two possibilities, without doing many more experiments across timepoints of degeneration. We have thus rewritten portions of the Introduction and the Discussion to recognize the potential of this alternative interpretation.

      Introduction (lines 39-44): Alternatively, homeostatic plasticity or redundancy in retinal circuitry may compensate for photoreceptor loss (Care et al., 2020; Lee et al., 2021; Shen et al., 2020). Such mechanisms could facilitate reliable signaling at the level of retinal output, despite deterioration in photoreceptor function. Identifying the extent to which changes in photoreceptor morphology impact retinal output will inform treatment timepoints for gene therapies aimed at halting rod loss to preserve cone-mediated vision.

      Discussion (lines 514-520): There are two potential classes of mechanisms for this compensation. First, homeostatic plasticity has been documented in models of photoreceptor loss in which the retina remodels to preserve signal transmission (Care et al., 2019; Keck et al., 2013, 2011, 2008; Leinonen et al., 2020; Shen et al., 2020). Alternatively, functional redundancy within the circuit could explain how robust retinal signaling is retained longer than the changes in cone morphology would suggest (Care et al., 2020). This study did not distinguish between the two compensation models.

      4) The authors do not attempt to classify retinal ganglion cells into functional types as functional changes from degeneration may confound such classifications. However, it would be beneficial to separate some categorical response types (direction-selective ON-OFF and ON ganglion cells, maybe orientation-selective [horizontal, vertical, ON, OFF] ganglion cells) and compare how their responsiveness, reliability, and information encoding change with degeneration. This would provide additional insights and address concerns that changes caused by degeneration may be obscured by the differences between ganglion cell types in the present analysis.

      We agree. We now track ON brisk sustained RGCs across degeneration time points for the RF analyses and mutual information analyses. These RGCs are likely the ON sustained alpha cells because they generate very large spikes on the MEA as would be expected for cells with large somata. Example receptive field mosaics, temporal receptive fields, and spike train autocorrelation functions for WT and 4M Cngb1neo/neo animals are shown in Figure 2-figure supplement 1E-F. These RGCs follow the trends displayed by the larger populations of RGCs in each analysis. We chose this cell type because they are readily identified by their spike train autocorrelation functions compared to other RGC types and they have approximately space-time separable receptive fields (RFs). There are many text changes associated with adding an analysis of the ON Brisk sustained RGCs (see lines 202-207; 227-229; 264-267, etc).

      We chose not to focus on direction selective RGCs because we are analyzing the spatial and temporal RFs of RGCs in Figures 3-5 and direction-selective RGCs do not have space-time separable RFs (see example in Figure 2C-D). Thus, those cells could not be used to track those receptive field properties across degeneration. Also, we did not collect responses to drifting gratings or bar responses across a range of speeds or contrasts, so we are unable to reliably distinguish the different types of direction-selective RGCs (e.g., ON vs ON-OFF) from these data.

    1. Historical Hypermedia: An Alternative History of the Semantic Web and Web 2.0 and Implications for e-Research. .mp3. Berkeley School of Information Regents’ Lecture. UC Berkeley School of Information, 2010. https://archive.org/details/podcast_uc-berkeley-school-informat_historical-hypermedia-an-alte_1000088371512. archive.org.

      https://www.ischool.berkeley.edu/events/2010/historical-hypermedia-alternative-history-semantic-web-and-web-20-and-implications-e.

      https://www.ischool.berkeley.edu/sites/default/files/audio/2010-10-20-vandenheuvel_0.mp3

      headshot of Charles van den Heuvel

      Interface as Thing - book on Paul Otlet (not released, though he said he was working on it)

      • W. Boyd Rayward 1994 expert on Otlet
      • Otlet on annotation, visualization, of text
      • TBL married internet and hypertext (ideas have sex)
      • V. Bush As We May Think - crosslinks between microfilms, not in a computer context
      • Ted Nelson 1965, hypermedia

      t=540

      • Michael Buckland book about machine developed by Emanuel Goldberg antecedent to memex
      • Emanuel Goldberg and His Knowledge Machine: Information, Invention, and Political Forces (New Directions in Information Management) by Michael Buckland (Libraries Unlimited, March 31, 2006)
      • Otlet and Goldsmith were precursors as well

      four figures in his research:
      • Patrick Geddes - biologist, architect, diagrams of knowledge, metaphorical use of architecture; classification
      • Paul Otlet, Brussels born
      • Wilhelm Ostwald - Nobel Prize in chemistry
      • Otto Neurath, philosopher, designer of Isotype

      Paul Otlet

      Otlet was interested in both the physical as well as the intangible aspects of the Mundaneum including as an idea, an institution, method, body of work, building, and as a network. (#t=1020)

      Early iPhone diagram?!?

      (roughly) armchair to do the things in the web of life (Nelson quote) (get full quote and source for use) (circa 19:30)

      compares Otlet to TBL


      Michael Buckland 1991 <s>internet of things</s> coinage - did I hear this correctly? https://en.wikipedia.org/wiki/Internet_of_things lists different coinages

      Turns out it was "information as thing". See: https://hypothes.is/a/kXIjaBaOEe2MEi8Fav6QsA


      Suzanne Briet and Otlet
      "everything can be in a document"
      importance of evidence


      The idea of evidence implies a passiveness. For evidence to be useful then, one has to actively do something with it, use it for comparison or analysis with other facts, knowledge, or evidence for it to become useful.


      transformation of sound into writing
      movement of pieces at will to create a new combination of facts - combinatorial creativity idea here. (circa 27:30 and again at 29:00)
      not just efficiency but improvement and purification of humanity

      put things on system cards and put them into new orders
      breaking things down into smaller pieces, whether books or index cards....

      Otlet doesn't use the word interfaces, but makes these with language and annotations that existed at the time. (32:00)

      Otlet created diagrams and images to expand his ideas

      Otlet used octagonal index cards to create extra edges to connect them together by topic. This created more complex trees of knowledge beyond the four sides of standard index cards. (diagram referenced, but not contained in the lecture)

      Otlet is interested in the "materialization of knowledge": how to transfer an idea into an object. (How does this relate to mnemonic devices for daily use? How does it relate to broader material culture?)

      Otlet inspired by work of Herbert Spencer

      "space and time are forms of thought, I hold myself that they are forms of things." (get full quote and source) from Spencer; influence of Plato's forms here?

      Otlet visualization of information (38:20)

      S. R. Ranganathan may have had these ideas about visualization too

      atomization of knowledge; atomist approach; 19th century examples: S. R. Ranganathan, Wilson, Otlet, Richardson (atomic notes are NOT new either...) (39:40)

      Otlet creates interfaces to the world:
      • time with cyclic representation
      • space
      • moving cube along time and space axes as well as levels of detail
      • comparison to Ted Nelson and zoomable screens even though Ted Nelson didn't have screens, but simulated them in paper
      • globes

      Katie Berner - semantic web; claims that reporting a scholarly result won't be a paper, but a nugget of information that links to other portions of the network of knowledge.
      (so not just one's own system, but the global commons system)

      Mention of Open Annotation (Consortium) Collaboration:
      • Jane Hunter, University of Queensland, Brisbane, Australia
      • Tim Cole, University of Illinois at Urbana-Champaign
      • Herbert Van de Sompel, Los Alamos National Laboratory
      annotations of various media
      see:
      • https://www.researchgate.net/publication/311366469_The_Open_Annotation_Collaboration_A_Data_Model_to_Support_Sharing_and_Interoperability_of_Scholarly_Annotations
      • http://www.openannotation.org/spec/core/20130205/index.html
      • http://www.openannotation.org/PhaseIII_Team.html

      trust must be put into the system for it to work

      coloration of the provenance of links goes back to Otlet (~52:00)

      Creativity is the friction of the attention space at the moments when the structural blocks are grinding against one another the hardest. - Randall Collins (1998) The Sociology of Philosophies. Cambridge, MA: Harvard University Press (p.76)

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      1. General Statements

      It is the common view of all three reviewers that we have not provided adequate in vitro/biochemical evidence to support the idea that SATB1 protein undergoes liquid-liquid phase separation. We agree with the reviewers that our manuscript lacks biochemical evidence to support such a notion. Nevertheless, we find it quite interesting, and we would like to suggest for the first time in the field of chromatin organization and function, that SATB1 exists in at least two polypeptide isoforms (764 and 795 amino acids long), which display different phase separation propensities and therefore play different roles in regulating the (patho)physiological properties of a murine T cell.

      So far, every research group working on SATB1 has considered only a single protein isoform, the shorter isoform of 764 amino acids, and no tools such as isoform-specific antibodies have been developed to discriminate between the two isoforms and thereby assign unique functions to each. We understand that a report suggesting the presence of two protein isoforms with potentially quite diverse functions would call into question (not necessarily by the authors of this manuscript, since no such comment is included in our manuscript) the conclusions drawn in the literature assigning all biochemical properties to a single, short isoform of SATB1. Moreover, all the genetically modified mice that have been analyzed so far (including by our group) had both Satb1 isoforms deleted. Future research should, from now on, aim to unravel the isoform-specific functions of SATB1 and their involvement in physiology and disease. This could also prove useful in explaining the quite diverse, both positive and negative, effects of SATB1 on transcription regulation. Another major objection of the reviewers was that we should provide cumulative supporting evidence for the existence of the long SATB1 isoform, or at least evaluate the specificity of our custom-made antibody.

      Taking into consideration the aforementioned constructive criticism of the three reviewers, we would like to perform additional experiments to support the claims in our manuscript (most of the suggested experiments have already been performed). These experiments are described below as a point-by-point reply to each point raised by the reviewers.

      In line with the aforementioned rationale, we propose changing the title of our manuscript to “Two SATB1 isoforms display different phase separation propensity”, if our manuscript is considered for publication.

      2. Description of the planned revisions

      **Reviewer #1**:

      4) Lack of in vitro reconstitution experiments with purified long and short SATB1

      **PLANNED EXPERIMENT #1**

      We do realize this shortcoming of our work. Purifying recombinant SATB1 protein is quite a challenging task, yet we have (1) cloned both Satb1 cDNAs for the long and short isoforms and (2) successfully expressed both proteins in great quantity and quality, and we are willing to perform these experiments if our work is considered for publication.

      This proposed experiment has also been requested by Reviewers #2 and #3.

      **Reviewer #2**:

      1. Moreover, an important and direct experiment would be to clone the long isoform in a suitable vector and overexpress in the cell line (as done for the canonical isoform in Supp Fig 1a). This would unequivocally show the efficacy of the antibody and thus the following usage of the same for various assays.

      **PLANNED EXPERIMENT #2**

      This is a great suggestion. We have cloned the long and short Satb1 cDNAs in pEGFP-C1 vector. We will transfect these plasmids in NIH 3T3 fibroblasts and we will perform Western blot analysis, utilizing the antibody raised against the extra 31 amino acids long peptide present only in the long SATB1 isoform, for the following samples: 1. NIH-3T3 whole cell protein extracts, 2. protein extracts from NIH 3T3 fibroblasts transiently transfected with the pEGFP-C1 plasmid, 3. protein extracts from NIH 3T3 fibroblasts transiently transfected with the pEGFP-long_Satb1_ plasmid and 4. protein extracts from NIH 3T3 fibroblasts transiently transfected with the pEGFP-short_Satb1_ plasmid.

      This experiment will constitute further proof of the specificity of the antibody raised against the extra 31-amino-acid peptide present only in the long SATB1 isoform.

      **Minor comments:**

      1. On pg 6, related to Figure 1, the authors mention 'It should also be noted that when investigating the SATB1 protein levels, we have to bear in mind that the antibodies targeting the N-terminus of SATB1 protein cannot discriminate between the short and long isoforms'. The authors reason that their sizes are too close. It is indeed possible, and widely studied in biochemistry to assess various factors on protein migration (such as PTMs). The authors should validate this aspect (as it is important as per their premise) and perform separation based on charge as well and also use a commercial antibody to validate the same.

      (Experiments already performed)

      We have adapted the text so that it does not imply that the two isoforms cannot be separated by size. This part in lines 102-107 then reads: “It should also be noted that when investigating the SATB1 protein levels, we have to bear in mind that the antibodies targeting the N-terminus of SATB1 protein cannot discriminate between the short and long isoforms, thus we can only compare the amount of the long SATB1 isoform to the total SATB1 protein levels in vivo conditions. To overcome this limitation and to specifically validate the presence of the long SATB1 protein isoform in primary murine T cells, we designed a serial immunodepletion-based experiment (Fig. 1e, Supplementary Fig. 1a).”

      Moreover, in the revised version of the manuscript we now provide a number of additional proofs supporting the presence of the long isoform and also the specificity of the long isoform-specific antibody. As evident in the text cited above, in the revised Fig. 1e,f and revised Supplementary Fig. 1a,b; we present two immunodepletion experiments which should alone address the Reviewer’s concerns. Moreover, we added Supplementary Fig. 1c; demonstrating that the long isoform-specific antibody does not detect any protein in cells with conditionally depleted SATB1 (Satb1_fl/fl_Cd4-Cre+), supporting its specificity. The custom-made and publicly available antibodies targeting all SATB1 isoforms were also verified in Supplementary Fig. 1d. Moreover, the long isoform and all isoform antibodies display similar localization in the nucleus (Supplementary Fig. 1e; their co-localization based on super-resolution microscopy is also quantified in Supplementary Fig. 5a).

      In our accompanying revised manuscript Zelenka et al., 2022 (https://doi.org/10.1101/2021.07.09.451769), we will provide yet another piece of evidence, consisting of bacterially expressed short and long SATB1 protein isoforms detected by western blot using either the long isoform-specific or the non-selective all SATB1 isoform antibodies.

      **PLANNED EXPERIMENT #3**

      Although we think that in the revised version of the manuscript, we have provided enough proof about the existence of the long isoform in primary murine thymocytes we would like to try the following approach as suggested by this Reviewer.

      The pI values of the two SATB1 isoforms are quite similar: 6.09 for the short SATB1 isoform and 6.18 for the long SATB1 isoform. We will perform 2D PAGE coupled to Western blotting utilizing the antibodies detecting the long and all SATB1 isoforms. Given that both isoforms are post-translationally modified to varying degrees, it will be extremely difficult to discriminate between the long and short unmodified versus the long and short post-translationally modified proteins, especially in the absence of a specific antibody only for the short isoform.

      **Reviewer #3**

      1. Hexanediol is another assay frequently used in phase-separation studies. However, hexanediol has many deleterious effects on the cell, even at a fraction of the concentration normally used in phase-separation studies. Authors should show controls of cell viability, control proteins that do not phase-separate, etc. See https://www.jbc.org/article/S0021-9258(21)00027-2/fulltext.

      Secondly, hexanediol treatment should cause phase-separated protein aggregates to disperse. It is difficult to determine from the images whether or not the aggregates actually disperse or there is just less protein. In any case, small aggregates remain even after treatment, and this appears different from most other hexanediol experiments reported in the literature where the signals become more dispersed and uniform. This is likely because the samples are fixed.

      One of the main features of using hexanediol in phase-separation is to show that upon washout, LLPS aggregates can reform. Because the cells are fixed, the critical aspect of this assay is not performed. A washout and LLPS recovery would control for cell viability issues described above and would provide the opportunity to show that total SATB1 protein levels did not change, but its distribution did, which is the essence of this assay in the context of LLPS. This review from the Tjian group is very informative and may be a good resource:

      http://genesdev.cshlp.org/content/33/23-24/1619

      In line with our reply to point #1 of this Reviewer (page 26 of this document), we should again emphasize that we utilized the hexanediol treatment in primary murine developing T cells as this is the only way to investigate the properties of SATB1 speckles under physiological conditions. This also explains why some small insoluble structure remains after the hexanediol treatment. Note that under physiological conditions, there is a contribution of several protein variants (such as differential PTMs) out of which some will tend to form more stable structures while others could undergo LLPS. It is not clear how the washout experiment could be applied in the primary cell conditions that include cell fixation as the heterogeneity and big variation among cells would make such data analysis highly unreliable.

      **PLANNED EXPERIMENT #1**

      As in our answer to point #4 of Reviewer 1 (page 2), we propose the following experiment. Although the purification of recombinant SATB1 protein is quite a challenging task, we have (1) cloned both Satb1 cDNAs for the long and short isoforms and (2) successfully expressed both proteins in great quantity and quality, and we are willing to perform in vitro reconstitution experiments if our work is considered for publication.

      1. The major difference between the long and short isoform of SATB1 is the 31aa segment within the IDR. However the authors find that neither the long or short isoform SATB1 forms LLPS aggregates, and the IDR alone forms aggregates in the cytoplasm (Fig5) but they do not respond to Cry2 light activation. When forced to localize to the nucleus, it does not form aggregates as well (Fig6). The short isoform also did not form any aggregates. These results seem to argue against any isoform specific phase-separation. This experiment seems critical for the story, yet it does not support their overall conclusions. The authors might consider using a different cell line or perhaps do an in vitro assay using purified protein.

      I am not certain what to make of the cytoplasmic aggregation, which appears to not form upon localization to the nucleus. Because of this, it is difficult to place weight on the significance of the S635A mutation and the role that a phosphorylation of SATB1 contributes to phase-separation, let alone function. There are many additional points of concern, but the ones listed above are perhaps the most significant in terms of the overall conclusions of the paper.

      In Fig. 5c we show that the full length long SATB1 isoform often aggregates unlike the short isoform. These data are accompanied with the results for the IDR region, where the situation is even more obvious (Fig. 5f,g). However, in the latter, we have to bear in mind the absence of the multivalent N-terminal part of the protein which seems to be essential for the overall phase behavior of the protein as indicated in Fig. 4b,c.

      **PLANNED EXPERIMENT #1**

      To further support LLPS of SATB1, we are considering performing the following in vitro experiment, as in our answer to point #4 of Reviewer 1 (page 2). Although the purification of recombinant SATB1 protein is quite a challenging task, we have (1) cloned both Satb1 cDNAs for the long and short isoforms and (2) successfully expressed both proteins in great quantity and quality, and we are willing to perform in vitro reconstitution experiments if our work is considered for publication.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      **Reviewer #1 (Evidence, reproducibility and clarity)**:

      This paper looks at an important nuclear matrix protein, SATB1, which is a well-known global chromatin organizer and helps chromatin loops attach to the nuclear matrix. The paper starts with the identification of novel short and long forms of SATB1. Both isoforms contain a prion-like low-complexity domain, but the long isoform additionally contains an extra EPF domain next to the prion-like low-complexity domain. The paper reports that in murine cells the long isoform is 3-4 fold more abundant than the short isoform. Using STED microscopy, they show that SATB1 foci lie next to transcription sites in the nucleus. From the spherical shape of the SATB1 foci and the susceptibility of SATB1 staining to 1,6-hexanediol treatment, they conclude that SATB1 forms small foci in the nucleus through LLPS. The authors also use Raman spectroscopy to conclude that the nuclear chemical space changes in the absence of SATB1, but without much explanation of which chemical bond or nuclear substructure changes correspond to the change in the principal component analysis of the Raman spectra. The authors use the light-inducible aggregation Cry2 tag with the PrD domain of SATB1 and compare it with the Cry2-FUS-LC domain to conclude that the SATB1 LC domain can undergo LLPS. The authors hint at the involvement of RNA and also DNA in the LLPS of SATB1, but without going into any detail.

      Reviewer: The paper reports that in murine cells the long isoform is 3-4 fold more abundant than the short isoform.

      Actually, on page 5 (lines 94-96) of the manuscript we write: “We confirmed that in murine thymocytes the steady state mRNA levels of the short Satb1 transcripts were about 3-5 fold more abundant compared to the steady state mRNA levels of the long Satb1 transcripts (Fig. 1d).” Although the steady state mRNA levels of the long isoform are lower than those of the short isoforms, the long isoform protein levels are almost comparable to those of the short isoform, as deduced from immunofluorescence experiments. Moreover, using our two immunodepletion experiments, we quantified the difference and estimated the long isoform to be 1.5× to 2.62× less abundant than the short isoform (Fig. 1f and Supplementary Fig. 1b; compare lanes 2 & 3 in the lower panel). Regarding the Raman spectroscopy experiments, please see Minor Comment #1 of this Reviewer (page 10).

      The key conclusions of the paper are- A) SATB1 undergoes LLPS. But this conclusion is drawn after correlative experiments as detailed below-

      For the primary murine T cells, which do not allow for any targeted experiments, this conclusion is indeed based on correlative experiments only. However, the use of cell lines cultured in vitro allowed us to validate these findings with additional experiments based on optogenetic approaches.

      1) observation of spherical punctae by STED-which could also seem spherical due to their small size. The resolution limit achieved by the STED microscopy used in this paper is not determined or mentioned clearly.

      In the revised version of the manuscript, we have specified the resolution of our systems, for STED in Lines 745-746: “This system enables super-resolution imaging with 35 nm lateral and 130 nm axial resolution.” and for SIM in Lines 759-761: “Images were acquired over the majority of the cell volume in z-dimension with 15 raw images per plane (five phases, three angles), providing ~120-135 nm lateral and ~340-350 nm axial resolution for 488/568 nm lasers, respectively.” The size of the observed speckles is thus above the resolution limit with sizes ranging between 40-80 nm.

      The resolution of our systems is routinely verified by the following methods: The resolution of our OMX (SIM-3D) system was tested using an ARGO-SIM slide containing a pattern of 36 µm long lines with gradually increasing spacing, ranging (left to right) from 0 to 390 nm in steps of 30 nm (Fig. 1 below). Our SIM system was able to clearly resolve two lines separated by 120 nm.

      2) No live cell FRAP experiment with fluorescent SATB1 long or short isoform to show that these foci are liquid like

      We did perform FRAP experiments for the SATB1 N-terminus optogenetic construct, as demonstrated in Fig. 4f. We did not perform FRAP in the primary murine T cells, as this is not technically feasible without creating a new mouse line with fluorescently labeled protein. In the revised version of the manuscript, we additionally performed FRAP experiments for the full-length short and long isoforms of SATB1 labeled with EGFP and transfected into the NIH-3T3 cell line (Supplementary Figure 6f).

      5) LLPS is strongly coupled to the cellular concentration of the proteins. Authors should quantify the cellular concentration of the long and short isoform in the cells.

      We did consider protein concentration in our analyses of optogenetic constructs in Fig. 4b,d,e and Supplementary Fig. 6a,b,c. Quantifying the physiological cellular concentration of short and long SATB1 protein isoforms in primary T cells is impossible due to the inherent inability to discriminate between the isoforms by two antibodies, in the absence of Satb1 isoform-specific knockout mice.

      However, an approximation of the cellular concentration can be obtained from our immunodepletion experiments. In addition to the original immunodepletion experiment, which we now present in Supplementary Fig. 1a,b, we have repeated the experiment for the revised version of the manuscript (Fig. 1e,f). Comparison of the two bands for the long and short SATB1 isoforms in the lower panel of the western blot figures suggests that the long SATB1 isoform protein levels are 1.5× to 2.62× less abundant than the short isoform, according to the original and new immunodepletion experiment, respectively. This is now also included in the main text in Lines 110-116: “This experiment can also be used for approximation of the cellular protein levels of SATB1 isoforms in primary murine thymocytes. Comparison of the two bands for long (lane 2) and short SATB1 (lane 3) isoform in the lower panel of Fig. 1f and Supplementary Fig. 1b, suggests that the long SATB1 isoform protein levels may be about 1.5× to 2.62× less abundant than the short isoform, according to the two replicates of our immunodepletion experiment, respectively.”

      Major conclusion B)- SATB1 regulates transcription and splicing.

      This was also shown previously and in this paper they show the close proximity of the transcription site and SATB1 foci by microscopy. Hexanediol treatment which lead to loss of colocalization between FU foci and SATB1 is also taken as an evidence in regulation of transcription is not right as the transcription foci itself can be dissolved using 1,6 Hexanediol. Although the rate of transcription is not measured quantitatively.

      As mentioned in comment #3 (page 29) of this Reviewer, unfortunately there is no better tool to investigate these questions in primary cells than microscopy approaches in conjunction with hexanediol treatment. However, we should also note that there is an accompanying manuscript from our group that is currently under revision in another journal (preprint available: Zelenka et al., 2021; https://doi.org/10.1101/2021.07.09.451769). In the preprint, we showed that: 1. the long SATB1 isoform binding sites have higher chromatin accessibility than expected by chance (Fig. 3b), 2. there is a drop in chromatin accessibility at SATB1 binding sites in the Satb1 cKO mouse (Fig. 3c), and 3. this drop in chromatin accessibility is especially evident at the transcription start sites of genes (Supplementary Fig. 1i).

      We believe that, together, these data suggest a direct involvement of SATB1 in transcription regulation. Also note the vast transcriptional deregulation that occurs in Satb1 cKO T cells, affecting the expression of more than 1,600 genes (Fig. 2f, this revised manuscript). That is why we believe that the co-localization analysis using super-resolution microscopy, presented in Fig. 2c and quantified in Fig. 3g, provides additional support for our claims. Moreover, in the revised version of the manuscript we now present a positive correlation between SATB1 binding and deregulation of splicing (Supplementary Fig. 4d), which also supports its direct involvement in the regulation of transcriptional and co-transcriptional processes.

      In the revised version of the manuscript we have made this clear in Lines 182-194: “Satb1 cKO animals display severely impaired T cell development associated with largely deregulated transcriptional programs as previously documented19,37,38. In our accompanying manuscript19, we have demonstrated that long SATB1 isoform-specific binding sites (GSE173446; ref. 19) were associated with increased chromatin accessibility compared to randomly shuffled binding sites (i.e. what expected by chance), with a visible drop in chromatin accessibility in Satb1 cKO. Moreover, the drop in chromatin accessibility was especially evident at the transcription start site of genes, suggesting that the long SATB1 isoform is directly involved in transcriptional regulation. Consistent with these findings and with SATB1’s nuclear localization at sites of active transcription, we identified a vast transcriptional deregulation in Satb1 cKO with 1,641 (922 down-regulated, 719 up-regulated) differentially expressed genes (Fig. 2f). Specific examples of transcriptionally deregulated genes underlying SATB1-dependent regulation are provided in our accompanying manuscript19. Additionally, there were 2,014 genes with altered splicing efficiency (Supplementary Fig. 4d-e; Supplementary File 3-4). We should also note that the extent of splicing deregulation was directly correlated with long SATB1 isoform binding (Supplementary Fig. 4d).”

      Major conclusion C)-Post transcriptional modification is important for SATB1 function.

      This point is just barely touched upon in the last figure of the paper

      We would not call the identification of the novel phosphorylation site a main conclusion of our manuscript. However, it is already known that posttranslational modifications of SATB1 are important for its function, as they can act as a molecular switch rendering SATB1 either an activator or a repressor (Kumar et al., 2006; https://doi.org/10.1016/j.molcel.2006.03.010).

      In the revised manuscript, we support the effect of serine phosphorylation on the DNA binding capacity of SATB1 with another experiment. We have performed DNA affinity purification experiments utilizing primary thymocyte nuclear extracts treated with phosphatase (Supplementary Fig. 7b). We found that SATB1’s capacity to bind DNA (RHS6 hypersensitive site of the TH2 LCR) is lost upon treatment with phosphatase (Supplementary Fig. 7c). These results are in line with the data presented in Supplementary Fig. 7d, indicating that SATB1 loses its ability to bind DNA upon mutation of the discovered phosphorylation site S635. Given the importance of posttranslational modifications of proteins for LLPS, we found it relevant to include this in our manuscript, even more so because we identified SATB1 aggregation upon mutation of this phosphorylation site (Fig. 6d).

      Overall I find that the major conclusion-point A and B, is based on very indirect experiments and needs much more convincing data and the role of SATB1 LLPS in cells should be demonstrated more rigorously. And conclusion C is barely described and needs a lot more cell biological and genetic evidence.

      One of the major assets of our work is that most of our data are based on the analysis of primary murine T cells and thus investigating the biological roles of the endogenous SATB1 protein, under physiological conditions. We apologize that we did not make it clear to this Reviewer, that our system has certain inherent limitations due to the utilization of primary cells.

      I do not recommend publishing the paper in current state. The story needs much more experiment to convincingly prove the major conclusions. Further, the MS needs more careful thinking and presentation to make it streamlined.

      We hope that in the revised version we have significantly improved the quality of our manuscript by implementing the suggested changes.

      Minor comments: One of the major flaw of the paper is the use too many techniques without proper explanation. E.g. use of STED and RAMAN microscopy need controls and explanation on what is being quantified. The use of Raman microscopy to quantify the nuclear environment of nucleus is not related to the chromatin organization or LLPS of SATB1 at all. And no information is provided at all which aspect of nuclear organization is being measured in Raman and what it means for the LLPS of SATB1.

      We do provide quite a thorough explanation of Raman spectroscopy and the underlying quantification in Lines 224-231: “we employed Raman spectroscopy, a non-invasive label-free approach, which is able to detect changes in chemical bonding. Raman spectroscopy was already used in many biological studies, such as to predict global transcriptomic profiles from living cells42, and also in research of protein LLPS and aggregation43–47. Thus we reasoned that it may also be used to study phase separation in primary T cells. We measured Raman spectra in primary thymocytes derived from both WT and Satb1 cKO animals and compared them with spectra from cells upon 1,6-hexanediol treatment. Principal component analysis of the resulting Raman spectra clustered the treated and non-treated Satb1 cKO cells together, while the WT cells clustered separately (Fig. 3h).” We also do provide controls as the method was performed on both treated and untreated WT and Satb1 cKO cells.

      Regarding the Raman spectroscopy experiments, we now provide more information on the chemical bonds altered between wild type and Satb1 cKO thymocytes. Following principal component analysis, we have extracted the two main principal components that were used for the clustering of our data. The differences are presented in Supplementary Fig. 5d.
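To illustrate the kind of analysis described above, the following is a minimal sketch of a principal component analysis on spectra, written in plain Python. It is not the pipeline used in the manuscript: the toy spectra, the five spectral bins, and all function names are hypothetical, and only the first principal component is computed (by power iteration) to keep the example short. The loadings of a component indicate which spectral bins, and hence which vibrational bands, drive the separation between groups.

```python
def center(spectra):
    """Subtract the per-bin mean so that PCA captures variation between cells."""
    n, d = len(spectra), len(spectra[0])
    means = [sum(s[j] for s in spectra) / n for j in range(d)]
    return [[s[j] - means[j] for j in range(d)] for s in spectra]


def covariance(centered):
    """Sample covariance matrix between spectral bins."""
    n, d = len(centered), len(centered[0])
    return [[sum(s[i] * s[j] for s in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    d = len(cov)
    v = [1.0 + 0.01 * i for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Score of each spectrum along the given component."""
    return [sum(a * b for a, b in zip(s, component)) for s in centered]


# Hypothetical toy spectra (5 bins per cell); the groups differ mainly in bin 2.
wt = [[1.0, 2.1, 5.0, 1.0, 0.5], [1.1, 2.0, 5.2, 0.9, 0.5], [0.9, 1.9, 4.8, 1.1, 0.6]]
cko = [[1.0, 2.0, 1.0, 1.0, 0.5], [1.1, 2.1, 1.2, 1.0, 0.4], [0.9, 2.0, 0.9, 1.1, 0.5]]

centered = center(wt + cko)
pc1 = first_component(covariance(centered))
pc1_scores = project(centered, pc1)

# The bin with the largest absolute loading contributes most to the separation.
top_bin = max(range(len(pc1)), key=lambda j: abs(pc1[j]))
wt_scores, cko_scores = pc1_scores[:len(wt)], pc1_scores[len(wt):]
```

In practice a library implementation of PCA would normally be preferred; the explicit version is shown only to make the link between loadings and spectral bins visible.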

      We do realize that Raman spectroscopy, although a quite novel approach utilized to study LLPS, has not been used to study LLPS in live cells. If deemed appropriate, we are willing to omit these results from this manuscript.

      Similarly for Hexanediol treatment, duration of treatment is missing. Hexanediol can also dissolve the liquid like transcription foci. And hence a decrease in correlation between SATB1 foci and FU foci cannot be taken as a measure of SATB1 foci connection to transcription alone

      The duration of hexanediol treatment was 5 minutes, as stated in Line 724 and, in the revised version of the manuscript, also in Lines 1206-1207. We should also note that we additionally performed experiments with different hexanediol concentrations and treatment times ranging from 1 minute to 10 minutes, with results consistent with the data presented.

      It is not very clear how many times the STED or Raman microscopy is done on how many samples and biological replicates. Similarly for RNA sequencing number of samples and description of controls are missing. Also if the sequencing data is made publicly available is not clear.

      Data availability is clearly stated in Lines 506-509: “RNA-seq experiments and SATB1 binding sites are deposited in Gene Expression Omnibus database under accession number GSE173470 and GSE173446, respectively. The other datasets generated and/or analyzed during the current study are available upon request.”

      The Reviewer’s token is “wjwtmeeeppovzqx”.

      RNA sequencing was performed in a biological triplicate for each genotype as stated in the GEO repository and now also in Line 566 of the revised manuscript.

      In Lines 180-181, we also state that it was performed on Satb1 cKO animals and WT mice as a control: “we performed stranded-total-RNA-seq experiments in wild type (WT) and Satb1fl/flCd4-Cre+ (Satb1 cKO) murine thymocytes”.

      In Lines 739-740, we now also state that all imaging approaches were performed on at least two biological replicates (different mice). Please also note that all findings were based on data from both STED and 3D-SIM methods, which minimizes the risk of detecting artifacts. In the Raman spectroscopy figure, each point represents measurements from an individual cell, and for each condition we used 2-5 biological replicates (Lines 831-832 & Line 1169).

      Similarly, in Lines 129-132 we provided a quite detailed description of the differences between STED and 3D-SIM, even though these techniques are not as rare as Raman spectroscopy in biological research.

      Additional control is needed to report the resolution limit of Superresolution techniques-STED and 3D-SIM systems used by them.

      We have already provided this information in our reply to comment #1 of this Reviewer (pages 6-7): In the revised version of the manuscript, we have specified the resolution of our systems, for STED in Lines 745-746: “This system enables super-resolution imaging with 35 nm lateral and 130 nm axial resolution.” and for SIM in Lines 759-761: “Images were acquired over the majority of the cell volume in z-dimension with 15 raw images per plane (five phases, three angles), providing ~120-135 nm lateral and ~340-350 nm axial resolution for 488/568 nm lasers, respectively.” The resolution of our systems is routinely verified by the following methods: The resolution of our OMX (SIM-3D) system was tested using an ARGO-SIM slide containing a pattern of 36 µm long lines with gradually increasing spacing, ranging (left to right) from 0 to 390 nm in steps of 30 nm (Fig. 1 below). Our SIM system was able to clearly resolve two lines separated by 120 nm.

      Would be very helpful if the zonation was plotted for the FluoroUridine (FU) also to show that Zone1 (heterochromatin) is completely depleted of FU, and is present in other regions.

      In the revised version of the manuscript, we performed the suggested analysis and in Supplementary Fig. 3a we now show that indeed FU is significantly less localized to Zone 1 (heterochromatin) and has the most abundant localization in Zones 3 and 4, similar to the localization of SATB1 protein, as demonstrated in Fig. 2b.

      Scale bar needed figure 3d

      In the revised version of the manuscript, we included scale bars which are both 0.5 µm (line 1213).

      Perfectly rounded SATB1 foci- this does not mean LLPS. For LLPs measurement, protein condensate dynamics measurement by FRAP or fusion experiments is required. What is the size of condensates? and cellular concentration of SATB1? Will SATB1 undergo LLPS in vitro at similar concentrations? does SATB1 interact with DNA or RNA to undergo LLPS ?

      We toned down this sentence which now reads: “Here we demonstrated its connection to transcription and found that it forms spherical speckles (Fig. 1g), markedly resembling phase separated transcriptional condensates. (Lines 200-202)”.

      Moreover, as explained in earlier replies to comments of this Reviewer, we cannot perform FRAP on primary murine T cells without generating a new mouse line. We did, however, use FRAP and other in vitro approaches, including visualization of droplet fusion, in ex vivo experiments utilizing cell lines. Moreover, we are willing to demonstrate the LLPS properties of SATB1 on in vitro purified SATB1 protein, as indicated in the suggested experiment of Point #4 (page 2).

      After careful reading of the MS I conclude that the main conclusions of the paper are very preliminary and need much more detailed experiments. So does not qualify to get published at all at this stage.

      **Reviewer #1 (Significance)**:

      The present manuscript tries to connect the phase separation of SATB1 to understanding the mechanism of SATB1 function in cells. One of the major hallmarks of phase separation is dynamic, liquid-like behaviour and in absence of these measurements, it is very difficult to say that the current manuscript has made any contribution to showing that SATB1 can phase separate.

      The presence of 2 isoforms of SATB1 is a novel finding and the paper could have focused more on this. E.g. elucidate expression of the isoform during thymocyte development and maturation.

      As a reviewer my expertise are cell biology experiments, microscopy, in vitro reconstitution assays, RNA binding proteins, RNA and RBP condensate formation. And I feel that the reconstitution experiments are an important tool for understanding phase behaviour of proteins and also to gauge if this behaviour can occur or not in cellular concentration and conditions.

      I do not have sufficient expertise in Raman microscopy and hence the information provided in the MS on this part was not enough to understand the experiment and conclusions drawn from it.

      **Reviewer #2 (Evidence, reproducibility and clarity)**:

      The authors have reported the existence of a 'long' SATB1 isoform which also undergoes LLPS. The authors tried to draw multiple comparisons and pointed out distinction between phase properties of SATB1 isoforms. The authors also touch upon two functional roles of SATB1. Although a wide array of assays are used, the data presented and hence the manuscript makes multiple transitions into disparate hypotheses without diving deep into a single hypothesis. As a result, the connections drawn are unclear, and do not converge at best. The authors have used number of techniques, however, the results do not support their conclusions and they appear hastily drawn. It is not clear why the authors jump from one context to the other, discussing LLPS first, then transcription, splicing, post-translational modification and finally cancer. The link between all of these isn't clear and not fully supported by data. It appears that the authors wish to focus on Satb1's physiological role in development, hence the data on breast cancer is confusing. Thus, this work suffers from multiple pitfalls. Specific comments are given below:

      Major comments 1. Importantly, in Fig 1d, there is no statistics shown. There is no mention of number of replicates as well in the legends. Proper statistical evaluation is critical for interpreting this result.

      Please note that Fig. 1d only serves as a control for the sequencing experiment in Fig. 1b. In Line 566, we now state that for the RNA-seq: “A biological triplicate was used for each genotype.” To validate these data, we further designed an RT-qPCR experiment, which was performed on three technical replicates from a male and a female mouse. We now state this in Line 636. Given the low number of samples, statistical tests are not reliable, but we still added a t-test to Fig. 1d and specified it in the figure legend in Lines 1169-1170.

      1. Figure 1f presents one of the weakest evidences in the manuscript. There are a number of corrections needed. Firstly, being their major and only validation figure for their custom antibody, the immunoblot is not clean, bands are fuzzy. Importantly, as the authors claim that the antibody is highly specific to 'long' SATB1, after the IP there should be only a single band (like input) of Satb1 long. But that does not seem to be the case, rather an array of bands are visible below (lane 2 top panel). This could easily mean that the shorter isoforms or non-specific protein bands are also pulled down with the 'long' form specific antibody. Therefore, raising a critical concern regarding the specificity of the antibody.

      • The long isoform antibody was raised in mice inoculated with the extra peptide present only in the long isoform. Therefore, this antibody cannot precipitate the shorter isoforms, which do not contain the sequence of the extra peptide (EP, Figure 1a).
      • We have repeated the immunodepletion experiment and we now provide the results in Fig. 1f and Supplementary Fig. 1b. The western blot in Fig. 1f is now cleaner and supports quite convincingly the presence of a long SATB1 isoform. Given the lack of isoform-specific knockouts which we could utilize to immunoprecipitate or detect the different isoforms in a single cell (or cell population), immunodepletion followed by western blotting was the approach we chose to implement.
      • As shown in Fig. 1f and Supplementary Figure 1b, the long isoform SATB1 antibody has the capacity to recognize the long isoform in murine thymocyte protein extracts but not the short SATB1 isoform (please compare lane 3 in the two western blots utilizing either the antibody for the long isoform, top panel, or the antibody that detects both isoforms, lower panel).
      • We have performed immunofluorescence experiments utilizing the antibody detecting the long SATB1 isoform in thymocytes isolated from either C57BL/6 or Satb1 cKO mice. The antibody is specific to the SATB1 protein, since there is no signal in immunofluorescence experiments utilizing the knockout cells (Supplementary Figure 1c).
      • We have performed immunofluorescence experiments utilizing thymocytes and either the antibody detecting the long SATB1 isoform or a commercially available antibody detecting all SATB1 isoforms. The pattern of SATB1 subnuclear localization is similar for both antibodies (Supplementary Figure 1e).
      • In our accompanying revised manuscript Zelenka et al., 2022 (https://doi.org/10.1101/2021.07.09.451769), we provide yet another piece of evidence, consisting of bacterially expressed short and long SATB1 protein isoforms detected by western blot using either the long isoform-specific or the non-selective all SATB1 isoforms antibodies.
      • Regarding the additional bands detected in the immunoprecipitation experiment presented in the original Supplementary Figure 1b (lane 2), it is not surprising that additional bands appear in a sample of protein extracts that is used for several hours for the immunoprecipitation experiments, while the “input” sample simply denotes protein extract that is frozen at -80 °C right after the preparation of protein extracts until use. It is well established that SATB1 is a target of proteases, which may well be active during the immunoprecipitation steps (two consecutive immunoprecipitation steps take place). Therefore, the immunoprecipitated material will not necessarily mirror the input material and display a single protein band, even if protease inhibitors are included in the buffers.

      Taken together, the experiments described here show that the antibody raised against the extra 31 aa long peptide, present only in the long SATB1 isoform, is specific for this isoform.

      1. Related to Fig. 2 a, the authors state on Pg 5, '....the euchromatin and interchromatin regions (zones 3 & 4, Fig. 2a, b).' Although the DAPI correlation seems clear, there is no mention on how they reached the above said correlation. They should at least show a parallel speckle staining for HP1 or signature modification such as H3K4me9 STEDs for making supporting such a claim. DAPI alone is not sufficient. The authors should rectify the text thoroughly for many such interpretations without validation/reference or provide relevant data.

      This is a great suggestion, which we have taken into consideration. We added the following experiments and made the corresponding changes in the revised version of our manuscript.
      • We modified the text and added a reference to Miron et al., 2020 (https://doi.org/10.1126/sciadv.aba8811) supporting our claims regarding SATB1 localization in relation to DAPI staining.
      • We have also added new microscopy images for HP1, H3K4me3 and fibrillarin staining and quantified the localization of FU-stained sites of active transcription in nuclear zones, to further support our claims.
      • The whole modified part in Lines 139-167 then reads: “The quantification of SATB1 speckles in four nuclear zones, derived based on the relative intensity of DAPI staining, highlighted the localization of SATB1 mainly to the regions with medium to low DAPI staining (zones 3 & 4, Fig. 2a, b). A similar distribution of the SATB1 signal could also be seen from the fluorocytogram of the pixel-based colocalization analysis between the SATB1 and DAPI signals (Supplementary Fig. 2a). SATB1’s preference to localize outside heterochromatin regions was supported by its negative correlation with HP1β staining (Supplementary Fig. 2b). Localization of SATB1 speckles detected by antibodies targeting all SATB1 isoforms and/or only the long SATB1 isoform, revealed a significant difference in the heterochromatin areas (zone 1, Fig. 2b), where the long isoform was less frequently present (see also Fig. 2a and Fig. 3c). Although, this could indicate a potential difference in localization between the two isoforms, due to the inherent difficulty to distinguish the two based on antibody staining, we refrain to draw any conclusions. The prevailing localization of SATB1 corresponded with the localization of RNA-associated and nuclear scaffold factors, architectural proteins such as CTCF and cohesin, and generally features associated with euchromatin and active transcription32. 
This was also supported by colocalization of SATB1 with H3K4me3 histone mark (Supplementary Fig. 2c), which is known to be associated with transcriptionally active/poised chromatin. Given the localization of SATB1 to the nuclear zones with estimated transcriptional activity32 (Fig. 2b, zone 3), we investigated the potential association between SATB1 and transcription. We unraveled the localization of SATB1 isoforms and the sites of active transcription labeled with 5-fluorouridine. Sites of active transcription displayed a significant enrichment in the nuclear zones 3 & 4 (Supplementary Fig. 3a), similar to SATB1. As detected by fibrillarin staining, SATB1 also colocalized with nucleoli which are associated with active transcription and RNA presence (Supplementary Fig. 3b). Moreover, we found that the SATB1 signal was found in close proximity to nascent transcripts as detected by the STED microscopy (Fig. 2c). Similarly, the 3D-SIM approach indicated that even SATB1 speckles that appeared not to be in proximity with FU-labeled sites in one z-stack, were found in proximity in another z-stack (Supplementary Fig. 3c). Additionally, a pixel-based colocalization of SATB1 and sites of active transcription is quantified later in the text in Fig. 3g, supporting their colocalization.”

      1. The authors mention, '...of the different SATB1 isoforms, uncovered by the use of the two different antibodies, relied in the heterochromatin areas (zone 1), where the long isoform was less frequently...' There is no supporting figure number mentioned. The authors need to show a zone-by-zone comparison images for 'all iso' vs 'long' iso of SATB1. Just to reiterate, there is a need for a heterochromatin mark to unambiguously call out the distinction.

      We should note that there is an inherent difficulty in accurately comparing the localization of short and long SATB1 isoforms in primary cells, especially due to the lack of Satb1 isoform-specific knockout mice. There is no way to detect only the short isoform in these primary cells, as there are only antibodies targeting the long or all SATB1 isoforms. Therefore, we cannot set up additional experiments probing these questions.

      In line with this, in the revised version of the manuscript, we toned down our statements regarding the differential localization of the two isoforms in primary cells. We only refer to it as an indication and we support it by adding references to the relevant figures. This part now reads: “Localization of SATB1 speckles detected by antibodies targeting all SATB1 isoforms and/or only the long SATB1 isoform, revealed a significant difference in the heterochromatin areas (zone 1, Fig. 2b), where the long isoform was less frequently present (see also Fig. 2a and Fig. 3c). Although, this could indicate a potential difference in localization between the two isoforms, due to the inherent difficulty to distinguish the two based on antibody staining, we refrain to draw any conclusions. (Lines 145-150)”

      1. On the same lines, '....Given the localization of SATB1 to the nuclear zones with estimated transcriptional activity (Fig. 2b, zone 3)....' How was the region labelled as transcriptionally active? For the statistical analysis of speckle count for the two antibodies' staining, the claim posited is a bit bigger. This could simply be true for that cell. The authors thus need to statistically analyse the speckle counts for multiple cells. This needs to be done for all imaging statistics done in multiple figures throughout the manuscript.

      As mentioned in our reply to the two previous comments of this Reviewer, transcriptional activity in relation to the nuclear zonation is well established in the literature. To make this clear, we have now added the reference to Miron et al., 2020 (https://doi.org/10.1126/sciadv.aba8811) supporting our claims, and we have additionally included HP1, H3K4me3 and fibrillarin staining and quantification of the FU signal in the nuclear zones. Moreover, it is not clear which particular cell the comment refers to. The dots presented in Fig. 2b represent individual cells, and the relative proportions of speckles in each nuclear zone are plotted on the y axis. In the revised version of the manuscript, we added the number of cells scored to the figure and we adapted the figure legend so that it is absolutely clear that we have analyzed multiple cells:

      “Nuclei of primary murine thymocytes were categorized into four zones based on the intensity of DAPI staining and SATB1 speckles in each zone were counted. Images used represented a middle z-stack from the 3D-SIM experiments. The graph depicts the differences between the long and all SATB1 isoforms’ zonal localization in nuclei of primary murine thymocytes. (Lines 1189-1193)”

      1. For figure 2c. the authors have used 5 Fluorouridine for nascent RNA speckles. 5FU is known to have a spread signal type (with strong association to nucleolus as well). This is not the case for the image presented 2c. The authors should resolve this by showing different sets of images.

      Developing and naive T cells are unique in terms of their metabolic features and thus should not be directly compared with other cell types. Therefore, we would not expect to see the spread FU pattern previously shown for other cell types. Having said that, we could not find any reference publication that utilized super-resolution microscopy to detect the localization of FU-stained sites of active transcription in developing primary T cells. However, we performed additional immunofluorescence experiments to demonstrate the colocalization, or lack thereof, between SATB1 and HP1 (Supplementary Fig. 2b), H3K4me3 (Supplementary Fig. 2c) and fibrillarin (Supplementary Fig. 3b). Moreover, we provide additional regions of SATB1 and FU staining in Supplementary Fig. 3c. The modified text reads:

      “We unraveled the localization of SATB1 isoforms and the sites of active transcription labeled with 5-fluorouridine. Sites of active transcription displayed a significant enrichment in the nuclear zones 3 & 4 (Supplementary Fig. 3a), similar to SATB1. As detected by fibrillarin staining, SATB1 also colocalized with nucleoli which are associated with active transcription and RNA presence (Supplementary Fig. 3b). Moreover, we found that the SATB1 signal was found in close proximity to nascent transcripts as detected by the STED microscopy (Fig. 2c). Similarly, the 3D-SIM approach indicated that even SATB1 speckles that appeared not to be in proximity with FU-labeled sites in one z-stack, were found in proximity in another z-stack (Supplementary Fig. 3c). Additionally, a pixel-based colocalization of SATB1 and sites of active transcription is quantified later in the text in Fig. 3g, supporting their colocalization. (Lines 157-167)”

      1. Fig 2 d., the authors have suddenly jumped solely to 'all iso' Satb1 here for IP MS. Is there a reason for that? The authors either need to do this with 'long iso' antibody or remove the analysis from the manuscript as it does not add to their primary aim of the manuscript. Also, the authors have only selectively talked about two clusters? What about chromatin related proteins? It is quite intuitive to have highest enrichment of these given previous literature and even IP MS data by other groups. Thus, it is necessary to revise this thoroughly or remove it.

      We appreciate the acknowledgment by the Reviewer that our IP-MS data identified anticipated factors. In the revised version of the manuscript we modified the underlying text to accommodate references to these former findings revealing interactions between SATB1 and chromatin modifying complexes: “Apart from subunits of chromatin modifying complexes that were also detected in previous reports25,33–36, unbiased k-means clustering of the significantly enriched SATB1 interactors revealed two major clusters consisting mostly of proteins involved in transcription (blue cluster 1; Fig. 2d and Supplementary Fig. 4c) and splicing (yellow cluster 2; Fig. 2d and Supplementary Fig. 4c). (Lines 170-174)”

      Please note that many subunits of chromatin modifying and chromatin-related complexes are in fact characterized as transcription-related factors; therefore, our statements are not in disagreement with the former findings. Note also that we provide Supplementary Files 1 & 2 with a comprehensive description of our IP-MS data for the readers’ convenience. Please also note that we are the first group to report the existence of the long isoform. Therefore, we find it reasonable to perform an IP-MS experiment for all SATB1 isoforms, which can then be used for comparison with other publicly available datasets. We believe that there is no contradiction between this experimental setup and the rest of the manuscript. We discuss the two major clusters simply because they are the two major clusters identified, as indicated in Fig. 2d. Additionally, in Supplementary Fig. 4c, we provide a comprehensive description of all significantly enriched interactors, including their cluster annotation, so that anyone can investigate the data if needed.

      1. In relation to Fig. 2f, the authors have not mentioned any of the previously published work on Satb1 CD4 specific KO, not even the RNA seq studies the other groups have reported under the same condition. Only an unpublished reference of their own (preprint) is cited. It is imperative to show how much their data corroborates with other published studies. Additionally, what is the binding site status of dysregulated genes?

      In the revised version of the manuscript, we have included the references to other studies using the same Satb1 conditional knockout. Moreover, we have clarified the relationship between SATB1 binding and gene transcription. The modified part in Lines 182-194 now reads: “Satb1 cKO animals display severely impaired T cell development associated with largely deregulated transcriptional programs as previously documented19,37,38. In our accompanying manuscript19, we have demonstrated that long SATB1 isoform specific binding sites (GSE17344619) were associated with increased chromatin accessibility compared to randomly shuffled binding sites (i.e. what expected by chance), with a visible drop in chromatin accessibility in Satb1 cKO. Moreover, the drop in chromatin accessibility was especially evident at the transcription start site of genes, suggesting that the long SATB1 isoform is directly involved in transcriptional regulation. Consistent with these findings and with SATB1’s nuclear localization at sites of active transcription, we identified a vast transcriptional deregulation in Satb1 cKO with 1,641 (922 down-regulated, 719 up-regulated) differentially expressed genes (Fig. 2f). Specific examples of transcriptionally deregulated genes underlying SATB1-dependent regulation are provided in our accompanying manuscript19. Additionally, there were 2,014 genes with altered splicing efficiency (Supplementary Fig. 4d-e; Supplementary File 3-4). We should also note that the extent of splicing deregulation was directly correlated with long SATB1 isoform binding (Supplementary Fig. 4d).”

      1. In context of Figure 3a and b, the authors write .'...The long SATB1 isoform speckles evinced such sensitivity as demonstrated by a titration series with increasing concentrations of 1,6-hexanediol treatment followed...' Whereas it is apparent from the image at least that overall numbers of individual speckles are instead increased at both 2 and 5%. There is although a clear spreading of restricted speckles compared to the controls. The authors should revise their figures to substantiate the associated text. Furthermore, there needs to be 'all iso' SATB1 3D SIM imaging and not just quantitation for comparison. This is also true for panel c in order to demonstrate the effect.

      In the revised Fig. 3a we provide new images which better reflect the underlying data analysis. Moreover, in Fig. 3c and Fig. 3d we provide an additional comparison between SATB1 all isoforms and long isoform staining and their changes upon hexanediol treatment, detected by both the 3D-SIM and STED approaches. It is true that there tend to be more speckles upon treatment; however, these are much smaller, as they are gradually being dissolved. Depending on the treatment duration, the cells become swollen, which is reflected in increased spreading of speckles. Nevertheless, the nuclear size was considered in all the quantification analyses. We believe that the new images provide better evidence of SATB1’s sensitivity to hexanediol treatment.

      1. Fig. 3 d also does not clearly demonstrate what the authors have claimed '...hexanediol treatment highly decreased colocalization between...' The figure shows at best decreased signal intensity for both SATB1 and FU. We suggest that the authors should give a statistical analysis as well for the colocalization points between the two using multiple source images. Lastly, the two images shown (control and treated), there seems to be a clearly visible magnification difference. The authors should clarify this.

      • In the revised version of the manuscript in Figure 3d, we have provided scale bars, which are both 0.5 µm (line 1213). The difference observed by this Reviewer is actually the main reason why we provided this image. Figure 3d demonstrates that upon hexanediol treatment, the speckles are mostly missing or significantly reduced in size, for both FU and SATB1 staining.

      • Moreover, the suggested statistical analysis is also provided in Figure 3e. In Figure 3e, we performed pixel-based colocalization analysis, which is a method that allows both quantification and statistical comparison of colocalization between two factors and between different conditions. Please note especially the decreased colocalization between long SATB1 isoform and FU-stained sites of active transcription in the left graph, which is in agreement with our claims in the manuscript.

      • Moreover, our data are compared to a negative control, i.e. 90 degrees rotated samples, which is a common method in colocalization experiments as described for example in Dunn et al., 2011 (https://doi.org/10.1152/ajpcell.00462.2010).

      • Additionally, we provide Costes’ P values which are based on randomly scrambling the blocks of pixels (instead of individual pixels, because each pixel’s intensity is correlated with its neighboring pixels) in one image, and then measuring the correlation of this image with the other (unscrambled) image. Please see Costes et al., 2004 (https://doi.org/10.1529%2Fbiophysj.103.038422) for more details.
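
      The logic of the pixel-based colocalization analysis, the 90° rotated negative control and the Costes-style block scrambling described above can be illustrated with a minimal sketch. This is not the analysis pipeline used in the manuscript: the function names (`pearson`, `block_scramble`, `costes_p`), the block size, the number of iterations and the synthetic images are all assumptions chosen for illustration only.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between the pixel intensities of two images."""
    a = a.ravel().astype(float)  # astype returns a copy, inputs stay untouched
    b = b.ravel().astype(float)
    a -= a.mean()
    b -= b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def block_scramble(img, block, rng):
    """Shuffle whole blocks of pixels (hypothetical helper).

    Blocks rather than single pixels are shuffled because neighboring
    pixels are correlated with each other.
    """
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "image must tile into blocks"
    nby, nbx = h // block, w // block
    blocks = img.reshape(nby, block, nbx, block).swapaxes(1, 2)
    blocks = blocks.reshape(-1, block, block)
    blocks = blocks[rng.permutation(len(blocks))]
    return blocks.reshape(nby, nbx, block, block).swapaxes(1, 2).reshape(h, w)

def costes_p(ch1, ch2, block=4, n_iter=200, seed=0):
    """Fraction of scrambled images that correlate less than the real one."""
    rng = np.random.default_rng(seed)
    observed = pearson(ch1, ch2)
    null = np.array(
        [pearson(block_scramble(ch1, block, rng), ch2) for _ in range(n_iter)]
    )
    return observed, float(np.mean(null < observed))

# Synthetic two-channel example (assumed data, not real micrographs).
rng = np.random.default_rng(1)
shared = rng.random((64, 64))
ch1 = shared + 0.1 * rng.random((64, 64))
ch2 = shared + 0.1 * rng.random((64, 64))

r_obs, p = costes_p(ch1, ch2)
# Negative control: rotate one channel by 90 degrees (requires square images).
r_rot = pearson(np.rot90(ch1), ch2)
print(f"observed r = {r_obs:.2f}, rotated control r = {r_rot:.2f}, Costes P = {p:.2f}")
```

      In this sketch, a Costes P value close to 1 means that nearly all scrambled images correlate less well with the second channel than the original image does; values of 0.95 or above are commonly read as significant colocalization.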

      1. Figure 3f. The authors show the PC plot for Raman spectroscopy for phase behaviour due to Satb1. The experiment and its related text seems misinterpreted; the authors write...' ese bonds were probably enriched for weak interactions responsible for LLPS that are susceptible to hexanediol treatment. This shifted the cluster of WT treated cells towards the Satb1 cKO cells. However, the remaining covalent bonds differentiated the WT samples from Satb1 cKO cells......' whereas the clusters are clearly far away in 3D for both WT and KO while being closer to their respective treatments. Which is also intuitive given the sensitivity of Raman spectroscopy. Thus, it is more likely to be treatment effect and KO effect as separate. Treatment of WT leads to KO like spectra is far-fetched. Thus, the authors need to show separate PCs and modify their text thoroughly.

      We do not present any 3D graph; hence, it is not clear what the Reviewer refers to. Please also note that, as stated in Lines 817-818, we used a customized Raman Spectrometer. This approach allowed us to measure Raman spectra at cellular and even sub-cellular levels. For example, solely by utilizing Raman spectroscopy, we can now distinguish euchromatin and heterochromatin, methylated and unmethylated DNA and RNA, etc. This, together with other reports, such as Kobayashi-Kirschvink et al., 2018 (https://doi.org/10.1016/j.cels.2018.05.015) and Kobayashi-Kirschvink et al., 2022 (https://doi.org/10.1101/2021.11.30.470655), indicates a potential use of Raman spectroscopy in biological research. In our manuscript, we used this method as a supplementary approach; however, we do find it noteworthy. We should also emphasize that in the revised Raman spectroscopy Fig. 3h, each point represents measurements from an individual cell and for each condition we used 2-5 biological replicates (Lines 831-832 & Lines 1225-1226). We specifically refer to principal component 1 (PC1), which differentiates the samples. Therefore, there are certain spectra (representing certain chemical bonding) that allowed us to differentiate between WT and Satb1 cKO. The same type of bonding was then affected when WT samples were treated with hexanediol, and we also had controls to rule out the impact of hexanediol on the resulting spectra.
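
      To illustrate what it means for PC1 to differentiate two groups of per-cell spectra, the following is a minimal sketch on synthetic data. It is not the authors' Raman processing pipeline: the helper `pca_scores`, the number of cells and spectral bins, the noise level and the position of the differing band are all hypothetical values chosen for illustration.

```python
import numpy as np

def pca_scores(spectra, n_components=2):
    """Project spectra (rows = cells, columns = spectral bins) onto the
    leading principal components using an SVD of the mean-centered data."""
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic spectra: the two groups differ only in one band (assumed data).
rng = np.random.default_rng(0)
n_cells, n_bins = 20, 100
band = np.zeros(n_bins)
band[40:50] = 1.0  # hypothetical band present in one group only

group_a = rng.normal(0.0, 0.05, (n_cells, n_bins)) + band
group_b = rng.normal(0.0, 0.05, (n_cells, n_bins))

scores, explained = pca_scores(np.vstack([group_a, group_b]))
pc1_a, pc1_b = scores[:n_cells, 0], scores[n_cells:, 0]
print(f"PC1 explains {explained[0]:.0%} of the variance")
print(f"group A PC1 mean = {pc1_a.mean():.2f}, group B PC1 mean = {pc1_b.mean():.2f}")
```

      Here each row plays the role of one cell, and the two groups fall on opposite sides of PC1 because the only systematic difference between them is the single band. The sign of PC1 is arbitrary, so only the separation between groups is meaningful, not which group has positive scores.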

      1. In Fig 4. b, The authors have shown the propensity of SATB1 N terminus to phase separate using different optodroplet constructs. Although the imaging is clear, why are the regions selected not uniform when comparing various constructs?

      We have selected the images that best represent each category. Please note that this was live cell imaging of photo-responsive constructs; thus, there are many limitations regarding the area selection. Very often, even the brief exposure to bright light needed to localize cells may trigger protein clustering. Upon disassembly, every new light exposure of the same cell then triggers much faster assembly, which skews the overall results. It is therefore necessary to work fast, at the expense of selecting equally sized cells. Moreover, it is not clear how the proposed change would improve the quality of our manuscript.

      1. Figure 5a, the disassembly should be shown for 'long' SATB1 as well. On pg 13, the authors write '....cytoplasmic protein aggregation has been previously described for proteins containing poly-Q domains and PrLDs..' no reference given.

      • In the revised version of the manuscript, we present the assembly and disassembly for both short and long full length SATB1 optogenetic constructs. To increase clarity, we present the behavior of the short and long isoforms as two separate images in Figure 5a and Figure 5b, respectively.

      • Moreover, we provided references to the statement regarding aggregation of PrLD and poly-Q-containing proteins in Lines 305-309, which now reads: “Since protein aggregation has been previously described for proteins containing poly-Q domains and PrLDs8,11,38,39, we next generated truncated SATB1 constructs encoding two of its IDR regions, the PrLD and poly-Q domain and in the case of the long SATB1 isoform also the extra peptide neighboring the poly-Q domain (Fig. 1a and 4a).”

      1. Fig. 5d, Is there an amino-acid specific reasoning to support the authors claim of the phase behaviour due to extra peptide? They need to show a proper control with equal extra (unrelated) peptide to show the specificity. Are the shorter isoform aggregates responsive to light?

      • We have referred to the amino acid composition bias in Fig. 5c. In the revised version of the manuscript, we made this clear by showing the composition bias in the new revised Fig. 5e. The related part of the main text then reads: “Computational analysis, using the algorithm catGRANULE37, of the protein sequence for both murine SATB1 isoforms indicated a higher propensity of the long SATB1 isoform to undergo LLPS with a propensity score of 0.390, compared to 0.379 for the short isoform (Fig. 5d). This difference was dependent on the extra peptide of the long isoform. Out of the 31 amino acids comprising the murine extra peptide, there are six prolines, five serines and three glycines – all of which contribute to the low complexity of the peptide region3 (Fig. 5e).” (Lines 298-304)

      • Moreover, we should note that the low complexity extra peptide of the long SATB1 isoform directly extends the PrLD and IDR regions, as indicated in Fig. 4a. We now state this directly in Lines 304-305: “Moreover, the extra peptide of the long SATB1 isoform directly extends the PrLD and IDR regions as indicated in the Fig. 4a.”

      • We show in Fig. 4 that the N terminus of SATB1 undergoes LLPS. Since this part of SATB1 is shared by both isoforms, it is reasonable to assume that both isoforms would undergo LLPS. This is also in line with the observed photo-responsiveness of both short and long full length SATB1 isoforms in CRY2 optogenetic constructs in revised Fig. 5a,b, and with similar FRAP results for both short and long full length SATB1 isoform constructs transiently transfected in NIH-3T3 cells in the revised Supplementary Fig. 6f. However, the main reason why we consider the difference in LLPS propensity between the isoforms important is that the long isoform is more prone to aggregate than the short isoform, as documented in Fig 5c,f,g and Supplementary Fig. 5f.

      1. Fig 6c., It is important that authors show the data for NLS+short iso data as well to prove their hypothesis.

      As shown in original Figure 5d, the long SATB1 isoform undergoes cytoplasmic aggregation, unlike the short SATB1 isoform (as shown in the same Figure). Therefore, an image of the NLS + short isoform would not be related to our hypothesis. Actually, we wanted to reverse the long SATB1 isoform’s relocation, from the aggregated form in the cytoplasm into the nucleus. Nevertheless, to show the complete picture, in the revised version of the manuscript in Figure 6c, we now provide data for both short and long SATB1 isoforms.

      1. Fig 6d., The authors claim that mutating a specific P site changes the phase behaviour of the 'short iso'. Does it also increase for the long isoform? The authors need to confirm this in order to verify the effect of a single P site outside of oligomerization domain. ...' phosphorylation status; when phosphorylated it remains diffused, whereas unphosphorylated SATB1 is localized to PML bodies....' This being an important premise, thus should be moved to the results text.

      In the revised version of the manuscript, we moved the part regarding PML into the Results section, as suggested by the Reviewer. Moreover, we included additional experiments probing the impact of the association between PML and the two SATB1 full length isoforms on their dynamics. The modified section in Lines 357-368 now reads: “In relation to this, a functional association between SATB1 and PML bodies was already described in Jurkat cells64. We should note that PML bodies represent an example of phase separated nuclear bodies65 associated with SATB1. Targeting of SATB1 into PML bodies depends on its phosphorylation status; when phosphorylated it remains diffused, whereas unphosphorylated SATB1 is localized to PML bodies66. This is in line with the phase separation model as well as with our results from S635A mutated SATB1, which has a phosphorylation blockade promoting its phase transitions and inducing aggregation. To further test whether SATB1 dynamics are affected by its association with PML, we co-transfected short and long full length SATB1 isoforms with PML isoform IV. The dynamics of long SATB1 isoform was affected more dramatically by the association with PML than the short isoform (Supplementary Fig. 7e), which again supports a differential behavior of the two SATB1 isoforms.”

      Moreover, given the localization of the discussed phosphorylation site in the DNA binding region of SATB1, we tested its impact on DNA binding, as documented in the revised Supplementary Fig. 7d. Additionally, as we have noted in our answer to Major Comment C of this Reviewer, to further support the effect of serine phosphorylation on the DNA binding capacity of SATB1, we performed DNA affinity purification experiments utilizing primary thymocyte nuclear extracts treated with phosphatase (Supplementary Fig. 7b). We found that SATB1’s capacity to bind DNA (RHS6 hypersensitive site of the TH2 LCR) is lost upon treatment with phosphatase (Supplementary Fig. 7c).

      1. Pg 16,. The authors have tried to explain multiple things (concepts of self-regulation, accessibility) which is quite tangential. There is no inference to Fig 6f., which is showing the opposite to what the authors had postulated. This portion should either be removed or explained with a rationale. The writing also needs to be revised thoroughly in this section. Similarly, the discussion should also be modified.

      The rationale for the original Fig. 6f (revised Fig. 6g) was described in great detail in Lines 330-343 of the original manuscript. It is not clear why the Reviewer assumes that it shows the opposite of our hypothesis. As we explained, increased accessibility allows faster read-through by RNA polymerase, and thus the exon with higher accessibility is more likely to be skipped. The exact relationship is shown in the revised Fig. 6g, where increased accessibility is associated with the expression of the short isoform, whereas expression of the long isoform requires lower chromatin accessibility, which allows the splicing machinery to act on the specific exon to be included. We reason that these findings are important and relevant because: 1) we suggest a potential regulatory mechanism for the production of SATB1 isoforms, which is highly relevant to this manuscript given that this is the first report on the existence of the long SATB1 isoform, and 2) the differential production of the long/short SATB1 isoforms has a potential relevance to breast cancer prognosis. In the revised version of the manuscript we added Fig. 6f, which shows the differential chromatin accessibility in human breast cancer patients; accordingly, expression of the long SATB1 isoform is associated with worse patient prognosis, as indicated in Fig. 6h and Supplementary Fig. 8a,b. In the revised version of the manuscript, we substantially modified the text in Lines 374-408 to make the relevance of all these conclusions clear. The modified text now reads: “Therefore, we reasoned that a more plausible hypothesis would be based on the regulation of alternative splicing. In our accompanying manuscript19, we have reported that the long SATB1 isoform DNA binding sites display increased chromatin accessibility than what expected by chance (Fig. 3b in 19), and chromatin accessibility at long SATB1 isoform binding sites is reduced in Satb1 cKO (Fig. 3c in 19), collectively indicating that long SATB1 isoform binding promotes increased chromatin accessibility. We identified a binding site specific to the long SATB1 isoform19 right at the extra exon of the long isoform (Fig. 6e). Moreover, the study of alternative splicing based on our RNA-seq analysis revealed a deregulation in the usage of the extra exon of the long Satb1 isoform (the only Satb1 exon affected) in Satb1 cKO cells (deltaPsi = 0.12, probability = 0.974; Supplementary File 4). These data suggest that SATB1 itself is able to control the levels of the short and long Satb1 isoforms. A possible mechanism controlling the alternative splicing of Satb1 gene is based on its kinetic coupling with transcription. Several studies indicated how histone acetylation and generally increased chromatin accessibility may lead to exon skipping, due to enhanced RNA polymerase II elongation48,49. Thus the increased chromatin accessibility promoted by long SATB1 isoform binding at the extra exon of the long isoform, would increase RNA polymerase II read-through leading to decreased time available to splice-in the extra exon and thus favoring the production of the short SATB1 isoform in a negative feedback loop manner. This potential regulatory mechanism of SATB1 isoform production is supported by the increased usage of the extra exon in the absence of SATB1 in Satb1 cKO (Supplementary File 4). To further address this, we utilized the TCGA breast cancer dataset (BRCA) as a cell type expressing SATB150. ATAC-seq experiments for a series of human patients with aggressive breast cancer51 revealed differences in chromatin accessibility at the extra exon of the SATB1 gene (Fig. 6f). In line with the “kinetic coupling” model of alternative splicing, the increased chromatin accessibility at the extra exon (allowing faster read-through by RNA polymerase) was positively correlated with the expression of the short SATB1 isoform and slightly negatively correlated with the expression of the long SATB1 isoform (Fig. 6f). Moreover, we investigated whether the differential expression of SATB1 isoforms was associated with poor disease prognosis. Worse pathological stages of breast cancer and expression of SATB1 isoforms displayed a positive correlation for the long isoform but not for the short isoform (Fig. 6g and Supplementary Fig. 6c). This was further supported by worse survival of patients with increased levels of long SATB1 isoform and low levels of estrogen receptor (Supplementary Fig. 6d). Overall, these observations not only supported the existence of the long SATB1 isoform in humans, but they also shed light at the potential link between the regulation of SATB1 isoforms production and their involvement in pathological conditions.”

      1. The authors should not draw conclusions based on any data which is not shown '....ed differences in chromatin accessibility at the extra exon of the SATB1 gene (data not shown), suggesting its potential involvement in alternative splicing regulation according to the "kinetic coupling" model...'. This has led to overspeculation and needs correction.

      In the revised version of the manuscript, we included the ATAC-seq data from human breast cancer patients in the revised Fig. 6f. The legend of this figure now reads: “Human TCGA breast cancer (BRCA) patient-specific ATAC-seq peaks51 span the extra exon (EE: extra exon; labeled in green) of the long SATB1 isoform. Note the differential chromatin accessibility in seven selected patients, emphasizing the heterogeneity of SATB1 chromatin accessibility in cancer. Chromatin accessibility at the promoter of the housekeeping gene DNMT1 is shown as a control. (Lines 1281-1285)” Accordingly, we have also modified the main text: “ATAC-seq experiments for a series of human patients with aggressive breast cancer68 revealed differences in chromatin accessibility at the extra exon of the SATB1 gene (Fig. 6f). In line with the “kinetic coupling” model of alternative splicing, the increased chromatin accessibility at the extra exon (allowing faster read-through by RNA polymerase) was positively correlated with the expression of the short SATB1 isoform and slightly negatively correlated with expression of the long SATB1 isoform (Fig. 6g).” (Lines 395-339)

      Minor comments: 1. On pg 4, the authors state 'Here, we utilized primary murine T cells, in which we have identified two full-length SATB1 protein isoforms.' Whereas only one 'long' isoform is identified and the other is the canonical version. The authors should correct the statement.

      In the revised version of the manuscript, we modified this statement as follows: “In this work, we utilized primary developing murine T cells, in which we have identified a novel full-length long SATB1 isoform and compared it to the canonical “short” SATB1 isoform.” (Lines 64-66)

      1. Fig. 1 a , Is there a specific reason to generate a custom-made antibody for 'all' SATB1, using similar regions that are already commercially available. This becomes redundant otherwise, because there is no apparent difference in detection compared to the commercial one (Suppl. Fig 1a). Antibody generation strategy (1a) should be moved to supplementary. Additionally, authors have obtained the custom antibodies from a commercial source, therefore, the text should reflect the same alongside relevant details.

      The custom-made SATB1 antibody targeting the amino-terminal region of the protein was developed to detect the native form of the protein. Unlike commercially available antibodies, which are raised against either short peptides or denatured forms of the protein, this antibody was raised against the native form of the amino-terminal part of the protein. Specifically, it was raised for use in ChIP-seq experiments, since no commercially available antibody is of high quality for this approach. Moreover, the original Figure 1a was intended to provide an overview of the SATB1 protein structure, which is highly relevant for understanding its biophysical properties, and not to present the strategy for raising a custom-made antibody for SATB1.

      1. Fig 3e: what is the control used here? In their Pearson correlation analysis, there seem to be significant reduction in control sets as well upon treatment. This needs to be clarified.

      We used scans rotated by 90° which served as a negative control, as stated in Line 769: “SATB1 scans rotated by 90° served as a negative control for the colocalization with FU.” Note that this is a commonly used control in colocalization experiments as described for example in Dunn et al., 2011 (https://doi.org/10.1152/ajpcell.00462.2010).

      Additionally, we provide Costes’ P values which are based on randomly scrambling the blocks of pixels (instead of individual pixels, because each pixel’s intensity is correlated with its neighboring pixels) in one image, and then measuring the correlation of this image with the other (unscrambled) image. Please see Costes et al., 2004 (https://doi.org/10.1529%2Fbiophysj.103.038422) for more details. Moreover, it was actually anticipated to see a decrease in colocalization upon hexanediol treatment even in the negative control, as hexanediol significantly reduces both SATB1 and FU speckles as established in Fig. 3a-d.

      1. Pg 10, the authors claim that '..., thus we reasoned that it may also be used to study phase separation...' But there have been numerous reports starting from 2018, which have utilized this technique in corelation to phase behaviour (albeit individual proteins). The authors should include proper citations as they are extending an idea from the same field to their specific need.

      In the revised version of the manuscript, we included relevant citations to support the use of Raman spectroscopy in LLPS research: “Raman spectroscopy was already used in many biological studies, such as to predict global transcriptomic profiles from living cells42, and also in research of protein LLPS and aggregation43–47. Thus we reasoned that it may also be used to study phase separation in primary T cells.” (Lines 225-228)

      1. For Fig 5b, there should be a comparative image for 'short' isoform.

      In the revised Figure 5c we have included a comparative image for the short SATB1 isoform.

      1. In the context of Figure 5c, the authors claim ...' Note also the higher LLPS propensity of the human long SATB1 isoform compared to the murine SATB1...' Why suddenly human and mouse comparisons are drawn? This figure should be moved to supplementary.

      The comparison between the human and mouse SATB1 isoforms has been implemented because it is relevant for our claims regarding the increased SATB1 aggregation in human cells in relation to the revised Fig. 6f,g,h and Supplementary Fig. 6c,d. This is also discussed in Lines 479-482, which read: “This is particularly important given the higher LLPS propensity of the human long SATB1 isoform compared to the murine SATB1 (Fig. 5d). Therefore, human cells could be more susceptible to the formation of aggregated SATB1 structures which could be associated with physiological defects.”

      **Reviewer #3 (Evidence, reproducibility and clarity):**

      Zelenka et al., focus on a T cell genome-organizing protein, SATB1, to show that SATB1 undergoes liquid liquid phase-separation (LLPS), and distinct isoforms confer different LLPS-related biophysical properties. They generate a long-isoform specific antibody and conduct several experiments to test for LLPS and compare LLPS properties between the long-isoform relative to the whole SATB1 protein population. Given that SATB1 plays important roles in T cell development and in cancer, interrogating SATB1 biophysical properties is an important question. However, there are multiple problems with the experimental setup and data that weaken their support of the conclusions. I will detail some of the major issues below:

      Regarding phase-separation: There are several assays to determine whether a protein undergoes LLPS. 1. One of the first the authors address is the sphericity or roundness. Indeed, formation of spherical droplets is one piece of evidence for the liquid nature of a protein. However, the authors use fixed preparations (which can introduce artifacts), not free-floating protein, and determine roundness by showing a 2D image. Roundness should take into account the diffraction limits of fluorescent imaging, as many structures can be imaged to appear round by the detector. There are quantifiable measurements that can be taken on 3D images to show roundness. This would best be shown using non-fixed protein.

      • We thank this Reviewer for several insightful comments. Although we agree with most of them, we should highlight the main goal of our manuscript, i.e. to investigate the SATB1 protein with an emphasis on its physiological roles in primary developing murine T cells. We highlight this already in the introduction in Line 64, “In this work, we utilized primary developing murine T cells,...”, and also in the respective part of the Results section: “To probe differences in phase separation in mouse primary cells, without any intervention to SATB1 structure and expression, we first utilized 1,6-hexanediol treatment, which was previously shown to dissolve the liquid-like droplets34.” (Lines 203-205)

      • We believe that this is a very important aspect of our study that should not be overlooked. Most proteins probably behave differently under physiological and in vitro conditions. However, given the extensive post-translational modifications affecting the properties of SATB1, its completely different localization patterns in primary developing T cells compared with other cell types, especially cell lines, and many other aspects, it was of utmost importance to focus our research on primary T cells. Unfortunately, this came with multiple difficulties; for example, we had to use fixed cells, as this is the only way to visualize SATB1 in these cells. Alternatively, one could create a new mouse line expressing a fluorescently tagged SATB1 protein, but this is beyond the scope of our work.

      • However, we should also note that many LLPS-related studies do not focus on the primary physiological functions of proteins and simply investigate the proteins' artificial behavior under in vitro conditions. Having said that, we too extended our experiments in primary cells to ex vivo studies in cell lines to further support our claims. In these experiments, we utilized live cell imaging in Fig. 4-6, quantified the sphericity in Supplementary Fig. 6, showed the ability of speckles to coalesce in Fig. 4c and also used FRAP in Fig. 4f and, in the revised version of the manuscript, in Supplementary Figure 6f. Moreover, we should note that most of these experiments were designed and performed during 2017 and 2018, conforming to the then-current standards. We are well aware of the progress in the field and of the impact of fixation on LLPS, as described in Irgen-Gioro et al., 2022 (https://doi.org/10.1101/2022.05.06.490956), but after over seven months of review process in another journal we also believe that these aspects should be considered so as not to delay further progress of the SATB1 field.

      Regarding the isoform specificity of SATB1 biophysical properties 1. The authors generate a long isoform-specific antibody. However, the western blot is not convincing that this is indeed specific to the long isoform as there is a rather large smear. Can this be improved with antibody preabsorption? Since this is a key reagent for the manuscript, improvement in antibody quality is essential.

      The custom-made antibody for the long isoform has been raised against the unique 31-amino-acid peptide present in the long SATB1 isoform. The polyclonal serum has undergone affinity chromatography utilizing the immobilized peptide (antigen) to purify the antibody. In the revised version of the manuscript we have included another immunodepletion experiment with cleaner bands (Fig. 1f). Moreover, please read our answer to Major comment #2 of Reviewer 1 that follows: • The long antibody was raised in mice inoculated with the extra peptide present in the long isoform only. Therefore, it is not possible for this antibody to precipitate the shorter isoforms, which lack the sequence of the extra peptide (EP, Figure 1a).

      • We have repeated the immunodepletion experiment and we now provide the results in Fig. 1f and Supplementary Fig. 1b. The western blot in Fig. 1f is now cleaner and supports quite convincingly the presence of a long SATB1 isoform. Given the lack of isoform-specific knockouts which we could utilize to immunoprecipitate or detect the different isoforms in a single cell (or cell population), the utilized approach of immunodepletion and subsequent western blotting is the approach we thought of implementing.

      • As shown in Fig. 1f and Supplementary Figure 1b, the long isoform SATB1 antibody has the capacity to recognize the long isoform in murine thymocyte protein extracts but not the short SATB1 isoform (please compare lane 3 in the two western blots utilizing either the antibody for the long isoform, top panel, or the antibody that detects both isoforms, lower panel).

      • We have performed immunofluorescence experiments utilizing the antibody detecting the long SATB1 isoform in thymocytes isolated from either C57BL/6 or Satb1 cKO mice. The antibody is specific to the SATB1 protein since there is no signal in immunofluorescence experiments utilizing the knockout cells (Supplementary Figure 1c).

      • We have performed immunofluorescence experiments utilizing thymocytes and the antibody detecting the long SATB1 isoform or a commercially available antibody detecting all SATB1 isoforms. The pattern of SATB1 subnuclear localization is similar for both antibodies (Supplementary Figure 1e).

      • In our accompanying revised manuscript Zelenka et al., 2022 (https://doi.org/10.1101/2021.07.09.451769), we provide yet another piece of evidence, consisting of bacterially expressed short and long SATB1 protein isoforms detected by western blot using either the long isoform-specific or the non-selective all SATB1 isoforms antibodies.

      • Regarding the additional bands detected in the immunoprecipitation experiment presented in the original Supplementary Figure 1b (lane 2), it is not surprising that additional bands appear in a sample of protein extracts that is used for several hours for the immunoprecipitation experiments, while the “input” sample simply denotes protein extract that is frozen at -80 °C right after preparation until use. It is well established that SATB1 is a target of proteases, which might well be active during the immunoprecipitation steps (two consecutive immunoprecipitation steps take place). Therefore, the immunoprecipitated material cannot necessarily be a copy of the input material displaying a single protein band, even if protease inhibitors are included in the buffers.

      Taken together, the experiments described here show that the antibody raised against the extra 31 aa long peptide, present only in the long SATB1 isoform, is specific for this isoform.

      1. Fig 4 Optodroplet experiment appears to show that the N-terminus of SATB1 can undergo LLPS. The results of this assay show that SATB1 has a domain that can undergo phase-separation in isolation, but it does not show that the protein itself is a phase-separating protein. The FRAP assay methods are not provided by the authors, but this is important, as continued light activation means proteins are continuously forming aggregates, and the bleaching for FRAP should be balanced with the levels of Cry2 activation. A very good description of the methods is described in the original Optodroplet paper: https://www.sciencedirect.com/science/article/pii/S009286741631666X?via%3Dihub#sec4

      We should note that we did follow the FRAP protocol provided by the recommended study, Shin et al., 2017 (https://doi.org/10.1016/j.cell.2016.11.054). Indeed, these experiments are very tricky to perform and interpret, as every cell expresses slightly different amounts of protein, which is directly associated with the different speed of optoDroplet formation, and thus its propensity to aggregate upon overactivation. On the other hand, there needs to be continuous activation during the FRAP experiment, as the lack of an activation laser would result in fast disassembly of the optoDroplets, counteracting the FRAP results. Moreover, the optoDroplets actively move around the cell in all dimensions, which makes the accurate measurement of signal intensity really challenging, even with an adjusted pinhole. Therefore, we do not think that FRAP is the best approach to examine the behavior of optoDroplets.

      Either way, we have now described the detailed FRAP protocol in Lines 889-898, which read: “For the FRAP experiments, cells were first globally activated by 488 nm Argon laser illumination (alongside with DPSS 561 nm laser illumination for mCherry detection) every 2 s for 180 s to reach a desirable supersaturation depth. Immediately after termination of the activation phase, light-induced clusters were bleached with a spot of ∼1.5 μm in diameter. The scanning speed was set to 1,000 Hz, bidirectionally (0.54 s / scan) and every time a selected point was photobleached for 300 ms. Fluorescence recovery was monitored in a series of 180 images while maintaining identical activation conditions used to induce clustering. Bleach point mean values were background subtracted and corrected for fluorescence loss using the intensity values from the entire cell. The data were then normalized to mean pre-bleach intensity and fitted with exponential recovery curve in Fiji or in frapplot package in R.”
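The intensity processing described in the quoted protocol (background subtraction, correction for fluorescence loss using the whole-cell signal, normalization to the mean pre-bleach intensity, and an exponential fit) can be illustrated with a minimal sketch. This is not the authors' Fiji or frapplot workflow. The function names, the ratio-based correction, the single-exponential model and the grid search over the time constant are all assumptions made for illustration.

```python
import math

def normalize_frap(bleach, cell, background, n_pre):
    """Return a normalized recovery curve (hypothetical helper).

    Each frame is background-subtracted, divided by the background-subtracted
    whole-cell intensity to correct for fluorescence loss during acquisition,
    and scaled so that the mean of the first n_pre (pre-bleach) frames is 1.
    """
    ratio = [(b - bg) / (c - bg) for b, c, bg in zip(bleach, cell, background)]
    pre_mean = sum(ratio[:n_pre]) / n_pre
    return [r / pre_mean for r in ratio]

def fit_single_exponential(times, values, taus):
    """Fit values ~ i0 + a * (1 - exp(-t / tau)) (assumed recovery model).

    tau is scanned over the candidate list `taus`; for each tau the two
    remaining parameters follow from ordinary linear least squares.
    """
    n = len(times)
    mean_y = sum(values) / n
    best = None
    for tau in taus:
        x = [1.0 - math.exp(-t / tau) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate candidate, cannot estimate the amplitude
        a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values)) / sxx
        i0 = mean_y - a * mean_x
        sse = sum((i0 + a * xi - yi) ** 2 for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, i0, a, tau)
    _, i0, a, tau = best
    return {"i0": i0, "amplitude": a, "tau": tau, "half_time": tau * math.log(2)}
```

In use, the post-bleach part of the normalized curve would be passed to `fit_single_exponential` together with the frame times (one frame every 0.54 s under the quoted scan settings) and a list of candidate time constants.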

      1. Description of analyses that authors prefer not to carry out

      **Reviewer #1**:

      Can they use the all and long isoform antibodies together, then subtract the signal from the long isoform to conclude about the localization of the short isoform?

      We thank the Reviewer for the suggestion, though given the differential efficiency of antibodies and other limitations of imaging experiments, we do not think the suggested experiment has the potential to improve the quality of our manuscript. However, we should note that we have performed a pixel-based colocalization experiment between the signal detected by the all-isoform and long-isoform SATB1 antibodies. A fluorocytogram of the pixel-based colocalization, based on 3D-SIM data, is provided on the left, with quantified colocalization on the right, of the revised Supplementary Fig. 5a.
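For readers unfamiliar with pixel-based colocalization, one common way to quantify it is the Pearson correlation between the two channels' pixel intensities. The response does not state which coefficient was used for Supplementary Fig. 5a, so the choice of Pearson's coefficient, the function name and the flat-list input below are assumptions for illustration only.

```python
import math

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation of paired pixel intensities (hypothetical helper).

    channel_a and channel_b are flat sequences holding the intensity of the
    same pixels (or voxels) in the two channels, in the same order.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have equal length of at least 2")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 means the two antibody signals rise and fall together pixel by pixel, and a value near 0 means no linear relationship between them.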

      3) Lack of better staining with antibody against the long and short SATB1 isoforms after treatment with 1,6 Hexanediol. 1,6 Hexanediol treatment can change many other chromatin associated proteins to which SATB1 can be bound to indirectly. This experiment can

      We do understand the controversy and difficulties of experiments using 1,6-hexanediol treatment. However, we have to note that there is no better approach available for the investigation of LLPS in our primary murine T cells. We did use alternative approaches in ex vivo experiments, utilizing cell lines to validate our hypothesis without the involvement of 1,6-hexanediol.

      **Reviewer #2**:

      1. The authors mention, '...of the different SATB1 isoforms, uncovered by the use of the two different antibodies, relied in the heterochromatin areas (zone 1), where the long isoform was less frequently...' There is no supporting figure number mentioned. The authors need to show a zone-by-zone comparison images for 'all iso' vs 'long' iso of SATB1. Just to reiterate, there is a need for a heterochromatin mark to unambiguously call out the distinction.

      We should note that there is an inherent difficulty in accurately comparing the localization of short and long SATB1 isoforms in primary cells, especially due to the lack of Satb1 isoform-specific knockout mice. There is no way to detect only the short isoform in these primary cells, as there are only antibodies targeting the long or all SATB1 isoforms. Therefore, we cannot set up additional experiments probing these questions.

      In line with this, in the revised version of the manuscript, we toned down our statements regarding the differential localization of the two isoforms in primary cells. We only refer to it as an indication and we support it by adding references to the relevant figures. This part now reads: “Localization of SATB1 speckles detected by antibodies targeting all SATB1 isoforms and/or only the long SATB1 isoform, revealed a significant difference in the heterochromatin areas (zone 1, Fig. 2b), where the long isoform was less frequently present (see also Fig. 2a and Fig. 3c). Although, this could indicate a potential difference in localization between the two isoforms, due to the inherent difficulty to distinguish the two based on antibody staining, we refrain to draw any conclusions. (Lines 145-150)”

      1. Fig. 6a, The authors wished to see the effect of RNA on Satb1 nuclear localization. This is not related to the main theme of the paper, thus should be moved to supplementary (true for b as well). Importantly, the experiments should be performed with total cells to show the divergence of localization (like the paper the authors referred to) instead of matrix for clarity.

      • We did not wish to see the effect of RNA on SATB1 localization. In fact, there is a long history of SATB1 research that is inherently linked with the concept of nuclear matrix, a putative nuclear structure which is highly associated with nuclear RNAs. SATB1 was described many times as a nuclear matrix protein (https://doi.org/10.1016/0092-8674(92)90432-c; https://doi.org/10.1128/mcb.14.3.1852-1860.1994; https://doi.org/10.1074/jbc.272.17.11463; https://doi.org/10.1128/mcb.17.9.5275; https://doi.org/10.1021/bi971444j; https://doi.org/10.1083/jcb.141.2.335; https://doi.org/10.1101/gad.14.5.521; https://doi.org/10.1038/ng1146).

      • Moreover, our data discussed in comments 4-7 of this Reviewer, such as i. the localization of SATB1 to the nuclear zones associated with RNA and nuclear scaffold factors (Fig. 2b, Supplementary Fig. 1c), ii. colocalization of SATB1 with actively transcribed RNAs (Fig. 2c, Fig. 3g, Supplementary Fig. 2a, Supplementary Fig. 2c), iii. its association with nucleoli (Supplementary Fig. 3b), and also iv. its computationally predicted interaction with Xist lncRNA (Agostini et al., 2013; https://doi.org/10.1093/nar/gks968) as a notable factor of nuclear matrix, all suggest that the interaction between RNA and SATB1 is plausible and potentially relevant for its function and/or at least its subnuclear localization. It is even more relevant when considering numerous reports on the ability of RNA-binding, poly-Q and PrLD-containing proteins to undergo LLPS (https://doi.org/10.1016/j.molcel.2015.08.018; https://doi.org/10.1042/bcj20160499; https://doi.org/10.1016/j.cell.2018.03.002; https://doi.org/10.1016/j.cell.2018.06.006; https://doi.org/10.1093/nar/gkaa681), including RNAs specifically regulating LLPS behavior, especially for poly-Q and PrLD-containing proteins, such as SATB1 (https://doi.org/10.1126/science.aar7366; https://doi.org/10.1126/science.aar7432; https://doi.org/10.1016/j.ceb.2019.03.007; https://doi.org/10.1038/s41598-020-57994-9; https://doi.org/10.1016/j.molcel.2015.09.017; https://doi.org/10.1038/s41598-019-48883-x; https://doi.org/10.1038/s41467-019-11241-6).

      • It should also be noted that SAF and various hnRNPs, the most prominent proteins of the nuclear matrix, have been reported many times to phase separate (https://doi.org/10.1016/j.molcel.2019.10.001; https://doi.org/10.1074/jbc.ra118.005120; https://doi.org/10.1016/j.celrep.2019.12.080; https://doi.org/10.1038/s41467-019-09902-7; https://doi.org/10.1016/j.molcel.2017.12.022; https://doi.org/10.1074/jbc.tm118.001189). All these aspects show that the relation between nuclear matrix, SATB1 and RNA is quite relevant to our manuscript.

      • Moreover, in light of the aforementioned information, we believe that it is much clearer to follow the protocol we did – i.e. to remove soluble proteins by CSK treatment and then, upon RNase treatment, extract the released proteins using ammonium sulfate. In an experiment utilizing whole cells, one would need to microinject RNase A into the nucleus, which 1. is very challenging for primary T cells having a radius of 3-5 micrometers, 2. is of low throughput, 3. would not allow for released protein removal which would thus make the results hard to interpret. Please note that in the reference paper, the authors used cell lines overexpressing heterologous GFP-tagged proteins, which is not related to our setup.

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    1. If he had taken care to specify that when he said “we”, “us”, and “our” he meant each one of us, acting and responding as a user of quantum mechanics, he would have had it exactly right. But it seems to me likely that he was using the first person plural collectively, to mean all of us together, thereby promulgating the Copenhagen confusion
      • PEIERLS
      • (I THINK) "OUR"==="OBJECTIVE", "COLLECTIVE"

      • (I DON'T UNDERSTAND) "promulgating the Copenhagen CONFUSION"

      • (google) I found it in ANOTHER book by Mermin, in reference to a "CRITIQUE" by BELL of the use of the word "KNOWLEDGE" in the "explanations" of Heisenberg and PEIERLS (!!!)

      • BELL: Whose Knowledge? Knowledge about WHAT?

      • (google) also here [MASTERPIECE: SEE LAST 3 PARAGRAPHS]

      • [Peierls replies to Bell]: "That leaves the question: whose knowledge should be represented in the density matrix? In general there will be many who may have some information about the state of a physical system. Each of them has to use his or her density matrix. These may differ, as the nature and amount of knowledge may differ. People may have observed the system by different methods, with more or less accuracy; they may have seen part of the results of another physicist. However, there are limitations to the extent to which their knowledge may differ. This is imposed by the uncertainty principle. For example if one observer has knowledge of s, of our Stern-Gerlach atom, another may not know sx, since the measurement of sx would have destroyed the other person's knowledge of J2, and vice versa. This limitation can be compactly and conveniently expressed by the condition that the density matrices used by the two observers must commute with each other."

      • (I THINK) Has he answered the question??? I DON'T THINK SO!

      • It reminds me of Bohr's "reply" to EPR
      • "BOTH" go back to "repeating" how they "apply" QM, but without "answering" the question!
    1. As We May Think

      It suggests that new inventions have extended man's physical powers but not the powers of his mind. Bush therefore says that instruments are at hand which, if properly developed, will give man access to knowledge.

    1. Author Response

      Reviewer 1

      Bailon-Zambrano and colleagues were trying to answer the general question: what contributes to phenotypic variation when a gene of strong effect is mutated?

      The work has several major strengths for answering this interesting question. First, they decided to study mef2ca in zebrafish, for which they had previously shown that mutants displayed highly variable facial phenotypes. To learn how phenotypic variation depends on phenotypic severity, they realized they had to study more alleles, and so induced two more alleles to have three different types of molecular lesions (start codon mutation, premature stop codon, and full coding gene deletion). Investigating these alleles showed that increasingly severe alleles had more variation among individuals in the population but not necessarily more variation between the left and right sides of the face within individuals.

      Over several years, these investigators had spent considerable effort to select lines of fish that segregate the start-codon mutation and have either severe or weak effects on facial phenotypes. They wondered: what factors were selected out of the original genetic background that would increase or decrease phenotypic severity? They hypothesized that one or more of the five mef2 paralogs in zebrafish might help to ameliorate the phenotype in the low line or reciprocally intensify the phenotype in the high line. They studied expression of the mef2 paralogs in neural crest cells by single-cell transcriptomics. They found that paralogs were downregulated in the high-penetrance line with respect to an unselected line, a result expected if expression of the paralogs contributed to buffering phenotypic severity. This experiment has two weaknesses. The first is that the method only examined neural crest cells, but we know that signals from the ectodermal and endodermal epithelia contribute to craniofacial morphologies by diffusible signals. If genes regulating craniofacial morphologies that act in epithelia had genetic variation that contributes to severity, those genes would not be investigated in these crest-only experiments. A minor problem (which is associated with the expense of the experiment) is that the scRNA-seq experiments compared only the high and unselected lines, not the low line. To address both problems, the investigators performed qPCR on RNAs extracted from whole heads of genetically mef2ca-wild types from the high and low line. In these qPCR experiments, however, they did not investigate the unselected line. Leaving out the low line in one approach and leaving out the unselected line in the other approach somewhat weakens the strength with which one can draw conclusions (e.g., the qPCR conclusion assumes that the unselected line would be intermediate between the two selected lines) but is unlikely to change the basic conclusions the authors drew.
In addition, using whole heads in the qPCR experiments, while it has the advantage that it includes epithelia, does not distinguish between genes expressed only in the crest and genes expressed in other cell types, and these experiments did not test for any genes known to affect craniofacial development that are epithelium-specific.

      In response to this comment, and those below, we removed the scRNA-seq comparing neural crest cells from unselected and high-penetrance strains. We replaced those data with important new results which considerably advance our model. We found significant paralog expression variation among unselected zebrafish families (Fig. 4D). These results strongly suggest that our breeding selected upon standing paralog variation in the unselected parental strains. See more below.

      Finally, in key experiments that are a major strength of the work and require significant effort, the researchers systematically made mutations in four of the five zebrafish mef2 paralogs (mef2aa, mef2b, mef2cb, and mef2d, all except mef2ab, which didn't become mutated despite significant effort) in the genetic background of the low-penetrance strain and studied them in single homozygotes, in double mutants, and in various heterozygous combinations. These important experiments showed that some paralogs provided significant buffering in the low-penetrance strain, the strain that up-regulated expression of these paralogs. It would be helpful in the discussion to mention that mef2ab couldn't be mutated and a phrase added about what that means for the general conclusions - in the opinion of this reviewer, the impact of this is not great but it should be acknowledged.

      We acknowledge that mef2ab couldn’t be mutated and consider what that means for the general conclusions in the text.

      A strength of the experiments is that the workers quantified effects of various genotypes by focusing on the length of the symplectic, a convenient element for quantification both within single individuals and among fish in a population. It would be helpful to have a statement on the evidence that this measure is a good representative for other aspects of the phenotype.

      We provide new data indicating that the symplectic cartilage length is significantly correlated with another mef2ca-associated phenotype (Fig. 1-figure supplement 2). See more below.

      Finally, the paper presents a model for understanding the results presented that does a good job of summarizing the data and, importantly, suggests ways to move the analysis deeper. Missing from the description of the model is a discussion about whether the genetic variation that was selected and ultimately upregulated mef2 paralogs is in regulatory elements of the mef2 paralogs themselves or whether it might be in trans-acting transcriptional regulators that simultaneously regulate all mef2 paralogs due to the authors' hypothesized 'cryptic vestigial' functions.

      We considerably revised the discussion, thoroughly considering both these possibilities.

      This work is likely to have a significant impact on the fields of developmental biology, the interpretation of human mutational variation (in for example the concept of phenotypic expansion), and the way people think about the evolution of new morphologies over time. A brief comparison of the authors' results and interpretations to those of C.H. Waddington's concept of genetic assimilation would provide improved historical context and broaden the potential impact of the work.

      We now include a discussion of our study in the context of Waddington’s genetic assimilation.

      Reviewer 2

      Bailon-Zambrano et al study the possible mechanisms that contribute to the oft-observed phenomenon that an individual mutation may be associated with variable expression of a phenotype. They focus on loss-of-function of the mef2ca gene of zebrafish, which is needed for the normal development of several craniofacial structures. They demonstrate that recessive putative loss-of-function mutant alleles of the mef2ca gene of zebrafish are associated with a range of expressivity. By focusing on one aspect of the mutant phenotype, the length of the symplectic cartilages that support the jaw, they find a correlation between the average strength of the phenotype of an allele (measured as reduction in length) and the extent of variability between mutant individuals that carry the allele. I am concerned about this conclusion and generalizations that may be drawn from focus on a single quantifiable character, the symplectic cartilage. Perhaps there is always a fixed variation in the length of this cartilage. As stronger alleles produce shorter cartilage pieces, variations in size may appear to be of greater significance when affecting shorter average length.

      We now show that the symplectic cartilage length is a good proxy for other craniofacial phenotypes (Fig. 1-figure supplement 2). Further, we clarify in the text that we use the coefficient of variation (standard deviation/mean), which is the accepted best practice for determining and comparing variation. We also use the F-test statistic, which is the standard statistical method to test for equality of two variances. This test tells us whether the variances (and hence the standard deviations) of two datasets are significantly different.
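The two statistics named in this response can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names and the cartilage-length values are hypothetical and are not taken from the manuscript, and the p-value step (comparing the F statistic with an F distribution at the returned degrees of freedom) is deliberately left out.

```python
from statistics import mean, stdev, variance

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return stdev(values) / mean(values)

def f_statistic(sample_a, sample_b):
    """Variance ratio for an F-test of equal variances.

    Returns (F, numerator_df, denominator_df) with the larger sample
    variance in the numerator, so that F >= 1.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical symplectic lengths (arbitrary units) for two alleles with the
# same absolute spread but different mean lengths.
weak_allele = [98, 100, 102]
strong_allele = [48, 50, 52]

cv_weak = coefficient_of_variation(weak_allele)
cv_strong = coefficient_of_variation(strong_allele)
f_value, df_num, df_den = f_statistic(weak_allele, strong_allele)
```

With these made-up numbers the standard deviation is the same in both groups, so the variance ratio is 1, while the coefficient of variation doubles for the shorter cartilage. The two statistics therefore answer different questions: the coefficient of variation scales spread by the mean, whereas the F-test compares absolute spread.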

      The authors hypothesize that one factor that contributes to the varied phenotypic expression of an allele (expressivity) is the co-expression of paralogs that may provide wildtype function and thus partially or wholly rescue the mutant phenotype. They test this hypothesis by "fixing" conditions where a single mutation may be expressed with low or high penetrance. By selective breeding based on phenotype, they create two sets of strains that carry an identical mef2ca mutation: one strain has high penetrance of the mutant phenotype and the other low penetrance. They then investigate the factors that are likely responsible for the high vs low penetrance. Historically we would call these factors "genetic modifiers". There is extensive literature on the nature of genetic modifiers and there are many current screens in both mice and Drosophila to identify genetic modifiers and uncover their nature, but there is little reference to these studies in the current manuscript. Further, there is previously published work that hypothesizes that one important function of paralogs in multicellular organisms is to provide a buffer to stabilize levels of gene expression needed for developmental decisions.

      Following this reviewer’s suggestion, we now include many new references (increased from ~50 to >80) incorporating much of the important work leading up to our study. These include referencing both genetic modifier mutagenesis screens, paralogous buffering in other systems, and “natural” modifier studies that set the stage for our work.

      The authors find that paralogs of the mef2ca gene are expressed in cells that normally express mef2ca, and that these paralogs are expressed at higher levels in the mutant strain with low penetrance than in the mutant strain with high penetrance. They say that selection for high penetrance of the mef2ca mutant phenotype "leads to down-regulation" of paralog expression. As the authors only show that paralog expression is at lower levels in high penetrance vs low penetrance strains, it is not clear what they mean by "down-regulation". Perhaps their breeding scheme has only "captured" what is natural variation and there is no active mechanism of "down-regulation". The authors need to clarify what they mean.

      Thank you for this suggestion. We clarified that we do not mean active down or up regulation but rather selection on preexisting genetic variation. This conclusion is supported by new data (Fig. 4D).

      The authors also find that individuals from the high penetrance strains that don't carry the mef2ca mutation (they are wildtype for this gene) sometimes exhibit mef2ca mutant characters. They suggest the reduced paralog expression is responsible for the occasional emergence of the mef2ca mutant characters. In contrast with this suggestion, the authors later claim the paralogs "have no function" in craniofacial development. The authors need to clarify their thoughts about what is paralog function in craniofacial development and why reduced paralog function might contribute to the expression of mef2ca mutant characters. This topic is worthy of discussion.

      We considerably revised our discussion of this topic, including our interpretation that the decreased expression of mef2ca in the high-penetrance strain led to the phenotypes we observe in mef2ca wild types from this strain. We are also more careful with our language, stating that the paralog mutants are indistinguishable from wild types, rather than stating that paralogs do not function in craniofacial development. In fact, they do function in craniofacial development, as buffers. Thank you for this suggestion, which strengthened our manuscript.

      The authors' claim is that there is both up-regulation of paralogs in low penetrance strains and down-regulation of paralogs in high penetrance strains. As they only compare steady state levels of expression in each strain, they can only reasonably conclude that there are differences - they seem to imply a mechanism and they need to be clear about what they are thinking.

      Excellent point. In the revised manuscript, we are clear that there is not active up or down regulation but rather selection upon preexisting variation.

      They hypothesize that paralog expression in the low penetrance strain masks the effects of loss of mef2ca. They test this by creating CRISPR-engineered mutations of two paralogs and examining the effects of the paralog mutations in wildtype fish or in fish carrying the mef2ca mutation. They find the putative loss-of-function mutations in the paralogs have no effect in wildtype backgrounds and conclude these paralog genes have no function in craniofacial development. However, the paralog mutations enhance the mutant phenotype in fish that carry the mef2ca mutation. This provides strong evidence consistent with the model that the elevated expression of the paralogs functions to reduce the severity of the phenotype associated with the mef2ca mutation.

      Reviewer 3

      In this elegant genetic study, Bailon-Zambrano et al. draw on classical genetic concepts to address the clinically pertinent question of how genetic variants in the same gene can yield wildly different phenotypes in different individuals. They focus on the Mef2c gene, which is required for craniofacial and cardiac development in humans and model organisms yet shows highly variable phenotypes across and within individuals. Previous work by this lab had established that zebrafish mef2ca craniofacial phenotypes are highly variable and, importantly, that this variability is heritable and can be selectively bred for low vs. high penetrance. The authors hypothesize that vestigial expression of paralogous genes variably compensates for loss of mef2ca, such that individuals with higher levels of paralogous genes will show lessened severity and vice versa. To test their hypothesis, they methodically quantify the penetrance, expressivity, and variability of all known mef2ca-associated craniofacial phenotypes in fish carrying 1) different mef2ca mutations, 2) the same mutation but after selecting for high vs. low penetrance for many generations, and 3) mef2ca mutations combined with mutations in paralogous genes. They find that not only does allele severity directly correlate with variation, but also that different paralogs buffer the severity and variability of different craniofacial phenotypes. Another particularly interesting finding is that some of the craniofacial phenotypes are apparent even in mef2ca wildtypes from the high penetrance strain, which they explain by the very low expression of paralogs on this background. A weakness of the study is that the authors do not directly show whether paralog expression is increased in the low-penetrance strain relative to the initial, unselected genetic background. It is therefore not clear whether the selection for low penetrance worked in this manner, as the authors imply.
Overall, the authors have achieved an important step forward in understanding the genetic basis for the high variability of human faces among both healthy individuals and those with craniofacial anomalies.

      We can’t go back (over ten generations) to survey the original parental strain. However, we can use the unselected AB strain as a proxy for the initial unselected genetic background. In an important addition to the manuscript, we found significant paralog expression variation between unselected AB families (Fig. 4D). These results strongly suggest there is cryptic, standing paralog expression variation that we selected upon. We would like to thank the reviewer for this excellent critique, which motivated these important new experiments that considerably advance our model.

    1. Author Response

      Reviewer 1

      Ting Tang et al. present the results of a species x genotype diversity experiment within BEF China. The authors assess the relative impacts of species and genotype diversity on community-level primary productivity of the trees and the potential mediation of this effect via interactions of plants with soil fungi and herbivores. The results show that both species and genotype diversity influence productivity via changes in herbivory, soil fungal diversity, and other unknown mechanisms. Most of the species diversity effects could be directly related to functional diversity, while genotype diversity effects were not well represented by the way functional diversity was measured in this study.

      Thanks for the positive comments on the paper.

      The study is based on an impressive experiment that will certainly allow achieving major insights into the role of genotype and species diversity on ecosystem functioning. However, there are some significant shortcomings in the methods that limit this study. In particular, the incomplete assessment of functional traits, herbivory, and fungal diversity across the subplots used for this study reduces statistical power. Specific measurements of traits, herbivory and fungal diversity in each plot would substantially simplify the design and the analyses and likely also reduce the unexplained variance observed in the study. However, this is nothing that can be changed now and has the likely explanation of feasibility constraints.

      Thank you for the positive comments on the paper and the understanding of the feasibility constraints. In our study, functional traits of all the seed families of the four species across all the species × genetic diversity combinations were sampled, but to reduce circularity, we used the seed-family means across all tree diversity combinations to calculate functional diversity for every subplot instead of only using the functional trait measures obtained in that particular subplot. We have taken up the suggestion to also calculate functional diversity based on trait measurements of individual trees, but also here used data across all plots to reduce circularity. Additionally, we now acknowledge the incomplete assessment of herbivory in the Methods and state that fungal diversity in plant species mixtures was sampled on plot level because of feasibility constraints.

      Lines 334–337: “To reduce circularity, we used the seed-family means across all species × genetic diversity combinations to calculate FDis values per subplot that did not only depend on the functional trait measures obtained in that particular subplot. Using traits measured in a particular subplot to calculate FDis for that subplot bears the risk that the measured traits reflect a response to the local environment, yet we want to use FDis as a predictor variable for the performance of that subplot.

      Lines 380–382: “The mean value of herbivore damage per species × genetic diversity level was used to fill in missing values in a few subplots with tree individuals lacking herbivory data (Table S3).
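      The gap-filling step quoted above (replacing missing herbivory values with the mean for the same species × genetic diversity level) can be sketched as follows. This is a minimal illustration only: the function name, record layout, and values are hypothetical and are not taken from the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean


def fill_missing_by_group(records, group_key="treatment", value_key="herbivory"):
    """Return copies of the records with missing values replaced by the group mean.

    A value of None marks a missing measurement. Groups with no observed
    values at all raise a KeyError, since there is nothing to impute from.
    """
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            observed[rec[group_key]].append(rec[value_key])
    group_means = {group: mean(values) for group, values in observed.items()}

    filled = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input records stay untouched
        if new_rec[value_key] is None:
            new_rec[value_key] = group_means[new_rec[group_key]]
        filled.append(new_rec)
    return filled


# Hypothetical subplots; treatment labels follow the "species.genetic" notation.
subplots = [
    {"subplot": "a", "treatment": "1.1", "herbivory": 0.10},
    {"subplot": "b", "treatment": "1.1", "herbivory": 0.20},
    {"subplot": "c", "treatment": "1.1", "herbivory": None},
    {"subplot": "d", "treatment": "4.4", "herbivory": 0.30},
]
filled = fill_missing_by_group(subplots)
```

      Here subplot "c" receives the mean of the two observed 1.1 values, while the measured subplots are left unchanged.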

      Lines 385–388: “Soil fungal diversity was used as a proxy of unspecified trophic interactions. To be consistent with the species and genetic diversity treatment design, soil samples were taken on subplot level for the 1.1 and 1.4 diversity treatments, but, due to feasibility constraints, on plot level for the 4.1 and 4.4 diversity treatments in 2017.”

      The writing of the manuscript is generally good. However, given the somewhat diffuse results obtained for genetic diversity effects, they receive a lot of attention in the discussion, while species diversity effects are little mentioned. This could be better balanced and also referred back to the hypotheses. For example, I miss the discussion of the very clear hypothesis that genotype diversity effects are positive in species monocultures but neutral in species mixtures. How do your results fit with this hypothesis? My general impression is that the study is very well framed, but fails to stick to this frame in the discussion. I am aware that this might be a challenge with the results obtained, but it is worth trying.

      Thank you for the positive comments on the writing and pointing out the unclear part of the genetic diversity effects. To better connect the discussion to our hypothesis that genotype diversity effects are “more important in species monocultures than in species mixtures” (lines 114–115), we have rewritten the corresponding Discussion section.

      Lines 248–164: “In contrast to our second hypothesis, we found that the effects of genetic diversity via functional diversity and multi-trophic feedbacks were negative in species monocultures but positive in the species mixture (Fig. 5 and Fig. S3). We found genetic diversity had positive effects on tree functional diversity and soil fungal diversity, which supports the trade-offs between genetic and species diversity discussed in the previous section. However, the hypothesized positive effects of tree functional diversity on productivity turned negative in species monoculture. This result indicates that functional diversity may not have positive effects on the ecosystem functioning under low environmental heterogeneity, i.e. species monocultures in our study (Hillebrand and Matthiessen 2009). Therefore, our findings show that the different effects of genetic diversity on tree productivity between species monocultures and mixtures, not only depend on the different effects of genetic diversity on functional diversity and trophic interaction but also on the varied tree productivity consequences from functional diversity and trophic interaction on tree productivity between species monocultures and mixtures. Moreover, other aspects of tree genetic diversity seem to play an important role not only for productivity in tree species mixtures (see previous section) but also for productivity in tree species monocultures. These may include unmeasured functional traits such as root traits (Bardgett et al., 2014) or unknown mechanisms underpinning effects of tree genetic diversity.”

      Given the complex results obtained, I also feel that the title and main message received in the abstract do not fully reflect the results. Genetic diversity effects on productivity, but also on herbivory and fungal diversity, are not general (e.g. Fig. 2) nor are all genetic diversity effects on productivity mediated by functional diversity and trophic feedback. I think the title and main message of the study should be articulated more precisely.

      In this study we did not find direct effects of genetic diversity on tree productivity in the binary analyses (Fig. 2), but we did find indirect effects of genetic diversity on tree productivity via functional diversity and trophic feedbacks in the path analysis (Fig. 4). Now we have pointed this out in the Discussion.

      Lines 201–204: “Although only species diversity but not genetic diversity was found to affect tree productivity in binary analyses, both kinds of diversity positively affected tree community productivity and trophic interactions via functional diversity according to our structural equation models (SEMs) depicted in the corresponding path-analysis diagrams (see Fig. 4).

      We agree that not all genetic diversity effects on productivity were mediated by functional diversity and trophic feedbacks. This may have been because we did not include all relevant functional traits and trophic interactions in this study. Nevertheless, our findings support the hypothesis that genetic diversity can affect productivity via functional diversity and trophic feedbacks and suggest more possibilities for further research. We have explained this in the Discussion.

      Lines 230–238: “Even after accounting for tree functional diversity and trophic feedbacks, we still detected a direct negative effect of tree genetic diversity on tree productivity, while the direct effect of tree species diversity was fully explained by functional diversity and trophic feedbacks. This suggests that aspects of genetic diversity that do not contribute to functional diversity or trophic interactions as measured in this study may reduce ecosystem functioning, e.g. due to trade-offs between genetic diversity and species diversity. For example, it has been shown that in species-diverse grassland ecosystems, niche-complementarity between species can increase at the expense of reduced variation within species (van Moorsel et al., 2018; van Moorsel et al., 2019; Zuppinger-Dingley et al., 2014; Zvereva et al., 2012).

      Lines 260–264: “Moreover, other aspects of tree genetic diversity seem to play an important role not only for productivity in tree species mixtures (see previous section) but also for productivity in tree species monocultures. These may include unmeasured functional traits such as root traits (Bardgett et al., 2014) or unknown mechanisms underpinning effects of tree genetic diversity.”

      Reviewer 2

      This study aims to disentangle the contributions of genetic and species diversity to tree community fitness. It confirms the role of genetic diversity in functional and ecological traits but shows how these effects change when plant species diversity is increased, which can potentially add to our understanding of the interplay between plant diversity at various levels and community and ecosystem functions. It would be desirable to emphasize whether differences between the effects of genetic and species diversity are comparable, since they can act at complementary but different levels. It is hard to establish whether the effects of species diversity override the effects of genetic diversity by shared mechanisms, or whether a high species diversity reduces plant intraspecific interactions and the consequent effects of genetic diversity by density-dependent effects. However, this point has to be emphasized in the discussion.

      Thank you for your positive comments on this paper. In the binary analyses in this paper, we used general linear mixed-model analysis to detect the effects of genetic diversity within species. Now we have clarified this in the Methods and the Results section. However, in Fig. 2 we also indicate the significance of the main effect of genetic diversity. We do not focus on this because of the interaction between species and genetic diversity. In statistical terms, fitting genetic diversity effects separately for species monocultures and mixture (2 degrees of freedom) is equivalent to (i.e. has the same sum of squares as) fitting the main effect of genetic diversity (1 degree of freedom) and the species × genetic diversity interaction (1 degree of freedom).
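      The sum-of-squares equivalence stated above can be checked numerically for a simplified case. The sketch below assumes a balanced 2 × 2 layout (species monoculture vs. mixture, low vs. high genetic diversity) with equal replication per cell and no random effects, which is simpler than the mixed models used in the study; the cell means and replication are hypothetical. It uses the standard single-degree-of-freedom contrast formula, SS = (Σ c·m)² / Σ(c²/n).

```python
import math


def contrast_ss(cell_means, coeffs, n_per_cell):
    """Sum of squares for a 1-df contrast among cell means (equal n per cell)."""
    estimate = sum(c * m for c, m in zip(coeffs, cell_means))
    return estimate ** 2 / sum(c ** 2 / n_per_cell for c in coeffs)


# Cell order: (mono, low G), (mono, high G), (mix, low G), (mix, high G).
cell_means = [10.0, 8.0, 12.0, 15.0]  # hypothetical productivity means
n = 5                                 # hypothetical replicates per cell

# Coding 1: genetic diversity main effect plus species x genetic interaction.
ss_main = contrast_ss(cell_means, [-1, 1, -1, 1], n)
ss_interaction = contrast_ss(cell_means, [1, -1, -1, 1], n)

# Coding 2: genetic diversity effect fitted separately within each species level.
ss_within_mono = contrast_ss(cell_means, [-1, 1, 0, 0], n)
ss_within_mix = contrast_ss(cell_means, [0, 0, -1, 1], n)

same_total = math.isclose(ss_main + ss_interaction, ss_within_mono + ss_within_mix)
```

      With these numbers the main effect is small because the within-monoculture and within-mixture effects have opposite signs, yet both codings partition the same total across 2 degrees of freedom.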

      Lines 415–424: “To determine how species and genetic diversity and their interaction affected tree functional diversity and trophic interactions, linear mixed-effects models (LMMs) were fitted with two types of contrast coding. In the first, we used the ordinary 2-way analysis of variance with interaction and in the second we replaced the genetic diversity main effect and the interaction with separate genetic diversity effects for species monocultures and the species mixture (Table S6). Note that as our design was orthogonal, fitting sequence did not matter in either of the codings. However, we focused our major analysis on the second type of coding to make it consistent with our hypotheses. Main effects of genetic diversity are presented in inset panels in Fig. 2. Our second contrast coding ensured that we tested the effects of genetic diversity separately in species monocultures and species mixture, but within the same analysis.

      Lines 120–121: “Using linear mixed-model analyses, we tested the effects of species diversity and genetic diversity within species on trophic interactions and community productivity.”

      Meanwhile, to emphasize that species diversity and genetic diversity could affect each other, we discussed that the trade-offs between species and genetic diversity could contribute to the effects of tree diversity on tree community productivity. We also discussed that the different effects of genetic diversity between species monocultures and mixtures may occur because different levels of species diversity create different biotic environments.

      Lines 232–238: “This suggests that aspects of genetic diversity that do not contribute to functional diversity or trophic interactions as measured in this study may reduce ecosystem functioning, e.g. due to trade-offs between genetic diversity and species diversity. For example, it has been shown that in species-diverse grassland ecosystems niche-complementarity between species can increase at the expense of reduced variation within species (van Moorsel et al., 2018; van Moorsel et al., 2019; Zuppinger-Dingley et al., 2014; Zvereva et al., 2012).

      Lines 250–260: “We found genetic diversity had positive effects on tree functional diversity and soil fungal diversity, which supports the trade-offs between genetic and species diversity discussed in the previous section. However, the hypothesized positive effects of tree functional diversity on productivity turned negative in species monoculture. This result indicates that functional diversity may not have positive effects on the ecosystem functioning under low environmental heterogeneity, i.e. species monocultures in our study (Hillebrand and Matthiessen 2009). Therefore, our findings show that the different effects of genetic diversity on tree productivity between species monocultures and mixtures, not only depend on the different effects of genetic diversity on functional diversity and trophic interaction but also on the varied tree productivity consequences from functional diversity and trophic interaction on tree productivity between species monocultures and mixtures.”

      The experimental design has to be explained in more detail, in particular how plants were planted in the species monocultures. It is not stated whether the same or different species were used in the plots or in subplots. The design lacks proper replication for the treatment with high genetic diversity in species monocultures (n=2) which could lead to a biased result, especially if those plots were located in the same area.

      Thank you for the valuable comments on the experimental design. In total, we used four species and eight seed families per species for the whole experiment, and now we have added a diagram of the experimental design to the supplementary material (Fig. S5) to show the species and seed-family information for every subplot. Furthermore, we have added a table to the supplementary material to indicate the number of occurrences of each species and each seed family across all the tree diversity-treatment combinations (Table S2). The high genetic diversity in species monoculture (1.4 treatment) was replicated 2 times per species and thus had 8 replications (Fig. S5). However, because we did not have enough seedlings, we could only establish these treatments at subplot level and thus put the different species for the 1.4 treatment into only two plots. Now we have added more explanation of the plot design in the Methods part. The plot distribution was completely randomized across the experimental site and plots of the same treatments were mostly located at least 50 m from each other (see Fig. 1 from Bongers et al., 2020, pasted here further below). The reason that there are more plots for the 1.1 treatment is that typically in biodiversity experiments more plots are needed at the lowest diversity treatment because of the desire to have all seed families occurring in any mixture also present as monoculture. Regarding the point that the four diversity treatments varied between rather than within plots, we ensured that diversity effects were tested at the plot level by including plot as a random-effects term in the mixed models.

      Lines 305–323: “For each of the four species, we collected seeds from eight mother trees to allow for two replications of four-family mixtures per species. Furthermore, to avoid the effects of unequal representation of particular seed families and correlations between seed family presence and diversity treatments, we made sure that every seed family occurred the same number of times at each diversity level (see Table S2, small deviations from the rule were required where not enough seeds from a seed family could be obtained). Due to budget limitations and the number of replicates required per single seed family, the 1.1 and 1.4 diversity treatments were applied at subplot level (0.25 mu) and replicated 32 and 8 times, respectively. The 4.1 and 4.4 diversity treatments were applied at plot level (1 mu) and were replicated 8 and 6 times, respectively (Fig. S5; see also Fig. 1 in Bongers et al., 2020). To allow for simpler analysis, we obtained most community measures at subplot level also for the 4.1 and 4.4 diversity treatments and thereafter used the subplots for all tests of diversity effects on these community measures, including plots as error (i.e. random-effects) term for testing the diversity effects in the corresponding mixed models. In total, because one 1-mu plot could not be established due to logistic constraints, the number of subplots used was 92 (32 subplots of 1.1, 8 subplots of 1.4, 28 subplots of 4.1 and 24 subplots of 4.4 diversity treatment). Note that in biodiversity experiments lower richness levels represent more different communities and thus require more plots. For the highest richness level, where there is typically only one species composition, this same community is typically replicated multiple times, as we did here for the 4.4 diversity treatment.”
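      The subplot totals in the quoted Methods passage can be cross-checked with a short calculation. The sketch assumes four subplots per plot, inferred from the stated areas (1 mu per plot, 0.25 mu per subplot); the attribution of the missing plot to a particular treatment is what the arithmetic implies, not something the passage states directly.

```python
SUBPLOTS_PER_PLOT = 4  # assumption: 1 mu plot / 0.25 mu subplot

# Subplots actually used per diversity treatment, as listed in the passage.
subplots_used = {"1.1": 32, "1.4": 8, "4.1": 28, "4.4": 24}
total_subplots = sum(subplots_used.values())

# Treatments applied at plot level and their stated number of replicate plots.
plot_level_replicates = {"4.1": 8, "4.4": 6}
planned_subplots = {
    treatment: reps * SUBPLOTS_PER_PLOT
    for treatment, reps in plot_level_replicates.items()
}

# Plots implied to be missing for each plot-level treatment.
missing_plots = {
    treatment: (planned_subplots[treatment] - subplots_used[treatment]) // SUBPLOTS_PER_PLOT
    for treatment in plot_level_replicates
}
```

      Under this assumption the counts sum to the reported 92 subplots, and the single unestablished plot corresponds to the 4.1 treatment (7 of 8 plots, 28 subplots).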

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      The manuscript describes the formation of supernumerary centriole protein assemblies ("cenpas") upon silencing of the E3 ubiquitin ligase TRIM37. These "cenpas" resemble centrioles, centriole precursors, or electron-dense striped structures, termed "tigers". Similar observations are made in cells from patients lacking functional alleles of TRIM37. The "cenpas" usually lack the full complement of centriolar proteins, but contain increased amounts of the pro-centriole marker centrobin. It is further shown that the formation of "cenpas" depends on centrobin, or on a parallel pathway involving Plk1 and SAS-6. Overall, the experiments in this study are of high technical quality and most of them are carefully controlled. The discovery of centrobin-containing striped protein assemblies ("tigers") is very interesting and provokes the question of their molecular composition and their mechanistic role in centriole assembly. Since striated fibres containing the protein rootletin have a similar periodicity of stripes (75nm) as the "tigers" in this study (Vlijm et al., PNAS 2018, 115:E2246-53), I was wondering whether the authors couldn't simply test for colocalization of their "tiger"-stripes with rootletin. A potential identity of "tigers" with striated fibres would help understanding the mechanisms of "cenpas" and centriole assembly upon depletion of TRIM37: striated fibres or "tigers" might be controlling the balance of centriole cohesion vs. disengagement and thereby centriole duplication, or they might play a role in the recruitment of additional proteins involved in pro-centriole assembly.

      We are grateful to the reviewer for this interesting suggestion. Accordingly, we will test the distribution of Rootletin and potentially CEP68 by immunofluorescence analysis of cells depleted of TRIM37.

      In the same context, did the authors correct for the experimentally induced sample expansion in Figure 5B, when comparing inter-stripe distances between U-ExM and EM samples?

      Yes, we did. We will clarify the text of the revised manuscript to make this more explicit.

      Other major points: The amount of TRIM37-depletion upon siRNA-treatment should be indicated prominently. I see in the "Materials and Methods" and in Fig. S4 that quantitative RT-PCR has been performed. Could Western blotting be performed to have direct information on the protein levels? Fig. 2C demonstrates that this is possible in cells from human patients, so why are there no data on the majority of other experiments in this manuscript?

      We previously reported Western blot analysis to estimate the extent of TRIM37 depletion upon siRNA treatment (Balestra et al., 2013). However, following the suggestion of the reviewer, we will repeat this analysis for select experiments of this study.

      Moreover, what is the transfection efficiency in the siRNA experiments? Is there variability between cells that might explain variability in the "cenpas" phenotypes?

      The reviewer brings up an interesting point. However, in the absence of an antibody to detect endogenous TRIM37 by immunofluorescence analysis, we cannot provide an accurate figure in this case. We will mention this limitation explicitly in the text of the revised manuscript.

      Minor point: In line 353 (page 12), it is stated that centrobin in si-TRIM37 cells migrates slower (Fig. 4D), suggesting that TRIM37 regulates the post-translational state of centrobin. It looks to me as if the corresponding gel in Fig. 4D was "smiling" (see curvature of centrobin in the neighboring lane). I think that the authors should tone down their statement, or replace Fig. 4D with a more convincing image.

      We thank the reviewer for having noticed this. We will provide another gel that is not “smiling”; the difference in migration has been observed reproducibly.

      Reviewer #1 (Significance):

      The findings of this manuscript are highly significant for our understanding of centriole biogenesis. They should be of interest to a large community of cell biologists working on mitosis and on the centrosome, and they are of further importance for biomedical research related to developmental growth abnormalities (Mulibrey nanism). The manuscript shows for the first time a mechanistic link between TRIM37-dependent control of centrobin protein levels, and their impact on the formation of centriole precursors during the cell cycle. The manuscript is well presented, and the relevant scientific literature is cited correctly. However, I would prefer that a potential relationship between "cenpas", "tigers", and the well-described rootletin-containing striated fibres be discussed, if not controlled by additional experiments.

      We thank the reviewer for her/his appreciation of our work and support for publication.

      Field of expertise of this reviewer: centrosome, microtubules, mitosis, cell culture, light and electron microscopy, biochemistry.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, the authors investigated the roles of TRIM37 in the regulation of centriole numbers. It was previously observed that depletion of TRIM37 results in supernumerary centrioles and centriole-like structures (Balestra et al., Dev. Cell, 2013; Meitinger et al., 2016). Here, the authors characterized these centriolar protein assemblies (Cenpas). Cenpas were formed following an atypical de novo pathway and eventually trigger centriole assembly. They observed that Centrobin is frequently present in Cenpas from the early stage and other centriolar components are sequentially recruited. Furthermore, they established that Cenpas formation upon TRIM37 depletion requires PLK4 activity. TRIM37 depletion also activates PLK1-dependent centriole multiplication.

      1. They propose that the tiger structure acts as a platform for PLK4-dependent Cenpas assembly. Cenpas may evolve into centriole-like structures after a stepwise incorporation of other centriolar proteins. Fig. 6E suggests that a series of events occurs within G2 phase. Therefore, this reviewer suggests performing a detailed time-course experiment in G2 phase. According to the model, the Centrobin-positive tiger structures may appear first, and then a Centrobin- and centrin-2-double positive structure starts to appear.

      We fully agree with the reviewer that this is an important experiment, which we will perform by analyzing TRIM37 depleted cells at successive time points after release from a double thymidine block, using antibodies against Centrobin and Centrin.

      2. They claim that Mulibrey patient cells exhibited evidence of chromosome mis-segregation, as would be expected from multipolar spindle assembly, and conclude that Cenpas are present and active also in Mulibrey patient cells. Chromosome mis-segregation may be observed in normal cells, too. Therefore, they have to perform statistical analysis on Fig. 2D.

      In response to this suggestion and to the related comment of reviewer 3 (see below), we will conduct additional immunofluorescence analysis and quantification of patient and normal cells, assessing the distribution of Centrin, Centrobin, microtubules and γ-tubulin, as well as scoring the extent of chromosome mis-segregation.

      3. In Fig. 2A, they claimed that mitotic microtubules were disrupted by cold treatment for 30 min. In our experience, cold treatment for 30 min is not sufficient to disrupt mitotic microtubules. They may show a control panel before microtubule regrowth.

      We will show the control panel as requested.

      Reviewer #2 (Significance):

      The significance of this work resides in the identification and description of Cenpas as a novel centriole assembly pathway. The authors used cutting-edge microscopy techniques to visualize Cenpas. The manuscript raised more questions than answers. Nonetheless, it is worth publishing the manuscript after revision.

      We thank the reviewer for supporting publication after revision.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Balestra and colleagues investigate the function of Trim37 in centrosome biogenesis. Trim37 is a ubiquitin ligase that has previously been identified by the authors as a regulator of centriole duplication. Mutations in Trim37 cause a rare syndrome named Mulibrey that is responsible for a severe form of dwarfism. Here they show that depletion of Trim37 in human cells results in the assembly of structures that they name Cenpas. They follow the possibility that Trim37 localises to the centrosome, which might inhibit the assembly of these structures. Further they show that Trim37 depleted cells (or patient fibroblasts) undergo multipolar mitosis. Further analysis shows that what the authors defined as abnormal centriole structures are formed in Trim37 depleted cells. These structures recruit centrobin, a daughter centriole component, and this process requires the activity of PLK4 and PLK1. Major comments: This study characterizes Trim37 and its possible role in centriole biogenesis. Most conclusions are convincing, although some of the claims made by the authors might require more data to be corroborated.

      1) The major point to be taken into consideration in my opinion relates to the Cenpas structure. According to the beautiful cryo-EM data shown in Fig 3, I wonder why the authors describe these structures as centriole-like or centriole-related. I think these appear very different from centrioles, and this might even be quite interesting if these structures nucleate microtubules and can participate in mitotic spindle assembly.

      We have a different opinion on this point. Most of the “centriole-like” or “centriole-related” structures do resemble the organelle, in that they contain microtubule bundles and are of a related size (in addition to bearing centriolar markers). However, recognizing that the distinction between these two categories of structures is somewhat arbitrary, we will combine them under the more prudent term “centriole-related”, and further explain in the revised manuscript that they comprise a range of structures.

      The authors correlate these non-canonical centriole structures with possible microtubule nucleators that might be responsible for multipolar configurations like in Fig 2D. This correlation has to be established. In Figure 2D, the authors analyze configurations of mitotic cells in terms of centrosome number and characterize the frequency of extra foci. To me the foci they show are quite different in nature. Poles 1 and 3 have both centrin and g-tubulin (presumably centrioles), pole 2 has only a tiny amount of centrin and no g-tubulin, while pole 4 appears to contain both but less of each protein. So the question is: are they all nucleating microtubules and participating in spindle assembly? This is particularly important in light of what the authors then mention, which is the occurrence of chromosome mis-segregation in patient cells (this is not shown). Also they describe these extra poles, and then say that Cenpas are active in patient cells. But, active in which manner? By nucleating microtubules? First, in either siRNA cells or in patient cells the authors should analyze microtubules and show that all the extra poles (made of non-canonical centrioles) nucleate microtubules and participate in spindle assembly.

      In response to this suggestion and to the related comment of reviewer 2 (see above), we will conduct additional immunofluorescence analysis and quantification of patient and normal cells, assessing the distribution of Centrin, Centrobin, microtubules and γ-tubulin, as well as scoring the extent of chromosome mis-segregation.

      If they want to propose that this might be the cause of genome integrity loss in patients (as stated in the abstract and suggested a few times throughout the paper) they have to show that cells divide abnormally and generate aneuploidy progeny.

      See response just above.

      2) Another important point that is only partially addressed is the function of Trim37 in stabilizing centrobin. Does Trim37 ubiquitinate centrobin? While the western blot in Figure 4 shows an increase at 8 hrs in Trim37 RNAi, this is also the case for tubulin (Fig 4E). But the overall levels appear only slightly increased when compared to the levels at time point zero (Fig. 4F). I can see that in siRNA Ctrl Trim37 levels go down, but it is still present, so how do they explain the lack of Cenpas in this case? Is there a threshold that supports centriole duplication without any major defect, but accumulation of a certain level of centrobin then generates Cenpas? Can the authors generate Cenpas just by over-expressing centrobin directly?

      It appears from the comment of the reviewer that we were not sufficiently clear here. The experiment reported in Figure 4E and 4F is done in the presence of cycloheximide to analyze the half-life of Centrobin in control conditions and upon TRIM37 depletion. We will clarify the text in the revised manuscript to facilitate understanding.
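
      For readers less familiar with this design: in a cycloheximide chase, new protein synthesis is blocked, so the decline in band intensity over time reflects degradation alone. A minimal sketch of how a half-life could be read off such a time course is given here, assuming simple first-order decay. The function name, time points and intensities are hypothetical and are not taken from the manuscript.

```python
import math

def half_life(times_h, intensities):
    """Estimate a half-life from a least-squares fit of ln(intensity) vs. time.

    Assumes first-order decay, I(t) = I0 * exp(-k * t), so t_half = ln(2) / k.
    """
    logs = [math.log(v) for v in intensities]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying signal
    return math.log(2) / -slope

# Hypothetical normalized band intensities (illustrative only, not real data).
times = [0, 2, 4, 8]                   # hours after cycloheximide addition
series_a = [1.0, 0.5, 0.25, 0.0625]    # halves every 2 h
series_b = [1.0, 0.71, 0.5, 0.25]      # halves roughly every 4 h

print(half_life(times, series_a), half_life(times, series_b))
```

      Comparing half-lives in this way, rather than band intensities at a single time point, is what separates a change in protein stability from a change in steady-state level.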

      In Figure 2, they analyze configurations of mitotic cells in terms of centrosome number and characterize the frequency of extra foci. To me the foci they show are quite different in nature. Poles 1 and 3 have both centrin and g-tubulin (presumably centrioles), pole 2 has only a tiny amount of centrin and no g-tubulin, while pole 4 appears to contain both but less of each protein. So the question is: are they all nucleating microtubules and participating in spindle assembly? This is particularly important in light of what the authors then mention, which is the occurrence of chromosome mis-segregation in patient cells without showing it. Also they describe these extra poles, and then say that Cenpas are active in patient cells. But active in which manner? By nucleating microtubules? This has to be shown. Also analysis of mitosis should be included to back up a defect in chromosome segregation and also to identify which type of defect.

      The above section is a copy/paste mistake (as indicated also in a correspondence between Review Commons and the reviewer).

      So in conclusion, the link between Cenpas and multipolarity has to be better investigated in my opinion. This should not be time consuming and also not extremely costly. Authors should label spindle MTs in patient fibroblasts to show that indeed Cenpas are nucleating microtubules. Ideally Cenpas would be distinguished by centrobin labeling. In siRNA depleted cells maybe time lapse microscopy can be used to image mitosis and show a correlation between Cenpas and multipolarity?

      As mentioned above, we will conduct additional immunofluorescence analysis and quantification of patient and normal cells, assessing the distribution of Centrin, Centrobin, microtubules and γ-tubulin, as well as scoring the extent of chromosome mis-segregation.

      The data are presented without statistical analysis on the figures, only in the figure legends. This is really difficult for the reader. The number of experiments and cells analyzed should perhaps also be included in each figure.

      We had kept this information to the legends merely to have lean figures, but will consider moving it to the figure panels in the revised manuscript.

      Minor comments: Some pictures lack scale bars.

      Apologies. This will be fixed.

      The localization of GFP-Trim37: in Figure 1 the authors describe a different localization when it is fused to an NES. It is true that a dotty signal is seen in the NES panel (Figure 1D), but a nuclear signal is not seen for Trim37-GFP in any of the images provided. Shouldn't this be the case?

      There is some GFP-TRIM37 nuclear signal in the left panel of Figure 1D, although it is very weak. We will explore the possibility of providing an inset with adjusted brightness/contrast to emphasize this point.

      Fig 1C is missing a siCtrl.

      The control quantification will be included (no extra centrioles are present in this case).

      Why does Trim37-GFP not completely rescue the assembly of the extra foci?

      In general, there can be many reasons why rescue in such an experimental setting is not complete, including slightly different protein levels, distribution, or interaction with partner proteins. Such possibilities will be discussed explicitly in the revised manuscript.

      In Fig 6E, are the authors sure that in the condition of siTrim37 plus siCentrobin and Plk1 inhibition, cells are not stuck in S-phase? Might this explain why cells do not reach a permissive G2 phase in which Cenpas can be generated?

      Although Plk1 inhibition is not expected to block cells in S phase, we cannot rule out this possibility from the data currently available. Therefore, we plan to conduct FACS analysis in a repeat of this experiment to assess cell cycle status.

      The data is presented without statistical analysis on the figures. This can be found on figure legends, but it is better to include on the figures to facilitate the reader's job. The number of experiments and cells analyzed maybe should be also included in each Figure?

      As also mentioned above, we had kept this information to the legends merely to have lean figures, but will consider moving it to the figure panels in the revised manuscript.

      Reviewer #3 (Significance):

      Interesting findings and quite novel since a role for Trim 37 in centriole biogenesis has never been reported. Also quite interesting the possible link between multipolarity (needs better characterization) and Mulibrey syndrome.

      We thank the reviewer for recognizing the interest and novelty of our work.

    1. I was thinking about everything, and that includes general relativity theory, since actually this theory is rather complicated. It has many branches and there was a lot of material which had been worked out for many years. People have studied it, and quantum gravity is extremely complicated. I was just lucky that such beautiful things were at the surface so I could see them. You see, my mind is not very technical. I work best of all in those places where I can use my intuition.

      Lightman: That's very interesting. I'd like to start asking you questions about that. I've noticed from your technical papers and in your paper in Physics Today[6] and your lectures that you describe things intuitively, with pictures and so forth. I know there have been certain physicists in the past who have used images and visualization and pictures more than other physicists. I think Einstein used a lot of visual images. All of his Gedanken experiments were based on mental images rather than on writing out equations. Even here [at Harvard] we make a joke in the physics department that Weinberg is very technical and [Sheldon] Glashow is very intuitive. So there do seem to be different styles of doing physics. One question that I've been very interested in, and some psychologists are interested in too, is how physicists use mental pictures. Maybe not exactly pictures but, for example, the way we say in quantum mechanics that sometimes things act as particles and sometimes as waves. I guess we're attempting to make a connection to our daily experience with the world. How do you use images in your work? Do you find images useful or harmful?

      Linde: Typically, I just use them. Of course, I use mathematics, certainly.

      Lightman: Of course.

      Linde: But first we usually have a rough idea of how it could work and why, and what is the purpose. Without understanding the purpose of what we are doing, you may try many different ways and you just solve equations without understanding why it is necessary.
      • "METHOD"
    1. Author Response

      Reviewer 1

      In this article Farrell et al. leverage existing datasets which measure frailty longitudinally in mice and humans to model 'robustness' (the ability to resist damage) and 'resilience' (the ability to recover from damage), their dynamics across age, and their relative contributions to overall frailty and mortality. The concept of separating damage/robustness from recovery/resilience is valid and has many important applications including better assessment and prediction of effective intervention strategies. I also appreciate the authors' sophisticated attempts to effectively model longitudinal data, which is a challenge in the field. The use of human and mouse data is another strength of the study, and it is quite interesting to see overlapping trends between the two species.

      While I find the rationale sound and appreciate the approach taken at a high level, there are a few key considerations of the specific data used which are lacking. The authors conceptualize resilience based on studies which primarily use short time scales and dynamic objective measures (ex. complete blood cell counts in Pyrkov et al.) often in conjunction with an acute stress stimulus. For example, they heavily cite Ukraintseva et al. who define resilience as "the ability to quickly and completely recover after deviation from normal physiological state or damage caused by a stressor or an adverse health event."

      Resilience and robustness are typically studied at short time-scales, with small numbers of continuous health attributes. We study transitions of binary health attributes, which we call damage and repair, and which we suggest should be thought of as resilience and robustness. Our approach is well suited for studying large numbers of binary health attributes over long time-scales without acute external stimuli. How resilience and robustness in these limits (binary, large numbers, long times, intrinsic dynamics) compare with resilience and robustness as has been typically measured (continuous, short times, acute stimuli) is an interesting and important question that arises from our work.

      Given these definitions, the human data used seem to fit within this framework, but we should carefully consider the mouse data. The mouse frailty index is a very useful tool for efficiently measuring the organismal state in large cohorts. A tradeoff for quickly measuring a broad range of health domains is that the individual measurements are low resolution (categorical) and involve inherent subjectivity (which may be considered part of the measurement error). Some transitions in individual components are due to random measurement error and I believe this is especially likely with decreases (or 'resilience' transitions).

      The reason I think the resilience transitions are subject to high measurement error is that I am skeptical as to whether many of the deficits in the mouse index are reversible under normal physiologic conditions. For example, it is exceptionally unlikely for a palpable/visible tumor to resolve in an aged mouse over the time scales studied here, thus any reversal that was observed is very likely due to random measurement error. Other components whose reversibility I doubt are alopecia, loss of fur color, loss of whiskers, tumors, kyphosis, hearing loss, cataracts, corneal opacity, vision loss, rectal prolapse, and genital prolapse.

      In summary, I applaud the authors' efforts in generating complex models to better understand longitudinal aging data. This is an important area that needs further development. I appreciate their conceptualization of resilience and robustness and think this framework has an important place in aging research. I also appreciate their cross-species approach. However, the authors may have over-conceptualized and made some assumptions about the mouse data which may not be valid. It will be important to assess the results with careful consideration of the time scales of the underlying biology and the resolution and measurement error inherent to these tools.

      For each of our mouse attributes, there are published studies demonstrating reversibility (see our new Supplementary Table 1). Nevertheless, we cannot distinguish what causes the observed discrete transitions (measurement error, stochastic fluctuations in underlying organismal features, or logistic-like continuous transitions in underlying continuous variables). We analyze the discrete data as given.

      The question of time-scale is interesting. From survival curves of individual binarized attributes, we obtain reasonable fits to exponential models (i.e. a single timescale); see Fig 5 supplement 1 and 2. For the human data there is a broad range of timescales for both robustness and resilience. For the mouse data there appears to be a similarly broad range (note the logarithmic scale), though with considerable uncertainty. We work with the data we have, so we are unable to probe timescales shorter than the measurement interval (months for mice, and years for humans). We have reinforced this caveat in the discussion.
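
      To illustrate what a single-timescale exponential model means here, the sketch below estimates the timescale tau of S(t) = exp(-t / tau) from a set of follow-up durations, using the standard maximum-likelihood estimator for right-censored exponential data. This is not necessarily the fitting procedure used in the manuscript. The function names and numbers are hypothetical, and the sketch ignores the fact that transitions are only observed at discrete visits.

```python
import math

def exponential_timescale(durations, observed):
    """Maximum-likelihood timescale tau for the survival model S(t) = exp(-t / tau).

    durations: time each attribute state was followed.
    observed:  True if the transition was seen, False if follow-up ended
               first (right-censored).
    """
    events = sum(1 for o in observed if o)
    if events == 0:
        raise ValueError("no transitions observed; tau is not identifiable")
    # Total time at risk divided by the number of observed transitions.
    return sum(durations) / events

def survival(t, tau):
    """Probability that the state has not yet changed by time t."""
    return math.exp(-t / tau)

# Hypothetical follow-up times in months (illustrative only).
durations = [2.0, 4.0, 6.0, 8.0]
observed = [True, True, False, True]
tau = exponential_timescale(durations, observed)
print(tau, survival(tau, tau))
```

      Because transitions can only be detected between visits, any timescale much shorter than the measurement interval would be invisible to an estimate of this kind, which is the caveat stated above.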

      Reviewer 2

      This study uses repeated measurements of the frailty index (FI), composed of multiple binary parameters. It is posited that newly detected changes in the number of these parameters represent damage and that the parameters that have previously been detected but are not detected currently represent damage repair. Statistical treatment then follows, deriving resilience and robustness and their changes over time. This is an interesting idea. Strengths of the study include analyses across species (mice and humans), including multiple datasets in mice.

      To be clear, our data analysis is on the binary health attributes that are used in the FI. By considering the damage/repair (binary transitions) of individual attributes, we can obtain the aggregate damage/repair rates.
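
      The counting behind "damage" and "repair" can be sketched as follows. This is a deliberately crude tally of binary transitions between consecutive visits, assuming a fixed visit interval; it is not the joint model used in the manuscript, and the function name and data are hypothetical.

```python
def damage_repair_rates(visits_by_individual, dt=1.0):
    """Crude aggregate damage (0 -> 1) and repair (1 -> 0) rates.

    visits_by_individual: for each individual, a list of visits; each visit
    is a list of binary health attributes (0 = healthy, 1 = damaged).
    dt: time between consecutive visits (assumed constant here).
    """
    damaged = at_risk = repaired = repairable = 0
    for visits in visits_by_individual:
        for before, after in zip(visits, visits[1:]):
            for a, b in zip(before, after):
                if a == 0:
                    at_risk += 1            # attribute could be damaged
                    damaged += (b == 1)
                else:
                    repairable += 1         # attribute could be repaired
                    repaired += (b == 0)
    damage_rate = damaged / (at_risk * dt) if at_risk else float("nan")
    repair_rate = repaired / (repairable * dt) if repairable else float("nan")
    return damage_rate, repair_rate

# Hypothetical cohort: two individuals, three attributes each.
cohort = [
    [[0, 0, 1], [1, 0, 1], [1, 0, 0]],   # three visits
    [[0, 1, 1], [0, 1, 1]],              # two visits
]
print(damage_repair_rates(cohort))
```

      Pooling over attributes, individuals and visits in this way is what makes aggregate rates estimable even when any single attribute changes only rarely.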

      What are the elements of FI that increase at each period of life, and what are those that decrease? For example, humped phenotype or alopecia are more likely to appear in old mice and are essentially irreversible, whereas weight loss due to infection may be more common in young mice and is reversible. Therefore, the choice of health deficits would affect the model and, for example, may artificially lead to a decreased value of what the authors call damage repair.

      More generally, information on the frailty index lacks sufficient details. I doubt that this method has sufficient accuracy to draw conclusions from as few as 32 female mice (21 + 11 animals in datasets 1 and 2) and 63 males (13 + 6 + 44 animals in datasets 1, 2 and 3). Also, only 25 enalapril-treated mice of each sex were analyzed, and only 17 exercised mice (11 females and 6 males). The number of human participants is large, but the total follow-up period is not shown, and the subjects were assessed based on 23 parameters only.

      We have not examined other choices of health attributes. While we picked standard sets from available data, we do not know whether other attributes would behave differently. It would be difficult to do our detailed modelling on single attributes in the mouse data, since the data is so sparse. Our approach was developed specifically to be able to draw conclusions from limited mouse data. Where possible we aggregate the individual mice, sex, health attributes, studies, and measurement times. The analysis of human data shows that the approach generalizes.

      While we have mostly not studied individual attributes (we have considered survival times, but without age or time effects), we would expect that some of them may have behavior that qualitatively differs from our aggregate results. If attribute selection was biased towards (or away from) qualitatively distinct behaviors that would, of course, be reflected in aggregate results. We suspect that this would be unlikely, but that any such distinctive behavior would be interesting and important to identify and understand. We have added some discussion on this point, since we cannot exclude this possibility.

      A key assumption in this work is that increased FI is equivalent to the rise in damage. However, the relationship between changes in FI and damage is unknown. One can imagine a situation when damage increases, but protection also increases. In this case, fitness may increase, decrease or remain unchanged. What is the basis for calling an increased number of health deficits damage? Is there a more reliable method to measure damage that could support the authors' claims?

      See also discussion point #1 in essential revisions. We call binary states 0 “healthy” and 1 “damaged”, but we could instead say “more healthy for most individuals” and “less healthy for most individuals” – where “healthy” means associated with desirable (low FI and low mortality) health outcomes. We have not explored other measures of organismal damage. We have not explored how interactions between variables could affect resilience or robustness for individuals. We do not think that alternative approaches would be easy to study without much more data (for mice) that is more finely resolved in time (for mice and humans). We are quite happy to have found an approach to use with binarized data, but would welcome viable alternative approaches to compare with.

      Reviewer 3

      In this work, the authors aimed at investigating two related components of aging-related processes of health deficits accumulation in mice and humans: the processes of damage (representing the robustness of an organism) and repair (corresponding to resilience), and at determining how different interventions (the angiotensin-converting enzyme inhibitor enalapril and voluntary exercise) in mice and a representative measure of socio-economic status (household wealth) in humans affect the rates of damage and repair. Two key elements in this study allowed the authors to achieve their goals: 1) the use of relevant data containing repeated measurements of health deficits from which they were able to compute the cumulative indices of health deficits in mice and humans and which are also necessary to evaluate the processes of damage and repair; 2) the methodological approach that allowed them to formulate the concepts of damage and repair, model them and estimate from the available data. This methodological framework coupled with the data resulted in important findings about the contribution of the age-related decline in robustness and resilience in health deficits accumulation with age and the differential impact of interventions on the processes of damage and repair. This provides important insights into these key components of the process of aging and this research should be of interest to both lab researchers who plan experimental studies with laboratory animals to study potential mechanisms and interventions affecting health deficits accumulation as well as researchers working with human longitudinal studies who can apply this approach to further investigate the impact of different factors on robustness and resilience and their contribution to the overall health deterioration, onset of diseases and, eventually, death.

      The key strength of this work is a rigorous analytic approach that includes joint modeling of longitudinal measurements of health deficits and mortality (in mice). This approach avoids biased inference which would be observed if longitudinal data were analyzed alone, ignoring attrition due to mortality. Another strength is a comprehensive analysis of both laboratory animal data that allows exploring the impact of different interventions on the processes of damage and repair and human data that allows investigating disparities in these processes in individuals with different socioeconomic backgrounds (represented by household wealth).

      One weakness (which is commonplace for human studies) is self-reported data on health deficits in humans which makes it difficult to compare with lab data where deficits are assessed objectively by lab researchers. The subjective nature of health deficits measurements complicates the interpretation of findings, especially about repairs of deficits. In addition, it is not clear whether the availability/absence of caregivers at different exams/interviews factors into the answers on difficulty/not difficulty with specific activities constituting health deficits and, respectively, into their change over time reflected in damage/repair estimates.

      Variability of the evaluator is expected in any longitudinal study, and amounts to a variety of measurement error. The question of whether there are age-effects in the measurement error, such as bias or age-dependent variability is interesting. For the mouse data, evaluator training is designed to minimize such errors and inter-evaluator differences are not large (Feridooni et al, 2015; Kane et al, 2017). For the human self-report data any such age-effects are unavoidable.

    1. Author Response

      Reviewer 1

      Sadeh and Clopath analyze two mouse datasets from the Allen Brain Atlas and show that sensory representations can have apparent representational drift that is entirely due to behavioral modulation. The analysis serves as a caution against over-interpreting shifts in the neural code. The analysis of data is coupled with careful modeling work that shows that the behavioral state reliably shifts sensory representations independently of stimulus modulation (rather than acting as a gain factor), and further show that it is reproducibly shifted when the behavioral state is adequately controlled for. The methods presented point towards a more careful consideration and measurement of behavioral states during sensory recordings, and a re-analysis of previous findings. The findings held up for both standard drifting grating stimuli as well as natural movies.

      The fact that neurons may have different tuning depending on the behavioral state of the animal raises obvious questions about readout. The authors show that neurons with strong behavioral shifts should simply be ignored and that this can be achieved if the downstream decoder weights inputs with more stimulus information. While questions remain about why behavior shifts representations and how that could be more effectively utilized by downstream circuits, the results presented clearly show that sensory representations might not always be simply drifting over time, and will spark some careful analysis of past and future experimental results.

      Many thanks for a clear summary of the work and emphasizing the significance of the results.

      Reviewer 2

      Studies from recent years have shown that neuronal responses to the same stimuli or behavior can gradually change with time - a phenomenon known as representational drift. Other recent studies have shown that changes in behavior can also modulate neuronal responses to a given sensory stimulus. In this manuscript, Sadeh and Clopath analyzed publicly available data from the Allen Institute to examine the relationship between animal behavioral variability and changes in neuronal representations. The paper is timely and certainly has the potential to be of interest to neuroscientists working in different fields. However, there are currently several important issues with the analysis of the data and their interpretations that the authors should address. We believe that after these concerns are addressed, this study will be an important contribution to the field.

      We really appreciate the time and the effort the reviewer(s) have taken to evaluate our results and analysis in detail. Their comments are very relevant and critical to the improvement of the manuscript. We explain below how we addressed their various comments and concerns.

      1. The manuscript raises a potential problem: while previous work suggested that the passage of time leads to gradual changes in neuronal responses, the causality structure is different: i.e., the passage of time leads to gradual changes in behavior, which in turn lead to gradual changes in neuronal responses. The authors conclude that "variable behavioral signal might be misinterpreted as representational drift". While this may be true, in its current form, the paper lacks critical analyses that would support such a claim. It is possible that both factors - time and behavior - have a unique contribution to changes in neuronal responses, or that only time elicits changes in neuronal responses (and behavior is just correlated with time). Thus, the authors should demonstrate that these changes cannot be explained solely by the passage of time and elucidate the unique contributions of behavior (and elapsed time) to changes in representations.

      This is a very important point and we addressed it with new analyses, by dedicating a new figure (Figure 1–figure supplement 5) and a new part of the Results section to it. The results of our new analyses show that strong representational drift mainly exists in those animals/sessions with large behavioral changes between the two blocks, and that in animals/sessions with small behavioral changes, such drift is minimal, despite the passage of time (see our responses below to Major comments for further details).

      1. There are also several issues with the analysis of the data and the presentation of the results. The most concerning is that the data show a non-linear (and non-monotonic) relationship between behavioral changes and representational similarity. In many of the presented cases, the data points fall into two or more discrete clusters. This can lead to the false impression that there is a monotonic relationship between the two variables, even though there is no (or even the opposite) relationship within each cluster. This is a crucial point since the clusters of data points most likely represent different blocks that were separated in time (or a separation between within-block and across-block comparisons).

      This is an important concern. To address this, we analyzed the source of the non-monotonic relationship / opposite trend in the data and demonstrated the results in a new figure (Figure 4–figure supplement 2). Our results show that the non-monotonic relationship does not compromise the result of our previous analysis. Furthermore, it suggests that the non-monotonic / opposite trend is emerging as a result of more complex interactions between different aspects of behavior. We have also shown, in separate analyses, that the passage of time is not the main contributing factor to representational drift, rather large behavioral changes are correlated with strong drifts between the two blocks of presentation (Figure 1—figure supplement 5, and Figure 3—figure supplement 2).

      More generally, we did not intend to claim that the relationship with behavioral changes is linear and/or monotonic. We used linear analysis just to show the main trend of decrease in representational similarity with large behavioral changes. Any other analysis would have to assume some form of nonlinearity, but because the nonlinear relationships between behavior and activity were complex, it was not easy to choose such a form.

      We in fact tried to use two other ways of analysis, nonlinear correlations and generalized linear models (GLM), but there were issues hindering a proper use of each analysis. Nonlinear correlations assume a specific type of nonlinearity, but the nature of nonlinearity underlying the data is not clear (in fact, it looks to be different in different example non-monotonic trends in the data). We could not, therefore, assume a nonlinearity that best fitted all the data; we believe the nature of this nonlinearity, or how behavior modulates neuronal activity in a nonlinear manner, is in itself an interesting and open question for future investigation, but beyond the scope of this study. GLM did not provide useful results either, as the relationship between behavioral changes and neural activity/representational similarity was state-dependent and transitioning between nonlinear states, therefore hindering the usage of linear methods.

      We therefore opted for the simplest analysis which can show and quantify this dependence - emphasizing that further analyses are in fact needed to get to the bottom of the exact nonlinear relationship (for further details, see the responses below to Major comments).

      1. The authors also suggest that using measures of coding stability such as 'population-vector correlations' may be problematic for quantifying representational drift because it could be influenced by changes in the neuronal activity rates, which may be unrelated to the stimulus. We agree that it is important to carefully dissociate between the effects of behavior on changes in neuronal activity that are stimulus-dependent or independent, but we feel that the criticism raised by the authors ignores the findings of multiple previous papers, which (1) did not purely attribute the observed changes to the sensory component, and (2) did dissociate between stimulus-dependent changes (in the cells' tuning) and off-context/stimulus-independent changes (in the cells' activity rates).

      That’s a very valid point. As population vector correlations are used quite often in (experimental and theoretical) works on representational drift, we wanted to highlight the pitfalls of such a metric in dissociating between sensory-evoked and sensory-independent components. However, as the reviewers have mentioned, these two aspects have been separated and addressed independently in some of the past literature in the field. For instance, as we discussed in the Discussion, Deitch et al. (Current Biology, 2021) have calculated this for different metrics, including tuning curve correlations, which can potentially alleviate this problem:

      A recent analysis of similar datasets from the Allen Brain Observatory reported similar levels of representational drift within a day and over several days [5]. The study showed that tuning curve correlations between different repeats of the natural movies were much lower than population vector and ensemble rate correlations [5]; it would be interesting to see if, and to which extent, similarity of population vectors due to behavioural signal that we observed here may contribute to this difference.

      We tried to highlight these contributions better in the revised manuscript (see further on this below in our responses to Major comments).
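
      The distinction between the two metrics can be made concrete with a toy example. In the sketch below, each neuron has a fixed baseline rate that does not depend on the stimulus, while its stimulus preference changes completely between two repeats. All response values and variable names are hypothetical; the example only illustrates how a population vector correlation can stay high while tuning curve correlations are negative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean(values):
    return sum(values) / len(values)

# Rows are neurons, columns are stimuli (hypothetical responses).
# Baselines of 1, 5 and 10 are shared across repeats; the preferred
# stimulus of every neuron differs between the two repeats.
repeat1 = [[2, 1, 1], [5, 6, 5], [10, 10, 11]]
repeat2 = [[1, 2, 1], [5, 5, 6], [11, 10, 10]]

n_stim = len(repeat1[0])

# Population vector correlation: per stimulus, correlate across neurons.
pv_corr = mean([
    pearson([row[s] for row in repeat1], [row[s] for row in repeat2])
    for s in range(n_stim)
])

# Tuning curve correlation: per neuron, correlate across stimuli.
tuning_corr = mean([pearson(r1, r2) for r1, r2 in zip(repeat1, repeat2)])

print(pv_corr, tuning_corr)
```

      Here the population vector correlation is dominated by the stimulus-independent differences in rate between neurons, whereas the tuning curve correlation removes each neuron's mean rate and so reflects only the change in tuning.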

      1. Another important issue relates to the interchangeable use of the terms 'representational drift' and 'representational similarity'. Representational similarity is a measure to identify changes in representations, and drift is one such change. This may confuse the reader and lead to the misconception that all changes in neuronal responses are representational drift.

      We thank the reviewer(s) for raising this point. We have clarified our use of the terms representational similarity and representational drift in the revised manuscript. Specifically, in our new analysis we have quantified a representational drift index between the two blocks according to a previously used metric (RDI; Marks & Goard, 2021) (Figure 1–figure supplement 5).

      For the main part of the paper, however, we have decided to base our analysis on representational similarity (RS), and to evaluate the drop of RS with changes in behavior. Our reasoning for this is twofold. First, any measure of representational drift should ultimately be a function of the representational similarity. The measure we used above, for instance, is calculated as RDI = (RS_ws - RS_bs)/(RS_ws + RS_bs) (Marks & Goard, 2021), with RS_ws and RS_bs referring to the average representational similarity within a session and between different sessions, respectively. However, RS contains more information, especially with regard to fine-tuned changes; the above metric, for instance, averages all the changes within each block of presentation. By focusing on the basic function of representational similarity, we could capture both the gross changes between the blocks and the more nuanced changes that can arise within them, especially with regard to behavioral changes. Second, the usual metric of representational drift would have lost the direction of change. In our analysis, we in fact found that the average RS increased within the second block of presentation, which might be contrary to the usual direction of drift. We found this unconventional change of RS interesting and informative too, and presenting the raw RS allowed us to highlight it. For these reasons, we think representational similarity is a better metric to base our analyses upon, although we have now also calculated a conventional representational drift index for comparison.
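
      As a worked illustration of the index quoted above, the sketch below computes it from lists of within-session and between-session similarity values. The function name and input values are hypothetical; only the formula itself is taken from the text.

```python
def representational_drift_index(rs_within, rs_between):
    """Drift index (RS_ws - RS_bs) / (RS_ws + RS_bs).

    rs_within:  representational similarity values within a session.
    rs_between: representational similarity values between sessions.
    Both are averaged first, so any structure inside a block is lost.
    """
    rs_ws = sum(rs_within) / len(rs_within)
    rs_bs = sum(rs_between) / len(rs_between)
    if rs_ws + rs_bs == 0:
        raise ValueError("index is undefined when RS_ws + RS_bs is zero")
    return (rs_ws - rs_bs) / (rs_ws + rs_bs)

# Hypothetical similarity values (illustrative only).
rs_within = [0.62, 0.58]    # averages to 0.6
rs_between = [0.35, 0.25]   # averages to 0.3
rdi = representational_drift_index(rs_within, rs_between)
print(rdi)
```

      The index is zero when similarity within and between sessions is equal and grows as between-session similarity falls. Because both inputs are averaged, two datasets with very different patterns of change inside a block can yield the same value, which is the loss of information described above.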

      Reviewer 3

      Although it is increasingly realized that cortical neural representations are inherently unstable, the meaning of such "drift" can be difficult or impossible to interpret without knowing how the representations are being read out and used by the nervous system (i.e. how it contributes to what the experimental animal is actually doing now or in the future). Previous studies of representational drift have either ignored or explicitly rejected the contribution of what the animal is doing, mostly due to a lack of high-dimensional behavioural data. Here the authors use perhaps the most extensive and rigorous open-source neural data available to take a more detailed look at how behaviour affects cortical neural representations as they change over repeated presentations of the same visual stimuli.

      The authors apply a variety of analyses to the same two datasets, all of which convincingly point to behavioural measures having a large impact on changing neural representations. They also pit models against each other to address how behavioural and stimulus signals combine to influence representations, whether independently or through behaviour influencing the gain of stimuli. One analysis uses subsets of neurons to decode the stimulus, and the independent model correctly predicts the subset to use for better decoding. However, one caveat may be that the nervous system does not need to decode the stimulus from the cortex independently of behaviour; if necessary, this could be done elsewhere in the nervous system with a parallel stream of visual information.

      Overall the authors' claims are well-supported and this study should lead to a re-assessment of the concept of "representational drift". Nonetheless, a weakness of all analyses presented here is that they are all based on data in head-fixed mice that were passively viewing visual stimuli, such that it is unclear what relevance the behaviour has. Furthermore, the behavioural measurements available in the open-source dataset (pupil movements and running speed) are still a very low-dimensional representation of what the mice were actually doing (e.g. detailed kinematics of all body movements and autonomic outputs). Thus, although the authors here as well as other large-scale neural recording studies in the past decade or so make it clear that relatively basic measures of behaviour can dramatically affect cortical representations of the outside world, the extent to which any cortical coding might be considered purely sensory remains an important question. Moreover, it is possible that lower-dimensional signals are overly represented in visual areas, and that in other areas of the cortex (e.g. somatosensory for proprioception), the line between behaviour parameters and sensory processing is blurred.

      Many thanks for the clear and insightful summary of the results, significance and caveats of our analysis. We totally agree with this critical evaluation - and suggestions for future work.

    1. Author Response

      Reviewer 2

      In the manuscript, the cellular deformation that is due to the shear stress generated in a classical microfluidic channel is used to deform detached cells that are moving in the flow. A very elegant point of the paper is that the same cells are used in the provided software to determine the fluid flow, which is a key parameter of the method. This is particularly important, as an independent way to crosscheck the fluid flow with the expected values is important for the reliability of the method. Instead of complicated shape analysis that are required in other microfluidic methods, here the authors simply use the elongation of the cell and the orientation angle with respect to the fluid flow direction. The nice thing here is that a well-known theory from R. Roscoe can be successfully used to relate these quantities to the viscoelastic shear modulus. Thanks to the knowledge of the fluid flow profile, the mechanical properties can be related to the tank treading frequency of the cells, which in turn depends on the position in the channel, and the flow speed. Hence, after knowing the flow profile, which can be determined with a sufficiently fast camera, and the actual static cell shape, it is possible to obtain frequency dependent information. Assuming then that cells do have a statistically accessible mean viscoelastic property, the massive and quick data acquisition can be used to get the shear modulus over a large span of frequencies.

      The very impressive strength of the paper is that it opens the door for basically any, non-specialized cell biology lab to perform measurements of the viscoelastic properties of typically used cell types in solution. This allows to include global mechanical properties in any future analysis and I am convinced that this method can become a main tool for a rapid viscoelastic characterization of cell types and cell treatment.

      Although it is both elegant and versatile, there remain a couple of important questions open to be further studied before the method is as reliable as it is suggested by the authors. A main problem is that the model and the data simply don't really work together. This is most prominent in Figure 3a. This is explained by the authors as a result of non-linear stress stiffening. Surely this is a possible explanation, but the fact that the question is not fully answered in the paper makes the whole method seems not sufficiently backed. I agree that the test with the elastic beads are beautiful, but also here the results obtained with the microfluidic method and the AFM seem not to match sufficiently to simply use the proposed model in conjecture with a single power law approach to fully translate the single frequency data into a frequency dependent plot. There are more and more hints that two power law models are more reasonable to describe cell mechanics. If true this would abolish the approach to exploit only a single image to get the mechanical power law exponent and the prefactor in a single image. Despite all the excitement about the method, I have the feeling that the used models are stretched to their extreme, and the fact that the only real crosscheck (figure 3a) does not work for the power law exponent undermines this impression.

      We had assumed that the probing frequency equals the tank treading frequency. This is incorrect. As the cell undergoes a full rotation, any given volume element inside the cell is compressed twice and elongated twice. Hence, the frequency with which the cell is probed is twice the tank-treading frequency. This correction shifts the G’ and G” versus frequency curves to the right (by a factor of two), and in addition, the G” data points are shifted (increased) by a factor of two (Eq. 17). This also increases the fluidity alpha (and hence the slope of the power-law relationship) roughly by a factor of two (Eq. 22), and since the actual slope of the G’ and G” versus frequency data “cloud” is unchanged by the correction, the single power-law description now describes the data much better (see new Fig. 3a).

      Regarding the critique that models are stretched to their extreme: The Roscoe model assumes that cells behave as the visco-elastic continuum-mechanics equivalent of a Kelvin-Voigt body consisting of an elastic spring in parallel with a resistive (or viscous) dash-pot element. This then gives rise to a complex shear modulus with storage modulus G’ and loss modulus G”, measured at twice the tank-treading frequency 𝜔. Roscoe makes no assumptions whatsoever about how G’ and G” might change as a function of frequency. Hence, our “raw” G’ and G” data, e.g. in Fig. 3a, are obtained without any power law assumption.

      One could leave it at that, as the reviewer suggests below, and only present the raw G’ and G” vs. frequency plots. However, this would also make it nearly impossible to compare our measurements to those obtained with other techniques that operate at different, non-overlapping time- or frequency-scales. For such a comparison to work, one needs a model to predict how G’ and G” scale with frequency.

      A commonly used and very simple model to predict how G’ and G” scale with frequency, which is also the model used by Fregin et al. and many others, is that of a Kelvin-Voigt body consisting of an elastic spring in parallel with a resistive element (dash-pot), both with a frequency-independent stiffness and resistance (viscosity), respectively. However, our data show that G’ and G” of different cells, all measured at different tank-treading frequencies, exhibit a behavior that is very unlike that of a simple Kelvin-Voigt body with a constant, frequency-independent stiffness and resistance. In this case, G’ would be flat (power law exponent zero), and G” would increase proportional with frequency (power law exponent of unity). This is clearly not what our data show.
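
      For reference, the Kelvin-Voigt prediction described above can be written out explicitly. The symbols G_0 (spring stiffness) and \eta (dash-pot viscosity) are introduced here for illustration only and are not taken from the manuscript.

```latex
G^{*}(\omega) = G_0 + i\,\omega\,\eta
\quad\Longrightarrow\quad
G'(\omega) = G_0 \;\propto\; \omega^{0},
\qquad
G''(\omega) = \eta\,\omega \;\propto\; \omega^{1}
```

      On a log-log plot this gives a flat G’ and a G” with slope one, which is the pattern the measured data do not follow.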

      Rather, we find that G’ and G” increase with increasing frequency according to a power law, with the same exponent 𝛼 for G’ and G”. At high frequencies (beyond the range of our microfluidic method, but in the range of our AFM measurements), G” increases more strongly with frequency, akin to a Newtonian viscosity (power law exponent of unity), which we take into account in the case of the AFM measurements. A large number of publications have shown that many types of cells, including cells in suspension, follow power law rheology, regardless of the measurement method. Also the AFM measurements that we include in this study support the validity of power-law rheology.

      Power law rheology predicts a peculiar behavior: The ratio of G”/G’ in the low-frequency regime (where the high-frequency viscous term is not yet dominating) must be equal to tan(𝛼𝜋/2), for mathematical reasons (Eq. 22). With our correction (that the probing frequency is twice the tank-treading frequency), we find that Eq. 22 correctly predicts the power-law exponent of the G’ and G” vs. frequency data.
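
      The relation stated above can be inverted to obtain the exponent from a single pair of moduli. The following is a minimal sketch under the assumption that only the ratio G”/G’ = tan(𝛼𝜋/2) is used and that the high-frequency Newtonian term is negligible; the function names and numerical values are hypothetical, and this is not the authors' software.

```python
import math

def probing_frequency(tank_treading_frequency):
    """Each volume element is compressed twice and elongated twice per
    rotation, so the cell is probed at twice the tank-treading frequency."""
    return 2.0 * tank_treading_frequency

def fluidity_from_moduli(g_storage, g_loss):
    """Invert G''/G' = tan(alpha * pi / 2) for the power-law exponent alpha."""
    return (2.0 / math.pi) * math.atan2(g_loss, g_storage)

# Hypothetical moduli (arbitrary units).
alpha_elastic = fluidity_from_moduli(100.0, 0.0)    # no loss modulus
alpha_equal = fluidity_from_moduli(100.0, 100.0)    # loss equals storage
f_probe = probing_frequency(3.0)                    # tank treading at 3 Hz
```

      A purely elastic response gives an exponent of zero, and equal storage and loss moduli give an exponent of one half, consistent with the interpretation of 𝛼 as a fluidity.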

      Note that we actually do not fit a power law model (Eq. 1) to the population data of G’ and G” vs. frequency in Fig. 3a. The G’ and G” data are obtained by applying Roscoe-theory, without any further assumptions such as power-law rheology. Only the lines shown in Fig. 3a that go nicely through the data are a prediction of how a typical cell (selected from the mode of the joint probability density of alpha and k, see Fig. 3b) would behave if we had measured it at different frequencies, under the assumption that this cell follows power law rheology, based on Eq. 22. With this assumption, we can directly convert the measured G’ and G” of any cell into a stiffness k and power law exponent 𝛼 using Eqs. 21 and 22 - no fit is needed here.

      Since we only measure two parameters for any given cell at twice its tank-treading frequency, namely strain and alignment angle, we can only extract two parameters for each cell (i.e., G’ and G”, or k and alpha) but not a third parameter. In essence, the reviewer expresses concerns that the G' and G" behavior of a typical cell, when extrapolated to higher or lower frequencies, may not necessarily match the frequency behavior of the entire cell population (Fig. 3a). However, our data show that a single (typical) cell that was measured at a single mid-range frequency comes remarkably close to describing the G’ and G” versus frequency behavior of all other cells.

      The reviewer suggests that a power law model with two exponents may be able to even more accurately describe the mechanics of the cell population. This is certainly correct, and in particular when cell mechanics is measured over a larger range of frequencies or strain rates, as we have done here using AFM, we find that at higher frequencies, G” deviates from a weak power law and merges into a different power law with a larger slope (i.e., power law exponent) that approaches unity or a value close to unity, akin to a Newtonian viscous term. Therefore, the single power law expression (Eq. 1) is not sufficient for the AFM data, and we use Eq. 2 instead. However, in the case of our shear stress cytometry measurements, the tank-treading frequency remains below the range where this second power law behavior becomes prominent. Therefore, the Newtonian viscosity term of Eq. 2 cannot be fitted with reasonable fidelity to the data from a single measurement.

      In the case of polyacrylamide beads, we start to see a hint of an upward trend in G” versus frequency at tank-treading frequencies of around 10 Hz, and therefore have performed a global fit with Eq. 2 to the shear flow data where we keep the Newtonian viscous term constant for all conditions (different shear stresses and bead stiffnesses).

      The reviewer furthermore cautioned that mechanical non-linearities such as strain stiffening may distort or otherwise bias the results. As the reviewer brings up this issue in more detail below, we have addressed it there.

      Regarding the concern that “results obtained with the microfluidic method and the AFM seem not to match sufficiently to simply use the proposed model in conjecture with a single power law approach to fully translate the single frequency data into a frequency dependent plot.”:

      First, we tend to agree more with the opinion of Reviewer #1, who found it remarkable that results obtained with the microfluidic method and the AFM method are actually fairly similar. Now that we have introduced the correction that the probing frequency is twice the tank-treading frequency, the cells in suspension turn out to be softer and more fluid-like compared to the cells measured with AFM. But there are many more commonalities between the AFM data and the shear flow data, which we list above in our reply to Reviewer #1; the most relevant here is that cells show power-law behavior both when measured with AFM and with our new method.

      Second, we did not use a single power law to fit the AFM data. Rather, we used Eq. 2, which contains two power law relationships (the second power law exponent of unity for the Newtonian viscosity term is usually not explicitly written). However, the Newtonian viscosity term arises mainly from the hydrodynamic drag of the cantilever with the surrounding liquid, and less so from the cells. This hydrodynamic drag is absent in our shear flow deformation cytometry method, and moreover the tank-treading frequency of most cells remains far below 10 Hz, where an additional Newtonian viscosity term does not yet come into play.

      Third, we disagree that Fig. 3a is “the only real crosscheck for the power law exponent”. The inverse relation that we see between the power law exponent and the stiffness of individual cells (Fig. 3b) has been previously reported for different cell types and methods. Moreover, we find a power law exponent close to zero for PAA beads at small strain values, which is to be expected for a predominantly elastic material such as PAA. We think that this last result is a particularly convincing experimental cross-check.

  3. Jul 2022
    1. Author Response

      Reviewer #1 (Public Review):

      My primary criticism of this paper is that it misses the opportunity to give some key details about the statistics of neural activity during 'ripples' rather than studying identified replay events. A secondary criticism is that they limit their analyses to neurons that have place fields in both environments. I think the activity of the other 3 categories of neurons (active in Track 1 only, active in Track 2 only, and not active in either track) are also of critical interest.

      We agree with the reviewer that it is important to demonstrate that the main observations are not due to a small subset of neurons or replay events. We have described above the inclusion of Figure 1-figure supplement 6, where the threshold for replay detection is made less stringent, and the ratio of significant replay events to candidate replay events is now reported in the manuscript. To address the concern that the analysis is limited to neurons with place fields on both tracks, we have added four more subpanels to Figure 1-figure supplement 6, where we perform our regression analysis on all spatially tuned (pyramidal) neurons (Figure 1-figure supplement 6E), neurons with place fields on only one track (track 1 and track 2 neurons will be in the upper right and lower left quadrants of the plot, respectively; Figure 1-figure supplement 6F), neurons with peak amplitude <1 Hz on each track (Figure 1-figure supplement 6G) and, finally, interneurons (Figure 1-figure supplement 6H). Consistent with our previous findings, we observe significant regressions for POST replay events for all spatially tuned neurons and for neurons with place fields on only one track. Conversely, neurons that were not active on either track and interneurons are not rate modulated by experience during replay.

      It is important to note that replay detection uses all spatially tuned cells, but the regression analysis is limited to cells active on both tracks in the main analysis. The reason for this is now explained in more detail in the revised manuscript (page 5):

      “It is important to note that a significant regression would be expected when analyzing neurons with a place field only on one track, as they are expected to participate in replay events of this track, while being silent during the replay of the other track. As such, our regression analysis only analyzed place cells active on both tracks and stable across the whole run (Figure 1-figure supplement 1B and see Methods).”

      Reviewer #2 (Public Review):

      This study by Tirole et al. addresses to what extent differences in firing rate that occurs during the awake experience of two different tracks are replayed during SWRs.

      In principle, this is a topic broadly relevant to our understanding of the circuit-level mechanisms and neural coding of memory, because it can provide insight into the ways in which experience is transformed into memory traces, and in particular, whether an entire coding modality (firing rate patterns) is available for replay. However, I didn't have an easy time situating this study in the context of the existing literature. When I first read the title, I expected this work was going to address the question of if there is replay of rate-remapped experiences, which is still an understudied topic (but see Takahashi, 2015) and would be important to examine. But once I realized that the two experiences here are actually more like global remapping, it was less clear to me what is novel here.

      My best guess about what's novel is that even though on the one hand, many studies have shown a distinguishable replay of two (or more) distinct experiences, e.g. different mazes like in Karlsson et al. 2009, different arms of a T-maze in Gupta et al. 2010, the overlapping central stem element of different trajectories in various mazes (Takahashi, 2015 and work from the Jadhav lab). On the other hand, there have been extremely detailed examinations of the contributions of firing rate changes (as distinct from temporal order or synchrony) as in Farooq et al. 2019. But perhaps the authors think that the intersection of those two kinds of work has not been studied, that is, how much do firing rate changes specifically contribute to the replay of two distinct experiences? In any case, regardless of whether I understood that correctly or not, the authors need to be more explicit in the introduction and discussion in contextualizing their work. I also suspect that the current findings are a direct logical consequence of putting together these well-established previous results; this would not mean the current work isn't a useful advance, but it would moderate the novelty and general interest.

      Beyond this overall question of how the work relates to the extant literature, I have a suggested modification to the data analysis. I think that the quality of the data and the care taken in the analyses were very high in general, so I do not have any major concerns, and the conclusions are very thoroughly supported. However, I wonder if there is a way to simplify some of the analyses and make them a bit more straightforward to interpret. As the authors have realized, there is potential for a circularity in the analysis, in the sense that to compare firing rate differences for two tracks between Track and Replay, Replay events first need to be assigned to one or the other (decoded) Track. But then any firing rate differences may be contributing to the output of the decoder, rendering the analysis circular. I understand the authors use various methods like the firing-rate-insensitive method in Figure 2 to deal with this crucial issue. But wouldn't a simpler way be to leave out the cell whose firing rates are being analyzed out of the decoding step so that the labeling of Replay events is independent of that cell? This seems an intuitive and rigorous way to address the central question the authors have. Is there some reason why that isn't done?

      We thank the reviewer for this feedback, and agree it is important to emphasize the novel contributions of the manuscript (as we see it), and clarify this further if needed. The reviewer is correct that there are several studies that have looked at rate remapping during reactivation. We have cited some of these, but have now updated our citations in the introduction and discussion based on the comments here. While we avoided directly criticizing any particular study in our earlier draft of the manuscript, these previous studies are generally affected by several issues: 1) Replay detection methods were sensitive to rate modulation, creating a circular argument for the existence of rate modulation in replay. [Our study thoroughly addresses this with several controls.] 2) Reactivations rather than replay were analyzed, which lacks the statistical rigor of sequence detection. [We have focused on replay using a strict threshold for significance.] 3) Replay/reactivations were analyzed for a single environment, making it difficult to distinguish between rate modulation and changes in the overall excitability levels of neurons maintained over behavior and sleep. [Our study uses two tracks to avoid this potential issue.] 4) When multiple contexts were decoded, neurons that only fired in one context were not removed from the analysis, artificially “inflating” any observed rate modulation. [We have circumvented this issue by only analyzing neurons with place fields in both environments.]

      The suggestion to repeat the analysis and leave one neuron out for replay detection is excellent; however, this was avoided due to the required processing time: running our complete analysis takes more than a week, and repeating this for each possible “leave-one-out” combination would take significantly longer (this has to be done independently for each neuron). We used multiple controls (track rate shuffle, replay rate shuffle, rank-order correlation; Figure 2, Figure 2-figure supplement 2) to eliminate any possibility that a neuron’s firing rate could influence replay detection. Specifically, for rank-order correlation based replay detection, each burst of spikes is only treated as a single event (median of spike times in the burst), which directly circumvents the problem of firing rate biasing replay event selection.
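
      To illustrate why collapsing each burst to its median spike time removes the influence of firing rate, the following is a minimal sketch of a rank-order score. It assumes that all spikes of a cell within one candidate event form a single burst and it omits the shuffle-based significance testing; the function names, cell labels and spike times are hypothetical and this is not the authors' pipeline.

```python
from statistics import median

def _average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable has no spread
    return cov / (vx * vy) ** 0.5

def rank_order_score(spikes_by_cell, field_position_by_cell):
    """Spearman correlation between each cell's median spike time in the
    event and the position of its place field on the track."""
    cells = [c for c, spikes in spikes_by_cell.items() if spikes]
    event_times = [median(spikes_by_cell[c]) for c in cells]
    positions = [field_position_by_cell[c] for c in cells]
    return _pearson(_average_ranks(event_times), _average_ranks(positions))

# Hypothetical place-field peaks (cm) and spike times (s) in one candidate event.
fields = {"a": 10.0, "b": 50.0, "c": 90.0}
low_rate = {"a": [0.012], "b": [0.020], "c": [0.031]}
high_rate = {"a": [0.010, 0.011, 0.012, 0.013, 0.014], "b": [0.020], "c": [0.030, 0.032]}

score_low_rate = rank_order_score(low_rate, fields)
score_high_rate = rank_order_score(high_rate, fields)
```

      In this example cell "a" fires one spike in the first event and five in the second, yet both events yield the same score, because only the order of the median times enters the correlation.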

    1. Author Response

      Reviewer 1

      In general, I consider that the manuscript reflects a huge effort in terms of work done and data collection, the manuscript is very well written, and it brings new knowledge in terms of cooperative breeding and its connection with group size in ostriches. My major concerns are about the title and introduction, which are in my opinion too broad and insufficiently detailed.

      In the introduction the scientific background that led to this research is lacking, and the manuscript would benefit from a more supported introduction, which makes it difficult to understand how far this study went comparatively to previous studies. The research work was well conducted, and adjusted to the study aims. However, it would benefit from including more details on the observational data collected by the authors.

      I think the research topic is interesting, and the study was well performed, but the manuscript would benefit from a more clear approach to the working hypothesis, expected results and background theories/hypotheses.

      We are very grateful for the positive and constructive feedback. The title and introduction have been revised according to the reviewer’s suggestions. We provide a more extensive introduction to the hypotheses being tested, which are now explicitly stated. The observational data we collected have been described in more detail and we integrate our observational and experimental data more thoroughly.

      In the evaluation summary, the reviewer highlights that we did not address some aspects of groups, such as relatedness and parentage. We have now added additional analyses to show these do not change the conclusions of our study (for details please see responses to reviewer 2 who raises similar concerns more extensively). These were not originally included in the manuscript as the aim of our study was to examine how group size and composition influence the average reproductive success for any given individual, irrespective of variation in relatedness and parentage within groups.

      Reviewer 2

      This work sets out to investigate experimentally the effect of differences in group size and group composition on reproductive behavior and success in ostrich groups. Direct field observations of the relationship between group composition/group size and reproductive success, do not allow for causal inference, as there may be several reasons why patterns may arise. For example, observing individuals having a higher reproductive success in larger groups than in smaller groups may not be a direct result of a larger group size per se, but it may be that higher quality individuals manage to establish themselves more often in larger groups. Hence, experimental manipulation of group size and group condition in natural contexts is important. 96 experimental groups of ostriches were established in fenced off areas in the Karoo in South Africa, varying the number of males (1 / 3) and the number of females (1 / 3 / 4 / 6) across groups. Groups were followed for almost a year, studying a period without parental care (eggs were removed and incubated in an incubator to measure reproductive success) and a period with parental care (eggs were left in the enclosures).

      In the latter case, behavioral observations were done to study nest incubation, and sexual conflict (interruptions of incubation). The study was done for seven years, and having such data on experimental manipulations in semi-wild conditions is very valuable. The combination of behavioral analysis, with careful tracking of the fate of eggs (by daily nest checks), the experimental nature, and measuring reproductive success make for a very complete analysis of the breeding ecology of this system and can serve as a blueprint for more of such work in the fields of cooperation, group living and breeding ecology.

      Some aspects, however, deserve more attention. First, at present, the origin and familiarity and possible relatedness among the group members of the experimentally composed groups is not discussed, and it may be that these factors play a role in shaping the results. Second, the reproductive measure used was the average number of chicks per sex, but it was not calculated at the individual level. No genetic analyses were done to establish which individuals were actually successful in terms of reproduction. Since individual level selection is likely very important in this system, the results of average reproductive success need to be interpreted with great care. Third, the study was done under semi-natural conditions, meaning that the effects of other factors possibly shaping the success of group size and group composition in the wild (e.g., possible nest predation) were weakened. Finally, a closer connection between the experimental results on optimal group size, and whether this can actually be found in the dataset on natural variation in group size and group composition can be explored.

      We are very grateful for the careful review of our work and positive feedback. The suggestions and comments have been extremely helpful in revising the manuscript, which have led to the following changes:

      1) We have added details about the origin and familiarity of group members, together with extra analyses verifying that our results are not confounded by variation in within-group relatedness. The study population has a nine-generation pedigree allowing us to accurately estimate relatedness between individuals. In the design phase of the experiment, relatedness amongst individuals was kept low in accordance with data from natural populations, but there were related individuals of the same sex in some groups. We tested if the average relatedness within groups influenced the average number of chicks individuals produced and found no significant relationship (Supplementary file 1 – Tables S16 and S17).

      2) We have included genotyping analyses of 3227 offspring to verify that our non-genetic estimates of average reproductive success per sex (total chicks produced by groups / number of same sex individuals) accurately reflect measures obtained using genetic estimates of individual reproductive success. Genetic and non-genetic measures were highly correlated (R > 0.95). We have added these verification analyses to the manuscript. The text has also been edited to further clarify that our aim is to estimate the average reproductive benefits for any given individual of being in a group of a particular size, rather than examining differences in reproductive success between individuals within groups, for which genetic methods are required.

      3) We have clarified the advantages and limitations of experimental studies. As reviewer 2 highlights, observational studies alone do not provide causal insight into the factors influencing group size, but as reviewer 1 indicates, experimental studies can lack ecological context. Consequently, both have their merits. Experimental manipulations of entire social groups are currently lacking on large vertebrate cooperative breeders, but can be used to estimate the costs and benefits of living in different group sizes that arise independently of ecological conditions. The results of such experimental studies can be used as a benchmark against which other data can be compared, such as observational data on wild groups subject to ecological pressures, including nest predation. The discrepancies between experimental and observational data can then be used to infer the relative importance of social versus ecological factors in shaping social groups.

      4) We have added a figure (Figure 1 - figure supplement 1) and extended the discussion to better connect our experimental data with our observations of natural variation in group size.
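
      The non-genetic measure described in point 2 is a simple ratio, sketched below. It assumes that every chick produced by a group has one father and one mother within that group; the function name and the counts are hypothetical.

```python
def average_success_per_sex(total_chicks, n_males, n_females):
    """Average chicks per individual of each sex:
    total chicks produced by the group / number of same-sex individuals."""
    return {
        "male": total_chicks / n_males,
        "female": total_chicks / n_females,
    }

# Hypothetical group of 3 males and 6 females that produced 24 chicks.
result = average_success_per_sex(total_chicks=24, n_males=3, n_females=6)
```

      This yields a per-sex average for the group and says nothing about how chicks are distributed among individuals, which is why the genetic data were needed for verification.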

    1. Author Response

      Reviewer 1

      This manuscript attempts to explain the well-known difference in DNA mutation rates between father vs. mother (paternal mutation is 4 times higher than maternal mutation in humans). Although the mutation rate difference was believed to arrive from the number of cell divisions (male germ cells undergo many more divisions compared to female germ cells), recent studies suggested that most mutations arise from DNA damage (which will be proportional to the absolute time) rather than DNA replication-induced mutations (which will be proportional to the number of cell divisions). The authors thus revisited the question as to why the paternal mutation rate is higher (if absolute time is more important than the number of cell divisions in causing mutations). They used 'taxonomic approaches' comparing paternal/maternal mutation rates of mammals, birds, and reptiles, correlating them to specifics of reproductive mode in these species. To measure paternal vs. maternal mutation rate, they compared the mutation rates of neutrally evolving DNA sequences between the X chromosome vs. autosomes, as well as the Z chromosome (utilizing the fact that the X chromosome will spend twice more generations in females than males, while autosomes spend equal time. Likewise, the Z chromosome will spend twice more time in males than in females, while autosomes spend equal time).
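
      The chromosome-weighting argument summarized above can be made concrete with a short sketch. It assumes the textbook weighting in which an X chromosome spends two thirds of its time in females, a Z chromosome two thirds of its time in males, and an autosome half of its time in each sex, with α the ratio of the male to the female mutation rate. The function names are hypothetical, and the authors' actual estimation procedure may include corrections not shown here.

```python
def x_to_autosome_ratio(alpha):
    """Expected X/A ratio of neutral substitution rates for a given alpha."""
    return 2.0 * (2.0 + alpha) / (3.0 * (1.0 + alpha))

def z_to_autosome_ratio(alpha):
    """Expected Z/A ratio of neutral substitution rates for a given alpha."""
    return 2.0 * (1.0 + 2.0 * alpha) / (3.0 * (1.0 + alpha))

def alpha_from_x_to_autosome(ratio):
    """Solve X/A = 2(2 + a) / (3(1 + a)) for a."""
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)

def alpha_from_z_to_autosome(ratio):
    """Solve Z/A = 2(1 + 2a) / (3(1 + a)) for a."""
    return (3.0 * ratio - 2.0) / (4.0 - 3.0 * ratio)

# Round trip for a fourfold paternal bias.
xa = x_to_autosome_ratio(4.0)
za = z_to_autosome_ratio(4.0)
alpha_x = alpha_from_x_to_autosome(xa)
alpha_z = alpha_from_z_to_autosome(za)
```

      With no sex bias (α = 1) both ratios equal one; a paternal bias pushes X/A below one and Z/A above one, which is what allows α to be inferred from either comparison.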

      They first confirm the paternal bias across a broad range of species (amniotes), eliminating many species-specific parameters (longevity, sex chromosome karyotype (XY vs. ZW), etc.) as contributors to the paternal bias. This implies that something common to males across these species causes paternal bias. They show that in mammals, the paternal bias correlates with generation time. They propose that the total number of mutations is determined by the combination of the mutation rate during early embryogenesis (when both males and females have the same mutation rate) and the later mutation rate, when the two sexes exhibit different mutation rates. This model seems to explain why generation time correlates well with the extent of paternal bias in mammals. However, this does not explain at all why birds do not exhibit any correlation with generation time. The speculation on this feels rather weak (although there is nothing they can do about this. Fact is fact).

      The logic behind their analysis is well laid out and seems mostly sound. Their finding is of broad interest in the field.

      • I am confused by this statement (the last sentence in the results section): "If indeed the developmental window when both sexes have a similar mutation rate is short in birds then, under our model, generation times are expected to have little to no influence on α." Based on their model, if the early period when the mutation rates are similar between the sexes is gone, intuitively it feels that generation time influences α even more. Am I missing something? (If the period with the same mutation rate is gone, then females and males are mutating at different rates the whole time.)

      We apologize for the lack of clarity, as we should have made clear that here we are assuming a fixed ratio of paternal to maternal generation times. Under that assumption, if female and male germ cells are accumulating mutations at a fixed rate over time, then for each sex, the number of mutations accumulated with time is a line that goes through the origin, and the ratio of the paternal-to-maternal slopes (α) will be constant regardless of the age of reproduction. In other words, if Me=0 in equation 1, then α would be constant for any fixed ratio Gm/Gf. We have revised this sentence to be clearer; lines 334-338 now read:

      If indeed the mutation rate in the two bird sexes differs from very early on in development (i.e., if term Me ≈ 0 in equation 1), then assuming a fixed ratio of paternal-to-maternal generation times, our model predicts the sex-averaged age of reproduction will have little to no influence on α.
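
      The argument in this response can be checked numerically with a small sketch. Equation 1 is not reproduced in this response, so the form used here, α = (Me + μm·Gm) / (Me + μf·Gf), is an assumption chosen only to match the verbal description (a shared early term Me plus sex-specific rates that accumulate linearly with time). The parameter names and numeric values are illustrative, not estimates from the manuscript.

```python
# Minimal sketch of a two-phase model, assuming (not quoting) the form
#   alpha = (Me + mu_m * Gm) / (Me + mu_f * Gf)
# Me    : mutations accrued in the early phase, equal in both sexes
# mu_m  : per-year paternal rate after the early phase
# mu_f  : per-year maternal rate after the early phase
# ratio : fixed ratio Gm / Gf of paternal to maternal generation times

def alpha(sex_averaged_g: float, mu_m: float, mu_f: float,
          me: float, ratio: float = 1.0) -> float:
    # Split the sex-averaged generation time G = (Gm + Gf) / 2 using Gm / Gf = ratio.
    g_f = 2.0 * sex_averaged_g / (1.0 + ratio)
    g_m = ratio * g_f
    return (me + mu_m * g_m) / (me + mu_f * g_f)

if __name__ == "__main__":
    for g in (5.0, 15.0, 30.0):
        # With Me = 0 the ratio stays at mu_m / mu_f regardless of G;
        # with Me > 0 it increases with G toward that limit.
        print(g, alpha(g, 2.0, 0.5, me=0.0), alpha(g, 2.0, 0.5, me=5.0))
```

      With Me = 0 both the numerator and the denominator scale with the generation time, so their ratio does not change; a nonzero Me breaks that proportionality, which is what produces the dependence on generation time.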

      • The authors state that this paper provides a simple explanation as to why paternal biases arise without relying on the number of cell divisions. However, it seems to me that the entire paper relies on the recent findings that mutation arises based on absolute time (instead of cell division number), and the novelty in this paper is the idea of 'two-phase mutation rates' to explain the observed numbers of paternal bias in various species. Yet it fails to explain the mutation rate difference in birds. There is not enough speculation or explanation as to what determines different mutation rates in males of various species. Although the modeling seems to be sound and there is nothing that can be done experimentally, I felt somewhat unsatisfied at the end of the manuscript.

      We agree with the reviewer that our paper does not address why the ratio of paternal-to-maternal mutation rates is lower in birds than mammals, and had stated so explicitly (lines 358-360): “Another question raised by our findings is why, after sexual differentiation of the germline, mutation appears to be more paternally-biased in mammals (∼4:1) than in birds and snakes (∼2:1).”

      To try to gain more insight into this question, we are now analyzing mutations in a set of three generation pedigrees from birds and reptiles, which should allow us to obtain a direct estimate of α and characterize sex differences in the mutation spectra, which we can then compare to what is seen in mammals. While this analysis is beyond the scope of this manuscript, we now note how this question might be pursued (lines 360-362):

      In that regard, it will be of interest to collect pedigree data from these taxa, with which to compare mutation signatures to those typically seen in mammals.

      Reviewer 2 The primary goal of this paper is to re-assess the cause for the excess of male over female germline mutations seen in many animals. By re-analyzing X (Z) and autosomal substitution rates across 42 species of mammals, birds, and snakes, and fitting a model that allows for a constant and equal-sex embryonic mutation rate, along with a mutation rate that increases with age, the authors show that there is no need to invoke the model that assumes mutation rate depends strictly on numbers of cell divisions.

      Strengths 1. The paper challenges a dogma in evolutionary genomics, which states that males have a higher germline mutation rate than females. It establishes convincingly that the count of premeiotic mitotic divisions is NOT the primary driver of the excess male mutations, but instead, it is the intrinsic mutation rate in males (balance of DNA damage vs DNA repair) that accumulates over time.

      2. The authors establish a simple model where the number of mutations that accumulate each generation depends on the embryonic mutation rate (which is shown empirically to not differ between the sexes) and a post-maturity mutation rate, which has elevated male mutation (driven presumably by a shift in the balance between DNA damage and DNA repair). The model is very clearly and intuitively described.

      3. The paper is extremely carefully thought-out, planned, and executed. Criteria for inclusion and exclusion of species in the phylogenetic work are clearly laid out. Similarly, decisions about filtering genomic regions (avoiding repeats, etc.) are well done and exhaustively documented. The standard of scholarship is very high - for example, the analysis of de novo mutation rates in mammals pulled in data from no fewer than 15 published studies.

      Weaknesses 1. The method of estimating alpha relies on the assumption that the mutation process (and rates) are the same in autosomes and sex chromosomes. There is an attempt to control for GC content and replication timing, but it is easy to imagine other factors at play, including the inactivation of one X in females, the extensive differences in chromatin modifications, especially of the X, that differ in males vs. females. The case of the cat X chromosome, with its 50 Mb of recombination cold spot and corresponding oddly slow substitution rate, might be just one example of features in other species that cause other perturbations in the substitution rate of the X. This does not seriously erode confidence in the results, but there is more potential for intrinsic mutation rates of sex chromosomes and autosomes to differ than is suggested by the authors.

      We agree with the reviewer that despite our attempts, we do not control for all factors that distinguish X and autosomes beyond exposure to sex. We had written that “while our pipeline may not account for all the differences between autosomes and X (Z) chromosomes unrelated to sex differences in mutation, the qualitative patterns are reliable.” and have now included a sentence to make this limitation clearer (lines 165-167):

      “Nonetheless, it is unlikely that our regression model perfectly accounts for all the genomic features that differ between sex chromosomes and autosomes other than exposure to sex.”

      In turn, the assumption that mutation rates in X (Z) and autosomes differ only with regard to their exposure to sex (after accounting for base composition and other genomic features) is unproven; we now state this assumption explicitly in the Methods (lines 678-681). Nonetheless, it seems warranted by the high concordance of evolutionary- and pedigree-based estimates of alpha in humans, mice and cattle. With regard to the specific factors mentioned by the reviewer, excluding CpG sites has little effect on our qualitative conclusions for mammals (see Fig S1E), suggesting that DNA methylation differences between X and autosomes are not having a major influence on our findings. Moreover, X-inactivation in the germline of mammals (as distinct from the soma) is likely quite short-lived, given that it lasts around three days in early development of mice (Chuva de Sousa Lopes et al. 2008) and at most four weeks in humans (Guo et al. 2015). Thus, it is unlikely to be an important mutation rate modifier. We have now reworked three paragraphs in the main text to make the limitations above clearer (lines 127-175).

      2. The authors point out that the human mutations in spermatogonia are due to mutation signatures SBS5/40 (which are known not to be correlated with cell division rates). The work on the nonhuman species could be greatly extended with this mutation spectrum approach. For each species, one could ask: Are the mutation spectra of the embryonic mutations consistent between males and females? What about the mutation spectra for the post-puberty individuals? Is alpha consistent across mutation signatures? Does the GC bias correction impact these inferences?

      Unfortunately, there is not enough de novo data to address this question outside of humans. In turn, the analysis of substitution data is unreliable, because of the differential impact of repeated substitutions at a site and the effects of GC-biased gene conversion.

      3. While the data do not suggest reasons WHY males display a higher mutation rate, it is fair to ask whether the evolutionary drive for a higher mutation rate might shape the mechanism whereby it happens. There is a certain amount of speculation in the paper as it is, and it is done in a way that is often well supported by data after the fact. Speculation about why males have an elevated mutation rate would not erode the overall quality of the paper, and I would expect that many readers would be eager to see what the authors have to say on the subject.

      As we envisage it, along the lines of Lynch’s models for the evolution of germline mutation (Lynch 2010), there is likely selection to keep the mutation rate as low as possible, subject to the constraints of the need to replicate DNA, repair damage, etc. efficiently. Why the attainable lower limit would be higher in males than in females is unclear to us, both mechanistically and in terms of evolutionary selection pressures. As we now note lines 353-355, a potential proximal cause is a greater effect of reactive oxygen species, a major source of DNA damage, in male germ cells than in oocytes (Smith et al. 2013; Rodríguez-Nuevo et al. 2022). Potential evolutionary causes are even less clear to us, but could be related to the greater competition among sperm vs. oocytes (added in lines 354-357).

      Another way to think about these results is as shifting the question somewhat, broadening it from the long-standing puzzle of the selection pressures shaping sex differences to asking what determines the relative mutation rates of different cell types, including oocytes and spermatogonia but also somatic cell types/tissues. We had previously written that “our results recast long standing questions about the source of sex bias in germline mutations as part of a larger puzzle about why certain cell types (here, spermatogonia versus oocytes) accrue more mutations than others.” We have revised the final paragraph of the Discussion to try to emphasize this point.

      Overall the paper achieves its intended goal of toppling the dogma that the excess male mutation rate is driven by number of rounds of cell division in spermatogenesis (compared to oogenesis).

    1. Author Response

      Reviewer 3

      The number of identified anti-phage defense systems is increasing. However, the general understanding of how phages can overcome such bacterial defense mechanisms is a black box. Srikant et al. apply an experimental evolution approach to identify mechanisms of how phages can overcome anti-phage defense systems. As a model system, the bacteriophage T4 and its host Escherichia coli are applied to understand genome dynamics resulting in the deactivation of phage-defensive toxin-antitoxin systems.

      Strengths: The application of a coevolutionary experimental design resulted in the discovery of a gene operon: dmd-tifA. Using immunoprecipitation experiments, the interaction of TifA with ToxN was demonstrated. This interaction results in the inactivation of ToxN, which enables the phage to overcome the anti-phage defense system ToxIN. The characterization of the genomes of T4 phages that overcome the phage-defensive ToxIN revealed that the T4 genome can undergo large genomic changes. As a driving force to manipulate the T4 phage genome, the authors identified recombination events between short homologous sequences that flank the dmd-tifA operon. The discovery of TifA is well supported by data. The authors prepared several mutant strains to start the functional characterization of TifA and can show that TifA is present in several T4-like phages.

      In addition, they describe T4 head protein IPIII as another antagonist of a so far unknown defense system.

      In summary, the application of a coevolutionary approach to discover anti-phage defense systems is a promising technique that might be helpful to study a variety of virus-host interactions and to predict phage evolution techniques.

      Weaknesses: The authors apply Illumina sequencing to characterize genome dynamics. This NGS method has the advantage of identifying point mutations in the genome. However, the identification of repetitive elements, especially their absolute quantification in the T4 genome, cannot be achieved using this method. Thus, the authors should combine Illumina sequencing with a long-read sequencing technology to characterize the genome of T4 in more detail.

      We think the combination of Illumina-based sequencing and PCR analyses presented are more than sufficient to arrive at the conclusions drawn about the repeats that emerge in our evolved T4 clones.

      To characterize the influence of TifA during infection, T4 phage mutants are generated using a CRISPR-Cas-based technique. The preparation of these phages is unclearly described in the methods section. The authors should describe in detail whether a b-gt deficient strain was applied to prepare the mutants. Information about the used primers and cloning schemes of the Cas9 plasmid would allow the community to repeat such experiments successfully.

      We have added details to the Methods section to clarify and expand on our mutagenesis approach.

      The discovery of TifA would benefit from additional data, e.g. structure-based predictions, that describe the protein-protein interaction TifA/ToxN in more detail.

      We were unable to predict the ToxN-TifA interaction interface using AlphaFold, and we are currently conducting follow-up work to characterize how TifA neutralizes ToxN.

      Several publications have described that antitoxins can arise rapidly during a phage attack. The authors should address that this concept has been described before as well by citing appropriate publications.

      We believe that we have already addressed this point sufficiently in the Introduction of the manuscript, in which we discuss (1) the emergence of phage-encoded pseudo-toxI repeats to overcome P. atrosepticum toxIN and (2) the presence of the naturally-occurring antitoxins Dmd and AdfA in T4 and T-even phages, respectively. We also discuss the similarities between TifA, Dmd, and AdfA in the discussion of the manuscript. To our knowledge, these are the only known examples of antitoxins arising during phage attack outside of TifA, but we are happy to include additional citations of which the reviewers are aware.

      The authors propose that accessory genomes of viruses reflect the integrated evolutionary history of the hosts they infected. However, the experimental data do not support such a claim.

      We disagree with the reviewer’s comment, as our evolution experiment demonstrates the plasticity of the T4 genome during adaptation to different hosts, as well as showing that the T4 accessory genome includes genes necessary for infection of some, but not all hosts. The proposal also comes as the last sentence of the Abstract and is framed not as a conclusion, but as a proposal based on the work done here, with the clear intention of providing a sense of how future work may build off our work.

    1. Author Response

      Reviewer 1

      They adopted a comprehensive experimental and analytic approach to understand molecular and cellular mechanisms underlying tissue-specific responses against 3-CePs. They used two cell lines - BxPC-3 and HCT-15 - as example models for responsive and non-responsive cell lines, respectively. Although mutation rates didn’t differ with drug treatment, they observed changes in cell cycle and expression of genes involved in DNA damage, repair and so on. Furthermore, they combined RNA-seq and ATAC-seq data and applied two approaches, pairwise and crosswise, to identify a number of gene groups that are altered in each cell line upon the drug treatment. Finally, they calculated enrichment of up/down genes in different cell lines, tumor types and samples to estimate potential responsiveness to the drug. This study is unique in its in-depth analysis of RNA-seq and ATAC-seq data to identify genetic signatures underlying drug treatment. This study has the potential to be applied to different drugs and cell lines.

      We thank the reviewer for the precise and kind summary of our work.

      However, several major concerns need to be resolved. First of all, the biological and clinical performance of 3-CePs is not clearly described. They referenced several papers but they seem to have focused on the chemical properties of the drug. Without proven activity of 3-CePs against cancers in vitro and in vivo, the rationale of the study would be compromised.

      We apologize for not being clear enough when introducing previous findings on the differential sensitivity of HCT-15 and BxPC-3 cancer cell lines to 3-CePs. In the revised manuscript, we now cite references on the preferential activity of these agents against the pancreatic cancer cell line in 2D and 3D in vitro cancer models (see lines 71-74, 128-129). These compounds have been selected to exemplify the use of the pipeline in drug discovery and early-stage drug development: indeed, only cellular data are available for these molecules, which have not yet been characterized in vivo. The pipeline itself offered a final perspective on directions to take for their further development, i.e., the most sensitive tumor types to target (PAAD, KIRC).

      Their RNA-seq analysis was focused on discovering differentially expressed genes between cell lines, time points, etc. Interestingly, they found that DNA damage and repair signal was specifically increased in HCT-15. But is this approach capable of finding signals that are constitutively expressed in different cell lines? In other words, what if the differential responsiveness to 3-CePs was already there even before the drug was introduced?

      We thank the reviewer for pointing out such a key concept. The premise for the developed approach is that factors determining the overall cellular sensitivity to a treatment must be determined by intrinsic characteristics of the cell line. For this reason, we built the sensitivity signature on basal transcriptome profiles, where we prioritized a subset of genes based on perturbational evidence (perturbation-informed basal signature).

      Beyond signature genes, we show in figure R1 (see above) the results of a GSEA analysis on the whole overlap (300 genes) between DE genes from the baseline comparison (BxPC-3 ctrl vs HCT-15 ctrl) and those from the 6 h M treatment comparison, in the sensitive cell line (BxPC-3 M 6 h vs BxPC-3 ctrl). Pathways like ribosome biogenesis, ROS metabolism, UPR also arise, attesting that genes activated in response to the treatment also have a constitutively different expression in unperturbed cells.

      Are there any overlapping signals between pairwise vs crosswise approaches?

      We thank the reviewer for this question. To make it easier for the reader to compare the output from the two types of integration and to intuitively grasp their functional overlap, we changed the visualization of the results from the pairwise approach (Figure 4 D). Indeed, some functional pathways, both new and already emerging from the previous analysis, arise from both integrations. This overlap has now been directly discussed from the functional point of view in the main text (from line 348 and in the following crosswise integration paragraph).

      Genes used as input in both types of integration are DE or DAR-associated, so this means that many of the hits that we find having the same double regulation (pairwise) also appear in CoCena modules. Among them, only a few hits both 1) show the same double regulation in a specified comparison (as suggested by crosswise) and 2) end up having a similar pattern of regulation across all conditions (contributing to the same CoCena module, one of the strengths of the crosswise integration). Indeed, while the pairwise integration checks one single comparison at a time, CoCena checks the pattern throughout conditions, providing a more holistic view of the gene regulation (e.g., one gene can have a different pattern across conditions at the transcriptional and chromatin level). This is due to the biological fact that RNA and chromatin regulation is not 1:1 (also, for instance, from a timing perspective).

      The major added value of the two approaches consists in their intrinsically different output information. Within a specific comparison, the pairwise integration detects genes consistently activated at the transcriptome and chromatin level. At this information level gene set enrichment can simplify the coherent functional role of this set of genes; we now report this extra information in figure 4 to provide a more granular description of the pairwise integration. Instead, CoCena analyzes the pattern throughout conditions, and clusters together genes and peaks that behave similarly. Functional annotation of genes behaving similarly can put together promoters and/or transcripts that together may orchestrate a specific process (as highlighted by GSEA on each module).
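
      The idea behind the pairwise integration described above (genes regulated in the same direction at the transcriptome and chromatin level within one comparison) can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function name, the fold-change threshold, and the toy gene names and values are all hypothetical and serve only to show the concordance filter.

```python
# Minimal sketch of a "pairwise" concordance filter for one comparison.
# Inputs are hypothetical log2 fold changes per gene: one from RNA-seq
# (differential expression) and one from ATAC-seq (accessibility of the
# region associated with the gene).

def concordant_genes(rna_lfc: dict, atac_lfc: dict, min_abs: float = 1.0) -> dict:
    """Return genes changed in the same direction in both layers."""
    hits = {}
    # Only genes measured in both layers can be compared.
    for gene in rna_lfc.keys() & atac_lfc.keys():
        r, a = rna_lfc[gene], atac_lfc[gene]
        strong = abs(r) >= min_abs and abs(a) >= min_abs
        same_direction = (r > 0) == (a > 0)
        if strong and same_direction:
            hits[gene] = "up" if r > 0 else "down"
    return hits

if __name__ == "__main__":
    # Toy values, invented for illustration only.
    rna = {"geneA": 2.1, "geneB": -1.5, "geneC": 1.8, "geneD": 0.2}
    atac = {"geneA": 1.3, "geneB": -1.1, "geneC": -1.4, "geneE": 2.0}
    print(concordant_genes(rna, atac))
```

      A crosswise, co-expression style analysis differs in that it would compare the full profile of each gene or peak across all conditions rather than applying a filter to a single comparison.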

      Probably a similar question with the above: is this methodology applicable to other drugs in addition to 3-CePs?

      To address this extremely important point, that we agree with the reviewer would be key to prove the versatility of our approach, we further applied the pipeline to the prediction of cancer cell lines’ sensitivity to cisplatin, a thoroughly reported broad-acting chemotherapeutic also acting as a DNA damaging agent. Results strongly supported the broad applicability of our approach, which was able to predict sensitivity to this reference drug with extremely high accuracy.

      Reviewer 2

      Carraro et al. describe a framework to understand MoA and susceptibility of drug candidates by integrating RNA-seq and ATAC-seq information. More specifically, by collecting drug responses from high-sensitive and low-sensitive cell lines, the authors identified a key set of pathways with co-expression analysis, and further predicted sensitivity of different cancer cell lines.

      The authors provided a new bioinformatics pipeline to integrate multi-omics data (RNA-seq and ATAC-seq) in a drug response study. This approach increased detection power and identified additional key pathways that are associated with drug 3-CePs. This framework has the potential to be applied to the general drug discovery process.

      We thank the reviewer for the precise summary of our study.

      However, the current manuscript failed to describe the integration methodology in a clear and concise way. Without a full understanding of the methodology, it’s tough to evaluate the downstream results in an unbiased manner.

      We apologize for not having included sufficient details in describing the difference between CoCena and the other two horizontal and vertical approaches. As already discussed in the response to Reviewer 1, we now included a more detailed description not only in the Methods section (from line 894) but also in the main text (lines 393-400).

      In addition, the authors didn’t mention how much additional value this multi-omics approach provided compared to the single-omic data set, as multi-omics approaches are more expensive and labor-intensive.

      We thank the reviewer for this valuable point. To better support the claim for multi-omics approaches, we have extended the Introduction (lines 96-98), as successful integration of information derived from multiple omic layers usually strengthens the determination of the major observed cellular responses. Here, this information helps dissect and predict how perturbations (here by drugs) can affect the overall cellular dynamics and mechanisms underlying a certain level of sensitivity. We agree with the reviewer that current costs are still prohibitive for large-scale use of multi-layer omics in many settings, mainly when it comes to clinical use or drug development. However, significantly less expensive technologies (90% cost reductions, lines 53-55) have recently been announced, which assures us that approaches such as the one outlined here will be applicable to many more clinical questions in the near future. Further, we show evidence that some cellular responses to the drug-induced perturbation were only revealed by applying multilayer analysis, but not by a single omics layer, e.g. TGF beta and EMT signaling (see lines 456-459).

      Reviewer 3

      Carraro et al. utilize systems biology approaches to decode the mechanism of action of 3-chloropiperidines (a novel class of cancer therapeutics) in cancer cell lines and build a drug-sensitivity model from the data that they evaluate using samples from The Cancer Genome Atlas and cancer cell lines. The approach provides a framework for integrating transcriptomic and open-chromatin data to better understand the mechanism of action of drugs on cancer cell types. The authors’ approach is of sound design, is clearly explained, and is bolstered by validation via holdout sets and analysis in new cell lines, which lends the findings and approach credibility.

      The major strength of this approach is the depth of information provided by performing RNA-seq and ATAC-seq on cells treated with 3-CePs at various time points, and the authors’ utilization of this data to perform pairwise and crosswise analyses. Their approach identified gene modules that were indicative of why one cell type was more sensitive to a particular drug compared to another. The data was then used to build a sensitivity model which could be applied to samples from The Cancer Genome Atlas, and the authors evaluated their sensitivity predictions on a set of cancer cell lines which validated the predictions.

      We thank the reviewer for the accurate recapitulation of our work.

      The major drawback to this type of approach is that it relies on next-generation sequencing (somewhat costly) and requires intricate bioinformatics analyses. While I agree with the authors’ perspective that this approach can be applied to additional classes of drugs and cancer samples, I disagree with their view that it is efficient and versatile. However, for research teams with the means to perform both transcriptomic and open-chromatin studies, I think this integrated approach has promise for evaluating novel classes of drugs, particularly in cancer cell lines that are easy to manipulate in vitro.

      We thank the reviewer for this insightful comment. As with almost every technology, the early years are more difficult and at times adventurous. However, we have seen enormous improvements in robustness of the technology and significant cost reduction with more to come. Only recently sequencing technologies have been introduced into the market with a further 90% cost reduction (as stated in line 53-55). We are convinced that due to their increasing affordability and robustness, RNA-seq and ATAC-seq will be implemented routinely into clinical contexts. As a group working at the cross-section between drug discovery and bioinformatics, we hope that our current work, accompanied by a fair and detailed sharing of our scripts, will become a head start to run this type of analysis also by others in the field who are not (yet) so close to bioinformatics and computational biology.

      While there are examples of similar frameworks being applied to drug development, this work will add to the body of literature utilizing an integrated systems biology approach for pairing drugs with specific tumor or cancer types and understanding their mechanism of action on an epigenetic level.

      We thank the reviewer for this very positive statement and the support for our approach and her/his interest in the described pipeline.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2022-01501

      Corresponding author(s): Prachee Avasthi

      [The “revision plan” should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.

      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.

      We thank the reviewers for their careful reading and evaluation of our manuscript. The reviewers have emphasized the need for several important changes which we plan to address.

      First, they request better evidence and specificity of the BCI target in Chlamydomonas. We have created double mutants between the dusp6 ortholog mutants and found severe defects in ciliogenesis similar to what we see with BCI treatment. We plan to include this data in the paper as well as the subsequent analyses we performed with the single dusp6 ortholog mutants. This data will provide stronger evidence that this pathway regulates ciliary length in Chlamydomonas aside from the other potential off target effects that could be impacting this pathway that we may be seeing through the use of BCI.

      Second, the reviewers have requested more consistency and clarity both in statistics and descriptions of the data and to expand upon our findings in the discussion. We will create a clear guideline for our use of statistics and adjust the descriptions of the data to fit this guideline more strictly and prevent overstating/oversimplifying results. We will also add more discussion and information related to off target effects of BCI, the importance of the subtle defects in NPHP4 protein expression in the transition zone, and the relevancy of the membrane trafficking data in light of this study.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):


      SUMMARY:


      The authors investigated the effects of an allosteric inhibitor of DUSP (BCI) on cilia length regulation in Chlamydomonas. Among seven conclusions summarized in Fig. 7, BCI is found to severely disrupt cilia regeneration and microtubule reorganization. Additionally, changes in kinesin-II dynamic, ciliary protein synthesis, transition zone composition and membrane trafficking are also explored. All these aspects have been shown to affect cilia length regulation. Findings from this body of work may give insights on how MAPK, a major player in cilia length regulation, functions in various avenues. Additionally, the study of BCI and other specific phosphatase inhibitors may provide a unique addition to the toolset available to uncover this important and complicated mechanism.

      MAJOR COMMENTS

      Major comment 1

      The addition of BCI increases phosphorylated MAPK in Chlamydomonas based on Fig 1B. However, the claim that BCI inhibits Chlamydomonas MKPs is not supported at all. SF1A shows CrMKP2, 3 and 5 are related to each other but distant from HsDUSP6 and DrDUSP6. At the same time, 2 out of 3 predicted BCI interacting residues are different from the Hs and Dr DUSP6 in SF1B, contradicting "well conserved" in line 172. Consistently, mutants of these orthologs have little to no ciliary length and regeneration defects compared to BCI treatment (see major comment 6 about statistical significance). I am not convinced that BCI inhibits the identified orthologs or any MKPs in Chlamydomonas. It's possible that BCI inhibits a broad range of phosphatases including the ones listed and/or those for upstream kinases. But such a point is not demonstrated by the presented data.

      While BCI is predicted to interact with these residues, it is also predicted to interact with the “general acid loop backbone” by fitting in between the a7 helix and the acid loop backbone (Molina et al., 2009).

      MKP2 has ciliary length defects compared to wild type, though it regenerates normally. In addition, we have crossed these mutants together and have found that cells (2x3 12.2 and 3x5 29.4) cannot generate cilia. We will include this data in the supplement and perform follow up analyses on these double mutants. Because these structures are not 100% conserved (we have changed the text to “partially conserved” to reflect this), it is possible that BCI is hitting all of these DUSPs rather than just one, or the DUSPs may serve compensatory functions that rescue ciliary length.

      Major comment 3

      The claims that "BCI inhibits KAP-GFP protein expression" (line 271) and "BCI inhibits ciliary protein synthesis" (line 286) are not convincingly demonstrated. Overlooking that only KAP is investigated instead of kinesin-II, none of the relative intensity from the WB in 30 or 50 µM BCI and the basal body fluorescence intensity indicates a statistically significant difference. The washout made no difference in any of the assays and it's not explained how phosphatase inhibition by BCI might affect overall ciliary protein synthesis. The claims about protein expression may need a fair amount of effort and time investment to demonstrate, therefore I suggest leaving these out for this manuscript.

      Though it's very interesting to see that in SF 2C cilia in 20 µM BCI treatment can regenerate slowly. Line 162, the author claimed "In the presence of (30 µM) BCI, cilia could not regenerate at all (Fig 1E)". Since Fig 1E only extends to 2 hours, I think it's important to clarify if in 30 µM BCI cilia indeed cannot generate even after 6 or 8 hours.

      We have altered the text to be more specific with our wording that KAP-GFP is investigated rather than kinesin-2, and we have added text to indicate that downstream phosphorylation events could impact transcription and translation of proteins necessary for ciliary maintenance. The interpretation of the data mentioned above is correct; KAP-GFP is not significantly altered at the basal bodies or in the steady state western blots. What we see here, and demonstrate in Figure 2F-I, is depleted KAP-GFP protein which is not restored following a 2 hour regeneration in BCI. We likely do not see a difference in steady state conditions because the protein is not degraded, just being moved around in the cell. We can only see the difference when the majority of KAP-GFP, which the data suggests is mostly present in cilia, is physically removed through ciliary shedding. This protein is not replaced during a 2 hour regeneration, which allows us to conclude that synthesis of this protein is inhibited by BCI.

      The washout made a small difference in the double regeneration, whereby we see cilia begin to form in washed out conditions, though this was not statistically significant. It is possible that BCI has a potent effect on the cell similar to how other drugs, such as colchicine, cannot be easily washed out. The purpose here is to show that regardless of the statistical significance, cells can begin to regenerate their cilia after BCI washout, though this occurs 4 hours after washout in doubly regenerated cells, and we do not see this potent effect on the singly regenerated cells in SF 2C. Though in SF2C, as mentioned, we do see slowly growing cilia, and this could, once again, be due to the potent inhibition BCI has on ciliary protein synthesis. We will confirm and clarify whether cells in 30 µM BCI cannot regenerate cilia even after 6 or 8 hours.

      Major comment 5

      It is very interesting that BCI disrupts microtubule reorganization induced by deciliation and colchicine. Data in Fig 6B and C are presented differently than those in SF 4C. For example, in SF 4C, BCI treatment for 60 min has close to 50 % cells with microtubule partially reorganized while in Fig 6C about 20% cells with microtubule fully (or combined?) reorganized. The nature of the difference is unclear to me without an assay comparing the two directly. Hence the implied claim that BCI affects colchicine induced microtubule reorganization differently than deciliation induced one is hard to interpret (line 398, line 388 vs line 403).


      The fact that taxol doesn't rescue cilia regeneration defect by BCI is very interesting. Here taxol treatment results in fully regenerated cilia while Junmin Pan's group (Wang et al., 2013) reported much shorter regenerated cilia. It might be worthwhile to compare the experimental variance as this is a key data point in both instances. The relationship between cilia regeneration and microtubule dynamic is not in one direction. On one side, there's a significant upregulation of tubulin after deciliation. While many microtubule depolymerization factors such as katanin, kinesin-13 positively regulate cilia assembly (though not without exceptions). It is hard to determine that the BCI induced cilia regeneration defect can't be rescued by other forms of microtubule stabilization. Microtubule reorganization is one of the most striking defects related to BCI treatment. I suggest changing the oversimplified claim to a more limited one (such as "PTX stabilized microtubule ...") and an expansion on the discussion about microtubule dynamics and cilia length regulation beyond the use of taxol. Meanwhile, I strongly encourage authors to continue to investigate this aspect and its connection to the cilia regeneration.

      We will remove data regarding “partially” formed cytoplasmic microtubules and only include fully formed for each of these experiments for clarity.

      It is important to note the different taxol concentration used here. While Wang et al., 2013 used 40 µM taxol to study ciliary effects, we use 15 µM, where stabilization still occurs. There have been reports of varied cell responses to higher vs. lower doses of taxol (see Ikui et al., 2005, Pushkarev 2009, Yeung 1999), mostly with regard to the cell’s mitotic/apoptotic response. We could be seeing altered responses at this lower concentration because Chlamydomonas cells also behave differently in higher vs. lower taxol concentrations. Thank you for your suggestions. We have adjusted the text to be more specific to PTX treatment as opposed to general stabilization.

      Major comment 7:

      There are several places where the technical detail or presentation of the data are missing or clearly erroneous.

      Fig 1B: pMAPK and MAPK antibodies used in the WB are not described in the Material and methods. It's not clear if the same #9101, CST antibody used for RPE1 cell in Fig 1J is used.

      We have updated the materials and methods to include that this antibody was used for both RPE1 and Chlamydomonas cells.


      line 260 and Fig 3A state 20 µM BCI was used while Fig 3 legend repeatedly states 30 µM until (J). Also 30 µM in SF 2A.

      We have corrected the text to 20 µM BCI in the mentioned places.

      Fig 6C, the two lines under p value on top most likely start from the second column (B) instead of the first (D). Fig 6G, the line is perhaps intended for the second and fourth columns?

      We will make these comparisons more clear. We had performed a chi-square analysis and were comparing the difference between DMSO and BCI before PTX stabilization or MG132 treatment to after. We will add brackets to more clearly show these comparisons.

      Fig 6C, legends indicate bars representing each category. But only one bar is shown for each column. Same for 6G?

      This is the same as the previous comment for the way we represented the statistics. We will make this clearer with brackets to show the comparisons.

      Minor comments:

      1. A number of small errors in text were noted above. Done.

      "orthologs" is misused in place of "ortholog mutants": line 176, 352, 421 (first), 879, 882, 898, 902, 938 , 939.

      Done.

      Capital names are misused as mutant names (e.g. "MKP2" should be "mkp2"): line 178, SF 1C, 1D and 1E, SF 3C, SF 6A

      Done.

      At several places such statistical analysis lines indicated are chosen confusingly. A simplest example is in Fig 1D, the comparison between 0 to 45 is less important than 0 to 30. Same as in Fig 1H, 1I. The line ends are inconsistent as well. They either end in the middle or the edge of the columns/data points (such as in SF 4B) and some with vertical lines (SF 2B, SF 4A, SF 6B). I suggest adding vertical lines pointing to the middle to indicate the compared datasets clearly.

      Thank you for this suggestion. We agree and will update the figures to reflect this and provide clarity for statistical comparisons.

      line 101 remove "the"

      Done.

      line 120 "modulate" to "alter"

      Done.

      line 198 "N=30" should be "N=3"

      Done.

      line 212. The legend for p value is likely for (G)

      Done.

      line 284, "singly" should be "single"

      Done.

      The dataset for "Pre" and "0m" in Fig 6D and 6E are clearly the same. Consider combining the two as in Fig 6C.

      This is correct. We will combine the data sets.

      Fig 6E, "BCI" on the X-axis should be "DMSO".

      This is correct. We will correct this.

      line 685, remove "?".

      Done.

      line 894: "Fig 3J" instead of "Fig 3H"

      Done.

      SF 1 legend, (C) and (D) are inverted.

      Done.

      SF 4A "Recovered" should be "Full"

      Done.

      SF 5, row 5, under second arrow perhaps missing +PTX

      Done. We greatly appreciate this close reading of the text and the list of changes making these errors easy to find. We will make these changes in the manuscript.

      Reviewer #1 (Significance (Required)):


      Increasing evidence indicates that several MAPKs activated by phosphorylation negatively control cilia length while few studies focus on how MAPK dephosphorylation affects cilia length regulation, largely due to the unknown identity of the phosphatase(s) specifically involved in cilia length regulation. The authors set out to investigate the effect of BCI on cilia length control. BCI specifically inhibits DUSP1 and DUSP6, both of which are known MAPK phosphatases, and therefore may provide a unique opportunity to understand how MAPK pathway is controlled by specific phosphatase(s) activity in cilia length regulation.


      Overlooking some inconclusive results and oversimplified interpretations, I find the most striking findings are the BCI's effects including ciliogenesis, kinesin-2 ciliary dynamics and microtubule reorganization. I believe these findings have significant relevance to the stated goal (line 131) and conclusions (line 57) and readers may find them a good starting point for further investigation of the role phosphatases play in cilia length regulation.

      Cilia length regulation is a complicated mechanism that is affected by many aspects of the cell and functions differently in various systems. My field of expertise may be summarized by cilia biology, cilia length regulation, IFT, kinesin, kinases (MAPKs), microtubules. The membrane trafficking's role in cilia length regulation is somewhat unfamiliar to me. Additionally, the authors used a number of statistical tests and corrections in various assays. The nuance of these choices is not clear to me, nor is it explained to general readers.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, "ERK pathway activation inhibits ciliogenesis and causes defects in motor behavior, ciliary gating, and cytoskeletal rearrangement," Dougherty et al investigate how BCI, an activator of MAPK signaling, regulates ciliary length. Despite advances in our understanding of the structure and function of cilia, a fundamental question remains as to what are the mechanisms that control ciliary length. This is a critical question because cilia undergo dynamic changes in structure during the cell cycle where they must disassemble as they enter the cell cycle and must rebuild after cell division. This work contributes to a growing body of work to determine mechanisms that regulate cilia length.

      The authors use a well-established model system, Chlamydomonas, to study cilia dynamics. This work expands on previous findings from these authors that inhibition of MAPK signaling using U0126 lengthens cilia as well as other publications that implicate MAPK signaling in controlling ciliary length. However, the authors only observe a few significant phenotypes with other subtle trends, leaving the conclusion regarding the role of MAPK signaling murky. Furthermore, it is unclear through what mechanism BCI impacts ciliary length. Several issues must be addressed:

      MAJOR ISSUES

      1. The basis for this study is the use of the ERK activator BCI, which the authors show activates MAPK signaling. While the authors do use putative DUSP6 ortholog mutants to corroborate some of the phenotypes, the majority of the data (and conclusions) uses BCI. However, there may be off target effects and the authors do not address this limitation of the study. The authors only use 1 pharmacological tool to manipulate MAPK signaling, so it is unclear whether these ciliary disruptions are specifically due to increased MAPK. It is necessary to clarify the following questions about BCI action to interpret the results:
      a. What are off target effects of BCI? Does BCI impact proliferation? Why is the BCI phenotype of cilia shortening transient and dose dependent? Why does the phenotype of cilia length and regeneration capacity in Chlamydomonas differ from both ortholog mutants and hTERT-RPE1 cells?

      While we do mention following supplemental figure 1 that other MKPs could be the target for BCI, we also cite Molina et al., 2009 who showed specificity for BCI hydrochloride in zebrafish. BCI targets primarily DUSP6, but also exhibited some activity towards DUSP1. In this study, the authors had also used zebrafish embryos to check expression of 2 other FGF inhibitors, spry 4 and XFD, in the presence of BCI but found that their effects were not reversed. In addition, they checked the ability for BCI to suppress activity of other phosphatases including Cdc25B, PTP1B, or DUSP3/VHR and found that BCI could not suppress these phosphatases. BCI inhibition has previously been found to be more specific to MAPK phosphatases. In addition, we have previously confirmed that U0126 has a slight lengthening effect on Chlamydomonas which further implicates this pathway in cilium length tuning (Avasthi et al. 2012).

      While cell proliferation assays may provide more support for MAPK signaling, they do not rule out off target effects that could also contribute to this same phenotype. We do provide a cell proliferation assay for RPE1 cells where we show that higher concentrations of BCI result in cellular senescence as well (Fig 1I).

      The BCI phenotype of cilia shortening is likely transient and dose dependent due to its effect on ciliary protein synthesis demonstrated in Figure 3J. The increase in drug likely increases its substrate binding to exert its effects on the cell faster, even if this includes off target proteins.

      In RPE1 cells, we are likely seeing differences in regeneration capacity potentially due to their different mechanisms of ciliogenesis (RPE1 cells partake in intracellular ciliogenesis where axonemal assembly begins in the cytosol whereas Chlamydomonas cells partake in extracellular ciliogenesis where axonemal assembly begins after basal bodies dock to the apical membrane), or it could be that we’re missing a delay in regeneration in RPE1 cells after waiting 48 hours for ciliogenesis. We do not check this process sooner. There may be a defect that cells overcome. Additionally, among ortholog mutants and RPE1 compared to BCI-treated wild-type Chlamydomonas, there indeed could be off target effects or the drug could be targeting all of these MKPs rather than just one. We will add this to the discussion for clarity.

      Reviewer #2 (Significance (Required)):


      see above

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      SUMMARY:

      In this study, the authors used a pharmacological approach to explore the function of ERK pathway in ciliogenesis. It has been reported that the alteration of FGF signaling causes abnormal ciliogenesis in several animal models including Xenopus, zebrafish, and mice. However, the molecular details of how the ERK pathway is associated with the cilia assembly process remain elusive. The authors found that the ERK1/2 activator/DUSP6 inhibitor, BCI, inhibits ciliogenesis, highlighting the importance of ERK during ciliogenesis. Overall, this paper is well written, data are solid and convincing. This paper will be of great interest to many researchers who are interested in understanding ciliogenesis. The following comments are not mandatory requests but suggestions to improve the paper's significance and impact.

      MAJOR COMMENTS:

      - Combination of chemical blocker experiments were well controlled and data are solid. The authors are aware of the side effects of BCI, thus they carefully characterized the phenotypes of Mkp2/3/5 in Chlamydomonas. This reviewer wonders if the levels of ERK1/2 phosphorylation are activated in these mutants. Did the authors examine the levels of ERK1/2 phosphorylation in these mutants?

      While we do not include the data showing ERK activation in these mutants, we have checked pMAPK activation and found that it is not significantly upregulated in these mutants. This could be due to compensatory pathways preventing persistent pMAPK activation. For example, constant ERK activation can lead to negative feedback to regulate this signal for cell cycle progression (Fritsche-Guenther et al., 2011). The ERK pathway has not been fully elucidated in Chlamydomonas, but it is possible that similar mechanisms are in place for MAPKs. We will include this data in the supplement.

      Reviewer #3 (Significance (Required)):


      Accumulated studies suggest that the FGF signaling pathway plays a pivotal role in ciliogenesis. Disruption of either FGF ligands or its FGF receptor results in defective ciliogenesis in Xenopus and zebrafish. On the other hand, FGF signaling negatively controls the length of cilia in chondrocytes that would cause skeletal dysplasias seen in achondroplasia. Therefore, there is strong evidence suggesting that FGF signaling participates in ciliogenesis in cell-type and tissue-context dependent manners. However, the detailed mechanism of the downstream of FGF signaling in ciliogenesis is still unclear. In this regard, this paper is beneficial for the cilia community to expand the knowledge of how ERK1/2 kinase contributes to the regulation of ciliogenesis.


      This reviewer therefore suggests that the authors may want to add more discussion to explain how their finding possibly moves the field forward to understand the pathogenesis of multiple ciliopathies.

      We will add a description of this to the discussion.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      Reviewer 1:


      Major comment 4

      A single panel in Fig 4A also can't support the shift in protein density in the TZ in line 317. As line 324 implies protein synthesis defect by BCI, the very minor (in amount and significance) reduction of the NPHP4 fluorescence should not be interpreted as any disruption at all to the transition zone. I suggest checking other TZ proteins such as CEP290 etc or leave this section out.

      Also, the additive effect from BFA and BCI treatment in Fig 5A suggests BCI affects cilia length independent of Golgi. The "actin puncta" and arpc4 mutant are not sufficiently introduced. And more importantly, how does an increase in the actin puncta explain the shorter cilia length caused by BCI while actin puncta are absent in arpc4 mutant with shorter cilia? Also, the Arl6 fluorescence signal "increase" is not significant in either time point. I suggest leaving this section out as well.

      We agree that one EM image cannot support a protein shift and have removed our observation in the text. However, we do see a statistically significant decrease in NPHP4 fluorescence in BCI treated cells, which we consider a disruption in the sense that the structural composition is altered. We will change the word “disruption” to “alteration” for clarity. Though this is a minor defect, we believe it is still worth noting. We believe this data still adds to the model that though the EM-visible structure is unaltered, finer details within the transition zone are indeed altered, and we cannot rule out that these smaller changes are impacting protein entry into cilia. Awata et al. 2014 shows that NPHP4 is important for controlling trafficking of ciliary proteins at the transition zone, and its loss from the transition zone has been found to have effects on ciliary protein composition. Because we see decreased NPHP4 expression, we believe this is a notable finding as we see effects on the abundance of a protein which is known to affect ciliary protein composition and have therefore chosen to leave the data in the manuscript. We will adjust the language to most accurately describe our findings.

      We also agree with the interpretation that the additive effect seen from BFA and BCI treatment could suggest that BCI acts through a pathway independent of the Golgi, which we have mentioned in the manuscript.

      We have provided more information to introduce actin puncta and ARPC4 with regard to membrane trafficking. Bigge et al. 2020 shows that ARPC4, a subunit of the ARP2/3 complex, which is an actin binding protein important for nucleating actin branches, has a role in ciliary assembly. ARPC4 mutants have a reduced ability to regenerate their cilia. One feature they noticed in regenerating cells is the immediate formation of actin puncta which are reminiscent of yeast endocytic pits. This observation, in addition to altered membrane uptake pathways in Chlamydomonas, suggests that ciliogenesis involves reclaiming plasma membrane (because of the diffusion barrier preventing a contiguous membrane). Here, we incorporate this assay to assess the ability of the cell to reclaim membrane during BCI treatment and find that there are increased actin puncta. This could indicate that there is an increased number of endocytic pits or alternatively that the lifetime of these pits is increased (perhaps due to incomplete endocytosis) such that we are able to detect more of them at a fixed point in time. While we cannot say which is happening here, we have previously found that these actin puncta are likely endocytic and needed to reclaim membrane for early ciliogenesis. An increase in these puncta may suggest dysregulated endocytosis in one way or another. ARPC4 mutant cells cannot form the actin puncta in the first place, whereas we are seeing defects following puncta formation. We have taken out the Arl6 data.

      Major comment 6

      Throughout this manuscript, the standard the authors used to interpret statistical significance is erratic. In a few instances, the threshold for p value is clearly indicated such as in Fig 1 legend. Though other times, much higher p values are considered differences. Here are some examples:

      SF 1C, p=0.1167 is considered "(mkp5) shorter than wildtype ciliary lengths" (also line 177 "SF 1C" instead of "SF 1D")

      Fig 3C, p=0.083 interpreted as "slightly less" in line 262 and possibly as "(KAP-GFP) not being able to enter (cilia)" in line 268

      Fig 3G, p=0.1087 is considered "not decrease after two hours" line 267

      SF 3C, p=0.2929 for mkp2 mutant (misuse of "orthologs" in line 352) is considered "fewer actin puncta compared to wild type cells" (line 352).

      SF 6B, p=0.1565 for mkp3 mutant (line 421: misuse of "orthologs" and correct use of "ortholog mutants") is considered not to be able to "fully reorganize their microtubules" (line 421).

      These instances sometimes serve as basis for major conclusions and should be clarified or more carefully characterized.

      We agree the interpretations are very erratic in places and greatly appreciate this detailed list making it easy to find and correct these interpretations. We have adjusted the text in the mentioned places to reflect these changes, and we have made a statement in the text and under statistical methods that say we consider p

      Reviewer 2:

      In multiple instances the conclusions are overstated, and the author must clarify the interpretation of the results to reflect the data presented. Here are some examples:

      a. The conclusion that protein synthesis is disrupted is incorrect in two instances (line 258 and 275) as the experiments in figure 3 do not directly examine changes in synthesis (they look at cilia regeneration as a proxy).

      We show that KAP-GFP expression is not normal during regeneration at 120 minutes which suggests, in addition to the inability for cilia to grow in BCI, that synthesis is inhibited because this protein is not replaced. In addition, blocking the proteasome did not rescue this decrease in KAP-GFP expression, indicating that this is not a matter of KAP-GFP protein being degraded rapidly. We use regeneration and KAP-GFP readout as a proxy for protein synthesis. We have clarified this in the text.

      b. The conclusion that BCI disrupts membrane trafficking is too broad when the authors only examined trafficking of one membrane protein, Arl6.

      While we only looked at one membrane protein specifically, we assess other membrane trafficking paths. We looked at BCI vs. BFA to assess Golgi trafficking (Dentler 2010) in addition to formation of actin puncta which is used in Bigge et al. 2020 as an assay for membrane uptake from the plasma membrane for incorporation into cilia.

      c. The conclusion that the transition zone is disrupted is too broad based on a decrease in the expression of one transition zone protein, NPHP4.

      We have changed the text to be more specific to NPHP4.

      Highlighting the overstatement, the conclusion of the header and figure caption on page 10 contradict one another. The manuscript states that "BCI partially disrupts the transition zone" (line 313) and that "The TZ structure is structurally unaltered with BCI treatment" (line 329).

      In the manuscript, we show that the EM-visible structure is indeed unaltered. Because we see a decrease in NPHP4 fluorescence, we concluded that while the EM-visible structure is unaltered, protein composition within the transition zone is altered which suggests that BCI partially disrupts the transition zone.

      Why is kinesin-2 the only target studied for ciliogenesis? Ciliogenesis is a complex process that involves many other critical proteins and investigating kinesin-2 alone is not sufficient to conclude why BCI prevents cilia assembly.

      We use kinesin-2 because it is the only ciliary anterograde motor in Chlamydomonas which is required for proper ciliogenesis. By assessing kinesin-2, we were able to address whether this protein alone was the cause for inhibited ciliary assembly (and we find that it’s not), whether its ability to enter was impacted (likely owing to defects in other protein entry), and we were able to use this protein to understand how its protein expression was affected. Because KAP-GFP is a cargo adaptor protein and interacts with IFT complexes and other cargoes, defects in this protein can have a wide range of implications. We agree, and the data agree, that kinesin-2 alone is not sufficient to conclude why BCI prevents cilia assembly. Because of this, we assessed other pathways including membrane trafficking and microtubule stabilization to better understand why we see defects in ciliary assembly. Certainly many other proteins are important in ciliogenesis and we hope that this study sparks further work in this area to identify additional causative explanations for impaired ciliogenesis upon MAPK activation.

      Tagged ciliary proteins are sensitive to disruptions in function and expression within cilia. It is important to include proper controls in the study using KAP-GFP Chlamydomonas cells to ensure that KAP-GFP maintains endogenous expression levels and normal function as untagged KAP. Furthermore, if this information is available through the resource where the cells were purchased, then this needs to be discussed.

      KAP-GFP expressing Chlamydomonas has previously been validated as described in Mueller et al., 2005. We will provide details in the text about validation of this strain.

      The authors need to provide clear explanations to a general audience of why this technique is used and how the authors reached the interpretations. There are several instances where the authors use techniques that are cited as fundamental papers in Chlamydomonas. Here are two examples:

      a. It is unclear how the authors concluded that decreased frequency and velocity of train size shows that kinesin entry, specifically, is disrupted.

      We have expanded on this in the text. Please see response to reviewer 1, Major comment 2 above.

      b. It was impossible to follow how the experiment where cells treated with cycloheximide could not regenerate their cilia following BCI treatment shows that BCI inhibits protein synthesis.

      We have adapted the text to be more clear regarding this experiment. In this experiment, we deplete the ciliary protein pool by forcing ciliary shedding two times. Following the first shedding, there is enough protein to assemble cilia to half length (Rosenbaum, 1969). We ensure that the protein pool is completely used up by inhibiting further ciliary protein synthesis with cycloheximide. For the second shedding event, completely new ciliary protein must be synthesized for ciliogenesis to occur which is why ciliogenesis takes much longer compared to a single regeneration where half of the ciliary protein pool still remains and can be immediately incorporated into cilia (SF 2C). In the presence of BCI, cilia cannot grow at all as expected; but 4 hours after BCI is washed out, we see ciliogenesis just beginning to occur which indicates that there is protein present for ciliogenesis to begin whereas in cells where BCI is not washed out, we do not see any ciliogenesis.

      The impact of BCI treatment on membrane trafficking as presented is confusing. BCI exacerbated the effects of BFA treatment on Golgi, yet the authors do not address that this could be an indirect effect of BCI or an off-target effect of BCI.

      This is addressed in the discussion (paragraph 4).

      The discussion section includes many interpretations of the results, but leaves the reader confused as to what the authors think might be happening. The manuscript would be far clearer if the authors would provide a working model for why BCI impacts cilia length. It is fine for this to be left for future work but, as the experts, the authors must have relevant thoughts to share with the field.

      Figure 7 provides a model with as much as we can conclude given the data. What we show is that BCI inhibits many different processes in the cell, but we do not necessarily show links between these processes that would provide a complete working model of how they are all interconnected. We have therefore provided a summary model that depicts the various, still disconnected processes that are inhibited by BCI. MAP kinases such as ERK have dozens of downstream targets both within and outside the nucleus. Ciliogenesis is also a complex process coordinating many cellular mechanisms. The intersection of these two seems to have a multi-fold effect that results in a dramatic ciliary phenotype through a combination of factors, although not one that fully explains the severity upon initial deciliation in BCI/MAPK activation. Further work is needed to identify the precise cause of completely inhibited cilium growth from zero length.

      MINOR ISSUES

      1. The title of the manuscript is inaccurate and overstates the pathway involvement in cilia. The authors do not directly show that ERK pathway activation causes the ciliary phenotypes due to the use of BCI, a drug that modulates ERK.

      We have adjusted the title to “The ERK activator, BCI, causes…”

      When discussing results of data that are not statistically significant it creates confusion to state that the results "increased/decreased slightly".

      We agree that references to statistics are inconsistent or confusing throughout the text and have adjusted these references accordingly.

      Reviewer 3:

      Major comment:

      - If the authors want to emphasize their finding is associated with MAP kinases, it would be also beneficial to examine other major MAP kinase pathways such as P38/JNK. If not, then this reviewer suggests revising the text as ERK through this manuscript to avoid confusions.

      Because the ERK pathway has not been fully elucidated in Chlamydomonas, we have refrained from using “ERK” as a descriptor because this particular MAPK shares equal identity with multiple MAPKs in Chlamydomonas. Further, BCI may be targeting more than one MAPK phosphatase, resulting in the myriad phenotypes we have discovered. At this time, we lack the gene-level resolution needed to map our findings to known MAPK pathways.


      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.


      Reviewer 1:

      Major comment 2

      The claim that "BCI treatment decreases kinesin-2 entry into cilia" (line 236) is a misinterpretation of the data presented. The data indicate that KAP-GFP has reduced accumulation in cilia, and decreased IFT (anterograde) frequency, velocity and injection size, associated with BCI treatment. However, as shown in Fig 1D and Fig 2C, cilia length is also shorter due to BCI treatment. Ludington et al., 2013 showed a negative correlation of cilia length and KAP injection rate in various treatments that affect cilia length. It's essential to rule out that the KAP dynamics reported in the current manuscript are an outcome of shortened cilia in order to make the claim that line 236 seems to suggest. One way to demonstrate a specific effect of BCI would be to compare KAP dynamics in cilia of equal or similar length, either by only selecting the shorter cilia from wt or by using other treatments that are known to decrease cilia length (chemicals, cell cycle, mutants etc.). Given the capability and resource represented in this manuscript, I don't expect a significant cost and time investment for these experiments.

      Ludington et al., 2013 shows that injection size decreases with increasing length. Our data show that the shorter cilia have decreased injection size and rate, which is inconsistent with shortened length alone being the cause. In other words, in figure 2C and 2G, we see decreased KAP-GFP fluorescence in shorter cilia, as opposed to the greater fluorescent signal in shorter cilia seen in Ludington et al., 2013. These data, in combination with the decreasing frequency of KAP-GFP entry over time in figure 2E and decreased velocity in figure 2F, support decreased kinesin-2 entry into cilia. If entry were unaltered, we would expect increased KAP-GFP fluorescence in the cilia over time in BCI-treated cells.


      Reviewer 2:

      The authors state that the decreased length of cilia following BCI treatment could be a result of reduced assembly or increased disassembly. Disruptions to cilia assembly and disassembly are not mutually exclusive and both must be evaluated. The authors do not test whether cilia disassembly is disrupted in BCI treatment and therefore cannot conclude that BCI solely disrupts cilia assembly.

      While effects on disassembly remain a possibility, the striking inability to increase from zero length upon deciliation and the effects on anterograde IFT seen in the TIRFM assays suggest an effect on assembly. There may be effects on disassembly, and likely on many other cilia-related processes not investigated, but we feel it remains accurate to conclude that assembly is affected by BCI treatment.

      Reviewer 3:

      - If time allows, in addition to examining NPHP4, it would be beneficial to examine other TZ/TF markers such as CEP164 to confirm if BCI partially disrupts the TZ.

      Given the known outcomes of NPHP4 loss in Chlamydomonas (Awata et al., …) in affecting ciliary protein composition, we suspect the changes in NPHP4 abundance at the transition zone will have a significant impact. We agree it would be interesting in a follow-up study to see how other transition zone proteins (particularly ones known to interact with NPHP4 or others critical for TZ function) are impacted following BCI treatment.


      MINOR COMMENTS:

      - I suggest moving supplemental figure 1 to the main figure (Fig. 1?) so that the readers appreciate the authors' careful examination of BCI through this manuscript.

      Thank you for your suggestion and kind critique. We have included this data in the supplement for consistency with mutant data in all of the other supplemental figures.


    1. Instructor: Jeff Toister (Author, Consultant, Trainer)

      Course details: 1h 22m, Beginner, updated 11/18/2020, rated 4.7 (12,712 ratings)

      Do your customers feel valued? When they do, they keep coming back. When they don't, your business suffers. In this course, writer and customer service consultant Jeff Toister teaches you the three crucial skill sets needed to deliver outstanding customer service and increase customer loyalty. Learn how to build winning relationships, provide the right assistance at the right times, and effectively handle angry customers. He also shares ways to find out what your customers really think about your service, and use their feedback to improve.

      Learning objectives

      • Explore how you can use customer surveys to build rapport.
      • Name three ways you can use active listening to serve your customers more effectively.
      • Identify the different types of needs that must be addressed in order to solve problems.
      • Explain the benefits of taking ownership of a problem.
      • Define “preemptive acknowledgment” and recognize its impact on customer service.
      • List three types of attitude anchors and explain their differences.

      Skills covered: Customer Loyalty, Customer Service

      Continuing education units (2 certifications available)

      • National Association of State Boards of Accountancy (NASBA) Continuing Professional Education Credit (CPE): 3. Recommended NASBA field of study: Communications and Marketing. Sponsor identification number: 140940. Program level: Basic. There are no prerequisites and no advance preparation is required. To earn CPE credits, the learner is expected to complete all videos and chapter quizzes, complete the final exam within one year from completing the course, and score 70% or higher on the final exam.
      • Project Management Institute (PMI)® PDUs/Contact Hours: 1.75. This course qualifies for professional development units (PDUs).
      Transcript: Determine the value of outstanding customer service

      When people think about outstanding customer service, there's often an employee who goes above and beyond to be the hero. Think about an experience where you received outstanding customer service. There's a good chance that an individual employee went above and beyond to make it happen. Have you ever wondered why they gave that extra effort? People go above and beyond, because they get something out of it. Even if it's just the satisfaction of knowing they made a difference.

      Let's explore some of the ways you, your coworkers and even your organization might benefit when you make the effort to provide outstanding customer service. You can download the value of outstanding service worksheet to help you, or just jot down some notes on a blank piece of paper.

      A good place to start is to look at how you personally benefit from providing your customers with service that exceeds their expectations. Make a list of what you gain from putting in that extra effort. It may help to think about a specific situation where you went out of your way to delight a customer. Here's some examples that might be on your list. Happy customers are easier to serve. You enjoy helping people, and you feel a sense of accomplishment when you are able to help someone else solve a problem.

      We can also have a positive impact on our coworkers when we personally provide outstanding service. Try making a list of ways your extra effort might benefit the people you work with. This time, it might be helpful to think about how you felt when one of your coworkers delivered outstanding service. Here's some examples that might be on that list. Your coworkers will have to fix fewer problems. Great service brings positive energy to the entire team, and you can be a positive role model to your colleagues.

      Customers often look at the people who serve them as representatives of the entire organization. As a third step in this exercise, make a list of benefits your organization receives when you personally provide outstanding customer service. Here are a few examples that might be on that list. Increased profits, retained customers, and positive word of mouth from customers who refer your organization to others.

      Hopefully this exercise helped you identify some reasons that providing outstanding service is important to you. Whenever you have a tough day, reread the list you've just created and reflect on why you worked so hard to help your customers. Customer service isn't always easy. But the important thing to remember is that you can choose to give that extra effort to be outstanding.

      customer service

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      • The authors claim that bin2 has a "confused" phenotype, which they define as high variability in shoot versus root lengths along with a low degree of response to water limitation. bin2-1 is a semi-dominant gain-of-function mutant, which can only be propagated as a heterozygote (homozygous individuals are viable, but don't produce seeds). There is no mention in the manuscript about genotyping or selection of homozygous bin2-1 individuals for the phenotyping assays. Could the high variability observed in fact be caused by the authors looking at a segregating population of bin2-1?

      By propagating plants under optimal growth conditions over > 4 months at the TUMmesa ecotron, we were in fact able to obtain over 24 individual homozygous bin2-1 plants. We distinguish homo- and heterozygous seed by (i) adult phenotype, (ii) segregation in the next generation, (iii) root:shoot ratios from dark-grown seedlings on plate, and (iv) sequencing of the TREE domain (as shown in Fig. 2e). Therefore, we are sure to have used only homozygous mutants in our analysis. This is now specified in the supplementary method S5.

      The authors state that bin2 mutants had considerably more severe phenotypes than other BR biosynthesis, perception, or transcription factor mutants. This is like comparing apples to oranges, as the set of mutants they've examined consists of gain-of-function and partial loss-of-function alleles. Null alleles for BR biosynthesis (e.g. cpd, dwf4), perception (bri1brl1brl3 triple mutants) and transcription factors (bzr1bes1beh1-4 sextuple mutants) are described in the literature and would need to be tested before arriving at such a conclusion.

      This is an important point and the nature of all alleles was and still is clearly outlined in Table S1 “Lines used in this study”. We have obtained and propagated bri1brl1brl3 triple mutant seed from Christian Hardtke (Kang et al., 2017), as well as null cpd alleles from NASC and these now complement or replace det2-1 and bri1-6 in our analysis. We compare null alleles, semi-dominant or dominant or higher order null alleles with each other. To make these comparisons clear we have highlighted these different allele types in the manuscript as depicted in the table, with null in regular font, semi-dominant or dominant in bold and higher order mutants underlined. This is described in Table S1 and in the figure legends, where applicable. We have not been able to obtain and propagate enough seed in the period of review to extend the analysis to sextuple transcription factor mutants. Therefore, we have removed the comparison between brassinosteroid mutants and now refer to the importance and role of the brassinosteroid pathway in general and, more specifically, to BR signaling rather than to BIN2.

      For most of the phenotyping experiments a "RQ ratio" is presented. This is the ratio adjustment of the mutant/ratio adjustment of WT. While this derived quantity is useful for interpretation, we're missing plots of the raw data, and particularly those that show the underlying distribution of data points.

      We understand that the RQratio (Fig. 4e) value is a step removed from the raw data. Please note that we also show the RQshoot (Fig. S8a) and the RQroot (Fig. S8b) in the supplement. We now depict violin plots in Fig. 4a-c and Fig. S7 as a best representation of the raw data, as follows:

      Results page 10: “The violin plots compare organ length distributions in mutants versus the corresponding wild-type ecotype, which depicts dwarfism in some brassinosteroid mutants. It is also apparent that wild-type (Col-0) root length varies under water-deficit in the dark (Fig. S7). Although we have optimized protocols for PEG plates to the best of our ability, there is still a lot-to-lot and plate-to-plate variation. This emphasizes the need for normalizing each mutant line to its corresponding wild-type ecotype on the same (PEG) plate in the same experiment. To this end, the response to water stress in the dark was represented as a normalized response quotient (RQ), which is an indication of how much the mutant deviates from the corresponding wild type (Fig. 4e; see methods).”

      The RQhypocotyl, RQroot and RQratio are a necessary consequence of the variance in the data, and we consider them to be the most relevant metrics. Representative experiments were chosen from at least three replicates on the basis of RQ and P values (as specified in the legends of Fig. 3 and Fig. S10).
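      The normalization described above can be illustrated with a minimal sketch. It assumes that the "ratio adjustment" is the hypocotyl/root ratio under darkW divided by the same ratio under dark, and that the RQratio is the mutant's adjustment divided by the wild type's adjustment measured on the same plate. The exact definition is given in the manuscript's methods, not in this letter, so the function names and the use of simple means are illustrative assumptions.

```python
from statistics import mean

# Hypothetical helper names; the manuscript's methods define the real metric.

def organ_ratio(hypocotyl_lengths, root_lengths):
    """Mean hypocotyl length divided by mean root length for one condition."""
    return mean(hypocotyl_lengths) / mean(root_lengths)

def ratio_adjustment(dark, dark_w):
    """How the hypocotyl/root ratio changes from dark to darkW.

    Each argument is a (hypocotyl_lengths, root_lengths) pair.
    """
    return organ_ratio(*dark_w) / organ_ratio(*dark)

def rq_ratio(mutant_dark, mutant_dark_w, wt_dark, wt_dark_w):
    """Assumed response quotient: mutant adjustment over wild-type adjustment.

    A value of 1.0 means the mutant adjusts exactly like its wild type.
    """
    return (ratio_adjustment(mutant_dark, mutant_dark_w)
            / ratio_adjustment(wt_dark, wt_dark_w))

# Made-up lengths: the wild type halves its ratio under darkW,
# while the mutant does not adjust at all.
wt_dark = ([10, 10], [5, 5])
wt_dark_w = ([6, 6], [6, 6])
mut_dark = ([4, 4], [2, 2])
mut_dark_w = ([4, 4], [2, 2])

rq = rq_ratio(mut_dark, mut_dark_w, wt_dark, wt_dark_w)
```

      Dividing by the wild type measured on the same plate is what cancels the plate-to-plate variation mentioned above; the made-up numbers serve only to exercise the arithmetic.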

      *Root growth involves both cell division in meristematic cells at the tip of the root and subsequent elongation as cells exit the meristem and begin to differentiate. The authors claim a nine-fold difference in CycB1,1:GUS in the root meristem in dark vs darkW, however their images show similar CycB1,1:GUS expression patterns. Furthermore, the meristems of darkW are actually smaller than dark, which would be unexpected if cell division was increased.*

      We have reviewed the raw data again, applying blinding to avoid bias, and chosen a more representative image for the dark; the mitotic indexes are represented in a violin plot (Fig. 6c) to better show the distribution of datapoints. The conclusions are unchanged. We reimaged the wild-type under light, dark and darkW, specifically focusing on meristem properties and on final cell length. The results are presented in Fig. 6, Fig. S14, Fig. S15 and described as follows:

      Results page 14:

      “It is generally accepted that root growth correlates with the size of the root apical meristem (RAM; Beemster and Baskin 1998). Meristem size was assessed by computing the number of isodiametric and transition cells (González-García et al., 2011; Verbelen et al., 2006; Method S8). In addition, we applied a Gaussian mixed model of cell length to distinguish between short meristematic cells and longer cells in the elongation zone (Fig. S14; Fridman et al., 2021). Meristem size was shortest under water deficit in the dark (Fig. 6a; Fig. S15a,b) and, surprisingly, did not correlate well with final organ length (Fig. 1c; Fig. 6g).”

      Discussion page 16:

      “it appears counterintuitive that meristem size and organ length do not correlate in our conflict-of-interest scenario. Questions arise as to why the meristem is smaller under water deficit in the dark even though the mitotic index is higher than in the dark, and how growth is promoted under our additive stress scenarios. An important difference between our conditions and those described by others is that we germinated seed under limiting conditions in the dark in the absence of a carbon source… When water stress was applied in the dark, the mitotic index increased, but the newly produced meristematic cells immediately elongated, thereby exiting the meristem. As a consequence, meristem size remained small despite the increased number of mitotic cells. It appears that what our study shows is a novel paradigm for root growth under limiting conditions, which depends not only on shoot-versus-root trade-offs in the allocation of limited resources, but also on an ability to deploy different strategies for growth in response to abiotic stress cues.”

      We are not aware of any other study that has addressed root growth under water deficit in the dark and in the absence of a carbon source.

      *In addition, the authors claim that the longer root length in dark water stress was at least in part due to increased elongation (Fig. 7c). Elongation was only assessed by looking at the first elongating cell (~10-14 µm) and the differences found are on the order of magnitude of ~2 µm, but final cell size in Arabidopsis roots often reaches several hundred µm. Therefore, a comparison of final cell size would be more appropriate.*

      Results page 14:

      “mature cell length… was highest in the dark, the condition with the shortest roots (Fig. 6b). Thus, neither meristem size nor mature cell length account for the fold-change in final organ length (Fig. 6g).”

      *Finally, the authors phenotype plt1/2 double mutants and show that they fail to elongate in response to water limitation. Their interpretation is that this supports a centralized control model for the root apical meristem. PLT1/2 are important determinants of meristem function and are necessary to maintain stem cell identity. Given the strong phenotype of plt1/2 double mutants it is not surprising that they are unable to elongate in response to this stimulus. This does not necessarily indicate that only the RAM controls root growth, but rather that functional stem cells are required for root growth, which also involves subsequent steps such as cell elongation. *

      This is an important point and we thank the reviewer for pointing it out. We now write:

      Results page 15:

      “Taken together, the cell length and anisotropy curves (Fig. 6) and genetic analyses (Fig. 6; Fig. S15f; Fig. S16) suggest that root length under our different environmental conditions is regulated by (i) the mitotic index, (ii) the timing of cell elongation or of exit from the meristem and (iii) cell geometry. We also conclude that these are differentially modulated to account for increased root length under different environmental conditions (Fig. 6c-e).”

      We also modulate the conclusion and model (Fig 7c) to state that RAM function accounts “in part” for root growth. However, it is to be noted that mature cell length in our study did not correlate with root length (Fig. 6b, 6g). Our conclusion is now reached not solely based on plt1plt2 but also on a careful and quantitative cellular analysis of the root apical meristem in the wild-type and in bin2-3bil1bil2 mutants. The major contribution of our study, however, is the difference between the different conditions, and the ability to respond to stimulus.

      *Reviewer #1 (Significance (Required)): *

      * While the study system and some of the findings in this manuscript are interesting, there are major flaws in the authors' primary claims. *

      Contested claims have been (i) deleted where unessential to the storyline or (ii) substantiated by independent methods.

      *Reviewer # 2 *

      1. *I recommend to exchange shoot for hypocotyl when hypocotyls were examined to avoid confusing the readers.*

      We thank the reviewer for pointing this out and have exchanged shoot for hypocotyl throughout.

      2. *The authors have chosen SnRK2 (and should also indicate it in all Figures as SnRK2, to not confuse the readers with SnRK1), and implement ABA signaling in parallel to BR action, but this must be proven in higher order mutants of both pathways; at the moment the results are too preliminary to allow conclusions.*

      We concur with the reviewer that higher order mutants between the BR and ABA pathways would be required to make this claim. We also concur that this would require numerous generations and therefore that it does not lie within the scope of this manuscript.

      *If the authors are interested in shoot dominance/photosynthetic activity, why didn't they look at snrk1 mutants, which are known to regulate those processes?*

      The issue of energy signaling is a key one, and we mention this in the final “perspective” paragraph of the discussion (p. 18) as follows:

      “As a limited budget is an essential component of our screen conditions, the role of energy sensing and signaling (Baena-González and Hanson, 2017) in growth tradeoffs will need to be elucidated.”

      *In Fig6d the authors propose a sketch of the mechanism, but the data of this study don't show direct interaction of the pathways and as indicated in the figure text parts of the information are taken from other papers, I recommend to remove this sketch or shift it to the supplements.*

      We concur with the reviewer and have deleted former panels 6d, 6e and 6f as well as reference to the mutants these included. We now focus on the BR pathway, as discussed below.

      *To discriminate the role of downstream BR signaling events from other roles of BIN2, I suggest to complement the data with pharmacological experiments (eBL or bikinin where appropriate), and if possible to implement phenotyping of OE lines.*

      In response to this comment, we attempted bikinin experiments. Unfortunately, it is difficult to germinate seed on bikinin and seedlings grow poorly on this shaggy-like kinase inhibitor. As the assay relies on seed germination rather than on seedling transfer, applying bikinin was suboptimal. Because of the requirement for germination in the dark, and in lieu of eBL or PPZ or a combination thereof, we now include a null allele of a BR biosynthesis mutant, cpd, in Fig. 3b, to replace the leaky det2-1 mutant we had previously used.

      *How many independent ko lines were tested? Can the authors exclude that the BR independent phenotype indeed corresponds to BIN2 activity and not to an off-target effect?*

      Four independent bin2 mutants (B1, bin2-1, ucu1, dwarf12) were analyzed in our study. In total, 83000 M2 seed were used in our forward genetic screen; of these and for BIN2 the B1 line is the one we rescreened, mapped and characterized. We complemented B1 with bin2-1 and ucu1 alleles and compared it to bin2-1, ucu1 and dwarf12 alleles at the BIN2 locus; these three published mutant lines exhibited the same behavior as B1, including semi-dominance and phenotypes under single versus multiple stress conditions (Fig. 2c cf Fig. 3d; Fig. S6). Fine mapping (Fig. 2d), segregation analysis (Table S2), allele sequencing (Fig. 2e), backcrossing, outcrossing and complementation analysis provide independent lines of evidence that B1 is a BIN2 allele. Please note that the conclusions regarding BIN2 in this manuscript are based not on B1 but on the published bin2-1 and bin2-3bil1bil2 lines.

      We write results page 10:

      “We complemented B1 with bin2-1 and ucu1 alleles and compared it to bin2-1, ucu1 and dwarf12 (Perez-Perez et al., 2002; Choe et al., 2002) alleles at the BIN2 locus; these three published mutant lines exhibited the same behavior as B1, including semi-dominance and partial etiolation.”

      *I further recommend to exchange the pictures in Fig7a showing BRI1-GFP to pictures showing fewer cells, but with higher resolution. *

      We now show higher resolution images in Fig. 7b.

      *Regarding the implementation of photoreceptor mutants and the claim that photoreceptors are more abundant in shoot, I want to point out that the situation is more complex, as the root also reacts differently to light of different quality and quantity, with different responses in the meristem, by inhibiting cell proliferation, or in the elongation zone by triggering negative phototropism. This should be corrected in the text.*

      We are aware that light, especially when Arabidopsis is grown on media, is perceived by photoreceptors within the root system. Phototropic growth would not have affected measurements of root length as measurements were performed in ImageJ with the freehand tool. This is described in the methods on page 6, and in the supplementary method S5. For the model, we have now modulated our discussion as follows:

      Discussion p. 16-17:

      “ we postulate that a hypocotyl to root (basipetal) signal coordinates trade-offs in organ growth in response to light (Fig. 7c green arrow). However, and even though photoreceptors are considerably more abundant in the hypocotyl than in the root (van Gelderen et al., 2018), it needs to be borne in mind that photoreceptors in the root could be playing a role in root responses to light or to darkness (Mo et al., 2015).”

      *The data and methods are presented in a clear and sufficient way, as well as the statistical analysis. *

      We thank the reviewer for this positive assessment.

      *Altogether, the hypothesis and work amount are worth to be recognized, but the manuscript also resembles partially more a review and I would suggest to shorten those parts in the manuscript, reduce the amount of described lines and focus strictly on the BR pathway, in response to the environmental changes. Before implementing photoreceptors and ABA/SnRK2 pathway into the story to either test higher order mutants between the signaling pathways of interest or come up with a pharmacological screen connecting the data. Therefore I suggest to reduce the amount of mutants investigated and focus on BIN2 action, implementing also a pharmacological screen to track a fluorescent tagged BIN2 upon the mentioned treatments. And if possible to add proteomics and phosphoproteomics to understand better what changes are undergoing in the bin2 mutant vs WT upon stress. *

      We thank the reviewer for suggesting that we “focus strictly on the BR pathway, in response to the environmental changes”, as this has truly supported us in tightening the story line.

      We have removed the sections of the manuscript that resembled a review and focus entirely on the BR pathway, with additional or tighter mutants. We also look at BIN2 more closely and at a cellular level, with SEM micrographs for the hypocotyl and CSLM for the root tip. The BIN2 interactome on BIOGRID comprises 36 well annotated interactions (https://thebiogrid.org/12898/summary/arabidopsis-thaliana/bin2.html), of which 2 are documented by multiple lines of evidence and 27 are from low throughput studies. Adding adequately validated interactions to this exceeds the scope of this manuscript. Furthermore, as we no longer make the claim that BIN2 mutants are the most severely impacted (see response to reviewer #1), BIN2 is no longer the primary focus of this study; we now refer more loosely to the BR pathway, or to facets thereof referred to as BR biosynthesis, perception, signaling or BR-responsive gene expression. We have also updated and extended the reference list to include references on light perception and energy sensing or signaling. Phosphoproteomics is an important suggestion that we have also taken into the perspective.

      In brief, the manuscript has a new focus on what we consider is its true contribution: a cellular analysis of cell division, elongation and anisotropy in the wild type and in BR mutants under resting or additive stress conditions.

      *Reviewer #3 *

      1. *My major concern is that in the search of a decision mutant the authors performed the first screening not under 'a conflict of interest' scenario but under dark conditions. Can the authors explain the reasons behind this more clearly?*

      The reason we did not use the dark water stress condition as an initial but as a secondary screen is the variability of the response. In the new violin plots (Fig. 4a-c; Fig. S7), the variance especially in root length can be seen to be considerably greater in darkW than in dark even for the wild-type. This is why we initially screened individual M2 seed in the dark and then rescreened M3 populations under darkW conditions. Due to the relatively high variance, all conclusions in the manuscript are drawn on populations of seedlings rather than on individuals.

      We write in the results section on page 9:

      “We initially screened in the dark because the high variance in root growth under water deficit in the dark in the wild-type (see below) would obscure the distinction between putative mutants versus stochastically occurring wild-type seedlings with short roots under darkW.”

      *Related to above, the role of the BR pathway in etiolation has been well established with the prominent constitutive photomorphogenesis phenotypes of BR related mutants; since both bin2 alleles are impaired in light responses this mutant may behave in dark vs darkW, like a wildtype plant in light vs. lightW (maybe also partially as shown in SFig. 5a). However, the authors show that the growth tradeoff was not evident under light conditions (Fig 2). I think to conclude that bin2 is a decision mutant it requires more evidence to exclude that a defect in efficient sensing and signaling of dark conditions is the primary source of the 'confused' phenotype. In addition to the phenotype in SFig. 5a where light responses are attenuated in B1 when compared to Wt, a comparison of gene expression analysis of some established light regulated genes could help to show that bin2 is able to efficiently sense the absence of light.*

      This is an important point. We have looked at the expression levels of the light responsive gene LHCB1.2 via qPCR in wild-type Ws-2 versus bin2-3bil1bil2. The data show that the gene expression is light-regulated in bin2-3bil1bil2 seedlings (Fig. S12) and are described in the Results on page 13.

      In addition, Fig. S10 and Fig. S11 are dedicated to a careful analysis of light responses in all the BR pathway mutants we analyze. In Fig. S10d, bin2-1 can be seen to have a significant (P-value We write, in the Results on page 13.

      “Interestingly, the BR mutant lines with the strongest etiolation phenotypes (cpd and bri1-116brl1brl3, Fig. S11a,b) in the dark were not the ones with the strongest deviation from the wild-type under water deficit in the dark (Fig. S8).”

      3. *Cells that fail to elongate in the dark may not be able to reduce their cell length further in the darkW conditions, or only to a limited extent. Since BR-mutants fail to expand hypocotyl cells in the dark, an analysis of the hypocotyl epidermis cell length in bin2 mutants compared to wt in light vs dark vs darkW (as in Fig. 8c) could be a feasible experiment to exclude that the general BR-related cell elongation defects led to the confused phenotypes of this mutant.*

      This is an excellent suggestion and we thank the reviewer for pointing it out. Accordingly, bin2-1 mutants were imaged via scanning electron microscopy (SEM) and cellular parameters assessed. We also investigated root meristem properties in bin2-3bil1bil2, which had the most aberrant root response to water stress in the dark (Fig. 3e; Fig. S8b). Our new observations are described in Fig. 5, Fig. 6h-j, Fig. S16 and in the results on pages 13-15 as follows:

      “To explore whether general BR-related cell elongation defects led to the confused phenotypes of some BR pathway mutants, we analysed bin2-1 mutants, which were among the most severely impaired hypocotyl response to water stress in the dark (Fig. S8a). The data show a most striking impact of bin2-1 on growth anisotropy, assessed in 2D as length/width (Fig. 5f). Indeed, in a comparison between dark and dark with water stress (darkW), the anisotropy of hypocotyl cells decreased considerably in the wild type (Fig. 5c), but showed no adjustment in bin2-1 (Fig. 5f). Cell length alone showed the elongation defect typical of bin2-1 mutants, with a much greater deviation from the wild type under darkW than under dark or light conditions; nonetheless, there was a significant length adjustment to water stress in the dark, even in bin2-1 (Fig. 5e). These observations suggest that the impaired bin2-1 hypocotyl response can be attributed to an inability to differentially regulate cell anisotropy in response to the simultaneous withdrawal of light and water. ….

      Meristem size and mature cell length followed the same trends in a comparison between bin2-3bil1bil2 (Fig. S16a, S16b) and the wild type (Fig. 6a, 6b), but the extent of elongation in cells proximal to the QC differed (Fig. S16c). Indeed, bin2-3bil1bil2 length and anisotropy curves lacked the steep slopes characteristic for darkW in the wild type (compare the green arrows in Fig. 6d, 6f & 6j to the purple arrows in Fig. 6j & Fig. S16c). We conclude that bin2-3bil1bil2 mutants fail to adjust their root length due to an inability to differentially regulate the elongation of meristematic cells in the root in response to water stress in the dark.”

      • The experiments with the BR-deficient and signaling mutant and the bypass mutant may suggest that BR hormone is playing a relative minor role in the 'decision activity' of BIN2. bri1-6 was described to respond like wildtype (page10 line 6-8). Since this seems because of normal root responses in dark vs. darkW (Fig. 5) it could also be caused by the role of BRL1 and BRL3 in root drought responses (Fabregas et al., 2018). To verify if functional BRL1 and BRL3 in bri1-6 could cause the root response to water stress an additional experiment with bri1,brl1,brl3 triple mutant is required; In my opinion this is very important to state if the BR input is at all required for BIN2 signal integration or not. *

      We have extended our analysis to include bri1brl1brl3 lines (Kang et al., 2017). These are dwarf mutants, yet able to respond to water stress in the dark with reduced hypocotyl and increased root growth (former Fig. 5c, now replaced by the new Fig. 3c). Note that the lines have a null bri1-116 allele and segregate (bri1-/+ brl1-/- brl3-/-) quite clearly, as was verified by propagating seedlings on plate after the scan on day 10 (Supplementary Method S5).

      **Minor comments:**

      *5. The authors separate conceptually growth tradeoffs in sensing, signaling, decision making and execution processes. A clearer explanation of the expected phenotypes from mutants in only decision making with and without stress would be interesting to add (page 8)? *

      We have now moved up the phya phyb cry1 cry2 quadruple photoreceptor mutant and write:

      Results on page 9

      “Perception mutants would fail to perceive light or water stress; a good example of this is the phya phyb cry1 cry2 quadruple photoreceptor mutant, which had a severely impaired light response (Fig. S4d), but a “normal” response to water stress in the dark (Fig. S4e). In contrast, execution mutants may have aberrantly short hypocotyls or roots that are nonetheless capable of differentially (and significantly) increasing in length depending on the stress conditions. Decision mutants would differ from perception or execution mutants as they would clearly perceive the single stress factors yet fail to adequately adjust their hypocotyl/root ratios in response to a gradient of single or multiple stress conditions. Failure to adjust organ lengths would be seen as a non-significant response, or as a significant response but in the wrong direction as compared to the wild-type. We thus used organ lengths, the hypocotyl/root ratio and the significance of the responses as decision read outs. We specifically looked for mutants in which at least one organ exceeded wild-type length under darkW.”

      Later in the results on page 11 and in the legend to Fig. 4 we pick up on this as follows:

      “For bin2-1, the response to water stress in the dark was severely impaired: the hypocotyl and root responses were non-significant …bin2-3bil1bil2 mutants fit the above definition of decision mutants as they have a significant root response but in the wrong direction as compared to the wild-type, as denoted by red asterisks (Fig. 3e)…

      Figure 4. … bin2-3bil1bil2 mutants qualified as decision mutants on 3 counts: (i) failure to adjust the hypocotyl/root ratio to darkW (the ratio for darkW is the same as for dark in panel c), (ii) low or non-significant P-value (see panel f below) and (iii) one organ (here the hypocotyl in panel a) exceeded wild-type length under darkW.”

      *Line 26 page 17: BR responses in the epidermis of the hypocotyl have been shown to be already sufficient to control hypocotyl growth (Savaldi-Goldstein et al 2007), showing that not all cells of the hypocotyl need to receive the signal (at least in the case of brassinosteroids).*

      We have deleted the sentence because it is too speculative. However, the issue of different tissue layers is now mentioned in the perspective on page 18, as follows:

      “3D imaging will be required to assess the impact of abiotic stress and/or of BR signalling on different cell files or tissue layers in the root (see Hacham et al., 2011; Fridman et al., 2014; Fridman et al., 2021; Graeff et al., 2021).”

      Because of the importance of distinguishing between different cell files and cell layers, we have now removed the confocal images of BRI1-GFP under the different environmental conditions (formerly Fig. 7a); this needs to be extended to a 3D analysis, which is not within the scope of this manuscript.

      1. *Page 6 Line 11: In the volcano plots the mean RQ ratio is shown in Fig. 6c and 6f.*

      We thank the reviewer for pointing this out. We had accidentally written median RQratio; this has been rectified in the results text.

      *Some parts of the ms could be shortened and the amount of Fig. could be reduced. Fig. 1-3 could be merged as one figure showing the optimal conditions to analyze tradeoffs in shoot vs. root growth and all the conditions not suitable could be supplementary figures. *

      We concur with the reviewer and have merged the first three figures as suggested. Reviewer #2 has also requested that we slim the manuscript and all reviewers request that we strengthen our conclusions on the brassinosteroid pathway mutants. To reduce the number of figure panels, we have removed the analysis of all mutants that are not in the BR pathway, with the exception of the quadruple photoreceptor mutant in Fig. S4d,e and plethora mutants in Fig. S15. Nonetheless, incorporating the new data generated in response to reviewer comments leaves us with 7 main and 16 supplementary figures.

      *In the ms several experiments are described as 'screen' this is confusing with the forward genetic screen that was performed. *

      This is indeed ambiguous. We now use the terms “single versus multiple stress conditions/additive stress/conflict-of-interest scenario” versus “forward genetic screen”.

      *Reviewer #3 (Significance (Required)): *

      *Mechanisms how growth trade-offs between multiple stresses are controlled are highly interesting. Growth vs. biotic stress tradeoffs have already been investigated and were found to be interdependent with light (Leone et al. 2014; Campos et al 2016; Fernandez-Milmanda et al. 2020) and hormone signaling (Lozano-Duran and Zifpel et al., 2016 and Ortiz-Morea et al 2020; van Butselaar and van den Ackerveken, 2020). Less is known about growth tradeoffs between two abiotic stress responses (Bechtold and Field, 2018; Hayes et al., 2019). The separation of root meristem growth and cell expansion in the hypocotyl is interesting. Whether the two directional root-to-shoot and shoot-to-root signals are independent or whether they may employ the same mechanism with a different output remains open. Different sensitivities of organs and cell types to BRs have for example been reported (Müssing et al. 2003 and Fridman et al. 2014). The findings that BIN2 most likely act to integrate multiple signals is in line with the reported roles of BIN2 to crosstalk with several pathways (reviewed by Nolan et al. 2020). In my point of view, it remains to be strengthened if this is through 'decision making' and not through signaling and execution. I think if the authors carefully separate the defects in bin2 this work will be interesting to many plant biologists.*

      We thank the reviewer for highlighting references we had not referred to in the former draft. The references pertaining to the growth versus defense trade-off are now included in the introduction (page 3) and the ones on abiotic stress factors in the Discussion on page 18:

      “In addition to its role in light and drought responses… BIN2 has been implicated in regulating hypocotyl elongation in response to far-red light and salt stress (Hayes et al., 2019). Studies on responses to abiotic stress factors have typically addressed growth arrest or tradeoffs between growth and acclimation (Bechtold and Field, 2018). Indeed, root growth is inhibited by, for example, phosphate deprivation or salt stress (Balzergue et al., 2017; West et al., 2004). Recent efforts have addressed strategies for engineering drought resistant or tolerant plants that do not negatively impact growth (Fàbregas et al., 2018; Yang et al., 2019). In contrast to other studies, here we look at two abiotic stress factors that promote organ growth. Indeed, hypocotyl growth is promoted by darkness or low light and primary root growth by water deficit in this study.”

      We emphasize the above point about decision making in the discussion. In the introduction and early on in the results we introduce conceptual frameworks for decision making. Yet after a forward genetic screen and mutant characterization, we revise this in the Discussion on page 18 as follows:

      “In the judgement and decision-making model for plant behaviour put forth by Karban and Orrock (2018), signal integration might be considered integral to judgement. ….Whether judgement and decision making can be distinguished from each other empirically remains unclear. As BR signalling regulates cell anisotropy and growth rates in the hypocotyl and root apical meristem, it may play a role not only in signal integration but also in the execution of decisions (or in an implementation of the action; González-García et al., 2011; Vilarrasa-Blasi et al., 2014). Thus, this study does not enable us to empirically distinguish between decision making on the one hand and signalling and execution on the other.”

    1. https://niklas-luhmann-archiv.de/bestand/zettelkasten/zettel/ZK_2_SW1_001_V

      One may notice that Niklas Luhmann's index within his zettelkasten is fantastically sparse. For example, we might look at the index entry for "system", which links to only one card. For someone who spent a large portion of his life researching systems theory, this may seem fantastically bizarre.

      However, it's not as odd as one may think given the structure of his particular zettelkasten. The single reference gives an initial foothold into his slip box, where shuffling through cards beyond that idea will reveal a number of closely related cards which subsequently follow it. Regular use and work with the system would have given Luhmann a better memory of its contents, and searching through threads of thought would have potentially sparked new ideas and threads. Thus he didn't need to spend the time and effort to heavily index each individual card; he just needed a starting place and could follow the links from there. This tends to minimize the indexing work he needed to do regularly, but simultaneously makes it harder for the modern person who may wish to read or consult those notes.
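      The "single foothold, then follow the cards" pattern can be sketched in a few lines. This is only an illustration under assumptions: the card IDs, texts, and field names below are hypothetical and are not taken from Luhmann's archive.

```python
# Hypothetical cards: each has a successor ("next") and optional cross-links.
cards = {
    "1": {"text": "systems: starting point", "next": "1a", "links": []},
    "1a": {"text": "system and environment", "next": "1b", "links": ["7"]},
    "1b": {"text": "complexity", "next": None, "links": []},
    "7": {"text": "communication", "next": None, "links": []},
}

# Sparse index: a keyword points to a single entry card, not to every card.
index = {"system": "1"}

def follow(cards, start, limit=10):
    """Return the card IDs reached from `start`, in reading order.

    Successor cards and cross-links are visited breadth-first,
    stopping after `limit` cards.
    """
    seen, queue = [], [start]
    while queue and len(seen) < limit:
        card_id = queue.pop(0)
        if card_id in seen or card_id not in cards:
            continue
        seen.append(card_id)
        card = cards[card_id]
        if card["next"]:
            queue.append(card["next"])
        queue.extend(card["links"])
    return seen

trail = follow(cards, index["system"])
```

      The index stays tiny because the cards themselves carry the structure: one lookup yields the entry card, and everything else is found by walking.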

      Some of the difference here is the idea of top-down versus bottom-up construction. While thousands of his cards may have been tagged as "systems" or "systems theory", over time and with increased scale they would have become nearly useless as a construct. Instead, one may consider increasing levels of sub-topics, but these too may be generally useless with respect to (manual) search, so the better option is to only look at the smallest level of link (and/or their titles) which is only likely to link to 3-4 other locations outside of the card just before it. This greater specificity scales better over time on the part of the individual user who is broadly familiar with the system.


      Alternatively, for those in shared digital spaces who may maintain public facing (potentially shared) notes (zettelkasten), such sparse indices may not be as functional for the readers of such notes. New readers entering such material generally without context, will feel lost or befuddled that they may need to read hundreds of cards to find and explore the sorts of ideas they're actively looking for. In these cases, more extensive indices, digital search, and improved user interfaces may be required to help new readers find their way into the corpus of another's notes.


      Another related idea to that of digital, public, shared notes, is shared taxonomies. What sorts of word or words would one want to search for broadly to find the appropriate places? Certainly widely used systems like the Dewey Decimal System or the Universal Decimal Classification may be helpful for broadly crosslinking across systems, but this will take an additional level of work on the individual publishers.

      Is or isn't it worthwhile to do this in practice? Is this make-work? Perhaps not in analog spaces, but what about the affordances of digital spaces, which are generally more easily searched as a corpus?


      As an experiment, attempt to explore Luhmann's Zettelkasten via an entryway into the index. Compare and contrast this with Andy Matuschak's notes which have some clever cross linking UI at the bottoms of the notes, but which are missing simple search functionality and have no tagging/indexing at all. Similarly look at W. Ross Ashby's system (both analog and digitized) and explore the different affordances of these two which are separately designed structures---the analog by Ashby himself, but the digital one by an institution after his death.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2022-01357

      Corresponding author(s): Peter Novick and Gang Dong

      1. General Statements [optional]

      We would like to thank both reviewers for their thorough and constructive evaluation of our manuscript. Following their suggestions, we have reworked the manuscript and added several pieces of new data to address their questions, including (1) evaluation of how the M7 mutant of Sso2 affects its interaction with Sec3 using three independent methods (in vitro); (2) investigation of how the M7 mutant affects the interaction of Sso2 with Sec3 by co-immunoprecipitation (in vivo). We hope that, with these changes, the manuscript will be suitable for publication in your journal. Detailed point-by-point responses are shown below.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      Using the entire cytoplasmic domain of Sso2 and protein crystallization, Peer and colleagues show that two N-terminal peptides (NPY) of Sso2 synergistically interact with the Sec3 PH domain. This interaction provides an additional low affinity binding site to the previously published interface between the Sso2 four-helix bundle and the PH domain. Mutagenesis, in particular of both NPY motifs, results in reduced cell growth, in the accumulation of transport vesicles at the budding site, and in decreased secretion of invertase and Bgl2. The paper is well written, the data are convincing and the characterization of these novel peptide interaction sites clearly advances the field. Although, the exact role of the Sec3 NPY - Sec3 interaction still needs to be established, the overall functional relevance is apparent and thus the paper could be published with minor changes. *

      Response: We sincerely thank the reviewer for the positive comments and clear, constructive feedback.

      *Nevertheless, the authors may consider to address the following issues to improve the manuscript. - To strictly exclude the possibility that the Sso2 NPY motif also interacts with other components of the exocytosis machinery (e.g. Sec1), thereby causing the observed phenotypes, Sec3 mutagenesis of the NPY motif-binding site would be required. *

      Response: It would be a good idea to generate reverse mutants on Sec3. However, the pocket on Sec3 bound by the NPY motifs of Sso2 is mostly hydrophobic and contains many semi-buried residues that are in close contact with other residues in the hydrophobic core of the structure (including L78, Y82, I109, V112, V208, etc.; see Fig. S3D, E) and are thus essential for maintaining the fold of Sec3. Mutating these residues would destabilize the folding of Sec3. This is why we did not perform the mutagenesis suggested by the reviewer.

      *- The authors suggest that the NPY-peptide binding contributes to the initial interaction/recruitment of Sso2 to the exocytosis site, defined by the localization of Sec3 (exocyst). Further data sustaining this concept/hypothesis could improve the impact of the manuscript. Thus, an experiment analyzing the co-distribution of the Sec3 with Sso2 would directly support the authors' conclusion. (In Figure 7, the authors already show the highly polarized distribution of Sec3-3xGFP.) The M7 mutant could impact the distribution of Sso2. In addition, it would be helpful to determine to which degree the Sso2 NPY - Sec3 PH domain interaction increases the overall affinity of Sso2 for the Sec3 PH domain; e.g. comparison of the binding of Sso2 (1-270) wt and M7 to Sec3 PH domain using ITC. *

      Responses:

      • We greatly value the reviewer’s suggestion. To investigate how the M7 mutant affects the co-distribution of Sso2 with Sec3 in yeast, we tried a variety of conditions with both the original serum and affinity-purified Sso antibodies. In neither case did we see a clear concentration at sites where we would expect to see Sec3, such as the tips of small buds. We were able to see some detectable concentration of HA-tagged Sso2 in small buds using an anti-HA antibody, but it would be difficult to tag the M7 mutant at the same site since that site is so close to the M7 mutation. We are also worried that the tag might interfere with Sec3 binding due to this proximity. Given the lack of detectable concentration of WT Sso2, it would not be possible to see a loss of localization in M7.
      • To check the binding of Sec3 to either the WT or the M7 mutant of Sso2 (aa1-270), we generated the M7 mutant within the same fragment of Sso2 as the WT (i.e. aa1-270) and carefully checked how it affects the interaction of Sso2 with the Sec3 PH domain using three independent methods. Our ITC data show that WT Sso2 bound Sec3 very robustly, with a Kd of approximately 2 µM (Fig. 8C). Surprisingly, however, the M7 mutation completely abolished the interaction of Sso2 (aa1-270) with Sec3 (Fig. 8D). To further verify this observation, we carried out electrophoretic mobility shift assays (EMSA) and size-exclusion chromatography (SEC). Our EMSA data on a native PAGE gel show that WT Sso2 (aa1-270) bound Sec3, whereas the M7 mutant did not (Fig. S5A, B). Similarly, our SEC data demonstrate that Sec3 co-eluted with WT Sso2 in the higher molecular weight peak; in contrast, Sec3 and the M7 mutant of Sso2 (aa1-270) eluted in separate peaks and no stable complex of the two was formed (Fig. S5C, D). All these new data confirm that the NPY motifs play an essential role in maintaining the stable interaction between Sso2 and Sec3, which would explain why the M7 mutant gave such a dramatic phenotype in vivo (Fig. 4B-E; Fig. 5D-F; Fig. 6D, E).

      *Minor point: In the discussion, the authors should mention to which degree the NPY binding site within Sec3 is accessible for / occupied by other known exocyst components, or PI(4,5)P2, etc. *

      Response: Thank you for the suggestion. A new diagram has been added to Fig. 9E to compare the structures of the previously reported Sec3/Rho1 complex and the Sso2/Sec3 complex determined by us. It shows that the NPY binding site on Sec3 is on the opposite side of the membrane-binding surface patch. The NPY binding site is also far away from the Rho1 interacting site on Sec3 and thus does not interfere with Rho1 binding either.

      *Reviewer #1 (Significance (Required)):

      The manuscript significantly contributes to our understanding of how the vesicle tethering machinery interacts and coordinates the assembly of the membrane fusion machinery and will be of broad interest in the field of membrane trafficking. I am not an expert in X-ray crystallography. *

      Response: We sincerely appreciate this reviewer’s positive feedback.

      ***Referees cross-commenting**

      I agree with the comments of the other reviewer. It would be nice to show the effect of the M7 mutant in a reconstituted liposome fusion assay, but as already mentioned this may require an additional collaboration. Whether the relatively weak Sec3 - NPY interaction can be resolved in the liposome fusion assay needs to be shown.*

      Response: Please see our detailed answer to the other reviewer’s question about this.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): * The manuscript of Peer et al. describes the structural characterization of the interaction of the syntaxin-like Sso2 protein with the exocyst subunit Sec3. The authors identify here a dual NPY motif at the N-terminal part of Sso2 that binds to Sec3 and thus confers functionality. Using x-ray crystallography, they show a nearly full-length Sso2 in complex with Sec3, which reveals how Sso2 binds to Sec3. Subsequent mutagenesis shows that both NPY motifs act together in binding, and are both required for functionality in vivo, using established assays in localization of exocyst subunits, secretion assays and growth tests. Their data suggest an overall model of how Sso2 is efficiently recruited by exocyst to promote vesicle secretion.

      This is an overall very complete and clear manuscript, where the authors nicely demonstrate how the two NPY motifs both contribute to efficient Sso2 interaction with Sec3. Their data further show that each motif alone can contribute to function, whereas loss of both motifs (the M7 mutant) results in deficient binding. Likewise, their established assays to determine cellular importance of the NPY motifs in Sso2 show that trafficking and localization in the secretory pathway is strongly impaired in the mutant. I only have a few questions and suggestions. *

      Response: Thank you for the positive feedback.

      *1. The authors present in Figure 4 the mutants. I recommend to show the alignment of the mutants (M5,M6,M7) similar to panel A in Figure S4 here to orient the reader. They could also be listed in Figure 3, since the authors have here the sequences. *

      Response: An alignment of M5-M7 has been added in Fig. 4A as suggested. Thank you.

      2. The authors previously showed that Sso2 mutants affect the Sec3 driven assembly and also the fusion. I am wondering if they have the tools ready to also conduct this assay with their M7 mutant, which has the strongest defect. I am aware that this may be challenging if the tools are not established here as in the previous collaboration (Yue et al., 2017). It may provide additional information on the functional crosstalk.

      Responses:

      • Thank you for the suggestion. However, based on our new results, we do not think it is necessary to perform such an assay. As shown in Fig. 8C, D and Fig. S5, the M7 mutation completely abolished the interaction of Sso2 (aa1-270) with Sec3, in contrast to the robust interaction between WT Sso2 (aa1-270) and Sec3. Therefore, we expect that the M7 mutant would fail to accelerate liposome fusion in the way we had previously seen for WT Sso2.
      • On the other hand, we have to admit that performing such an assay would indeed be challenging for us, as the PhD student who carried out the in vitro liposome fusion assay has left our previous collaborator’s lab, and it would take quite a while to re-establish the assay in our own group and to optimize its various parameters.

      *3. Along the same line, it would be good if the authors show that the mutation also impairs the interaction of Sec3 and Sso2 in vivo. *

      Response: We appreciate the reviewer’s suggestion and have carried out co-immunoprecipitation of Sec3-3×Flag and Sso2 from yeast extract to find out how the M7 mutant affects Sso2’s interaction with Sec3 (Fig. S6). Our results show that in contrast to the clear signal of WT Sso2 pulled down by Sec3-3×Flag, the pull-down band for the M7 mutant was much weaker and at a similar level to the negative control. This is consistent with what we saw in our in vitro binding assays (Fig. 8D; Fig. S5).

      *4. I really like the similarity of the different Munc18-Syntaxin interactions and the Sec3-Sso2 interaction. Do the authors think that Sec3 is an ancestral fragment of a Sec1 like protein, which just maintained this interaction? *

      Response: This is a very interesting idea. However, it seems too speculative for us to draw such a conclusion. The similarity could also be due to functional co-evolution, with Sec3 using a simpler structure (i.e. the PH domain) to mimic the syntaxin binding of SM proteins and employing the extra “add-on” NPY motifs as a handle to facilitate and regulate the interaction.

      5. *Small mistake in the discussion: "plasmas membrane" *

      Response: This has been corrected. Thank you.

      *Reviewer #2 (Significance (Required)): Important advance in our understanding of Exocyst function, which deserves publication. I only had minor issues that can be addressed quickly. *

      Response: We sincerely appreciate the reviewer’s positive feedback and constructive suggestions.

    1. Author Response

      Reviewer 3

      This is work by an internationally recognized group with strong expertise in sophisticated single-molecule microscopy assays in cells. They present here a single-molecule fluorescence-based assay for proximity in the nanometer range.

      It has long been reported that cyanine dyes such as Cy3, Cy5 and derivatives such as AF555, AF647 can undergo a photoswitching mechanism by which the shorter wavelength dye when being excited can switch the longer wavelength dye which is in a dark state back into the bright state. And it has furthermore been reported that this switching mechanism is not based on FRET, as the distance requirement is more stringent (up to ~ 2 nm). However, this mechanism has not been fully explored for the investigation of molecular interactions yet.

      The authors in the present work present a similar mechanism for a different class of rhodamine-based fluorophores, specifically JF549 and JFX650. They describe the discovery of this mechanism in dual-color labeling of a pentameric protein and initial characterization to distinguish it from UV-light-mediated recovery from a pumped dark state as reported for (d)STORM-like measurements. They extend their observation to TMR, JF529 as lower wavelength "senders" and JF646 and JFX646 as longer wavelength "receivers" that can become reactivated into the ground state upon illumination of a nearby "sender". The authors then test activation pulse length and distance dependence and find that longer pulses lead to more recovery and that PAPA of JF549/JFX650 has, unlike previously observed for the Cy3/Cy5 pair, a smaller distance dependence than FRET of the same fluorophore pair. The authors then move on to use both the UV-light mediated direct reactivation "DR" and proximity-assisted photoactivation "PAPA" to activate different molecules that are either double-labeled for PAPA or singly labeled with JFX650 for DR. They succeeded in four different cases to identify clear population shifts to distinguish molecules of different mobility.

      Overall, I think the authors made an interesting discovery and characterizing this previously poorly characterised interaction for cellular single-molecule experiments is certainly an important effort. The authors make an honest and quite complete effort to work out the practical details of this interaction and designed experiments that convincingly highlight the basic capabilities this technique offers to the detection of verified interactions and the mobility of interacting molecules in cells.

      The weakness is that these capabilities do not seem to be as clear-cut as the reviewer hoped for when starting to read this manuscript. It remains unclear to this reviewer to what extent PAPA molecules can be separated from DR molecules. In all but the last diffusion experiment(s) in Figure 4, PAPA molecules seem to be significantly perturbed by DR molecules, casting doubt on the usefulness in real experiments. Similarly, in Figure 5, a difference is seen but does not allow for quantification. This certainly is not the case for other methods of sensing as well, but maybe the authors could more specifically compare their efforts and the dynamic range to other sensors for example in Figure 5? This would make it easier for the reader to make up their mind if the assay is worthwhile adopting for their system.

      We agree that a problem with PAPA at present is that although PAPA trajectories are significantly enriched for double-labeled complexes, they are still “contaminated” with single-labeled molecules. As we described in the Discussion (and as pointed out by Reviewer 1), we think that one major contribution to this background arises from chance proximity of sender and receiver molecules independent of direct physical interaction. Additionally, some background is expected from continual spontaneous (a.k.a. “thermal”) reactivation of molecules from the dark state.

      In response to the reviewers’ comments, we have tried to quantify more precisely how much PAPA enriches for one population over another by fitting the diffusion spectra of 2-component mixtures to linear combinations of the corresponding individual components (Figure 4–figure supplement 4). We estimate that the fold enrichment of double-labeled molecules ranged from 3.7 to 37-fold between different 2-component mixtures.
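
      As a rough illustration of the fitting idea described above, the sketch below fits a measured spectrum as a weighted sum of two reference spectra and compares the fitted weights between PAPA and DR. It is only a sketch under stated assumptions: the spectra are toy numbers, the function names are hypothetical, and "fold enrichment" is defined here as a ratio of odds, which may differ from the definition used in the paper.

```python
def fit_mixture_weight(mixture, comp_a, comp_b):
    """Least-squares weight w such that mixture ~ w*comp_a + (1-w)*comp_b."""
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, comp_a, comp_b))
    den = sum((a - b) ** 2 for a, b in zip(comp_a, comp_b))
    if den == 0:
        raise ValueError("components are identical; weight is not identifiable")
    # Clip to [0, 1] so the result is a valid mixing fraction.
    return min(1.0, max(0.0, num / den))


def fold_enrichment(w_papa, w_dr):
    """Assumed definition: odds of component A under PAPA over odds under DR.

    Undefined (division by zero) if either weight is exactly 0 or 1.
    """
    return (w_papa / (1.0 - w_papa)) / (w_dr / (1.0 - w_dr))


# Toy diffusion spectra: probability mass over four diffusion-coefficient bins.
slow = [0.7, 0.2, 0.1, 0.0]  # stand-in for the double-labeled component
fast = [0.0, 0.1, 0.3, 0.6]  # stand-in for the single-labeled component


def mix(w):
    """Synthetic mixture spectrum with a known fraction w of the slow component."""
    return [w * s + (1 - w) * f for s, f in zip(slow, fast)]


w_papa = fit_mixture_weight(mix(0.8), slow, fast)
w_dr = fit_mixture_weight(mix(0.2), slow, fast)
print(round(w_papa, 3), round(w_dr, 3), round(fold_enrichment(w_papa, w_dr), 1))
```

      With real data the reference spectra would come from the individually measured components, and noise would make the fitted weights uncertain, so the fold enrichment should be reported with an error estimate.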

      We fully agree that it is critical that researchers who use PAPA be aware of its limitations, so that they do not fallaciously assume that all green-reactivated localizations are protein complexes. To avoid committing a bait-and-switch against our readers, we now state explicitly in the Introduction that PAPA in its current form enriches for complexes but does not provide perfect selectivity. In Appendix 2, we now discuss the problem of background reactivation in more detail and outline what we think will be required to correct quantitatively for this background. Though we believe that such corrections will ultimately be possible, at least in some cases, figuring out how to do this rigorously will require substantial additional development of experimental and computational methods, which we hope the editor and reviewers agree is beyond the scope of the current paper.

      At the end of Appendix 2, we briefly mention another technical problem that we have noticed with SNAP ligand background staining. While this background was negligible for the experiments described in this paper, which involved highly expressed SNAPf transgenes, it may pose a more significant problem for SNAPf-tagged proteins with lower expression levels. We think it is worth mentioning this problem to make readers aware of it and hopefully to motivate the development of better orthogonal pairs of self-labeling tags.

      While there are obviously limitations to PAPA, we think this should not overshadow the fact we have identified a novel photophysical property of commonly used fluorophores and harnessed it to detect molecular interactions in live cells. Our initial proof-of-concept study provides a foot in the door of this new biophysical approach, which we and others will continue to refine. Immediate applications of PAPA could include disambiguation of peak assignments in complex diffusion spectra, confirmation of proposed interactions between proteins (and subsequent investigations into the molecular mechanisms supporting such interactions), or integration into SPT-based high-throughput screening (https://www.eikontx.com/technology) to provide a useful additional readout for each experimental condition.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors show that the unmitigated generation interval of the original variant of SARS-CoV-2 is longer than originally thought. They argue that in the absence of interventions that limit transmission late in the course of infection, the fraction of transmission events that occur before symptom onset would be considerably lower, and the fraction of transmission events occurring 10 days or more after infection of the index case would be substantially higher.

      These findings improve our ability to accurately estimate the basic reproductive number (R0), to evaluate quarantine and isolation policies, and to model counterfactual intervention-free scenarios. Many applied analyses rely on accurate generation interval estimates. To head off confusion, it would be helpful if the authors could provide more comprehensive guidance about which applied analyses should be informed by the unmitigated generation interval, or the observed generation interval. (E.g. the unmitigated interval is useful for quarantine and isolation policies, but would we ever want to use the unmitigated interval to estimate R?).

      The unmitigated generation-interval should be used for estimation of the R0 of the initial epidemic phase, but not for the R(t) of the current epidemics. Estimation of R(t) must account for changes in generation-interval distributions caused by the invasion of new variants and changes in behavior. When analyzing policies of quarantine, isolation or contact tracing, the unmitigated interval should also be adopted to account for late transmissions.

      We added a few sentences at the end of our introduction to clarify this point:

      “The estimated unmitigated generation-interval distribution could be adopted for answering questions about quarantine and isolation policy, as well as for estimating the original R0 at the initial spread in China. However, estimation of instantaneous R(t) should account for changes in generation-interval distributions, reflecting mitigation effects and the current variant.”
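
      To make concrete why the choice of generation-interval distribution matters for R0, the sketch below applies the standard Euler-Lotka relation, 1/R0 = E[exp(-r * T)], where r is the exponential growth rate and T is the generation interval. It assumes a gamma-distributed generation interval purely for convenience; the shape value and the means are illustrative and are not the authors' estimates.

```python
def r0_from_growth_rate(growth_rate, mean_gi, shape):
    """R0 implied by an exponential growth rate and a gamma generation interval.

    For a gamma distribution with the given shape and scale = mean_gi / shape,
    E[exp(-r * T)] = (1 + r * scale) ** (-shape), so R0 = (1 + r * scale) ** shape.
    """
    scale = mean_gi / shape
    return (1.0 + growth_rate * scale) ** shape


r = 0.1      # per day, illustrative growth rate
shape = 2.0  # assumed gamma shape, not taken from the paper

for mean_gi in (5.0, 10.0):
    print(mean_gi, round(r0_from_growth_rate(r, mean_gi, shape), 4))
```

      For a fixed growth rate, a longer mean generation interval implies a larger R0, which is why an interval shortened by mitigation would bias an estimate of the original R0 downward.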

      The analysis estimates a longer generation interval after accounting for three main sources of bias or error that are common in other analyses:
      1. Recently infected individuals are intrinsically overrepresented in data on a growing epidemic. Thus, shorter incubation periods and forward serial intervals are more likely to be observed, even in the absence of interventions. This analysis adjusts for these dynamical biases.
      2. Interventions or behavioral changes can prevent transmission late in the course of infection. This can shorten the generation and serial intervals over the course of an epidemic. This analysis focuses specifically on transmission pairs observed very early, before the adoption of interventions.
      3. The incubation period and generation interval should be correlated - infectors that progress relatively quickly to symptoms should also become infectious sooner (symptom onset occurs near the peak of viral titers). Most existing analyses assume these intervals are uncorrelated, but this analysis accounts for their correlation.

      Overall, the conclusions seem reasonable and well-supported. The observation that the generation interval decreases over the course of an epidemic is also consistent with existing studies that show the serial interval has similarly decreased over time. But given the implications of the findings, I hope the authors can address a few questions about potential additional sources of bias:

      1. It is somewhat reassuring that the generation interval decreases relatively smoothly as the cutoff date increases (Fig. S6), but it would be helpful if the authors address the potential impact of ascertainment biases. One of the main reasons that the authors estimate a shorter generation interval is that they define January 17th, early in the outbreak before interventions and behavioral changes had taken place, as the cutoff point for the infector's date of symptom onset. This cutoff eliminates biases from interventions, but it also severely limits the size of the transmission-pair dataset (Fig. S3), and focusing on this very early subset of cases may increase the influence of ascertainment bias. Prior to January 17th, should we expect observed transmission pairs to involve more severe cases on average? And is the unmitigated generation interval correlated with case severity?

      We thank the reviewer for identifying a source of possible bias that we overlooked. Following the comment we performed a new sensitivity analysis for the inclusion of the severe cases, summarized in Appendix 1—figure 11.

      Severity of the cases was reported only in Ali et al.’s data, for some of the individuals. In these cases, individuals are divided into one of three conditions: mild, severe (non-fatal) and death. As non-mild cases represent a small fraction of the dataset, we combine them into one category, which we denote as severe.

      Severe cases (including deaths) were overrepresented in the period prior to January 17, with 8 out of 77 cases, compared to 18 out of 745 in the period of January 18-31. The effect of inclusion of severe cases was analyzed by comparing the means of the estimated generation-interval distribution, separately for the two periods in question, using the inference framework with 30 bootstrapping runs. For the earlier period, the estimated means were compared between the dataset with or without the severe cases. For the later period, we also consider the “enriched” dataset, in which severe cases are oversampled for each bootstrap such that the proportion of severe cases matches that during the earlier period (10%). In both cases we see that the effect on the estimated mean generation interval is small.
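
      The "enriched" bootstrap described above can be sketched as a stratified resampling step. This is a minimal sketch, not the authors' inference framework: the case counts (745 cases, 18 severe, target share 8/77) come from the response, but the interval values are synthetic, the function names are hypothetical, and a plain sample mean stands in for the fitted mean generation interval.

```python
import random


def enriched_bootstrap(cases, target_fraction, rng):
    """Resample len(cases) cases so that a fixed share of them is severe."""
    severe = [c for c in cases if c["severe"]]
    mild = [c for c in cases if not c["severe"]]
    n_severe = round(target_fraction * len(cases))
    # Sample with replacement within each stratum, then pool the two strata.
    return (rng.choices(severe, k=n_severe)
            + rng.choices(mild, k=len(cases) - n_severe))


def mean_interval(sample):
    """Stand-in statistic; the real analysis refits the full model instead."""
    return sum(c["interval"] for c in sample) / len(sample)


rng = random.Random(0)
# Synthetic later-period data: the interval values are made up for illustration.
cases = [{"severe": i < 18, "interval": rng.gammavariate(2.0, 3.0)}
         for i in range(745)]

target = 8 / 77  # share of severe cases in the earlier period, about 10%
means = [mean_interval(enriched_bootstrap(cases, target, rng))
         for _ in range(30)]  # 30 bootstrap runs, as in the response
print(min(means), max(means))
```

      Comparing the spread of these enriched estimates with that of an ordinary bootstrap on the same data indicates how much the over-representation of severe cases could shift the statistic.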

      2. The analysis assumes the incubation period follows a fixed distribution, whose parameterization comes from a meta-analysis of previously estimated incubation periods. But p.5 discusses the idea that observed incubation periods are affected by the same dynamical biases as forward serial intervals, "For example, when the incidence of infection is increasing exponentially, individuals are more likely to have been infected recently. Therefore, a cohort of infectors that developed symptoms at the same time will have shorter incubation periods than their infectees on average, which will, in turn, affect the shape of the forward serial-interval distribution." Has the incubation period been adjusted for these dynamical biases, or should it be?

      In our analysis we use the incubation period distribution from Xin et al. 2021 which already considers the backward bias caused by the expanding epidemic with the corrected growth rate of 0.1/d. Xin et al. showed in their meta-analysis that the mean incubation period reported by the various sources changed according to the dates used by the source. Incubation periods prior to the peak of the epidemic in China were lower than ones from after the peak, in a manner that coincided with the backward correction they performed (using a similar derivation to that suggested by Park et al. 2021). Accordingly, the distribution of incubation period they report is the intrinsic incubation period, after correction for the growth rate of the initial spread in China. We added two sentences in our methods section to clarify this point:

      “In their meta-analysis, Xin et al. found an increase of the incubation period following the introduction of interventions in China, matching the theoretical framework shown above. Their inferred incubation period distribution includes a correction for the growth rate of the early spread, accordingly.”

      Furthermore, we perform a sensitivity analysis for the shape of the incubation period distribution, and show that it has a minor effect on our conclusions (Appendix 1—figure 10).
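
      The backward bias discussed in this reply can be illustrated numerically. During exponential growth at rate r, the incubation periods observed in a cohort with the same symptom-onset time follow a density proportional to f(x) * exp(-r * x), where f is the intrinsic density. The sketch below assumes a gamma-shaped intrinsic distribution with made-up parameters (shape 3, scale 2 days); only the growth rate of 0.1 per day is taken from the text.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def backward_mean_numeric(shape, scale, r, upper=150.0, steps=150_000):
    """Mean of the density proportional to f(x) * exp(-r * x), midpoint rule."""
    dx = upper / steps
    num = den = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        w = gamma_pdf(x, shape, scale) * math.exp(-r * x)
        num += x * w
        den += w
    return num / den


def backward_mean_closed_form(shape, scale, r):
    # Tilting a gamma density by exp(-r * x) gives another gamma with the
    # same shape and a reduced scale of scale / (1 + r * scale).
    return shape * scale / (1.0 + r * scale)


shape, scale, r = 3.0, 2.0, 0.1  # intrinsic mean is shape * scale = 6 days
print(shape * scale,
      backward_mean_closed_form(shape, scale, r),
      backward_mean_numeric(shape, scale, r))
```

      Under these assumed parameters the observed mean is shorter than the intrinsic mean, which is the direction of bias that a growth-rate correction of this kind is meant to undo.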

      3. It appears that correlation parameter estimates co-vary with estimates of the mean generation interval (Fig. S6; S13b). Are the authors confident that the correlation parameter is identifiable? How much would the median generation interval estimate in the main analysis change if the correlation parameter had been fixed to 0 (which isn't realistic) or to 0.5 (which might be plausible)?

      As the reviewer pointed out, the correlation parameter estimates co-vary with estimates of the mean generation interval. We further analyzed this relation following the comment. The analysis is summarized in supplemental figures S19-20.

      We first examine the relation between the mean generation interval and the correlation parameter based on the uncertainty estimates, consisting of 1000 bootstrap runs. Appendix 1—figure 12 shows a joint bivariate scatter plot of the parameters, together with contours of equal probability. As can be seen, there is a connection between the parameters. The estimates are centered around the maximum likelihood estimate, with a correlation parameter of 0.75 and a mean generation interval of 9.7 days. The confidence interval for the correlation parameter of 0.45-0.95 corresponds to mean generation intervals in the range of 8-11 days, supporting the conclusion of this study.

      Next, we reanalyzed the dataset while fixing the correlation parameter, as suggested by the reviewer. Appendix 1—figure 13 shows the estimated mean generation interval for fixed correlation parameters with values of 0, 0.25, 0.5, 0.75, 0.9. For each fixed correlation parameter, we performed 100 bootstrapping runs. As can be seen, the results reflect the same connection that can be seen in Appendix 1—figure 12, with probable values in the range of 8-11 days for correlation parameters in the range of 0.5-0.9. Assuming no correlation would cause underestimation of the mean generation interval, in line with previous literature (Hart, Maini, and Thompson 2021; Park et al. 2022).

      Reviewer #2 (Public Review):

      There have been several estimates of the generation time and serial interval published for SARS-CoV-2, but as the authors note, estimates can be subject to biases including shifted event timing depending on the phase of the epidemic, correlation in characteristics between infector and infectee, and impact of control measures on truncating potential infectiousness. This study, therefore, has several strengths. It first collates data on transmission events from the earliest phase of the COVID-19 pandemic, then makes adjustments for these potential biases to estimate the generation time in absence of control measures, and finally discusses implications for transmission.

      Given many subsequent aspects of the COVID-19 pandemic have been defined relative to earlier phases (e.g. relative transmissibility of variants, relative duration of infectiousness), understanding the baseline characteristics of the infection is crucial. I thought this paper makes a useful contribution to this understanding, generating adjusted estimates for infectiousness (which is longer than previous estimates) and corresponding values for the reproduction number (which is remarkably similar to earlier estimates, presumably because of the different sources of bias in the growth rate and generation time distribution somehow end up canceling each other out).

      However, there are some weaknesses at present. The study correctly flags several potential sources of bias in estimates, but in making adjustments uses estimates from the literature that themselves could suffer from these biases, e.g. the distribution of incubation period from a 2021 meta-analysis. Although the authors conduct some sensitivity analysis it would be worth including some more explicit consideration of whether they would expect any underlying bias to propagate through their calculations. The authors also conduct some sensitivity analysis around the underlying data (e.g. ordering of transmission pairs), but again it would be useful to know whether there could be systematic biases in these early data. Specifically, the paper references Tsang et al (2020), which highlighted variability in early case definitions - is it possible that early generation times are estimated to be longer because intermediate cases in the transmission chain were more likely to go undetected than later in the epidemic?

      We recognize the potential biases in the transmission pairs data. We therefore developed an extensive framework of sensitivity analyses for identifying biases that could substantially affect the results. In the results section and figure 5, we show that the main study result, that the unmitigated generation-interval distribution is longer than previously estimated, is robust to reasonable amounts of ascertainment bias. We discuss this point at length and have added several supplemental figures to support this claim.

      As reviewer #3 mentioned, we conducted a sensitivity analysis for the inclusion of the longest serial intervals, to investigate possible effects of missing links in the longest transmission pairs. We also discuss why we think it’s not necessary to explicitly model the short intervals that may be unobserved due to missing links.

      “Second, we considered the possibility that long serial intervals may be caused by omission of intermediate infections in multiple chains of transmission, which in turn would lead to overestimation of the mean serial and generation intervals. Thus, we refit our model after removing long serial intervals from the data (by varying the maximum serial interval between 14 and 24 days). We also considered “splitting” these intervals into smaller intervals, but decided this was unnecessarily complex, since several choices would need to be made, and the effects would likely be small compared to the effect of the choice of maximum, since the distribution of the resulting split intervals would not differ sharply from that of the remaining observed intervals in most cases.”
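
To make the truncation check concrete, here is a minimal sketch of how an estimate can be tracked as the maximum allowed serial interval is varied between 14 and 24 days. It is an illustration only: the authors refit their full model at each cutoff, whereas this sketch simply recomputes a sample mean, and the function name `mean_by_max_serial_interval` and the synthetic data are assumptions, not part of the paper.

```python
import numpy as np

def mean_by_max_serial_interval(serial_intervals, cutoffs=range(14, 25)):
    """Return {cutoff: mean of intervals <= cutoff} for each cutoff in days."""
    si = np.asarray(serial_intervals, dtype=float)
    return {c: float(si[si <= c].mean()) for c in cutoffs}

# Synthetic, hypothetical data: NOT the transmission pairs analysed in the paper.
rng = np.random.default_rng(1)
synthetic_si = rng.gamma(shape=3.0, scale=3.0, size=500)

means = mean_by_max_serial_interval(synthetic_si)
for cutoff, m in means.items():
    print(f"max serial interval {cutoff:2d} d -> mean {m:.2f} d")
```

Because each larger cutoff only admits intervals longer than those already included, the truncated mean can only stay flat or rise with the cutoff; the size of that rise gives a rough sense of how much the longest intervals drive the estimate.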

      We added to the discussion text regarding the effect of possible bias in the dataset, explicitly specifying the ascertainment bias.

      “Our analysis relies on datasets of transmission pairs gathered from previously published studies and thus has several limitations that are difficult to correct for. Transmission pairs data can be prone to incorrect identification of transmission pairs, including the direction of transmission. In particular, presymptomatic transmission can cause infectors to report symptoms after their infectees, making it difficult to identify who infected whom. Data from the early outbreak might also be sensitive to ascertainment and reporting biases which could lead to missing links in transmission pairs, causing serial intervals to appear longer (For example, people who transmit asymptomatically might not be identified). Moreover, when multiple potential infectors are present, an individual who developed symptoms close to when the infectee became infected is more likely to be identified as the infector. These biases might increase the estimated correlation of the incubation period and the period of infectiousness. We have tried to account for these biases by using a bootstrapping approach, in which some data points are omitted in each bootstrap sample. The relatively narrow ranges of uncertainty suggest that the results are not very sensitive to specific transmission pairs data points being included in the analysis. We also performed a sensitivity analysis to address several potential biases such as the duration of the unmitigated transmission period, the inclusion of long serial intervals in the dataset, and the incorrect ordering of transmission pairs (see Methods). The sensitivity analysis shows that although these biases could decrease the inferred mean generation interval, our main conclusions about the long unmitigated generation intervals (high median length and substantial residual transmission after 14 days) remained robust (Figure 5).”

      It would also be helpful to have some clarifications about methodology, particularly in how the main results about generation time are subsequently analyzed. For example, estimates such as the conversion of generation time to R0 and VOC scalings are described very briefly, so it is currently unclear exactly how these calculations are being performed.

      Following the reviewer comments we made edits to the Methods section in order to make it more readable and clearer. We added subheadings for the various sections. Moreover, we added a section explaining the derivation of the basic reproduction number and clarified the section regarding the VOCs extrapolations.
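
The revised Methods section is not reproduced here, so the following is only a sketch of the standard Euler-Lotka relation that is commonly used for this conversion, R0 = 1 / E[exp(-r g)], where r is the exponential growth rate and g the generation interval. Whether the authors use exactly this form is an assumption, and the gamma shape, scale, and growth rate below are hypothetical values chosen for illustration rather than estimates from the paper.

```python
import numpy as np

def r0_from_gamma(r, shape, scale):
    """R0 = 1 / M(-r), where M is the moment generating function of a
    gamma-distributed generation interval: M(-r) = (1 + r * scale) ** -shape."""
    return (1.0 + r * scale) ** shape

def r0_from_samples(r, generation_intervals):
    """Same relation evaluated empirically: R0 = 1 / mean(exp(-r * g))."""
    g = np.asarray(generation_intervals, dtype=float)
    return 1.0 / np.mean(np.exp(-r * g))

# Hypothetical inputs chosen for illustration, not estimates from the paper.
r = 0.1                    # exponential growth rate per day
shape, scale = 2.0, 5.0    # gamma generation interval with mean 10 days

rng = np.random.default_rng(0)
samples = rng.gamma(shape, scale, size=200_000)
print(r0_from_gamma(r, shape, scale), r0_from_samples(r, samples))
```

The sample-based version is useful when the generation-interval distribution is available only as draws (for example, from a bootstrap), while the closed form applies only under the gamma assumption.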

      Reviewer #3 (Public Review):

      Sender & Bar-On et al. perform robust analyses of early SARS-CoV-2 line list data from China to estimate the intrinsic generation interval in the absence of interventions. This is an important topic, as most SARS-CoV-2 data are from periods when transmission-reducing interventions are in place, which will lead to underestimation of the potential infectious period.

      The authors highlight two shortcomings in previous approaches. First, the distribution of 'observed' serial intervals (the time between symptom onset in the infector and symptom onset in the infectee) depends not only on the timeline of each infector's infection, but also the epidemic growth rate, which weights the proportion of observed short vs. long serial intervals. The authors argue that by accounting for this weighting, more accurate estimates of the intrinsic generation interval - the metric on which isolation policies are based - can be obtained. Second, the authors find that the original SARS-CoV-2 generation interval distribution has both a higher mean and longer tail than previous estimates when using only data prior to the introduction of interventions. Finally, the authors use publicly available data on viral load trajectories to extrapolate their estimates to other SARS-CoV-2 variants, finding that alpha, delta, and omicron may have shorter generation intervals than original SARS-CoV-2. These findings are important, as case isolation policies are based on assumptions for how long individuals remain infectious. More broadly, these methods will be important for future work to correctly estimate generation intervals in other outbreaks.

      The conclusions are well supported by the data, and a suite of sensitivity analyses give confidence that the findings are robust to deviations from many of the key assumptions. The code is well documented and publicly available, and thus the findings are easily reproducible. Key strengths of the paper include the clarity and rigor of the modeling methods, and the exhaustive consideration of potential biases and corresponding sensitivity analyses - it is very difficult to think of potential biases that the authors have not already considered! I think this is a well-written and well-executed study. The work is likely to be impactful for reconsidering SARS-CoV-2 isolation policies and revisiting generation interval estimates from other data sources. I also expect this to be a key reference and method for future studies estimating the generation interval.

      I have some minor comments on potential weaknesses and interpretation:

      1. Uncertainty in early generation interval estimates. One of the conclusions is that the estimated mean generation interval is longer than the observed mean serial interval. However, this conclusion does not seem justified given that the observed mean serial interval (9.1 days) is well within the 95% CI of 8.3-11.2 days for the mean generation interval. The confidence intervals for the serial interval in figure 2 are also wide for pre-Jan 17th (though presumably these would be reduced if all pre-Jan 17th serial intervals were combined). Further, only 77 of the ~1000 transmission pairs are actually from pre-January 17th. The actual sample size used for these estimates is much smaller than suggested by Figure S1 and thus this should be made clear. Therefore, although the intuition for why observed serial intervals may differ from the generation interval is correct, I do not think that the data alone demonstrate this. A related issue is on ascertainment bias - could the early serial interval data be biased longer because ascertainment is initially poor and thus more intermediate infectors are missed? The authors consider removing particularly long serial intervals to try and account for this, but that does not deal with e.g. chains of multiple short serial intervals being incorrectly recorded as a single long serial interval (but still within 16 days).

      We agree with the reviewer that, due to the large uncertainty, we cannot deduce that the mean generation interval is longer than the mean serial interval. We changed the phrasing to emphasize that this statement is supported by mathematical theory.

      “We note that our estimated mean generation-interval is longer than the observed mean serial-interval (9.1 days) of the period in question. This is supported by the theory (Park et al. 2021) of the dynamical effects of the epidemic -- in contrast to the common assumption that the mean generation and serial intervals are identical. During the exponential growth phase, the mean incubation period of the infectors is expected to be shorter than the mean incubation period of the infectee - this effect causes the mean forward serial interval to become longer than the mean forward generation interval of the cohorts that developed symptoms during the study period. However, these cohorts of infectors with short incubation periods will also have short forward generation (and therefore serial) intervals due to their correlations. When the latter effect is stronger, the mean forward serial interval becomes shorter than the mean intrinsic generation interval, as these findings suggest.”

      Following the comment, we added to Figure S1 the filtering according to date, to reflect the true sample size we use for the main analysis (We renamed it: Appendix 1—figure 1).

      We recognize the potential biases in the transmission pairs data. We therefore developed an extensive framework of sensitivity analyses for identifying biases that could substantially affect the results. In the results section and figure 5, we show that the main study result, that the unmitigated generation-interval distribution is longer than previously estimated, is robust to reasonable amounts of ascertainment bias. We discuss this point at length and have added several supplemental figures to support this claim.

      As reviewer #3 mentioned, we conducted a sensitivity analysis for the inclusion of the longest serial intervals, to investigate possible effects of missing links in the longest transmission pairs. We also discuss why we think it’s not necessary to explicitly model the short intervals that may be unobserved due to missing links.

      “Second, we considered the possibility that long serial intervals may be caused by omission of intermediate infections in multiple chains of transmission, which in turn would lead to overestimation of the mean serial and generation intervals. Thus, we refit our model after removing long serial intervals from the data (by varying the maximum serial interval between 14 and 24 days). We also considered “splitting” these intervals into smaller intervals, but decided this was unnecessarily complex, since several choices would need to be made, and the effects would likely be small compared to the effect of the choice of maximum, since the distribution of the resulting split intervals would not differ sharply from that of the remaining observed intervals in most cases.”

      We added to the discussion text regarding the effect of possible bias in the dataset, explicitly specifying the ascertainment bias.

      “Our analysis relies on datasets of transmission pairs gathered from previously published studies and thus has several limitations that are difficult to correct for. Transmission pairs data can be prone to incorrect identification of transmission pairs, including the direction of transmission. In particular, presymptomatic transmission can cause infectors to report symptoms after their infectees, making it difficult to identify who infected whom. Data from the early outbreak might also be sensitive to ascertainment and reporting biases which could lead to missing links in transmission pairs, causing serial intervals to appear longer (For example, people who transmit asymptomatically might not be identified). Moreover, when multiple potential infectors are present, an individual who developed symptoms close to when the infectee became infected is more likely to be identified as the infector. These biases might increase the estimated correlation of the incubation period and the period of infectiousness. We have tried to account for these biases by using a bootstrapping approach, in which some data points are omitted in each bootstrap sample. The relatively narrow ranges of uncertainty suggest that the results are not very sensitive to specific transmission pairs data points being included in the analysis. We also performed a sensitivity analysis to address several potential biases such as the duration of the unmitigated transmission period, the inclusion of long serial intervals in the dataset, and the incorrect ordering of transmission pairs (see Methods). The sensitivity analysis shows that although these biases could decrease the inferred mean generation interval, our main conclusions about the long unmitigated generation intervals (high median length and substantial residual transmission after 14 days) remained robust (Figure 5).”

      2. Frailty of using viral loads to extrapolate generation intervals. The authors take the observation that variants of concern demonstrate faster viral clearance on average to estimate shorter generation intervals for alpha, delta, and omicron. The authors rightly point out in the discussion that using viral load as a proxy for infectiousness has many limitations. I would emphasize even further that it is very difficult to extrapolate from viral load data in this way, as infectiousness appears to vary far more between variants than can be explained by duration positive or peak viral load. Other factors are potentially at play, such as compartmentalization in the respiratory tract, aerosolization, receptor binding, immunity, etc. Further, there is considerable individual-level variation in viral trajectories and thus the use of a population-mean model overlooks a key component of SARS-CoV-2 infection dynamics. An important reference, which came out recently and thus makes sense to have been missed from the initial submission, is Puhach et al. Nature Medicine 2022 https://doi.org/10.1038/s41591-022-01816-0.

      We agree with the reviewer about the frailty of using viral loads to extrapolate generation intervals. We therefore expanded our discussion of the limitation of using viral load data for inferring infectiousness including many of the points mentioned by the reviewer. We use viral load data in the most minimal way to try to enable some discussion of new VOC, and try to emphasize the needed caution.

      Viral load trajectories data have potential for informing estimates of the infectiousness profile. However, the relationship between viral load, culture positivity, symptom onset, and infectivity is complex and not well characterized. Due to this limitation we tried to use viral loads in a more limited way, extrapolating our results to variants of concern (which lack unmitigated transmission data). Following the comment, we added a detailed discussion of the limitations of using viral loads as a proxy for infectiousness, including the variation of viral loads across individuals. We also added supplementary figures (Figure 6—figure supplements 1-2) to show the possible effect of an individual's viral loads in relation to the infectiousness and for comparison with new viral load and culture results (Chu et al. 2022; Killingley et al. 2022). As the viral load trajectories data for the different VOCs are given only as a function of time from the onset of symptoms, it is not possible to directly link them to the fraction of transmission post 14 days from infection. We made changes to Figure 6 to clarify the possible connection of viral load with the TOST (time from symptoms onset to transmission) distribution and the resulting extrapolation to the unmitigated generation-interval distributions.

      “SARS-CoV-2 viral load trajectories serve an important role in understanding the dynamics of the disease and modeling its infectiousness (Quilty et al. 2021; Cleary et al. 2021). Indeed, the general shapes of the mean viral load trajectories and culture positivity, based on longitudinal studies, are comparable with our estimated unmitigated infectiousness profile (Figure 6—figure supplements 1-2, comparison with (Chu et al. 2022; Killingley et al. 2022; Kissler et al. 2021)). However, the nature of the relationship between viral load, culture positivity, symptom onset, and real-world infectivity is complex and not well characterized. Therefore, the ability to infer infectiousness from viral load data is very limited, especially near the tail of infectiousness, several days following symptom onset and peak viral loads. Viral load models are usually made to fit the measurements during an initial exponential clearance phase and in many cases miss a later slow decay (Kissler et al. 2021). Furthermore, there is considerable individual-level variation in viral trajectories that isn’t accounted for in population-mean models (Kissler et al. 2021; Singanayagam et al. 2021). Other factors limiting the ability to compare generation-interval estimates with viral loads models are the variability of the incubation periods and its relation to the timing of the peak of the viral loads, and the great uncertainty and apparent non-linearity of the relation between viral loads and culture positivity (Jaafar et al. 2021; Jones et al. 2021). Due to these caveats and in order to avoid over interpretation of viral load data, we restrict our extrapolation of new VOCs’ infectiousness to a single parameter characterizing the viral duration of clearance.”

      We also edited another paragraph in the discussion:

      “Our extrapolations are necessarily crude given the complex relationship between viral load, symptomaticity, and infectiousness discussed above. Moreover, compartmentalization in the respiratory tract, aerosolization, receptor binding affinity, and immune history can also play important roles in determining the infectiousness profiles of SARS-CoV-2 variants (Puhach et al. 2022).”

      3. Lack of validation with other datasets. This study hinges on data from a single setting in a short window of time. Although the data are from multiple publications, the fact that so many reported the same transmission pair data demonstrates that these are overlapping datasets. As the authors note, there are potential biases e.g., ascertainment rates and behavioral changes which will impact the generation interval estimates. Thus, generalizability to other settings is limited.

      We agree with the reviewer that the dataset used in our study is limited, and consists of overlapping transmission pairs. We perform some analysis of the possible bias caused by inclusion of each dataset, as can be seen in Appendix 1—figure 4.

      The best validation would have been a comparison with another independent dataset from the early spread of the epidemic, but no such dataset exists. We added a sentence to the discussion to emphasize this point.

      “Due to the nature of early spread of a new unknown disease it is nearly impossible to find two completely unrelated datasets from the period prior to mitigation, limiting the ability of further validation of the current results.”

      4. The impact of epidemic dynamics on infector vs. infectee serial intervals. It took me a long time to get my head around the assertion that the forward serial interval distribution will be longer during epidemic growth due to the overrepresentation of short incubation periods among infectors relative to infectees. A supplementary figure, similar to the way Figure 1 is laid out, to illustrate this concept may go a long way to aid the reader's understanding.

      We added an explanation to the paragraph in order to make it clearer:

      “A cohort of individuals that develop symptoms on a given day is a sample of all individuals who have been previously infected. When the incidence of infection is increasing, recently infected individuals represent a bigger fraction of this population and thus are over-represented in this cohort. Therefore, we are more likely to encounter infected individuals with a short incubation period in this cohort compared to an unbiased sample. The forward serial-interval is calculated for a cohort of infectors who developed symptoms at the same time and therefore is sensitive to this bias. These dynamical biases are demonstrated using epidemic simulations by Park et al.”
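
The cohort effect described in this paragraph can be illustrated with a few lines of simulation. This is a minimal sketch under stated assumptions, not the simulation of Park et al. or any code from the paper: infections are assumed to grow exactly exponentially at rate `r`, incubation periods are assumed gamma-distributed and independent of infection time, and every parameter value and the function name `cohort_incubation_means` are hypothetical.

```python
import numpy as np

def cohort_incubation_means(r=0.2, n=400_000, t_max=60.0,
                            shape=4.0, scale=1.5, seed=0):
    """Compare the intrinsic mean incubation period with the mean observed
    in a cohort that develops symptoms on the same day of a growing epidemic."""
    rng = np.random.default_rng(seed)
    # Infection times with density proportional to exp(r * t) on [0, t_max]
    # (inverse-CDF sampling).
    u = rng.random(n)
    t_infection = np.log1p(u * np.expm1(r * t_max)) / r
    # Incubation periods drawn independently of infection time.
    incubation = rng.gamma(shape, scale, size=n)
    t_onset = t_infection + incubation
    # Cohort: everyone whose symptoms start during the final simulated day.
    cohort = (t_onset >= t_max - 1.0) & (t_onset < t_max)
    return incubation.mean(), incubation[cohort].mean()

intrinsic_mean, cohort_mean = cohort_incubation_means()
print(f"intrinsic mean incubation: {intrinsic_mean:.2f} d")
print(f"cohort mean incubation:    {cohort_mean:.2f} d")
```

Under these assumptions the cohort mean should be close to shape * scale / (1 + r * scale), which is shorter than the intrinsic mean shape * scale whenever r > 0: recently infected people with short incubation periods are over-represented among those developing symptoms on a given day.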

      5. Simulations to illustrate concepts and power. Given the assertion that observed serial intervals will depend on epidemic growth rates, reporting, and timing of interventions, I think a simple simulation to illustrate some of these ideas would be very helpful. For example, a simple agent-based model with simulated infectivity profiles and incubation periods using the estimated bivariate distribution would be extremely helpful in illustrating how serial intervals and estimates of the generation interval can differ from the true intrinsic generation interval (I coded such a simulation to help me understand this paper in a couple of hours with <100 lines of R code, so I do not think this would be much work). This would also be very helpful for illustrating statistical power re. comment 1.

      The current paper is based on a strong theoretical foundation provided by previous works, specifically Park et al. 2021, which used simulations similar to the reviewer’s suggestions to demonstrate the dynamical biases. We now mention these simulations in the introduction section:

      “These dynamical biases are demonstrated using epidemic simulations by Park et al.”

    1. Any available documents regarding student-led activism on cam

      I don't know if this would count as one of the sources we should be using, but perhaps you could also look into which schools offer on-campus gender-affirming health care. Overall, I think the project pitch is well thought out and organized, with a good plan of action in place. The research question is specific enough to produce intriguing results (that are not too general) and may also make it easier to know what to search for when it comes to sources.

    1. Author Response

      Reviewer #1 (Public Review):

      Bohère, Eldridge-Thomas and Kolahgar have studied the effect of mechanical signalling in tissue homeostasis in vivo, genetically manipulating the well known mechano-transductor vinculin in the adult Drosophila intestine. They find that loss of vinculin leads to accelerated, impaired differentiation of the enteroblast, the committed precursor of mature enterocytes, and stimulates the proliferation of the intestinal stem cell. This leads to an enlarged intestinal epithelium. They discriminate that this effect is mediated through its interaction with alpha-catenin and the reinforcement of the adherens junctions, rather than with talin and integrin-mediated interaction with the basal membrane. This result aligns well, as the authors note, with previous observations from Choi, Lucchetta and Ohlstein (2011) doi:10.1073/pnas.1109348108. Bohère et al then explore the impact that disrupting mechano-transduction has on the overall fitness of the adult fly, and find that vinculin mutant adult flies recover faster after starvation than wild types.

      The main conclusions of the paper are convincing and informative. Some important results would benefit from a more detailed description of the phenotypes, and others could have alternative explanations that would warrant some additional clarification.

      1) - Interpretation of phenotypes in vinc[102.1] mutants

      The paper presents several adult phenotypes of the homozygous viable, zygotic null mutant vinculin[102.1], where the fly gut is enlarged (at least in the R4/5 region). In many cases, they correlate this phenotype with that of RNAi knockdown of vinculin in the gut induced in adult stages. This is a perfectly valid approach, but it presents the difficulty of interpretation that the zygotic mutant has lacked vinculin throughout development and in every fly tissue, including the visceral mesoderm that wraps the intestinal epithelium and that also seems enlarged in the vinc[102.1] mutant. So this phenotype, and others reported, could arise from tissue interactions. To me, the quickest way to eliminate this problem would be to express vinculin in ISCs and/or EBs in the vinc[102.1] background, either throughout development or after pupariation or emergence, and observe a rescue.

      We agree with the reviewer that we cannot exclude additional vinculin role(s) in other tissues during or after development that might have an impact on the intestinal epithelium. Our attempts to express a full-length Vinculin construct (Maartens et al, 2016) in the vinc102.1 flies, either in adulthood or throughout development, were not very conclusive: although we observed some degree of rescue, it was not fully penetrant. This was in contrast to the complete rescue observed with the genomic rescue of vinculin. Thus, it is possible that some form of tissue interaction contributes to the phenotype observed, for example if vinculin loss affects muscle structure. Alternatively, just like it was shown that too much active vinculin is detrimental to the fly (Maartens et al, 2016), our experiment suggests that too much vinculin may be deleterious to the intestine.

      In any case, because of cell-specific knockdowns in the adult gut, we are confident that EB reduction of vinculin levels or activity is sufficient to accelerate tissue turnover, at least in a specific portion of the posterior midgut. We have amended the text to acknowledge a role for tissue interactions (see page 6 (end of first paragraph), page 7 (start of last paragraph), page 12 (starvation experiments)).

      An experiment where this is particularly difficult is with the starvation/refeeding experiment. The authors explored whether the disruption of tissue homeostasis, as a result of vinculin loss, matters to the fly. So they tested whether flies would be sensitive to starvation/re-feeding, where cellular density changes and vinculin mechano-sensing properties may be necessary. They correctly conclude that mutant flies are more resistant to starvation, and suggest that this may be due to the fact that intestines are larger and therefore more resilient. However, in these animals vinculin is absent in all tissues. It is equally likely that the resistance to starvation was due to the effect of Vinculin in the fat body, ovary, brain, or other adult tissues singly or in combination. The fact that the intestine recovers transiently to a size slightly larger than that of the fed flies seems anecdotal, considering the noise within the timeline of fed controls. I am not sure this experiment is needed in the paper at all, but to me, the healthy conclusion from this effort is that more work is needed to determine the impact of vinculin-mediated intestinal homeostasis in stress resistance, and that this is out of the scope of this paper.

      Please see the new data presented in Figure 8A-B (text page 12).

      2) - Cell autonomy of the requirement of Vinculin and alpha-Catenin

      Authors interpret that Vinculin is needed in the EB to maintain mechanical contact with the ISC, restrict ISC proliferation through contact inhibition, and maintain the EB quiescent. This interpretation explains seemingly well the lack of an obvious phenotype when knocking down vinculin in ISCs only, while knockdown in ISCs and EBs, or EBs only, does lead to differentiation problems. It also sits well with the additional observation that vinculin knockdown in mature ECs does not have an obvious phenotype. However, a close examination makes the results difficult to explain with this interpretation only. If the authors were correct, one would expect that in mutant clones, eventually, vinculin-deficient EBs will be produced, which should mis-differentiate and induce additional ISC proliferation. However, the clones only show a reduction in ISC proportions; the most straight forward interpretation of this is that vinculin is cell-autonomously necessary for ISC maintenance (which is at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers).

      We apologise that we were unclear in the text. With hindsight, the confusion may have been caused by our describing the phenotype of MARCM clones before reporting the accumulation of EBs in the vinc102.1 guts. Therefore, we swapped these two sections and improved the description of these experiments in the manuscript (see section: “The pool of enterocyte progenitors expands upon vinculin depletion” pages 6-8).

      In brief, we do not think that our results are at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers - we realise the text was misleading and hope to have clarified our observations in the revised manuscript (pages 7 and 8). From cell conditional RNAi experiments, like the reviewer, we would predict that vinculin knockdown or loss of function in mitotic clones (MARCM experiments, Figure 4E-G) will induce accelerated differentiation of vinculin deficient enteroblasts, which in turn will increase proliferation. We observed that vinc102.1 or vinc RNAi mitotic clones contained a similar number of cells compared to control clones, but a reduced proportion of stem cells (Figure 4G). We interpret this as indicating that to maintain an equivalent clone size, stem cells must have divided more frequently, with some divisions producing two differentiated daughter cells. This type of symmetric division would increase the EB pool (as seen in Figure 4-figure supplement 2B), at the expense of the ISC population, in turn decreasing long term clonal growth potential. Altogether, the results obtained with MARCM clones highlight changes in tissue dynamics compatible with those observed with cell-specific vinculin knockdowns.

      Also, from the authors' interpretation, it would follow that the phenotype of vinculin knockdown in ISCs+EBs and in EBs only should be the same. However, in ISCs+EBs vinculin knockdown, differentiation accelerates, which is likely accompanied by increased proliferation (judging by the increase in GFP area; PH3 staining would be more definitive).

      Indeed, the accelerated differentiation observed with esgGal4>UAS VincRNAi is accompanied by increased proliferation with the two independent RNAi lines used. We have added this result in Figure 1-figure supplement 1G (and in text, page 5).

      This contrasts with the knockdown only in EBs, which leads to accumulation of EBs due to misdifferentiation, and increased proliferation, mostly of ISCs, as measured directly with PH3 staining, but not additional late EBs or mature ECs. The authors call this "incomplete maturation due to accelerated differentiation". I think that one should expect to find incomplete differentiation/maturation when the rate of the process is very slow, not the other way around. To me, these are different phenotypes, which could perhaps be explained if vinculin was also needed in the ISC to transmit tension to the EB and prevent its differentiation, and removing it only in the EB may be revealing an additional, cell-autonomous requirement in maturation.

      When vinculin is knocked down in EBs, cells appear bigger than controls (as judged by the RFP+ nuclei in Figure 5E). This, compared to the yw and vinc102.1 guts shown in Figure 4D, suggests that these cells are more advanced in their differentiation. We have removed the sentence, so as not to confuse the reader, and clarified the text (see page 8). The discrepancy in the differentiation index between the esgGal4 and KluGal4 experiments might result from differences in the drivers, or an additional role of vinculin in EC differentiation, which we now mention in the text (page 8).

      So far, we have no evidence to support the idea that vinculin is also needed in the ISC to transmit tension to the EB and prevent its differentiation. For example, we saw no phenotype when we knocked down vinculin specifically in ISCs (Figure 3): notably, there was no increase in ISC ratio and no increase in cell density (unlike the reduction seen in Figure 1F with ISC+EB knockdown).

      Another unexpected result, considering the authors' interpretation, is that the overexpression of activated Vinculin (vinc[CO]) does not seem to have much of an effect. It does not change the phenotype of the wild type (where there is very little basal turnover to begin with) and it only partially rescues the phenotype of the vinc[102.1] mutants, whereas the rescue transgene vinc:RFP does. This again suggests that there may be tissue interactions, in development or adulthood, that may explain the vinc[102.1] phenotypes. It could also be that this incomplete rescue is due to the deleterious effect of Vinc[CO] (this is another reason for doing the vinc[102.1]; esg-Gal4; UAS-vincFL experiments suggested above). An alternative experiment to perform this rescue would be to knock down the vinculin gene while overexpressing the Vinc[CO] transgene - this may be possible with the RNAi HSM02356, which targets the vinculin 3'UTR and is unlikely to affect UAS-vinc[CO].

      Please refer to essential point 2c; as VincCO is not a simple overactive protein, like a constitutively active kinase, additional effects in the tissue can be expected.

      The claims of the authors would be more solid if the reporting of the phenotypes was more homogeneous, so one could establish comparisons. Sometimes conditions are analysed by differentiation index, others by extension of the GFP domains, others with phospho-histone-3 (PH3), others through nuclear size or density, and combinations. I do not think the authors should evaluate all these phenotypes in all conditions, but evaluating mitotic index and abundance of EBs and "activated EBs/early ECs" to monitor proliferation and differentiation rates should be done across the board (ISC, ISC+EB, EB drivers).

      To improve consistency, in all conditions we have compared cell types ratios and cellular density upon vinculin knockdown: see Figure 1E-F for ISC+EB, Figure 3B-C for ISC, and Figure 5 – figure supplement 1C-E for EB (with panel E newly added). As we did not observe any effect on ratio or density, we did not monitor cell proliferation for ISC knockdown.

      Nonetheless, we added the mitotic index for the ISC+EB driver (new Figure 1- figure supplement 1G) to be consistent with the results from the EB driver (Figure 5- figure supplement 1C).

      If the primary role of Vinculin is to induce contact inhibition in the ISC from the EB and prevent the EB differentiation and proliferation, one would expect that over expression of Vinc[CO] (or perhaps VincFL or sqhDD) in EBs should prevent or delay the differentiation and proliferation induced by a presumably orthogonal factor, like infection with Pseudomonas entomophila or Erwinia carotovora.

      This is indeed an exciting prediction, but outside the scope of this manuscript.

      3) - Relationship between Vinculin and alpha-Catenin

      The authors establish a very clear difference in the phenotypes between focal adhesion components and Vinculin, whereas the similarity of alpha-catenin and vinculin knockdowns is very compelling. Therefore I am sure the authors are on the right path with their interpretation of this part of the paper. However, some of the alpha-Catenin experiments are not very clear. The result from the rescue experiment of alpha-Cat knockdown with alpha-Cat-deltaM1b does not seem to show what the authors claim, and differentiation does not seem affected, only the amount of extant older ECs (which may be due to other reasons as this is a non-autonomous effect).

      Like the reviewer, we were surprised about the milder rescue with M1b compared to M1a and are unsure of the reasons for this. Nevertheless, quantifications of the differentiation and retention indices show significant differences for M1a and M1b compared to the FL control (Figure 6F-G), with phenotypes resembling the vinc knockdown. In Figure 6E, we have added a row of zoomed views to better highlight the similarity of phenotype between M1a and M1b and have acknowledged the mild differences in the text (bottom of page 9). For the sake of rigour, we think it is important to include results from both M1 deletions, even if there is not yet a logical reason to explain why they have different effects.

      Ulrich Tepass produced a UAS-alpha-catenin construct with the full deletion of the M1 region, perhaps that would show a clearer phenotype.

      This is a good suggestion; however, for technical reasons this is not possible. The strategy devised by Ken Irvine and his group relies on rescuing the RNAi with an RNAi-resistant construct, which is not the case for the constructs generated in the Tepass lab. Furthermore, we cannot adopt a MARCM strategy as α-cat is too close to the centromere (80F).

      Also, the autonomy of the phenotype is difficult to address with these experiments alone. It would be expected that the phenotype of alpha-catenin knockdown should be similar to that of vinculin knockdown in the ISCs only or EBs only.

      This is not what our understanding of cadherin-mediated adhesion would predict. Forming cadherin adhesions requires cadherins and catenins in both cells, so we would expect similar phenotypes in ISCs only and EBs only. What is exciting about our findings is that the mechanosensitive machinery is not equally important in the two adherent cells, i.e. the EB is using vinculin to measure force on the contact and regulate differentiation, whereas the ISC needs to resist that force, but does not use vinculin to sense that force and regulate its behaviour.

      We have added new data showing the role of the vinculin/α-catenin interaction in ISCs or EBs by co-expressing α-Cat RNAi and α-Cat ΔM1a. We observed that absence of VBS in α-catenin has no effect in ISCs but promotes EB differentiation and increase in numbers (new Figure 6 – figure supplement 2), similar to our observations with vincRNAi (see text page 10).

      Reviewer #2 (Public Review):

      Vinculin functions as an important structural bridge that connects cadherin and integrin-mediated adhesions to the F-actin cytoskeleton. This manuscript carefully examined the mutant phenotype of vinc in the Drosophila intestine and found that vinc mutation in EBs causes significant increases in EB to EC differentiation, stem cell proliferation, and tissue growth. By analyzing the mutant phenotype of the cadherin adaptor alpha-catenin, the authors suggest that vinc functions through the cell-cell junctions instead of cell-ECM adhesions in EBs. Finally, manipulation of myosin activity in EBs phenocopies the vinc mutant, suggesting that vinculin is regulated by the mechanical tension transduced through the cytoskeleton.

      The authors claim that the vinculin mutant phenotype is opposite to that caused by loss of the major integrin components, suggesting a function independent of the cell-ECM adhesions. However, the phenotypes of vinc and integrin may not be completely opposite. Besides loss of ISCs, both mys and talin knockdown in ISCs clearly cause ISC differentiation into EC cells (Fig.3A), suggesting a possible involvement of integrin in EB to EC differentiation. Therefore, it will be important to test the phenotype of integrin KD in EBs using EB-specific Gal4.

      The reviewer raised an important point. To test this we had to overcome the ISC defect of mys or talin RNAi, and specifically tested their function in enteroblasts using the KluGal4 driver. This revealed a similar phenotype of accelerated differentiation, assayed with the ReDDM system (see new Figure 6-figure supplement 4). Thus, as the reviewer suggested, both integrins and cadherins function in this process, and we have amended the text to indicate this (see page 10, and sentence in the discussion page 12). It appears however that, unlike vinculin, they also have a key role in ISCs.

      The authors proposed a model that the cell-cell adhesion between ISC and EBs is required for vinculin mediated differentiation suppression. However, this model is not directly supported by the data as the EB-ISC adhesion and EB-EC adhesion have not been tested separately.

      This is an important point and we have amended the text to address this.

      We have focussed our model on EB-ISC adhesion as the adherens junctions are stronger between progenitor cells than between EBs and ECs, and because of previous data from the Ohlstein lab (Choi et al, 2011) demonstrating the relationship between adherens junction stability and EB differentiation/ISC proliferation. Nonetheless we agree it is possible that EB-EC adhesion might contribute to this mechanism and have modified the last sentence of the Results section (page 12) and the legend associated with the model (Figure 8) to take this into account.

      In addition, previous short-term manipulation of E-cadherin in ISCs and EBs shows no change in cell proliferation (Liang J. et al. 2017), which seems to contradict the authors' model. To support the authors' conclusion, long-term manipulation of E-cadherin in ISCs and EBs must be tested.

      A main feature of the vinculin phenotype is the regional accelerated differentiation observed in R4/5, potentially reflecting areas more subject to mechanical forces. Strikingly, this accelerated differentiation is rarely observed more anteriorly (such as region R4a/b studied in Liang et al, 2017). In fact, these regional differences were previously reported with E-cadherin knockdown by the Adachi-Yamada group (see Figure S1, Maeda et al, 2008). This highlights the importance of considering regional control of cell fate for the field.

      To test our hypothesis further, we have knocked down E-cadherin and α-catenin in EBs only (with Klu-Gal4). As shown in new Figure 6-figure supplement 3, we observed an accumulation of EBs as early as 3 days after induction, reminiscent of vinculin loss of function phenotype. Longer E-cadherin EB knock-down with KluGal4 appears particularly detrimental for survival as all flies died after 4 days of continuous RNAi expression preventing any further observations (see new text page 10). These observations support our model that junctional stability slows down EB differentiation. Our results are also in agreement with the work described in Choi et al (2011), whereby after 6 days of E-Cadherin RNAi expression in progenitors or EBs (using a different driver from us, Su(H)Gal4), the mitotic index increases, showing a feedback regulation on ISC proliferation. Therefore, our work and the Liang et al 2017 study are not in fact contradictory: the differences in the contribution of junctions to tissue dynamics might reflect the variety of molecular mechanisms involved along the small intestine.

      The result of MARCM analysis seems inconsistent with the rest of the data. In MARCM, no significant change of clone sizes is observed between WT and vinc mutant (Fig. 3E). However, vinc mutant in EBs clearly promotes ISC proliferation in other experiments such as esg>vinc-RNAi and the EB>vinc-RNAi (Fig. 1A, Fig. 4).

      Please refer to point 2a, essential revisions. We do not think that our results are at odds with the phenotype of vinculin knockdown using the ISC and ISC/EB drivers - we realise the text was misleading and hope to have clarified our observations in the revised manuscript (pages 7-8).

      In Fig. 4H, the authors suggest that the vinculin mutant prevents terminal EC formation. However, this may be simply caused by longer retention of Klu expression in the newborn ECs. To test if EB differentiation is indeed affected, staining for the EC marker pdm1 would provide more convincing evidence. Another experiment to strengthen the conclusion would be the tracking of clone sizes generated from a single EB cell using the UAS-Flp system (such as G-trace).

      These are good suggestions to strengthen our findings. Unfortunately, we have not managed to obtain a working Pdm1 antibody (or other commercially available EC marker), which is why we assayed nuclear size and the tracking of KluReDDM cells. Therefore, we have not been able to test if Klu is retained in newborn ECs.

      As we agree this section of the text was misleading, we have rephrased it and highlighted that the phenotype seen with KluGal4ReDDM resembles the accumulation of activated EBs and newborn ECs observed in vinc102.1 guts (page 8).

      In Fig. 6D, the survival rates of WT and vinc mutant flies were compared. However, as there is no additional assay of feeding behavior or metabolic rate, the systemic vinc mutant does not provide a direct link between animal survival and intestinal EBs. Therefore, an experiment with vinc levels specifically manipulated in the fly intestine using esg>vinc-RNAi or EB>vinc-RNAi would be more relevant.

      This experiment has now been added in Figure 8B and the text modified to acknowledge the limitations of the survival experiments with whole mutant flies (see point 3, essential revisions above).

      Reviewer #3 (Public Review):

      Prior work had identified essential roles for Integrin signaling in regulating intestinal stem cell (ISC) proliferation, and the authors' studies were motivated by trying to understand whether Vinculin (Vinc) might participate in this. However, Vinc is involved in mechanotransduction at both focal adhesions (FA) and adherens junctions (AJ), and their results revealed that Vinc phenotypes do not match those of FA proteins like Integrin. Conversely, they do match a-catenin (a-cat) RNAi phenotypes, and together with the localization of Vinc and the phenotypes associated with a-cat mutants that can't bind Vinc, this led to the conclusion that Vinc is acting at AJ rather than FA in this tissue. The results here are convincing, with clear presentation, nice images, and appropriate quantitation. It's also worth emphasizing that initial characterization of Vinc mutant flies failed to reveal any essential roles for this protein in Drosophila, so finding a mutant phenotype of any sort is significant.

      While the manuscript is strong as a descriptive report on the requirement for Vinc in the Drosophila intestine, it doesn't provide us with much understanding of the mechanism by which Vinc exerts its effects, nor how its requirement is linked to intestinal physiology.

      There is always more to learn, and the importance of our work so far is that it demonstrates a very specific role for vinculin as a mechanoeffector in regulating cell fate decisions in specific regions of the midgut, and provides the foundation for future work addressing the detailed mechanism of this function and its physiological role.

      Prior work has shown that mechanical stretching of intestines stimulates ISC proliferation (presumably through Integrin signaling), which is opposite to what Vinc does here.

      We would like to stress that very little mechanistic knowledge is available regarding how mechanical stretching stimulates ISC proliferation, in Drosophila or mammalian systems. To our knowledge, the only work linking gut mechanical stretching to cell fate decisions in Drosophila identified the Msn/Hippo pathway (Li et al., 2018) and a requirement for the ion channel Piezo (He et al., 2018). We agree with the reviewer that integrin signaling would most likely contribute, especially given the composition of gels for organoid cultures (Gjorevski et al, 2016), yet the actual molecular mechanisms remain to be elucidated.

      There is a suggestion that Vinc is involved in maintaining homeostasis, but how it's regulated remains a bit murky. The authors report that reductions in myosin activity result in phenotypes reminiscent of Vinc phenotypes, which they interpret as supporting a model where Vinc's role is to help maintain tension at AJ. Of course it could also be reversed - maybe they are similar because tension is needed to maintain Vinc recruitment to AJ? The lack of epistasis tests and lack of analysis of whether Vinc localization to AJ in EBs is affected by tension or the M2 deletion of a-cat leaves us uncertain as to the actual basis for the relationship between Vinc and myosin phenotypes.

      Thank you for all these suggestions. New experiments have been done to test the relationship between cellular tension and vinculin at junctions (see essential point 1).

    1. Author Response:

      Reviewer #3 (Public Review):

      Murphy et al. further develop the linked selection model of Elyashiv et al. (2016) and apply it to human genetic variation data. This model is itself an extension of the McVicker et al. (2009) paper, which developed a statistical inference method around classic background selection (BGS) theory (Hudson and Kaplan, 1995, Nordborg et al., 1996). These methods fit a composite likelihood model to diversity data along the chromosome, where the level of diversity is reduced by a local factor from some initial "neutral" level π0 down to observed levels. The level of reduction is determined by a combination of both BGS and the expected reduction around substitutions due to a sweep (though the authors state that these models are robust to partial and soft sweeps). The expected reduction factor is a function of local recombination rates and genomic annotation (such as exonic and phylogenetically conserved sequences), as well as the selection parameters (i.e. mutation rates and selection coefficients for different annotation classes). Overall, this work is a nice addition to an important line of work using models of linked selection to differentiate selection processes. The authors find that positive selection around substitutions explains little of the variation in diversity levels across the genome, whereas a background selection model can explain up to 80% of the variance in diversity. Additionally, their model seems to have solved a mystery of the McVicker et al. (2009) paper: why the estimated deleterious mutation rate was unreasonably high. Throughout the paper, the authors are careful not only in their methodology but also in their interpretation of the results. For example, when interpreting the good fit of the BGS model, the authors correctly point out that stabilizing selection on a polygenic trait can also lead to BGS-like reductions.
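      The local reduction factor described above can be illustrated with a short sketch. It uses the classic background selection approximation in the form commonly attributed to Nordborg et al. (1996) and used by McVicker et al. (2009), with a single selection coefficient and a uniform mutation rate. This is an illustration under those assumptions, not the implementation of Murphy et al.; the function names and the parameters `u_per_site`, `t`, and `rec_distances` are hypothetical.

```python
import math


def bgs_reduction(u_per_site, t, rec_distances):
    """Expected diversity reduction B at a focal neutral site.

    Assumed form (classic BGS approximation, one selection class):
        B = exp(-sum_x u / (t * (1 + (1 - t) * r_x / t) ** 2))
    u_per_site    : deleterious mutation rate per selected site (assumed uniform)
    t             : selection coefficient against heterozygotes (assumed single value)
    rec_distances : recombination fraction r_x between the focal site and each selected site
    """
    total = 0.0
    for r in rec_distances:
        total += u_per_site / (t * (1.0 + (1.0 - t) * r / t) ** 2)
    return math.exp(-total)


def expected_diversity(pi0, b):
    """Scale the initial neutral diversity level pi0 by the local reduction factor b."""
    return pi0 * b
```

      Under this form, selected sites that are tightly linked to the focal site (small r_x) contribute most to the reduction, which is why local recombination rates and the density of conserved annotation both enter the prediction.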

      Furthermore, the authors have carefully chosen their model's exogenous parameters to avoid circularity. The concern here is that if the input data into the model - in particular the recombination maps and segments likely to be conserved - are estimated or identified using signals in genetic variation, the model's good fit to diversity may be spurious. For example, often recombination maps are estimated from linkage disequilibrium (LD) data which is itself obtained from variation along the chromosome. Murphy et al. use a recombination map based on ancestry switches in African Americans which should prevent "information leakage" between the recombination map and the BGS model from leading to spuriously good fits. Likewise, the authors use phylogenetic conservation maps rather than those estimated from diversity reductions (such as McVicker et al.'s B maps) to avoid circularity between the conserved annotation track and diversity levels being modeled. Additionally, the authors have carefully assessed and modified the original McVicker et al. algorithm, reducing relative error (Figure A2).

      One could raise the concern that non-equilibrium demography confounds their results, but the authors have a very nice analysis in Section 7 of the supplementary material showing that their estimates are remarkably stable when the model is fit separately in different human populations (Figure A35). Supporting previous work that emphasizes the dependence between BGS and demography, the authors find evidence of such an interaction with a clever decomposition of variance approach (Figure A37). The consistency of BGS estimates across populations (e.g. Figures A35 and A36) is an additional strong bit of evidence that BGS is indeed shaping patterns of diversity; readers would benefit if some of these results were discussed in the main text.

      We appreciate the reviewer’s kind remarks. With regards to the results included in the main text vs the supplement, we attempted to strike a balance between having the main text remain communicative to a larger readership and providing experts with details they may find useful. We have, however, done our best for the supplementary analyses to be written clearly.

      I have three major concerns about this work. First, it's unclear how accurate the selection coefficient estimates are given the non-equilibrium demography of humans (pre-Out of Africa split, and thus not addressed by the separate population analyses). The authors do not make a big point about the selection coefficient estimates in the main section of the paper, so I don't find this to be a big problem. Still, some mention of this issue might be helpful to readers trying to interpret the results presented in the supplementary text.

      As the reviewer notes, we chose not to emphasize the inferred distributions of selection coefficients. Our main reason for this choice is the technical issue addressed in Appendix Section 1.5 (L561-564): “Second, thresholding potentially biases our estimates of the distribution of selection effects. While this bias is probably smaller than the bias without thresholding, its form and magnitude are not obvious. This is why we decided not to report the inferred distributions of selection effects in the Main Text.” We agree that if we were to focus on our estimates of the distribution of selection effects, the effects of demographic history would also need to be considered. This is, however, not the focus here.

      Second, I'm curious whether the composite likelihood BGS model could overfit any variance along the chromosome - even neutral variance. At some level, the composite likelihood approach may behave like a sort of smoothing algorithm, albeit with a functional form and parameters of a BGS model. The fact that there is information sharing across different regions with the same annotation class should in principle prevent overfitting to local noise. Still, there are two ways I think to address this overfitting concern. First, a negative neutral control could help - how much variation in diversity along the chromosome can this model explain in a purely neutral simulation? I imagine very little, likely less than 5%, but I think this paper would be much stronger with the addition of a negative control like this. Second, I think the main text should include the R2 values from out-sample predictions, rather than just the R2 estimates from the model fit on the entire data. For example, one could fit the model on 20 chromosomes, use the estimated θΒ parameters to predict variation on the remaining two. The authors do a sort of leave-one-out validation at the window level (Figure A31); however, this may not be robust to linkage disequilibrium between adjacent windows in the way leaving out an entire chromosome would be.

      The two requested analyses were done and their results are described above, in response to essential revisions (p. 2-3 here). In brief, there is no overfitting of neutral patterns or otherwise. We elaborate on why this finding is expected below.
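      The out-of-sample check requested by the reviewer (fit on some chromosomes, predict diversity on the held-out ones) can be sketched generically as follows. This is a minimal illustration, not the authors' pipeline: the window records with `chrom` and `pi` keys, and the `fit` and `predict` callables, are hypothetical placeholders for whatever model is being evaluated.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def split_by_chromosome(windows, held_out):
    """Split window records into training and held-out sets by chromosome name."""
    train = [w for w in windows if w["chrom"] not in held_out]
    test = [w for w in windows if w["chrom"] in held_out]
    return train, test


def held_out_r_squared(windows, held_out, fit, predict):
    """Fit on the training chromosomes only, then score predictions on held-out ones.

    fit(train_windows)      -> fitted parameters (hypothetical callable)
    predict(params, window) -> predicted diversity for one window (hypothetical callable)
    """
    train, test = split_by_chromosome(windows, held_out)
    params = fit(train)
    observed = [w["pi"] for w in test]
    predicted = [predict(params, w) for w in test]
    return r_squared(observed, predicted)
```

      Holding out whole chromosomes, rather than individual windows, means that no held-out window is in linkage disequilibrium with a training window, which is the reviewer's stated reason for preferring this design.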

      Finally, I feel like this paper would be stronger with realistic forward simulations. The deterministic simulations described in the supplementary materials show the implementation of the model is correct, but it's an exact simulation under the model - and thus not testing the accuracy of the model itself against realistic forward simulations. However, this is a sizable task and efforts to add selection to projects like Standard PopSim are ongoing.

      We agree that forward simulations would be a nice addition, but believe that it is a project in itself. Indeed, a major complication is that when, for computational tractability, purifying selection is simulated in small populations with realistic population-scaled parameters, the reduction in diversity due to selection at unlinked sites has a major effect on neutral diversity levels (see, e.g., Robertson 1961). We hope to address this issue in future work. Meanwhile, we note that the theory that we rely on has been tested against simulations in the past (e.g., Charlesworth et al., 1993; Hudson and Kaplan, 1995; Nordborg et al., 1996).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2021-01158

      Doi preprint: https://doi.org/10.1101/2021.11.16.468835

      Corresponding author(s): Salah, MECHERI

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Whole sporozoite vaccines confer sterilizing protection against Plasmodium infection. However, further improvement of whole sporozoite vaccines is needed and requires a thorough understanding of the immune processes that mediate protection and the deployment of novel strategies that further augment protective immunity while limiting the impact of factors that are detrimental to protection. Work from the Mecheri laboratory and others had previously established that IL-6 signaling plays a critical role in the immune response to a liver stage infection; engagement of IL-6 signaling promotes the initial control of a liver stage infection and enhances the protective adaptive immune response. Given this potent protective role for IL-6, Belhimeur and colleagues design a strain of rodent malaria parasite that encodes and secretes murine IL-6 during liver stage infection. They show that upon infection of wildtype mice, these transgenic parasites i) are unable to transition to blood stage infection, ii) produce IL-6 and iii) induce a durable adaptive immune response that can protect against sporozoite challenge. This study is novel and intriguing. However, a superficial analysis of the transgenic parasite strain, an incomplete analysis of the immune response to infection and the lack of data regarding the possibility of IL-6 mediated immunopathology have dampened this reviewer's enthusiasm for the work.

      **Major Concerns:**

      1) The data in Figure 3b-3d clearly indicate that the IL-6 encoding transgenic parasites exhibit a defect in parasite development within HepG2 cells that is maintained in vivo. The authors propose that an arrest of these parasites in the liver stage precludes their transition to blood stage infection and that this arrest is dependent on IL-6 signaling. To better support that claim the authors should:

      a. Better characterize in vivo liver stage arrest using infected liver tissue analysis with immunofluorescence microscopy to determine when and how precisely IL-6 transgenic parasites are impacted in development.

      Done. New data are shown in Figure 3B, C, D.

      b. Determine if arrested development of IL-6 transgenic parasites is truly dependent on IL-6 signaling using antibody blockade of IL-6 signaling and mice with genetic defects in IL-6 signaling.

      Experiments were done using anti-IL-6 receptor blocking antibodies, but they did not work. This is commented on in the text and shown in Supplementary Fig 2.

      2) The authors claim that IL-6 production and secretion into the liver tissue augments the adaptive immune response to liver stage infection. This in turn results in durable adaptive immune responses that protect against infection. However, the mechanistic underpinning of IL-6 signaling in the liver that is induced by their transgenic parasites and the impact on adaptive immune responses is poorly characterized:

      a. There is no evidence that the protective adaptive immune response induced by IL-6 transgenic parasite infection is dependent on IL-6 signaling. Is superior protection and immunogenicity lost in IL-6 signaling deficient animals that are infected with IL-6 transgenic parasites?

      Not addressed, but the point is that IL-6 leads to attenuation.

      b. What elements of the adaptive immune response are impacted? One can imagine that IL-6 mediated killing of infected hepatocytes might introduce more parasite antigen that can be acquired by antigen presenting cells, or that IL-6 mediated pro-inflammatory signaling might regulate the maturation of antigen presenting cells, increased differentiation of helper T cells, the downregulation of regulatory T cell function and frequency and/or the differentiation of effector CD8 T cells into long-lived hepatic memory CD8 T cells. The authors should conduct a more comprehensive analysis of how parasite-encoded IL-6 impacts adaptive immunity.

      Done. An extensive analysis of CD4 and CD8 phenotype and status of activation is represented in Fig 9.

      3) While IL-6 transgenic parasites induce a potent and durable adaptive immune response, the authors should show how this compares to published whole sporozoite immunizations. The authors should determine if immunization with IL-6 transgenic parasites is superior to, for example, immunization with radiation-attenuated sporozoites and genetically attenuated sporozoites.

      That is not the point. The work presented here emphasizes the proof of concept that the proposed new strategy works. Follow-up studies will compare this model to previous ones.

      4) IL-6 signaling is a major player in inflammatory diseases and the induction of immunopathology. As such the authors should carefully examine the duration and magnitude of IL-6 protein production in the liver, and serum after IL-6 Tg parasite infection and determine if IL-6 signaling promotes liver immunopathology.

      Not done, but this point is discussed in the text. We also made clear in the Materials and Methods section that, because of the way the construct was made, IL-6 production is restricted to the first 48 h of liver infection: expression of the IL-6 gene is under the control of the LISP-2 promoter. Therefore, there is no persistent IL-6 production by liver stage parasites.

      Reviewer #1 (Significance (Required)):

      The paper reports a novel strategy to generate a whole sporozoite vaccine: expression of IL-6 in a transgenic parasite. This could be a significant contribution to the field if additional experiments as outlined in the critique are conducted. The work might also inform vaccine design for other pathogens.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes the construction of a Plasmodium berghei line that expresses murine interleukin-6 in exoerythrocytic (liver stage) parasites and the analysis of mice infected with sporozoites of this parasite line. The authors find that such parasites do not complete development in liver cells and therefore do not produce subsequent infection in red blood cells. The effect of prior infection with these parasites on the ability of the host to resist both wild type and heterologous species challenge is then examined.

      The key assumption that underlies the study is that the observed phenotypes result from parasite expression of bioactive IL-6 that functions to modulate the immune system. Other explanations are not considered, for example the over-expression of secreted IL-6 may prevent the complete maturation of the intracellular parasite by clogging up the parasite secretory pathway. The authors use the 'wild type' parasite as the control but not only does the wild type not express IL-6 it also does not express the human DHFR gene used as a selection system. A much better control parasite would be one that expresses a non-bioactive IL-6 so that the potential effects on parasite maturation can be differentiated from those on the mouse immune system. Another control to be considered would be comparison with a genetically attenuated parasite with a block in late stage development, and which does not produce a host cytokine.

      This is an interesting comment, but a key novel result is that co-infection studies show a reversed phenotype of IL-6 transgenic parasites, likely due to counteraction of the IL-6 effect by wild type parasites (Supplementary Fig 1).

      Another assumption is that IL-6 is secreted from the infected liver cell and mediates its effects, presumably by binding to its cell surface receptor. The expectation of Il-6 secretion from the parasite is that it would accumulate in the parasitophorous vacuole - how would it get out of the infected host cell? While evidence is provided of IL-6 in the in vitro culture supernatant of infected cells - this might arise from damaged cells in rather artificial conditions. Have the authors considered doing the experiment of concurrent mouse infection with both wild type and recombinant parasites? If the mechanism of parasite killing in infected liver cells is as proposed, then a reduction of wild type parasites in the subsequent asexual blood stage would be expected.

      Experiments done. We discussed both experiments: the IL-6 receptor blocking antibody experiment (Suppl Fig 2) and the mixed infection (Suppl Fig 1).

      Figure 3 indicates that IL-6 TgPbA/LISP2 parasites are as efficient or better than wild type parasites at invading host cells but then they do not develop to maturity. What is the evidence that the key factor in their ability to immunize the host is expression of IL-6 rather than the effect of an attenuated parasite?

      This is an interesting observation made by the reviewer. With the available data, we cannot really tell which of the two possibilities is operating in this system. It could also be that the two options are interconnected.

      In this model malaria infection, it looks like there are two lethal outcomes: one associated with experimental cerebral malaria at relatively low blood stage parasitemia (which I understand is a controversial model for human cerebral malaria) and the second associated with high blood stage parasitemia. Some of the protocols affect which outcome occurs (see for example Fig 6), but this observation is not properly discussed.

      On many occasions in the past, we have seen a discrepancy between anti-parasite immunity and anti-disease protection. In this particular experiment (Fig 6), we explored the dose effect of the IL-6 mutant. What is clear from this model is that at the high dose, 10^4 SPZ, we observe both anti-parasite and anti-disease protection and immunity, whereas at the lower doses, 10^3 and 10^2 SPZ, although there was no efficient anti-parasite immunity, mice did not die from cerebral malaria but much later from hyperparasitemia. We consider that the two low doses of IL-6 transgenic parasites did protect against disease expression.

      For the data presented in Fig 7, why was there a challenge with WT PbA sporozoites before the heterologous Py challenge? If this step is excluded is there still an effect against P. yoelii? Why was the parasite chosen for the heterologous challenge Py17XNL? Since this parasite is largely restricted to reticulocytes in the blood stream would a different effect have been observed if the heterologous challenge parasite was, for example, P. chabaudi?

      Out of scope.

      Although the expectation is that IL-6 expression would not occur in the asexual blood stage, I think it would be important to demonstrate experimentally that this is the case.

      Done. IL-6 transgenic parasites, when inoculated as infected erythrocytes, have no developmental defect and grow normally in infected mice.

      In Fig. 4A the y-axis is labelled IL-6 rRNA when it should be IL-6 mRNA.

      Corrected

      Reviewer #2 (Significance (Required)):

      The significance of the report does depend on whether or not the experimental evidence is sufficient to support the claim that parasite expression of IL-6 is important in generating immunity. There has been a number of studies to show that infection with sporozoites that have been genetically attenuated to not complete subsequent development in the infected liver cell can provide immunity to subsequent infection; what is different about this study is that the authors specifically target the parasite to express a host protein that is likely to be important in acquisition of immunity. Therefore for the study to have high significance they have to show convincingly that it is the expression and activity of IL-6 that is important and I do not think this is the case with the experiments reported. If the authors are correct, then the idea of manipulating the host response by expression of host proteins by the parasite may be an attractive approach to dissect the key elements of immunity to sporozoite infection. At the moment, although there is a lot of focus on developing an attenuated whole sporozoite vaccine against malaria, and this study may provide proof of principle for including a host component in the parasite, there would still be long way to go before any practical application of this approach.

      The key message was toned down. As the formal demonstration that the expression and activity of IL-6 are directly involved in the ability of IL-6 transgenic parasites to confer protective immunity is lacking, we suggest toning down the message by saying that IL-6 attenuates parasite virulence, the mechanism likely being a detrimental effect of IL-6 signaling on parasite development.

      The audience would be those interested in parasite immunology.

      Reviewer expertise: malaria parasite cell and molecular biology; host immunity.

      **Referees cross commenting**

      I think all reviewers are of the opinion that there needs to be a better demonstration that the observed phenotype is mediated by expression and signaling of IL-6, for example by antibody blockade or using a mouse line with a genetic defect in IL-6 signaling. Looking at all the issues that have been raised by the reviewers and need to be addressed with further experimentation, my feeling is that this will take longer than 6 months.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary** This study explores the expression of murine IL-6 by rodent Plasmodium berghei as a means to generate transgenic parasites whose development in the liver is arrested, which may be used as a genetically attenuated pre-erythrocytic vaccine against malaria. The authors conclude that IL-6-expressing Plasmodium parasites elicit CD8+ T-cell mediated immune responses that protect against a subsequent challenge with infectious sporozoites.

      **Major Comments**

      In Figure 3, the authors show the results of qRT-PCR analysis of mouse livers infected with WT or transgenic parasites. They then use HepG2 cells to assess hepatic parasite numbers and development. Why didn't the authors assess this also in vivo, in liver sections of infected mice?

      Done. New data are presented in Fig 3B, C, D

      Linked to the above, a more complete analysis of the parasite's behavior in HepG2 cells should be provided. The authors write in the discussion that "IL-6 transgenic parasites develop perfectly well in cultured hepatocytic cells". Does this mean that they develop to the production of infectious merozoites? This could be confirmed by allowing the infected cultures to progress for 60-70 hours and then collecting the supernatants of these cultures and injecting them into naïve mice, to understand whether or not infectious merozoites are formed in vitro.

      New analyses demonstrate that IL-6 transgenic parasites actually display a developmental defect at the pre-erythrocytic stage in vivo.

      Figure 3C: The authors mention this result almost in passing but fail to provide an explanation for this observation. Why is the number of transgenic parasite EEFs approximately double that of WT parasite EEFs?

      A new Figure 3 is provided and shows that the EEF density (Fig 3B) was drastically reduced both at 24h and at 48h in mice infected with the IL-6 transgenic parasites as compared to those infected with WT PbA parasites, although the differences were not statistically significant. We also examined the size (Fig. 3C) of EEFs and found the same tendency, namely a reduced size and diameter of IL-6 transgenic EEFs as compared to those of WT PbA EEFs, with a statistically significant difference only at 40h.

      Figure 3D: The EEF area units (mm²) on the YY axis are certainly wrong. However, they cannot be µm² either, as 15-30 µm² would be far too small for EEFs at 48 hours post-infection. What is it then?

      New data are now provided in a new Fig 3.

      The authors write "... suggest that the failure of IL-6 Tg-PbANKA/LISP2 parasites to develop in the liver of infected mice is likely due to an active anti-parasite immune response mediated by parasite-encoded IL-6 in vivo". I have several issues with this statement. 1) as mentioned above, the in vitro data cannot be used to draw definitive conclusions about the parasites' behavior in vivo; 2) the transgenic parasites do not "fail to develop in the liver of infected mice". If anything, they develop less than their WT counterparts, which is different from "failing to develop". Clarifying how much they do develop would be important (see next comment).

      We provide new in vivo data on the development of IL-6 transgenic parasites. A new Figure 3 is provided and shows that the EEF density (Fig 3B) was drastically reduced both at 24h and at 48h in mice infected with the IL-6 transgenic parasites as compared to those infected with WT PbA parasites, although the differences were not statistically significant. We also examined the size (Fig. 3C) of EEFs and found the same tendency, namely a reduced size and diameter of IL-6 transgenic EEFs as compared to those of WT PbA EEFs, with a statistically significant difference only at 40h. We replaced "failure" with "a defect in development".

      In connection with the above, I would like to know more about the time when the development of IL-6 Tg-PbANKA/LISP2 parasites is arrested in vivo, in the liver. Are these early- or late-arresting parasites? Is the liver stage of infection compromised during parasite development or at egress? To clarify this, the manuscript would benefit from a timecourse analysis of liver sections of mice infected with this parasite, including data on EEF numbers and sizes up to and beyond 48 h after sporozoite inoculation.

      Done. See new figure 3.

      Still linked to the issue of parasite arrest in vivo and the possibility of breakthroughs, the manuscript would benefit from an experiment where mice were injected with a high number of transgenic sporozoites and parasitemia is monitored thereafter, much like what was done in Figure 2D, but starting off with a larger inoculum of at least 5 x 10^5 sporozoites.

      This was done, and there was no breakthrough even with doses as high as 10^6 sporozoites.

      While the results shown suggest that secreted IL-6 restricts the parasite's liver stage development in vivo, this could be more definitively demonstrated by performing an infection with the transgenic parasites in the context of blocking or absence of the host's IL-6 receptor.

      This experiment was done but unfortunately did not work (Suppl. Fig 2). That is, the treatment of mice infected with IL-6 transgenic parasites with anti-IL-6 receptor blocking antibodies did not reverse the infection phenotype. This was also discussed in the manuscript.

      **Minor Comments**

      The manuscript needs to be improved in terms of both language and format. Some examples, solely from the abstract, are listed below, but the manuscript needs to be appropriately revised in terms of language, grammar, punctuation and format throughout:

      -Space missing between "P." and "berghei"

      Done

      -Gene names should be italicized

      Done

      -Rephrase "Considering IL-6 as a critical proinflammatory signal..." to "Considering that IL-6 is a critical proinflammatory signal..."

      Done

      -"transgenic IL-6 sporozoites" should be "transgenic IL-6-expressing P. berghei sporozoites"

      Done

      -"impairs Plasmodium infection at the liver stage" should be "impairs the liver stage of Plasmodium infection"

      Done

      INTRODUCTION

      The sentence "Among them, parasites lacking integrity of the parasitophorous vacuole, or late during development, and..." appears to be incomplete and needs rephrasing.

      Done

      The references used in sentence "During the last decade, in search of key mechanisms that determine the host inflammatory response, a set of host factors turned out to be critical for malaria parasite liver stage development (Mathieu et al., 2015); (Demarta-Gatsi et al., 2017; Demarta-Gatsi et al., 2016) (Grand et al., 2020)" do not all relate to the liver stage of infection. The authors need to select references that are relevant for their statement or else change the statement.

      Rephrased

      RESULTS

      I suggest the authors change the title of Results section "Transgenic P. berghei parasites expressing IL-6 during the liver stage lose infectivity to mice" not only to improve the quality of the English language employed but also to better clarify the notion that they are talking about hepatic infectivity.

      On the same section, please correct "timely specific timely".

      Done

      Transfectants are not "verified". If anything, the insertion of the gene in the parasite's genome is verified or, better still, confirmed.

      Done

      Sentence "The two lines behave similarly" is redundant.

      Done

      The legend of Figure 1 must include the definitions of all the acronyms in that figure.

      Acronyms in the whole manuscript are defined elsewhere

      "IL-6 transgenic sporozoites" is not an appropriate designation. If anything, they should be called IL-6-expressing P. berghei sporozoites".

      Done

      Figure 2 B: The YY axis should clarify that it refers to sporozoite numbers, as there are many other parasite stages in mosquitoes.

      Done

      Figure 2C: This scheme is hardly necessary. It would suffice to label the plots in D and E with the names of the parasite lines employed rather than "Group 1", "Group 2", "Group 3".

      The scheme is provided for more clarity and easy reading of the accompanying figures.

      Figure 2D, 2E: Why didn't the authors use the same scale on the XX axis of the two plots?

      The qRT-PCR data per se do not substantiate the statement "Therefore, RT-qPCR analysis in the liver confirms that the loss of infectivity of IL-6 Tg-PbANKA/LISP2 SPZ is due to a defect in liver stage development in vivo", as a defect in invasion of hepatocytes cannot be excluded. The term "loss of infectivity" is also misleading. Do the authors mean loss of blood stage infectivity?

      Yes

      Sentence "... all parasites were able to invade and develop inside HepG2 cells." is misleading. The authors probably mean "parasites of both lines".

      Changed

      Figure 4: Why did the authors swap the order of the two experimental groups from one plot to the next? The same order should be used, to avoid confusion! Also, the authors should make the width of the bars similar between the two plots.

      Done

      The authors should consider moving Figure 5 to the Supplementary materials.

      Reviewer #3 (Significance (Required)):

      *Nature and significance of the advance. Compare to existing published knowledge. Audience.*

      This study extends our current knowledge on genetically attenuated malaria vaccine candidates and validates the concept of suicide parasites for immunization against malaria. This paper will be of interest to researchers working on malaria vaccination, as well as all those interested in transgenic Plasmodium parasites, and the biology and immunology of liver stage infection by malaria parasites.

      *Your expertise.*

      The co-reviewer and the reviewer are experts on the liver stage of Plasmodium infection and on pre-erythrocytic malaria vaccination.

      **Referees cross commenting**

      I agree with all of Reviewers 1 and 2's remarks and, upon consideration, I would like to revise my "Estimated time to Complete Revisions" to between 3 and 6 months.

    1. Author Response

      Reviewer #1 (Public Review):

      The general idea of comparing response patterns to stress in the offspring generation is new and very interesting.

      We thank Reviewer 1 for their time and thoughtful comments. We agree that these comparisons are new and very interesting and have added multiple revised analyses to the manuscript based on the reviewer comments that we think will further enhance the impact of and conclusions made in this study.

      However, the data that are presented are in several ways preliminary. The phenotype comparisons are mostly convincing, although statistical treatments are partly unclear, given that each "replicate" includes itself many individuals.

      The statistical treatments for groups of individuals are the same as in Burton et al., 2017, Burton et al., 2020, and Willis et al., 2021, which include the original reports of the intergenerational responses studied here. Replicates that include many individuals are relatively common when working with C. elegans and are usually compared using ANOVA or Student's t-tests (depending on the number of comparisons) to analyze the variation in batch effects as well as differences between populations of animals.

      We believe this ability to assay hundreds or even thousands of animals, in total, for each comparison in this study makes our data substantially stronger and more reliable. However, we are happy to perform any additional statistical tests the reviewer might want to see.

      The transcriptomic data are minimal (only three replicates)

      To address this comment, we compared our original three replicates of RNA-seq from F1 animals from C. elegans parents exposed to P. vranovensis BIGb0446 to a second, independent set of three replicates of F1 animals from C. elegans parents exposed to a second P. vranovensis isolate (BIGb0427 – the data for this second P. vranovensis isolate was already part of Fig. 4 of this manuscript).

      By comparing these three new replicates to our previous findings from the three original replicates, we found that 515 of the 562 genes that exhibited a >2-fold change and were significant at padj <0.01 in the original three replicates were also changed at >2-fold and padj <0.01 in the new three replicates. We believe our finding that 91.6% of genes change >2-fold and remain significant at padj <0.01 even when the number of replicates is doubled (and a different isolate of P. vranovensis is used!) suggests that adding additional replicates would not substantially change the conclusions of this manuscript.
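
      The 91.6% figure is simply the share of the original gene set that was recovered in the second set of replicates (515 of 562). As a rough illustration of that arithmetic only, and not the pipeline used for the manuscript, the sketch below computes the overlap with plain Python sets. The function name `replication_rate` and the gene identifiers are placeholders invented for the example.

```python
def replication_rate(original_hits, replicate_hits):
    """Return the percentage of original hits that are also hits in the replicate set."""
    original = set(original_hits)
    if not original:
        raise ValueError("original_hits must not be empty")
    shared = original & set(replicate_hits)
    return 100.0 * len(shared) / len(original)


# Placeholder identifiers standing in for the 562 genes from the original replicates.
original = [f"gene_{i}" for i in range(562)]
# Placeholder second set: 515 shared genes plus 40 genes unique to the new replicates.
replicate = original[:515] + [f"other_{i}" for i in range(40)]

rate = replication_rate(original, replicate)
print(f"{rate:.1f}% of the original genes were recovered")
```

      Genes that are significant only in the second set do not affect this percentage, because the denominator is the original set.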

      We would also like to highlight, as above, that because this analysis was done on populations of thousands of similarly staged animals, as opposed to individuals, the variability between replicates is further reduced. In addition, much of our transcriptomic data from each species was then compared across species, and genes were only analyzed if they changed in multiple different species, each of which represents a separate set of three additional replicates [i.e., genes that change in all 4 species analyzed have to exhibit significant (>2-fold, padj <0.01) changes across 12 total replicates].

      Our new findings comparing six replicates did not substantially change the number of genes identified when compared to using three replicates, and the fact that for all of the main conclusions of this manuscript each set of triplicates from one species was then compared across 9 additional replicates from three other species from pools of thousands of animals makes us very confident that our results are robust and highly reproducible.

      and lack comparison to the stress responses in the parental animals.

      We agree with Reviewer 1 that comparisons to parental animals are interesting and important. Comparisons of F1 progeny gene expression patterns to parental animals were not included here because such comparisons were previously published in some of our original reports of these intergenerational effects (For example, see Burton et al., 2020). In summary, we found that most, but not all, of the effects on gene expression in F1 animals were also detected in parental animals. However, the transcriptional responses only turn on in F1 animals post gastrulation and do not appear to be due to the simple deposition of parental mRNAs into embryos (Burton et al., 2020).

      We have updated the text to highlight these findings.

      The analysis of the transcriptome data is limited to counting overlaps between significantly changed genes, without deeper discussion of the genes and pathways that are affected.

      In the revised manuscript we have completely redone all of the transcriptomic analysis to use a stricter set of cutoffs for significance – both padj <0.01 and requiring a >2-fold change in expression based on the helpful comments of Reviewer 1 – which we agree with – see below.
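
      To make the combined cutoff concrete, here is a minimal sketch of how a padj <0.01 and >2-fold filter can be applied to a table of results. It is an illustration under stated assumptions, not the code used for the manuscript: the record layout, the gene labels, and the helper `passes_cutoffs` are hypothetical, and treating ">2-fold" as a change in either direction (an absolute log2 fold change above 1) is our assumption for the example.

```python
import math


def passes_cutoffs(fold_change, padj, min_fold=2.0, max_padj=0.01):
    """True when a gene changes more than min_fold in either direction with padj below max_padj."""
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    return abs(math.log2(fold_change)) > math.log2(min_fold) and padj < max_padj


# Hypothetical records: (gene label, treated/control expression ratio, adjusted p-value).
records = [
    ("gene_a", 4.0, 0.001),   # up 4-fold, significant
    ("gene_b", 0.25, 0.005),  # down 4-fold, significant
    ("gene_c", 1.5, 0.0001),  # significant but below 2-fold
    ("gene_d", 3.0, 0.04),    # above 2-fold but padj too high
]

selected = [name for name, fc, padj in records if passes_cutoffs(fc, padj)]
print(selected)
```

      Requiring both conditions at once is what makes the revised analysis stricter than a significance cutoff alone: gene_c and gene_d each pass one criterion but are excluded.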

      As part of this new analysis we have now also included a deeper discussion of the genes that exhibited similar changes across species, including using g:Profiler to examine the genes that exhibited changes across all four species.

      In addition, we have now paired our phenotypic and transcriptomic data across species to identify 19 new genes that we predict are highly likely to be involved in intergenerational responses to stress based on their expression patterns across species. These 19 genes come out of highly filtered analyses across species that identified a total of 23 genes that change only in species that adapt to P. vranovensis or osmotic stress and not in species that do not adapt.

      Interestingly, this analysis identified nearly all of the previously known genes involved in intergenerational adaptations to these stresses including rhy-1, cysl-1, cysl-2 and gpdh-1. Thus, we predict the remaining 19 genes that came out of this analysis are highly likely to be involved in the responses to these stresses. Furthermore, in the revised text we highlight that our new list of 19 genes includes multiple conserved factors that are required for animal viability including genes involved in nuclear transport (imb-1 and xpo-2), the CDC25 phosphatase ortholog cdc-25.1, and the PTEN tumor suppressor ortholog daf-18. This new analysis will likely form the basis for future investigations into the mechanisms underlying these exciting intergenerational effects.
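
      The cross-species filter described above can be thought of as a set operation: keep genes that change in the species that adapt and discard any gene that also changes in a species that does not. The sketch below illustrates that logic with placeholder species and gene labels. It is not the analysis code from the manuscript, and requiring a change in every adapting species is an assumption made for the example.

```python
def adaptation_specific_genes(changed_by_species, adapting_species):
    """Genes changed in every adapting species and in none of the non-adapting ones."""
    adapting = [set(genes) for sp, genes in changed_by_species.items() if sp in adapting_species]
    non_adapting = [set(genes) for sp, genes in changed_by_species.items() if sp not in adapting_species]
    if not adapting:
        return set()
    shared = set.intersection(*adapting)
    excluded = set().union(*non_adapting)
    return shared - excluded


# Placeholder data: genes passing the expression cutoffs in each (hypothetical) species.
changed = {
    "species_1": {"g1", "g2", "g3"},
    "species_2": {"g1", "g2", "g4"},
    "species_3": {"g1", "g2", "g5"},
    "species_4": {"g2", "g6"},  # hypothetical species that does not adapt
}

result = adaptation_specific_genes(changed, {"species_1", "species_2", "species_3"})
print(sorted(result))
```

      In this toy example g2 changes in all three adapting species but is removed because it also changes in the non-adapting one, which is the behavior the filter is meant to capture.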

      We believe this additional analysis greatly improves this manuscript. We are also happy to include any specific additional analysis the reviewer would like to see.

      The top response genes that are directly tested have been discovered before. Hence, while interesting patterns are evident from the data, this work largely confirms prior work, including that described in Burton et al. 2020.

      We have revised the text to highlight that the aims of this particular study were to determine if multigenerational responses to stress were evolutionarily conserved at any level, as well as to determine the potential costs of such effects and the specificity of the responses. These questions were not addressed in any previous study of multigenerational effects, including Burton et al., 2020. Because of the aims of this study, we believe it was critical to focus on genes that had an established role in these intergenerational responses in C. elegans and to compare and contrast the behavior and requirement of these genes in intergenerational responses in other species. (Although we note that in this newly revised manuscript we have now also reported 19 new top response genes – see above).

      In addition to our original goals, in this study we were able to determine the extent to which intergenerational transcriptional responses are conserved and the extent to which intergenerational transcriptional changes persist transgenerationally (which we find to be effectively not at all using our revised stricter analysis). We believe these findings are not only novel, but perhaps will be surprising to much of the intergenerational and transgenerational field and have a major impact on both how multigenerational studies are interpreted and how they are conducted in the future. This is especially the case for studies in C. elegans which is one of the leading model organisms to study the mechanisms underlying both intergenerational and transgenerational responses to stress.

      For example, we note that several landmark studies of transgenerational effects (persisting into F3 or later generations) in C. elegans performed RNA-seq on F1 progeny (For example, Moore et al., Cell 2019 or Ma et al., Nature Cell Biology 2019). Our new findings reported here suggest that it is possible that none of the transcriptional effects detected in F1 animals will persist in F3 progeny. Furthermore, our studies demonstrate the importance of comparing C. elegans transcriptional effects to related Caenorhabditis species as we found that only a subset of the effects detected in C. elegans are conserved in any other Caenorhabditis species. (Such comparisons are important for determining if and to what extent observations of intergenerational and/or transgenerational effects observed in C. elegans represent conserved phenomena).

      For all of these reasons, we believe our data are highly exciting, will be of broad interest to the field, and represent novel and potentially unexpected findings that were not previously reported in any prior work, including Burton et al., 2020.

      Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. The authors focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa PA14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      We thank the reviewer for these comments. We’d like to briefly highlight that P. vranovensis was also shown to elicit the same transgenerational effects as P. aeruginosa in the bioRxiv version of the same papers that reported transgenerational effects for P. aeruginosa (Kaletsky et al., 2020 – GRb0427 is an isolate of P. vranovensis).

      It is not clear to us why this result was not included in the final published version of this manuscript, but we in fact used P. vranovensis for these studies in part because of this bioRxiv paper and because we failed to detect any robust intergenerational effects using P. aeruginosa PA14 in any of our assays – including at the RNA-seq level (unpublished).

      Nonetheless, we have since confirmed with Coleen Murphy’s lab that they do find P. vranovensis elicits the same transgenerational effect on behaviour as P. aeruginosa. We expect that future investigations into the conditions under which P. vranovensis elicits effects that are lost/erased after 1 generation and the conditions under which effects might be maintained for more than 3 generations will be highly interesting.

      They found 279 genes that exhibited intergenerational changes in all Caenorhabditis species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      We thank Reviewer 2 for their excitement and we agree that these findings were highly exciting.

      They also report that intergenerational effects are stress-specific and have deleterious effects in mismatched environments and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      We agree with reviewer 2 that our findings suggest that intergenerational effects are common and transgenerational effects are either rare in comparison or only occur under specific conditions. We have updated the text to include this interpretation.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      We agree and, similar to above, have updated the text accordingly to state that it is also very possible that transgenerational effects only occur under certain conditions.

      The only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of a strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      We thank reviewer 2 for these comments and agree that phenotypic investigations of F3 effects are also very interesting.

      We have previously investigated the phenotypic effects of all of the stresses used in this paper on F3 animals using the assays described here and, consistent with our new gene expression findings, we previously found that most of these stresses do not exert phenotypic effects in F3 animals (Burton et al., 2020; Willis et al., 2021; Hibshman et al., 2016).

      Separately, we have also attempted to investigate the effects of pathogen exposure on pathogen avoidance, as these effects have previously been reported to occur transgenerationally, but to date have been unable to consistently replicate these findings. We expect that this is likely due to what might be subtle differences in conditions between labs (differences in water used for the media prep, air humidity, potential differences in N2 wild-type strains, etc.) because assays such as behavioral avoidance are known to be very sensitive to many different environmental inputs.

      We currently believe that our experiences as they relate to intergenerational and transgenerational effects support the general conclusion of this manuscript that while intergenerational effects are common and easy to initiate across multiple labs (the intergenerational effects studied here have now been successfully reproduced in labs in the US, UK, and Canada), transgenerational effects might be more specific and/or only occur/be initiated under more stringent conditions – perhaps with the aim of avoiding the costs of such multigenerational effects.

      Future studies of exactly when and under what conditions C. elegans initiates intergenerational vs transgenerational effects are likely to be very interesting.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      We completely agree with Reviewer 2 and have indeed attempted these experiments both in Burton et al., 2020 and in unpublished results.

      With regards to the transgenerational F3 effects, as mentioned above, P. vranovensis has been reported to elicit the same transgenerational effect as P. aeruginosa PA14 – at least as reported in the Kaletsky et al., 2020 bioRxiv version of the manuscript from the same studies. (GRb0427 is an isolate of P. vranovensis).

      To date, however, in our laboratory we have been unable to detect any transgenerational effects for either P. vranovensis or P. aeruginosa infection on gene expression data from RNA-seq experiments (data from this manuscript and unpublished data).

      It is not yet clear why this is the case, but we note that the RNA-seq analysis from the transgenerational PA14 studies (published in Moore et al., Cell 2019) was performed on F1 animals and thus was looking at intergenerational effects. To our knowledge, no RNA-seq on F3 progeny from animals exposed to PA14 has ever been published. Thus, as it stands there are no existing F3 gene expression studies using PA14 for us to compare our results to, but it remains possible that PA14 does not elicit specific effects on F3 gene expression when analyzed by RNA-seq.

      For F1 effects we have published a gene expression comparison for P. vranovensis and P. aeruginosa F1 effects in a previous manuscript (Burton et al 2020) and will add a mention of this to the text. Briefly, we detected very few F1 effects on gene expression when exposing adults to P. aeruginosa for 24 hours and parental infection by P. aeruginosa did not result in protection for offspring from P. vranovensis infection (Burton et al., 2020). We concluded that the intergenerational adaptation to P. vranovensis was not initiated by P. aeruginosa and was at least somewhat specific to P. vranovensis as well as the new species of Pseudomonas described in this manuscript which does cross protect.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is first of a kind, and it provides several important answers, while highlighting clear paths for future work.

      Excellent work and I think it will generate a lot of interest in the community, definitely want to see it published in eLife.

      We agree with Reviewer 2 and thank them for their kind comments.

      Reviewer #3 (Public Review):

      In this manuscript, the authors address whether the mechanisms mediating intergenerational effects are conserved in evolution. This question is important not only to frame this phenomenon in an evolutionary context, but to address several interlinked questions: is there a mechanism in common between adaptive versus deleterious effects? What makes some effects last one instead of several generations? What is the ecological relevance for those mechanisms? Using Caenorhabditis elegans as a model of reference, they compare four types of intergenerational effects in three additional Caenorhabditis species.

      The authors used previously characterized models of intergenerational inheritance, focusing on those that are likely to have adaptive significance. This is relevant, because the adaptive relevance of other published examples of inter- and transgenerational inheritance is not clear. They used functional studies to probe for conservation of mechanisms for bacterial infection and resistance to osmolarity stress, which is a major strength of this study. The data supports the claim of conservation in some types of intergenerational inheritance and divergence in others. One major question addressed in this manuscript is whether there is a potential overarching mechanism that confers stress-resistance across generations. Their experiments convincingly show that this is not the case, but that instead, there are stress-specific mechanisms responsible for intergenerational inheritance.

      We agree and thank Reviewer 3 for their kind comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The relationship between genetic disease and adaptation is important for biomedical research as well as understanding human evolution. This topic has received considerable attention over the past several decades in human genetics research. The present manuscript provides a much more comprehensive and rigorous analysis of this topic. Specifically, the authors select a set of ~4000 human Mendelian disease genes and examine patterns of recent positive selection in these genes using the iHS and nSL tests (both haplotype tests) for selection. They then compare the signals of sweeps to control genes. Importantly, they match the control set to the disease genes based upon many different genomic variables, such as recombination rate, amount of background selection, expression level, etc. The authors find that there is a deficit of selective sweeps in disease genes. They test several hypotheses for this deficit. They find that the deficit of sweeps is stronger in disease genes at low recombination rate and those that have more disease mutations. From this, the authors conclude that strongly deleterious mutations could be impeding selective sweeps.

      Strengths

      The manuscript includes a number of important strengths:

      1) It tackles an important question in the field. The question of selection in disease genes has been very well-studied in the past, with conflicting viewpoints. The present study examines this topic in a rigorous way and finds a deficit of sweeps in disease genes.

      2) The statistical analyses are rigorously done. The genome is a confusing place and there can often be many reasons why a certain set of genes could differ from another set of genes, unrelated to the variable of interest. Di et al. carefully match on these genomic confounders. Thus, they rigorously demonstrate that sweeps are depleted in disease genes relative to control genes. Further, the pipeline for ranking the genes and testing for significance is solid.

      3) The Introduction of the manuscript nicely relates different evolutionary models and explanations to patterns that could be seen in the data. As such, the present manuscript isn't just merely an exploratory analysis of patterns of sweeps in disease genes. Rather, it tests specific evolutionary scenarios.

      Weaknesses

      1) The authors did not discuss or test a basic explanation for the deficit of sweeps in disease genes. Namely, certain types of genes, when mutated, give rise to strong Mendelian phenotypes. However, mutations in these genes do not result in variation that gives rise to a phenotype on which positive selection could occur. In other words, there are just different types of genes underlying disease and positive selection. I could think that such a pattern would be possible if humans are close to the fitness optimum and strong effect mutations (like those in Mendelian disease genes) result in moving further away from the fitness optimum. On the other hand, more weak effect mutations could be either weakly deleterious or beneficial and subject to positive selection. I'm not sure whether these patterns would necessarily be captured by the overall measures of constraint which the disease and non-disease genes were matched on.

      We thank the reviewer for suggesting that alternative explanation. It is indeed important that we compare it with our own explanation. To rephrase the reviewer’s suggestion, it is possible that disease genes may just have a different distribution of fitness effects of new mutations. Specifically, mutations in disease genes might have such large effects that they will consistently overshoot the fitness optimum, and thus not get closer to this optimum. This would prevent them from being positively selected. Two predictions can be derived from this potential scenario. First, we can predict a sweep deficit at disease genes, which is what we report. Second, we can also predict that disease genes should exhibit a deficit of older adaptation, not just recent adaptation detected by sweep signals. Indeed, the decrease in adaptation due to (too) large effect mutations would be a generic, intrinsic feature of disease genes regardless of evolutionary time. This means that under this explanation, we expect a test of long-term adaptation such as the McDonald-Kreitman test to also show a deficit at disease genes.

      This latter prediction differs from the prediction made by our favored explanation of interference between deleterious and advantageous variants. In this scenario, the sweep deficit at disease genes is caused by the presence of deleterious, and most importantly currently segregating disease variants. Because the presence of the segregating variants is transient during evolution, our explanation does not predict a deficit of long-term adaptation. We can therefore distinguish which explanation (the reviewer’s or ours) is the most likely based on the presence or absence of a long-term adaptation deficit at disease genes.

      To test this, we now compare protein adaptation in disease and control genes with two versions of the MK test called ABC-MK and GRAPES (refs). ABC-MK estimates the overall rate of adaptation, and also the rates of weak and strong adaptation, and is based on Approximate Bayesian Computation. GRAPES is based on maximum likelihood. Both ABC-MK and GRAPES have been shown to provide robust estimates of the rate of protein adaptation thanks to evaluations with forward population simulations (refs). We find no difference in long-term adaptation between disease and control non-disease genes, as shown in new figure 4. This shows that the explanation put forward by the reviewer of an intrinsically different distribution of mutation effects at disease genes is less likely than interference of currently segregating deleterious variants with recent, but not older, long-term adaptation. We even show in the new figure 4 that disease genes and their controls have more, not less, strong long-term adaptation compared to the whole human genome baseline (new figure 4C). Also, disease genes in low recombination regions and with many disease variants have experienced more, not less, strong long-term adaptation than their controls. Therefore, far from overshooting the fitness optimum due to stronger fitness effects of mutations, it appears that these stronger fitness effects might in fact be more frequently positively selected in these disease genes.

      We now provide these new results P15L418:

      “Disease genes do not experience constitutively less long-term adaptive mutations

      A deficit of strong recent adaptation (strong enough to affect iHS or nSL) raises the question of what creates the sweep deficit at disease genes. As already discussed, purifying selection and other confounding factors are matched between disease genes and their controls, which excludes that these factors alone could possibly explain the sweep deficit. Purifying selection alone in particular cannot explain this result, since we find evidence that it is well matched between disease and control genes (Figures 2 and Figure 4-figure supplement 1). Furthermore, we find that the 1,000 genes in the genome with the highest density of conserved elements do not exhibit any sweep deficit (bootstrap test + block-randomized genomes FPR=0.18; Methods). Association with mendelian diseases, rather than a generally elevated level of selective constraint, is therefore what matters to observe a sweep deficit. What then might explain the sweep deficit at disease genes?

      As mentioned in the introduction, it could be that mendelian disease genes experience constitutively less adaptive mutations. This could be the case for example because mendelian disease genes tend to be more pleiotropic (Otto, 2004), and/or because new mutations in mendelian disease genes are large effect mutations (Quintana-Murci, 2016) that tend to often overshoot the fitness optimum, and cannot be positively selected as a result. Regardless of the underlying processes, a constitutive tendency to experience less adaptive mutations predicts not only a deficit of recent adaptation, but also a deficit of more long-term adaptation during evolution. The iHS and nSL signals of recent adaptation we use to detect sweeps correspond to a time window of at most 50,000 years, since these statistics have very little statistical power to detect older adaptation (Sabeti et al., 2006). In contrast, approaches such as the McDonald-Kreitman test (MK test) (McDonald and Kreitman, 1991) capture the cumulative signals of adaptive events since humans and chimpanzees had a common ancestor, likely more than six million years ago. To test whether mendelian disease genes have also experienced less long-term adaptation, in addition to less recent adaptation, we use the MK tests ABC-MK (Uricchio et al., 2019) and GRAPES (Galtier, 2016) to compare the rate of protein adaptation (advantageous amino acid changes) in mendelian disease gene coding sequences, compared to confounding factors-matched non-disease controls (Methods). We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations.
This implies that processes that are stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes. If disease genes have not experienced less adaptive mutations during long-term evolution, then the process at work during more recent human evolution has to be transient, and has to have limited only recent adaptation. It is also noteworthy that both disease genes and their controls have experienced more coding adaptation than genes in the human genome overall (Figure 5A), especially more strong adaptation according to ABC-MK (Figure 5C). The fact that the baseline long-term coding adaptation is lower genome-wide, but similarly higher in disease and their control genes, also shows that the matched controls do play their intended role of accounting for confounding factors likely to affect adaptation. The fact that long-term protein adaptation is not lower at disease genes also excludes that purifying selection alone can explain the sweep deficit at disease genes, because purifying selection would then also have decreased long-term adaptation. A more transient evolutionary process is thus more likely to explain our results.”
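
To make the long-term comparison concrete, the sketch below computes the classic MK-based estimate of the proportion of adaptive amino acid substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps). This is the simple textbook estimator, shown only as an illustration; it is not ABC-MK or GRAPES, which additionally model weakly selected variants and other biases. The function name `mk_alpha` and all counts are hypothetical and are not taken from the manuscript.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic MK-based estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to illustrate the arithmetic.
disease_alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
control_alpha = mk_alpha(dn=58, ds=100, pn=29, ps=100)
print(disease_alpha, control_alpha)
```

Similar alpha values in disease genes and matched controls, as in this toy input, are the kind of pattern the response describes as evidence against a constitutive deficit of adaptive mutations.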

      Then P22L613: “More importantly, the fact that constitutively less adaptation at disease genes combined with more power to detect sweeps in low recombination regions does not explain our results is made even clearer by the fact that disease genes in low recombination regions and with many disease variants have in fact experienced more, not less, long-term adaptation according to an MK analysis using both ABC-MK and GRAPES (Figure 5F,G,H,I,J). ABC-MK in particular finds that there is a significant excess of long-term strong adaptation (Figure 4H, P<0.01) in disease genes with low recombination and with many disease variants, compared to controls, but similar amounts of weak adaptation (Figure 5G, P=0.16). It might be that disease genes with many disease variants are genes with more mutations with stronger effects that can generate stronger positive selection. The potentially higher supply of strongly advantageous variants at these disease genes makes it all the more notable that they have a very strong sweep deficit in recent evolutionary times. This further strengthens the evidence in favor of interference during recent human adaptation: the limiting factor does not seem to be the supply of strongly advantageous variants, but instead the ability of these variants to have generated sweeps recently by rising fast enough in frequency.”

      2) While I think the authors did a superb job of controlling for genome differences between disease and non-disease genes, the analysis of separating regions by recombination rate and number of disease mutations does not seem as rigorous. Specifically, the authors tested for enrichment of sweeps in disease genes vs control and then stratified that comparison by recombination rate and/or number of disease mutations. While this nicely matches the disease genes to the control genes, it is not clear whether the high recombination rate genes differ in other important attributes from the low recombination rate genes. Thus, I worry whether there could be a confounder that makes it easier/harder to detect an enrichment/deficit of sweeps in regions of low/high recombination.

      We thank the reviewer for emphasizing the need for more controls when comparing our results in low or high recombination regions. We have now compared the confounding factors between low recombination disease genes and high recombination disease genes, as classified in the manuscript. As shown in the new Figure 6-figure supplement 1, confounding factors do not differ substantially between low and high recombination disease genes, and are all within a range of +/- 25% of each other. It would take a larger difference for any confounding factor to explain the sharp sweep deficit difference observed between the low and high recombination disease genes. The only factor with a 35% difference between low and high recombination mendelian disease genes is McVicker’s B, but this is completely expected; B is expected to be lower in low recombination regions.

      We now write P20L569: “Further note that only moderate differences in confounding factors between low and high recombination mendelian disease genes are unlikely to explain the sweep deficit difference (Figure 6-figure supplement 1).”
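
The confounder check described above can be sketched as a relative-difference comparison between the two gene classes. The code below is a minimal illustration under stated assumptions: the factor names, the values, and the helper names `relative_difference` and `flag_confounders` are all hypothetical, and the 25% threshold simply mirrors the range quoted in the response.

```python
from statistics import mean


def relative_difference(low_values, high_values):
    """Relative difference of the low-recombination mean from the high-recombination mean."""
    high_mean = mean(high_values)
    if high_mean == 0:
        raise ValueError("high-recombination mean is zero; relative difference undefined")
    return (mean(low_values) - high_mean) / high_mean


def flag_confounders(low, high, threshold=0.25):
    """Return {factor: relative difference} for factors whose difference exceeds the threshold."""
    flagged = {}
    for factor in low:
        diff = relative_difference(low[factor], high[factor])
        if abs(diff) > threshold:
            flagged[factor] = diff
    return flagged


# Hypothetical per-gene values for two confounding factors.
low = {"gene_length": [10.0, 12.0, 11.0], "B_statistic": [0.5, 0.6, 0.55]}
high = {"gene_length": [10.0, 11.0, 12.0], "B_statistic": [0.8, 0.85, 0.9]}
print(flag_confounders(low, high))
```

In this toy input only the B statistic is flagged, and it is lower in the low-recombination class, which is the direction the response describes as expected.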

      Regarding the potential confounding effect of statistical power to detect sweeps differing in low and high recombination regions, please see our earlier response to main point 2.

      Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      We want to apologize profusely for this avoidable mistake. We have now made it clearer from the very start of the manuscript that we focus on mendelian non-infectious disease genes. We have modified the title and the abstract accordingly, specifying mendelian and non-infectious as required.

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci [1] was cited, but diminished many of the developments that have been made in the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper [2], which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to their findings of complex diseases.
A similar analysis of McDonald-Kreitman tables [3] was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent, and I believe warrant a more modest tone.

      We thank the reviewer for their recommendation. We should have written more about what is currently well known or unknown about recent adaptation in disease genes, and in more nuanced terms. Instead of writing “Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution”, we now write in the new abstract:

      “Despite our expanding knowledge of gene-disease associations, and despite the medical importance of disease genes, their recent evolution has not been thoroughly studied across diverse human populations. In particular, recent genomic adaptation at disease genes has not been characterized as well as long-term purifying selection and long-term adaptation. Understanding the relationship between disease and adaptation at the gene level in the human genome is hampered by the fact that we don’t know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during the last ~50,000 years of recent human evolution.”

      We also toned down the start of the introduction. We now write P3L74:

      “Despite our expanding knowledge of mendelian disease gene associations, and despite the fact that multiple evolutionary processes might connect disease and genomic adaptation at the gene level, these connections are yet to be studied more thoroughly, especially in the case of recent genomic adaptation.”

      Although we agree that others have made extensive efforts to characterize older adaptation or purifying selection at disease genes compared to non-disease genes, we still believe that our results are novel and more conclusive about recent positive selection. Our initial statement was however poorly phrased. To our knowledge, our study is the first to look at the issue using specifically sweep statistics that have been shown to be robust to background selection, while also controlling for confounding factors. These sweep statistics have sensitivity for selection events that occurred in the past 30,000 or at most 50,000 years of human evolution (Sabeti et al. 2006). This is a very different time scale compared to the millions of years of adaptation (since divergence between humans and chimpanzees) captured by MK approaches.

      We also want to note that we did cite the Blekhman et al. paper for their result of stronger purifying selection in our initial manuscript. It is true however that we did not specify mendelian disease genes, which was confusing. We want to apologize again for it:

      From the earlier manuscript: “Multiple recent studies comparing evolutionary patterns between human disease and non-disease genes have found that disease genes are more constrained and evolve more slowly (lower ratio of nonsynonymous to synonymous substitution rate, dN/dS, in disease genes) (Blekhman et al., 2008; Park et al., 2012; Spataro et al., 2017)”

      “Among other confounding factors, it is particularly important to take into account evolutionary constraint, i.e. the level of purifying selection experienced by different genes. A common intuition is that disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008)”

      It is important to remember that, as we mention in the introduction, previous comparisons did not take potential confounding factors at all into account. It is therefore unclear whether their conclusions were specific to disease genes, or due to confounding factors. We have now made this point clearer in the introduction, as we believe that we have made a substantial effort to control for confounding factors, and that it is a substantial departure from previous efforts:

      P7L201: “In contrast with previous studies, we systematically control for a large number of confounding factors when comparing recent adaptation in human mendelian disease and non-disease genes, including evolutionary constraint, mutation rate, recombination rate, the proportion of immune or virus-interacting genes, etc. (please refer to Methods for a full list of the confounding factors included).”

      P9L253: “These differences between disease and non-disease genes highlight the need to compare disease genes with control non-disease genes with similar levels of selective constraint. To do this and compare sweeps in mendelian disease genes and non-disease genes that are similar in ways other than being associated with mendelian disease (as described in the Results below, Less sweeps at mendelian disease genes), we use sets of control non-disease genes that are built by a bootstrap test to match the disease genes in terms of confounding factors (Methods)”.
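
The idea of comparing disease genes against control sets matched on confounding factors can be illustrated with a simplified resampling sketch. This is a stand-in written for illustration only, not the bootstrap test used in the manuscript: it matches on a single confounder, samples controls with replacement, and reports the fraction of control sets with as few or fewer sweep genes than the disease set. All gene names, values, and function names are hypothetical.

```python
import random


def build_matched_controls(disease, candidates, tolerance, rng):
    """For each disease gene, draw one candidate whose confounder value lies
    within a relative `tolerance` of the disease gene's value."""
    controls = []
    for gene, value in disease.items():
        pool = [name for name, v in candidates.items()
                if abs(v - value) <= tolerance * abs(value)]
        if not pool:
            raise ValueError(f"no matched control available for {gene}")
        controls.append(rng.choice(pool))
    return controls


def deficit_p_value(n_disease_sweeps, disease, candidates, sweep_genes,
                    tolerance=0.1, n_sets=1000, seed=0):
    """Fraction of matched control sets with as few or fewer sweep genes
    than observed among the disease genes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sets):
        controls = build_matched_controls(disease, candidates, tolerance, rng)
        n_control_sweeps = sum(1 for g in controls if g in sweep_genes)
        if n_control_sweeps <= n_disease_sweeps:
            hits += 1
    return hits / n_sets


# Hypothetical confounder values (one confounder per gene) and sweep calls.
disease = {"D1": 1.0, "D2": 2.0}
candidates = {"C1": 1.05, "C2": 0.95, "C3": 2.1, "C4": 1.9, "C5": 5.0}
sweep_genes = {"C1", "C3"}
print(deficit_p_value(0, disease, candidates, sweep_genes))
```

A small fraction would indicate that matched controls rarely carry as few sweeps as the disease genes. A real analysis would match on many confounders at once and account for the clustering of neighboring genes in the same sweep, which this sketch ignores.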

      Furthermore, we have now added a comparison of older adaptation in disease and non-disease genes using a recent version of the MK test called ABC-MK, which can take background selection and other biases such as segregating weakly advantageous variants into account. Again controlling for confounding factors, we find no difference in older adaptation between disease and non-disease genes (please see our response to main point 2).

      Therefore, contrary to the reviewer’s claim that the sweep statistics and MK approaches should have substantial overlap, we now show that it is clearly not the case. We further show that the lack of overlap is expected under our explanation of our results based on interference between recessive deleterious and advantageous variants (see our responses to main point 1 and to reviewer 1 weakness 1).

      Previous analyses used much smaller mendelian disease gene datasets and less recent polymorphism datasets and, critically, did not control for confounding factors. We also note that reference 3 (Torgerson et al., PLoS Genetics 2009) does not make any claim about recent positive selection in mendelian disease genes compared to other genes. Their dataset at the time also only included 666 mendelian disease genes, versus the ~4,000 currently known.

      In short, we do think that we have a claim for novelty, but the reviewer is entirely right that we did a poor job of giving due credit to previous important work. These previous studies deserved much better than no credit at all. We want to thank the reviewer for sparing us the embarrassment of not citing important work.

      We now cite the papers referenced by the reviewer as appropriate in the introduction, based on the scope of their results:

      P3L93: “Multiple recent studies comparing evolutionary patterns between human mendelian disease and non-disease genes have found that mendelian disease genes are more constrained and evolve more slowly (Blekhman et al., 2008; Quintana-Murci, 2016; Spataro et al., 2017; Torgerson et al., 2009). An older comparison by Smith and Eyre-Walker (Smith and Eyre-Walker, 2003) found that disease genes evolve faster than non-disease genes, but we note that the sample of disease genes used at the time was very limited.”

      P5L134 “Among possible confounding factors, it is particularly important to take into account evolutionary constraint, i.e the level of purifying selection experienced by different genes. A common intuition is that mendelian disease genes may exhibit less adaptation because they are more constrained (Blekhman et al., 2008; Spataro et al., 2017; Torgerson et al., 2009),”

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited since the admixed and South Asian populations are not analyzed, so all 26 populations are not included, only the populations from Africa, East Asia, and Europe are analyzed; a total of 15 populations are included in Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signal that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa Bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise").
      It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations (ref. 4).

      We want to thank the reviewer profusely for putting us on the right track with their insightful suggestion. As described in our response to reviewer 1 weakness 1, we have now shown with simulations that the interference of deleterious variants with advantageous variants is strongly decreased during a bottleneck of a magnitude similar to the Out of Africa bottlenecks experienced by East Asian and European populations. This decrease in interference is likely strong enough that no other explanation is required, even if other processes may also be at work, such as a weakening of sweep signals as suggested by the reviewer.

      About the Granka et al. paper, the last author of the current manuscript has already shown in a previous paper (ref) that the type of approach used there to quantify recent adaptation is likely to be severely underpowered due to a number of confounding factors, notably the comparison of genic and non-genic windows that are not far enough from each other to avoid overlapping the same sweep signals. Our results are also based on much more recent and less biased sets of SNPs used to measure the sweep statistics.

      The authors find that there is a remarkably (in my view) similar depletion across all but one MeSH disease classes. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where there are strongly selected deleterious variants recurrently arising and preventing positively selected variants. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set is based on the factor that is actually driving the signal.

      Based on the reviewer’s comment, we have now better explained why our results are unlikely to be a generic property of purifying selection alone. As we explain in our response to main point 3, our results cannot be explained by purifying selection alone, because we match purifying selection between disease genes and the controls. Indeed, we now show with additional MK analyses and GERP-based analyses that our controls for confounding factors already account for purifying selection. This is shown by the fact that disease genes and their controls have similar distributions of deleterious fitness effects.

      In addition, we added a comparison that shows that purifying selection alone does not explain our results. Instead of comparing sweeps at disease and non-disease genes, we compared sweeps (in Africa) between the 1,000 genes with the highest density of conserved, constrained elements and other genes in the genome. If purifying selection is the factor that drives the sweep deficit at disease genes, then we should see a sweep deficit among the genes with the most conserved, constrained elements compared to other genes in the genome. However, we see no such sweep deficit at genes with a high density of conserved, selectively constrained elements (bootstrap test + block randomization of genomes, FPR=0.18). See P15L424. Note that for this comparison we had to remove the matching of confounding factors corresponding to functional and purifying selection densities (new Methods P40L1131).

      Again, our results are better explained not just by purifying selection alone, but more specifically by the presence of interfering, segregating deleterious variants. It is perfectly possible to have highly constrained parts of the genome without having many deleterious segregating variants at a given time in evolution.

      The similarity across MeSH classes can be readily explained if what matters is interference with deleterious segregating variants. Because all types of diseases have deleterious segregating variants, it is not surprising that different MeSH disease categories have a similar sweep deficit. We make that point clearer in the revised manuscript:

      P26L707: “The sweep deficit is comparable across MeSH disease classes (Figure 8), suggesting that the evolutionary process at the origin of the sweep deficit is not disease-specific. This is compatible with a non-disease specific explanation such as recessive deleterious variants interfering with adaptive variants, irrespective of the specific disease type.”.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).

      Based on our response to the previous point, it is clear that a high density of coding sequences, or of conserved constrained sequences in general, is not enough to explain our results. Furthermore, we want to remind the reviewer that we already control for coding sequence length by controlling for coding density, since we use windows of constant size.

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      We are sorry that we were not more explicit from the start of the manuscript. We now make clearer throughout the manuscript what the set of disease genes includes and excludes, by repeating that we focus specifically on mendelian, non-infectious disease genes. By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most virus-interacting genes since most of them are not associated at the genetic variant level with infectious diseases. It is also important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.

      We write P29L818: “By non-infectious, we mean that we excluded genes with known infectious disease-associated variants. This does not exclude most VIPs since most of them are not associated at the genetic variant level with infectious diseases. It is important to note that the effect of virus interactions is accounted for by matching the number of interacting viruses between mendelian disease genes and controls.”

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      We apologize for the lack of precision in this sentence. What we meant is that the recombination rates are not different enough for the hypothetical artifact in question to explain our results. We also neglected to remind readers at this point in the manuscript that we match recombination between disease genes and controls. We now use more precise language:

      P28L772 “The recombination rate at disease genes is also only slightly different from the recombination rate at non-disease genes (Figure 1), and we match the recombination rate between disease genes and controls.”.

      Reviewer #3 (Public Review):

      In this paper, the authors ask whether selective sweeps (as measured by the iHS and nSL statistics) are more or less likely to occur in or near genes associated with Mendelian diseases ("disease genes") than those that are not ("non-disease genes"). The main result put forward by the authors is that genes associated with Mendelian diseases are depleted for sweep signatures, as measured by the iHS and nSL statistics, relative to those which are not.

      The evidence for this comes from an empirical randomization scheme to assess whether genes with signatures of a selective sweep are more likely to be Mendelian disease genes than not. The analysis relies on a somewhat complicated sliding threshold scheme that effectively acts to incorporate evidence from both genes with very large iHS/nSL values, as well as those with weaker signals, while upweighting the signal from those genes with the strongest iHS/nSL values. Although I think the analysis could be presented more clearly, it does seem like a better analysis than a simple outlier test, if for no other reason than that the sliding threshold scheme can be seen as a way of averaging over uncertainty in where one should set the threshold in an outlier test (along with some further averaging across the two different sweeps statistics, and the size of the window around disease associated genes that the sweep statistics are averaged over). That said, the particular approach to doing so is somewhat arbitrary, but it's not clear that there's a good way to avoid that.
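
As a rough illustration of the sliding threshold idea described above, the sketch below computes a fold-enrichment value at several rank thresholds. It assumes one sweep score per gene, a disease gene set, and an equally sized control set. The function name `fold_enrichment_curve` and the toy data are hypothetical, and the way the manuscript weights and aggregates thresholds, statistics, and window sizes is not reproduced.

```python
def fold_enrichment_curve(scores, disease, controls, thresholds):
    """For each rank threshold n, take the n genes with the highest sweep score
    and return (disease genes in the top n) / (control genes in the top n).
    The ratio is None when no control gene falls in the top n."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    curve = {}
    for n in thresholds:
        top = set(ranked[:n])
        n_disease = sum(g in top for g in disease)
        n_control = sum(g in top for g in controls)
        curve[n] = n_disease / n_control if n_control else None
    return curve

# Toy example: six genes, two disease genes, two matched controls.
scores = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0, "f": 0.5}
curve = fold_enrichment_curve(scores, ["c", "e"], ["a", "d"], [2, 4, 6])
```

Values below 1 at the strictest thresholds indicate fewer disease genes than controls among the strongest sweep candidates; a single outlier test corresponds to reading the curve at one threshold only.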

      In addition to reporting that extreme values of iHS/nSL are generally less likely at Mendelian disease genes, the authors also report that this depletion is strongest in genes from low recombination regions, or which have >5 specific variants associated with disease.

      Drawing on this result, the authors read this evidence to imply that sweeps are generally impeded or slowed in the vicinity of genes associated with Mendelian diseases due to linkage to recessive deleterious variants, which hitchhike to high enough frequencies that the selection against homozygotes becomes an important form of interference. This phenomenon was theoretically characterized by Assaf et al 2015, who the authors point to for support. That such a phenomenon may be acting systematically to shape the process of adaptation is an interesting suggestion. It's a bit unclear to me why the authors specifically invoke recessive deleterious mutations as an explanation though. Presumably any form of interference could create the patterns they observe? This part of the paper is, as the authors acknowledge, speculative at this point.

      We thank the reviewer for their comments. We are sorry that we did not provide a clear explanation of why recessive deleterious mutations specifically are expected to interfere more than other types of deleterious variants. This was shown by Assaf et al. (2015), and we should have stated it explicitly. Recessive deleterious variants interfere more than additive or dominant ones because they can hitchhike together with an adaptive variant to substantial frequencies before negative selection acts on them, which happens only once a significant number of individuals homozygous for the deleterious mutation appear in the population. In contrast, dominant mutations do not reach the same high frequencies when linked to an adaptive variant, because they are selected against as soon as they appear in the population.
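
The dominance argument above can be made concrete with a standard single-locus calculation. The sketch assumes random mating (Hardy-Weinberg genotype proportions) and genotype fitnesses 1, 1 - h*s and 1 - s; the function name `selection_load` and the parameter values are illustrative. It is not the linked-selection model of Assaf et al. (2015), nor the SLiM simulations mentioned in this response.

```python
def selection_load(q, s, h):
    """Mean fitness reduction caused by a deleterious allele at frequency q,
    with selection coefficient s and dominance coefficient h, assuming
    Hardy-Weinberg proportions (heterozygotes 2pq, homozygotes q^2)."""
    p = 1.0 - q
    return 2.0 * p * q * h * s + q * q * s

s = 0.1
for q in (0.01, 0.1, 0.3):
    recessive = selection_load(q, s, h=0.0)  # only homozygotes are selected against
    dominant = selection_load(q, s, h=1.0)   # heterozygotes are selected against too
    print(f"q={q}: recessive={recessive:.6f}, dominant={dominant:.6f}")
```

At low frequency, the recessive variant is almost invisible to selection because homozygotes are rare (proportional to q squared), whereas the dominant variant is exposed in every carrier. The gap narrows as q increases, which matches the intuition that a recessive variant is "seen" only after hitchhiking to appreciable frequency.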

      We now write P18L496: “In diploid species including humans, recessive deleterious mutations specifically have been shown to have the ability to slow down, or even stop the frequency increase of advantageous mutations that they are linked with (Assaf et al., 2015). Dominant variants do not have the same interfering ability, because they do not increase in frequency in linkage with advantageous variants as much as recessive deleterious do, before the latter can be “seen” by purifying selection when enough homozygous individuals emerge in a population (Assaf et al., 2015).”

      We have also confirmed with SLiM forward simulations that recessive deleterious variants interfere with adaptive variants much more than dominant ones (Table 1).

      I'm also a bit concerned by the fact that the signal is only present in the African samples studied. The authors suggest that this is simply due to stronger drift in the history of European and Asian samples. This could be, but as a reader it's a bit frustrating to have to take this on faith.

      We thank the reviewer for pointing out this issue with our manuscript. We have now shown, as detailed above in our response to main point 1, reviewer 1 weakness 1, that a weaker sweep deficit at disease genes in Europe and East Asia is an expected feature under the interference explanation, due to the weakened interference of recessive deleterious variants during bottlenecks of the magnitude observed in Europe and East Asia. We therefore believe that these new results strengthen our previous claim regarding the role of interference between deleterious and advantageous variants. We want to thank the reviewer for forcing us to examine the difference between results in Africa and out of Africa, as the manuscript is now more consistent and our results substantially better explained.

      There are other analyses that I don't find terribly convincing. For example, one of the analyses, which shows that iHS signals are no less depleted at genes associated with >5 diseases than with 1, does little to convince me of anything. It's not particularly clear that the number of associated diseases for a given gene should predict the degree of pleiotropy experienced by a variant emerging in that gene with some kind of adaptive function. Failure to find any association here might just mean that this is not a particularly good measure of the relevant pleiotropy.

      We agree with the reviewer that the number of associated diseases may not be a good measure of pleiotropy. Unfortunately, to our knowledge there is currently no good measure of gene pleiotropy in human genomes. Given that the evidence in favor of interference of deleterious variants is now strengthened, we have chosen to remove this analysis from the manuscript. As we now explain throughout the manuscript, pleiotropy is an unlikely explanation in the first place because disease genes have not experienced less long-term adaptation (see the details on our new MK test results in the response to main point 2).

      P16L447: “We find that overall, disease and control non-disease genes have experienced similar rates of protein adaptation during millions of years of human evolution, as shown by very similar estimated proportions of amino acid changes that were adaptive (Figure 5A,B,C,D,E). This result suggests that disease genes do not have constitutively less adaptive mutations. This implies that processes stable over evolutionary time such as pleiotropy, or a tendency to overshoot the fitness optimum, are unlikely to explain the sweep deficit at disease genes.”.

      A last parting thought is that it's not clear to me that the authors have excluded the hypothesis that adaptive variants simply arise less often near genes associated with disease. The fact that the signal is strongest in regions of low recombination is meant to be evidence in favor of selective interference as the explanation, but it is also the regime in which sweeps should be easiest to detect, so it may be just that the analysis is best powered to detect a difference in sweep initiation, independent of possible interference dynamics, in that regime.

      We thank the reviewer for stating these important alternative explanations that needed more attention in our manuscript. In our response to main point 2 above, we explain that higher statistical power in low recombination regions alone is unlikely to explain our results, because we also show that a substantial sweep deficit requires not only low recombination but also the presence of a higher number of disease variants. We also describe in our response to main point 2 how our new MK-test results on long-term adaptation make it very unlikely that mendelian disease genes experience constitutively less adaptation. We want to thank the reviewer again for pointing out this issue with our manuscript, since it was indeed an important missing piece.

    1. Author Response

      Reviewer #2 (Public Review):

      (1) Much of the cited literature that is used to make the case for their hypothesis is very old and actually refers to active HIV infection and patient studies prior to ART. Also, the literature they cite regarding the role of H2S as an antimicrobial agent seem to be limited to tuberculosis infection.

      We have revised the list of literature and included more relevant references from the post-ART era. The antimicrobial role of H2S has recently been examined comprehensively in the context of tuberculosis. Given the close association of TB with HIV, we believe that our study is timely and essential. However, we would like to point out that the references showing the effect of H2S on infection caused by respiratory viruses are included in the manuscript (7-9). Further, recent findings showing the influence of H2S in the context of SARS-CoV-2 infection are also included in the revised manuscript.

      (2) The choice of the latently infected model cell lines is rather unfortunate. There are much better defined models out there these days than J1.1 or U1 cells, such as the J-LAT cells from the Verdin lab or the various reporter cell lines generated by Levy and co-workers. In particular, U1 cells should not be considered as latently infected, as the virus has a defect in the Tat/TAR axis and is mostly just transcriptionally attenuated. It is unclear why the authors only use J-LAT cells for one of the last experiments.

      As suggested by the reviewer, we have generated new data using J-LAT cells in the revised manuscript. First, we confirmed that PMA-mediated HIV-1 reactivation in J-LAT cells is associated with the down-regulation of cbs, cth, and mpst transcripts (Figure 1-figure supplement 1C-D in the revised manuscript). Additionally, we have performed several other mechanistic experiments in J-LAT cells to validate the data generated in U1 (see below response to # 3).

      (3) It is further unclear why the authors perform most of the experiments using U1 cells, which are considered promonocytic, but in the end seek to demonstrate the influence of H2S on latent HIV-1 infection in CD4 T cells. Performing all experiments in J1.1 or better J-LAT cells would have seemed more intuitive.

      The choice of U1 was based on our earlier studies showing that U1 cells uniformly recapitulate the association of redox-based mechanisms and mitochondrial bioenergetics with HIV-latency and reactivation (10-12). We have validated key findings of U1 cells in J1.1 and J-Lat cell lines. We genetically and chemically silenced the expression of CTH in J-Lat cells and examined the effect on HIV-1 reactivation. Consistent with U1 and J1.1, genetic silencing of CTH using CTH-specific shRNA (shCTH) reactivated HIV-1 in J-Lat (Figure 2-figure supplement 1F-G in the revised manuscript). Supporting this, pre-treatment of J-Lat with non-toxic concentrations of a well-established CTH inhibitor, propargylglycine (PAG) further stimulated PMA-induced HIV-1 reactivation (Figure 2-figure supplement 1H-I in the revised manuscript). Altogether, using various cell line models of HIV-1 latency, we confirmed that endogenous H2S biogenesis counteracts HIV-1 reactivation.

      (4) The authors suggest that H2S production would control latent HIV-1 infection and reactivation. Regarding the idea that CBS, CTH or possibly MPST would control latent infection as a function of their ability to produce H2S from different sources, there are several questions. First, if H2S is the primary factor, why would the presence of e.g. MPST not compensate for the reduction of CTH? Second, why would J1.1 and U1 cells both host latent HIV-1 infection events, however, their CBS/CTH/MPST composition is completely different? Third, natural variations in CTH expression caused by culture over time are larger than variations caused by PMA activation.

      These questions are important and complex. CBS, CTH, and MPST produce H2S in the sulfur network. CBS and CTH reside in the cytoplasm, whereas MPST is mainly involved in cysteine catabolism and is localized to mitochondria. The lack of compensation for CTH by MPST could be due to the compartmentalization of their activities. Furthermore, CTH and CBS activities are regulated by diverse metabolites, including heme, S-adenosyl methionine (SAM), and nitric oxide/carbon monoxide (NO/CO). In contrast, MPST activity responds to cysteine availability. How substrate/cofactor availability and enzyme choices are regulated in the cellular milieu of J1.1 and U1 is an interesting question for future experimentation.

      Moreover, the tissue-specific expression/activity of CBS and CTH dictates their relative contributions in H2S biogenesis and cellular physiology (13). Some of these factors are likely responsible for differential expression of CBS, CTH, and MPST in J1.1 and U1 cells. Regardless of these concerns, viral reactivation uniformly reduces the expression of CTH in U1, J1.1, and J-Lat. While we cannot completely rule out natural variations in CTH expression over prolonged culturing, in our experimental setup CTH remained stably expressed and consistently showed down-regulation upon PMA treatment as compared to untreated conditions.

      (5) Also, the statement that H2S production as exerted per loss of CTH would control reactivation is not supported by the kinetic data. In latently HIV-1 infected T cell lines or monocytic cell lines, PMA-mediated HIV-1 reactivation at the protein level is usually almost complete after 24 hours, but at this time point the difference between e.g. CTH levels only begins to appear in U1 cells. The data for J1.1 are even less convincing.

      We have measured the kinetics of p24 production and CTH expression in U1 cells. We showed that the levels of p24 gradually increased from 6 h and kept increasing until the last time point, i.e., 36 h post-PMA treatment (Fig. 2D in the revised manuscript). The p24 ELISA detected similar kinetics of p24 increase in the cell supernatant (Fig. 2E in the revised manuscript). The CTH levels show a reduction at 24 h and 36 h. Based on these data, we report that HIV-1 reactivation is associated with diminished biogenesis of endogenous H2S. We have not made any claims that depletion of CTH precedes HIV reactivation. However, our CTH knockdown data clearly showed that diminished expression of CTH reactivates HIV-1 in the absence of PMA, which is consistent with our hypothesis that H2S production is likely to be a critical host component for maintaining viral latency.

      (6) Figure 2F. PMA is known to induce an oxidative stress response, however, in the experiments the data suggest that PMA results in a downregulated oxidative stress response. Maybe the authors could explain this discrepancy with the literature. In fact, both shRNA transductions, scr and CTH-specific seem to result in a lower PMA response.

      In our experiment, PMA treatment for 24 h results in down-regulation of oxidative stress genes. However, the effect of PMA on the oxidative stress responsive genes is time-dependent. In our earlier publication, we showed that 12 h PMA treatment induces oxidative stress responsive genes in U1 cells (12), whereas at 24 h, the expression of genes is down-regulated (10). Genetic silencing of CTH resulted in elevated mitochondrial ROS and GSH imbalance, which is in line with a further decrease in the expression of oxidative stress responsive genes as compared to PMA alone. As a consequence, PMA-treatment of U1-shCTH induced HIV-1 reactivation, which supersedes that stimulated by PMA or shCTH alone.

      (7) Given that the authors in subsequent experiments use GYY4137, which is supposed to mimic the increased release of H2S, the authors should have definitely included experiments in which they would overexpress CTH, e.g. by retroviral transduction. Specifically in U1 cells, which seemingly do not express CBS, overexpression of CBS should also result in a suppressed phenotype.

      We have explored the role of elevated H2S levels using GYY4137. Treatment with GYY4137 suppressed HIV reactivation in multiple cell lines and primary CD4+ T cells. As suggested by the reviewer, overexpression of CTH could be another strategy to validate these findings. However, since the transsulfuration pathway and active methyl cycle are interconnected and share metabolic intermediates (e.g., homocysteine), overexpression of CTH could disturb this balance and may lead to metabolic paralysis. Owing to these potential limitations, we used a slow-releasing H2S donor (GYY4137) to chemically complement CTH deficiency during HIV reactivation. We thank the reviewer for this comment.

      (8) Figure 4F: The authors need to explain how they can measure a 4-fold gag RNA expression change in untreated cells. Also, according to Figure 4A, 300 µM GYY produces much less H2S than 5mM, yet the suppressive effect of 300 µM GYY is much higher?

      The four-fold expression in untreated cells is likely due to leaky control of viral transcription in J1.1 cells (14-16). However, to avoid confusion, we have replotted the results by normalizing the data generated upon PMA-mediated HIV reactivation to the PMA-untreated cells (Figure 4F in the revised manuscript). The suppressive effect of GYY4137 at the lower concentration is intriguing but consistent with the findings that high and low concentrations of H2S have profound and distinct effects on cellular physiology (3,17). One possibility is that the high concentration of H2S induces the mitochondrial sulfide oxidation pathway to avert toxicity. This might modulate mitochondrial activity and ROS, resulting in the suppression of the GYY4137 effect. Consistent with this, higher concentrations of H2S have been shown to cause pro-oxidant effects, DNA damage and genotoxicity (3,18). We have discussed these possibilities in the revised manuscript.

      (9) Initially, the authors argue "that the depletion of CTH could contribute to redox imbalance and mitochondrial dysfunction to promote HIV-1 reactivation"(p. 9). Less CTH would suggest less produced H2S. However, later on in the manuscript they demonstrate that addition of a H2S source (GYY4137) results in the suppression of HIV-1 replication and supposedly HIV-1 reactivation. This is somewhat confusing.

      We show that depletion of endogenous H2S by diminished expression of CTH (U1-shCTH) resulted in higher mitochondrial ROS and GSH/GSSG imbalance. Both of these alterations are known to reactivate HIV-1 and promote replication (10,11,19). The addition of GYY4137 chemically compensated for the diminished expression of CTH, and prevented HIV-1 reactivation in U1-shCTH. These events are expected to suppress HIV-1 replication and reactivation. We have made this distinction clear in the revised manuscript.

      (10) CTH, or for that matter CBS or MPST do not only produce H2S, however, they also are part of other metabolic pathways. It would have been interesting and important to study how these metabolic pathways were affected by the genetic manipulations and also how the increased presence of H2S (GYY4137) would affect the metabolic activity of these enzymes or their expression.

      We fully agree with the reviewer. In fact, our NanoString data show that upon CTH knockdown (U1-shCTH), MPST levels were down-regulated and CBS remained undetectable (Fig. 2F in the revised manuscript). Additionally, GYY4137 treatment induced the expression of CTH but not MPST upon PMA addition (Fig. 5A in the revised manuscript). We have incorporated these findings in the revised manuscript. Given that CBS and CTH catalyze at least eight H2S-generating steps and two cysteine-producing reactions, the modulation of CTH by HIV is likely to have a widespread influence on the transsulfuration pathway and active methyl cycle intermediates. Our future strategies are to generate a comprehensive understanding of sulfur metabolism underlying HIV latency and reactivation. These experiments require multiple biochemical and genetic technologies with appropriate controls. We hope that the reviewer will agree with our view that these experiments should be a part of future investigation. We thank the reviewer for this comment.

      (11) H2S has been reported to cause NFkB inhibition by sulfhydration of p65; as such, the findings here are not particularly novel or surprising. Also, H2S induced sulfhydration is rather not targeted to a specific protein, let alone a HIV protein, making this approach a very unlikely alternative to current ART forms.

      We believe that NF-kB inhibition is not the only mechanism by which H2S exerts its influence on HIV latency. Recent studies point towards the importance of the Nrf2-Keap1 axis in sustaining HIV latency (20). Our data suggest an important role for Nrf2-Keap1 signaling in mediating the influence of H2S on HIV latency. Additionally, recruitment of an epigenetic silencer, YY1, is also affected by H2S. Interestingly, YY1 activity is modulated by redox signaling (21), suggesting H2S could be an important regulator of YY1 activity in HIV-infected cells. We have, so far, no evidence for viral proteins targeted by H2S. However, experiments to examine global S-persulfidation of host and HIV proteins are ongoing in the laboratory to fill this knowledge gap. Lastly, our findings raise the possibility of exploring H2S donors with the current ART (not as an alternative to ART) for reducing virus reactivation. We have toned down the clinical relevance of our findings.

      (12) The description of the primary T cell model used to generate the data in Figure 6 is slightly misleading. Also, the idea of this model was originally to demonstrate that "block and lock" by didehydro-cortistatin is possible. In this application, the authors did not investigate whether GYY4137 would actually induce a HIV "block and lock" over an extended period of time.

      As suggested by the reviewer, we have cited the didehydro-cortistatin studies as the basis of our strategy. Our idea was to adapt the primary T cell model to begin understanding the role of H2S in blocking HIV rebound. Our results indicate the future possibility of investigating GYY4137 to lock HIV in deep latency for an extended period of time. However, comprehensive investigation would require long-term experiments and samples from multiple HIV subjects. In the current pandemic times with overburdened Indian clinical settings, we cannot plan these experiments. However, we hope our data form a solid foundation for HIV researchers to perform extended “block and lock” studies using H2S donors.

      (13) However, the authors never provide evidence that endogenous H2S is altered in latently HIV-1 infected cells (which may actually be an impossible task). By the end of the manuscript, the authors have not provided clear evidence that the effects of e.g. CTH deletion would be mediated by the production of H2S, and not by another function of the enzyme. Similarly, the inability of stimuli to trigger efficient HIV-1 reactivation following the provision of unnaturally high levels of H2S is not surprising given reports on the effect of GYY4137 as anti-inflammatory agent and suppressor NF-kB activation. Unless the authors were to demonstrate a true "block and lock" effect by GYY4137 the data will likely have limited impact on the HIV cure field.

      It is difficult to measure H2S levels in the latently infected primary cells due to the assay's sensitivity and the insufficient number of cells latently infected with HIV-1. However, in the revised manuscript we have clearly shown that cysteine levels are not affected by CTH depletion and that cysteine deprivation does not reactivate HIV-1. These results indicate that the effects of CTH depletion are likely mediated by H2S. This is consistent with our data showing that GYY4137 specifically complements CTH deficiency and blocks HIV-1 reactivation in U1-shCTH. Further, we carried out an in-depth investigation to show that the effect of GYY4137 is not due to impaired activation of CD4+ T cells.

      Lastly, since CTH catalyzes multiple reactions during H2S production, we cannot rule out the effect of other metabolites in this process. However, we think that this is outside the scope of the present study. Our study focuses on understanding how H2S modulates redox, mitochondrial bioenergetics, and gene expression in the context of HIV latency. These insights are likely to positively impact future studies exploring the role of H2S in HIV cure.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to establish a standardized quantitative approach to categorize the activity patterns in a central pattern generator (specifically, the well-studied pyloric circuit in C. borealis). While it is easy to describe these patterns under "normal" conditions, this circuit displays a wide range of irregular behaviors under experimental perturbations. Characterizing and cataloguing these irregular behaviors is of interest to understand how the network avoids these dysfunctional patterns under "normal" circumstances.

      The authors draw upon established machine learning tools to approach this problem. To do so, they must define a set of features that describe circuit activity at a moment in time. They use the distribution of inter-spike-intervals ISIs and spike phases of the LP and PD neuron as these features. As the authors mention in their Discussion section, these features are highly specialized and adapted to this particular circuit. This limits the applicability of their approach to other circuits with neurons that are unidentifiable or very large in number (the number of spike phase statistics grows quadratically with the number of neurons).

      We agree with the reviewer that the size of the feature vectors as described grows quadratically with the number of neurons. The feature sets we describe are most suited for “identified” neurons – neurons whose identity and connectivity are known and can be reliably recorded from multiple animals. The method described here is best suited for systems with small numbers of identified neurons. For other systems, other feature vectors may be chosen, as we have suggested in the Discussion: Applicability to other systems.
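
      The quadratic growth mentioned above can be made concrete with a small count. The sketch below assumes, purely for illustration, that each neuron contributes a fixed number of ISI percentiles and that each ordered pair of distinct neurons contributes the same number of spike-phase percentiles. The function name, this layout, and the default of 10 percentiles are hypothetical and are not taken from the manuscript.

```python
def feature_vector_length(n_neurons: int, n_percentiles: int = 10) -> dict:
    """Count features under an assumed layout (not the authors' exact one).

    ISI features:   one set of percentiles per neuron         -> linear in n
    Phase features: one set per ordered pair of distinct cells -> quadratic in n
    """
    isi = n_neurons * n_percentiles
    phase = n_neurons * (n_neurons - 1) * n_percentiles
    return {"isi": isi, "phase": phase, "total": isi + phase}


# Two identified neurons (e.g. LP and PD) versus a hypothetical 10-neuron circuit.
small = feature_vector_length(2)
large = feature_vector_length(10)
```

      Under these assumptions the phase block dominates as soon as more than a few neurons are included, which is why such a feature set suits small circuits of identified neurons.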

      The main results of the paper provide evidence that ISIs and spike phase statistics provide a reasonable descriptive starting point for understanding the diversity of pyloric circuit patterns. The authors rely heavily on t-distributed stochastic neighbor embedding (tSNE), a well-known nonlinear dimensionality reduction method, to visualize activity patterns in a low-dimensional, 2D space. While effective, the outputs of tSNE have to be interpreted with great care (Wattenberg, et al., "How to Use t-SNE Effectively", Distill, 2016. http://doi.org/10.23915/distill.00002). I think the conclusions of this paper would be strengthened if additional machine learning models were applied to the ISI and spike phase features, and if those additional models validated the qualitative results shown by tSNE. For example, tSNE itself is not a clustering method, so applying clustering methods directly to the high-dimensional data features would be a useful validation of the apparent low-dimensional clusters shown in the figures.

      We thank the reviewer for these suggestions, and agree with the reviewer that t-SNE is not a clustering method, and directly clustering on t-SNE embeddings is rife with complexities. Instead we have used t-SNE to generate a visualization that allows domain experts to quickly label and cluster large quantities of data. This makes a previously intractable task feasible, and offers some basic guarantees on quality (e.g., no one data point can have two labels, because labels derive from position of data points in two dimensional space). In addition:

      • We used uMAP, another dimensionality reduction algorithm, to perform the embedding step, and colored points by the original t-SNE embedding (Figure 3—figure supplement 3). Large sections of the map are still strikingly colored in single colors, suggesting that the manual clustering did not depend on the details of the t-SNE algorithm, but is rather informed by the statistics of the data.

      • We validated our method using synthetic data. We generated synthetic spike trains from different “classes” and embedded the resultant feature vectors using t-SNE. Data from different classes are not intermingled, and form tight “clusters” (Figure 2 -- figure supplement 4).

      • Finally, we attempted to use hierarchical clustering to cluster the raw feature vectors, and were not able to find a reasonable partitioning of the linkage tree that separated qualitatively different spike patterns (Figure at the top of this document). We speculate that this is because feature vectors may contain outliers, which bias clustering algorithms that attempt to preserve global distance towards lumping the majority of the data into a single cluster in order to differentiate outliers from the bulk of the data.

      The authors do show that the algorithmically defined clusters agree with expert-defined clusters. (Or, at least, they show that one can come up with reasonable post-hoc explanations and interpretations of each cluster). The very large cluster of "regular" patterns -- shown typically in a shade of blue -- actually looks like an archipelago of smaller clusters that the authors have reasoned should be lumped together. Thus, while the approach is still a useful data-driven tool, a non-trivial amount of expert knowledge is baked into the results. A central challenge in this line of research is to understand how sensitive the outcomes are to these modeling choices, and there is unlikely to be a definitive answer.

      We agree with the reviewer entirely.

      Nonetheless, the authors show results which suggest that this analysis framework may be useful for the community of researchers studying central pattern generators. They use their method to qualitatively characterize a variety of network perturbations -- temperature changes, pH changes, decentralization, etc.

      In some cases it is difficult to understand the level of certainty in these qualitative observations. A first look at Figure 5a suggests that three different kinds of perturbations push the circuit activity into different dysfunctional cluster regions. However, the apparent spatial differences between these three groups of perturbations might be due to animal-level differences (i.e. each preparation produces multiple points in the low-D plot, so the number of effective statistical replicates is smaller than it appears at first glance). Similarly, in Figure 9, it is somewhat hard to understand how much the state occupancy plots would change if more animals were collected -- with the exception of proctolin, there are ~25 animals and 12 circuit activity clusters which may not be a favorable ratio. It would be useful if a principled method for computing "error bars" on these occupancy diagrams could be developed. Similar "error bars" on the state transition diagrams (e.g. Fig 6a) would also be useful.

      We agree with the reviewer. Despite this paper containing data from hundreds of animals, the dataset may not be sufficiently large to perform some necessary statistical checks. We agree with the reviewer that a more rigorous error analysis would be useful, but is not trivially done.

      Finally, one nagging concern that I have is that the ISIs and spike phase statistics aren't the ideal features one would use to classify pyloric circuit behaviors. Sub-threshold dynamics are incredibly important for this circuit (e.g. due to electrical coupling of many neurons). A deeper discussion about what is potentially lost by only having access to the spikes would be useful.

      We agree with the reviewer that spike times aren’t the ideal feature to use to describe circuit dynamics. This is especially true in the STG, where synapses are graded, and coupling between cells can persist without spiking. However, the data required simply do not exist, as it requires intracellular recordings, which are substantially harder to perform (and maintain over challenging perturbations) than extracellular recordings.

      Finally, the signal to the muscles – arguably the physiologically and functionally relevant signal – is the spike signal, suggesting that spike patterns from the pyloric circuit are a useful feature to measure. Nevertheless, this is an important point, and we thank the reviewer for raising it, and we have included it in the section titled Discussion: Technical considerations.

      Overall, I think this work provides a useful starting point for large-scale quantitative analysis of CPG circuit behaviors, but there are many additional hurdles to be overcome.

      Reviewer #2 (Public Review):

      This manuscript uses the t-SNE dimensionality reduction technique to capture the rich dynamics of the pyloric circuit of the crab.

      Strengths:

      • The integration of a rich data-set of spiking data from the pyloric circuit

      • Use of nonlinear dimension reduction (t-SNE) to visualise that data

      • Use of clusters from that t-SNE visualisation to create subsets of data that are amenable to consistent analyses (such as using the "regular" cluster as a basis for surveying the types of dynamics possible in baseline conditions)

      • Innovative use of the cluster types to describe transitions between dynamics within the baseline state and within perturbed states (whether by changes to exogenous variables, cutting nerves, or applying neuromodulators)

      • Some interesting main results:

      o Baseline variability in the spiking patterns of the pyloric circuit is greater within than between animals

      o Transitions to silent states often (always?) pass through the same intermediate state of the LP neuron skipping spikes

      Weaknesses:

      • t-SNE is not, in isolation, a clustering algorithm, yet here it is treated as such. How the clusters were identified is unclear: the manuscript mentions manual curation of randomly sampled points, implying that the clusters were extrapolations from these. This would seem to rather defeat the point of using unsupervised techniques to obtain an unbiased survey of the spiking dynamics, and raises the issue of how robust the clusters are

      We have used t-SNE to visualize the circuit dynamics in a two-dimensional map. We have exploited t-SNE’s ability to preserve local structure to generate an embedding where a domain expert can efficiently identify and label stereotyped clusters of activity by hand. As the reviewer points out, this is a manual step, and we have emphasized this in the manuscript. The strength of our approach is to combine the power of a nonlinear dimensionality reduction technique such as t-SNE with human curation to make a task that was previously impossible (identifying and labelling very large datasets of neural activity) feasible.

      To address the question of how robust the manually identified clusters are, we have:

      1) We used another dimensionality reduction technique, uMAP, to generate an embedding and colored points by the original t-SNE map (Figure 3 – figure supplement 3). To a rough approximation, the coloring reveals that a similar clustering exists in this uMAP embedding.

      2) We generated synthetic spike trains from pre-determined spike pattern classes and used the feature vector extraction and t-SNE embedding procedure as described in the paper. We found that this generated a map (Figure 2—figure supplement 4) where classes of spike patterns were well separated in the t-SNE space.

      • the main purpose and contribution of the paper is unclear, as the results are descriptive, and mostly state that dynamics in some vary between different states of the circuit; while the collated dataset is a wonderful resource, and the map is no doubt useful for the lab to place in context what they are looking at, it is not clear what we learn about the pyloric circuit, or more widely about the dynamical repertoire of neural circuits

      • in some places the contribution is noted as being the pipeline of analysis: unfortunately as the pipeline used here seems to rely on manual curation, it is of limited general use; moreover, there are already a number of previous works that use unsupervised machine-learning pipelines to characterise the complexity of spiking activity across a large data-set of neurons, using the same general approach as here (quantify properties of spiking as a vector; map/cluster using dimension reduction), including Baden et al (2016, Nature), Bruno et al (2015, Neuron), Frady et al (2016, Neural Computation).

      • Some key limitations are not considered:

      o the omission of the PY neuron activity means that the map as given is incomplete: potentially there are many more states, and hence transitions, within or beyond those already found that correspond to changes in PY neuron activity

      We agree with the reviewer that the omission of the PY neurons’ activity means that the map is incomplete. There are likely many more states, and hence many more transitions, than the ones we have identified. In addition, we note that there are other pyloric neurons whose activity is also missing (AB, IC, LPG, VD). However, measuring just LP and PD allows us to monitor the activity of the most important functional antagonists in the system (they effectively form a half-center oscillator, since PD is electrically coupled to AB). In general, the more neurons one measures, the richer the description of the circuit dynamics will be. Collecting datasets at this scale (~500 animals) from all pyloric neurons is challenging, and we have revised the manuscript to make this important point (see Discussion: Technical considerations).

      o The use of long, non-overlapping time segments (20s) - this means, for example, that the transitions are slow and discrete, whereas in reality they may be abrupt, or continuous.

      We agree with the reviewer. There are tradeoffs in choosing a bin size when analyzing time series: choosing longer bins can increase the number of “states”, and choosing shorter bins can increase the number of transitions. We chose 20 s bins because they are long enough to include several cycles of the pyloric rhythm, even when decentralized, yet short enough to resolve slow changes in spiking. We have included a statement clarifying this (see Discussion: Technical considerations).
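
      The binning choice discussed above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it splits a sorted list of spike times (in seconds) into non-overlapping 20 s segments and summarizes the inter-spike intervals of each segment by percentile cut points. The function names, the decision to drop a trailing partial bin, and the use of deciles are all assumptions made for the example.

```python
from statistics import quantiles


def segment_spikes(spike_times, bin_s=20.0, duration=None):
    """Split spike times (seconds) into non-overlapping bins of length bin_s.

    A trailing partial bin is dropped (an assumption for this sketch).
    """
    if duration is None:
        duration = max(spike_times) if spike_times else 0.0
    n_bins = int(duration // bin_s)
    bins = [[] for _ in range(n_bins)]
    for t in spike_times:
        idx = int(t // bin_s)
        if 0 <= idx < n_bins:
            bins[idx].append(t)
    return bins


def isi_percentiles(spikes, n=10):
    """Return the n-1 cut points of the ISI distribution, or None if too sparse."""
    isis = [b - a for a, b in zip(spikes, spikes[1:])]
    if len(isis) < 2:
        return None  # silent or near-silent segment: no ISI distribution
    return quantiles(isis, n=n)


# A perfectly regular 2 Hz train lasting just under 50 s.
train = [i * 0.5 for i in range(100)]
segments = segment_spikes(train)
features = [isi_percentiles(seg) for seg in segments]
```

      Changing `bin_s` makes the tradeoff visible: shorter bins give more segments (and so more possible transitions between labelled segments), while longer bins pool more cycles into each ISI distribution.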

      o tSNE cannot capture hierarchical structure, nor has a null model to demonstrate that the underlying data contains some clustering structure. So, for example, distances measured on the map may not be strictly meaningful if the data is hierarchical.

      We agree with the reviewer. t-SNE can manifest clusters when none exist (Section 4 of https://distill.pub/2016/misread-tsne/) and can obscure or merge true clusters. We have restricted analyses that rely on distances measured in the map to cases where there are qualitative differences in behavior (e.g., with decentralization, Fig 7) or have compared distances within subsets of data where a single parameter is changed (e.g., pH or temperature, Fig 5). The only conclusion we draw from these distance measures is that data are more (or less) spread out in the map, which we use as a proxy for variability. We have included a statement discussing the limitations of using t-SNE (Discussion: Comparison with other methods).

      • the Discussion does not include enough insight and contextualisation of the results.

      We have completely rewritten the discussion to address this.

      Reviewer #3 (Public Review):

      Gorur-Shandilya et al. apply an unsupervised dimensionality reduction (t-SNE) to characterize neural spiking dynamics in the pyloric circuit in the stomatogastric ganglion of the crab. The application of unsupervised methods to characterize qualitatively distinct regimes of spiking neural circuits is very interesting and novel, and the manuscript provides a comprehensive demonstration of its utility by analyzing dynamical variability in function and dysfunction in an important rhythm-generating circuit. The system is highly tractable with small numbers of neurons, and the study here provides an important new characterization of the system that can be used to further understand the mapping between gene expression, circuit activity, and functional regimes. The explicit note about the importance of visualization and manual labeling was also nice, since this is often brushed under the rug in other studies.

      Major concern:

      While the specific analysis pipeline clearly identifies qualitatively distinct regimes of spike patterns in the LP/PD neurons, it is not clear how much of this is due to t-SNE itself vs the initial pre-processing and feature definition (ISI and spike phase percentiles). Analyses that would help clarify this would be to check whether the same clusters emerge after (1) applying ordinary PCA to the feature vectors and plotting the projections of the data along the first two PCs, or (2) defining input features as the concatenated binned spike rates over time of the LP & PD neurons (which would also yield a fixed-length vector per 20 s trial), and then passing these inputs to PCA or tSNE. As the significance of this work is largely motivated by using unsupervised vs ad hoc descriptors of circuit dynamics, it will be important to clarify how much of the results derive from the use of ISI and phase representation percentiles, etc. as input features, vs how much emerge from the dimensionality reduction.

      We agree with the reviewer that it is important to clarify how much of our results comes from the data itself and from how we parameterize it using ISIs and phases, and how much comes from the choice of t-SNE as a dimensionality reduction algorithm. We have addressed this concern in the following ways:

      1. We used principal components analysis on the feature vectors and measured triadic differences in features such as the period and duty cycle of the PD neuron. We found that triadic differences were lower in the t-SNE embedding than in the first two PCA features, or in shuffled t-SNE embeddings (Figure 2– Figure supplement 2), suggesting that the embedding is creating a useful representation that captures key features of the data.

      2. We have used uMAP to reduce the dimensionality of the feature matrix to two dimensions and found that it too preserved the coarse features of the embedding that we observe with t-SNE. Coloring the uMAP embedding by the t-SNE labels revealed that the overall classification scheme was intact (Fig 3 – figure supplement 3).

      3. We generated a synthetic dataset and applied the unsupervised part of our algorithm to it (conversion to ISIs, phases, etc., then t-SNE). We colored the points in the t-SNE embedding by the category in the synthetic dataset. We found that categories were well separated in the t-SNE plot, and each cluster tended to have a single color. This validates the overall power of our approach and shows that it can recover clustering information in large spike sets (Figure 2—figure supplement 4).

      4. We have run k-means and hierarchical clustering on the feature vectors directly and shown that our method is superior to these naïve clustering algorithms running on the feature vectors. We speculate that this is because these clustering methods attempt to partition the full space using global distances, at the expense of distance along the manifold on which the data is located. Algorithms like t-SNE are biased towards local distances, and discount global distances between points outside a neighborhood, and are thus better suited here.

    1. Author Response

      Reviewer 1

      Panda and co-workers analyzed RS fMRI recordings from healthy patients and from two types of coma: UWS and MCS. They characterized the time-resolved functional connectivity in terms of metastability (time-variance of the Kuramoto order parameter), spatiotemporal patterns via non-negative tensor factorization, and its relationship to the eigenmodes of structural connectivity. Finding greater metastability and non-stationarity of the DMN network in healthy MCS patients, than in UWS patients, they found that the best discriminators to classify the different DoCs are the number of excursions (nonstability) from the DMN, salience and FPN networks extracted by the NNTF analysis. Interestingly, the data-driven NNTF yielded a novel sub-network comprising the FPN and some subcortical structures. The excursions and dwell times from this FPN subnetwork showed to be significantly lower in the UWS patients than in MCS. Surrogate data testing assures that the different methods and fits are effectively expressing the functional connectivity matrices measured.

      Overall, I think that the results are correct and they advance in the characterization and understanding of the brain under DoC. However, some improvements can be made in the way the results, and the rationale behind them, are presented.

      We thank Prof. Patricio Orio for his assessment.

      While reading the Results section, it is easy to have the impression of a disconnected set of analyses that just happened to be together. In particular, the section about the structural eigenmodes and their relationship with the time-resolved FC seems to have little connection with the rest of the work, except for confirming (yet again) that DoC patients have a less dynamic FC. More elaboration about the relevance of these results, and what they say about DoC (that other dynamical FC analyses don't), is needed both in the introduction and discussion. Although a clear explanation is given in the introduction, the bottom line seems to be yet another measure of metastability. Perhaps, a better explanation of what underlies the 'modulation strength of eigenmodes expression' will be helpful for distinguishing this analysis from others. How novel is the connection that is being done with the structural connectivity and why is this important? Moreover, the eigenmodes analysis has little-to-none importance in the discrimination of patients done at the end; thus, its place within the big picture is hard to evaluate.

      We understand the reviewer’s position. Part one of our work covers time-resolved FC and spatiotemporal networks in DoC. Part two covers the relationship between time-resolved FC and eigenmodes of the structural network. The rationale for including part two is the following: there is a lot of literature that shows that eigenmodes of the structural network can be considered as ‘building blocks’ or basis functions/vectors for spatiotemporal networks at the functional level (Aqil et al., 2021; Atasoy et al., 2016, 2018; Deslauriers-Gauthier et al., 2020; Gabay et al., 2018; Gabay and Robinson, 2017; Robinson et al., 2016; Robinson, 2021; Tewarie et al., 2019, 2020; Wang et al., 2017). Ideally, to link parts one and two, one would take this notion further by analysing whether the magnitude of the eigenmode coefficients differed between UWS, MCS and healthy controls and how this would relate to dwell times or expression of spatiotemporal networks. However, this would lead to an immense multiple testing issue, which would be impossible to overcome with our sample size. An important link between parts one and two of our work is the relationship between change in eigenmode expression and metastability. Our measure of metastability is only a proxy. The lack of change in eigenmode expression seems to confirm the metastability result.

      To allow for better integration of part one and two of our work, we have added to the introduction:

      “These eigenmodes can be considered as patterns of ‘hidden connectivity’ that come to expression at the level of functional networks. It has been postulated that eigenmodes form elementary building blocks for spatiotemporal dynamics (Aqil et al., 2021). There is evidence that the well-known resting state networks can be explained by activation of a small set of eigenmodes (Atasoy et al., 2018).”

      We have also clarified in the result section:

      “As resting-state network activity can be explained by activation of structural eigenmodes, we next analyse the role of fluctuations in eigenmode expression over time.”

      Something that I find counter-intuitive and that may confuse some readers, is the (apparent) contradiction between the diminished metastability in the DoC conditions and the reduced dwell times (Figure S1; also "the inability to sequentially dwell for prolonged times in a different set of eigenmodes", as stated in the Discussion). Fewer excursions and shorter dwell times can only mean that some networks are just less visited and maybe this would be enough to distinguish between conditions. Further explaining this will help to understand better the implications of the work.

      We understand the reviewer’s point, however we disagree that diminished metastability is in contradiction with the findings on dwell times. We show that dwell times are reduced in the posterior DMN, FPN and sub-FPTN networks, however, there is very long dwelling in the residual network in DoC. Hence, the brain resides in fewer network states in DoC, which is in agreement with reduced metastability. Our proxy for metastability is the standard deviation of the Kuramoto order parameter. Whenever there are more visits to network states, or switching between network states as is the case for healthy controls in our data, this would lead to phase uncoupling followed by phase synchronization, which would hence boost the standard deviation of the Kuramoto order parameter (a proxy for metastability).
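
      The metastability proxy described above can be written down in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it takes a time series of instantaneous phases (one list of per-region phases per time point), computes the Kuramoto order parameter R(t) = |mean_j exp(i·θ_j(t))| at each time point, and returns its standard deviation over time. The function names and the choice of the population standard deviation are assumptions.

```python
import cmath
import math
from statistics import pstdev


def kuramoto_order(phases):
    """Kuramoto order parameter R = |mean_j exp(i * theta_j)| for one time point."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def metastability(phase_series):
    """Standard deviation over time of R(t); phase_series[t] lists the phases at time t."""
    return pstdev([kuramoto_order(p) for p in phase_series])


# Regions that stay phase-locked: R(t) is constant, so the proxy is zero.
locked = [[0.3, 0.3, 0.3] for _ in range(6)]

# Two regions alternating between in-phase and anti-phase: R(t) flips between ~1 and ~0.
switching = [[0.0, 0.0], [0.0, math.pi]] * 3

m_locked = metastability(locked)
m_switching = metastability(switching)
```

      In this toy example the switching series yields the larger value, matching the argument that repeated uncoupling followed by resynchronization raises the standard deviation of the order parameter, whereas residing in a single state keeps it low.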

      We agree with the reviewer that the sentence starting “the inability to sequentially dwell for prolonged…” is confusing. We have now removed this statement.

      We have now added to the result section:

      “These findings of very short dwell times in the posterior DMN, FPN and sub-FPTN and long dwell time in the residual network can be considered as a contraction of the functional network repertoire in DoC, which is in agreement with a loss in metastability in these patients.”

      Finally, some comments about the connection(s) of these analyses with the commonly used FCD analysis (based on sliding windows of pair-wise correlations) would be useful, to better place this work in the big picture of the time evolution of functional connectivity.

      We have now discussed sliding window-based analysis in the context of our work in the methodology section.

      “Lastly, we have used a high temporal resolution method to estimate time-resolved connectivity at every time point instead of a sliding window-based method. Previous studies using sliding window approaches have provided novel insights into the brain dynamics of loss of consciousness, such as the co-occurrence of functional connectivity patterns, known as brain states, and their temporal alteration (i.e., in the rate of pattern occurrence (probability) and in between-pattern transition probabilities) during loss of consciousness in DoC patients (Demertzi et al., 2019) and anaesthesia-induced loss of consciousness (Barttfeld et al., 2014a; Uhrig et al., 2018). However, sliding window approaches have limited sensitivity to non-stationarity in the fMRI BOLD signals (Hindriks et al., 2016) and do not capture spatial alterations of classical functional brain networks. The exploration of the spatiotemporal aspects of well-known resting state networks is an important step forward for better understanding the relation between brain function and consciousness, in a way that is impossible to achieve at the whole brain level. In addition, recent work on time-resolved connectivity shows that brief periods of co-modulation in BOLD signals are an important driving factor for functional connectivity (Esfahlani et al., 2020; Hindriks et al., 2016).”

      Reviewer 2

      The study is of high significance, rigor, and novelty. Despite the many studies of repertoire, dynamic connectivity, etc., in the study of consciousness, there is (surprisingly, as I confirmed with a literature search) a dearth of application of these approaches to disorders of consciousness. The manuscript is well-written and transparent about its limitations. The author should consider the following recommendations:

      We thank the reviewer for his/her assessment of our work.

      1) There is frequent reference to "subcortical" and related networks, but I see no description in the text of which subcortical structures are involved. Panel N of figure 2 is helpful but I think that more explicit detail is important, especially given the specific predictions of mesocircuit theory.

      We have provided details for the subcortical networks presented in the Panel N of Figure 2. In the manuscript we provide a textual description of the brain areas that are part of the network. To improve the clarity of the description of the network, we also now refer to it as “subcortical fronto-temporoparietal (Sub-FTPN)”.

      In the result section, it reads as: “This modulated subcortical fronto-temporoparietal network consists of the following brain regions: bilateral thalamus, caudate, right putamen, bilateral anterior and middle cingulate, inferior and middle frontal areas, supplementary motor cortex, middle and inferior temporal gyrus, right superior temporal, bilateral inferior parietal and supramarginal gyrus.”

      2) Similarly, although the global neuronal workspace does posit a critical role for recurrent frontal-parietal networks, can the authors be more specific about the nodes of the proposed workspace and what they found empirically?

      As mentioned above, we have provided more details about the regions that are part of the “subcortical fronto-temporoparietal” network. As the reviewers rightfully noted, this network also shows some overlap with the Global Neuronal Workspace. We refer to that in more detail in the discussion, highlighting how our functional networks overlap with and differ from the two networks (i.e., one feedforward only, one with recurrent activity), and from the predictions of the mesocircuit model. For more detail, please refer to the reply to point 1 of “Recommendations for the authors”.

      3) The classification sensitivity/specificity did not, in my opinion, add much to the manuscript, especially since the number of patients is not remotely close to what would be required for a population-based diagnostic approach. If the authors chose to include this with any reference to diagnosis (highlighted in the introduction and elsewhere), I would encourage a comparison with similar data from other clinical or neuroimaging-based diagnostic approaches. However, I think the value of the study resides more with mechanistic understanding than diagnosis.

      We agree with your suggestions that the primary aim of our work is to provide a mechanistic understanding of loss of consciousness. Therefore, we have removed the classification part from the paper and explain our findings focusing on mechanism of pathological unconsciousness rather than its potential as a clinical diagnostic tool. This change has required several textual edits throughout the manuscript.

    1. When memes or the subjects of a meme are used for commercial purposes without permission, the meme creator may sue, as the effect of the commercial use on the market value of the original meme usually prevents a finding of fair use. In 2013, the owners of the cats featured in the “Nyan Cat” and “Keyboard Cat” memes won a lawsuit against Warner Bros. and 5th Cell Media for respectively distributing and producing a video game using images of their cats.

      Big corporations use other creators' work more often than we think. It is unsettling that people's work can be taken from the internet so easily, and that the theft can sometimes be hard to prove. Fortunately, the creators in these two cases won their lawsuit.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript investigates a role for YAP in replication. Previous work from this group has shown that Yap knock-down leads to accelerated S-phase and an abnormal progression of DNA replication in the frog eye. Here they extend this to show that YAP depletion accelerates S-phase and DNA replication in the frog embryo, and that YAP binds a DNA replication regulator called Rif1. Combing assays suggest that YAP acts on origin firing. This is an interesting new aspect of YAP function. I am not an expert on DNA replication, however, I feel that the manuscript would have been improved if more mechanistic insight was gained into how Rif1 and YAP interact, and how that interaction influences replication timing.

      In the revised version of the manuscript, we have strengthened our conclusion that Yap regulates the dynamics of DNA replication. We now provide additional experiments in addition to DNA combing and nascent strand analysis by agarose gel electrophoresis: Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation, western blotting for replication fork proteins. All show that DNA synthesis and origin activation are increased after Yap depletion.

      Moreover, in the revised manuscript we also directly compared the effects of YAP depletion to those of Rif1 depletion alone (page 7, New Figure 4). As for Yap depletion, we first quantified rhodamine-dUTP incorporation after Rif1 depletion by direct fluorescence microscopy, which demonstrated a clear increase of DNA synthesis, consistent with Alver et al. 2017. Second, we performed DNA combing experiments after Rif1 depletion in egg extracts that show a marked increase in DNA replication and fork density like those seen after Yap depletion, spanning from very early to mid S-phase. We therefore found that Rif1 depletion and Yap depletion qualitatively show the same main effects: an increase of DNA synthesis and fork density, which are more pronounced in early S-phase. We also noticed quantitative differences in the direct fluorescence after rhodamine incorporation of whole nuclei and fork density, with stronger effects after Rif1 depletion compared to Yap depletion. This suggests that there might be an additional mechanism for Rif1 in regulating origin activation.

      The title of the manuscript is "A non-transcriptional function of YAP orchestrates the DNA replication program". It is not clear that YAP "orchestrates" DNA replication - for this to be true, it would have to be signal responsive. Since the authors did not reveal any links to YAP activity (such as YAP phosphorylation or nuclear/cytoplasmic distribution) it is not "orchestrating" DNA replication.

      We have replaced “orchestrates” by “regulates”.

      Figure 1 shows that YAP is recruited onto chromatin after MCM2 and MCM7 and at the same time as PCNA and the start of DNA synthesis. Addition of geminin, an inhibitor of Cdt and MCM loading inhibits YAP loading onto chromatin. YAP immuno depletion leads to premature DNA synthesis or replication. Fig 1 B is quite confusing- the labeling in Figure 1B is likely incorrect.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      Figure 2 investigates if YAP depletion affects origin firing or fork speed, using DNA combing. Fig 2A shows that there is increased activated replication origins and decreased distance between origins. The authors say that the increase of fork density is more pronounced than the decreased distance, suggesting YAP is regulating the activation of origins. The number of replicates is low. This is especially true for the conclusion that eye length is unaltered -it appears that there is a subset of eye length that is increased in 2F, which might reach significance if triplicates were performed.

      As the referee points out, both the observed increase of fork density and the decrease of origin distances argue that origin activation is increased after Yap depletion. The fact that the increase of the fork density seems more pronounced than the local decrease of distances between neighbouring origins allows a more detailed interpretation, namely that whole clusters of origins are activated on top of origins inside already active clusters. This can be observed in the two independent experiments probing many fibers for eye distances and eye numbers.

      Concerning Figure 2F, the scatter plot gives the impression that there are more eyes with larger sizes after Yap depletion, but please note that more EL were also measured, as stated in the legend (Mock n=182 versus Yap n=311). To highlight this parameter, we added these numbers below the scatter plot in the revised Figure 2F, as we have done consistently for all of the experiments presented in the revised Figures. The means of the two EL distributions are numerically different, but since neither distribution is Gaussian (tested by the D'Agostino and Pearson test), only non-parametric tests apply (Mann-Whitney or Kolmogorov-Smirnov test). The results of the two non-parametric tests show that the distributions are not significantly different, as mentioned in the legend. We cannot rule out that after Yap depletion some larger eyes may arise from fusions of forks or from a higher fork speed, but again, the tests, applied to a high number of measurements, show no statistically significant differences.
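
      For readers unfamiliar with rank-based tests, the following is a minimal sketch of a two-sided Mann-Whitney U test for two samples of unequal size. It is an illustration only: it uses a normal approximation without tie or continuity correction, the helper names and the toy measurements are hypothetical, and it is not the software or data used for Figure 2F.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value from a normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical eye lengths (arbitrary units); sample sizes may differ.
mock = [1.0, 3.0, 5.0, 7.0]
depleted = [2.0, 4.0, 6.0, 8.0, 9.0]
u, p = mann_whitney_u(mock, depleted)
print(u, p)
```

      Because the test operates on ranks, it makes no assumption of normality and does not require equal sample sizes, which is why it suits comparisons such as n=182 versus n=311.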

      The authors conducted AP-MS on egg extracts to identify proteins that co-IP with YAP. One of many proteins identified was RIF1 Figure 3 shows a co-IP with RIF1 and YAP. It is a very weak co-IP.

      We agree that the Rif1/Yap co-IP is weak, but it is reproducible in several independent experiments with different extracts. There could be many reasons for this. Co-IPs with a high molecular weight partner like Rif1 (250 kDa) are generally tedious (poor gel migration and WB transfer). Further, Rif1 has been described as having a subnuclear localisation and as associating with the nuclear lamina and heterochromatin. These characteristics are known to make proteins highly insoluble. These technical limitations have been reported for mouse Rif1, for instance (Sukackaite R. et al. Sci Rep 2017 May 18;7(1):2119). In fact, similar “weak co-IPs” were also obtained between Rif1 and Nanog (Wang J. et al. Nature 2006 (444), 364–368) as well as with PP1 (Hiraga S. et al. EMBO Rep. 2017 Mar;18(3):403-419). Finally, it could also be that this interaction is not permanent but dynamic, making it difficult to capture in a co-IP. Taken together, these parameters mean that the identification of the interaction is in itself challenging. What we did manage to provide is a reciprocal co-IP using the endogenous proteins, which we believe best reflects native conditions.

      Figure 4 shows that YAP levels increase during development and that depletion of YAP or RIF1 leads to increased cell division. The authors use Trim-away to deplete YAP and RIF1 and find that depletion of either leads to an increased number of small cells. The YAP depletion shown in Fig 4B is clear, as is the increased number of small cells in YAP depletion or RIF1 depletion.

      Figure 4 supplement 1 is arguing that trim away and morpholino combined are more effective. Quantitation of the western blots in panel A is needed for this to be convincing.

      The quantification is now presented in new Figure 5-figure supplement 1A. At the 2-cell stage, we observe some fluctuations in the amounts of Yap between samples, the origin of which we do not fully understand. At the 4-cell stage, a reduction in Yap is observed regardless of the depletion strategy used. It is from the 8-cell stage onwards that differential effects between the depletion methods can be appreciated. From this stage onwards, the quantifications confirm that the TRIM-Away and morpholino combined are more effective than taken separately.

      Figure 5 shows that RIF1 is expressed in the eye in RSC and that loss of RIF1 leads to a small eye. Panel B shows that by western blot analysis RIF1 antibody is specific. However, antibodies can have very different abilities in western vs staining. The RIF1 and YAP antibodies should be validated in staining. Also, the staining in Fig5C is at low resolution for both YAP and RIF1 and the identification of foci is unclear.

      This is indeed an important issue. To address this point, we performed immunostaining on retinal sections from embryos depleted of the target protein and compared the fluorescent signal obtained in control versus depleted samples. We show that upon depletion of Yap or Rif1, the signal from the immunostaining is severely reduced for Yap or Rif1, respectively, which attests to the specificity of the antibodies used in this study. We have added an additional supplementary Figure to show this control (Figure 6-figure supplement 1).

      We agree with the reviewers that the quality of the images could be improved. We now provide confocal images with a better resolution (Figure 6C).

      For Rif1, we observe a clear nuclear staining, rather non-homogenous which is consistent with data reported in the literature. Indeed, Rif1 localisation has been shown to be highly dynamic during the cell cycle and also during S-phase (Cornacchia D. et al. EMBO J. 2012). Some brighter foci could be observed at specific phases (such as G1-phase) but overall, the general pattern appears rather “granular” and restricted to the nucleus. This is what we are also observing. Interestingly, Rif1 does not appear to colocalize with the replication fork or with the replicative helicase MCM3 (Cornacchia D. et al. EMBO J. 2012). The replication foci observed in this study are therefore to be understood independently of the Rif1 localisation pattern.

      For Yap, we do not detect any granular expression but observe rather homogeneous nuclear and cytoplasmic staining, which is also consistent with reported data showing YAP nucleo-cytoplasmic shuttling (see for instance Manning S.A. et al. Curr Biol. 2018). STED microscopy might be necessary for higher resolution.

      It is difficult to see the points the authors wish to communicate in Figure 6. There is almost no Edu in the YAP-MO, which questions the ability to recognize the different patterns in this region of the eye.

      Our observations show that there are fewer EdU positive cells in the Yap-MO but not “no EdU”. The fluorescence intensity in the green-labelled nuclei in Figure 7C after Yap MO does not appear different from that in the control-MO. Under these conditions, there is no reason to think that one pattern is more difficult to recognise than the other one.

      Reviewer #2 (Public Review):

      This paper is of potential interest within the field of DNA replication, as it identifies a novel role for YAP protein in DNA replication dynamics. However, the conclusions are not supported by properly controlled data. Several aspects of data analysis and representation need to be revised.

      In this manuscript, the authors characterized YAP function in the control of DNA replication dynamics, taking advantage of the Xenopus laevis system.

      They found that YAP is recruited to replicating-chromatin and showed that its chromatin enrichment depends on the assembly of pre-RC proteins. In addition, they show that the immuno-depletion of YAP leads to increased DNA synthesis and origin activation, revealing YAP's possible role in the regulation of replication dynamics.

      The authors were also interested in finding YAP potential partners that could mediate its function. They identified Rif1, a major regulator of replication timing, as a novel YAP interactor during DNA replication.

      As RIF1 expression in vivo is restricted to the stem cell compartment of the Xenopus retina, similar to YAP, the authors assessed whether Rif1 could regulate the spatial-temporal program of DNA replication in stem cells. They showed that depletion of Rif1 at early stages of Xenopus embryos development leads to alterations in replication foci of retinal stem cells, resembling the effect observed following YAP down-regulation.

      Finally, they studied the impact of YAP and RIF1 down-regulation at early stages of development, showing that their absence results in the acceleration of cell division rate of Xenopus embryos, where RNA transcription is absent. Based on these results they concluded that YAP has a role in S-phase independent from transcription.

      The higher rate of DNA synthesis observed in the absence of Yap in Figure 1D is not very evident from the gels in Figure 1, supplement 3B. The timing of the experiments is continuously changing throughout the figures. It is therefore difficult to compare them. Also, comparisons across different gels are difficult to interpret. Most importantly, relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of YAP. To accurately quantify the replication of DNA added to the extract, the total amount of DNA synthesized must be quantified.

      Although we do not agree that relative quantification on gel images cannot support the claim of increased DNA synthesis in the absence of Yap, we thank the reviewer for his suggestion since we now provide additional data clearly strengthening our conclusion.

      Many studies, published in high-standard journals and coming from different Xenopus replication laboratories, have quantified DNA synthesised after 32P-dCTP incorporation and separation by agarose gel electrophoresis (Shechter et al, 2004; Trenz et al, 2008; Guo et al, 2015; Walter & Newport, 1997; Suski et al, 2022, Nature). Nevertheless, as the referee suggested, we quantified the total amount of DNA synthesized in three new independent experiments. These new results, presented on page 5, lines 34-39 and shown in Figure 1G, support our conclusion, as they also show that Yap depletion increases total DNA synthesis. Please note that the DNA combing results presented in Figure 2 also show that replication is increased after Yap depletion. Finally, we also added another set of experiments to Figure 1 to further confirm these findings. We used the incorporation of Rhodamine-dUTP followed by the quantification of the fluorescence intensity within nuclei. This nuclei-fluorescence based method is frequently used in proliferation assays to assess nucleotide incorporation resulting from the DNA replication process in other organisms. Our new results demonstrate that DNA synthesis is increased 1.5-fold in six biological replicates and represent a third independent method, in addition to DNA combing and 32P-dCTP incorporation, showing that DNA synthesis is increased upon YAP depletion. These new results are now presented on page 5, lines 27-24 and shown in Figure 1D-F.

      As explained in the MM section on page 14 of the original manuscript, the replication extent (percent of replication) differs for a specific time point from one extract to another, because each egg extract prepared from one batch of eggs replicates nuclei with its own replication kinetics. To overcome this problem and to compare independent experiments performed using different egg extracts, the data points of each sample were normalized to the maximum incorporation value.
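
      The normalization described above can be sketched as follows. This is a hedged illustration: the function name and the incorporation values are hypothetical, and the authors' exact procedure (for example, which sample supplies the maximum) may differ.

```python
def normalize_to_max(time_course):
    """Scale a time course so that its maximum incorporation value equals 1.0."""
    peak = max(time_course)
    if peak <= 0:
        raise ValueError("time course needs at least one positive value")
    return [value / peak for value in time_course]


# Hypothetical incorporation values at matching time points for two egg
# extracts with different absolute replication kinetics.
extract_a = [2.0, 10.0, 30.0, 40.0]
extract_b = [1.0, 4.0, 12.0, 20.0]

for name, course in (("A", extract_a), ("B", extract_b)):
    print(name, normalize_to_max(course))
```

      After scaling, both time courses end at 1.0, so their shapes can be compared across extracts even though the absolute incorporation differs twofold.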

      It is also necessary to analyze the dynamics and the abundance of chromatin-bound replication proteins associated with the active replication fork after Yap depletion using chromatin binding assays. This would further confirm the increase in the fork density observed by DNA combing experiments.

      We thank the referee for this suggestion and we added a western blot of chromatin bound proteins after Yap depletion. This shows that two replication proteins associated with the active replication fork, namely Cdc45 and PCNA, are enriched after Yap depletion compared to the control at the beginning of S-phase. This observation further supports the DNA combing results showing that more forks are active after YAP depletion. This new data is now presented page 6 lines 25-32 and displayed in Figure 2H.

      We would like to stress here that with these additional methods added to the revised version, five different methods in total (Rhodamine-dUTP incorporation/nucleus, 32P-dCTP incorporation - total synthesis, 32P-dCTP incorporation - nascent strand analysis, DNA combing, western blotting for replication fork proteins) show that DNA synthesis and origin activation are increased after Yap depletion.

      The quantification of the amount of YAP in Figure 1B is confusing. The legend of the chart states "Control in light grey and presence of geminin in black", but the bar colors are of different shades of grey. It is not clear how to evaluate them.

      We apologize for this confusion. This has been corrected and the Figure 1B is now properly labelled.

      The efficiency of depletion for both Rif1 and YAP is different in Figure 4B and Figure 4A, supplement 1.

      We agree with the referee that the efficiency of depletion is different in both figures. This is explained by the fact that the extent of the depletion varies from experiment to experiment. We work with different batches of in vitro fertilized embryos and extracts, so these differences simply reflect the technical/biological variability.

      Moreover, the combined use of the TRIM-away approach with injections of MO led to a stronger and prolonged YAP depletion but also triggered toxicity in the tadpoles, which display severe abnormalities.

      It is important to point out that abnormal development is not always attributable to a toxic effect. Many losses of gene function result in malformations without being ascribed to toxicity or unspecific effects. However, we agree with the reviewers on the need to present a rescue experiment, which is now shown in new Figure 5C and new Figure 5-figure supplement 1B. In addition, we also provide gain-of-function (GOF) data for YAP in early embryos. In brief, we find that Yap GOF leads to outcomes opposite to those of its depletion: embryos at the same stage of development have fewer and larger cells than the control. Furthermore, we show that the effects of Yap depletion, i.e. embryos with more and smaller cells than the control at the same developmental stage, are rescued by the injection of MO-resistant Yap mRNA to restore the protein level. This is true for both embryonic divisions (new Figure 5C) and development, as we obtained normal-looking neurula after Yap rescue (new Figure 5-figure supplement 1B). Overall, these data now clearly show that Yap is both sufficient and necessary to maintain the rate of embryonic divisions and that this phenotype is specific, since it can be rescued by expressing Yap alone. These new data are presented on page 8, lines 2-10.

      Reviewer #3 (Public Review):

      The article by Garcia et al clearly describes a set of experiments establishing Yap as a novel regulator of DNA replication dynamics. Its characterization as both a RIF1 interaction partner as well as playing its own role in replication initiation will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated.

      The authors aim to identify a non-transcriptional function of YAP through the use of the Xenopus in vitro replication system and Yap depletion. Strengths of the paper include the particularly appropriate use of the Xenopus in vitro replication system, as well as the combined use of Trim-Away and morpholino oligonucleotides to deplete Yap and Rif1. Moreover, these experiments were elegantly complemented by single-molecule molecular combing and in vivo studies. By identifying Yap as a novel regulator of DNA replication dynamics, the authors achieved their aim. The characterization of Yap as both playing a role in replication initiation and acting as a Rif1 interaction partner will likely have a significant impact on the field, as currently little is known about how DNA replication during early embryonic cell divisions is regulated. A weakness of the paper is that some of the representative data do not appear to be very representative of the entire data set.

      We replaced the representative data in Figure 2A with data that we think better reflect the main conclusions of the entire data set.

    1. Reviewer #1 (Public Review):

      1: The authors formulate competing hypotheses on the behavioral impact of alpha oscillations using signal detection theory (SDT) (Intro and Fig. 1). SDT is indeed well suited for this, as it is used to compute the orthogonal behavioral metrics d' (discriminability) and criterion (bias). However, soon the authors write:

      "The higher d' for conservative trials may be due to the more skewed mapping between the false alarm (FA) rate to its Z-value in our d' computation. Specifically, when criterion (or the decision boundary) intersects the noise distribution at its right tail, small changes in FA rate are nonlinearly exaggerated after Z-transformation. As we did not observe a difference in accuracy between conservative and liberal trials, which is a more robust measure of perceptual discriminability when target presence rate equals 50%, we argue that the observed statistically significant d' difference is equivocal."

      And also:

      "For the binning analyses, we mainly focused on the percentage correct (i.e., accuracy), and hit and FA rates, because these metrics scale linearly (as opposed to d', which scales nonlinearly as the hit rate increases or FA rate decreases linearly) and are well defined for both behavioral data and MVPA outputs."

      And indeed from Fig. 3 onwards they do not really use SDT anymore, which is confusing given the Introduction and Fig. 1. I think it's also problematic, as accuracy, hit-rate and fa-rate are not orthogonal and are therefore much less suited to arbitrate between their competing hypotheses. As a result, I'm not convinced the paper accomplishes what it sets out to do in the Introduction.
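
      To make the quantities in this point concrete, here is a minimal sketch of the standard equal-variance SDT formulas, d' = Z(H) − Z(FA) and c = −(Z(H) + Z(FA))/2. The hit and false-alarm rates are hypothetical and are not taken from the manuscript; the sketch only illustrates the nonlinearity of the Z-transform and how accuracy and d' can diverge.

```python
from statistics import NormalDist


def z(p):
    """Inverse standard-normal CDF (the Z-transform used in SDT); requires 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def sdt_measures(hit_rate, fa_rate):
    """Return (d', criterion) under the equal-variance Gaussian model."""
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


def accuracy(hit_rate, fa_rate):
    """Proportion correct when targets are present on 50% of trials."""
    return 0.5 * (hit_rate + (1.0 - fa_rate))


# The same 0.04 drop in FA rate moves Z much more at the tail than mid-range.
mid_shift = z(0.50) - z(0.46)
tail_shift = z(0.05) - z(0.01)

# Hypothetical "liberal" and "conservative" conditions with equal accuracy.
d_lib, c_lib = sdt_measures(0.80, 0.30)
d_con, c_con = sdt_measures(0.60, 0.10)

print(mid_shift, tail_shift)
print(accuracy(0.80, 0.30), d_lib, c_lib)
print(accuracy(0.60, 0.10), d_con, c_con)
```

      In this toy example both conditions have an accuracy of 0.75, yet the conservative condition has the higher d' and a more positive criterion, so accuracy, hit rate and FA rate do not separate sensitivity from bias in the way d' and c do. Rates of exactly 0 or 1 would need a correction before the Z-transform, which this sketch does not implement.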

      2: Related, if indeed the authors choose to deviate from SDT, they should put the metric "% yes-choices" on equal footing with accuracy. For example, in Fig. 3A, we can see that alpha oscillations predict a reduction of hit-rate as well as fa-rate; this suggests that the main effect is actually on choice bias (% yes-choices) rather than accuracy. If that's true, then the title of this manuscript is misleading.

      3: Have the authors considered testing for non-monotonic effects of alpha oscillations on cortical computation and behavior?

      4: The authors use challenging and sophisticated methods, but these are introduced very casually. For example:

      "To obtain a more fine-grained picture of the alpha power modulation of behavior, we applied generalized linear mixed models (GLMMs; see Methods) to account for both between-subjects and within-subject trial-by-trial response variability, and to estimate the effects of alpha oscillatory power on d' and criterion simultaneously."

      And:

      "To evaluate the quality of visual information coding, we used multivariate pattern analysis (MVPA), operationalizing the quality of visual representation as the neural classifier's classification performance. We used the priming trials to train binary classifiers to classify target-present vs. absent trials in a time-resolved manner, [...]".

      It would help a lot if the authors could unpack their rationale some more. For example, why did they consider between-subjects effects, and could they show some scatter plots with between-subjects correlations before turning to the GLMM? Also, what is the question the authors wanted to answer that required training the classifier in a time-resolved manner (which I like, on a personal note)?

      5: Throughout, the label "liberal trials" is odd, given that group-average criterion > 0 on those trials (Fig. 2C).

      6: It would be nice to explicitly bridge to the literature on (pupil-linked) arousal predicted shifts in decision-making, and to findings on the relationship between alpha oscillations and (pupil-linked) arousal.

    1. But I think we may go still further. The right to regulate the use of wealth in the public interest is universally admitted. Let us admit also the right to regulate the terms and conditions of labor, which is the chief element of wealth, directly in the interest of the common good.

      On July 5, 1935, President Roosevelt signed the Wagner Act, also known as the National Labor Relations Act. The Act included many things such as entitlement to wages and benefits, hours of work, overtime arrangements and overtime compensation, and leave for illness, maternity, vacation or holiday. Labor and working conditions. (n.d.). https://firstforsustainability.org/risk-management/understanding-environmental-and-social-risk/environmental-and-social-issues/labor-and-working-conditions/ National Labor Relations Act (1935). (2021, November 22). National Archives. https://www.archives.gov/milestone-documents/national-labor-relations-act

    1. Well, this was a true early morning treat! You reeeeally botched that one. Like 180 degrees misinterpreted it. That thread is about how Luhmann developed a personal approach that worked for him (as we all do and should), and that there is no one way to work/do a zettelkasten. Ie. We all must (and inevitably will) interpret Luhmann's take on zettelkasten method (and any other tools/method/etc we encounter) in light of what our needs are. What's super dope, is that my whole jam in this ZK world is about showing the thread/lineage of these techniques and helping people specifically wrestle with some of the principles and practices Luhmann employed so that in the end they can apply them in whatever way they see fit. And yet, somehow....you actually miss that? Also, this.... (you) "We approach these methods from such a top down manner, in part, because our culture has broadly lost the thread of how these note taking practices were done historically. Instead of working with something that has always existed and been taught in our culture, and then using it to suit our needs, we're looking at it like a new shiny toy or app and then trying to modify it to make it suit our needs." ... Is this.... (me) "We're coming at [zettelkasten] top-down. We're appropriating something and trying to retrofit it in a desire to "be better." In doing so, we're trying "clean it up a bit." I'm critiquing this approach 😂 I'm saying we come at it top-down bc we see it as a reified object (which is incorrect) that is set in stone, when in fact those who present the "one true way" are actually presenting a "cleaned up version" of Luhmann's very personal approach and calling it "official." Again, I'm critiquing that! I am, by design and punk ethos, kinda against "official." Silly, dude. The whole thread is about not looking at it as a "shiny new toy" and seeing it as a more fluid aspect of note-taking and personal practice. It's about recognizing that the way to recreate Luhmann is to be flexible, interpret these methods for yourself. Why? Bc that's exactly what Luhmann did. "Let the principles and practices guide your zettelkasten work. Throw them in a box with your defined workflow issues. Let them hash it out. Shake the box and let them tell you the "kind" of zk you should be working with." (thread the day before the above mentioned) Also, and you're gonna love this.... Here's you above.... "People have been using zettelkasten, commonplace books, florilegium, and other similar methods for centuries, and no one version is the "correct" one." And here's me..... "The most well-known slip-boxes in the world have been employed by writers in service of their writing. Variations of the system date back to the 17th c., [3] and modern writers such as, Umberto Eco, Arno Schmidt, and Hans Blumenberg are all known for employing some version of the slip-box to capture, collect, organize, and transform notes into published work. Of course, today, the most famous zettelkasten is the one used...." Sound familiar? It's me citing you, ya dum dum 😂 Footnote numero tres.... https://writing.bobdoto.computer/zettelkasten-linking-your-thinking-and-nick-milos-search-for-ground/ Such a funny thing to see this fine Friday morning! ☀

      Sadly I think we're talking past each other somehow; I broadly agree with all of your original thread. Perhaps there's also some context collapse amidst our conversations across multiple platforms which doesn't help.

      Maybe my error was in placing my comment on your original thread rather than a sub branch on one of the top several comments? I didn't want to target anyone in particular as the "invented by Luhmann myth" is incredibly widespread and is unlikely to ever go away. It's obvious from some of the responses I've seen to your thread here in r/antinet that folks without the explicit context of the history default to the misconception that Luhmann invented it. This misconception tends to reinforce the idea that there's "one true way" (the often canonically presented "perfect" Luhmann zettelkasten, rather than the messier method that he obviously practiced in reality) when, instead, there are lots of methods, many of which share some general principles or building blocks, but which can have dramatically different uses and outcomes. My hope in highlighting the history was specifically to give your point more power, not take the opposite stance. Not having the direct evidence to the contrary, you'll notice I hedged my statement with the word "seems" in the opening sentence. I apologize to you that I apparently wasn't more clear.

      I love your comparison of LYT and zettelkasten by the way. It's reminiscent of the sort of comparison I'm hoping to bring forth in an upcoming review of Tiago Forte's recent book. His method, ostensibly a folder-based digital commonplace book similar to Milo's LYT, can be useful, but he doesn't seem to have the broader experience of history or the various use cases to be able to advise a general audience which method(s) they may want to try or for which ends. I worry that while he's got a useful method for potentially many people, too many may see it and his platform as a recipe they need to follow rather than having a set of choices for various outcomes they may wish to have. Too many "thought leaders" are trying to "own" portions of the space rather than presenting choices or comparisons the way you have. Elizabeth Butler is one of the few others I've seen taking a broader approach. A lot of these explorations also mean there are multiple different words to describe each system's functionality, which I think only serves to muddy things up for potential users rather than make them clearer. (And doing this across multiple languages across time is even more confusing: is it zettelkasten, card index, or fichier boîte? Already the idea of zettelkasten (in English speaking areas) has taken on the semantic meaning "Luhmann's specific method of keeping a zettelkasten" rather than just a box with slips.)

    1. Author Response

      Reviewer 1

      Strengths:

      This manuscript combines experimental, exploratory, and observational methods to investigate the big question in innovation literature--why do some animals innovate over others, and how does information about innovations spread. By combining a variety of methods, the manuscript tackles this question in a number of ways, and finds support for previous work showing that animals can learn about foods via social olfactory inspection (i.e., muzzle to muzzle contact), and also presents data intended to investigate the role of dispersing animals in innovation and information spread.

      Using data from a previously-published experiment, the manuscript illustrates how investigators can address numerous interesting questions while limiting the disturbances to wild animals. The manuscript's attempt at using exploratory analysis is also exciting, as exploratory analyses provide a useful tool for behavior research; indeed, Tinbergen insisted that behavior must first be described.

      Weaknesses:

      The manuscript's introduction is a bit unclear as to how the fact that dispersing males may be an important source of information ties to innovations in response to disruptions due to climate change, humans, or new predators, if at all. An introduction regarding the role of dispersed animals in introducing novel behaviors and social transmission would better prepare readers for the questions presented in the manuscript. As it stands now, the manuscript only provides one sentence discussing the theoretical relevance of investigating the role of dispersing animals in innovations.

      We have added some information about this to the introduction (lines 66 – 69 and 121-123) and maintain our discussion of it in the discussion.

      Additionally, while the manuscript attempts to use exploratory analysis, it does not provide enough theoretical background as to why certain questions were asked while the data were explored. While the discussion provides some background as to the role of dispersing males in innovation, the introduction provides little background, and thus does not properly frame the issue. It is unclear how dispersing males became of interest and why readers should be interested in them. As the manuscript reads now, it may be that dispersing males became interesting only as a result of the exploratory analysis, except that the predictions explicitly mention dispersing males. Thus, the manuscript at present makes it difficult to know if the questions surrounding immigrant males resulted from the exploratory analysis, or were questions the analyses were intended to answer from the beginning. If this question only came out after first reviewing the results, then this needs to be made clear in the introduction. I see no issue with reporting observations that were the result of investigations into earlier results, but it needs to be reported in a way that can be replicated in future research: I need to know the decision process that took place during the data exploration.

      We hope this is clearer from our new research aims (lines 125-173)

      The manuscript never clearly defines what counts as an immigrant male; presumably, in this species, all adult males in the group should be immigrants, as females are the philopatric sex. Sometimes, the manuscript uses "recently" to modify immigrant males, but doesn't define exactly what counts as recent, except to say that the males that innovated were in their respective groups for fewer than 3 months, but never explains why three months should be an important distinction in adult male tenure.

      We realise how we wrote about this previously was not clear and perhaps misleading. We noticed that the males that innovated had been in the group for less than three months. We do not know if this is necessary for them to innovate or not. We also added to the discussion a description of the male in AK19 who had been in the group for four months and did not innovate – as he had many other traits which we would expect to exclude him from criteria for innovation (e.g. very old, post-prime, and inactive – died within months of the experiment).

      Due to the above weaknesses, the provided predictions are a bit murky. It is not clear how variation between groups in accordance with who innovated, or initiated eating a novel food, or demographics is related to the central issue. The manuscript does contribute to the literature by looking at changing rates of muzzle contact over exposure to a novel food source, and provides a good extension of previous findings; that, if muzzle contacts help animals learn about new foods, then rates of muzzle contacts involving novel foods should decrease as animals become familiar with the food. However, this point isn't explicit in the manuscript.

      This is now addressed in the new aims paragraph (lines 125-173)

      Finally, it is also unclear as to why changing rates of muzzle contact AND whether certain individual level variables like knowledge, sex, age, and/or rank might influence muzzle contacts during opportunities to innovate.

      We are not sure exactly what the reviewer means here, but hope that the substantial revisions we have made now address their concern.

      As for the methods, the manuscript doesn't provide enough details as to why certain decisions were made. For example, no reason is given as to why only the first four sessions after an animal ate were considered, why the first three months of tenure (but not four, as seen on one group that didn't innovate) was considered to be a critical time for which immigrant males may innovate, why (including the theoretical reasons) the structure of models for one analysis was changed (dropping one variable, adding interactions), or even how the beginning and ending of a trial was decided, despite reporting that durations varied widely, from 5 minutes to two hours.

      Please see: above about the male with 4 month tenure; and top of document for description of our updated models.

      The discussion contains results that are never elsewhere presented in the manuscript (2a: Individual variation in uptake of a novel food according to who ate first).

      It was just an error in the sub-title in the discussion – this is now amended. But all the other corresponding details were already there, in the list of research aims in the introduction and in the results as well.

      Finally, the largest issue with the manuscript is that its results are not as convincing as the conclusions made. An issue with all the analyses is that some grouping variables appear in some analyses but not others despite the fact that all of the analyses contain multiple groups (necessitating group as a grouping variable) and multiple observations of the same individuals (i.e., immigrant males tested in multiple groups, necessitating animal identity as a random effect), and not accounting for individual exposure to the experiment when considering whether animals ate the food in the allotted period (an important consideration given the massive differences in trial times), making these results difficult to interpret in their current forms. As for the results regarding muzzle contact, the analyses have a number of issues that make it difficult to determine if the claims are supported. These issues include not explaining why rank calculated a year before the experiments took place was valid or if rank was calculated among all group members or within age and sex classes, not explaining how rank was normalized, and not conducting any kind of formal model comparisons before deciding the best model.

      Mostly addressed at top of this document. Regarding rank calculations: rank was not calculated a year before the experiments, it was calculated using a year’s worth of data up to the beginning of the experiments – and ranks were calculated among all group members - we have made this clearer in the methods. We also explained our method of normalisation, and noted that it was an error to include non-normalised rank in one of the models – this has now been rectified

      As for the results regarding immigrant males and innovation, little is done to help the fact that these results are from very few observations and no direct analyses. It is possible that something that occurs relatively often but in small sample sizes, like dispersing animals, could have immense power in influencing foraging traditions, and observation is a necessary step in understanding behavior. However, the manuscript doesn't consider any alternative hypotheses as to why it found what it found. No other possible difference between the groups was considered (for example, the groups that rapidly innovated appear to be quite a bit smaller than the groups that did not), making the claim that immigrant males were what allowed groups to innovate unconvincing. This is particularly true given that some groups in this study population have experimental histories (though this goes unmentioned in the current manuscript), which likely influenced neophobia, especially given work by the same research group showing that these animals are more curious compared to their unhabituated counterparts.

      We have added more discussion of alternative hypotheses to the discussion (line numbers mentioned above).

      Regarding the comment about rapid innovation in smaller groups – we are not sure what the reviewer means here – all groups except BD were similarly sized. The second largest group, NH, had one of the quickest innovations and a smaller group (KB) innovated only at the third exposure. Unless the reviewer instead refers to the spread of the innovation here? This is also not quite what we see in the data – BD is the largest group and one of the fastest to spread, and KB is the smallest group and the slowest to spread. Regarding the groups' experimental histories, all five studied groups have already been used in field experiments. The group (LT) with the least experimental history was the one with the greatest proportion of individuals eating the novel food at the first and over the four exposures (see Fig. 2), while one of the groups with the most experimental history (NH) was one with a smaller proportion of individuals eating the food across the experiment. This is discussed in the discussion (lines 370-380).

      Reviewer 2

      I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.

      1) Conceptual clarity

      There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.

      For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.

      We re-wrote the research questions paragraph and results with this advice in mind – hope it is clearer now. We keep the innovation part just descriptive and hope this is less problematic now.

      Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by Line 104 which refuses to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could be for the benefit of the reader. For instance, if testing the effects of 10 different teaching methods on language acquisition-rate: Even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. In fact, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (clear, that is, from the rest of the introduction, and the parameters used in their analysis, which were not chosen at random).

      We have now updated the whole research aims (lines 125-173).

      The multiple references to 'long-lived' species in the abstract (line 16) and introduction (lines 39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard.

      When we use “long-lived” in the introduction, we explain that we mean animals with slow generational turnover for whom genetic adaptation is relatively slow – too slow to adapt to very rapid environmental change. Within the distinctions the reviewer makes here, we feel that vervets and marmosets are much more similar to apes than to dragonflies etc. in this respect… and we think making the comparisons that we do is valid in this context (though we do agree that for other reasons we would not find it appropriate). We have modified the sentence in the introduction (lines 40-42) and hope this is clearer now. The study in reference 9 is about crop-raiding, which is something vervets can learn to do within one generation too. In addition, reference 8 is used as it was one of the earlier and long-standing definitions of innovation which we are using here – we are not comparing vervets to apes directly, but we do not think a different definition of innovation is required.

      This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented in each stage of the manuscript (otherwise - conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).

      We agree that the line running through the whole paper needed to be clearer and have tried to improve this.

      2) Observational detail

      There are a number of areas of the manuscript which I found to be lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras (line 341)) and is all of direct relevance to the research questions set out by the authors.

      We added more details about the experiment in the method section.

      While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.

      The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1) and subsequent uptake (Question 2), as well as the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) is difficult to interpret based on the information presented. Some key gaps in the story were:

      We did not intend to claim that muzzle contact was the specific mechanism by which individuals learned to extract and eat peanuts – we rather use this experiment to evaluate the function of muzzle contact in the presence of a novel food.

      We did not record observation networks in all groups during experiments and cannot obtain accurate ones from all our videos – we hope it is clearer in our text now. Our group’s previous study (Canteloup et al., 2021) already shows social transmission of the opening techniques using data of two of our groups (NH and KB).

      • Which/how many individuals encountered the food and in what order? I.e., were migrants/innovators simply the first to notice the food?

      No, and we have now added some info about other individuals approaching the box and inspecting the peanuts before innovation took place

      • Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
      • How many tried and failed to extract the nuts before and after observing effective demonstrators?

      We have added the number of individuals that inspected the peanuts (visually and with contact)

      • Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
      • Did group-members use the same methods of extraction as their 'innovators'?

      Yes – this is the topic of Canteloup et al. 2021 – and these data are not presented again here. That study covered two of the groups presented here (KB and NH), with up to 10 exposures in each of those groups, and presents a fine-grained analysis of the peanut-opening techniques used by the monkeys. We hope this is clearer now in the text where we refer to this paper.

      • How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?

      For this, and the above points: We did not record an observation network for the groups added in this study and are not able to answer this – it is not the focus of this study. For this reason, we do not make claims in this line in the present study, and are cautious with our social learning related language. Whilst we examine the role of muzzle contact in acquiring information about a novel food, we do not expect this behaviour to be a necessary prerequisite in being able to extract and eat this food – indeed many individuals who learned to eat did not perform muzzle contacts. This aspect of the study is about using this novel food situation to explore whether muzzle contact serves information acquisition – which our evidence suggests it does.

      Moreover, the processing of this food is not complex and is similar to natural foods in their environment, and we do expect individuals to be capable of reinventing it easily (and this point with Tennie’s hypothesis is actually discussed in Canteloup et al. 2021 paper) – but the point here is that their natural tendency is to be neophobic to unknown food, and therefore they do not readily eat it until they see a conspecific doing so, after which they do. And we also used this opportunity, though in a very small sample size, to investigate which individuals would overcome that neophobia and be the first to eat successfully.

      The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.

      See above – the reason we talk about ‘uptake’ rather than social learning is that we really see this as a case of social disinhibition of neophobia, rather than more detailed social learning such as copying or imitation, as it would be in a tool-use setting, for example (though in Canteloup et al. 2021 paper, evidence is found that the specific methods to open peanuts are socially transmitted).

      Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.

      Please see response to reviewer 1 on this.

      For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.

      Experiments were done at sunrise at monkeys’ sleeping site in AK, LT, NH and KB where most of the group was present in the area; we added more precision on this point in the Method section (lines 615-619).
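      The reviewer's arithmetic can be made explicit with a small sketch. The function name `uptake_among_present` is hypothetical, and the 10% and 100% attendance figures are the reviewer's illustrative values, not measured attendance from the study.

```python
def uptake_among_present(uptake_of_total: float, present_of_total: float) -> float:
    """Convert uptake expressed as a fraction of the whole group into
    uptake expressed as a fraction of the individuals actually present."""
    if not 0 < present_of_total <= 1:
        raise ValueError("present_of_total must be in (0, 1]")
    if not 0 <= uptake_of_total <= present_of_total:
        raise ValueError("uptake cannot exceed the fraction present")
    return uptake_of_total / present_of_total


# Reviewer's hypothetical scenario: 9% uptake in NH with only 10% of the group present.
nh = uptake_among_present(0.09, 0.10)
# Reviewer's hypothetical scenario: 31% uptake in BD with the whole group present.
bd = uptake_among_present(0.31, 1.00)

print(f"NH: {nh:.0%} of present individuals")
print(f"BD: {bd:.0%} of present individuals")
```

      Under those assumed attendance figures the ranking of the two groups reverses, which is why attendance at each exposure matters for interpreting the raw percentages.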

      Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).

      We discuss this with respect to the result showing that higher rank individuals were more likely to extract and eat the food at the first exposure and over all four exposures

      Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).

      We hope our distinction between information acquisition and information use is clearer now.

      For example: - What is the role of kinship in these events?

      We did not analyse kinship here, but we see a lot of targeting towards adult males, and we do not have reliable kinship data for them. We also checked (see response to reviewer 3) the muzzle contacts initiated by knowledgeable adult females, and they are mostly towards adult males, not towards related juveniles (see new figure 4D and lines 497-500).

      • Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?

      We recorded muzzle contacts visible within 2m of the box, so individuals were not necessarily eating at the box at the time of engaging in muzzle contacts. However, the majority of muzzle contacts that we could record took place directly at the edge of the box – at the location where the food is accessed – so an individual would not likely be there if they were not able to have access to the food. It is possible they could be there and not eating, but they would not have been chased off, otherwise they would not be able to engage in muzzle contacts there. But it is not entirely clear what the reviewer’s point is here.

      • Did they primarily occur when adults had a mouthful of food? (i.e. could it simply be attempted pilfering/begging)

      This is not typical of this species. Very few specific individuals remove food from others’ mouths, and they do it with their hands, usually beginning with grooming their face and cheekpouches, before prising their mouth open and removing food from the victim’s cheekpouches

      • What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?

      We agree somewhat with this statement, but given the multiple ways we show the effect of knowledge – both at the individual level and the group level (effect of exposure number i.e. overall group familiarity) – we feel we present enough evidence to establish the link between knowledge of the food and muzzle contacts. We find that the model showing the interaction between exposure number and number of monkeys eating on the overall rate of muzzle contacts actually addresses this issue, because we see that when many monkeys are eating during later exposures, when many were indeed knowledgeable, the rate of muzzle contacts is massively decreased. Moreover, if 90% of the individuals present are knowledgeable, then only 10% of the individuals present are naïve, and we show both that knowledgeable individuals are targeted, but also that naïve individuals are initiators.

      • Did these events ever lead to food-sharing (In other words, how likely are they to simply be begging events)?

      We do not observe food-sharing in vervets.

      • Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.

      We disagree here, because there is a distinction between information acquisition and information use - obtaining olfactory information about a novel resource that conspecifics are eating is not the same as learning a complex tool use behaviour for which detailed observation of a model is required. We are not claiming that muzzle contact is THE mechanism by which the monkeys learn how to eat the food – but we do believe that the clear separation between naïve individuals initiating and knowledgeable individuals being targeted, and the decrease of the rate of this behaviour as groups’ familiarity with the food increases – is good evidence that this behaviour functions to acquire information about a novel food.

      3) Analysis

      There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details inserted to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.

      A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).

      We have now checked for overfitting in our models.

      The most straightforward way to resolve this issue is to take a model-comparison approach. This means fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory) or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models could then be compared using an information criterion to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of this model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn'. There are also a number of tutorials that can be found online for understanding and carrying out these approaches.

      Please see our answer at the beginning of the document, detailing how we have updated our models.
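      For readers unfamiliar with the information-criterion comparison the reviewer describes, a minimal sketch follows. It is written in Python rather than with the R package 'MuMIn', and the model names, log-likelihoods and parameter counts are invented for illustration; none of them come from the manuscript or its R code.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics: dict) -> dict:
    """Relative support for each model in the candidate set (weights sum to 1)."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical candidate set: (maximised log-likelihood, number of parameters).
candidates = {
    "null": (-120.0, 1),
    "rank": (-112.5, 2),
    "rank_age_sex": (-111.9, 4),
}

aics = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
weights = akaike_weights(aics)

for name in sorted(aics, key=aics.get):
    print(f"{name:>13}  AIC={aics[name]:.1f}  weight={weights[name]:.3f}")
```

      In this invented example the extra predictors in the largest model improve the likelihood only slightly, so the penalty for added parameters leaves the simpler model with the lowest AIC. The same weights are what a model-averaging approach would use to combine estimates across the set.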

      B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).

      We have added this now and mention it above already.

      Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).

      This is now addressed (please see details above) – we use VIFs to assess multicollinearity of predictors in our models and find they are all satisfactory (see R code).
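
      As a minimal sketch of what a variance inflation factor measures: the example below is written in Python rather than the authors' R code, is restricted to the two-predictor case (where the VIF reduces to 1 / (1 - r^2), with r the Pearson correlation between the predictors), and uses hypothetical age and rank values. The function names are illustrative only and do not come from the manuscript or its scripts.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant predictor has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x, y):
    """VIF shared by both predictors when the model has exactly two of them."""
    r_squared = pearson_r(x, y) ** 2
    if r_squared >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return 1.0 / (1.0 - r_squared)


# Hypothetical data: age in years and a standardised rank score.
age = [1, 2, 3, 5, 8, 10, 12, 15]
rank = [0.10, 0.15, 0.30, 0.45, 0.60, 0.70, 0.85, 0.90]

print(f"r = {pearson_r(age, rank):.3f}, VIF = {vif_two_predictors(age, rank):.2f}")
```

      With more than two predictors, each VIF is instead computed from the R^2 of regressing that predictor on all of the others; a value of 1 indicates no collinearity.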

      Line 452. A few comments on this muzzle-contact analysis:

      The comments below are a little confusing as some seem to refer to the muzzle-contact rate model (previously line number 452), and some seem to refer to the initiator/receiver model. We have tried to figure out which comments refer to which, and answer accordingly.

      "We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"

      What were the criteria for "a large proportion"?

      All groups are now included in this analysis.

      The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.

      The model about muzzle contact rate never contained individual as a random effect because individuals are not relevant in this model – it is the number of muzzle contacts occurring during each exposure. However, the reviewer might refer here to the model that we forgot to provide the script for. Nonetheless, we have substantially revised this model, it now (Model 3) includes all groups, and has group as a random effect.

      Following on from this, if the model was fit with individual as a random effect it becomes confusing that Figure 3 which represents this data seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for this figure to be meaningfully interpretable.

      Figure 3 is not related to the model described in (original) line 452.

      The numbers were referring to the number of muzzle contacts, and this was written in the figure caption. However, we no longer present these details on the new figure (see Fig 4).

      Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.

      We have included the number of monkeys eating in our muzzle contact rate model now (Model 3) as, upon further thought, we found that this was the issue leading us to want to exclude exposures, and only include the groups where many monkeys were eating. We have resolved this now by including all groups and not dropping exposures; instead, we include an interaction between number of monkeys eating and exposure number. We feel this addresses our hypothesis here much more satisfactorily. We hope these updates also address the reviewer's concerns adequately.

      Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"

      What was the criterion for a satisfactory proportion? Why was this chosen?

      See above – this is now addressed.

      Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."

      The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?

      We no longer present the results in this way. We revised the model examining muzzle contact rate substantially and included the number of individuals eating in the model rather than excluding groups where this number was low. The results of the new model show good support for our hypothesis.

      Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."

      This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?"

      Then, when one age group (infants) is very unlikely to be a target of muzzle-contact, the authors have erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.

      Yes, we agree with this issue, and no longer do that. Instead, we remove the infant data from this model, which is now Model 6, because of the large amounts of error they introduced into the model due to the small sample size. We show the process in the R code, and we describe our reasons in the text (lines 713-719). Since we are now only comparing within age- and sex-categories (see below), we do not find this decision introduces any bias.

      Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.

      This is now addressed and discussed in descriptions of our new models.

      Furthermore - because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult female and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.

      Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where the outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).

      We thank the reviewer for noticing this as we had indeed chosen an inappropriate model for what we were intending to measure – this has been addressed now with two additional models (Models 4 and 5; see details at the top of document). We nonetheless found the aspects of this model to still be highly interesting, so have re-framed it to focus on them.

      Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."

      This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.

      For Model 6, we no longer include rank at all, because we had no hypothesis-based reason to (see lines 723-725). We now begin with the three-way interaction and remove only this term, because it is not significant and the model also had problems converging due to its complexity. We show this in the R script. We retain only the two separate interactions, and we do not include group as a random effect in this model due to the complexity and because we do not think there is a theoretical requirement for it to be included here (this is explained in lines 730-735 in the manuscript). We report the results of the three-way interaction in the supplementary material (SM3 Table S2).

      Reviewer 3

      In this study, the authors introduce a novel food that requires handling time to five vervet monkey groups, some of which had previous experience with the food. Through the natural dispersal of males in the population, they show that dispersing individuals transmit behavioral innovations between groups and are often also innovators. They also examine muzzle contact initiations and targets within the groups as a way to determine who is seeking social information on the new food source and who is the target of information seeking. The authors show that knowledgeable adults are more often the target of muzzle contacts compared to young individuals and those that are not knowledgeable.

      This is a very interesting study that provides some novel insights. The methods employed will be useful to others that are considering an experimental approach to their field research. The data set is good and analyzed appropriately and the conclusions are justified. However, there are several areas where the paper could be improved for readers in terms of its clarity.

      1) It wasn't until the Discussion that it became clear to me that the actual physiological and personality traits of dispersers were being linked with innovation. From the Title, Abstract, and Introduction, it seemed as though the focus was on dispersing males bringing their experience with a novel food to a new group to pass it on. I think it needs to be made clear much earlier in the manuscript that the authors are investigating not only the transmission of behavioural adaptation but also how the traits of dispersers may make them more likely to innovate.

      We have now addressed this above.

      2) Early in the paper on line 28, the authors state that continued initiation of muzzle contacts by adult females could have been an effort to seek social information. This is true but another interpretation is that females were imparting or giving social information. It seems important here and elsewhere (lines 322-323) to consider and report the target of these initiations. If these were directed at more knowledgeable individuals, it supports the idea that this was social information seeking. If muzzle contacts were directed to younger or unknowledgeable individuals, it would imply a form of teaching, which is possible but perhaps unlikely, so I think the authors need to be totally clear here.

      We thank the reviewer for pointing this out. We looked into our data and now present Figure 4D, showing that almost all knowledgeable adult females’ muzzle contacts were targeted towards knowledgeable adult males, and we discuss this in the Discussion (lines 499-500).

      3) The argument made on lines 344-350 needs more fleshing out to be convincing or it should be deleted. The link between number of dispersers, social organization, and large geographic range seems a little muddled. There are many dispersing individuals in species that are not typically in large multi-male, multi-female social organizations. Indeed, in many species both sexes disperse. Think of pair living birds where both sexes disperse and geographic range can be enormous. There are also no data or references presented here to show that species in multi-male, multi-female social organizations do have larger geographic ranges than those that are not in these social organizations. It seems to me that, even if this is the case, niche is more important than social organization, for instance not being dependent on forests to constrain much of your range.

      We have removed this section.

    2. Reviewer #2 (Public Review)

      I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.

      1) Conceptual clarity

      There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.

      For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.

      Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by Line 104, which refuses to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could be for the benefit of the reader. For instance, when testing the effects of 10 different teaching methods on language acquisition rate, even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. In fact, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (that is clear from the rest of the introduction, and the parameters used in their analysis, which were not chosen at random).

      The multiple references to 'long-lived' species in the abstract (line 16) and introduction (lines 39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard. This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented in each stage of the manuscript (otherwise, conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).

      2) Observational detail

      There are a number of areas of the manuscript which I found to be lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras (line 341)) and is all of direct relevance to the research questions set out by the authors.

      While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.

      The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1) and subsequent uptake (Question 2), as well as the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) is difficult to interpret based on the information presented. Some key gaps in the story were:

      - Which/how many individuals encountered the food and in what order? I.e., were migrants/innovators simply the first to notice the food?
      - Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
      - How many tried and failed to extract the nuts before and after observing effective demonstrators?
      - Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
      - Did group-members use the same methods of extraction as their 'innovators'?
      - How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?

      The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.

      Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.

      For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.
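
      The arithmetic behind this point can be sketched in a few lines of Python. The group size of 100 is a hypothetical value chosen so that the percentages quoted above map onto whole individuals; the function name is illustrative only.

```python
def uptake_proportion(n_learners: int, n_reference: int) -> float:
    """Share of a reference set (whole group or those present) that acquired the behaviour."""
    if n_reference <= 0:
        raise ValueError("reference set must contain at least one individual")
    if not 0 <= n_learners <= n_reference:
        raise ValueError("learners must be between 0 and the size of the reference set")
    return n_learners / n_reference


# Hypothetical group of 100 with 9 learners, of which only 10 individuals were present.
group_size, n_present, n_learners = 100, 10, 9
of_group = uptake_proportion(n_learners, group_size)
of_present = uptake_proportion(n_learners, n_present)
print(f"uptake relative to whole group: {of_group:.0%}")
print(f"uptake relative to those present: {of_present:.0%}")

# Hypothetical group of 100, all present, with 31 learners: both measures coincide.
full_attendance = uptake_proportion(31, 100)
print(f"uptake with full attendance: {full_attendance:.0%}")
```

      The same raw count therefore supports very different conclusions depending on the denominator, which is why the number of individuals present at each exposure needs to be reported.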

      Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).

      Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).

      For example:
      - What is the role of kinship in these events?
      - Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?
      - Did they primarily occur when adults had a mouthful of food? (i.e. could it simply be attempted pilfering/begging)
      - What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?
      - Did these events ever lead to food-sharing (In other words, how likely are they to simply be begging events)?
      - Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.

      3) Analysis

      There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details to be added to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.

      A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).

      The most straightforward way to resolve this issue is to take a model-comparison approach, fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory) or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models could then be compared using an information criterion to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of this model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn'. There are also a number of tutorials that can be found online for understanding and carrying out these approaches.
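
      The comparison step described above can be sketched as follows. This is a Python illustration of the underlying arithmetic (AIC, small-sample AICc and Akaike weights), not the MuMIn interface; the candidate model names, log-likelihoods, parameter counts and number of observations are all hypothetical.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def aicc(log_lik: float, k: int, n: int) -> float:
    """AIC with the small-sample correction term 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("too few observations for the AICc correction")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores: dict) -> dict:
    """Relative support for each model, normalised to sum to 1."""
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical candidate set: (maximised log-likelihood, number of estimated parameters).
candidates = {
    "null": (-120.0, 2),
    "age + sex": (-110.0, 4),
    "age * sex": (-109.5, 5),
}
n_obs = 200  # hypothetical number of observations

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name:10s} AICc = {scores[name]:8.2f}  weight = {weights[name]:.3f}")
print("lowest AICc:", best_model)
```

      In a model-averaging approach, weights of this kind are what each model's coefficient estimates would be multiplied by before being summed across the candidate set.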

      B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).
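
      One common way to quantify interobserver reliability for categorical codings is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The reviewer does not name a specific statistic, so the choice of kappa here is an assumption, and the two sets of codings below are hypothetical. A minimal Python sketch:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same events into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of events")
    n = len(rater_a)
    # Observed agreement: share of events given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten muzzle-contact events by two observers.
observer_1 = ["init", "init", "target", "target", "init",
              "target", "init", "target", "init", "init"]
observer_2 = ["init", "init", "target", "target", "init",
              "target", "init", "init", "init", "init"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"Cohen's kappa = {kappa:.3f}")
```

      Here the observers agree on nine of ten events, but because both code "init" most of the time, part of that agreement is expected by chance and kappa is lower than the raw 90% agreement.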

      Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).

      Line 452. A few comments on this muzzle-contact analysis:

      "We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"

      What were the criteria for "a large proportion"?

      The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.

      Following on from this, if the model was fit with individual as a random effect it becomes confusing that Figure 3 which represents this data seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for this figure to be meaningfully interpretable.

      Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.

      Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"

      What was the criterion for a satisfactory proportion? Why was this chosen?

      Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."

      The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?

      Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."

      This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?"

      Then, when one age group (infants) is very unlikely to be a target of muzzle-contact, the authors have erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.

      Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.

      Furthermore - because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult female and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.

      Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where the outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).
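
      The distinction drawn above can be made concrete with a small numerical sketch in Python. The counts are hypothetical and chosen only to show that a class can have a higher probability of being the target rather than the initiator while being targeted less often in absolute terms.

```python
def prob_target_given_involved(n_as_target: int, n_as_initiator: int) -> float:
    """Probability that an involvement in a muzzle contact is as target, not initiator."""
    total = n_as_target + n_as_initiator
    if total == 0:
        raise ValueError("no muzzle contacts recorded for this class")
    return n_as_target / total


# Hypothetical counts of muzzle contacts per age class.
counts = {
    "adult": {"target": 20, "initiator": 5},
    "juvenile": {"target": 60, "initiator": 60},
}

ratios = {
    age_class: prob_target_given_involved(c["target"], c["initiator"])
    for age_class, c in counts.items()
}

for age_class, c in counts.items():
    print(f"{age_class:9s} targeted {c['target']:3d} times, "
          f"P(target | involved) = {ratios[age_class]:.2f}")
```

      In this example adults have the higher target-versus-initiator probability (0.80 against 0.50), yet juveniles are targeted three times as often, so the ratio alone cannot show which class is preferentially targeted.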

      Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."

      This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an extremely well-done study, revealing a fascinating phenotype of the mes-4 mutant, which they show upregulates X-linked genes, leading to PGC death. These X-linked genes are mostly oogenesis genes, upregulation of which likely impedes normal proliferation of PGCs. The results are very concrete and support their conclusions, and contribute significantly to the field. I do not have any major concerns except for a couple of conceptual issues. First, the title 'germline immortality' does not seem to be well aligned with the results. It is not wrong that PGCs die in the mes-4 mutant, and thus the germline is 'mortal': however, the term 'germline immortality' implies multi-generational passages of germline, and the data in the present study, where mutant PGCs just die in the offspring, do not necessarily point to 'germline immortality' per se. So, I suggest changing the title to better reflect the contents of the paper.

      Good point. We changed germline immortality to germline survival and/or development throughout the paper.

      Second, although the authors speculate (in the discussion) why X activation is toxic to germ cells (discussing that upregulated X-linked genes are oogenesis genes, whose precocious activation is toxic to PGCs), there is not sufficient discussion as to why the effect is mostly limited to X chromosome, and why mes-4 is specifically involved in this. Is it because all oogenesis genes are concentrated on X chromosome? (likely not). Are autosomal genes that are upregulated in mes-4 mutant also oogenesis genes? Is this related to dosage compensation? I would like to see fuller discussions as to why X chromosome requires special regulation, also discussing the role of mes-4 in this context. I understand that the authors might have refrained from expanding discussions on matters that do not have any data, but without this discussion, I feel that many readers will be left wondering 'why?'.

      As noted in Point #5 above, we added to Discussion whether up-regulation of X genes in mes-4 mutant PGCs and EGCs reflects a defect in dosage compensation or a defect in keeping the oogenesis program (which is enriched for X-linked genes) quiet in the nascent germline (see lines 604-630). Based on new analyses showing up-regulation of oogenesis genes (on the X and autosomes) in mes-4 and PRC2 nascent germlines and the points in Discussion, we favor the view that the essential function of MES-4 and PRC2 is to repress X-linked oogenesis genes in PGCs and EGCs (see Figures 6 and 7, associated figure supplements, and lines 389-417).

      Reviewer #2 (Public Review):

      This manuscript makes substantial progress in resolving a long-standing mystery regarding the precise role of the histone methyltransferase MES-4 in promoting germline development. MES-4 maintains the histone modification H3K36me3 and germ cell survival, but prior evidence was unable to distinguish among several possibilities for target pathways. This paper utilizes a transcriptional profiling approach at the critical time of germline development to definitively demonstrate that the essential function of MES-4 is to repress X gene expression in germ cells. This result is surprising because X repression is an indirect effect of MES-4 activity (MES-4 does not localize to the X), while the direct effect of maintaining germline gene expression is not essential. To buttress this finding, the authors also utilize a series of elegant genetic experiments to independently test whether expression from the X is sufficient to cause germ cell degeneration. They then go further to identify a single X-linked target, lin-15b, as a primary contributor to the inappropriate X-linked gene expression in mes-4 mutants, by showing that loss of lin-15b activity rescues both the germline degeneration and X mis-expression of mes-4 mutants. Finally, the authors demonstrate that PRC2, the H3K27me3 histone methyltransferase and MRG-1, a candidate H3K36me3 effector protein, are also involved in promoting X silencing through lin-15b.

      The manuscript's strengths lie in the development or application of novel techniques, including the profiling of individual pairs of PGCs (a non-trivial advancement), as well as some very well-designed and conceptually innovative genetic assays. These were used to address specific and important gaps in knowledge regarding the phenotype of mes-4, which had been elusive despite having been studied for almost 30 years. Although specific to C. elegans in some ways, the findings are clearly relevant to conserved regulatory events, such as epigenetic memory mechanisms and establishment of opposing chromatin states. Thus, this work provides a substantial advance in the field overall.

      One limitation of this study is the lack of clarity about the conclusions regarding the relationship between the two H3K36me3 histone methyltransferases mes-4 and met-1, and between X vs autosomal gene expression. The authors do not precisely state what genes (X or A) are affected in the met-1 and mes-4 mutants. Ultimately, this confusion muddles the final message of X chromosome upregulation being the critical contributor to the mes-4 germline degeneration phenotype. The experiment presented in figure 3B indicates that loss of mes-4 or met-1 is sufficient to prevent germline development even when the Xs are repressed, indicating that failure to activate autosomal gene expression is also an underlying cause of the degeneration. Perhaps this cannot be definitively concluded without directly assessing met-1 and met-1;mes-4 mutant PGCs (or EGCs) for gene expression changes. If technically possible, this would be a very valuable experiment to directly examine autosomal gene expression changes in the double mutant.

      We profiled met-1 PGCs and observed very few mis-regulated genes (Figure 7 – supplemental figure 1). We tried to profile met-1; mes-4 double mutant PGCs, which completely lack both MET-1 and MES-4 and inherit chromosomes that lack H3K36me3. That was not feasible, due to the high level of embryonic lethality and rapid deterioration of PGCs dissected from met-1; mes-4 double mutant larvae. Notably, this demonstrates that germlines that lack both maternal K36me3 HMTs are sicker than those that lack just 1 of the HMTs. The high degree of embryo lethality suggests an essential function for MET-1 and MES-4 in the soma. As requested, we generated and included a list of X and autosomal genes mis-regulated in met-1, mes-4, and other mutant PGCs (see Figure 7—figure supplement 1).

      The sterility of hermaphrodites with a met-1; mes-4 mutant XspXsp germline and lacking either maternal MES-4 or maternal MET-1 may be due to mis-regulation of autosomal genes, or it may reflect that the X chromosomes are not repressed in met-1; mes-4 XspXsp germlines that lack H3K36me3. To test that, we would need to profile those XspXsp PGCs. It is not feasible to identify mutant F1 larvae with XspXsp PGCs immediately after hatching, which is required for transcript profiling. We think that the main message from analyzing met-1; mes-4 mutant XspXsp germlines (that inherited H3K36me3 marking is not critical for germline development, but re-establishment of marking is important and requires both enzymes) does not require our delving into the cause of sterility of mutant XspXsp germlines lacking MET-1.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Autophagy of the endoplasmic reticulum (ER-phagy) is a fundamental process that is essential for maintaining cellular homeostasis and quality control. We recently identified a novel mechanism regulating ER-phagy in both plants and animals that is based on the ubiquitin-like protein modifiers ATG8 and UFM1, and the ER-associated protein, C53. Here, we use a combination of evolutionary, biochemical, and physiological experiments to investigate the evolution and regulation of this process. We reveal the dynamic evolution of UFM1 and the ubiquity of C53-mediated autophagy across eukaryotes. Leveraging these results, we then identify an ancestral molecular toggle switch, mediated by shuffled ATG8-interacting motifs (sAIMs), that controls C53-mediated autophagy through competitive binding between UFM1 and ATG8. These findings provide new insights into the evolution of UFM1, reveal a conserved mechanism for the regulation of ER-phagy, and raise new and exciting hypotheses about the diversity and function of the UFMylation pathway. We believe that this work will be of interest to those studying autophagy and cellular stress response but will also serve as an interesting example of the benefits of combining evolutionary analyses with biochemical and cellular experiments.

      Our manuscript has been reviewed by three reviewers through ReviewCommons, whose comments, and our responses, can be found below. Two of the reviewers (Reviewer 1 and 3) were supportive of our work and its significance whereas Reviewer 2 questioned the novelty of our findings.

      Each of the reviewers’ comments can be addressed through a few supporting experiments as well as an improved manuscript which clarifies the novelty and significance of our results. While being supportive of our work, Reviewer 1 requested minor additional experiments to support our mechanistic conclusions and Reviewer 3 suggested that we expand our characterizations of C53 function to additional eukaryotic supergroups. These experiments are straightforward to perform, the materials and protocols to accomplish them are already established, and our overall conclusions are robust to the resulting outcomes.

      In contrast, Reviewer 2 did not suggest any additional experiments but rather challenged the novelty of our results as well as some of our interpretations. In particular, Reviewer 2 was uncertain of how our phylogenomic analyses built upon a previous study, published in 2014, which used comparative genomics to identify ubiquitin-related machinery across eukaryotes. Although it was an oversight to not reference this study (we cited a more recent article showing the same results), we were aware of their conclusions that UFMylation was present in the last eukaryotic common ancestor but absent in Fungi. We now clearly outline, both below and within the manuscript, our key phylogenomic results. These were acquired after implementing more advanced and comprehensive comparative genomic searches which allowed us to identify dynamic patterns in UFMylation evolution and permitted co-evolutionary analyses which were not only important for informing our experimental hypotheses but generated new functional questions. Our phylogenomic analyses are also linked to biochemical and physiological data, providing, for the first time, experimental support for our conclusions regarding UFMylation evolution. Similarly, Reviewer 2 suggested that our mechanistic results were an incremental extension of our previous work. Although our current work does of course build on our initial identification of C53-mediated autophagy, this manuscript provides novel insights into the importance and function of this process by revealing its ubiquity across eukaryotes and by characterizing the mechanistic details of its regulation. Ultimately, we disagree with Reviewer 2 but appreciate that this misunderstanding likely resulted from a lack of context and clarity in our manuscript which we have now resolved.

      As outlined in detail below, we will address the reviewers' concerns through additional experiments, analyses, and improvements to the text.

      Thank you for considering our manuscript. We look forward to hearing from you.

      Description of the planned revisions

      We thank the reviewers for carefully evaluating our manuscript and for providing us with an opportunity to respond to their suggestions and criticisms. As you can see below in our point-by-point response, we address each of the points raised by the reviewers through the addition of supporting experiments, analyses, and an improved text. Altogether, we think these additional experiments and textual changes will significantly improve the manuscript. Therefore, we would like to thank all the reviewers and editors for their time and input.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Picchianti et al. provide novel insights into the interaction of C53 with UFM1 and ATG8. Initially, the authors show that protein modification by UFM1 exists in the unicellular organism Chlamydomonas reinhardtii. To that end they demonstrated that pure Chlamydomonas UBA5, UFC1 and UFM1 proteins can charge UFC1. Then, they showed that C53 interacts with ATG8 and UFM1. Specifically, they found that the sAIMs are essential for the interaction with UFM1, while substituting these motifs with a canonical AIM prevents the binding of UFM1 but not of ATG8. Since binding of C53 to ATG8 recruits the autophagy machinery, the authors suggest that ufmylation of RPL26 releases UFM1 from C53, which allows the binding of ATG8. Overall, the authors demonstrate that C53, which forms a complex with UFL1, connects protein ufmylation and autophagy through its ability to bind both UBLs. Here the authors revisited the assumption that only multicellular organisms have the UFM1 system. Using bioinformatic tools they show that it also exists in unicellular organisms. Also, they show that in some organisms the E3 complex components UFL1, UFBP1 and C53 exist but not UBA5, UFC1 or UFM1. This is a very interesting observation that suggests an additional role for this complex. In Fig 1C the authors show that in Chlamydomonas RPL26 undergoes ufmylation. Please use IP against RPL26 and then a blot with anti-UFM1. From the current experiment it is not clear how the authors know that it is indeed RPL26 that undergoes ufmylation.

      RPL26 is highly conserved across eukaryotes, so by comparing our western blots with previous studies (Walczak et al., 2019; Wang et al., 2020), we concluded that these bands corresponded to UFMylated RPL26. However, we agree with the reviewer that we need to confirm the identity of RPL26 with additional assays. Since the submission of the manuscript, we tested RPL26 antibodies in Chlamydomonas and showed that they work well. So, we will update our figure with the confirmation westerns.

      In the second part of the manuscript the authors characterize the interaction of C53 with ATG8 and UFM1. This is a continuation of their previously published work (Stephani et al., 2020). Here the reviewer thinks that further data on the binding of these proteins to C53 are required. Specifically, defining the Kd of these interactions using ITC or another biophysical method would strengthen the study.

      We agree with the reviewer. To obtain the KD values, we will perform ITC experiments with C53 wild type, a C53 sAIM mutant and a C53 cAIM variant titrated with ATG8 and UFM1.

      The authors suggest that under normal conditions C53 binds UFM1 and this keeps it inactive. The reviewer thinks that this claim needs further support. Using IP (maybe with a crosslinker) the authors can show that C53, under normal conditions, binds more UFM1 than ATG8. Also, since the interaction of UFM1 with C53 is noncovalent, it would be nice to show how alterations in UFM1 expression levels can affect the activation of C53.

      We thank the reviewer for this suggestion. Since the submission of the manuscript, we have obtained UFM1 overexpression lines. We will pull on C53 using our C53 antibody and check for ATG8 levels in wild type and UFM1 overexpressing lines under normal and stress conditions. We think this will show how alterations in UFM1 levels can affect C53 activation.

      Finally, the authors suggest that ufmylation of RPL26 allows binding of ATG8 to C53 and this, in turn, leads to C53 activation. Can the authors show that in cells lacking UBA5, under normal conditions or with tunicamycin treatment, ATG8 does not activate C53 because UFM1 does not leave C53?

      In Stephani et al., we showed that C53-mediated autophagy requires the UFMylation machinery. In ufl1 and ddrgk1 mutants, C53 becomes insensitive to ER stress. However, to supplement these results, we will perform autophagic flux assays using the native C53 antibody to test autophagic degradation of C53 in a uba5 and ufc1 mutant under normal and tunicamycin stress conditions. The uba5 mutant that we have is a knockdown, so that’s why we will include the ufc1 mutant in our experiments.

      Significance

      This manuscript advances our understanding of the connection of ufmylation to autophagy which is mediated by C53.

      Thank you!

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript from Picchianti et al. seeks to define the role of CDK5RAP3 (hereinafter referred to as C53) during autophagy and its interplay with UFMylation. Together with UFL1 and DDRGK1, C53 is a component of a trimeric UFM1 E3 ligase complex that modifies the 60S ribosomal protein RPL26 at the endoplasmic reticulum (ER) surface upon ribosomal stalling (among other proposed functions that are not addressed). Several previous studies have implicated the UFMylation pathway in autophagy or ER-phagy, although a non-autophagic fate for UFM1-tagged ribosomal subunits has also been reported. A previous study from the same authors (PMID: 32851973) identified an intrinsically disordered region (IDR) in C53 that is necessary and sufficient for interaction between C53 and autophagy receptor, ATG8. They reported that this IDR comprises four non-canonical ATG8 interacting motifs (AIM), named shuffled AIMs (sAIMs), and showed that combinatorial mutagenesis of sAIM1, sAIM2, and sAIM3 abrogates ATG8 binding. A similar effect was observed for plant C53, though an additional canonical AIM (cAIM) in the C53 IDR had to be mutated to completely abolish C53 and ATG8 interaction. The earlier study reported that C53 IDR also interacts with UFM1, and this interaction can be disrupted in vitro by adding increasing concentrations of ATG8, suggesting that ATG8 and UFM1 may compete with one another for C53 binding. The present paper attempts to build on this previous work by using phylogenomics to infer a coevolutionary relationship between UFMylation machinery and sAIMs in C53, which, the authors argue, constitutes further evidence of the primary importance of a role for UFMylation in ER homeostasis. The manuscript includes a lot of biochemical data using variations of in vitro and in vivo pull-down experiments to define the roles of individual AIMs in mediating the binding of C53 to ATG8 and to UFM1. They also use NMR spectroscopy in an attempt to define the structural basis of the UFM1 and ATG8 binding to C53, concluding that plant C53 interacts with UFM1 mainly through sAIM1, while interaction with ATG8 requires cAIM as well as sAIM1 and sAIM2. Finally, the authors attempt to contextualize these findings by conducting studies on Arabidopsis mutants, showing that replacing sAIMs with cAIMs causes increased sensitivity to ER stress and apparently increases formation of C53 intracellular puncta that may colocalize with ATG8. From these data the authors concluded that the dual ATG8 and UFM1 binding of C53 IDR regulates C53 recruitment to autophagosomes in response to ER stress.

      Major Issues: 1) The phylogenomics analysis conclusion that UFM1 is common in unicellular lineages and did not evolve in multicellular eukaryotes is not novel, as another comprehensive analysis of UFM1 phylogeny, published eight years ago - in 2014 - by Grau-Bové et al. (PMID: 25525215), also reported that UFM1, UBA5, UFC1, UFL1 and UFSP2 were likely present in LECA and lost in Fungi. Although the phylogenomic analysis by Picchianti et al. is also extended to DDRGK1 and C53 proteins, and some parasitic and algal lineages, their findings are incremental. Their proposed coevolution of sAIM and UFM1 is based on presence-absence correlation observed within five species (i.e., Albugo candida, Albugo laibachii, Piromyces finnis, Neocallimastix californiae, Anaeromyces robustus). However, this coevolutionary relationship must be further investigated by substantially increasing the taxonomic sampling within the UFM1-lacking group.

      We were aware that previous studies had investigated the distribution of UFMylation proteins across eukaryotes and that these analyses had predicted the presence of UFMylation in LECA and subsequent loss in Fungi. We included a more recent citation noting this (Tsaban et al. 2021) but apologise for not citing Grau-Bové et al. (2014), which we have now included. We must emphasize that our results are not incremental. Although we had made a point of emphasizing the presence of UFM1 in LECA, this was to counter a recent and highly cited paper in the field which claimed that UFMylation evolved in plants and animals (Walczak et al. 2019). Below we note the novel and important results from our phylogenomic analyses:

      1. We used improved taxonomic sampling and more advanced comparative genomics methods to identify UFMylation components sensitively and specifically across eukaryotes. This involved the inclusion of additional eukaryotic genomes, phylogenetic annotation of orthologs, and genomic searches to complement proteome predictions. These methods are essential for accurately identifying UFMylation components and yield more robust results than using sequence similarity clustering (Tsaban et al. 2021) or un-curated Pfam HMMER search results (Grau-Bové et al. 2014).

      2. By placing our UFMylation reconstructions in a modern phylogenetic context we were not only able to support previous observations which noted the presence of UFM1 in LECA and its loss in Fungi (Grau-Bové et al. 2014) and Plasmodium (Tsaban et al. 2021), but also to identify novel patterns in the evolution of UFMylation. This included the observation of recurrent losses in diverse but trophically-related lineages (such as algae and parasites) and revealed the retention of certain UFMylation components in the absence of UFM1. We identified the frequent co-retention of UFL1 and DDRGK1 following UFM1 loss in multiple eukaryotic groups, including Fungi, which were previously thought to be devoid of UFMylation machinery. These previously uncharacterized patterns suggest that these proteins could have alternative functions and may be functionally associated with life history. These results therefore expand on and add complexity to our understanding of the evolution of UFMylation.

      3. By conducting a comprehensive and accurate survey of UFMylation components we were able to use our data to examine co-evolutionary trends between C53 and UFM1, which would have been incomplete and inaccurate using previously curated datasets. As the reviewer noted, only five species were identified that encoded C53 but lacked UFM1. This is not a reflection of insufficient taxon sampling, but rather the strong co-evolution between C53 and UFM1 (i.e., when UFM1 is lost, C53 is almost always lost as well). We attempted to identify additional cases by searching hundreds of fungal and oomycete genomes as well as those from other eukaryotes, but no other species were found. We agree with the reviewer that additional taxa would have made our analyses stronger, but importantly, we do not rely on genomic correlations to infer function. Rather, we use these correlations to generate functional hypotheses which we then tested experimentally. In this way, we do not rely on the strength of our correlations.

      We have now revised the manuscript to include additional context (including citations) and have improved the clarity of the text to better convey the novelty of our findings.

      2) The manuscript presents an overwhelming amount of biochemical and structural data obtained from a variety of protein binding techniques (i.e., NMR spectroscopy, in vitro GST pulldown, fluorescence microscopy-based on-bead binding assays, and native mass spectrometry). The results are poorly explained and not organized in a logical manner. Moreover, no attempt was made to explain the rationale behind using one technique over the other or how one method complements another to build a stronger conclusion than any individual approach. Given that none of the methods employed report quantitative measurement of binding affinities between C53 IDR and UFM1 or ATG8, it is not clear how the data presented in this manuscript contribute to our understanding of the proposed competition model for UFM1 and ATG8 binding to C53 IDR. To conclude that an interaction is "stronger" or "weaker" it is necessary to measure equilibrium binding constants. Fortunately, there are suitable techniques, including surface plasmon resonance (SPR), microscale thermophoresis (MST), fluorescence anisotropy, or calorimetry that are available to dissect these complex competitive binding interactions and to build models.

      We thank the reviewer for their suggestion. Although we attempted to describe the rationale behind each experiment (please see lines 135-137; on-bead binding assays, lines 154-157; NMR, lines 177-181), we agree that the volume of data and variety of techniques warrants additional explanation. We will revise the manuscript to further explain our rationale for using each of the different approaches. As we noted above in our response to Reviewer 1, we will also perform relevant ITC binding assays to quantify the interaction between C53, ATG8, and UFM1.

      3) The NMR studies have the potential to dissect the types of dynamic binding inherent in unstructured proteins. However, the abundant NMR data presented combined with the aforementioned binding studies, remarkably, do not seem to significantly advance our understanding of how the system is organized or even how UFM1 and ATG8 bind C53, beyond the rather vague and somewhat circular conclusion stated in the abstract: "...we confirmed the interaction of UFM1 with the C53 sAIMs and found that UFM1 and ATG8 bound the sAIMs in a different mode." Or on line 165: "Altogether these results suggested that ATG8 and UFM1 bind the sAIMs within C53 IDR, albeit in a different manner".

      We agree that NMR has the potential to dissect the complex binding interactions between UFM1, ATG8, and C53, but disagree with the reviewer’s interpretation that our NMR data fail to achieve this. In summary, our NMR data: 1. Revealed the structural basis of the interaction of the C53 IDR with ATG8 and UFM1 at atomic resolution by showing that UFM1 binds preferentially to sAIM1 in fast-intermediate exchange (Fig. 4 and Fig. S7B), whereas ATG8 binds the cAIM in slow-intermediate exchange and, once the cAIM is occupied, binds sAIM1 and sAIM2 with lower affinity in fast-intermediate exchange (Fig. 4 and Fig. S7D). 2. Determined conformational changes in the C53 IDR upon binding of ATG8, but not UFM1 (Fig. S7E), which lead to increased dynamics in distinct regions of the C53 IDR. These data could explain how binding of the first ATG8 would trigger C53-dependent recruitment of the tripartite complex to autophagosomes. 3. Identified how UFM1 binds to an atypical hydrophobic patch in the C53 sAIM, similar to what was shown for the UBA5 LIR/UFIM. Together, these results shed light on how both UBLs interact with C53: sAIM1 is the highest-affinity binding site for UFM1, while ATG8 binds the cAIM preferentially before occupying sAIM1 and sAIM2. To provide more detailed information on the atomic details of the interaction between C53 and the UBLs, we will perform molecular docking studies using the restraints obtained from the experimental NMR data.

      4) The functional assays performed in Arabidopsis do not support the competitive model between UFM1 and ATG8 for binding to C53 during C53-mediated autophagy. The fluorescence microscopy images do not provide convincing evidence of colocalization between C53 and ATG8. In fact, in contrast to the claims made in the text or the quantification, mCherry-C53 fluorescence does not seem to localize in discrete puncta and its signal does not seem to overlap with ATG8A.

      We disagree with the reviewer’s interpretation of these results, although we acknowledge that there is some subtlety in interpreting the co-localization data. Importantly, Arabidopsis has 9 ATG8 isoforms and C53 can bind to most of them with varying affinities (see Stephani et al., 2020). Because of this, we do not expect C53 puncta to fully colocalize with ATG8A puncta. Additionally, the C53 puncta are smaller and more subtle than ATG8 puncta, which label the entire autophagosome. To address this, we will quantify the effect by performing colocalization analyses under normal and stress conditions. We will also upload all the raw images as supporting material, so that anyone can independently assess our images.

      Minor Issues: 1. The authors might choose to avoid teleological arguments such as (line 135): "As the phylogenomic analysis suggested that the sAIMs have been retained to mediate C53-UFM1 interaction..."

      We thank the reviewer for this suggestion and will modify the text accordingly.

      2. The authors refer on multiple occasions to C53 "autoactivation" without defining what they mean by this. Do they propose that C53 UFMylates itself?

      We refer to C53 activity as the ability to recruit the autophagy machinery and initiate cargo sequestration and degradation in the vacuole. We attempted to explain this in lines 57-61 but we will reword it more clearly, as suggested by the reviewer.

      3. The paper might want to avoid preachy philosophical statements like "Our evolutionary analysis also highlights why we should move beyond yeast and metazoans and instead consider the whole tree of life when using evolutionary arguments to guide biological research." (lines 333-335). While this is indeed a laudable goal, given the rather limited insights from this study, it is unclear how this paper exemplifies the notion.

      We added this statement as we were intrigued by our evolutionary analyses’ ability to link C53 to UFM1 (an association which took years to identify experimentally) and generate useful functional hypotheses about the interaction between C53 sAIMs and UFM1. As we mentioned above, we also wanted to highlight this point in reference to a recent prominent study in the field which drew conclusions after only considering animals, plants, and fungi (Walczak et al., 2019). We believe this point is important and underappreciated by some cell biologists, but we will modify the text to make it more generic: “This work highlights the utility of using evolutionary analyses and eukaryotic diversity to generate mechanistic hypotheses for cellular processes”.

      Significance

      Overall, while the manuscript contains an abundance of new data, the overall conclusion of the work, stated in the title: "Shuffled ATG8 interacting motifs form an ancestral bridge between UFMylation and C53-mediated autophagy" does not constitute a significant advance beyond other published phylogenomic analysis (below) and the two previous papers by the same authors, including the 2020 paper "A cross-kingdom conserved ER-phagy receptor maintains endoplasmic reticulum homeostasis during stress" (PMID: 32851973) and the 2021 paper "C53 is a cross-kingdom conserved reticulophagy receptor that bridges the gap between selective autophagy and ribosome stalling at the endoplasmic reticulum" (PMID: 33164651). While a regulatory interaction between UFMylation and autophagy is of potential importance, the data in this manuscript do not constitute a major advance and fail to provide new mechanistic insight to explain the role of C53 IDR in autophagy and its interplay with UFMylation.

      We disagree with the reviewer’s suggestion that our work does not constitute a significant advance. We outlined above in detail the novel insights that were obtained from our phylogenomic analysis which involved using improved methods to reveal a much more dynamic and informative picture of UFMylation evolution than has been described previously. Likewise, this manuscript builds substantially on our previous mechanistic work. In our 2020 paper (which is summarized in the mentioned 2021 review article), we identified C53 as an ER-associated protein that binds ATG8 through sAIMs and interacts with the phagophore after RPL26 UFMylation. This work linked C53 activity to ER-phagy and highlighted its importance in plant and animal stress response. However, key questions remained unanswered prior to our current work such as whether this mechanism is conserved across eukaryotes, especially in unicellular species, how C53 activity is regulated, and how UFM1 and ATG8 interact with C53. Our current manuscript builds on this work with the following key results: 1. We use a combination of phylogenomic and experimental analyses to demonstrate that C53 function is conserved across eukaryotes. 2. We reveal a mechanism whereby UFM1 and ATG8 compete for binding at the sAIMs in the C53 IDR and characterize how each of these ubiquitin-like proteins interacts in an alternative way (see the NMR results described above). 3. We show how the sAIMs are required for the regulation of C53-mediated autophagy and reveal the importance of UFM1-ATG8 competition in preventing C53 autoactivation, which causes unnecessary autophagic degradation and impairs cellular stress responses.

      These insights are fundamental for understanding the mechanisms regulating C53-mediated autophagy which were unknown before this work. We will therefore adjust our manuscript to more clearly and explicitly explain how our data build on previous observations so that the novelty and significance of our results are clearer.

      Referee #3

      Evidence, reproducibility and clarity

      Picchianti and colleagues have investigated a conserved molecular framework that orchestrates ER homeostasis via autophagy. For this, they have carried out phylogenomics and large-scale gene family analyses across eukaryote diversity as well as a barrage of molecular lab work. The amount of work carried out as well as the overall quality of the study is impressive.

      Thank you!

      I have only a few comments that should be very easy to tackle. (1) Maybe I missed it, but please upload all alignments used for phylogenetics and phylogenomics for reproducibility to e.g. Zenodo, Figshare or other suitable OA databases.

      We included the alignments in the supplementary data, but as suggested, we will upload all the source data including the scripts and the alignments to Zenodo.

      (2) "Why these non-canonical motifs were selected during evolution, instead of canonical ATG8 interacting motifs remains unknown" --> Maybe there is no "why" and these were not selected at all. Could be random... drift, non-adaptive constructive neutral evolution. I am not saying that asking "why" in evolutionary biology is wrong. It, however, often does not yield satisfactory answers--or any answer at all.

      The reviewer is completely right that “why” is not the right way to frame an evolutionary question. Thank you for pointing this out. We will revise the text and make sure that we remove these kinds of deterministic statements.

      (3) The authors make a case for UFMylation in LECA and I am fully sympathetic with this. However, getting rid of misfolded/problematic proteins and subcellular entities is something that prokaryotes also, to a certain degree, must have mastered (and still do). Are inclusion bodies or export their only answers (I don't know)? Of course, in eukaryotes with all their intracellular complexity this is likely more of an issue. Given the scope of this manuscript (i.e. shedding light on that ancient framework, deep evolutionary roots in eukaryote evolution etc.) it would be very interesting to read the authors' thoughts on this and also pinpoint the prokaryote/eukaryote divide in light of the machinery discussed here.

      Thank you for this suggestion. We did indeed check whether any of the UFMylation machinery were present in prokaryotes and only found homologs of UFSP2. These results are consistent with Grau-Bové et al. (2014) who conducted an equivalent analysis and concluded that UFMylation machinery were derived during eukaryogenesis. We will make reference to this in the revised manuscript.

      Significance

      This study not only impresses with the volume of experiments and data, but also the courage to show conservation of a molecular framework by working with such a range of distantly-related eukaryotes. The results and conclusions from this study should be interesting to anyone working in the broad fields of cellular stress and/or autophagy--both extremely timely topics.

      We thank the reviewer for understanding our take-home message and the advances made. We especially thank the reviewer for understanding the challenge of connecting in silico genomic data with in vivo and in vitro experiments.

      CROSS-CONSULTATION COMMENTS

      Referee #2: The challenge in providing a fair review of this manuscript is to clearly define what contributions are novel, significant advances. It is difficult to tell from the way the manuscript is written, as it is unclear how the new data - which are voluminous - actually advance the model already put forth by the same authors in two previous publications. It is also unfortunate that the authors overlooked the 2014 phylogenomics paper. There clearly are some new pieces of information here, but the overall increment in knowledge is rather minimal.

      Response from Referee #3: I agree that the authors somehow steamroll the reader with a wealth of data. But I think this can be addressed by requesting a lot more justification from the authors and by giving them the opportunity to put the significant advances into their own words. This is, in my opinion, quite doable in the course of a revision. Overall I have to say that I am very sympathetic with the cross-eukaryote reactivity approach that the authors have taken. It is quite intriguing.

      We thank the reviewers for this useful exchange. We agree that our manuscript was not clear enough to emphasize the novelty of our results which likely resulted from the volume and diversity of the experiments and analyses that were presented. We have now revised the manuscript to improve the context and rationale for the study, the intent and hypotheses behind each experiment, and the novel results and insights obtained in each section.

      Response from Referee #2: I agree that the cross-eukaryote approach is intriguing. Shouldn't we be concerned that the 2014 publication already made two of their key points (i.e., present in LECA, loss in Fungi)? What is the incremental insight from this paper? I'd appreciate an opinion from an evolutionary biologist as to how strongly one can conclude functional co-evolution from such correlative data, especially given the rather small number of supporting examples. Is it also necessary to consider counterexamples, i.e., species that have sAIMs but no UFM1 (I believe that they found a few such cases)?

      Importantly, we do not conclude functional co-evolution from our correlative data. Instead, we used these correlations to generate hypotheses that we tested with various experiments in different model systems. For example, the apparent correlation between C53 sAIMs and UFM1 prompted us to test whether or not UFM1 and sAIMs interact. Regardless of sample size or statistical significance, phylogenomic analyses can never demonstrate functional links, only correlations, which is why we combined these two approaches. Although only a few species encoded C53 without UFM1, each of these contained C53 cAIMs and lacked sAIMs (Figure 2c). There are species with UFM1 that lack C53 but this makes sense as UFM1 is used in other processes besides ER-phagy. We have revised the text to make our approach and reliance on certain data clearer.

      Response from Referee #3: Well, with these deep evolutionary questions this is always a challenge. Where does one stop to sample more homologs for one's analyses (one from each supergroup [which are no longer recognised by the community])? In that sense, the authors are right to make the parsimonious base assumption that if X and Y interact in species A and B (no matter how distantly they are related) then X and Y interacted in the last common ancestor of A and B. That being said, if I had designed this study, I would have sampled more broadly for my in vitro cross-eukaryote approach. But this too, I think, could be carried out by the authors in a reasonable timeframe. Specifically, they have now sampled from Amorphea and Archaeplastida; they should add one from TSAR, one Haptista, one Cryptista, and one CRuM. If they synthesised the proteins via a company, they could have the constructs in a few weeks for about 1K Euro - I do not think that this would be an unreasonable request.

      We agree that testing C53 function in additional species would strengthen our understanding of the conservation of this pathway across eukaryotes, as it cannot be assumed that orthologous proteins will function in the same way across all species. To our knowledge there is no other work showing experimentally that the UFMylation pathway is working in a single-celled organism. We focussed our efforts on the unicellular green alga, Chlamydomonas due to its relative experimental tractability. However, testing this was not trivial as it required us to establish expression and purification protocols, isolate Chlamydomonas mutants, optimize physiological stress assays, and perform the experiments.

      Nevertheless, we agree that we could expand our in vitro assays with C53 orthologs from additional species. As suggested by reviewer 3, we will now synthesize 6 more C53 isoforms from two TSAR representatives (the alveolate, Tetrahymena thermophila, and the stramenopile, Phytophthora sojae), as well as a representative from Haptista (Emiliania), Cryptista (Guillardia), Diplomonada (Trypanosoma), and CRuMs (Rigifila). We will test their interaction with human and plant ATG8 and UFM1 proteins. We have also added two species from CRuMs into our phylogenomic analysis.

      The list of experiments that we can do to address the reviewer’s concerns:

      1. Repeat experiment in Figure 1C probing with α-RPL26.
      2. To calculate KD values, perform ITC experiments with C53 wild-type, C53 sAIM mutant and C53 cAIM variant titrated with ATG8 and UFM1.
      3. Perform CoIP experiments using C53 antibody in wild type and UFM1 overexpressing lines and detect ATG8 association, under normal and stress conditions.
      4. We will test autophagic degradation of C53 in uba5 and ufc1 mutants under normal and tunicamycin stress conditions by performing autophagic flux assays using the native C53 antibody.
      5. Molecular docking studies to see C53’s structural rearrangements leading to ATG8 and UFM1 binding.
      6. Figures from co-localization experiments in Figure 5G will be revisited and we will perform additional co-localization analyses such as Pearson coefficient under normal and stress conditions. We will also upload all the raw images as supporting material, so that anyone can independently assess our images.
      7. We will upload all the source data for phylogenomic analyses, including scripts and alignments, to Zenodo.
      8. Test the interaction of 6 newly synthesised C53 isoforms from: (1) an alveolate (tsAr, Ciliate), (2) a stramenopile (tSar, Phaeodactylum), (3) a haptophyte (Emiliania), (4) a cryptophyte (Guillardia), (5) a diplomonad (Trypanosoma) and (6) a CRuM with human and plant ATG8 and UFM1 proteins.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript from Picchianti et al. seeks to define the role of CDK5RAP3 (hereinafter referred to as C53) during autophagy and its interplay with UFMylation. Together with UFL1 and DDRGK1, C53 is a component of a trimeric UFM1 E3 ligase complex that modifies the 60S ribosomal protein RPL26 at the endoplasmic reticulum (ER) surface upon ribosomal stalling (among other proposed functions that are not addressed). Several previous studies have implicated the UFMylation pathway in autophagy or ER-phagy, although a non-autophagic fate for UFM1-tagged ribosomal subunits has also been reported.

      A previous study from the same authors (PMID: 32851973) identified an intrinsically disordered region (IDR) in C53 that is necessary and sufficient for interaction between C53 and autophagy receptor, ATG8. They reported that this IDR comprises four non-canonical ATG8 interacting motifs (AIMs), named shuffled AIMs (sAIMs), and showed that combinatorial mutagenesis of sAIM1, sAIM2, and sAIM3 abrogates ATG8 binding. A similar effect was observed for plant C53, though an additional canonical AIM (cAIM) in the C53 IDR had to be mutated to completely abolish the C53 and ATG8 interaction. The earlier study reported that the C53 IDR also interacts with UFM1, and that this interaction can be disrupted in vitro by adding increasing concentrations of ATG8, suggesting that ATG8 and UFM1 may compete with one another for C53 binding.

      The present paper attempts to build on this previous work by using phylogenomics to infer a co-evolutionary relationship between UFMylation machinery and sAIMs in C53, which, the authors argue, constitutes further evidence of the primary importance of a role for UFMylation in ER homeostasis. The manuscript includes a lot of biochemical data using variations of in vitro and in vivo pull-down experiments to define the roles of individual AIMs in mediating the binding of C53 to ATG8 and to UFM1. They also use NMR spectroscopy in an attempt to define the structural basis of UFM1 and ATG8 binding to C53, concluding that plant C53 interacts with UFM1 mainly through sAIM1, while interaction with ATG8 requires cAIM as well as sAIM1 and sAIM2. Finally, the authors attempt to contextualize these findings by conducting studies on Arabidopsis mutants, showing that replacing sAIMs with cAIMs increases sensitivity to ER stress and apparently increases formation of C53 intracellular puncta that may colocalize with ATG8.

      From these data the authors concluded that the dual ATG8 and UFM1 binding of the C53 IDR regulates C53 recruitment to autophagosomes in response to ER stress.

      Major Issues:

      1. The phylogenomics analysis conclusion that UFM1 is common in unicellular lineages and did not evolve in multicellular eukaryotes is not novel, as another comprehensive analysis of UFM1 phylogeny, published eight years ago - in 2014 - by Grau-Bové et al. (PMID: 25525215), also reported that UFM1, UBA5, UFC1, UFL1 and UFSP2 were likely present in LECA and lost in Fungi. Although the phylogenomic analysis by Picchianti et al. is also extended to DDRGK1 and C53 proteins, and some parasitic and algal lineages, their findings are incremental. Their proposed coevolution of sAIM and UFM1 is based on presence-absence correlation observed within five species (i.e., Albugo candida, Albugo laibachii, Piromyces finnis, Neocallimastix californiae, Anaeromyces robustus). However, this coevolutionary relationship must be further investigated by substantially increasing the taxonomic sampling within the UFM1-lacking group.
      2. The manuscript presents an overwhelming amount of biochemical and structural data obtained from a variety of protein binding techniques (i.e., NMR spectroscopy, in vitro GST-pulldown, fluorescence microscopy-based on-bead binding assays, and native mass-spectrometry). The results are poorly explained and not organized in a logical manner. Moreover, no attempt was made to explain the rationale behind using one technique over the other or how one method complements another to build a stronger conclusion than any individual approach. Given that none of the methods employed report quantitative measurement of binding affinities between C53 IDR and UFM1 or ATG8, it is not clear how the data presented in this manuscript contribute to our understanding of the proposed competition model for UFM1 and ATG8 binding to C53 IDR. To conclude that an interaction is "stronger" or "weaker" it is necessary to measure equilibrium binding constants. Fortunately, there are suitable techniques, including surface plasmon resonance (SPR), microscale thermophoresis (MST), fluorescence anisotropy, or calorimetry that are available to dissect these complex competitive binding interactions and to build models.
      3. The NMR studies have the potential to dissect the types of dynamic binding inherent in unstructured proteins. However, the abundant NMR data presented, combined with the aforementioned binding studies, remarkably do not seem to significantly advance our understanding of how the system is organized or even how UFM1 and ATG8 bind C53, beyond the rather vague and somewhat circular conclusion stated in the abstract: "...we confirmed the interaction of UFM1 with the C53 sAIMs and found that UFM1 and ATG8 bound the sAIMs in a different mode." Or on line 165: "Altogether these results suggested that ATG8 and UFM1 bind the sAIMs within C53 IDR, albeit in a different manner".
      4. The functional assays performed in Arabidopsis do not support the competitive model between UFM1 and ATG8 for binding to C53 during C53-mediated autophagy. The fluorescence microscopy images do not provide convincing evidence of colocalization between C53 and ATG8. In fact, in contrast to the claims made in the text or the quantification, mCherry-C53 fluorescence does not seem to localize in discrete puncta and its signal does not seem to overlap with ATG8A.
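
      To illustrate the point in major issue 2 about equilibrium binding constants, the following is a minimal sketch and not the authors' model. It assumes a single site that two ligands bind in a mutually exclusive way, free ligand concentrations approximately equal to total concentrations, and placeholder dissociation constants; the function name `fraction_bound` and all numbers are hypothetical. The real C53 IDR contains several motifs, so a fitted model would likely need more terms.

```python
def fraction_bound(ligand, kd_ligand, competitor=0.0, kd_competitor=float("inf")):
    """Fraction of sites occupied by `ligand` at equilibrium.

    Assumes one site, mutually exclusive binding, and free concentrations
    (all in the same arbitrary units). A competitor raises the apparent K_D
    of the ligand by the factor (1 + [competitor] / K_D,competitor).
    """
    apparent_kd = kd_ligand * (1.0 + competitor / kd_competitor)
    return ligand / (ligand + apparent_kd)


# Hypothetical values only: K_D(ATG8) = 1, K_D(UFM1) = 5, [ATG8] = 1.
KD_ATG8 = 1.0
KD_UFM1 = 5.0
atg8_alone = fraction_bound(1.0, KD_ATG8)
atg8_with_ufm1 = fraction_bound(1.0, KD_ATG8, competitor=10.0, kd_competitor=KD_UFM1)
```

      With measured K_D values in place of the placeholders, a calculation of this kind indicates how much of one ligand is needed to displace the other, which qualitative pull-downs alone cannot establish.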

      Minor Issues:

      1. The authors might choose to avoid teleological arguments such as (line 135): "As the phylogenomic analysis suggested that the sAIMs have been retained to mediate C53-UFM1 interaction..."
      2. The authors refer on multiple occasions to C53 "autoactivation" without defining what they mean by this. Do they propose that C53 UFMylates itself?
      3. The paper might want to avoid preachy philosophical statements like "Our evolutionary analysis also highlights why we should move beyond yeast and metazoans and instead consider the whole tree of life when using evolutionary arguments to guide biological research." (333-335). While this is indeed a laudable goal, given the rather limited insights from this study, it is unclear how this paper exemplifies the notion.

      Referees cross-commenting

      Referee #2

      The challenge in providing a fair review of this manuscript is to clearly define what contributions are novel, significant advances. It is difficult to tell from the way the manuscript is written, as it is unclear how the new data, which are voluminous, actually advance the model already put forth by the same authors in two previous publications. It is also unfortunate that the authors overlooked the 2014 phylogenomics paper. There clearly are some new pieces of information here, but the overall increment in knowledge is rather minimal.

      Response from Referee #3

      I agree that the authors somehow steamroll the reader with a wealth of data. But I think this can be addressed by the authors by requesting a lot more justification and by giving them the opportunity to put the significant advances into their own words. This is, in my opinion, quite doable in the course of a revision. Overall I have to say that I am very sympathetic with the cross-eukaryote reactivity approach that the authors have taken. It is quite intriguing.

      Response from Referee #2

      I agree that the cross-eukaryote approach is intriguing. Shouldn't we be concerned that the 2014 publication already made two of their key points (i.e., present in LECA, loss in Fungi)? What is the incremental insight from this paper?

      I'd appreciate an opinion from an evolutionary biologist as to how strongly one can conclude functional co-evolution from such correlative data, especially given the rather small number of supporting examples. Is it also necessary to consider counterexamples, i.e., species that have sAIMs but no UFM1 (I believe that they found a few such cases)?

      Response from Referee #3

      Well, with these deep evolutionary questions this is always a challenge. Where does one stop to sample more homologs for one's analyses (one from each supergroup [which are no longer recognised by the community])? In that sense, the authors are right to make the parsimonious base assumption that if X and Y interact in species A and B (no matter how distantly they are related) then X and Y interacted in the last common ancestor of A and B. That being said, if I had designed this study, I would have sampled more broadly for my in vitro cross-eukaryote approach. But this too, I think, could be carried out by the authors in a reasonable timeframe. Specifically, they have now sampled from Amorphea and Archaeplastida; they should add one from TSAR, one Haptista, one Cryptista, and one CRuM. If they synthesised the proteins via a company, they could have the constructs in a few weeks for about 1K Euro - I do not think that this would be an unreasonable request.

      Significance

      Overall, while the manuscript contains an abundance of new data, the overall conclusion of the work, stated in the title: "Shuffled ATG8 interacting motifs form an ancestral bridge between UFMylation and C53-mediated autophagy" does not constitute a significant advance beyond other published phylogenomic analysis (below) and the two previous papers by the same authors, including the 2020 paper "A cross-kingdom conserved ER-phagy receptor maintains endoplasmic reticulum homeostasis during stress" (PMID: 32851973) and the 2021 paper "C53 is a cross-kingdom conserved reticulophagy receptor that bridges the gap between selective autophagy and ribosome stalling at the endoplasmic reticulum" (PMID: 33164651). While a regulatory interaction between UFMylation and autophagy is of potential importance, the data in this manuscript do not constitute a major advance and fail to provide new mechanistic insight to explain the role of C53 IDR in autophagy and its interplay with UFMylation.

    1. Reviewer #2 (Public Review):

      The authors utilize the publicly available dHCP dataset to ask an interesting question: how do postnatal experience and prenatal maturation influence the development of the visual system? The authors report that experience and prenatal maturation differentially contribute to different aspects of development. Namely, the authors quantify cortical thickness, myelination, and lateral symmetry of function as three different metrics of development. The homotopy and preterm infant analyses are strengths that, on their own, could have justified reporting. However, I have concerns about the analytic approaches that were used and the conclusions that were drawn. Below I list my major concerns with the manuscript.

      PMA vs. GA vs. PT

      1. The authors seek to understand the contribution of experience and prenatal development, yet I am unsure why the authors focused on the variables they did. There are three variables of interest used throughout this study: Gestational age at birth (GA), postnatal time (PT), and postmenstrual age at the time of scan (PMA). The last metric, PMA, is straightforwardly related to GA and PT since PMA = GA + PT. In most (but not all) of the manuscript, the authors use PMA and PT, with GA used without justification in some cases but not in others.

      It is unclear why PMA is used at all: PMA is necessarily related to PT and GA, making these variables non-independent. Indeed, the authors show that PMA and PT are highly correlated. The authors even say that "the contribution of postnatal experience to the development was not clarified because PMA reflects both prenatal endogenous effect and postnatal experience." So, why not use GA at birth instead of PMA? Clearly, GA is appropriate in some cases (e.g., Figure S4 or in some of the ANOVA applications), and to me, it seems to isolate the effect the authors care about (i.e., duration of prenatal development). Perhaps there is some theoretical justification for using PMA, but if so, I am unaware.

      That said, I expect that replacing all analyses involving PMA with GA will substantially change the results. I do not see this as a bad thing as I think it will make the conclusions stronger. As is, I am left unsure about what the key takeaways of this paper are.

      2. Using GA instead of PMA will have several benefits: 1) It will be much simpler to think of these two variables since they contrast the duration of fetal maturation and time postnatally. 2) This will help the partial correlation analyses performed since the two variables are closer to independent. It will also mean that the negative relationships observed between PT and cortical thickness when controlling for PMA (e.g., Figure 2h) might disappear (reversed signs for partial correlations are common when two covariates are correlated). 3) This will allow the authors to replace Figure 1a with a more informative plot. Namely, they could use a scatter of GA and PT, giving insight into the descriptive statistics of both dimensions.
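
      The sign reversal mentioned in point 2 can be reproduced with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the dHCP data: the distributions, effect sizes, and variable names are all hypothetical. The simulated outcome increases with both GA and PT, yet because PMA = GA + PT, the partial correlation with PT turns negative once PMA is regressed out whenever the PT effect is smaller than the GA effect.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after regressing z (plus an intercept) out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def simulate(n=2000, seed=0):
    """Hypothetical infants: the outcome rises with GA (slope 1.0) and PT (slope 0.5)."""
    rng = np.random.default_rng(seed)
    ga = rng.normal(39.0, 1.5, n)   # gestational age at birth, weeks (made up)
    pt = rng.exponential(2.0, n)    # postnatal time at scan, weeks (made up)
    pma = ga + pt                   # postmenstrual age is the sum by definition
    outcome = 1.0 * ga + 0.5 * pt + rng.normal(0.0, 0.5, n)
    return ga, pt, pma, outcome


ga, pt, pma, outcome = simulate()
r_pt_given_pma = partial_corr(outcome, pt, pma)  # negative in this setup
r_pt_given_ga = partial_corr(outcome, pt, ga)    # positive in this setup
```

      Rewriting the outcome as 1.0 * PMA - 0.5 * PT + noise shows why: holding PMA fixed, more postnatal time implies less gestation, so the coefficient on PT reflects the difference between the two effects rather than the postnatal effect itself.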

      3. I suspect that one motivation for the use of PMA over GA is for the analysis in Figure 6. In this analysis, the authors pick a group of term infants with a PMA equal to the preterm infants. Since PMA is the same, the only difference between the groups (according to the authors) is the amount of postnatal experience. However, this is not the only difference between the groups since they also vary in GA (and now PT and GA are negatively correlated almost perfectly). I don't know how to interpret this analysis since both the amount of prenatal maturation and postnatal experience vary between the groups.

      Justification of conclusions and statistical considerations

      I had concerns about some of the statistical tests and conclusions that the authors made. I refer to some of these in other sections (e.g., the homotopy analyses), but I raise several here.

      4. I am not sure what evidence the authors are using to make this claim: "we found that the cortical myelination and overall functional connectivity of ventral cortex developed significantly with the PMA but was not directly influenced by postnatal time." Postnatal time is significantly correlated with cortical myelination, as shown in Figures 2g, 2h, 3b, 3c, and postnatal time is significantly correlated with functional connectivity, as shown in Figures 4h, 5c, 5d, and 5e. Hence, this general claim that "the development of CT was considerably modulated by the postnatal experience while the CM was heavily influenced by prenatal duration" doesn't seem to be supported: both myelination and thickness are affected by postnatal experience and prenatal duration (as measured by PMA). A similar sentiment is expressed in the abstract. Perhaps the authors suggest different patterns in the strength of change for PMA vs. PT across these metrics, but if so, then statistical tests need to support that conclusion, and the claims need to reflect that sentiment.

      Interestingly, Figure S4 presents a compelling ANOVA that does support this conclusion. Still, this result is relegated to the supplement, and it also uses GA, rather than PMA, making it hard to reconcile with the other claims made in the main text. Moreover, it uses ANOVAs, which dichotomize a continuous variable. Here and elsewhere in the manuscript (e.g., Figures 3d, 3e), the authors split the infants into quartiles and compare them with ANOVAs. Their use for visualization is helpful, but it is unclear what the statistical motivation for this is rather than treating these as continuous variables, as is possible with linear mixed-effects models. Moreover, it is unclear why the authors excluded half the data from the study (i.e., quartiles 2 and 3) in this ANOVA when all four quartiles could be used as factors.

      5. It is unclear what the evidence is to support the following claim: "Both CT and CM show higher correlation with PMA in the posterior than anterior region, and higher correlation in the medial than lateral part within the anatomical mask (Figure 2a and Figure S2b-c [sic])" From Figure 2 or Figure S2, I don't see a gradient. From Figure S3, there might be a trend in some plots, but it is hard to interpret since it is non-monotonic. More generally, is there a statistical test to support this claim?

      6. "and the interaction [sic] was more prominent in CM (simple effect: t = 10.98, p < 10^-9) than in CT (t = 2.07, p < 0.05)." Does 'more prominent' mean it is 'significantly stronger'? If not, then the authors should adjust this claim.

      7. Are the authors Fisher Z transforming their correlations? In numerous places, correlation values seem to be added together or used as the input to other correlation analyses. It is unclear from the methods whether the authors are transforming their correlation values to make that use appropriate.
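
      For reference, the transformation asked about in point 7 can be sketched as follows. This is a generic illustration, not the authors' pipeline; the function name and the example values are hypothetical. Correlations are mapped to z with the inverse hyperbolic tangent, combined on that scale, and mapped back with tanh.

```python
import numpy as np


def average_correlations(r_values, clip=0.999999):
    """Average correlation coefficients on the Fisher z scale.

    Values are clipped away from +/-1 because arctanh diverges there.
    """
    r = np.clip(np.asarray(r_values, dtype=float), -clip, clip)
    z = np.arctanh(r)               # Fisher z transform
    return float(np.tanh(z.mean()))  # back-transform the mean z to r


rs = [0.2, 0.5, 0.9]                 # made-up correlation values
naive_mean = float(np.mean(rs))      # averaging r directly
fisher_mean = average_correlations(rs)
```

      For these example values the Fisher-averaged correlation is larger than the naive mean, because the raw r scale compresses differences near 1. The same transformed values would also be the appropriate input when correlations are themselves entered into further correlation or regression analyses.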

      Homotopy analyses

      The homotopy section is a strength of the paper, but I have doubts about the approach taken to analyze this data and some of the conclusions drawn. I don't expect any of my suggestions to change the takeaway of this section, but I do think they are essential criticisms to address.

      8. I do not think that the non-homotopic control condition is appropriate. In Arcaro & Livingstone (2017), the authors had 3 categories for this analysis: homotopic pairs (e.g., left V1 vs. right V1), adjacent pairs (e.g., left V1 vs. right V2), and distal pairs (e.g., left V1 vs. right PHA1). In the homotopy analysis performed by Li and colleagues, they compare homotopic pairs with all other pairs. I don't think that is generous to the test since non-homotopic pairs include adjacent pairs that should be similar and distal pairs that shouldn't be similar. This may explain why some non-homotopic distribution overlaps with the homotopic distribution in Figure 4c.

      9. Regardless of this decision, I think the authors should reconsider their statistical test. I think the authors are using a between-samples t-test to compare the 34 homotopic pairs with the hundreds of non-homotopic pairs. This is statistically inappropriate since the items are not independent (i.e., left V1 vs. right V1 is not independent of left V1 vs. right V2, which is also not independent of left V3 vs. right V2). This means the actual degrees of freedom are much lower than what is used. Moreover, I am unsure how the authors do this analysis across participants since this test can be done within participants. The authors should clarify what they did for this analysis and justify its appropriateness.
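
      One way to respect the dependence described in point 9 is to treat the participant, not the region pair, as the unit of analysis. The sketch below is one possible approach under stated assumptions, not the test the authors ran: each participant contributes a single difference score (mean Fisher-z homotopic correlation minus mean Fisher-z non-homotopic correlation), and a sign-flip permutation test is applied across participants. The array shapes, the simulated values, and the function name are hypothetical; only the count of 34 homotopic pairs comes from the text above.

```python
import numpy as np


def within_participant_test(homotopic_r, nonhomotopic_r, n_perm=5000, seed=0):
    """Sign-flip permutation test on per-participant difference scores.

    homotopic_r:    array of shape (n_participants, n_homotopic_pairs)
    nonhomotopic_r: array of shape (n_participants, n_other_pairs)
    Returns the mean difference (Fisher z units) and a two-sided p-value.
    """
    rng = np.random.default_rng(seed)
    # One number per participant, so correlated region pairs are never
    # counted as independent observations.
    diff = np.arctanh(homotopic_r).mean(axis=1) - np.arctanh(nonhomotopic_r).mean(axis=1)
    observed = diff.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p_value)


# Simulated data for illustration only: 40 participants, 34 homotopic pairs,
# 500 non-homotopic pairs, with made-up means and spreads.
rng = np.random.default_rng(1)
homotopic = np.clip(rng.normal(0.4, 0.1, size=(40, 34)), -0.99, 0.99)
nonhomotopic = np.clip(rng.normal(0.1, 0.1, size=(40, 500)), -0.99, 0.99)
mean_diff, p_value = within_participant_test(homotopic, nonhomotopic)
```

      The degrees of freedom here scale with the number of participants rather than with the number of region pairs, which is the concern raised above. Splitting the non-homotopic pairs into adjacent and distal sets, as in point 8, would only require computing separate difference scores.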

      10. Could the authors speculate on why the correlations in homotopic regions are so much lower than what Arcaro and Livingstone (2017) found? I can think of a few possibilities: higher motion in infants, less rfMRI data per participant, different sleep/wake states, and different parcellation strategies. Regarding the last explanation, I think this is a real possibility: the bilateral correlation may be reduced if the Glasser atlas combines functionally heterogeneous patches of the cortex. Hence, the authors should consider this and other possible explanations.

      11. The authors assume that the homotopic analyses mean that there are lateral connections between hemispheres (e.g., "Furthermore, the connections among the ventral visual cortex have developed during this early stage. Specifically, the homotopic connections between bilateral V1 and between bilateral VOTC both increased with GA, indicating an increased degree of functional distinction"). While this might be true, it doesn't need to be. Functional connectivity can be observed between regions that lack anatomical connectivity. Instead, two regions could both be driven by another region. In this case, the thalamus might drive symmetrical activity in the visual cortex.

      Miscellaneous

      12. I am not sure what the motivation of this line is: "Moreover, those studies did not fully control the visual experience in the first few weeks of the subjects, thus cannot give a clear conclusion whether the innate functional connectivity is unrelated to postnatal visual experience." Arcaro, Schade, Vincent, Ponce, & Livingstone (2017) did control the visual experience of subjects. Moreover, the research here doesn't control infant experience in the way this sentence implies: it implies an experimental manipulation (i.e., fully control) rather than the statistical control that is done here. Consider rephrasing.

      13. I am not sure why this claim is made: "Area V1 was selected because this region is the most basic region for visual processing and probably is the most experience-dependent area during early development". Is there evidence supporting this claim? Plasticity is found throughout the visual cortex, and I think which region is most plastic depends on the definition of plasticity. For instance, most people have the same tuning properties to Gabor gratings (e.g., a cardinality bias), but there is enormous variability in face tuning across cultures.

      14. The abstract says 783 infants were included in this study, but far fewer are actually used. The authors should report the 407 number in the abstract if any number at all.

      15. Any comparisons of preterms and terms ought to be given the caveat that the preterm environment can be very different than the term environment: whereas a term infant goes home and sees friends and family without restriction, the preterm environment can be heavily regulated if they are in a NICU. Authors should either provide details about the environments of the preterms in their study, or they should consider how differences in the richness of visual experience - regardless of quantity - may affect visual development.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers


      Referee #1

      Evidence, reproducibility and clarity

      1. This manuscript constructs a gene expression model with various factors. Specifically, the effect of cell size on gene expression is considered, which is often ignored by previous studies. One interesting finding is that the absolute number of the gene products and the concentration can have different distributions. Some predictions of the models are validated by experimental data on E. coli and yeast. This manuscript uses the mean-field approximation for cell volume, which has good accuracy when the number of stages is large. The usage of the power spectrum has a satisfactory effect on studying the concentration oscillation.

      Response: Thank you for the positive comments.

      1. Overall the paper was very difficult to follow and digest easily because of all the different factors and mechanisms invoked. It is mainly an issue of providing sufficient details for each of the factors and organizing them in a systematic and logical way. Although there is a supplementary appendix, it was hard to keep track of all the elements in the main manuscript. Perhaps something like Fig 1 of the Appendix can be presented in the main body to outline all the ingredients and how they affect each other.

      Response: In the revised manuscript, we moved Supplementary Fig. S1 in the previous version into the main text to outline all the ingredients and how they affect each other (see page 8, Fig. 2). Moreover, we provided many details for each of the biological factors and tried to organize them in a more systematic and logical way (see pages 3-7).

      1. It might be good to provide a more detailed description of the goal (studying gene product number and concentration under different parameters) after introducing the full and the reduced models. A table of symbols would also be helpful.

      Response: In the revised manuscript, we added a table explaining the meaning of all model parameters (see page 4, Table 1). Moreover, we provided a detailed description of the goal of the present paper after introducing the full and reduced models (see page 7).

      1. Some technical details in the Methods section are in fact helpful in understanding the conclusions. They can be moved to the Results section.

      Response: In the revised manuscript, we moved many technical details in Methods and Supplementary Notes to the main text to help the readers better understand the conclusions (see pages 5-10).

      1. One concern is that the central concept of this manuscript, “stage”, is not thoroughly discussed. This concept should have some significant biological meaning, not just be coined for mathematical convenience.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it is believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. Fig. 1(b) is a little strange. For the left panel, the x-axis (stage) is discrete, then the volume (y-axis) should be a step function, not a straight red line.

      Response: In the revised manuscript, we added red dots to the stage-volume plot to show the dependence of the mean cell volume v_k on cell cycle stage k for the mean-field model (see page 3, Fig. 1). Moreover, we emphasized that the straight red line joining these dots is simply a guide to the eye.

      Significance

      1. The main advance is a more complete model of gene expression under more realistic organism growth conditions.

      Response: Thank you for acknowledging the results of the manuscript.

      Referee #2

      Evidence, reproducibility and clarity

      1. Jia et al. introduce a modeling framework to represent stochastic gene expression, with an explicit representation of cell volume growth, cell cycle progression (and its dependency on cell volume) and gene dosage compensation. The model is very elegant and general in that it can represent a variety of situations, simply as a matter of parametrization. Under a simplifying assumption, the authors derive a number of metrics (including the stationary distribution of gene product and the power spectrum of gene product fluctuation dynamics), for both absolute number and concentration of gene product molecules. They use their model and derivations to examine under which conditions cells can achieve homeostasis in the concentration of the expressed gene product, despite changes in cell volume and gene copy number following replication. They also present and discuss the conditions giving rise to specific features (i.e. bimodality in the stationary distribution, a peak in the power spectrum) and examine these features in experimental data to infer the underlying homeostasis strategies. The model is rather general and powerful. The simplifying assumption seems reasonable (and the authors investigate to some extent its limitations, i.e. Fig. 2). The conclusions are overall convincing.

      Response: Thank you for the positive comments.

      Major comments

      1. My main concern is that the metrics that the authors use to assess concentration homeostasis (i.e. the γ parameter and the presence / absence of a peak in the power spectrum) do not seem quite appropriate to describe how much variability / fluctuation in concentration is driven by cell cycle effects. Indeed, the γ parameter measures how much the *average* concentration in each cell cycle stage varies throughout the cell cycle. However, this variability should be compared to the total variability due to both cell cycle effects and stochastic bursting dynamics. A given level of cell-cycle dependency (say γ = 0.2) could be very visible if gene expression is weakly noisy (e.g. B low and ⟨n⟩ high) and completely invisible if gene expression is highly bursty (large B and small ⟨n⟩). In the latter situation, cell-cycle effects would be meaningless for the cell to minimize. In essence, reusing the authors' notation, I think γ/φ^(1/2) would be a more relevant metric to observe.

      Response: In the revised manuscript, we showed that the total concentration noise φ can be decomposed as φ = φ_ext + φ_int, where φ_ext is the extrinsic noise, which characterizes the fluctuations between different stages due to cell cycle effects, and φ_int is the intrinsic noise, which characterizes the fluctuations within each stage due to stochastic bursty synthesis and degradation of the gene product (see page 11). Based on this decomposition, we introduced a new metric γ = φ_ext/φ, which characterizes the accuracy of concentration homeostasis. Clearly, the new metric γ reflects the relative contribution of cell cycle effects to the total concentration variability. All discussions of concentration homeostasis in the revised manuscript are based on the new metric γ. Moreover, all figures have been updated to use this new metric.
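
The decomposition described above can be illustrated with the law of total variance applied to a mixture over cell cycle stages. The following is only a minimal sketch under stated assumptions: the stage weights, per-stage means and per-stage variances are hypothetical inputs, noise is taken to be variance divided by the squared mean, and the function name `decompose_noise` is illustrative rather than taken from the manuscript.

```python
def decompose_noise(weights, means, variances):
    """Split the total noise of a stage mixture into between-stage and within-stage parts.

    weights   : probability of finding a cell in each stage (hypothetical input)
    means     : mean concentration in each stage (hypothetical input)
    variances : concentration variance within each stage (hypothetical input)
    """
    total_weight = sum(weights)
    w = [x / total_weight for x in weights]  # normalise the stage weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Law of total variance: Var = Var(stage means) + E[within-stage variance]
    between = sum(wi * (mi - mean) ** 2 for wi, mi in zip(w, means))
    within = sum(wi * vi for wi, vi in zip(w, variances))
    phi_ext = between / mean ** 2
    phi_int = within / mean ** 2
    gamma = phi_ext / (phi_ext + phi_int)
    return phi_ext, phi_int, gamma


# Two equally likely stages whose mean concentrations differ slightly.
phi_ext, phi_int, gamma = decompose_noise([0.5, 0.5], [1.0, 1.2], [0.1, 0.1])

# Identical stage means: no cell cycle contribution, so gamma is zero.
_, _, gamma_flat = decompose_noise([0.5, 0.5], [1.0, 1.0], [0.1, 0.1])
```

In this toy setting γ lies between 0 and 1 and shrinks as the within-stage variance grows, which is the behaviour the referee asked the metric to capture.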

      1. Similarly, when inspecting the peak in the power spectrum, the weight of the Lorentzian function(s) creating the peak should be compared to the stationary component (λ_N, u_N in the authors' notation).

      Response: We do not fully understand why the weights u_k of the Lorentzian functions should be compared to the stationary component u_N. In fact, all the weights u_k except u_N are complex numbers, and the meaning of u_k/u_N is unclear to us. However, in the revised manuscript, we emphasized that the power spectrum G(ξ) is normalized so that G(0) = 1 throughout the paper (see page 13). To better understand concentration oscillations and their relation to homeostasis, we depicted both γ and H as functions of B and ⟨n⟩ (see Supplementary Fig. S5). As expected, the off-zero peak becomes lower as B increases and as ⟨n⟩ decreases, since both changes increase concentration fluctuations, which counteract the regularity of oscillations; noise above a certain threshold can even completely destroy oscillations. Furthermore, we found that γ and H have a similar dependence on B and ⟨n⟩. This again shows that the occurrence of concentration oscillations is intimately related to the visibility of cell cycle effects in concentration fluctuations.
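
To make the notion of an off-zero peak in a normalized spectrum concrete, here is a toy sketch. It is not the spectrum derived in the manuscript, whose Lorentzian weights are complex: it assumes a single damped oscillation, whose spectrum is a sum of two real Lorentzians centred at ±ω0, and normalizes it so that G(0) = 1. The names `spectrum`, `peak`, `damping` and `omega0` are illustrative.

```python
def spectrum(xi, damping, omega0):
    """Sum of two Lorentzians centred at +omega0 and -omega0, normalised so G(0) = 1."""
    def lorentz(x):
        return damping / (damping ** 2 + x ** 2)
    raw = lorentz(xi - omega0) + lorentz(xi + omega0)
    return raw / (2.0 * lorentz(omega0))  # the raw spectrum at xi = 0 equals 2 * lorentz(omega0)


def peak(damping, omega0, xi_max=5.0, steps=5001):
    """Locate the maximum of the spectrum on a uniform grid over [0, xi_max]."""
    grid = [xi_max * i / (steps - 1) for i in range(steps)]
    values = [spectrum(x, damping, omega0) for x in grid]
    best = max(range(steps), key=values.__getitem__)
    return grid[best], values[best]


# Weak damping: a pronounced peak close to the oscillation frequency.
loc_weak, height_weak = peak(damping=0.2, omega0=1.0)

# Strong damping (noise dominates): the maximum sits at zero frequency.
loc_strong, height_strong = peak(damping=5.0, omega0=1.0)
```

For this toy form the off-zero peak exists only when ω0 exceeds damping/√3, which mirrors the statement that sufficiently strong noise destroys the oscillation.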

      A complementary analysis including these two points, together with a discussion of the relative contribution of cell-cycle effects and bursting dynamics to the total variability/fluctuation of concentrations, would be important to include.

      Response: In the revised manuscript, we made some complementary analysis and discussion about the relative contribution of cell cycle effects and stochastic birth-death dynamics in the total variability of concentrations (see pages 11-14).

      Minor comments

      3. The dashed line in Fig. 3a is defined as κ = (√2)^(1−β). First, is this empirical or does it come from a derivation? Second, it seems incomplete since it should depend on w. Intuitively, this line should correspond to the value of κ that would best mimic balanced biosynthesis in the case where β ≠ 1. In other words, κ should be such that ⟨ρB0/V(t)⟩_prereplication = ⟨κρB0/V(t)⟩_postreplication, which yields κ = 2^(w(1−β)) · [(1 − w)/w] · [2^(w(β−1)) − 1]/[2^((1−w)(β−1)) − 1]. This indeed simplifies to κ = (√2)^(1−β) when w = 0.5.

      Response: Thank you for providing such a beautiful derivation. In the revised manuscript, we added this derivation into the main text (see pages 12-13). Moreover, we also made it clear that this relation can also be obtained from the perspective of power spectrum (see page 14).
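
The balance condition in this comment can be checked numerically. The sketch below makes several assumptions that are not stated in this exchange: deterministic exponential growth V(t) = 2^t over one cell cycle of unit length, replication at time w, and a production rate per unit volume proportional to V(t)^(β−1), following from a mean burst size that scales as V(t)^β. The closed form is written with the factor (1 − w)/w, which is what the balance condition gives under these assumptions and what reduces to (√2)^(1−β) at w = 0.5. All function names are illustrative.

```python
import math


def mean_rate(beta, t0, t1, steps=20000):
    """Time average of V(t)**(beta - 1) on [t0, t1] with V(t) = 2**t (midpoint rule)."""
    h = (t1 - t0) / steps
    total = sum(2.0 ** ((beta - 1.0) * (t0 + (i + 0.5) * h)) for i in range(steps))
    return total * h / (t1 - t0)


def kappa_numeric(beta, w):
    """kappa that equalises the pre- and post-replication average production rates."""
    return mean_rate(beta, 0.0, w) / mean_rate(beta, w, 1.0)


def kappa_closed_form(beta, w):
    """Closed form of the same ratio; requires beta != 1."""
    a = beta - 1.0
    return (2.0 ** (-w * a) * (1.0 - w) / w
            * (2.0 ** (w * a) - 1.0) / (2.0 ** ((1.0 - w) * a) - 1.0))


k_num = kappa_numeric(0.5, 0.3)
k_cf = kappa_closed_form(0.5, 0.3)
k_half = kappa_closed_form(0.0, 0.5)  # beta = 0, w = 0.5 should give sqrt(2)
```

The numerical and closed-form values agree for generic w, and the w = 0.5 case recovers the dashed line quoted by the referee.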

      1. η is used in the caption of Fig. 2, which is cited on page 4. But it is defined only 2 sections later, on page 6.

      Response: In the revised manuscript, we gave the definition of η in both Table 1 and the caption of Fig. 3 (Fig. 2 in the old version). Please see page 4, Table 1 and page 9, Fig. 3.

      1. w is used in the main text, but only defined in the caption of Fig. 3.

      Response: In the revised manuscript, we gave the definition of w in both Table 1 (see page 4) and on page 7.

      1. w is defined as “the proportion of cell cycle before replication”. Is this in terms of cell cycle stages (i.e. w = N0/N) or actual time?

      Response: In the revised manuscript, we made it clear that w represents the proportion of cell cycle duration before replication, which should be distinguished from the proportion N0/N of cell cycle stages before replication (see page 7). This is because the transition rate between cell cycle stages is an increasing function of cell size, which means that earlier (later) stages have longer (shorter) durations.
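
The distinction between w and N0/N can be illustrated with a deterministic toy calculation. This is a sketch under simplifying assumptions that go beyond the text: the mean volume increases by the same increment in every stage (an adder-like rule in both phases), the volume doubles over the N stages, growth is exponential at unit rate, and all stochasticity is ignored. The function name is illustrative.

```python
import math


def time_fraction_before_replication(n_stages, n0):
    """Fraction of the cycle duration spent in stages 1..n0 under exponential growth."""
    v1 = 1.0
    step = v1 / n_stages  # equal volume increments; volume doubles after n_stages stages
    volumes = [v1 + k * step for k in range(n_stages + 1)]
    # Time needed to grow from volumes[k] to volumes[k + 1] at unit exponential growth rate.
    durations = [math.log(volumes[k + 1] / volumes[k]) for k in range(n_stages)]
    return sum(durations[:n0]) / sum(durations), durations


# Replication after half of the stages (N0/N = 0.5).
w, durations = time_fraction_before_replication(n_stages=30, n0=15)
```

Under these assumptions w = log2(1 + N0/N), so replication after half of the stages corresponds to more than half of the cycle duration, because early stages last longer than late ones.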

      1. Fig. 3 indicates that power spectra are normalized so that G(0) = 1, but G(0) = 10 on the first two graphs.

      Response: Corrected as suggested (see page 12, Fig. 4). Thank you.

      1. Page 11: “bimodality in the concentration distribution is significantly less apparent”. I would suggest rephrasing “bimodality in the concentration distribution is absent” since there should be no reference to “significance” and bimodality is either present or absent (binary), not less apparent.

      Response: Corrected as suggested. Thank you.

      Referees cross-commenting

      1. Regarding the comment from reviewer 3 that “a direct validity test should use data sets of at least two types (total, nascent RNA, etc)”. I almost made a related comment in my review, but then I held it off: the issue with using nascent RNA data is that their model does not allow an ON state. They assume that gene products are produced in instantaneous bursts, which is a fair assumption if the lifetime of gene products is large compared to the time the gene stays ON. This is ok if the considered “gene products” are mRNA or proteins, but not nascent RNAs (for which the lifetime is the time to transcribe the gene). I did not make this comment in the end because I think the model is useful regardless. To comply with reviewer 3’s request, maybe the authors could use distributions of mRNA and protein products, but I’m not sure that such data exist (since they need cell-cycle-resolved data).

      Response: It is not possible to validate our model with nascent mRNA data because the model in its present form cannot predict nascent mRNA fluctuations. This is because unlike mature mRNA, nascent mRNA cannot be assumed to decay via first-order kinetics. A detailed response is provided below to the original comment made by Referee 3. Regarding the comment on the use of cell-cycle-resolved data measuring mRNA and protein expression – while we agree it would make an excellent test of our model, we could not find such a dataset in the literature. We point out that our model, in its present form, is interesting as it is, as a detailed biological model of mature mRNA and protein number / concentration fluctuations in growing cells. Its predictions are yet to be fully confirmed and hence may stimulate the development of further experimental single-cell studies.

      Significance

      1. The advance of this paper is essentially technical. The authors present a model that incorporates and unifies previously studied effects (cell volume homeostasis, concentration homeostasis, bursting transcription). There is no major conceptual novelty, but the combination of these different aspects and the derivations that the authors present are very valuable and might be applicable to interpret data in various species.

      Response: Thank you for acknowledging the results of the manuscript.

      Referee #3

      Evidence, reproducibility and clarity

      1. The manuscript analyses a phenomenological model of stochastic gene expression. The model couples bursty transcription with cell growth, division and DNA replication. The cell cycle is divided into a large number of stages whose exponential lifetimes depend on the cell volume. It is argued that concentrations of gene products are distributed according to mixed Gamma distributions, whereas the copy numbers follow mixed negative binomial distributions. The number of modes can be different for concentrations and copy numbers, for instance the copy numbers can be unimodal while concentrations are bimodal. The case when the mean concentration does not depend on the cell cycle stage is called perfect homeostasis. It is argued that perfect homeostasis leads to Gamma distribution of the gene product concentration and that deviations from a Gamma distributions result mainly from deviations of the concentration from perfect homeostasis. It is also proposed that concentration homeostasis is difficult to obtain. These qualitative predictions of the model are tested using two data sets, one for E.coli and another for fission yeast.

      Response: Thank you for acknowledging the results of the manuscript.

      Major comments

      1. A huge number of states called “cell cycle stages” have exponential lifetimes. In my opinion, this sequence of stages is just a technicality for keeping the model within a discrete Markovian framework. More natural choices are possible, such as piecewise deterministic Markov processes, age-structured diffusions, etc. The biological significance (if there is any) of such states should be explained.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as the activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it is believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power-law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. The timescales of stochastic gene expression are not correctly taken into account. It is considered that during an exponential stage the bursting approximation describes gene expression in terms of Gamma distributions for concentrations and in terms of negative binomial distributions for copy numbers. This approximation is only valid if the lifetime of a stage is much larger than the time needed to generate a burst. For RNA, this condition cannot be fulfilled for a large number of states N and/or for two-state promoters with a relatively long ON state. For the protein and/or in the case of translational bursting, the condition is even more difficult to fulfil. I agree with Reviewer 2 that once the master equation is accepted the results make sense. But my criticism is different and concerns the master equation itself. In this equation the burst is considered instantaneous, whereas it needs finite time in reality. Concerning nascent mRNA, ON/OFF etc. I disagree. The notion of instantaneous burst with well defined burst size and burst frequency on a stage has a meaning if the lifetime of this stage (which is not mRNA or protein lifetime) is short. The model validity should be clearly stated.

      Response: Thank you for pointing out this important issue. When discussing the validity of the model, we should focus on the full model rather than the mean-field model, because once the full model is justified, the mean-field model works well when N ≥ 15, as we have shown in Fig. 3 and Supplementary Fig. S3. Hence our reply concerns the validity of the full model, and we address the above comments in three parts. First, we agree with the referee that our model assumes that the gene product is produced in instantaneous bursts, with the reaction scheme

          G → G + kM (rate ρ p^k (1−p), k ≥ 1),   M → ∅ (rate d),   (1)

      where the mean burst size scales as V(t)^β. Of course, in reality the bursts take a finite time to occur. A more general assumption is that within each cell cycle, the gene expression dynamics is characterized by the following three-stage model:

          G → G* (rate ρ),   G* → G (rate r),   G* → G* + M (rate s V(t)^β),   M → M + P (rate u),   M → ∅ (rate v),   P → ∅ (rate d),   (2)

      where the first two reactions describe the switching of the gene between an inactive state G and an active state G*, the middle two reactions describe transcription and translation, and the last two reactions describe the degradation of the mRNA M and the protein P. Here the synthesis rate of mRNA depends on cell volume via a power-law form with power β ∈ [0, 1]. Dosage compensation can be modeled by a decrease in the gene activation rate (for each gene copy) from ρ to κρ/2 upon replication. Previous studies have revealed that the bursting of mRNA and protein has different biophysical origins: transcriptional bursting is due to a gene that is mostly inactive but transcribes a large number of mRNA molecules when it is active (r ≫ ρ and s/r is finite), whereas translational bursting is due to rapid synthesis of protein from a single short-lived mRNA molecule (v ≫ d and u/v is finite).
Under the above timescale separation assumptions, both mRNA and protein are produced in a bursty manner with the reaction scheme described by Eq. (1). The burst frequencies for mRNA and protein are both ρ before replication and κρ after replication. The mean burst size for mRNA is (s/r)V(t)^β and the mean burst size for protein is (su/rv)V(t)^β, both of which have a power-law dependence on cell volume (see pages 5-6). In Supplementary Figs. S1 and S2, we compare the mRNA and protein distributions for the bursty model with the reaction scheme given by Eq. (1) and the three-stage model with the reaction scheme given by Eq. (2), where both models include a cell cycle and cell volume description. The distributions for the two models are very close to each other under the above timescale separation assumptions, with the bursty model becoming more accurate as r/ρ and v/d increase. Moreover, we find that the accuracy of the bursty model is insensitive to the number of stages N. Here the values of N are chosen so that the ratio of the average time spent in each stage (T/N, where T ≈ (log 2)/g is the mean cell cycle duration) to the mean burst duration time (1/ρ) ranges from about 0.5 to 2. This shows that the effectiveness of the bursty model does not require the lifetime of a cell cycle stage to be sufficiently long. Due to mathematical complexity, we only focus on the bursty model in the present paper. The consistency between the gene product distributions for the two models justifies our bursty assumption. Second, while we assume bursty expression here, our model naturally covers non-bursty expression since the latter can be regarded as a limit of the former. Hence all the conclusions in the present paper are applicable to both bursty and non-bursty expression. In the revised manuscript, we emphasized this point (see page 4 for a detailed explanation). 
Last but not least, if the lifetime of the gene product is much shorter than the lifetime of each cell cycle stage, then the gene expression dynamics will rapidly relax to a quasi-steady state for each stage. In this case, the gene product fluctuations at each stage can be characterized by a gamma distribution in terms of concentrations and by a negative binomial distribution in terms of copy numbers, and hence the distribution of concentrations (copy numbers) for a population of cells is naturally a mixture of N gamma (negative binomial) distributions. However, the strength of our analytical distribution (see page 10, Eq. (8)) is that it serves as an accurate approximation when N ≫ 1 without requiring any timescale assumptions. The effectiveness of our analytical distributions is validated in Supplementary Fig. S3 for three different cases: (i) the degradation rate d of the gene product is much smaller than the cell cycle frequency f; (ii) d and f are comparable; (iii) d is much larger than f. In the revised manuscript, we also emphasized these points (see page 10).
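
The quasi-steady-state picture described in this response, a population distribution built as a mixture of per-stage negative binomials, can be sketched as follows. This is an illustration of the limiting case only, not the analytical distribution of Eq. (8). It relies on the standard result that a bursty birth-death process with burst frequency ρ, degradation rate d and mean burst size B has a negative binomial steady state with shape ρ/d and mean (ρ/d)B; the stage weights, shapes and burst sizes below are hypothetical numbers.

```python
import math


def neg_binomial_pmf(n, shape, mean_burst):
    """Negative binomial pmf with mean shape * mean_burst, evaluated in log space."""
    q = mean_burst / (1.0 + mean_burst)
    log_p = (math.lgamma(n + shape) - math.lgamma(n + 1) - math.lgamma(shape)
             + shape * math.log(1.0 - q) + n * math.log(q))
    return math.exp(log_p)


def mixture_pmf(n, weights, shapes, bursts):
    """Copy number distribution of a population as a weighted mixture over stages."""
    return sum(w * neg_binomial_pmf(n, a, b)
               for w, a, b in zip(weights, shapes, bursts))


# Hypothetical two-stage example: (weight, shape = rho/d, mean burst size) per stage.
weights = [0.6, 0.4]
shapes = [2.0, 4.0]
bursts = [3.0, 5.0]

pmf = [mixture_pmf(n, weights, shapes, bursts) for n in range(2000)]
total = sum(pmf)
mean = sum(n * p for n, p in enumerate(pmf))
```

With these numbers the mixture mean is 0.6 × 6 + 0.4 × 20 = 11.6, and the truncated sum is normalized to within numerical precision.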

      1. DNA replication is a stochastic event and does not occur after a fixed number of exponential stages as it is considered in this model. Concerning replication: in the model this occurs after exactly N0 steps. In reality, replication occurs somewhere between the start of S and G2/M. N0 is in fact a random variable. Probably a new mean field assumption is needed here with some justification, but I have seen nothing in the paper.

      Response: We agree with the referee that replication of the whole genome occurs in the S phase, which occupies a considerable portion of the cell cycle and thus cannot be assumed to occur after a fixed number of exponential stages. However, our model is for a single gene, and since the replication time of a particular gene is much shorter than the total duration of the S phase, it is reasonable to consider it to be instantaneous. In addition, recent experiments have shown that the time elapsed from birth to replication for a particular gene occupies an approximately fixed proportion of the cell cycle, which is called the stretched cell cycle model. This is also consistent with our assumption that replication of the gene of interest occurs after exactly N0 stages. Although replication occurs after a fixed number of stages, the time of replication is nevertheless stochastic since each stage has a random lifetime. In the revised manuscript, we emphasized these points (see pages 4-5).
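
The last point, that a fixed number of stages still yields a random replication time, can be illustrated with a short simulation. The sketch assumes, for simplicity only, that all N0 stages share the same exit rate; in the model discussed here the rates depend on cell volume, so this is a simplification and the numbers are hypothetical.

```python
import random
import statistics


def replication_time(n0, rate, rng):
    """Time to pass through n0 stages, each with an exponential lifetime of the given rate."""
    return sum(rng.expovariate(rate) for _ in range(n0))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n0, rate = 20, 20.0     # hypothetical values: mean replication time n0 / rate = 1
samples = [replication_time(n0, rate, rng) for _ in range(20000)]

mean_time = statistics.fmean(samples)
cv = statistics.pstdev(samples) / mean_time  # coefficient of variation of the replication time
```

With equal rates the replication time is Erlang distributed, so its coefficient of variation is 1/√N0: the timing is stochastic, and it becomes more regular as the number of stages grows.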

      1. The results in the Methods were derived heuristically and their relation to the master equation (12) is not explicit (except for the part concerning moments and their power spectrum). Furthermore, one would like to have some estimates of the biases introduced by the mean field approximation. Concerning biases introduced by the mean field approximation: Figure 2 is a numerical simulation, some analytical estimates could be better. As Figure 2 looks rather convincing, I reclassify this as minor comment.

      Response: We agree with the referee that the derivation of moments is rigorous, but the derivation of the analytical distribution given in Methods is not rigorous and cannot be directly obtained from the master equation. In the revised manuscript, we emphasized that the analytical distribution is not exact but serves as a very good approximation (see pages 10 and 22). We showed that the analytical distribution agrees well with stochastic simulations when the number of cell cycle stages N ≥ 15 (see page 9, Fig. 3 and Supplementary Fig. S3). The logic behind our approximate distribution is that while the gene product may have a complex distribution of concentrations (copy numbers), when the number of cell cycle stages is large, the distribution must be relatively simple within each stage and thus can be well approximated by a simple gamma (negative binomial) distribution (see page 22). Due to the complexity of our model, it is very difficult to provide any analytical estimates of the bias introduced by the mean-field approximation. Often the bias of an approximation can be estimated when the approximation emerges from a systematic method such as van Kampen’s system-size expansion (see Ref. [21]). However, our mean-field model cannot be seen as the zero-order term of some expansion, and hence it is not possible to calculate the next-order correction that would be needed to estimate the error. Nevertheless, we have tested very large swathes of parameter space and found that the mean-field approximation always works well when N ≥ 15, which is the physiologically relevant regime for most types of cells (see the discussion on page 7).

      1. The model is not minimal and depends on a huge number of parameters. It is not clear how these parameters were found and whether overfitting was avoided. One may have doubts about the identifiability of the parameter N. What is the difference between N = 59 and N = 60 (the value of N for the cyanobacterium)?

      Response: In the revised manuscript, we used synthetic data to show that all the parameters of our model (except d and β, which can be determined based on a priori knowledge) can be accurately estimated from cell-cycle-resolved lineage data of cell volume and gene expression (see Supplementary Note 7). We provided details of the parameter inference method, compared the input parameters with the estimated ones and verified that they are identifiable (see Supplementary Table 1). We did not use real data to test our inference method because we could not find cell-cycle-resolved lineage data for mRNA or proteins. As we noted, this is in principle possible via cell-cycle fluorescent markers. We also note that parameter inference for less detailed but similar models has been performed in our previous papers: the parameters related to cell volume dynamics have been inferred in E. coli (see Ref. [51]) and fission yeast (see Ref. [52]) using the method of distribution matching, and the parameters related to gene expression dynamics have been estimated in E. coli (see Ref. [40]) using the method of power spectrum matching. Moreover, for our purpose, i.e. to investigate the effect of cell cycle and cell volume on gene expression, we do believe that our model is minimal. 
We captured cell growth with only one parameter g, the degree of balanced biosynthesis with one parameter β (β = 0 corresponds to the case where the synthesis rate is independent of cell volume and β = 1 corresponds to the case where the synthesis rate scales linearly with cell volume), the variability in cell cycle duration with only one parameter N, gene replication with only one parameter N0, gene dosage compensation with only one parameter κ (κ = 1 corresponds to perfect dosage compensation and κ = 2 corresponds to no dosage compensation), and the variation of size control strategy across the cell cycle with two parameters α0 and α1 (αi → 0 corresponds to timer, αi = 1 corresponds to adder, and αi → ∞ corresponds to sizer). The biological meaning of the cell cycle stages was clarified in the revised manuscript (see page 4). For our purpose, we believe that our model cannot be simpler.

      1. The authors should make clear which cell biology aspects are important, which are less important, and which were neglected in the context of their problem. Thus, in their model, cell cycle acts on gene expression mainly by duplication of burst sources and thus by increase of burst frequency after replication. Another important source of gene expression variability during the cycle, the mitotic transcription repression, is neglected.

      Response: In the revised manuscript, we clarified which cell biology aspects are important for gene expression dynamics (see page 17). Specifically, in our model, cell cycle and cell volume act on gene expression mainly by (i) the dependence of the burst size on cell volume; (ii) the increase in the burst frequency upon replication; (iii) the change in size control strategy upon replication; (iv) the partitioning of molecules at division. Point (iv) strongly affects copy number fluctuations, while it has little influence on concentration fluctuations. In addition, in the revised manuscript, we also elucidated the limitations of our model including mitotic transcription repression and others (see pages 19-20).

      1. The validity test of the model is indirect. It was tested that the concentration distribution deviates from Gamma and that the deviation correlates positively to the lack of accuracy of the concentration homeostasis. However, many models can have this behaviour. A direct validity test should use data sets of at least two types (total, nascent RNA, etc.) allowing direct estimates of some model parameters (such as burst size and frequency using nascent RNA). Concerning parsimony, I think that the authors should test it. Are all the parameters identifiable? Is there any overfitting? They could use parameter uncertainty, comparison of training /testing errors, etc. Some details about the parameter fitting method should be provided.

      Response: Regarding the parameter fitting and identifiability we have provided a detailed response to a previous comment above. However we emphasize that for the generation of Fig. 7, we did not need to estimate all model parameters from data. Hence in the previous version of the manuscript, no such estimation was done — we simply extracted the homeostasis accuracy γ, the height H of the off-zero peak of the power spectrum, and the Hellinger distance D of the concentration distribution from its gamma approximation directly from data. Finally, we point out that our model can be used to predict the dynamics of mature mRNAs, but it cannot be used to describe the dynamics of nascent mRNAs. This is because nascent mRNAs do not decay via a first-order reaction but their removal, i.e. their detachment from the gene which leads to mature mRNA, is better approximated by a reaction with a fixed decay time. This models the elongation time of nascent transcripts which does not suffer from much noise because the RNAP velocity is to a good approximation constant along the gene. See e.g. the following two papers for details: H. Xu, S. O. Skinner, A. M. Sokac, I. Golding, Stochastic kinetics of nascent RNA. Phys. Rev. Lett. 117, 128101 (2016). S. Braichenko, J. Holehouse, R. Grima. Distinguishing between models of mammalian gene expression: telegraph-like models versus mechanistic models. J. R. Soc. Interface 18, 20210510 (2021). Because of the fixed delay, the delay telegraph model (the telegraph model with a delayed degradation reaction) is non-Markovian and very different from the usual Markovian telegraph model which describes the dynamics of mature mRNA within each cell cycle. See e.g. the Supplementary Information of the following paper: X. Fu, et al. Accurate inference of stochastic gene expression from nascent transcript heterogeneity. bioRxiv (2021). 
Given the mathematical complexity introduced by a fixed delay, using it to describe the dynamics of nascent mRNA within each cell cycle leads to a non-Markovian model that is even more analytically intractable than the present one for mature mRNA. While an interesting research question, this is clearly far removed from the scope of our current manuscript.
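
The Hellinger distance D mentioned in this response can be computed from binned data as sketched below. How D was actually evaluated is not specified in this exchange, so the binned (discrete) form, the function name and the example histograms are assumptions made for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two histograms defined on the same bins.

    Both inputs are normalised internally, so raw bin counts are accepted.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    sum_p, sum_q = sum(p), sum(q)
    # Bhattacharyya coefficient: overlap between the two normalised histograms.
    overlap = sum(math.sqrt((a / sum_p) * (b / sum_q)) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - overlap))


# Hypothetical bin counts: a measured distribution and its gamma approximation.
measured = [5, 20, 40, 25, 10]
gamma_fit = [8, 22, 35, 25, 10]
d_close = hellinger(measured, gamma_fit)

# Limiting cases: identical histograms and histograms with disjoint support.
d_same = hellinger(measured, measured)
d_disjoint = hellinger([1, 0], [0, 1])
```

The distance is 0 for identical distributions and 1 for distributions with no overlap, so larger values indicate a stronger deviation from the gamma approximation.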

      Minor comments

      8. The introduction could be more pedagogical. Right now it is just an accumulation of loosely related and sometimes abruptly introduced statements. For instance, we understand that the authors want to contrast their approach with other extant approaches. However, extant approaches should be better reviewed; some of them are age structured and perfectly suited for analysing cell cycle data. It would be useful for the reader if an example of an observation explained by their model and not explained by other models (age structured or not) were discussed in detail. The model of this work does not explain size control, it just assumes that this holds, and does not discuss cell population aspects. A more nuanced positioning of this approach with respect to the literature would be useful for judging its value.

      Response: In the revised manuscript, we rewrote the introduction part to make it more pedagogical (see pages 1-2). In particular, we compared three popular models describing the cell size dynamics and the associated size homeostasis. The advantages and disadvantages of the three models were discussed.

      1. The meaning of N should be discussed from the very start when the model is introduced.

      Response: In the revised manuscript, we explained in detail the biological meaning of the effective cell cycle stages (see page 4). Specifically, recent studies have revealed that in many cell types, the accumulation of some activator to a critical threshold is used to promote mitotic entry and trigger cell division, a strategy known as the activator accumulation mechanism. In E. coli, the activator was shown to be FtsZ; in fission yeast, it is believed to be a protein upstream of Cdk1, the central mitotic regulator, such as Cdr2, Cdc25, and Cdc13. Biophysically, the N effective stages can be understood as different levels of the key activator. Moreover, we pointed out that the power-law form for the rate of cell cycle progression may come from cooperativity of the key activator that triggers cell division.

      1. The authors call constitutive expression the situation when the mean copy number does not depend on the volume. This choice should be clarified, as in general constitutive expression, as opposed to specific, localised or transitory expression, refers to non-regulated gene expression. It seems to me that in this context, expression is only partially constitutive (independent of the volume).

      Response: In the present paper, constitutive expression means that the gene product is produced one at a time and is not produced in a bursty manner. It does not mean that the mean copy number does not depend on the volume. In the revised manuscript, we provided a more detailed discussion about how constitutive expression can be viewed as a limit of bursty expression (see page 4).

      1. In figure 1b and for exponential growth the y axis should be log(volume) instead of volume. The mean field approximation is called both “of novel type” (Discussion) and “which has a long history of successful use in statistical physics” (p4). If something is novel, then one should clearly explain why.

      Response: In fact, the y-axis in Fig. 1(b) should be volume instead of log(volume). This is because the x-axis represents the cell cycle stage instead of the real time. Note that for the adder strategy (α0 = α1 = 1), it follows from Eq. (3) on page 7 that the mean cell volume at stage k is vk = v1 + (k − 1)M0/N0, which depends linearly on k. This explains why the red curves in Fig. 1(b) are straight lines instead of exponential curves. In the revised manuscript, we also explained why the mean-field approximation used is novel (see page 7). Specifically, we pointed out that the mean-field approximation is not made for the whole cell cycle; rather, we make the approximation for each stage, and thus different stages have different mean cell volumes. This type of piecewise mean-field approximation, as far as we know, is novel and has not been used in the study of concentration fluctuations before.
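      The linear dependence on stage quoted in the response above can be illustrated with a minimal sketch. The numerical values of v1, M0 and N0 below are hypothetical placeholders, not values from the manuscript, and the function name is an illustrative choice; the formula itself is the one quoted from Eq. (3), assumed here to apply to stages k = 1, ..., N0.

```python
def mean_volume_adder(k, v1, M0, N0):
    """Mean cell volume at stage k (1-indexed) under the adder strategy,
    using v_k = v_1 + (k - 1) * M0 / N0 as quoted in the response."""
    return v1 + (k - 1) * M0 / N0


# Hypothetical illustrative values (assumptions, not taken from the manuscript).
v1, M0, N0 = 1.0, 0.5, 10

volumes = [mean_volume_adder(k, v1, M0, N0) for k in range(1, N0 + 1)]

# Successive stages differ by the same amount M0 / N0, so a plot of
# volume against stage index is a straight line rather than an exponential.
increments = [b - a for a, b in zip(volumes, volumes[1:])]
print(volumes)
print(increments)
```

      Because the increment per stage is constant, plotting volume (not log volume) against the stage index gives a straight line, which is the point the response makes about Fig. 1(b).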

      1. The word “cyclo-stationarity” is used without much definition. If this just means a stationary distribution of the gene products, why not simply use “stationarity” instead? What does “cyclo” mean? A number of properties were called “rare” but it is not clear on what grounds.

      Response: In the revised manuscript, we removed the term “cyclo-stationarity” and simply assumed that the copy number and concentration distributions of the gene product at each cell cycle stage have reached the steady state (see page 8). In addition, for each property that was called “rare”, we explained the reasons in detail (see pages 14 and 17).

      1. I did not find a proof that the copy number distribution has fewer modes than the concentration distribution.

      Response: In fact, it is very difficult to prove that the concentration distribution has fewer modes than the copy number distribution. However, we have tested very large swathes of parameter space and found that the number of modes of the concentration distribution is always less than or equal to that of the copy number distribution. In the revised manuscript, we emphasized this point (see page 16).

      Significance

      1. The strength of this work is that it incorporates in a stochastic gene expression model a number of ideas on size control and dosage compensation that were discussed elsewhere from a cell population point of view. However, the proposed model is based on a number of artificial choices that are difficult to justify biologically: a huge number of discrete cell cycle states and inappropriate handling of the timescales characterizing stochastic gene expression. Furthermore, the model is not minimal but depends instead on a huge number of parameters. I found the paper difficult to read, and the presentation of the results is not suitable for biologists, who would need more details on the justification of the modelling choices and on the experimental validation of the model.

      Response: All these points have been addressed in previous replies.

      1. For mathematicians, the calculations are rather standard and may seem trivial.

      Response: Our model is complex due to the coupling between gene expression dynamics, cell volume dynamics, and cell cycle events. It is far more complex than standard models of gene expression (see e.g. Refs. [2,84,85]) because of the large amount of biology encapsulated in it and we presented a first analytical- and simulation-based analysis of concentration fluctuations when concentration homeostasis is broken.

      The computations of many quantities in the present paper are non-trivial. First, we showed that the generalized added volumes before and after replication both have an Erlang distribution. Using this property, we computed the mean cell volume in each cell cycle stage, which is needed in the mean-field approximation. Furthermore, the computations of the power spectrum of concentration fluctuations are also highly non-trivial. The analytical expression of the power spectrum allows us to precisely determine the onset of concentration homeostasis. While the computations of moments of concentration fluctuations are standard, we used the moments to construct an analytical concentration distribution which serves as an accurate approximation when N is large. Our concentration distribution is generally valid when concentration homeostasis is broken and goes far beyond recent models for growing cells which require concentration homeostasis and which do not take into account DNA replication, dosage compensation and size control mechanisms that vary with the cell cycle phase (e.g. Ref. [26]).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript analyses a phenomenological model of stochastic gene expression. The model couples bursty transcription with cell growth, division and DNA replication. The cell cycle is divided into a large number of stages whose exponential lifetimes depend on the cell volume. It is argued that concentrations of gene products are distributed according to mixed Gamma distributions, whereas the copy numbers follow mixed negative binomial distributions. The number of modes can be different for concentrations and copy numbers; for instance, the copy numbers can be unimodal while concentrations are bimodal. The case when the mean concentration does not depend on the cell cycle stage is called perfect homeostasis. It is argued that perfect homeostasis leads to a Gamma distribution of the gene product concentration and that deviations from a Gamma distribution result mainly from deviations of the concentration from perfect homeostasis. It is also proposed that concentration homeostasis is difficult to obtain. These qualitative predictions of the model are tested using two datasets, one for E. coli and another for fission yeast.

      Major comments:

      The model encompasses a number of artificial choices:

      • A huge number of states called "cell cycle stages" have exponential lifetimes. In my opinion, this sequence of stages is just a technicality for keeping the model within a discrete Markovian framework. More natural choices are possible, such as piecewise deterministic Markov processes, age-structured diffusions, etc. The biological significance (if there is any) of such states should be explained.
      • The timescales of stochastic gene expression are not correctly taken into account. It is considered that during an exponential stage the bursting approximation describes gene expression in terms of Gamma distributions for concentrations and in terms of negative binomial distributions for copy numbers. This approximation is only valid if the lifetime of a stage is much larger than the time needed to generate a burst. For RNA, this condition cannot be fulfilled for a large number of states N and/or for two-state promoters with a relatively long ON state. For the protein and/or in the case of translational bursting, the condition is even more difficult to fulfil.
      • DNA replication is a stochastic event and does not occur after a fixed number of exponential stages as is assumed in this model.

      The results in the Methods were derived heuristically and their relation to the master equation (12) is not explicit (except for the part concerning moments and their power spectrum). Furthermore, one would like to have some estimates of the biases introduced by the mean field approximation.

      The model is not minimal and depends on a huge number of parameters. It is not clear how these parameters were found and whether overfitting was avoided. One may have doubts about the identifiability of the parameter N. What is the difference between N=59 and N=60 (the value of N for the cyanobacterium)?

      The authors should make clear which cell biology aspects are important, which are less important, and which were neglected in the context of their problem. Thus, in their model, the cell cycle acts on gene expression mainly by duplication of burst sources and thus by an increase of burst frequency after replication. Another important source of gene expression variability during the cycle, the mitotic transcription repression, is neglected.

      The validity test of the model is indirect. It was tested that the concentration distribution deviates from Gamma and that the deviation correlates positively with the lack of accuracy of the concentration homeostasis. However, many models can have this behaviour. A direct validity test should use datasets of at least two types (total, nascent RNA, etc.) allowing direct estimates of some model parameters (such as burst size and frequency using nascent RNA).

      Minor comments:

      The introduction could be more pedagogical. Right now it is just an accumulation of loosely related and sometimes abruptly introduced statements. For instance, we understand that the authors want to oppose their approach to other extant approaches. However, extant approaches should be better reviewed; some of them are age-structured and perfectly suited for analysing cell cycle data. It would be useful for the reader if an example of an observation explained by their model and not explained by other models (age-structured or not) were discussed in detail. The model of this work does not explain size control, it just assumes that this holds, and does not discuss cell population aspects. A more nuanced positioning of this approach with respect to the literature would be useful for judging its value.

      The meaning of N should be discussed from the very start when the model is introduced.

      The authors call constitutive expression the situation when the mean copy number does not depend on the volume. This choice should be clarified, as in general constitutive, as opposed to specific, localised or transitory expression, refers to non-regulated gene expression. It seems to me that in this context, expression is only partially constitutive (independent of the volume).

      In figure 1b and for exponential growth the y axis should be log(volume) instead of volume.

      The mean field approximation is called both "of novel type" (Discussion) and "which has a long history of successful use in statistical physics" (p4). If something is novel, then one should clearly explain why.

      The word "cyclo-stationarity" is used without much definition. If this just means a stationary distribution of the gene products, why not simply use "stationarity" instead? What does "cyclo" mean?

      A number of properties were called "rare" but it is not clear on what grounds.

      I did not find a proof that the copy number distribution has fewer modes than the concentration distribution.

      Referees cross-commenting

      Part 1

      I agree with Reviewer 2 that once the master equation is accepted the results make sense. But my criticism is different and concerns the master equation itself. In this equation the burst is considered instantaneous, whereas in reality it takes a finite time.

      Part 2 (response to Part 2 of Rev2)

      • concerning replication: in the model this occurs after exactly N_0 steps. In reality, replication occurs somewhere between the start of S and G2/M. N_0 is in fact a random variable. Probably a new mean field assumption is needed here with some justification, but I have seen nothing in the paper.
      • concerning biases introduced by the mean field approximation: Figure 2 is a numerical simulation, some analytical estimates could be better. As Figure 2 looks rather convincing, I reclassify this as minor comment.
      • concerning nascent mRNA, ON/OFF etc. I disagree. The notion of instantaneous burst with well defined burst size and burst frequency on a stage has a meaning if the lifetime of this stage (which is not mRNA or protein lifetime) is short. The model validity should be clearly stated.
      • concerning parsimony: I think that the authors should test it. Are all the parameters identifiable? Is there any overfitting? They could use parameter uncertainty, comparison of training/testing errors, etc. Some details about the parameter fitting method should be provided.

      Significance

      The strength of this work is that it incorporates in a stochastic gene expression model a number of ideas on size control and dosage compensation that were discussed elsewhere from a cell population point of view. However, the proposed model is based on a number of artificial choices that are difficult to justify biologically: a huge number of discrete cell cycle states and inappropriate handling of the timescales characterizing stochastic gene expression. Furthermore, the model is not minimal but depends instead on a huge number of parameters.

      I found the paper difficult to read, and the presentation of the results is not suitable for biologists, who would need more details on the justification of the modelling choices and on the experimental validation of the model. For mathematicians, the calculations are rather standard and may seem trivial. I am a systems biologist with a background in mathematics and theoretical physics.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Jia et al. introduce a modeling framework to represent stochastic gene expression, with an explicit representation of cell volume growth, cell cycle progression (and its dependency on cell volume) and gene dosage compensation. The model is very elegant and general in that it can represent a variety of situations, simply as a matter of parameterization. Under a simplifying assumption, the authors derive a number of metrics (including the stationary distribution of the gene product and the power spectrum of gene product fluctuation dynamics), for both the absolute number and the concentration of gene product molecules. They use their model and derivations to examine under which conditions cells can achieve homeostasis in the concentration of the expressed gene product, despite changes in cell volume and gene copy number following replication. They also present and discuss the conditions giving rise to specific features (i.e. bimodality in the stationary distribution, a peak in the power spectrum) and examine these features in experimental data to infer the underlying homeostasis strategies.

      Major comments:

      The model is rather general and powerful. The simplifying assumption seems reasonable (and the authors investigate to some extent its limitations, i.e. Fig. 2). The conclusions are overall convincing.

      1. My main concern is that the metrics that the authors use to assess concentration homeostasis (i.e. the γ parameter and the presence/absence of a peak in the power spectrum) do not seem quite appropriate to describe how much variability/fluctuation in concentration is driven by cell cycle effects. Indeed, the γ parameter measures how much the average concentration in each cell cycle stage varies throughout the cell cycle. However, this variability should be compared to the total variability due to both cell cycle effects and stochastic bursting dynamics. A given level of cell-cycle dependency (say γ=0.2) could be very visible if gene expression is weakly noisy (e.g. B low and <n> high) and completely invisible if gene expression is highly bursty (large B and small <n>). In the latter situation, cell-cycle effects would be meaningless for the cell to minimize. In essence, re-using the authors' notation, I think γ / ϕ^(1/2) would be a more relevant metric to observe.
      2. Similarly, when inspecting the peak in the power spectrum, the weight of the Lorentzian function(s) creating the peak should be compared to the stationary component (λ_N, u_N in the authors' notation).

      A complementary analysis including these two points and a discussion of the relative contribution of cell-cycle effects and bursting dynamics to the total variability/fluctuation of concentrations would be important to include.

      Minor comments:

      1. The dashed line on Fig. 3a is defined as κ = sqrt(2)^(1-β). First, is this empirical or does it come from a derivation? Second, it seems incomplete since it should depend on ω. Intuitively, this line should correspond to the value of κ that would best mimic balanced biosynthesis in the case where β≠1. In other words, κ should be such that <ρB' / V(t)>_prereplication = <κρB' / V(t)>_postreplication, which yields κ = 2^(ω(1-β)) * (1-ω)/ω * [2^(ω(β-1))-1]/[2^((1-ω)(β-1))-1]. This indeed simplifies into κ = sqrt(2)^(1-β) when ω=0.5.
      2. η is used in the caption of Fig. 2, which is cited on page 4. But it is defined only 2 sections later, on page 6.
      3. ω is used in the main text, but only defined in the caption of Fig. 3.
      4. ω is defined as "the proportion of cell cycle before replication". Is this in terms of cell cycle stages (i.e. ω=N_0/N) or actual time?
      5. Fig. 3 indicates that power spectra are normalized so that G(0)=1, but G(0)=10 on the first two graphs.
      6. Page 11: "bimodality in the concentration distribution is significantly less apparent". I would suggest rephrasing "bimodality in the concentration distribution is absent" since there should be no reference to "significance" and bimodality is either present or absent (binary), not less apparent.
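      The balance condition in minor comment 1 can be checked numerically. The sketch below is not the authors' derivation: it assumes exponential volume growth V(t) = V0 * 2^(t/T) over one cycle, a burst size scaling as V^β, and ω taken as the fraction of cycle time before replication. The closed form is written with the prefactor (1-ω)/ω, which keeps κ positive and reproduces κ = sqrt(2)^(1-β) at ω = 0.5. The function names are illustrative.

```python
import math


def kappa_closed_form(omega, beta):
    """Closed-form kappa balancing pre- and post-replication averages of
    V(t)**(beta - 1), assuming V(t) = 2**t with t in units of the cycle time."""
    if abs(beta - 1.0) < 1e-12:
        return 1.0  # limit beta -> 1: the integrand is constant
    num = 2 ** (omega * (beta - 1)) - 1
    den = 2 ** ((1 - omega) * (beta - 1)) - 1
    return 2 ** (omega * (1 - beta)) * (1 - omega) / omega * num / den


def kappa_numeric(omega, beta, n=20000):
    """Same quantity from midpoint-rule time averages (independent check)."""
    def average(a, b):
        h = (b - a) / n
        return sum(2 ** ((beta - 1) * (a + (i + 0.5) * h)) for i in range(n)) / n
    return average(0.0, omega) / average(omega, 1.0)


for beta in (0.0, 0.5, 2.0):
    print(beta, kappa_closed_form(0.5, beta), math.sqrt(2) ** (1 - beta))
print(kappa_closed_form(0.3, 0.5), kappa_numeric(0.3, 0.5))
```

      Under these assumptions the closed form and the numerical averages agree for ω ≠ 0.5 as well, which supports the comment that the dashed line should in general depend on ω.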

      Referees cross-commenting

      Part 1.

      I agree with reviewer 1 that a table of symbols would be helpful.

      On reviewer 3's second Major Comment, I don't think that "the lifetime of a stage [has to be] much larger than the time needed to generate a burst". From how the authors write and solve the master equation, I don't think that such a separation of timescales is necessary. The authors should indeed clarify this, and if reviewer 3 is correct, then that's indeed a major limitation.

      On reviewer 3's comment "DNA replication [...] does not occur after a fixed number of exponential stages", I don't think I agree with this statement. Cell cycle progression relies on an ensemble of biochemical reactions. Representing this as a set of exponential waiting-time distributions with different means is probably amongst the most general and agnostic ways of representing this. Whether these exponential waiting times only depend on cell volume is another question. This actually links back to reviewer 3's first Major Comment and reviewer 1's comment that the concept of "stage" should be better discussed.

      Regarding the need for "estimates of the biases introduced by the mean field approximation" (reviewer 3), I guess that's the goal of figure 2. Maybe reviewer 3 should make more explicit what she/he would like to see.

      Regarding the comment from reviewer 3 that "a direct validity test should use datasets of at least two types (total, nascent RNA, etc)": I almost made a related comment in my review, but then I held it off. The issue with using nascent RNA data is that their model does not allow an ON state. They assume that gene products are produced in instantaneous bursts, which is a fair assumption if the lifetime of gene products is large compared to the time the gene stays ON. This is ok if the considered "gene products" are mRNA or proteins, but not nascent RNAs (for which the lifetime is the time to transcribe the gene). I did not make this comment in the end because I think the model is useful regardless. To comply with reviewer 3's request, maybe the authors could use distributions of mRNA and protein products, but I'm not sure that such data exist (since they need cell-cycle-resolved data).

      I disagree with the statements that "the proposed model is based on a number artificial choices that are difficult to justify biologically" and that "the model is not minimal but depends instead on a huge number of parameters." In my opinion, the model is elegantly simple to capture the mechanisms under study (i.e. the effect of cell cycle and cell volume on stochastic gene expression). It is expressed so that the model captures a broad range of situations (i.e. it reduces to simpler models as a matter of choosing parameter values, e.g. β=0 => transcription independent of cell cycle; α → ∞ => cell cycle depends only on size ...). I do not think that a series of exponential distributions for cell cycle progression is inappropriate; it is the most agnostic and general way of representing an ensemble of biochemical reactions that would be meaningless to describe explicitly. Instead, only their dependency on cell volume is taken into account (and in a very general way, i.e. parameters 'a' and α). It is fair to ask the authors to clarify the concept of "stage", but I see this model as being as simple as possible, but not simpler, for the authors' purpose.

      Finally, I agree that the paper is probably "not suitable for biologists" but disagree that "for mathematicians, the calculations are rather standard and may seem trivial."

      Part 2. Resp. to reviewer 3 on the master equation (Part 1 of Rev3):

      Ok, I understand your comment better. What you mean by "the time needed to generate a burst" is the time during which the gene produces RNAs, not the lifetime of the gene product (which is 1/d). That's true. It is essentially the same idea as what I wrote in my previous comment about nascent RNA data not being well captured by the model. Again, I think this is fine for "gene products" that are somewhat stable (not the case for nascent RNAs, but ok for mRNAs and proteins). This is fine by me as long as the authors state this limitation of their model more explicitly.

      Part 3. Response to Reviewer 3 (Part 2 of Rev 3)

      • concerning replication: Note that the mean field approximation is on cell volume, not on stage progression ("To simplify this model, [...] we ignore volume fluctuations at each stage but retain fluctuations in the time elapsed between two stages", p3). So the time at which replication occurs is already a random variable in the model. It is the sum of all the exponentially distributed random variables corresponding to stages 1 to N_0. The resulting distribution of replication time from the start of cell cycle is a random variable, which can be anything from very deterministic (N_0 very high) to very variable (N_0 very low).
      • concerning nascent mRNA, ON/OFF etc. : I'm not sure I get your objection, but the best is probably to let the authors respond to your original comment.
      • concerning parsimony: Ok, you're right. The authors should test it.
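      The point above about replication timing, that a sum of N_0 exponential waiting times ranges from highly variable to nearly deterministic, can be illustrated with a short simulation. This is a simplified sketch and not the authors' model: it assumes all N_0 stages share the same constant rate (in the manuscript the rates depend on cell volume), in which case the sum is Erlang distributed with coefficient of variation 1/sqrt(N_0). The function name, rate and sample size are illustrative choices.

```python
import random
import statistics


def replication_time_cv(n0, rate=1.0, samples=5000, seed=0):
    """Coefficient of variation of the time to complete n0 stages, each an
    independent exponential waiting time with the same rate (an assumption)."""
    rng = random.Random(seed)
    times = [
        sum(rng.expovariate(rate) for _ in range(n0))
        for _ in range(samples)
    ]
    return statistics.pstdev(times) / statistics.mean(times)


for n0 in (1, 10, 100):
    # Theory for equal rates: CV = 1 / sqrt(n0)
    print(n0, round(replication_time_cv(n0), 3), round(n0 ** -0.5, 3))
```

      With few stages the replication time is as variable as a single exponential, while with many stages it becomes sharply peaked, so the timing of replication is random in the model even though the number of stages before it is fixed.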

      Significance

      The advance of this paper is essentially technical. The authors present a model that incorporates and unifies previously studied effects (cell volume homeostasis, concentration homeostasis, bursting transcription). There is no major conceptual novelty, but the combination of these different aspects and the derivations that the authors present are very valuable and might be applicable to interpret data in various species.

      The paper is suitable for a physics/mathematics/computational audience. It is rather technical and would not be understood by readers with only a biology background.

      Field of expertise of the reviewer: Gene regulation, single-molecule imaging, stochastic modeling.

    1. The hypocrisy and the cruelty are maddening.

      I have a general idea of Amanda Knox's story, but I had never heard any specific details about it, like names or places or how she was treated. I find that with most aspects of society, especially with online activities, people do tend to go for the crazy and outlandish stories. Once most people make up their minds about a person or story, it can be hard to change their viewpoints. No matter how many times Amanda may want to show the proof that she is an innocent person caught in the wrong place at the wrong time, those people who paint her in a certain light will never change their viewpoints. Another story I can think of that shares some similarities is the Gypsy Rose case. Now, Gypsy was active in the crime whereas Amanda was not active in her alleged crime. The main similarities between the two stories are how the media grabbed hold of them and that there are shows, movies, characters, etc., that are based on these real-life people and the real things that happened to them. There are plenty of people who want to hold Gypsy as accountable as her then-boyfriend and others who think she was innocent but a product of her surroundings. The way this little girl, as we were told she was at the time of the crime, was painted as a monster is insane to think about. But if that can happen to a young woman, then anything can be thought about a mid-twenties adult woman in a foreign country. The way the public romanticizes or dehumanizes a person for actions they may or may not have taken can be insane to think about. The people who get treated this way almost never get to go back to being a normal everyday person.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We wish to thank all three reviewers for their thorough examination of our manuscript and their constructive criticism that allowed us to increase its quality. You will see that, following their recommendations, we have included a good amount of new data in the manuscript. Specifically, we added a new figure with experiments proposed by the reviewers (now Fig. 4), as well as Figs. S3 and S4. In addition, we expanded one paragraph of our Discussion to comment on a very recent article published by Huang et al. in Nature Structural and Molecular Biology with conclusions pertaining to the interplay of Rpd3 and Gcn5 in PHO5 gene regulation. Below we include the point-by-point response (in blue) with the changes we have implemented to address their specific points. All the additions and changes in the manuscript are made in red.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Novačić et al. investigate the mechanism of non-coding transcription-driven regulation of the phosphate-responsive PHO5 gene. The authors employ a CRISPRi system to discern the direct contribution of the antisense non-coding transcript (CUT025), expressed during phosphate-rich conditions, to transcriptional repression of the yeast PHO5 gene, thereby challenging a previous study from the Svejstrup lab that proposed a positive role for non-coding transcription in the control of the PHO5 gene. They propose a model where non-coding transcription represses PHO5 by mediating recruitment of the Rpd3 histone deacetylase, leading to altered chromatin structure at the PHO5 promoter due to reduced recruitment of the RSC chromatin remodelling complex. Overall, the data presented in the manuscript are of good quality, and the experiments are well controlled and nicely presented. The manuscript is well written. My specific comments are below:

      1. I am somewhat confused by the data presented in Figure 5. While there is a similar impact on the chromatin structure seen in the rrp6Δ and air1Δ air2Δ strains (Fig 5C) that corresponds to a more "closed" configuration of chromatin, it is not consistent with the H3 ChIP data, which show higher nucleosome occupancy across the PHO5 UAS in rrp6Δ but loss of nucleosomes in the double mutant (or is there perhaps a mistake in plotting the data?)

      We now realize that the data were plotted confusingly, and we apologize for it. While doing the H3 ChIP experiment we only prepared the +Pi samples for the air1Δ air2Δ double mutant. In the figure we only included this one data point for the double mutant, which could lead to the false conclusion that at other timepoints there are no histones at its PHO5 promoter region. We decided to remove this data point from the figure to avoid the confusion and only keep the air1Δ air2Δ data for the ClaI assay. We believe that this should not be an issue as this data point is not critical for the conclusions we are making.

      1. To further explore direct link between nc transcription, Rpd3 and rrp6 mediated effect, I suggest to test the effect on PHO5 induction upon rpd3 and rrp6 deletions in CRISPRi CUT025 background.

      We performed this experiment and now include it as Fig. S3 in the manuscript. As expected, expressing the CRISPRi system only made a difference when Rpd3 was present.

      1. It seems that most noticeable effect of blocking nc transcription by an elegant approach that utilizes CRISPRi system on the phosphatase activity is seen between 0-1.5h of induction. I suggest taking additional time points at 30-45 min.

      We took additional timepoints and the results were incorporated as the new Fig. 5E. The CRISPRi effect resulting in higher acid phosphatase activity was still most noticeable after 1.5 h of induction. This was mostly in line with the fact that the difference in PHO5 mRNA levels was most pronounced after 30 min of induction (Fig. 5D), as the time needed to achieve a measurable protein level after induction can lag significantly for secretory proteins, such as acid phosphatase. Secretory proteins are cotranslationally translocated into the ER, after which they traverse the secretory pathway and undergo modifications before being finally exported to the periplasm where their activity can be measured. Consequently, the increase in acid phosphatase activity upon induction is only measurable after at least an hour.

      1. How do authors explain that the effect of the exosome mutations are reversed and phosphatase activity is increased at later time point (20 h, Fig 2A)? I suggest using more distinct colour for dis3 mutants.

      That effect is indeed somewhat surprising. We hypothesize that the effects we are seeing after 20 h reflect the specific conditions of prolonged induction, i.e. keeping the chromatin open or semi-open for a very long period of time, which do not necessarily reflect the early gene induction period that we are using as a read-out of the effect of different mutations on acid phosphatase expression kinetics. We previously noticed a similar effect with chromatin remodeler-related mutants (e.g. rsc2Δ, unpublished result from S. Barbarić group), which speaks in favour of the prolonged induction conditions resulting in a chromatin state with its own specialized cofactor requirements. We therefore consider the chromatin state after prolonged induction a topic for another study; however, we now comment on this effect in the manuscript. The dis3 mutants are now shown in more distinct colours.

      1. Figure 5A -label "H3 ChIP"

      The label was added.

      1. Error bars are quite high in Fig 1C, perhaps it is worth repeating the experiment

      Since significant differences in PHO5 mRNA levels can be seen between wt and rrp6Δ mutant cells at 0.75 and 3 h of induction, we feel that the higher error bars at 5 h of induction do not justify repeating the experiment, especially since the values are bound to converge to a similar one after a longer induction period, as demonstrated in Fig. 1D.

      Significance

      Of significant interest for a general audience

      Referee #2

      Evidence, reproducibility and clarity

      The authors study the PHO5 locus, which is known to have an antisense transcript that has previously been shown to be important for activation of Pho5 sense transcription. The authors challenge this idea with extensive analyses. They show that Pho5-AS represses sense transcription, and thus fits in the category of AS repressors instead of activators. They show correlative data that when antisense goes down, sense goes up. They show that increased antisense levels lead to decreased sense levels. They use mutants of decay pathways to increase the levels of antisense transcription. Moreover, they used CRISPRi to repress the antisense transcript. Lastly, they show that histone deacetylation represses Pho5 sense. The data in the manuscript are convincing and well presented. One thing that needs further clarification is the strategy to increase antisense levels by deletion mutants of decay or depletion of decay pathways. While it is clear that this stabilizes the pho5-AS and decreases pho5-sense, it is not clear that this causes an increase in transcription. Perhaps it is possible that the antisense transcript itself has a repressive effect. If one really wanted to increase antisense transcription, then the antisense promoter should be increased in strength. On the other hand, the CRISPRi experiment is very convincing. I am surprised how well the CRISPRi system works; it is thought to be not so efficient at blocking elongating polymerase and good at blocking initiation.

      We thank the reviewer for this feedback. We performed additional experiments which you will find described below. Based on the results, we would like to keep the point about AS transcription causing the effect.

      Major comments: - Are the key conclusions convincing? Perhaps the conclusion that increased transcription leads to repression is not completely convincing. The authors use mutants in rrp6, exosome, and nrd1 to increase Pho5-AS transcription elongation. However, I was always under the impression that these mutants stabilize the transcript, and the authors acknowledge this in their manuscript. So how do you discriminate between increased stability and increased elongation? I support the conclusion that inhibition of Pho5-AS leads to increased Pho5-S. However, an increase in elongation is not directly demonstrated. While still possible, it is equally possible that a more stable pho5-AS transcript has a repressive effect on Pho5-S. - Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? See above. Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. If the authors want to keep the message that increased transcription of Pho5-AS leads to more repression, they may need to consider additional experiments. For example, increasing transcription from the antisense promoter.

      We performed the proposed experiment and now include it in the manuscript as Fig. 4AB. Briefly, we inserted the strong constitutive TEF1 promoter in the antisense configuration downstream of the PHO5 gene ORF, so that it drives AS transcription. The results of this experiment very clearly show the inverse relationship between PHO5 mRNA and AS transcript levels at +Pi conditions. Importantly, this strong constitutive AS transcription had an even more pronounced effect on PHO5 gene expression than deletion mutant backgrounds (in which, like in wt cells, the AS promoter is presumably weak), and did not allow the full level of PHO5 gene expression to be reached. To verify that the AS RNA itself does not have a regulatory role, but rather the act of its transcription represses the corresponding gene, we performed an additional experiment with appropriate diploid strains. The design of this experiment is standardly used to test whether an AS transcript can work in trans (for example see Nevers et al. 2018 NAR Fig. 6). This experiment is now included as Fig. 4C. Together, the results of these experiments paint a clear picture of AS transcription, and not AS level/stability itself, driving the repression of the PHO5 gene.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. To me this is an optional experiment, but it would benefit the manuscript
      • Are the data and the methods presented in such a way that they can be reproduced? yes - Are the experiments adequately replicated and statistical analysis adequate? yes

      Minor comments: - Specific experimental issues that are easily addressable. - Are prior studies referenced appropriately? yes - Are the text and figures clear and accurate? Yes - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? no

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The manuscript challenges previous work where it was claimed that Pho5-AS is important for activation of Pho5-S. As such, it is important work. In the field of noncoding transcription, the Pho5-AS fits in a class of AS transcripts that has been well described.
      • Place the work in the context of the existing literature (provide references, where appropriate). See above.
      • State what audience might be interested in and influenced by the reported findings. Researchers in the field of transcription, chromatin, and more specifically in yeast gene regulation.
      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Chromatin, transcription, yeast.

      Referee #3

      Evidence, reproducibility and clarity

      Novačić et al. present a manuscript entitled "Antisense non-coding transcription represses the PHO5 model gene via remodeling of promoter chromatin structure" which is a locus-specific follow-up to previous studies from the Soudet and Stutz groups on genome-wide analysis of transcription interference mediated by antisense transcripts in S. cerevisiae. Critically, the authors here employ a CRISPRi approach to reduce antisense transcription from reaching the PHO5 promoter and in doing so show that kinetics of PHO5 induction are increased as would be predicted from their previous model. Additionally, they show predicted epistasis between rpd3 and rrp6 on PHO5 expression and gcn5 and rrp6 that are consistent with their model. Comments are relatively minor but should be addressed. Introduction p3. "This mechanism was subsequently explored genome-wide in yeast, which revealed a group of genes that in the absence of Rrp6 accumulate AS RNAs and are silenced in an HDAC-dependent manner (14)." This sentence appears awkward; perhaps move "in the absence of Rrp6" to after "AS RNAs"?

      Corrected as proposed.

      p3 "Under a high phosphate concentration Pho4 undergoes phosphorylation by the cyclin-dependent-kinase (Pho80-Pho85)" Since "the" is used, don't use parentheses around Pho80-Pho85

      Corrected as proposed.

      Methods Give amount/concentration of glycine used in quenching formaldehyde for ChIP. Give the exact wash conditions and buffers not "extensively"

      All of those details are now provided in the manuscript.

      Figure 4C. Describe schematic in legend

      It is now described.

      Figure 4D. Indicate time of induction in legend.

      This was lacking for Figs. 4B-C (now 5B-C) so we added it there.

      Figure 5A. air∆ data are missing from later time points?

      Please see our first response to Reviewer 1. We removed the air1Δ air2Δ double mutant data, as we only had one data point for it in this assay.

      Figure 6. Legend needs to indicate what Pi conditions are. Since PHO5 expressed, appears to be low Pi. An issue that needs to be discussed is that rpd3∆ appears to decrease expression of PHO5 AS. Is this simply because of increased PHO5 expression? Does rpd3∆ have any effects on AS in high Pi? This is important to interpret if effects of rrp6 and rpd3 are epistatic or additive.

      We thank the Reviewer for bringing this to our attention. To explore the effect of rpd3Δ on PHO5 AS level, we quantified the PHO5 AS transcript by RT-qPCR with cells grown in (chemically defined) high Pi medium, which we now include in Fig. 7A. We find that rpd3Δ mutation has practically no effect on PHO5 AS transcript level both in the wt and the rrp6Δ mutant background. This result speaks in favor of rrp6Δ and rpd3Δ being epistatic rather than additive.

      Figure 7. Sth1-CHEC data are hard to interpret. Some sort of quantification might be required, as effects are not clear from the browser track, nor is it clear from the browser track that the results are reproducible. Examination of Sth1-AA effects in the gcn5∆ background might show more compellingly that the effect on RSC is via acetylation. Otherwise it is a bit hard to say, as RSC could be functioning in parallel to the acetylation-dependent pathways implicated.

      We agree that the presumption that histone acetylation recruits RSC to the PHO5 gene promoter had to be tested. We therefore include the experiment involving Sth1-AA depletion in the gcn5Δ background as Fig. 8A. This experiment was complicated by the fact that RSC is highly abundant (and at the same time essential for cell viability), but we resolved this by starting to deplete RSC two hours before gene induction. These results position RSC and Gcn5 in the same pathway. In contrast, more complete Sth1 depletion severely impaired viability of the rrp6Δ mutant, making it hard to interpret the effect, so we now include this result as Fig. S4.

      To show the effect of AS transcription on RSC recruitment to the PHO5 promoter more quantitatively, we re-analyzed the Sth1-CHEC data (for two independent biological replicates) and now include the log2 values for the changes in Sth1 binding in the text of the manuscript.
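
      As an illustration of the kind of quantification described above, the following is a minimal sketch of a log2 change computed per replicate. It is not the authors' pipeline: the function name, the pseudocount, and the read counts are all hypothetical and serve only to show the arithmetic.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when a window has no signal in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical signal summed over a promoter window, two biological replicates.
replicates = [
    {"control": 120.0, "treated": 60.0},
    {"control": 100.0, "treated": 45.0},
]

# One log2 value per replicate, then the mean across replicates.
changes = [log2_fold_change(r["treated"], r["control"]) for r in replicates]
mean_change = sum(changes) / len(changes)

for index, change in enumerate(changes, start=1):
    print(f"replicate {index}: log2 change = {change:.2f}")
print(f"mean log2 change = {mean_change:.2f}")
```

      A negative value indicates reduced signal in the treated condition relative to the control; reporting the value for each replicate alongside the mean makes reproducibility visible.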

      Significance

      The work is focused and narrower in impact but important because direct tests of locus-specific effects are performed, validating models from previous genomic analyses.

      **Referees cross-commenting**

      I think the other reviews are very reasonable. I would just suggest to the authors that they think carefully about the reviews and decide what they think is most valuable to improving the work/presentation

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Revision Plan

      1. General Statements

      We really appreciate the reviewers' positive comments and suggestions on our submitted manuscript. We think we will be able to address the issues raised by the reviewers by adding new data and revising the text as detailed below.

      2. Description of the planned revisions

      Reviewer #1:

      Major comments

      Localization analysis of a transiently expressed MAP70 transgene with inactivating phosphosite mutations would be important to see whether the identified conserved phosphosites are relevant for MAP70 interaction with MTs. This experiment could be performed rapidly using transient expression in BY-2 cells.

      We agree on the importance of this analysis. Therefore, we are currently preparing fluorescent markers of Nt-MAP70-2-like and its phospho-blocked (Ala) version to coexpress with MT and nuclear markers in BY-2 cells. We estimate that we need three more months to complete this experiment.

      The authors propose that PP2 blocks phragmoplast formation by preventing phosphorylation of class II Kinesin-12 proteins. In support, the authors show that PP2 treatment correlates with a decrease in KIN12A phosphopeptide count (not fully abolished) and its failure to localize to emerging phragmoplasts in BY-2 cells and Physcomitrium. As class II Kinesin-12 proteins have been previously implicated in phragmoplast assembly, this is a fairly reasonable hypothesis, but it would benefit from the analysis of transgenic KIN12A variants carrying inactivating (A) or potentially activating (D/E) phosphosite mutations. Is loss of phosphorylation sufficient to prevent phragmoplast localization? Can an activated variant rescue PP2-induced KIN12A localization and cell division defects? As above, using transient expression in BY-2 cells would be a fast approach to tackle these questions.

      We are currently preparing fluorescent markers of phospho-blocked (Ala) and phospho-mimic (Asp) versions of KIN12A (PAKRP1) to coexpress with MT and nuclear markers in BY-2 cells. We will check whether they localize to phragmoplast and also test PP2 effects. We would need three more months to complete these analyses.

      Reviewer #2:

      Major comments

      • The manuscript would strongly benefit from being revised by a native English speaker. There are many unusual or awkward formulations, in particular in the abstract.

      We apologize for the unnatural sentences. After adding new data and correcting the manuscript, we will ask a native English speaker to revise it.

      Reviewer #3:

      Major comments

      The major concern is lack of evidence to connect MAP70 and MT disruption upon treatment with PD-180970, in contrast to PP2, which was shown to affect localization of Kinesin-12. I wonder if authors could use taxol to stabilize MTs, then observe the localization of MAP70 with application of PD-180970?

      As we responded to reviewer 1, we are preparing the fluorescent marker of Nt-MAP70-2-like to coexpress with MT and nuclear markers in BY-2 cells. By using this multi-color marker, we will test whether PD-180970 affects the localization of MAP70 on MTs, also using taxol. However, in our experience, taxol is not a very effective inhibitor and may not work in our transient expression system in BY-2 cells. In that case, we will analyze whether the phospho-mimic (Asp) version can prevent MT disruption in the presence of PD-180970 to assess the relation of PD-180970, MAP70 and MT disruption.

      I have another concern about the action of PD-180970. PD-180970 appears to affect ubiquitously indispensable proteins for MTs. If PD-180970 disrupts MTs by inhibiting phosphorylation of some MAPs, it must need time for turnover of proteins phosphorylated before PD-180970 was applied. In the proteomics experiment, the authors treated the cells with the compounds for 8-9 hr. On the other hand, in BY-2 cells, PD-180970 disrupted MTs only 30 min after application. I wonder if proteins were replaced during the 30 min. Could the authors examine how long it takes to affect interphase MTs? If PD-180970 disrupts MTs in 5-10 min like oryzalin, it is unlikely that inhibition of phosphorylation of proteins like MAP70 caused MT disruption. Rather, it may inhibit some proteins that have activity to disrupt microtubules but are usually inactivated by phosphorylation, or inhibit something directly without phosphorylation.

      We agree that there is no evidence that PD-180970 disrupts MTs by inhibiting phosphorylation of MAP70. In our live-imaging system, in which reagents are added to liquid cultivation medium, the time from the reagent application to the arrival to each cell varies. Therefore, in order to accurately measure the time required for the inhibitor to take effect, it is necessary to design a new assay system, such as using fluorescent dyes to monitor the reagent's diffusion. In addition, since some reactions mediated by protein phosphorylation occur rapidly, minute-order observations might not be sufficient. Therefore, as an alternative strategy to assess the direct involvement of MAP70 phosphorylation on MT stabilization, we will examine whether PD-180970 induces MT disruption using strains expressing the phospho-blocked (Ala) and phospho-mimic (Asp) versions of MAP70 described above.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Minor comments

      The authors identified the analogs PD-166326 and PP1 as potent inhibitors of cell division. For completeness, it would be interesting to include a description of these arrest phenotypes and how they compare with that of PD-180970 or PP2.

      We have added the effects of all tested compounds on Arabidopsis embryos in Fig. S3C and Table S1. Based on these data and the results of tobacco BY-2 cells, we have compared the effects of PD-166326 and PD-180970, and PP1 and PP2 in Results.

      Although there are two more obvious candidates in the phosphoproteome datasets on which the authors focus, there is very little discussion of the other top hits and whether they might be involved in cell division. On a related note, there is no discussion of the specificity of these compounds and the likelihood of phenotypes unrelated to cell division.

      We have added the information on “Similar proteins in Arabidopsis” and “Description and putative functions” for all identified candidates for PD-180970 and PP2 in Table S2 and S3, respectively. Referring to this information, we have added sections to describe the possible contributions of these candidates to MT organization and phragmoplast formation in Results. In addition, we have described the specificity of these compounds and the phenotypes unrelated to cell division in the section for the results of Arabidopsis roots (Fig. S2A).

      1st results section:

      "...developed into the globular stage without causing morphological defects..."

      Should omit the word "causing" or replace with "any/detectable"

      We have omitted the word "causing".

      Reviewer #2:

      Even if the identification of the kinase(s) targeted by these two compounds is missing, the characterisation of at least two downstream effectors of these elusive kinase(s) inhibited by PD-180970 and PP2 is an important step forward. I would recommend making this point very clear in the writing (e.g. already in the abstract). Upon a superficial reading, the reader could assume that MAP70s and PAKRP1s are the direct molecular targets of these compounds.

      We appreciate the very positive comments. To clarify this point, in addition to the following responses to each suggestion, we have changed the last sentence of the abstract to “These properties make PD-180970 and PP2 useful tools for transiently controlling plant cell division at key manipulation nodes that are conserved in diverse plant species”.

      Major comments

      • I would modify the title to shift the emphasis from the methodology to the biological targets identified.

      We have changed the title to “Identification of novel compounds inhibiting microtubule organization and phragmoplast formation in diverse plant species”.

      • Concerning MAP70s the authors claim that there is little functional data about this family. Yet, a recent paper (https://www.science.org/doi/10.1126/sciadv.abm4974) identifies MAP70-5 as necessary for the proper organisation of CMTs in the endodermis and its ability to actively remodel to accommodate emergence of the lateral root primordium in Arabidopsis thaliana. This could provide a functional context to test several of the predictions that the authors list in the discussion.

      We have cited this paper in Results and Discussion, as “MAP70-5 was reported to increase MT length in vitro and to reorganize cortical MTs to alter the endodermal cell shape for lateral root initiation, suggesting that MAP70-5 mediates dynamic change of MT arrays”.

      Minor comments

      • The narrative would be improved by moving the section "PD-180970 and PP2 do not irreversibly damage viability" before the phosphoproteomic section.

      We have moved the “irreversibly” section to before the “phosphoproteomics” section.

      Reviewer #3:

      Minor comments

      In the supplemental data, the authors show only 12 or 14 candidates of the target. It would be interesting to see how other MAPs, including homologues of MAP70 and Kinesin-12 in BY-2 cells, were scored in the phosphoproteomics assay. I suggest the authors show longer lists of proteomics including other MAPs. It would be valuable information for the research community.

      We apologize for not providing the complete dataset. We have added Dataset S1 of total protein sequences that we predicted from published RNA-seq data of BY-2 cells, and all identified proteins of the phosphoproteomics assay for PD-180970 and PP2 in Datasets S2 and S3, respectively. We have moved the lists of top candidates to Tables S2 and S3.

      In the Abstract, the authors should mention that the two compounds reduced the phosphorylation level of diverse proteins including MAP70 and Kinesin-12. This is a very important result and, otherwise, it may cause misunderstanding of the activity of the compounds. In addition to this, it is better to replace the following phrase, "presumably by inhibiting MT-associated proteins (MAP70)", with "presumably by inhibiting phosphorylation of MT-associated proteins (MAP70)."

      To avoid such a misunderstanding, we have changed the descriptions in Abstract to “Phosphoproteomic analysis showed that these compounds reduced phosphorylation level of diverse proteins. In particular, PD-180970 inhibited phosphorylation of the conserved serine residues in MT-associated proteins (MAP70). PP2 significantly reduced the phosphorylation of class II Kinesin-12, and impaired its localization at the phragmoplast emerging site”. Due to this change, the suggested sentence was eliminated. Also in Discussion, we have mentioned the reduction of phosphorylation of various proteins by stating, "we found that PD-180970 and PP2 reduced the phosphorylation levels of diverse proteins". These parts may be further modified depending on the results of the phospho-blocked (Ala) and phospho-mimic (Asp) analyses.

      Page 7, 1st line: it would be better to insert "of MAP70 family" after "in the conserved MT-binding domain" because the MT-binding domains are unique to the MAP70 family. I could not understand why this is (2nd line) "consistent with PD-180970 severely disrupting all the tested MT structures". At the current stage, there is no evidence that dephosphorylation of MAP70 caused the microtubule disruption. I suggest the authors remove the sentence (", which was~MT structures").

      We agree with both points and have corrected them as the reviewer suggested.

    1. the exhibition of Miss Clack’s character.

      I think this is an interesting way to phrase this. Taken by itself, we may have taken Clack's narrative as the truth, but when examined beside the other narratives, her unreliability is exposed and her character traits (and flaws) become clear.

    1. Author Response

      Reviewer #1 (Public Review):

      Viola et al. compared the electron transfer efficiency of two types of oxygenic far-red photosystem II (PSII) with the "conventional" PSII and analyzed how these far-red PSII use the limited energy from infrared photons to carry out photosynthesis. Oxygenic photosynthesis is an energy-intensive process, and a large headroom is also needed for preventing harmful back-reactions from occurring, which can produce singlet oxygen. This research investigated how the far-red PSII managed to do their work with limited energy.

      The authors measured and compared the forward reactions of different kinds of PSII (Chl-a-PSII, Chl-d-PSII and Chl-f-PSII), including the flash-induced chlorophyll fluorescence decay and S-states turnover. These results led to a conclusion that the forward reaction quantum efficiency was not changed between "conventional" PSII and far-red PSII. However, the back-reactions of three types of PSII are different based on the measurements of the prompt fluorescence decay, delayed luminescence decay, and thermoluminescence band locations. The authors concluded that the two far-red PSII (Chl-d-PSII and Chl-f-PSII) have a different strategy for utilizing infrared light. Indeed, the authors showed that Chl-d-PSII containing cyanobacteria produced more singlet oxygen than other types, and this result was explained by the energy profile in the electron transfer chain.

      The major strength of this research is that the authors made a direct comparison of different far-red PSII under the same conditions. It's exciting to have a side-by-side comparison between two types of far-red PSII. In addition, the authors also measured the singlet oxygen produced from all types of PSII, which clearly showed the differences in the routes of recombination.

      We thank the reviewer for the interest demonstrated in our work and for the thoughtful comments, that we have addressed below.

      However, there are some concerns:

      1) The flash-induced fluorescence decay, thermoluminescence, delayed luminescence and S-states turnovers of the Chl-d-PSII and Chl-f-PSII have been characterized before (ref 5, 26, 39), but from intact cells compared to isolated membranes in this study, and similar conclusions have been achieved. The authors mentioned four reasons (lines 115-120, see the manuscript for the authors' arguments "i." to "iv.") why it's important to use isolated membranes. However, in my opinion, these reasons are not sufficiently strengthened:

      i. The transmembrane potentials from cells can be collapsed by adding uncouplers;

      ii. The authors mentioned the quinone pool in the cells is uncontrollable, but the authors didn't actually measure or manipulate the quinone pool in the membrane (e.g., the ratio of QB/QB-/empty-pocket in the samples);

      iii. The phycobilisomes can be controlled by different conditions through state transitions;

      iv. The isolation of membranes may not remove membrane-related quenching mechanisms (e.g., PSII quenching in State II, spillover, etc.).

      We do not agree with the reviewer on this point. We consider the use of membranes (or isolated PSII) as being the best solution to limit the effects listed at the end of the Introduction and to provide consistency between the different measurements, some of which cannot be performed in intact cells (i.e., the UV absorption measurements). More specifically:

      i) The effectiveness of uncouplers in dissipating the membrane potential is likely to vary between species (e.g., Chroococcidiopsis cells form aggregates encapsulated by a protective layer of excreted polymers) and should be assessed by directly measuring the membrane potential. ElectroChromic Shift-based measurements of the membrane potential in cyanobacteria have only been demonstrated in Synechocystis sp. PCC6803 and Synechococcus elongatus sp. PCC7942 (Viola et al. 2019, https://doi.org/10.1073/pnas.1913099116) and still need to be adapted to the far-red species used here. Additionally, commonly used uncouplers such as CCCP and FCCP are ADRY reagents, which interfere with PSII water splitting by directly reducing TyrZ (Ghanotakis et al. 1982, https://doi.org/10.1016/0005-2728(82)90115-3), and would affect all the measurements presented in this work.

      ii) In the dark, the redox state of the PQ pool in cyanobacterial cells has been observed to be kept in a highly reduced state by respiration, with potential consequences on the QB/QB- ratio. This could well vary between species, based on their different physiologies and growth conditions. In isolated cyanobacterial membranes and PSII, the QB/QB- ratio is expected to be around 50% after a short dark adaptation. This seems to be the case in our samples, based on the flash-dependent oscillations of the S2QB- and S3QB- thermoluminescence shown in Appendix 2 compared to the literature (Rutherford et al. 1982, https://doi.org/10.1016/0005-2728(82)90061-5), assuming an initial ~75% S1 population, as confirmed by the flash-dependent oxygen evolution and UV absorption. This is now mentioned in Appendix 2.

      iii) The control of state transitions requires specific illumination regimes incompatible with the conditions required for our experiments. Moreover, state transitions remain largely uncharacterised in the far-red species used in the present work. In some of these species, the situation is further complicated by the presence of both visible and far-red light-absorbing phycobilisomes that have a different spatial distribution in the cell (MacGregor-Chatwin et al. 2022, https://doi.org/10.1126/sciadv.abj4437).

      iv) Non-photochemical energy quenching in cyanobacteria seems to occur in phycobilisomes, due to the action of the Orange Carotenoid Protein (OCP). Both OCP and the phycobilisomes, if present in cyanobacterial cells (and that depends on the strains), are removed when membranes are isolated. It’s been proposed that direct quenching of the PSII core occurs in Synechococcus elongatus 7942 cells in state II (Choubeh et al. 2018, https://doi.org/10.1016/j.bbabio.2018.06.008), but since the mechanism has not been elucidated, no conclusion can be made on whether this could occur in membranes. The same is true for spill-over. Additionally, neither of the two mechanisms could be better controlled in cells than in membranes, so there would be no advantage here from working in vivo.

      In addition, the authors reached a conclusion that the Chl-f-PSII containing species should suffer from fluctuation light-induced membrane potential spikes, but don't actually measure this in physiologically relevant preparations. It will be more beneficial to use intact cells instead of an isolated membrane. I suggest the authors either restrict their conclusions to what the isolated membranes clearly show or make measurements in intact cells.

      The proposal that the far-red forms of PSII (both Chl-d-PSII and Chl-f-PSII) should suffer from increased charge recombination induced by spikes of membrane potential in fluctuating light is not new (see for example Nürnberg et al. 2018, https://doi.org/10.1126/science.aar8313), and is based on the observations made in plant PSII (Davis et al. 2016, https://doi.org/10.7554/eLife.16921) and assumed to be universal in oxygenic photosynthesis. In PSII, the transfer of electrons from the primary donor chlorophyll to QA occurs vectorially in the membrane, against the trans-membrane electric field, thanks to these electron transfer steps being exergonic. Spikes in the electric field due to sudden intensity fluctuations increase the probability of backward electron transfer. If the overall drop in the energy of the electron from the primary donor to QA is smaller (in a long wavelength PSII), it should result in a higher probability of backward transfer for a given trans-membrane electric field, and therefore a greater susceptibility to spikes in the electric field. We did not measure these effects and we do not claim to have done so. As already mentioned in the answer to point i) above, doing so would require the development of ElectroChromic Shift-based measurements of the membrane potential in the cyanobacterial species containing far-red photosystems. This is a separate research project beyond the scope of the present work.

      In conclusion, we believe that our statement justifying the use of isolated membranes at the end of the Introduction is valid.

      1. The authors measured the fluorescence decays as part of the evidence to show the stability of S2QA-. I have several concerns about these measurements:

      i. In figure 2B, the WL C. thermalis (blue) trace has a unique decay phase with a lifetime of about 0.2s, which the authors denoted as S2QA- recombination. Could the author elaborate on how this phase was assigned to this state?

      All decay kinetics in presence of DCMU are bi-phasic (with an additional faster phase in the WL and FR C. thermalis samples, attributed to a small fraction of centres where DCMU did not bind). In the manuscript we did originally assign both phases as arising from S2QA- recombination, but it is true that the middle phase, that is slightly faster in WL C. thermalis, is too fast to originate from that. This phase can rather be ascribed to TyrZ•(H+)QA- recombination occurring in a fraction of intact PSII centres before the full stabilization of charge separation, as shown in Debus et al. 2000 (https://doi.org/10.1021/bi992749w), or in centres lacking a Mn-cluster. We have now modified the paragraph regarding the fluorescence decay in presence of DCMU accordingly (L. 142-145): “The shorter lifetime (~0.22-1 s) of the middle decay phase (amplitude 15-20%) was compatible with it originating from TyrZ•(H+)QA- recombination occurring either in centres lacking an intact Mn-cluster (24) or in intact centres before charge separation is fully stabilised, as proposed in (23).”.

      A luminescence decay phase with a similar lifetime was initially ascribed, incorrectly, only to TyrZ•(H+)QA- recombination occurring in centres devoid of an intact Mn-cluster, in Appendix 5. This has now been rectified.

      ii. In figure S1 (the full version of 2B), all the fluorescence traces seem to rise at the end of the measurements. Could the authors check whether the measuring light intensity was actinic?

      This rise is significant only in the A. marina dataset (now Figure 2-figure supplement 1), and given the low signal-to-noise ratio in the last points of the fluorescence curve, we consider this small anomaly to be a measuring artefact. The rise is absent in the other traces in Figure 2-figure supplement 1 and in Figure 2B, except for the last point of the A. marina dataset in Fig. 2B. The corresponding Source data show that a rise in the last point of the measurements is only present in one of the three A. marina replicates (#2), while the non-decaying fluorescence is present in all A. marina samples and is discussed in the text. Except for this last anomalous point, the decay curves of the A. marina replicate #2 do not differ significantly from the other two replicates. This clearly suggests an artefact, and is not consistent with the measuring light being actinic. A clarifying sentence has been added in the legend of Figure 2-figure supplement 1.

      iii. In figure S2, it seems to me that the fluorescence decay of Synechocystis + DCMU (Green open squares) was slower than the WL C. thermalis and is similar to the FRL C. thermalis in figure 2B. If the Synechocystis + DCMU is indeed similar to FR C. thermalis, would that be consistent with the authors' conclusions?

      When fitting the Synechocystis+DCMU fluorescence decay kinetics (in what is now Appendix 1-figure 1), we obtain two decay phases with, respectively: an amplitude of ~12% and lifetime of ~0.22 s, and an amplitude of ~81% and lifetime of ~7.9 s. These values are similar to those reported for WL C. thermalis in Table 1, with an overall fluorescence decay faster than in FR C. thermalis. Nonetheless, because of the limited number of Synechocystis biological replicates, we limit ourselves to a qualitative comparison. The luminescence decay kinetics are also faster in Synechocystis (as in WL C. thermalis) than in FR C. thermalis (now Figure 5- figure supplement 2).

      These data are consistent with our conclusions: the energy gap between QA- and Phe in Chl-f-PSII is at least as large as in Chl-a-PSII, or could even be larger, as suggested by the slower S2QA- recombination measured by fluorescence (Figure 2) and luminescence (Figure 3) decay.
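To make the comparison of decay phases concrete, the two fitted phases quoted above for Synechocystis+DCMU (amplitudes of ~12% and ~81%, lifetimes of ~0.22 s and ~7.9 s) can be written as a sum of exponentials. The following is only a minimal sketch: it is not the fitting procedure used in the manuscript, and treating the remaining ~7% of the amplitude as a non-decaying offset is an assumption made here for illustration.

```python
import math

# Fitted phases quoted in the response for Synechocystis + DCMU:
# (fractional amplitude, lifetime in seconds).
PHASES = [(0.12, 0.22), (0.81, 7.9)]

# Assumption (not from the manuscript): the amplitude not accounted for
# by the two phases (~7%) is treated as a non-decaying offset.
OFFSET = 1.0 - sum(amplitude for amplitude, _ in PHASES)


def fluorescence(t):
    """Normalised fluorescence remaining at time t (s) after the flash."""
    return OFFSET + sum(a * math.exp(-t / tau) for a, tau in PHASES)


def amplitude_weighted_lifetime():
    """Average lifetime of the decaying phases, weighted by amplitude."""
    total = sum(a for a, _ in PHASES)
    return sum(a * tau for a, tau in PHASES) / total
```

Evaluating `fluorescence` at a few time points, or comparing `amplitude_weighted_lifetime()` between samples, gives a compact way to express "overall faster" or "overall slower" decay when only the fitted parameters are available.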

      iv. It's known that DCMU will alter the redox potential of QA/QA- in plants. Would it have similar effects to the PSII studied in this research? If so, it will be meaningful to include these effects in the energy diagram in fig 7.

      Yes, we do expect DCMU to change the QA/QA- redox potential in our samples, as it does in plants and other cyanobacteria, although the actual effect in different PSII types would need to be measured. The energy gap values in what is now Figure 8 are only estimates based on literature values and on the relative changes reported here; they are not calculated from any of our data and do not specifically refer to the experimental conditions we used, including the use of DCMU. For this reason, we think that adding the effects of DCMU in the diagram would not be particularly useful and could be confusing.

      1. The authors didn't use WL C. thermalis for measuring oxygen evolution and the authors claimed that the PSII content in WL C. thermalis is too low. Is that a technical issue (e.g., cannot purify PSII enriched membranes) or a biological issue (i.e., white light condition produced less PSII)? In Fig S9C, the oxygen generated from WL C. thermalis is comparable to FR C. thermalis. Could the author explain how they reached the conclusion that PSII in WL C. thermalis was low? In addition, the author should also provide evidence showing that the samples of WL C. thermalis do not have significant PSII activity under far-red light.

      We did measure the flash dependence of oxygen evolution in WL C. thermalis membranes, and we did observe oscillations with visible flashes (but not with far-red flashes, as expected). However, the data were not good enough to be able to perform any significant analysis. Unfortunately, in the case of WL C. thermalis, we have not been able to isolate O2-evolving cores, as stated in L. 194-195. The WL C. thermalis data have now been added in Figure 3- figure supplement 1, together with the non-normalised traces of all other samples (following the suggestion by reviewer #3), and the text has been modified accordingly. The data in Figure 3- figure supplement 1 also provide evidence that the samples of WL C. thermalis do not have significant PSII activity under far-red light (although this was already clearly demonstrated in Nürnberg et al. 2018).

      We do have evidence that the PSII content per chlorophyll is lower in WL C. thermalis than in FR C. thermalis, based on fluorescence emission spectra, yield of isolated PSII and PSI from purification procedures, and O2 evolution per chlorophyll, as can be seen for example in Figure 3- figure supplement 1. The levels of PSII accumulation depend on the growth stage (among other factors) in model species such as Synechocystis. Since C. thermalis cells grow more slowly than other cyanobacteria species and their physiology has not been studied in detail yet, it is difficult to control the levels of PSII accumulation. This explains the inter-sample variability in the rates of O2 evolution per chlorophyll measured with the Clark electrode, that have now been added in Appendix 6-table 1.

      1. The authors used an indirect method, which used chemical trap histidine and oxygen consumption, for measuring the production of singlet oxygen from different types of PSII. I have several concerns about this approach.

      i. Why not use a probe that reacts directly with singlet oxygen probes like SOSG or EPR probes to unambiguously confirm the production of singlet oxygen? The difficulties of not using SOSG mentioned in Rehman et al (SI Ref#22) should be no longer problems when isolated membranes were used. The advantage would be a validation of the results and perhaps increased sensitivity.

      Although SOSG or EPR probes could also be used to detect singlet oxygen production, these other methods seem to be significantly less sensitive than histidine trapping. For example, Fufezan et al. 2007 (https://doi.org/10.1074/jbc.M610951200) used the EPR spin trap TEMPO and needed 30 minutes of illumination. Extended illumination (up to 1 hour) has also been used to detect singlet oxygen using SOSG (Flors et al. 2006, https://doi.org/10.1093/jxb/erj181). With the histidine trapping method used here, less than 2 minutes of illumination were required to measure the singlet oxygen production rates. This allowed potential problems of prolonged illumination (e.g. a loss of intact PSII centres due to photodamage) to be minimised, and allowed us to confirm the results obtained in isolated membranes with those obtained in intact cells.

      As shown in now Figure 6- figure supplement 1E, the histidine-dependent oxygen consumption was suppressed by the singlet oxygen quencher sodium azide, as also shown in Rehman et al. 2013 (https://doi.org/10.1016/j.bbabio.2013.02.016). We also independently confirmed that the singlet oxygen generated by illumination of the dye Rose Bengal can be efficiently detected with the histidine trapping method and suppressed by the addition of sodium azide (Figure 6- figure supplement 1F). For these reasons, we are confident that what we measure with the histidine trapping method is singlet oxygen production.

      ii. In Rehman et al (SI Ref#22), wild-type Synechocystis cells showed significant production of singlet oxygen in the presence of DCMU and His (Figure 3A in SI Ref#22), however, the amount of singlet oxygen measured from the membranes in this study seemed to be less (Fig S10E). Could the authors provide some explanations?

      Fig. 3A in Rehman et al. showed that the production of singlet oxygen was about 10% with respect to the oxygen evolution activity in absence of additions (open squares). The light saturation curves in Fig. 4B of the same paper also show that at saturating light intensity the singlet oxygen production rate is about 10% compared to the O2 evolution rate. The traces we show in Figure 6-figure supplement 1 are only representative. The comparison should be made between the results in Rehman et al. and the averages of biological replicates that we show in Fig. 6 (membranes) and Appendix 6-figure 4A (cells). For WL and FR C. thermalis, we measure singlet oxygen production rates that are about 20% of the O2 evolution rates, slightly higher than those measured in Synechocystis in Rehman et al. Considering the variability between biological replicates, we consider our values in line with those in Rehman et al.

      iii. Can the presented results distinguish the production of singlet oxygen from recombination or other sources (e.g., antenna, free chlorophyll)? Some key controls are needed to strengthen the authors' claims.

      This is difficult to demonstrate unequivocally, but we have different lines of evidence that support the conclusion that the increase in singlet oxygen production in A. marina originates from differences in PSII charge recombination with respect to the other samples:

      i) The high levels of singlet oxygen production are observed in intact cells as well as in membranes. In neither of these samples do we expect to have significant amounts of damaged PSII or free chlorophyll, so these seem highly unlikely as the main sources of the singlet oxygen in our measurements. This is now stated more explicitly in L. 305 and Appendix 6.

      ii) According to the data in Appendix 6-figure 1B, singlet oxygen production in A. marina membranes shows a similar light saturation to that of maximal O2 evolution. This suggests that the singlet oxygen production we measure is related to PSII photochemistry. We have now stated this explicitly in L. 288-290.

      iii) Our thermoluminescence and delayed luminescence results indicate that in Chl-d-PSII the energy gap between Phe and QA is smaller than in Chl-a-PSII (as already suggested in the literature) and in Chl-f-PSII. This indicates that more charge recombination goes via repopulation of Phe- in Chl-d-PSII, with a consequent increase in singlet oxygen production.

      The antenna chlorophylls could form triplets under high light, by inter-system crossing, but in intact antennas the chlorophyll triplets are expected to be mostly quenched by nearby carotenoids (see https://www.jstor.org/stable/24030848 for a review on the subject). The generation of antenna triplet states in non-photoinhibitory conditions has been demonstrated in plant and algal thylakoids (Santabarbara et al 2002, 2007 doi: 10.1021/bi0201163, doi: 10.1016/j.bbabio.2006.10.007). Yet, these signals, which are attributed to a small population of damaged antennas, are small compared to those of triplets generated by charge recombination. Due to its apparently stochastic nature, the generation of antenna triplets by inter-system crossing is not expected to be significantly different between the different PSII complexes investigated in this study.

      On the other hand, it is generally recognised that in the PSII reaction centre, the carotenoid on the D1 side is not close enough to ChlD1 to directly quench its triplet state, when formed (see Telfer et al. 1994, https://doi.org/10.1016/S0021-9258(17)36825-4). The singlet oxygen produced in the reaction centre could disrupt the coupling between chlorophylls and carotenoids in the antenna, resulting in singlet oxygen production also from the antenna, in a cascade effect. This can happen with prolonged strong illumination (Fufezan et al. 2002, https://doi.org/10.1016/S0014-5793(02)03724-9).

      iv. I could not fully understand the singlet oxygen production experiments with tris-washed samples. In my opinion, the Mn-cluster depleted PSII should have accelerated charge recombination (100 ms between the YZ/QA, vs ~ 5 sec between the S2/QA), which should lead to an increase in singlet oxygen production. Correct me if I'm wrong about this, but if my reasoning is correct then how do the authors explain the discrepancy?

      Our rationale for performing the tris-washing experiment was indeed to see if this would lead to an increase in singlet oxygen production, thus implying that the high production in the A. marina samples could arise from a higher fraction of PSII centres without the Mn-cluster, as explained both in the main text and in Appendix 6. The fact that the treatment did not increase the singlet oxygen production suggests that this does not specifically arise from PSII lacking the Mn-cluster.

      The lack of singlet oxygen increase following tris-washing is not necessarily controversial, as the fact that TyrZ•QA- recombination is faster than S2QA- recombination does not necessarily imply that more of it occurs via backward electron transfer from QA- to Phe. The removal of the Mn-cluster could decrease the production of singlet oxygen by charge recombination, since it causes an increase in the redox potential of QA and, therefore, of the energy gap between Phe and QA, thus decreasing the probability of charge recombination going via the repopulation of Phe-. This is proposed to be a mechanism to protect PSII during photoactivation of the Mn-cluster (see Johnson et al 1995, https://doi.org/10.1016/0005-2728(95)00003-2).

      Our data show that the singlet oxygen production in A. marina is not specifically related to PSII lacking the Mn-cluster and are not in conflict with what is expected based on our knowledge of PSII energetics.
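The energetic argument above (a larger energy gap between Phe and QA makes recombination via repopulation of Phe- less likely) can be illustrated with a simple Boltzmann estimate. This is a rough sketch only: it assumes that the thermally activated back-transfer scales with exp(-ΔG/kBT), and the gap values in the usage note are hypothetical numbers chosen for illustration, not values from the manuscript.

```python
import math

K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def phe_repopulation_factor(gap_ev, temperature_k=298.0):
    """Boltzmann factor exp(-gap / kT) for thermally repopulating Phe- from QA-."""
    return math.exp(-gap_ev / (K_B_EV_PER_K * temperature_k))


def relative_change(gap_before_ev, gap_after_ev, temperature_k=298.0):
    """Ratio of the repopulation factors after vs. before a change in the gap."""
    after = phe_repopulation_factor(gap_after_ev, temperature_k)
    before = phe_repopulation_factor(gap_before_ev, temperature_k)
    return after / before
```

For example, with a hypothetical gap increasing from 0.30 eV to 0.35 eV at 298 K, `relative_change(0.30, 0.35)` is about 0.14, i.e. the Phe- repopulation route becomes roughly sevenfold less probable under this simplified picture.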

      v. The y-axes in Figure S10 should either contain "delta" (Δµmol O2 ml-1) or use the measured absolute oxygen concentration. I'd suggest the latter, since the reaction is oxygen consuming, it's good to show that all the samples started with similar amounts of dissolved oxygen. Low O2 levels could decrease 1O2 production, though this would be more of an issue with cells than membranes.

      The y-axis labels in the figures (now Figure 6-figure supplement 1 and Appendix 6-figures 1D and E, 2, 3 and 4A) have been changed to Δµmol O2 ml-1. We prefer to show the traces after subtraction of the baseline recorded in the dark (now explicitly indicated in the corresponding figure legends) for a better visual comparison. All samples were left to equilibrate with air (stirred) before starting the measurements, so all started with similar levels of dissolved oxygen. This is especially important when measuring PSI-dependent oxygen consumption (Appendix 6-figure 3), because the addition of ascorbate and TMPD leads to a transient drop in oxygen concentration in the sample, which leads to artefacts in the absence of the equilibration step. This information has been added to the corresponding Materials and Methods section (4.5). Additionally, when using Rose Bengal to generate singlet oxygen, the histidine-dependent oxygen consumption was about 10 times higher than in any of the measurements done with biological samples, and still we did not observe saturation of the signal in the illumination time used (added panel F in Figure 6-figure supplement 1). Therefore, we are confident that the singlet oxygen measurements in membranes and cells were not skewed by limiting oxygen concentrations in the measuring chamber.

      The y-axis labels of what is now Appendix 6-figure 1B and C have also been corrected (as ml-1 was used instead of h-1).
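The dark-baseline subtraction mentioned in the response above can be sketched as follows. This is an illustrative assumption about how such a correction might be done (a straight line fitted to the points recorded before the light is switched on, then subtracted from the whole trace); the function name and arguments are hypothetical and the manuscript does not specify the exact procedure.

```python
def subtract_dark_baseline(times, o2, light_on):
    """Return the change in O2 after removing linear drift fitted to the dark period.

    times    -- sample times (s)
    o2       -- measured O2 concentration at each time
    light_on -- time (s) at which illumination starts; earlier points are "dark"
    """
    dark = [(t, y) for t, y in zip(times, o2) if t < light_on]
    if len(dark) < 2:
        raise ValueError("need at least two dark points to fit a baseline")

    # Ordinary least-squares line through the dark points.
    n = len(dark)
    mean_t = sum(t for t, _ in dark) / n
    mean_y = sum(y for _, y in dark) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in dark)
    if sxx == 0:
        raise ValueError("dark points must span more than one time value")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in dark) / sxx
    intercept = mean_y - slope * mean_t

    # Subtract the extrapolated dark baseline from every point.
    return [y - (slope * t + intercept) for t, y in zip(times, o2)]
```

After this correction the dark part of the trace sits at zero, so the y-axis reports a difference (Δ) rather than an absolute concentration, which is what the relabelled axes indicate.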

      Reviewer #3 (Public Review):

      In this manuscript, Viola and co-authors address the question of how far-red-light-adapted (FRL) Photosystem II (PSII) is able to bypass the "red limit", or the minimum photon energy/frequency for charge separation to proceed effectively. They attempt to do so primarily by measuring the consequence of failure to overcome the red limit: charge recombination. From this work they have concluded that FRL PSIIs are able to achieve similar efficiency of flash-induced water-oxidizing complex turnover to those adapted to standard visible light. However, they conclude that FRL PSII which uses chlorophyll-d is significantly more susceptible to charge recombination and singlet oxygen formation, leading to increased sensitivity to high-light conditions. FRL PSII which uses chlorophyll-f, however, is adapted to be more resistant to photodamage. These strategies are differentiated by the number and type of far-red chlorophyll used and tuning of redox potentials of cofactors in PSII.

      The methods employed are well-chosen to present complementary evidence to address the questions posed. The authors have supported themselves using polarography, fluorescence decay, absorption, luminescence and thermoluminescence, and spectrometry, all of which are employed in a manner well-established in the quantification of processes in standard PSII preparations. The results, however, have some loss of data such as total yields which would be useful in interpretation as the authors have chosen to extensively normalize data for ease of visual comparison of certain features.

      Overall, the authors have adequately achieved their aims and their conclusions are well-supported. The authors also clearly state their own expectations of the impact of their work at the end of the Discussion; thanks to these results, we can better understand the ecological niche of each type of FRL-PSII and how these significantly disparate systems may be used in future agricultural research and development.

      We thank the reviewer for the positive evaluation of our work.

      Following the reviewer’s suggestions, the total yields (on a chlorophyll basis) of the flash-dependent oxygen evolution have been provided in Figure 3- figure supplement 1. These include the flash-dependent oxygen evolution data measured in WL C. thermalis membranes, that were previously omitted because of the unsatisfactory quality, and are still omitted from Figure 3 (normalised data and fits) for the same reason. The S-state distributions calculated from the fits of the flash-dependent oxygen evolution have been added in Table 2.

      Additionally, the non-normalised oxygen evolution and consumption rates used for Figure 6A and Appendix 6-figure 4 are now provided in Appendix 6-table 1.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Answers to reviewers’ comments

      (Reviewers' comments are in italics. Text modifications in the manuscript file are in blue.)

      Overall, we acknowledge the referees' careful reading of the paper and their comments, which we think have helped to further improve the manuscript.

      On the attached pages are our detailed point-by-point responses to the referees' comments, along with a description of how the manuscript was modified accordingly.

      New data included:

      In response to the comments and suggestions of both reviewers 1 and 3, we conducted new experiments to test genetic interactions between different actors of the BMP and activin pathways. These new results confirm and complement the analyses described in the original manuscript. Furthermore, as suggested by reviewer 2, we have further studied the phenotypes of hiPSC-CM, by analyzing gene expression profiles and by analyzing the morphological changes induced as a result of PAX9 knockdown.

      NB: The title has been slightly modified, to highlight the conserved features of the genetic architecture of cardiac performance revealed in the study

      __Former title:__ Genetic architecture of natural variation of cardiac performance in flies.

      __Novel title:__ Genetic architecture of natural variation of cardiac performance: From flies to humans.

      Reviewer 1

      *1. The authors utilized the RNAi-mediated knockdown approach in their functional validation studies. It is not clear how each genetic variation (SNP) affects its associated genes. Could some of the SNPs activate the candidate gene expression? For the 4 candidate genes that failed to show cardiac defects, could the overexpression of these 4 genes alter cardiac performance?*

      Answer 1- Of course, we cannot predict the direction of the effect of the variants on the function of the genes. In this context, loss-of-function experiments are subject to a risk of false negatives. It is indeed possible that, where loss of function has no effect, a gain of function could reveal one. But gain-of-function experiments are difficult to control and are often subject to non-specific effects, because it is complicated to control the level of over-expression relative to endogenous expression. This did not seem suitable for an extensive analysis of a large number of genes. We therefore chose to test only for loss of function.

      In addition, our approach to testing heart-specific RNAi aims to assess the quality of the association results by comparing RNAi for genes identified by GWAS to randomly selected genes. It is not intended to describe precisely the involvement of each gene individually.

      (See also the answer to reviewer 2, comment n°2, and the modifications made to the manuscript that address this criticism.)

      *2. babo is the type I activin receptor, not type 2.*

      Answer 2- Thank you, we have corrected this error.

      *3. The authors show BMP and activin pathway genetically interacts to affect cardiac performance. But it is interesting to find that these interactions are in a trait-dependent manner. For example, it seems that babo and dpp epistatically interact to regulate FS, while they additively regulate HP and DI. The authors need to discuss the complex genetic interaction further.*

      Answer 3- See reply to reviewer 3, comment N°2 below.

      *4. Both snoo and sog are identified from GWAS. How about babo and dpp? Are there any identified SNPs associated with babo and dpp?*

      Answer 4- Considering GWAS for mean phenotypes, there are no variants in dpp among the 100 best-ranked SNPs or among the variants identified using fast epistasis. But given the size of the DGRP population we are far from being exhaustive, as we do not reach saturation. It is therefore difficult to comment on these ‘negative’ results. However, we do identify one variant in babo using fast epistasis (see Figure 2B and Table S3).

      5. It is unclear why the mad KD behaves oppositely to dpp mutant, although both proteins are involved in BMP pathway. In Figure S5, the mad KD shows reduced FS and HP, but dpp LOF mutant shows increased FS and HP (Figure S4). Can the authors perform RNAi to knockdown dpp specific in the heart to reexamine the role of dpp in the regulation of cardiac function. The whole body LOF mutant dpp-d14 might not target cardiac tissue directly to control heart performance like mad KD.

      Answer 5- (see also answer to reviewer 3 comment n°2) We did perform heart specific dpp RNAi experiments together with other tests for interactions using new allelic combinations of activin and BMP pathways and therefore can compare heart specific knock down to heterozygotes for amorphic mutations for both dpp and mad.

      Regarding dpp, congruent effects on HP, DI, SI, ESD and EDD were observed between mutant and RNAi, while RNAi had the opposite effect on FS compared to heterozygous dppd14 mutants (decreased and increased FS compared to control, respectively). In the case of mad, heterozygous mutants showed no effect on FS, EDD and ESD but, similarly to dpp mutants, showed increased SI, DI and HP. mad RNAi uniquely decreased HP, DI and SI and increased AI. However, similarly to dpp RNAi, it induced a decrease of FS.

      Thus, systemic versus heart-specific knockdown of genes induces specific effects, suggesting cardiac non-autonomous interactions. This complex picture of TGFβ involvement is now discussed in the Results section (see below, Reviewer 3, major comment 2).

      *6. The authors selected two novel genes to study the conserved regulation in both flies and human iPSC cells. Besides testing these novel genes, the authors should also verify whether the conserved pathways, like TGF-beta, regulate heart performance in human iPSC cells similar to the flies.*

      Answer 6- We focused on poxm/Pax9 and sr/Egr2 because neither of these TFs was known to have a cardiac function in flies or in mammals. Our parallel analyses in fly and hiPSC-CM illustrate how the description of the genetic architecture of cardiac traits in flies can accelerate discovery in mammals.

      There is extensive literature describing the involvement of the TGFβ/BMP and Activin pathways in heart development and diseases in humans, hence the choice not to focus on these pathways in hiPSC-CM.

      Reviewer 2:

      *1. It will be interesting to compare this fly GWAS to human heart disease GWAS data (for example, cardiomyopathy, arrhythmia, heart failure) from patients. Such cross comparison could make the data set more valuable.*

      Answer 1- We actually did make this comparison (Table 2, Table S11) and we agree it significantly validates our approach. This identified a set of orthologous genes associated with cardiac traits both in Drosophila and humans, supporting the conservation of the genetic architecture of cardiac performance traits, from arthropods to mammals.

      *2. RNAi is the only experimental approach in this manuscript to validate the functional significance from data analyses. Authors may consider using genetic mutations such as deficiency lines or P-element lines to offer an alternative approach. This is simply a suggestion to improve the rigor and reproducibility, not absolutely required.*

      Answer 2- In an attempt to provide a consistent analysis of loss of gene function, our strategy was to concentrate our analysis on the effects of heart specific knock down. This allows us to compare -in a global way- the effects of the knock down of genes identified by GWAS to those of randomly selected genes.

      Our objective was to provide a global view of the heart-specific effects of the identified genes, and not to characterize precisely the involvement of each of them using a combination of mutant alleles, RNAi and gain of function. Given the experimental burden of analyzing cardiac function, such a strategy would indeed have required us to concentrate on only a very small number of genes.

      We however recognize that this strategy has limitations:

      • Some variants may lead to gain-of-function effects of genes, and our strategy is not able to test for these effects.

      • Some variants may come from non-cell-autonomous effects, which would not be replicated by our targeted RNAi strategy in the heart.

      Therefore, the false negative rate of our experiments is difficult to estimate.

      We have tried to put this into perspective and to highlight the limitations of our analysis in the results section describing RNAi validation of GWAS results.

      “To assess in an extensive way whether mutations in genes harboring SNPs associated with variation in cardiac traits contributed to these phenotypes ….. (…)

      …… These results therefore supported our association results. It is important to emphasize that our approach is limited to testing the effect of tissue-specific gene knock down. Since some of the variants may lead to increased gene function and/or expression, this can lead to a false negative rate that is difficult to estimate. In addition, some of the associated variants may influence heart function by non cell-autonomous mechanisms, which would not be replicated by cardiac specific RNAi knock down.”

      *3. In order to validate the roles of predicted TF binding sites, the best approach would be introducing point mutations using CRISPR/Cas9 within the binding motif then testing out molecular and physiological outcomes. Rather authors chose to test indirectly to knock down those TFs. If so, authors need to at least acknowledge the potential caveats of such approach and the limitation in related data interpretation.*

      Answer 3- The reviewer is right: definitive proof of the involvement of a potential TF binding site in the regulation of a gene located in cis requires mutating the binding site and analyzing the effect on the expression of the corresponding gene. But this may not be sufficient to demonstrate that the candidate TF is indeed a regulator of that gene (the binding motif may be the target of yet another TF): definitive proof may require swaps of motifs or of TF DNA-binding domains. This would have been beyond the scope of the present study. In addition, the effects on heart performance of mutating one TFBS at a time (among several dozen) may be too weak to allow their characterization with available tools and approaches.

      We acknowledge, however, that our approach provides only an indirect validation of the transcription factor binding site predictions. This was, in our opinion, the most efficient way to evaluate the potential effect of the predicted transcription factors.

      We clarify this in the Results section:

      “We did not test individually the effects on cardiac performance of mutations in predicted TFBSs located near the SNPs because any individual effect would probably be too small to be detectable by the available methods. Rather, we tested the potential involvement of their cognate TFs by cardiac specific RNAi mediated KD”

      *4. hiPSC-CM data is somewhat limited by only showing the HR and AP duration data. It is recommended to include some immunocytochemistry data to show the morphology, sarcomere structure of these hiPSC-CMs. Gene expression data generated by qPCR or RNA-seq in particular focusing CM structure and function genes would be helpful too.*

      Answer 4- As suggested by referee 2, we have now performed gene expression analysis and immunostaining of the PAX9 KD, which gave the strongest phenotype in hiPSC-CM (Figure 4 J-M). This revealed increased expression of Na+ and K+ channels, which is in line with the APD shortening phenotype, as well as downregulation of CASQ2, consistent with calcium transient shortening. Expression analysis also revealed increased expression of sarcomeric genes and NPPA/B, which was consistent with increased CM size as quantified by the area of TNNT2 staining per nucleus.

      These new data are described at the end of the result section:

      “APD shortening for PAX9 KD was coincident with increased expression of Na+ and K+ ion channels (SCN5A, KCNH2 and KCNQ1) (Figure 4J), supporting the APD shortening phenotype. In this context, the AP kinetics also correlated with shorter calcium transient duration (Figure S8A-D and H-K), including faster upstroke and downstroke calcium kinetics and increased beat rate (peak frequency) (Figure S8E-G and L, M), consistent with decreased expression of Calsequestrin 2 isoform (CASQ2) associated with PAX9 KD (Figure 4J). Finally, assessment of the PAX9 KD effect on sarcomeric content revealed an increase in sarcomeric gene expression (Figure 4K), and an upregulation of genes associated with a hypertrophic response (NPPA, NPPB and NPR1; Battistoni et al., Circulating biomarkers with preventive, diagnostic and prognostic implications in cardiovascular diseases, Int J Cardiol, 2012, vol. 157), which was coincident with increased CM size as quantified by the area of TNNT2 staining per cardiac nucleus (Figure 4 L, M).

      Collectively, these data illustrate conserved functions for poxm/PAX9 and sr/EGR2 in setting the cardiac rhythm and identify PAX9 as a novel and key regulator of cardiac performance at the cellular level, via the integrated regulation of expression of genes controlling electrophysiology, calcium handling and sarcomeric functions in hiPSC-CMs.”

      Reviewer 3

      Major Comments:

      1- There is an assumption in the use of RNAi knockdown to validate the genes identified in the quantitative analysis, and that is that natural variants are themselves hypomorphic. It is possible that among the variants identified some are hypermorphic, or among the transcription factor binding sites that variants lead to increased factor binding. While RNAi knockdown is an excellent choice to begin validation, I do not think the authors can rule out that a gene not functionally validated by their RNAi tests does not have a role in cardiac function.

      Answer 1. Please see our answers to reviewer 1 comment n°1 and reviewer 2 comment n°2.

      * 2- After performing RNAi knockdown to validate genes identified by GWAS the authors focus on the TGFbeta signaling pathway for downstream analysis. To do so they examine heterozygotes for sog, a repressor of BMP signaling, and snoo, an activator of the Activin pathway. The data from the snoo/sog heterozygote is compelling in its disruption of heart phenotypes, and the authors conclude a "coordinated action of activin and BMP." snoo, however, also works as a transcriptional repressor in the BMP pathway, so it's possible that the effects the authors are seeing here could be confined to an increase in BMP signaling. Unlike snoo and sog, mutations in babo and dpp are both expected to have negative effects on Activin and BMP signaling, respectively. The babo/dpp interaction is not as quantitatively convincing as the snoo/sog data, despite the integral roles both babo and dpp play in their respective pathways. If both pathways are connected, why do snoo/sog heterozygotes affect SI phenotypes, while babo/dpp heterozygotes affect fractional shortening? I think the authors' data suggest an interesting potential interaction between these pathways, which could be confirmed by examining further mutant combinations, knockdowns or increased expression transgenes, but falls short of a "confirmed synergistic genetic interaction." It does, however, underscore the value of the data in the paper for opening up new avenues for future study. *

      Answer 2 (and reviewer 1 comments 3 and 5).

      These comments led us to reconsider the analysis of the phenotypes associated with loss of function of the TGFb pathway, and to analyze other pathway components combinations.

      We acknowledge reviewer 3's criticisms of the snoo/sog experiments, which are difficult to interpret given the broad action snoo may have on both BMP and activin pathways. We have addressed this in the Results section.

      We have also analyzed other allelic combinations of BMP and activin pathway components, which strengthen the analysis performed on dpp/babo. Indeed, we tested babo/tkv heterozygotes (specific activin and BMP receptors, respectively) and found significant genetic interactions for ESD and EDD. Although not significant, babo/tkv double heterozygotes display a tendency toward non-additive effects on FS (p = 0.054). mad/smox heterozygotes (specific downstream TFs of the BMP and activin pathways, respectively) display interactions (non-additive effects) on HP, SI, DI, ESD and EDD. These new results (Supplemental Figure 4) thus support the hypothesis of genetic interactions between the pathways, but also reveal, as suggested by reviewer 3, a complex relationship between the two pathways, since interactions are revealed for specific traits in each of the mutant combinations analyzed.
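
      As a rough illustration of what "non-additive" means in these double-heterozygote comparisons, the sketch below computes the interaction contrast of a 2×2 design: the double-heterozygote mean minus the two single-heterozygote means plus the control mean. A value of zero corresponds to purely additive effects. The function and argument names are hypothetical, and the significance test itself (the two-way ANOVA reported in the response) is not reproduced here.

```python
from statistics import mean


def interaction_contrast(control, het_a, het_b, double_het):
    """Interaction term of a 2x2 design, computed from group means.

    Each argument is a list of per-fly measurements of one trait
    (for example ESD) in one genotype. Zero means the effects of the
    two alleles simply add up; a non-zero value indicates a
    non-additive (interaction) effect.
    """
    return mean(double_het) - mean(het_a) - mean(het_b) + mean(control)
```

      A contrast far from zero relative to the within-group variability is what a two-way ANOVA interaction term would flag as significant.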

      The phenotypes related to the individual loss of function of each of the actors of these pathways (dpp, tkv and mad for BMP; babo and smox for activin) are however very similar. When they have an effect, heterozygous amorphic alleles of these genes display increased phenotypes related to rhythmicity (HP, DI, SI, AI) and FS, but decreased cardiac diameters (ESD and EDD).

      Finally, as pointed out by reviewer 1, the picture is certainly even more complex since the phenotypes of RNAi mediated heart specific loss of function are not always similar to those of systemic loss of function. Indeed, mad RNAi causes a reduction of HP, DI, SI and FS (Figure S5) whereas heterozygotes for mad12 have either no or opposite effect on these phenotypes, and mad RNAi causes a significant increase in AI whereas mad12 has no effect (Figure S4). The discrepancy between tissue specific RNAi and heterozygous background was also found in the case of dpp, but specifically for the FS. Indeed, as suggested by reviewer 1, we have analyzed the loss of function of dpp by heart-specific RNAi. dpp RNAi results in a reduction of the FS (like mad RNAi) whereas the loss of function in the whole body results in an increase of the FS.

      We therefore re-wrote the whole corresponding section of the results and modified Figure S4 to include babo/tkv; smox/mad and dppRNAi data.

      “We further focused on the TGFb pathway, since members of both BMP and activin pathways were identified in our analyses. We tested different members of the TGFb pathway for cardiac phenotypes using cardiac specific RNAi knockdown (Figure 2C), and confirmed the involvement of the activin agonist snoo (Ski orthologue) and the BMP antagonist sog (chordin orthologue). Notably, Activin and BMP pathways are usually antagonistic (Figure 2D). Their joint identification in our GWAS suggests that they act in a coordinated fashion to regulate heart function. Alternatively, it may simply reflect their involvement in different aspects of cardiac development and/or functional maturation. In order to discriminate between these two hypotheses, we tested if different components of these pathways interacted genetically. Single heterozygotes for loss of function alleles show dosage-dependent effects of snoo and sog on several phenotypes, providing an independent confirmation of their involvement in several cardiac traits (Figure S4). Importantly, compared to each single heterozygote, snooBSC234/ sogU2 double heterozygote flies showed non additive SI phenotypes (two-way ANOVA p val: 2.1 × 10⁻⁷) suggesting a genetic interaction (Figure 2E and Figure S4A). It is worth noting however that snoo is also a transcriptional repressor of the BMP pathway (PMID: 16951053). The effect observed in snooBSC234/ sogU2 double heterozygotes can therefore alternatively arise as a consequence of an increased BMP signaling without affecting the activin pathway. We thus tested other allelic combinations for loss of function alleles of BMP and activin pathways. babo/tkv heterozygotes (respectively activin and BMP type 1 receptors) displayed non additive ESD and EDD phenotypes (Figure S4C). Synergistic interaction of BMP and activin pathways was also suggested by the analysis of fractional shortening in loss of function mutants for babo and dpp, the BMP ligand (Figure S4B).
Of note, babo/tkv double heterozygotes also displayed a tendency to non-additive effects on FS albeit non-significant (two-way ANOVA p = 0.054). In addition, mad/smox heterozygotes (specific downstream TFs of BMP and activin pathways) displayed non-additive effects on several traits, including phenotypes related to rhythmicity (HP, SI, DI) and contractility (ESD and EDD) (Figure S4D). Altogether, cardiac performance in response to allelic combinations of activin and BMP supported a coordinated action of both pathways in the establishment and/or maintenance of cardiac activity. This was further supported by the observation that simple heterozygotes for the tested loss of function alleles displayed similar trends with respect to cardiac performance, irrespective of the pathway considered (dpp, tkv and mad for BMP; babo and smox for activin). Indeed, they displayed either no effect or increased fractional shortening and rhythmicity phenotypes (HP, DI, SI, AI), and decreased cardiac diameters (ESD and EDD). This suggests coordinated activity of both pathways. Importantly, the genetic interactions were tested using amorphic alleles that lead to systemic loss of function. The observed phenotypes may thus not unravel cardiac specific effects of the pathways. In support of this, mad cardiac specific RNAi knock down was tested (see below, Figure S5) and led to a decreased HP, DI, SI and FS whereas heterozygotes for mad12 have either no (FS) or opposite (HP, DI, SI) effect on these phenotypes (Figure S4D). Inversely, mad RNAi caused a significant increase in AI whereas mad12 had no effect. However, heart specific dpp RNAi knock down (Figure S4E) led to similar phenotypic trends compared to dppd14 (increased HP, DI, SI, decreased EDD and ESD) with the notable exception of FS which was reduced following cardiac specific KD (Figure S4E), but increased in dppd14 heterozygotes (Figure S4B).
Taken together, these data point to a complex picture of TGFb pathway activity in regulating cardiac performance, involving both the activin and the BMP pathways as well as gene specific effects with both systemic and tissue-specific contributions.”

      *Minor Comments: *

      * There is an enormous amount of data in this paper, but there are places where things are summarized a little too briefly. For example, there are no definitions given at the beginning of the Results section for traits like "Heart Period" or "Systolic Interval," which would make this work significantly more accessible for other Drosophila researchers. (They do touch on this when they explain later in the paper that certain variants are "associated with quantitative traits linked to heart size and contractility" but more background earlier would be helpful.) When we consider heart performance traits, what is the baseline from known mutants? In other words, where is the line between variation and defect? *

      Answers:

      • We have detailed the description of the traits analyzed at the beginning of the Results section. We hope this improves the ease of reading in the direction suggested by the reviewer. “7 cardiac traits were analyzed across the whole population (Dataset S1 and Table 1). As illustrated in Figure 1A, we analyzed phenotypes related to the rhythmicity of cardiac function: the systolic interval (SI) is the time elapsed between the beginning and the end of one contraction, the diastolic interval (DI) is the time elapsed between two contractions and the heart period (HP) is the duration of a total cycle (contraction + relaxation (DI+SI)). The arrhythmia index (AI, std-dev(HP)/mean(HP)) is used to evaluate the variability of the cardiac rhythm. In addition, 3 traits related to contractility were measured: the diameters of the heart in diastole (End Diastolic Diameter, EDD) and in systole (End Systolic Diameter, ESD), and the Fractional Shortening (FS), which measures the contraction efficacy ((EDD-ESD)/EDD).”
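
      The trait definitions quoted above translate directly into a few lines of arithmetic. The following is a minimal sketch, assuming per-beat SI and DI values in seconds and diameters in any consistent length unit; the function names are hypothetical, and the use of the population standard deviation for AI is an assumption, since the response does not say which estimator was used.

```python
from statistics import mean, pstdev


def rhythm_traits(systolic_intervals, diastolic_intervals):
    """Summarise rhythmicity traits from per-beat SI and DI values.

    HP of each beat is DI + SI; AI is std-dev(HP) / mean(HP).
    Assumption: the population standard deviation is used for AI.
    """
    if not systolic_intervals or len(systolic_intervals) != len(diastolic_intervals):
        raise ValueError("need equal-length, non-empty SI and DI lists")
    heart_periods = [si + di for si, di in zip(systolic_intervals, diastolic_intervals)]
    hp_mean = mean(heart_periods)
    return {
        "SI": mean(systolic_intervals),
        "DI": mean(diastolic_intervals),
        "HP": hp_mean,
        "AI": pstdev(heart_periods) / hp_mean,
    }


def fractional_shortening(edd, esd):
    """Contraction efficacy: (EDD - ESD) / EDD."""
    return (edd - esd) / edd
```

      Note the parentheses in the FS formula: the difference of the two diameters is divided by EDD, so a heart that does not contract at all has an FS of zero.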

      • With respect to the baseline of cardiac performance, there is no simple answer. The baseline is influenced by the genetic background and the experimental conditions. This is the reason why any analysis of mutants or RNAi is conducted in comparison with its own control, analyzed at the same time. Concerning the DGRP lines, no baseline can be defined, since the objective is to measure the diversity of cardiac performance traits within a natural population.

    1. For what purpose? So that the process of what Becker calls “self-transcendence” may begin. And he describes the process of self-transcendence this way: Man breaks through the bounds of merely cultural heroism; he destroys the character lie that had him perform as a hero in the everyday social scheme of things; and by doing so he opens himself up to infinity, to the possibility of cosmic heroism …. He links his secret inner self, his authentic talent, his deepest feelings of uniqueness … to the very ground of creation. Out of the ruins of the broken cultural self there remains the mystery of the private, invisible, inner self which yearned for ultimate significance. …This invisible mystery at the heart of [the] creature now attains cosmic significance by affirming its connection with the invisible mystery at the heart of creation. “This,” he concludes, “is the meaning of faith.” Faith is the belief that despite one’s “insignificance, weakness, death, one’s existence has meaning in some ultimate sense because it exists within an eternal and infinite scheme of things brought about and maintained to some kind of design by some creative force (90, 91).” This, then, is what we might call good faith, not a flight into some immortality system. And clearly, some Christians, some Buddhists–at least the Zen Buddhists Becker himself mentions!–have faith in this sense, a faith that Becker characterizes as growing out of tasting one’s own death, embracing one’s own nothingness, and affirming–not a known ultimate meaningful–but an “invisible mystery” of ultimate meaning.

      Embrace the mystery, the sacred - accepting that one will be gone forevermore is a mighty task as our culture teaches us to seek recognition. The last thing we want to be is unrecognized, a nobody. And yet, when we are dead and dissipated back into the rest of the world, that is exactly what we will become.

      But we have to accept that reality before we can build and think beyond it to a deeper possibility of meaning. Reality brought us forth to begin with. Every moment is already sacred.

    1. Author Response

      Reviewer #2 (Public Review):

      1) “…it was important that the output response was intimately linked to the bound state of the receptor, in this case the TCR, with ligand unbinding rapidly reversing all proofreading steps. This means that dissociation of a single TCR should disrupt signaling, and implicitly assumes a direct physical connection between the bound receptor and the KP modifications. However, this mechanism becomes much harder to argue when the KP steps are physically uncoupled from bound TCR, such as in LAT microclusters or DAG production.”

      We agree that signaling events in the kinetic proofreading chain must be linked to ligand unbinding. We have added discussion to the paragraphs beginning on page 20 line 440 of recent work from Yi et al. 2019 and Lo et al. 2018 suggesting a physical link between bound TCRs and LAT clusters. The full paragraphs are reproduced below.

      “The kinetic proofreading model requires all intermediate steps to reset upon unbinding of the ligand (Fig. 1A). This means that information about the receptor’s binding state must be communicated to all proofreading steps. If kinetic proofreading steps exist beyond the T cell receptor, how is unbinding information conveyed to these effectors? Importantly, there is evidence of physical proximity of LAT with the receptor. While TCR/Zap-70 and LAT/PLCγ microclusters form spatially segregated domains, these domains remain adjacent to one another (Yi et al., 2019). Lo et al. demonstrated that the protein Lck binds Zap-70 with its SH2 domain and LAT with its SH3 domain, potentially bridging the two signaling domains together and propagating binding information (Lo et al., 2018).

      An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding, these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      2) …The data clearly demonstrate a time delay between receptor binding and the measured outputs, but it is not so surprising that this lag would exist in propagating the signal through the intracellular network.

      We apologize for this point of confusion in our methodology. We are unable to measure the time lag between receptor binding and signal propagation through the network because our system is terminated by blue light. Binding is stochastically initiated much like native ligand/receptor interactions. The time values reported in our dataset are the average ligand binding half-lives of the LOV2 ligand under various intensities of constant blue-light illumination, as measured by separate in vitro kinetic washout experiments. Our model is fit to the steady-state signaling output achieved after a 3 minute exposure of cells to LOV2 ligands of an average ligand binding half-life enforced by constant blue light illumination. We clarify this point by including the following paragraphs beginning on page 8 line 170.

      “We are unable to control when binding events start since our optogenetic system is inhibited by blue-light, as opposed to being activated by blue-light. The initiation of binding after blue-light inhibition is a function of both the stochastic relaxation of inhibited LOV2 back into the binding-state as well as the diffusion of binding-state LOV2 from outside the previously illuminated area. Without temporal control over the start of binding, it is difficult to measure the time delay between ligand binding and a downstream signaling event (Yi et al., 2019). Such studies typically require careful single-molecule imaging of numerous stochastic binding events (Lin et al., 2019).

      To overcome this technical limitation of our system, we chose instead to measure the steady-state output of the antigen signaling cascade achieved several minutes after ligand binding. Kinetic proofreading systems behave differently than non-proofreading systems at steady-state. A non-proofreading system’s steady-state output is set by the number of ligand-bound receptors and not the binding half-lives of those ligands (Fig. 3D, left). In contrast, a kinetic proofreading system can produce different steady-state outputs in response to ligands of different binding half-lives, even when ligand densities are adjusted to achieve equivalent occupancy (Daniels et al., 2006) (Fig. 3D, right). Signaling events take varying amounts of time to occur after ligand binding (Lin et al., 2019; Yi et al., 2019). However, the temporal delays between steps are on the order of tens of seconds. By imaging the cells after minutes of constant exposure to a set ligand binding half-life, we measure the steady state output achieved at a signaling event in the cascade on a longer timescale than these delays (Tischer & Weiner, 2019).”
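
      The steady-state contrast described above can be illustrated with a minimal sketch. It assumes the textbook McKeithan-style form of kinetic proofreading, in which a bound receptor must complete n irreversible steps at rate kp before the ligand unbinds; this is not necessarily the equation fitted in the study, and the function names and the default kp of 1 per second are hypothetical placeholders.

```python
import math


def active_fraction(half_life_s, n_steps, kp_per_s=1.0):
    """Fraction of bound receptors that have completed all n steps.

    Assumption: each step is first order with rate kp and unbinding
    resets every step, so the fraction is (kp / (kp + k_off)) ** n,
    with k_off = ln(2) / half-life.
    """
    k_off = math.log(2) / half_life_s
    return (kp_per_s / (kp_per_s + k_off)) ** n_steps


def steady_state_output(occupancy, half_life_s, n_steps, kp_per_s=1.0):
    """Steady-state signal, proportional to occupancy times the active fraction."""
    return occupancy * active_fraction(half_life_s, n_steps, kp_per_s)
```

      With n_steps set to zero the output depends only on occupancy, whereas with any positive number of steps two ligands at equal occupancy give different outputs if their binding half-lives differ.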

      3) The authors use a simple equation for KP to fit their datasets in Figure 4, equivalently to their previous work. However, no goodness-of-fit metric is provided for these fits, and by manual inspection it is hard to see the defining curves of their KP model in the datasets, especially not for LAT and DAG, where the datasets look much more like vertical bars. The estimated values of steps (n) may well be the best fit to the data, but they are not necessarily a 'good' fit.

      To assist readers in assessing how well our models fit our datasets, we have included heatmaps of the residuals from each model fit (Fig 4S3) on page 52, along with discussion (reproduced below) of the residual plots of regions where our models imperfectly capture our dataset on page 13 line 283.

      “To assess our model fits, we evaluated the residuals of each model subtracted from their respective dataset. For Zap70 recruitment, our model underestimates the degree of activation at moderate binding half-lives and receptor occupancies, as indicated by the positive region in the center of the heatmap. It is possible that Zap70 recruitment reaches saturation at shorter ligand binding half-lives than our model predicts (Fig. 4S3 A). For both LAT clustering and DAG generation, our models performed poorest in the region of lowest occupancy and shortest half-life (Fig. 4S3 B&C). In this region of our dataset, the fluorescent signal from bound LOV2 above the background fluorescence of unbound LOV2 is smallest. To compensate for fluorescence of unbound LOV2, we subtract off the local background fluorescence of unbound LOV2 around each cell. In doing so we may be underestimating the amount of LOV2 bound to each cell, leading to an underestimation of signaling output by the models. Future studies at LOV2 densities approaching single molecule would better capture this regime of receptor occupancy, but cell-to-cell variation in activation would be too high to be compatible with our current steady-state analysis (Lin et al., 2019).”

      4) The values of n are also very high, which would imply that the kp rate constant might be very fast to compensate; no estimates of this value are presented. Recent data from the Dushek lab (Pettmann et al, eLife 2021) measured n to be ~3, which seems much more physically realistic. Furthermore, in their previous published work, Tischer & Weiner measured n to be 2.7 for DAG production but in the present study it is now n=11.3, using the same equation

      We are unable to estimate the kp rate constant, as our datasets are at steady state and do not provide temporal information. To assess the plausibility of our higher n value fits, we explored the steady-state model presented in Ganti et al. PNAS 2020, which defines a kp rate of 0.1 s⁻¹. This model predicts the minimum number of signaling steps required to achieve a defined Hopfield error rate at defined cognate-ligand/self-ligand concentration and half-life ratios. Our exploration of this model is shown in Fig. 4S4 on page 53 and detailed in the discussion on page 14 line 299.

      “In our previous work our model fit fewer (N=2.7) steps to DAG generation. We now fit a higher number of steps (N=11.3) to DAG generation. This change could be due to the incorporation of ICAM into our current study, which has been shown to potentiate ligand discrimination (Pettmann et al., 2021). Furthermore, our previous antibody-based adhesion may have short-circuited some proofreading steps by irreversibly holding the cell membrane close to the supported lipid bilayer. To evaluate if our higher value fits are indeed the best fit values for our datasets, we fit our model to each dataset while holding the value of N constant in the range of zero to fourteen steps, and evaluated the average residual value for each model fit (Fig 4S3 D). For all signaling steps, the fit value of N was near the minima of average residual and had a lower average residual value than a model with 3 proofreading steps.
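
      The fixed-N scan described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' fitting code: it uses a McKeithan-style proofreading term with a hypothetical kp of 1 per second, fits a single amplitude by closed-form least squares for each fixed N, and reports the mean absolute residual. All names are hypothetical.

```python
import math


def kp_term(occupancy, half_life_s, n_steps, kp_per_s=1.0):
    """Assumed model shape: occupancy * (kp / (kp + k_off)) ** n."""
    k_off = math.log(2) / half_life_s
    return occupancy * (kp_per_s / (kp_per_s + k_off)) ** n_steps


def mean_residual_for_fixed_n(data, n_steps):
    """Fit only the amplitude with N held constant; return mean |residual|.

    data is a list of (occupancy, half_life_s, observed_output) tuples
    with positive occupancies.
    """
    preds = [kp_term(occ, hl, n_steps) for occ, hl, _ in data]
    obs = [y for _, _, y in data]
    # Least-squares amplitude for a model that is linear in its scale factor.
    scale = sum(p * y for p, y in zip(preds, obs)) / sum(p * p for p in preds)
    return sum(abs(y - scale * p) for p, y in zip(preds, obs)) / len(data)


def profile_over_n(data, n_values=range(0, 15)):
    """Average residual for each fixed N from zero to fourteen steps."""
    return {n: mean_residual_for_fixed_n(data, n) for n in n_values}
```

      Plotting the returned dictionary against N gives a profile whose minimum can be compared with the freely fitted N and with a reference value such as three steps.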

      To assess the plausibility of a larger number of proofreading steps, we implemented the steady state kinetic proofreading model from Ganti et al. (Ganti et al., 2020). The model estimates the minimum number of proofreading steps required to discriminate between cognate-ligands and self-ligands with different binding half-lives present at a given concentration ratio at a given Hopfield error-rate (Hopfield, 1974). First, we evaluated what combinations of ligand half-lives and concentration ratios an 11-step kinetic proofreading network could discriminate at an error rate less than 10⁻³ (Fig 4S4 A). We chose the error rate of 10⁻³, as it is an order of magnitude less than the theorized 10⁻⁴ upper limit error rate of the native TCR (Ganti et al., 2020). At moderate half-life ratios, an 11-step network can discriminate cognate peptides present in small concentrations (e.g. 1 cognate-ligand per 1000 self-ligands at a half-life ratio of 6).

      In our optogenetic system, the ratio of the average ligand binding half-life between the longest suppressive half-life and the shortest fully activated half-life is about 2. However, an 11-step network is insufficient to discriminate between ligands with a half-life ratio of 2, even at the high ligand ratio of 1 (equal concentrations of cognate- and self-ligand). This suggests our cells are unlikely to be detecting the average ligand binding half-life of each blue-light condition, but are more likely detecting longer-lived binding events from the underlying distribution of half-lives. Another possibility is that our in vitro washout measurements, which measure average ligand binding half-lives of soluble ligands diffusing in three dimensions, differ from the half-lives of ligand-receptor interactions between the cell’s plasma membrane and the supported lipid bilayer diffusing in two dimensions (J. Huang et al., 2010).

      To better explore the kinetic proofreading model space, we generated heatmaps reporting the required number of steps to discriminate combinations of ligand and half-life ratios at an error rate of 10⁻³ (Fig 4S4 B). To discriminate between ligands with a half-life ratio of two, at least 14 steps are needed when the ligands are at equal concentrations, and more than 25 steps are needed if cognate-ligands are 1 per 1000 self-ligands. The required number of proofreading steps decreases rapidly as the half-life ratio increases, reaching a minimum of 8 steps needed for a concentration ratio of 1/1000 and a half-life ratio of 10, which is more in line with physiological half-life ratios between agonist and non-agonist peptides (M. M. Davis et al., 1998).

      After comparing our results with the Ganti model, this analysis suggests that our number of fit proofreading steps may be somewhat inflated as a function of our use of the average ligand binding half-lives of three dimensional washout experiments in place of the two dimensional single molecule information T cells use to make activation decisions. However, the higher fit N values are more consistent with the required number of steps to discriminate ligands under more physiological conditions than our previous measurements of ~3 steps, which would not be expected to discriminate ligands with a half-life ratio of 10 even at a ligand ratio of 1 (Fig 4S4 B, right).”

      5) If the fitted value of n provides no realistic insight into the KP mechanism, it should not be discussed as though it does.

      The many assumptions of our simplistic model likely result in error in determining the absolute number of fit proofreading steps. We feel the strength of our model lies in capturing the relative increase in the strength of proofreading as signal propagates through the cascade, and not in determining the absolute number of proofreading steps, though it is comforting that our values are broadly consistent with the expectations of Ganti et al. To highlight the point that relative values are the most important feature of our experiments, we are open to normalizing our n fit values by the fit n of Zap70 for all discussion of our results and the proofreading strength increase shown in Fig 4D if the reviewers think this will better highlight the relative increase in proofreading strength.

      6) While it is good to confirm it, the result that downstream signaling complexes reset more slowly than distal ones is surely to be expected, given the increased number of steps over which ligand unbinding must traverse, as in their Erlang distribution. You would not expect ERK phosphorylation to decrease at the same rate as LAT cluster dissociation for this same reason. However, the fact that the lifetime of LAT clustering (14.2s) or ZAP70 (9.6s) is so different to LOV2 (3.3s) provides good evidence that it is not proofreading, as by definition the measured outputs should rapidly return to the 'unbound' state in line with ligand unbinding. At least for LAT, there must be a 'memory' from previous signalling lasting several seconds, which means the system has not reset, as required for true KP.

      Slower resetting of downstream signaling events in a kinetic proofreading cascade is not a given, as it could be the case that all events reset at the same rate. One requirement for kinetic proofreading is that events in the chain be irreversible on the timescale of the ligand binding half-life. The steps are reset through an orthogonal pathway, as opposed to traversing back down a chain of reversible reactions. Both the TCR and LAT are dephosphorylated by the phosphatase CD45, and it would be possible for CD45 to dephosphorylate both proteins at the same rate (or even dephosphorylate LAT faster than the TCR). To clarify this point, we have expanded the discussion of possible reset mechanisms on page 21 line 451, as reproduced below.

      “An attractive reset mechanism is the segregation of CD45 away from bound receptors, creating spatial regions in which TCR and LAT associated activating events can occur (S. J. Davis & van der Merwe, 2006). Super-resolution microscopy by Razvag et al. measured TCR/CD45 segregated regions within seconds of antigen contact at the tips of T cell microvilli (Razvag et al., 2018). Upon unbinding these regions of phosphatase exclusion collapse, allowing CD45 to dephosphorylate receptor ITAMs and LAT clusters. However, the rate of dephosphorylation for LAT and receptor ITAMs could differ. LAT clusters exclude CD45 in reconstituted bilayer systems, potentially limiting the dephosphorylation to LAT molecules at the edges of the cluster thus slowing reset (Su et al., 2016). The kinetics of multivalent protein-protein interactions within TCR and LAT clusters can also influence dephosphorylation and dissociation rates (Goyette et al., 2022).

      A CD45-mediated reset mechanism would restrict proofreading to membrane-bound signaling events occurring within a CD45-depleted region. Downstream events that dissociate away from the membrane or diffuse out of the segregated region could not directly participate in the proofreading chain, as the collapse of a CD45 segregated region could not reset signaling entities released into the cytosol (e.g. release of IP3 in the cleavage of PIP2 to DAG).”

      We also added discussion of recent work from Harris et al. quantifying the slower timescale of Ca++ and ERK reset upon TCR signal termination on Page 23 line 498 as reproduced below.

      “Recently Harris et al. quantified the reset rate of the downstream signaling events Ca++ release and ERK phosphorylation upon signal inhibition to be 29 seconds and 3 minutes respectively (Harris et al., 2021). They showed both Ca++ and ERK levels can persist across short inhibitions of signaling. What makes LAT clusters different than these persistent downstream events? The dissolution of LAT clusters is directly triggered by the unbinding of ligand from the TCR, and both the TCR and LAT are de-phosphorylated by CD45. To our knowledge, the rate of ERK dephosphorylation or cytosolic Ca++ depletion are not accelerated by TCR unbinding, and are turned over through constant rather than agonist-gated degradation. A useful future line of inquiry would be to quantify the reset rate for signaling steps throughout the cascade upon ligand unbinding versus orthogonal signal inhibition (e.g. kinase inhibition).”
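      The irreversibility requirement described in this response can be illustrated with a minimal sketch of the classical kinetic proofreading argument. It assumes that each of `n_steps` steps proceeds at a single rate `k_step`, that ligand unbinding at rate `k_off` resets the whole chain at once, and that the half-lives and step count used below are purely hypothetical; none of these values come from the manuscript.

```python
import math


def half_life_to_rate(half_life_s: float) -> float:
    """Convert a binding half-life (seconds) to a first-order off-rate (1/s)."""
    return math.log(2) / half_life_s


def completion_probability(k_step: float, k_off: float, n_steps: int) -> float:
    """Probability that all n irreversible steps finish before the ligand unbinds.

    Each step races against unbinding and wins with probability
    k_step / (k_step + k_off). Unbinding is assumed to reset every step.
    """
    p_single = k_step / (k_step + k_off)
    return p_single ** n_steps


def discrimination(k_step: float, k_off_strong: float, k_off_weak: float, n_steps: int) -> float:
    """Ratio of completion probabilities for a long-lived versus a short-lived ligand."""
    strong = completion_probability(k_step, k_off_strong, n_steps)
    weak = completion_probability(k_step, k_off_weak, n_steps)
    return strong / weak


if __name__ == "__main__":
    k_step = 1.0  # hypothetical step rate, 1/s
    k_strong = half_life_to_rate(10.0)  # hypothetical long-lived ligand
    k_weak = half_life_to_rate(1.0)  # hypothetical short-lived ligand
    for n in (1, 2, 4, 8):
        print(n, round(discrimination(k_step, k_strong, k_weak, n), 2))
```

      Under these assumptions the discrimination ratio grows with the number of steps. If a step reset more slowly than the ligand unbinds, it would no longer restart from zero on each binding event, which is why the relative reset rates of the TCR, LAT, and downstream events matter.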

    1. Author Response

      Reviewer #1 (Public Review):

      The paper presents a Bayesian model framework for estimating individual perceptual uncertainty from continuous tracking data, taking into account motor variability, action cost, and possible misestimation of the generative dynamics. While the contribution is mostly technical, the analyses are well done and clearly explained. The paper provides therefore a didactic resource for students wishing to implement similar models on continuous action data.

      First off, the paper is lucidly written - which made it a very pleasant read, especially compared to many other modeling papers, and the authors are to be congratulated for this. As such, the paper provides a valuable resource for didactic purposes alone. While the employed methods are not necessarily individually novel, the assembly of various parts into a coherent framework appears nonetheless valuable.

      Thank you for the positive evaluation!

      I have two major concerns, though:

      1). My main comment regards the model comparison using WAIC (Figure 4E) or cross-validation (Figure S4a): If we translate these numbers into Bayes factors, they are extraordinarily high. I assume that the p(x_i|\theta_s) in equation 7 are calculated assuming that the motor noise on u_{i,t} is independent? This would assume that motor processes act i.i.d with a timeframe of 60ms, which is probably not a very realistic assumption- given that much of the motor variability (as stated by the authors) comes likely from a central (i.e. planning) origin. Would the delta-WAIC not be much smaller if motor noise was assumed to be correlated across time points? Would this assumption change the \sigma estimates?

      Thank you for posing this question. First, sequential models tend to have much larger differences in the likelihood of parameters given data because of the large number of individual data points within a single sequence. Thus, it is not uncommon for model comparison to show much more extreme differences between models for sequential data, as is the case in the present manuscript.

      Second, since our computational framework is based on LQG control, the model indeed assumes that motor noise is independent across time steps. We agree that this assumption might not be realistic for time steps of 16ms duration. While this assumption is certainly a simplification, the assumption of independent noise across time steps is very common both in perceptual models as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework. It thus applies to all of the models considered in this paper, as they all assume temporally uncorrelated noise, both in perception and action. Therefore, the ranking between the models in the model comparison should hopefully not be affected in a systematic way favoring individual models disproportionately more than others, although the magnitudes of differences in WAIC might be smaller. Since the differences in WAIC are currently in the range of 1e4, we think that they will still be significant, even when accounting for correlated noise.

      Third, we think that the simplifying assumption of independent noise does not invalidate the calculation of the WAIC, which assumes independence across trials. The p(x_i | theta_s) in equation (8) are the likelihoods of whole trials. To compute them, we assume independence of the motor noise across time steps.
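      The distinction drawn here between independence across time steps and independence across trials can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it assumes Gaussian residuals with a single standard deviation `sigma`, uses the common deviance-scale WAIC convention (the manuscript's equation may be scaled differently), and all function names are hypothetical.

```python
import math
from statistics import variance


def trial_log_likelihood(residuals, sigma):
    """Log-likelihood of one whole trial.

    Independence across time steps means the per-step Gaussian log-densities
    simply add up over the sequence.
    """
    n = len(residuals)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum(r * r for r in residuals) / (2.0 * sigma ** 2))


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(x_i | theta_s).

    s indexes posterior samples and i indexes trials; trials are the units
    assumed independent by WAIC. Needs at least two posterior samples.
    """
    n_trials = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_trials):
        column = [row[i] for row in log_lik]
        lppd += log_mean_exp(column)  # log pointwise predictive density
        p_waic += variance(column)  # sample variance over posterior draws
    return -2.0 * (lppd - p_waic)
```

      Because each trial's log-likelihood is a sum over many time steps, small per-step differences between models accumulate, which is consistent with the large WAIC differences discussed above.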

      We have added a short passage in the subsection ‘model comparison’:

      “Note that the assumption of independent noise across time steps might lead to WAIC values that are larger than those obtained under a more realistic noise model involving correlations across time. However, this should not necessarily affect the ranking between models in a systematic way, i.e. favoring individual models disproportionately more than others.”

      and a passage in the discussion that points out that modeling the noise as being independent across time points is a simplifying assumption:

      “Finally, assuming independent noise across time steps at the experimental sampling rate of 60 Hz is certainly a simplifying assumption. Nevertheless, the assumption of independent noise across time steps is very common both in models of perceptual inference as well as in models of motor control, and there is to our knowledge no computationally straightforward way around it in the LQG framework.”

      2). While the results in Figure 4a are interesting, the deviation of the \sigma estimates from the standard psychophysical estimates for the most difficult condition remains unexplained. What are the limits of this method in estimating perceptual acuity near the perceptual threshold? Is there a problem that subjects just "give up" and the motor cost becomes overwhelming? Would this not invalidate the method for threshold detection?

      We fully agree that for the most difficult conditions at the lowest contrasts all sequential models we considered are biased with respect to the uncertainties obtained with the 2AFC experiment, which is supposed to be equivalent. Interestingly, when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards an additional mechanism such as a computational cost or computational uncertainty, that is not captured by the current models at very low contrast.

      For the results in Fig. 4, we assumed a constant behavioral cost across all conditions. The assumption that the cost is independent of perceptual uncertainty might not hold in reality, exactly in line with your hypothesis that subjects might just "give up". There are other possible explanations, though, that could potentially be relevant here. For example, the visual system is known to integrate visual signals over longer times, when contrast is lower. This may introduce additional non-linearities in the integration, which could affect the sensitivity, as already pointed out in the study by Bonnen et al. (2015).

      We have added the following passage in the discussion section:

      “In the lowest contrast conditions, all models we considered show a large and systematic deviation in the estimated perceptual uncertainty compared to the equivalent 2AFC task. Note that when considering synthetic data, we did not see such a discrepancy. Thus, the observed bias points towards additional mechanisms such as a computational cost or computational uncertainty, that are not captured by the current models at very low contrast. One reason for this could be that the assumption of constant behavioral costs across different contrast conditions might not hold at very low contrasts, because subjects might simply give up tracking the target although they can still perceive its location. Another possible explanation is that the visual system is known to integrate visual signals over longer times at lower contrasts [Dean & Tolhurst, 1986; Bair & Movshon, 2004], which could affect not only sensitivity in a nonlinear fashion but could also lead to nonlinear control actions extending across a longer time horizon. Further research will be required to isolate the specific reasons.”

      Reviewer #2 (Public Review):

      This manuscript develops and describes a framework for the analysis of data from so-called continuous psychophysics experiments, a relatively recent approach that leverages continuous behavioral tracking in response to dynamic stimuli (e.g. targets following a position random walk). Continuous psychophysics has the potential to dramatically improve the pace of data collection without sacrificing the ability to accurately estimate parameters of psychophysical interest. The manuscript applies ideas from optimal control theory to enrich the analysis of such data. They develop a nested set of data-analytic models: Model 1: the Kalman filter (KF), Model 2: the optimal actor (which is a special case of a linear quadratic regulator appropriate for linear dynamics and Gaussian variability), Model 3: the bounded actor w. behavioral costs, and Model 4: the bounded actor w. behavioral costs and subjective beliefs. Each successive model incorporates parameters that the previous model did not. Each parameter is of potential importance in any serious attempt to model human visuomotor behavior. They advertise that their methods improve the accuracy of the inferred values of certain parameters relative to previous methods. And they advertise that their methods enable the estimation of certain parameters that previous analyses did not.

      What were the parameters? In this context, the Kalman filter model has one free parameter: perceptual uncertainty of target position (\sigma). The optimal actor (Model 2) incorporates perceptual uncertainty of cursor position (\sigma_p) and motor variability (\sigma_m), in addition to perceptual uncertainty of target position (\sigma) that is included in the Kalman filter (Model 1). The bounded actor with behavioral costs (Model 3) incorporates a control cost parameter (c) that penalizes effort ('movement energy'). And the bounded actor with behavioral costs and subjective beliefs (Model 4) further incorporates the human observer's possibly mistaken 'beliefs' about target dynamics (i.e. how the human's internal model of target motion differs from the true generative model). This model allows for the true target dynamics (position-random-walk with drift = \sigma_rw) to be mistakenly believed to be governed by a position-random-walk with drift = \sigma_s plus a velocity-random-walk with drift = \sigma_v.
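      As a rough illustration of the simplest model in this list, the sketch below implements a scalar Kalman filter for a target that follows a position random walk. It is not the authors' code: it assumes a one-dimensional target, treats `sigma_rw` as the standard deviation of the random-walk increments and `sigma` as the standard deviation of the perceptual observation noise, and uses hypothetical function names and default values.

```python
import math


def kalman_track(observations, sigma, sigma_rw, x0=0.0, p0=1.0):
    """Filtered position estimates for a random-walk target observed with noise."""
    estimate, var = x0, p0
    estimates = []
    for y in observations:
        # Predict: a random walk leaves the mean unchanged and adds variance.
        var_pred = var + sigma_rw ** 2
        # Update: weight the new observation by the Kalman gain.
        gain = var_pred / (var_pred + sigma ** 2)
        estimate = estimate + gain * (y - estimate)
        var = (1.0 - gain) * var_pred
        estimates.append(estimate)
    return estimates


def steady_state_gain(sigma, sigma_rw):
    """Gain the filter converges to, from the scalar Riccati fixed point."""
    q, r = sigma_rw ** 2, sigma ** 2
    var_pred = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return var_pred / (var_pred + r)
```

      In this sketch a larger `sigma` lowers the steady-state gain, so the estimate follows the target more sluggishly. That dependence is what lets tracking behavior carry information about perceptual uncertainty; the richer models add motor variability, control cost, and subjective dynamics on top of it.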

      The authors develop each of these models, show on simulated data that true model parameters can be accurately inferred, and then analyze previously collected data from three papers that helped to introduce the continuous psychophysics approach (Bonnen et al. 2015, 2017 & Knoll et al. 2018). They report that, of the considered models, the most sophisticated model (Model 4) provides the best accounting of previously collected data. This model more faithfully approximates the cross-correlograms relating target and human tracking velocities than the Kalman filter model, and is favored by the widely applicable information criterion (WAIC).

      The manuscript makes clear and timely contributions. Methods that are capable of accurately estimating the parameters described above from continuous psychophysics experiments have obvious value to the community. The manuscript tackles a difficult problem and seems to have made important progress. Some topics of central importance were not discussed with sufficient detail to satisfy an interested reader, so I believe that additional discussion and/or analyses are required. But the work appears to be well-executed and poised to make a nice contribution to the field.

      The manuscript, however, was an uneven read. Parts of it were very nicely written, and clearly explained the issues of interest. Other parts seemed organized around debatable logic, making inappropriate comparisons to--and misleading characterizations of--previous work. Other parts still were weakened by poor editing, typos, and grammatical mistakes.

      Overall, it is a nice piece of work. But the authors should provide substantially more discussion so that readers will develop a better intuition for how and why the inference routines enable accurate estimation, and how the values of certain parameters trade off with one another. Most especially, the authors should be very careful to accurately describe and appropriately use the previous literature.

      Thanks for the generous overall assessment and the thorough review! We hope that we can address the points you raised in our revised manuscript with the answers to your specific comments below.

      To summarize, we have substantially revised the discussion section to clarify our reasoning and avoid potential misinterpretations of parts of our manuscript as a misrepresentation of previous work. We have also extended the introduction and the exposition of our models in the results section to help readers develop an intuition about the models and inference routines.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes a systematic biochemical analysis of UBX proteins in facilitating protein unfolding by the p97-UFD1-NPL4 (referred to here as the p97 complex). The p97 complex binds Ub and unfolds it to allow the ubiquitylated protein to be translocated into the p97 ATPase pore for unfolding. This paper demonstrates that UBX proteins are able to reduce the necessary ubiquitin chain length in order to support unfolding by p97. They explore this using ubiquitylated CMG helicase as a substrate. Removal of CMG helicase from replicated DNA is required for completion of DNA synthesis.

      First the authors demonstrate that the p97 complex only unfolds CMG with very long Ub chains. They then show that the high threshold for Ub is reduced when UBXN7, FAF1 or FAF2 are added. These proteins bind to both the p97 complex and Ub in substrates. This is then followed up in cells by demonstrating that removal of UBXN7 and FAF1 reduces CMG disassembly and is synthetic with reduced CMG ubiquitin ligase activity.

      The conclusion that human p97 requires UBX proteins to support unfolding/segregase activity when Ub chains are short would be strengthened by more precise characterization of the length of ubiquitin chains being studied, as the methods do not precisely determine the chain lengths and how this is overlapping with the number and location of primary ubiquitylation sites on Mcm7.

      Please see our reply above to essential revision point 2 (data in Figure 1-figure supplement 1 and Figure 2-figure supplement 3)

      The in cellulo results, while consistent with a contributing role for FAF1 and UBXN7 in disassembly of the CMG by p97, indicate that either other factors are required in cells or that p97 can disassemble CMG with relatively short chains in cells without the need for the UBX proteins. This needs to be reconciled with the proposed model.

      We now discuss on lines 444-450 that CMG disassembly in the absence of UBXN7 and FAF1 might be promoted by additional UBX proteins not characterised in this study, or else be due to extensive CMG-MCM7 ubiquitylation that bypasses the requirement for UBX proteins (as predicted by our data in Figure 1). Note that short ubiquitin chains on CMG-MCM7 in cells treated with p97 inhibitor need to be interpreted with caution, as it is likely that p97 inhibition lowers the pool of free ubiquitin in cells. This point is discussed on lines 444-445 of the revised manuscript.

      Reviewer #3 (Public Review):

      The ATPase p97 (Cdc48 in yeast) unfolds ubiquitinated substrates with the help of its heterodimeric cofactor UFD1-NPL4 (U-N). Using the previously established CMG helicase complex as model substrate in a fully reconstituted biochemical assay, Fujisawa and Labib show that p97-U-N can efficiently disassemble the helicase complex only when it is modified with multiple, long ubiquitin chains. This is in contrast to the yeast Cdc48-U-N complex, which disassembles helicase complexes carrying long or short (6-10 ubiquitin moieties) chains with similar efficiency. The authors demonstrate that the requirement of p97-U-N for long chains can be overcome by the presence of p97 cofactors of the UBA-UBX type, including UBXN7, FAF1, FAF2 and (much less so) UBXN1. They show that this reduction in the 'ubiquitin threshold' of p97-U-N by UBXN7, FAF1 and FAF2 requires their UBX domain mediating p97 binding. They further show that the UBA and UIM domains of UBXN7 contribute to its activity in the assay, whereas the UBA domain of FAF1 and FAF2 is dispensable. Instead, a coiled-coil domain preceding the UBX domain of FAF1 and FAF2 is required for their activity, and both the coiled-coil-UBX domain organization and its activity are conserved in the worm homologue UBXN-3. Using UBXN7 and FAF1 knockout cells, Fujisawa and Labib then demonstrate that UBXN7 is required for efficient CMG helicase disassembly during S phase, with a minor contribution of FAF1, whereas both cofactors possess redundant roles in mitotic CMG helicase disassembly. Finally, the authors show that UBXN7 and FAF1 double knockout cells are hypersensitive to the NEDDylation inhibitor MLN4924 and suggest that this reflects their importance for p97-U-N unfoldase activity under conditions of restricted ubiquitination activity.

      This manuscript describes the intriguing observation that the yeast and mammalian Cdc48/p97-U-N complexes have distinct requirements, at least in the in vitro assay used, with respect to the substrate´s ubiquitination state and to the presence of additional cofactors. While the concept of UBA-UBX cofactors assisting/stimulating Cdc48/p97-U-N activity is well-established, their link to ubiquitin chain length is novel and unexpected. The experiments are performed to a high technical standard, and the conclusions are mostly supported by the data. However, a shortcoming of the paper is that it remains entirely descriptive regarding the effect of the UBX proteins on the ubiquitin threshold, without providing mechanistic insights into their function or the molecular basis underlying the distinct thresholds.

      1) It remains unclear if the failure of p97-U-N to disassemble the helicase complex carrying short ubiquitin chains reflects impaired binding, priming or translocation of the substrate. It should be straightforward to test if the UBA-UBX cofactors simply stabilize the p97-U-N-substrate complex.

      As shown in previous studies, human UFD1-NPL4 bind stably to p97 in the absence of UBX proteins (our new data in Figure 3-figure supplement 2D illustrate this).

      The distinct domain requirements for UBXN7 (UBA, UIM, UBX) and FAF1/FAF2 (coiled-coil-UBX) suggest different mechanisms of stimulation, which should be discussed in more detail.

      We discuss further the roles of UBXN7 and FAF1/FAF2 on lines 533-548.

      The additive defects of the UBXN7 and FAF1 double knockout cells could indicate either redundant functions (as the authors propose) or synergistic function of both cofactors. To that end, the authors could test if UBXN7 and FAF1 can bind simultaneously to the same p97-U-N-substrate complex and if they act synergistically in helicase disassembly, e.g. at limiting cofactor concentrations.

      Previous studies have found that UBXN7 binds to p97 and UFD1-NPL4 with a 1:6:1 ratio and the same is true for FAF1, without any evidence of both UBXN7 and FAF1 binding to the same p97-UFD1-NPL4 complexes (Hanzelmann et al., 2011). Correspondingly, we did not observe any synergistic effect of FAF1 with UBXN7 upon the disassembly of ubiquitylated CMG by p97-UFD1-NPL4, when comparing reactions with a single UBX protein or reactions with both (our unpublished data).

      2) Having all purified proteins at hand, the authors should test which component of the system causes the elevated ubiquitin threshold of mammalian p97-U-N, by combining yeast Cdc48 with mammalian U-N and vice versa, etc.

      We thank the reviewer for this very interesting suggestion. The data are presented in Figure 3, showing that human UFD1-NPL4 and yeast Ufd1-Npl4 set the ubiquitin threshold for their cognate unfoldase enzymes.

      Can yeast Ubx5, which is a clear homologue of UBXN7, substitute for the mammalian UBA-UBX cofactors?

      This was also an interesting suggestion – we tested Ubx5 and didn’t see any stimulation. We didn’t include the data as we lack a positive control for Ubx5 activity.

      3) The authors emphasize that mammalian p97-U-N in the absence of UBA-UBX cofactors requires long ubiquitin chains for activity. However, they should consider the possibility that the critical property is chain topology, rather than chain length. There is evidence that p97-U-N prefers substrates with branched chains (see PMIDs 28512218, 29033132), and multiple ubiquitin chains on the helicase substrate may mimic those.

      We thank the reviewer for raising this important point and we now cite the two papers mentioned above, on lines 171 and 177.

      In the revised version of the manuscript, we characterise carefully the ubiquitin chains that are formed under the various conditions used (Figure 1-figure supplement 1). Importantly, we also show that human p97-UFD1-NPL4 can disassemble highly ubiquitylated CMG, regardless of whether there are several ubiquitin chains or just one attached to CMG-Mcm7 (Figure 1-figure supplement A+C; Figure 2-figure supplement 3A).

      Moreover, we also show that human p97-UFD1-NPL4 is comparable to yeast Cdc48-Ufd1-Npl4 in being able to disassemble CMG that is highly ubiquitylated with ‘K48-only’ ubiquitin that cannot form mixed chain linkages (Figure 2-figure supplement 3B).

      These data indicate that p97-UFD1-NPL4 can disassemble heavily ubiquitylated CMG complexes with long K48-linked ubiquitin chains on CMG-Mcm7, regardless of the number of chains and regardless of the presence of other chain linkages (in addition to K48-linked chains).

      It appears that worm CDC48-U-N in the absence of UBXN-3 cannot efficiently disassemble substrate carrying even long chains (Fig. 3 - supplement 2). The authors should discuss this finding in the context of their ubiquitin threshold model.

      This is an interesting point, suggesting that the threshold of C. elegans CDC-48-UFD-1-NPL-4 is even higher than that of human p97-UFD1-NPL4, in the absence of UBX proteins. However, we think that this issue is beyond the scope of our manuscript and likely requires structural biology to provide a definitive explanation. Our manuscript just uses the C. elegans enzymes to make one simple and clear point – namely that the essential role of the coiled coil domain of human FAF1 is conserved in its worm orthologue UBXN-3.

    1. Author Response

      Reviewer #1 (Public Review):

      This work by Wei-Jia Luo and colleagues elegantly employs in vitro and in vivo models to demonstrate that within the mouse liver, macrophages respond to lipopolysaccharide (LPS) by releasing active IL-12 (IL-12p70), which is a heterodimer of IL-12p35 and IL-12p40. They observed that the availability of "free" IL-12p35 to this heterodimerization process is governed by the molecular chaperone HLJ1. In response to LPS, HLJ1 separates homodimerized IL-12p35 into monomers, which then can heterodimerize with IL-12p40 to form active IL-12p70. This active IL-12 is released from macrophages in the liver, which then act on neighboring natural killer T cells to release interferon gamma. This interferon gamma circulates systemically and is responsible for mortality in a mouse model of endotoxic shock.

      Overall, this work is mechanistically compelling and demonstrates a novel multicellular inflammatory pathway that contributes to death in a murine model of endotoxic shock. However, it is unclear if the observed pathway is limited to this highly reductionist model, or if it applies to models that better approximate the complexity of human sepsis. Indeed, the long-standing concept of "cytokine storm" as the major mediator of sepsis has largely failed to yield benefits in clinical trials. These numerous and repeated translational failures cast doubt on the translational validity of reductionist in vivo animal models of sepsis.

      We thank the reviewer for this affirmation. One of the major aims of our work is to identify a novel multicellular inflammatory pathway mediated by HLJ1 that contributes to endotoxic shock. We agree that although our understanding of cytokine storm as the major mediator of sepsis has made dramatic progress over the past decade, these findings have not yet translated into effective treatments. As the reviewer mentioned, almost all clinical trials targeting cytokine effects have failed, especially in the context of sepsis. Among several explanations, the appropriateness of in vivo animal models is a concern (Chousterman et al., 2017). Some approaches to treating cytokine storm aimed to target the direct tissue consequences of the inflammatory cascade, such as those on the blood vessels (London et al., 2010). Another possible strategy was to target signaling that promotes cytokine synthesis and secretion (Maceyka et al., 2012). It may be feasible to quell the cytokine storm after infection by targeting upstream signaling, and reducing cytokine synthesis and secretion is a valid alternative to direct cytokine antagonism (Chousterman et al., 2017). Furthermore, in this study we found that Hlj1−/− mice showed reduced IFN-γ and improved survival when treated with daily systemic antibiotics after CLP surgery (Figure 6), indicating that targeting cytokine storm in combination with antibiotics provides a promising therapeutic strategy to treat sepsis. Taken together, we think that an HLJ1-targeting strategy might be a potential therapy for cytokine storm-associated sepsis. We emphasized and discussed this concept in the Discussion of our revised manuscript (Page 19, line 441-453).

      We appreciate that reviewer #1 and the other reviewers raised the same issue. We have worked carefully to respond to the comments point by point below.

      This raises several specific concerns with regard to the model used by the investigators:

      (1) The authors use a massive dose of LPS that rapidly leads to the death of mice in 24 hours. This massive and rapid mortality is not consistent with human sepsis, which is a more crescendo course with a mortality of ~30%. Indeed, when the authors used a more clinically-relevant model of mild endotoxemia, HLJ1 appeared to have no impact on mortality (Figure 1A).

      Thank you for the comment. Indeed, because we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS in the subsequent experiments based on this clear and significant phenotype. We also recognize the importance of administering low doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dose used to induce TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS, following the editor’s suggestion. Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS treatment. The data are shown in Figure 1C and D of the revised Figure 1 (Figure 1).

      ii. We also performed ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C of the revised Figure 2 (Figure 2).

      iii. Taken together, these results indicate that the effect of HLJ1 deletion in reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We described and discussed the results in the revised manuscript (Page 6, line 134-141; Page 18, line 423-437).

      (2) LPS is a model of endotoxemia, not a model of sepsis. Accordingly, it is unclear if the protective benefit of blocking IL-12 will similarly be seen as a live-infection model of sepsis, in which inflammatory signaling may be necessary for pathogen clearance.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. This issue was also raised by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability than other sepsis models, it indeed cannot represent actual sepsis; it rests on the notion that the host’s response to bacteria, rather than the pathogen itself, leads to mortality and organ failure (Deitch, 2005). Therefore, following the reviewers’ suggestion, we additionally performed a live-infection model of sepsis, cecal ligation and puncture (CLP), which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 in sepsis. We found that IFN-γ expression was lower in the liver and spleen of Hlj1−/− mice than in Hlj1+/+ mice (Figure 6A and B). We analyzed serum markers of organ dysfunction, and Hlj1−/− mice showed lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histological level after CLP surgery, and Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 6D). We further found that Hlj1−/− mice showed significantly improved survival compared to Hlj1+/+ mice when the mice were treated with systemic antibiotics (Figure 6E). Taken together, we demonstrated that HLJ1 deletion attenuates CLP-induced sepsis with down-regulated IFN-γ, and concluded that the benefit of blocking IL-12 and HLJ1 can similarly be seen in a live-infection model of sepsis. The result is shown below (revised Figure 6). The corresponding result was also added to the revised manuscript (Page 11-12, line 268-286). Please see it as well as the above responses to other reviewers.

      Page 11-12, line 268-286 "HLJ1 deletion protects mice from CLP-induced organ dysfunction and septic death. To address the question of whether HLJ1 also regulates IFN-γ-dependent septic shock in a live-infection model, we performed CLP (cecal ligation and puncture) surgery, which more closely resembles clinical disease and human sepsis. CLP significantly induced transcriptional levels of IFN-γ in the liver of Hlj1+/+ mice compared to mice receiving sham surgery, while Hlj1−/− mice showed significantly lower IFN-γ mRNA than Hlj1+/+ mice (Figure 6A). This phenomenon was not restricted to the liver, since lower expression of splenic IFN-γ was also found in Hlj1−/− mice (Figure 6B). The CLP surgery resulted in serious renal and liver damage, while Hlj1−/− mice showed alleviated organ dysfunction with significantly lower serum levels of BUN, creatinine and AST (Figure 6C). H&E staining showed kidney injury at the histological level after CLP, while Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 6D). However, there was no significant difference in survival when comparing Hlj1+/+ and Hlj1−/− mice (Figure 6E). We hypothesized that severe bacteremia contributed to mortality in mice that did not receive any treatment, so we treated mice with systemic antibiotics. As a result, Hlj1−/− mice displayed significantly improved survival compared with Hlj1+/+ mice when mice received daily systemic antibiotics after CLP (Figure 6E). These results imply that an agent responsible for bacterial clearance can be combined with immune modulation such as HLJ1 targeting to improve the outcome of sepsis."

      (3) Finally, it is unclear if the findings are only relevant to mice, or if they also have relevance to humans.

      We agree that human studies are important, although some practical difficulties need to be overcome, for example cohort identification, individual variation, and clinical considerations. This is a limitation of our study, since our findings are based only on animal models and human cell lines. We further performed CLP experiments, which are more relevant to human sepsis, although this is not a true human study. These data have been added as Figure 6 of our revised manuscript (Figure 6). Based on the present results, we plan to initiate specific clinical studies in humans. For example, we plan to collect blood monocytes from critically ill ICU patients to see whether HLJ1 expression levels in monocytes are higher in patients with sepsis than in patients without sepsis. We also want to know whether HLJ1 expression levels in monocytes or in serum correlate with inflammatory markers such as C-reactive protein, procalcitonin, and lactate in sepsis patients, because we found that serum levels of HLJ1 correlated with IL-12 in mice. In our unpublished preliminary results, HLJ1 can be detected in the serum of patients with sepsis. This inspires us to investigate whether HLJ1 can serve as a diagnostic or prognostic marker in the future. We anticipate that these results will appear in our future publications. Thank you very much for your understanding.

      Reviewer #2 (Public Review):

      The authors show that HLJ1 converts misfolded IL-12p35 homodimers to monomers, which maintains bioactive IL-12p70 heterodimerization and secretion. In turn, this contributes to increased IL-12 activity, leading to enhanced IFN-gamma production and lethality in mice challenged with LPS to model sepsis.

      Strengths:

      • Huge and diverse dataset (e.g. in vivo, in vitro, single cell RNAseq, adoptive transfer etc.) with interesting findings that could be of relevance to the field.

      We deeply thank the reviewer for the affirmation. We hope our comprehensive dataset can provide novel insight of relevance to the field. Building on this information, we continue to investigate the molecular alterations underlying endotoxin-induced immune responses. We fully agree with the weaknesses raised by the reviewer, take them very seriously, and have revised the manuscript point by point. Thank you very much.

      Weaknesses:

      • The flow/narrative of the paper is very hard to follow. This may result from the fact that the order of presented results is a bit puzzling. Normally, one would add-in the cytokine results (now figure 3), after the survival curves in Figure 1. Furthermore, the flow cytometry data presented in Figure 4 is more or less a validation of the scRNAseq data presented in Figure 2 in another organ. Likewise, Figure 5 is sort of a validation of Figure 3 in another organ. The authors seem to jump from organ to organ, from in vivo to in vitro and vice-versa all the time which makes the paper extremely difficult to follow.

      We thank the reviewer for the valuable suggestion. We were also hesitant about this arrangement in our first submission. We rearranged our results so that the flow and narrative of the paper are easier to follow:

      1. We moved the result of Figure 3 to become Figure 2 so that the cytokine array results come after the survival curve results.

      2. The flow cytometry result presented in Figure 4 was moved to Figure 5 so that it comes after the sc-RNA sequencing result.

      3. The qPCR result of pro-inflammatory cytokines presented in Figure 5 was moved to Figure 2-figure supplement 1 so that it serves as a validation of the cytokine array in another organ.

      In addition, along with other suggestions from reviewers, we have rewritten the introduction and the discussion sections and reorganized the whole manuscript so that we can focus more on the important issues. All modifications and rearrangements can be checked in the revised manuscript with changes tracked. Please check our revised manuscript. Thank you for your kind suggestions.

      • Use of extremely high dosages of LPS.

      Thank you for the comment. This issue was raised by several reviewers and the editor. Because we observed that HLJ1 knockout mice could survive a high dose of LPS, we used 20 mg/kg LPS for the subsequent experiments on the basis of this clear and significant phenotype. We also recognize the importance of administering lower doses of LPS. To address this issue, we performed additional experiments and made the following revisions.

      i. Because 4 mg/kg is a common non-lethal dose for inducing TLR4 and IFN-γ signaling (Kunze et al., 2019; Malgorzata-Miller et al., 2016), we performed an additional experiment with 4 mg/kg LPS, following the editor’s suggestion. Hlj1−/− mice showed lower serum levels of BUN, creatinine and ALT, and thus less severe organ damage, than Hlj1+/+ mice after 4 mg/kg LPS injection (Figure 1C). H&E staining showed kidney injury at the histological level after LPS treatment, and Hlj1−/− mice showed less severe kidney injury than Hlj1+/+ mice (Figure 1D). The data are shown in Figure 1C and D (below) of revised Figure 1 (Figure 1).

      ii. We also performed ELISA and found that serum levels of IFN-γ were lower in Hlj1−/− mice than in Hlj1+/+ mice after 4 mg/kg LPS injection. The result is shown in Figure 2C (below) of revised Figure 2 (Figure 2).

      iii. Combined, these results indicate that the effect of HLJ1 deletion on reducing IFN-γ and alleviating organ injury is also seen during moderate endotoxemia. We described and discussed the result in the revised manuscript (Page 6, line 134-141; Page 18, line 423-437).

      • Much of the presented data is replication of previous work. For instance, neutralization of IFN-γ (e.g. Billiau et al., Eur. J. Immunol. 1987; Car et al. J. Exp. Med. 1994) and anti-IL-12 (e.g. Zisman et al., Shock 1997) has been shown to lower mortality in LPS models in mice.

      We thank the reviewer for the reminder, and we apologize that our unclear description led to a misunderstanding. To identify the novel role of HLJ1 in sepsis, we built our investigation on several well-established findings. Indeed, the roles of IFN-γ and IL-12 have been recognized in previous studies, and their neutralization has been reported to attenuate LPS-induced endotoxic shock. However, our study focused on the effect of HLJ1 deletion on the IL-12/IFN-γ axis and septic death. First, we observed that IFN-γ and IL-12 decreased after HLJ1 deletion during sepsis. We then used IL-12/IFN-γ neutralization and found that it improved survival in wild-type mice but not in Hlj1 knockout mice, suggesting the importance of HLJ1 in IL-12/IFN-γ-mediated mortality. Moreover, if the difference in mortality between genotypes disappears after IL-12 or IFN-γ neutralization, we can infer that HLJ1 contributes to mortality mainly through IL-12 and IFN-γ signaling. These ideas came from a previous study published in Cell (Ponzetta et al., 2019). The authors elegantly demonstrated the role of Csf3r in the IL-12/IFN-γ axis and subsequent tumor incidence by showing that IFN-γ neutralization alters the phenotype in wild-type mice but not in knockout mice. This rationale prompted us to perform a similar neutralization experiment to understand the precise role of HLJ1 in sepsis.

      • No true sepsis model is used, only LPS. This is important, as for instance neutralization of IFN-γ and IL-12 has been shown to improve outcome in endotoxemia before (see above), but had no effect on survival in more relevant sepsis models such as cecal ligation and puncture (e.g. see Romero et al., Journal of Leukocyte Biology 2010; Zisman et al., Shock 1997). Furthermore, IFN-γ is even proposed (and used on a small scale) as therapy in sepsis patients to reverse immunosuppression.

      We thank the reviewer for raising these critical issues and providing valuable suggestions. They were also mentioned by other reviewers. Although LPS-induced endotoxemia is a simple model with higher reproducibility and reliability compared to other sepsis models, it indeed cannot represent actual sepsis; it is based on the notion that the host’s response to bacteria, not the pathogen itself, leads to mortality and organ failure (Deitch, 2005). Therefore, we performed an additional model, cecal ligation and puncture (CLP), which resembles clinical disease and septic shock (Deitch, 2005), to confirm the importance of HLJ1 to human sepsis. Please see our revised Figure 6 (Figure 6) and responses to other reviewers above.

      In accordance with the previous result from Romero et al. showing that IFN-γ neutralization did not improve survival, we observed similar survival rates between Hlj1+/+ and Hlj1−/− mice after CLP. However, when Romero et al. treated mice with systemic antibiotics, IFN-γ knockout mice survived significantly better than wild-type mice (Romero et al., 2010). In the CLP model, it is possible that severe bacteremia contributed to mortality in an IFN-γ-independent manner in mice that did not receive antibiotics, so we treated mice with systemic antibiotics immediately after CLP. As a result, Hlj1−/− mice showed significantly improved survival compared to Hlj1+/+ mice when treated with systemic antibiotics after CLP surgery (Figure 6E), indicating that targeting the cytokine storm in combination with antibiotics is a promising therapeutic strategy for sepsis. The result is shown in Figure 6E (below) of revised Figure 6 (Figure 6). This suggests that an HLJ1-targeting strategy could be combined with antibiotics as a combination therapy in future clinical applications. We emphasized and discussed this concept in the Discussion of the revised manuscript (Page 18-19, line 441-453).

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript "CompoundRay: An open-source tool for high-speed and high-fidelity rendering of compound eyes", the authors describe their software package to simulate vision in 3D environments as perceived through a compound eye of arbitrary geometry. The software uses hardware accelerated ray casting using NVIDIA Optix to generate simulations at very high framerates of ~5000 FPS on recent NVIDIA graphics hardware. The software is released under the permissive MIT license, publicly available at https://github.com/ManganLab/eye-renderer, and well documented. CompoundRay can be extraordinarily useful for computational neuroscience experiments exploring insect vision and robotics with insect like vision devices.

      The manuscript describes the target of the work, realistically simulating vision as perceived by compound eyes in arthropods, and thoroughly reviews the state of the art. The software CompoundRay is then presented to address the shortcomings of existing solutions, which either oversimplify the geometry of compound eyes (e.g. assuming shared focal points), use an unrealistic rendering model (e.g. local geometry projection), or are slower than real-time.

      The manuscript then details implementation choices and the conceptual design and components of the software. The effect of compound eye geometries is discussed using some examples. The speed of the simulator depending on SNR is assessed and shown for three physiological compound eye geometries.

      I find the described open source compound eye vision simulation software extraordinarily useful and important. The manuscript reviews the state of the art well. The figures are well made and easy to understand. The description of the method and software, in my opinion, needs work to make it more succinct and easier to understand (details below). In general, I found relevant concepts and ideas buried in overly complicated meandering descriptions, and important details missing. Some editorial work could help a lot here.

      Thank you for the very positive feedback.

      Major:

      1) The transfer of the scene seen by an arbitrary geometry compound eye into a display image lacks information and discussion about the focal center/ choice of projection. I believe that only the orientation of ommatidia is used to generate this projection which leads to the overlap/ non-coverage in Fig. 5c. Correct? It would be great if, for such scenarios, a semi-orthogonal+cylindrical projection could be added? Also, please explain better.

      For clarification, CompoundRay allows for a number of projection modes from any 3D sampling surface to visualised 2D projections. This has now been made clearer with an updated Methods section “From single ommatidia to full compound eye” (lines 171-188), and also a clearer explanation of the display pipeline within the “CompoundRay Software Pipeline” section (lines 245-247).

      We note that Fig 5 is simply intended as an example of the extreme differences in information that can be provided by nodal (the current state of the art) and non-nodal imagers (as in biological systems). A user could indeed produce custom projections (as now noted in the future work section of the Discussion), such as semi-orthogonal+cylindrical projections, by modifying the projection shaders, but we do not feel that this adds substantially to the desired message of Fig 5, as currently all view images are generated using the same projection method, allowing them to be compared. Further to this, a semi-orthogonal+cylindrical projection would only serve to display these types of eyes and not be of significant use outside of this category of design. Rather, the utility of CompoundRay for research is now demonstrated by the inclusion of an entirely new example experiment (lines 394-467) (Fig 10) which compares artificial and realistic compound eye models in a visual tracking task.

      In addition, we note that specific references to the “orientation-wise spherical mapping” of images have been added to appropriate image captions (Fig 5 & 6).

      Finally, we have attempted to be more explicit about the way that 2D projection systems work within CompoundRay (lines 182-185).

      2) It is clear that CompoundRay is fast and addresses complex compound eye geometries. It remains unclear why global illumination models are discussed while the implementation uses ray casting to sample textures without illumination, which is equivalent to projection rendering, which runs fast on much simpler hardware. If the argument is speed and simplicity of writing the code, that's great, write it so. If it is an intrinsic advantage of the ray-casting method, then comparison with the 'many-cameras' approach sketched below should be done:

      In your model, each ommatidium is an independent pin-hole camera. Instead of sampling this camera by ray-casting, you could use projection rendering to generate a small image per ommatidium-camera, then average over the intensities with an appropriate foveation function (Gaussian in your scenario, but could be other kernels). The resolution of the per-camera image defines the number of samples for anti-aliasing, randomizing will be harder than with ray-casting ;). What else is better when using ray-casting? Fewer samples? Hardware support? Possible to increase recursion depth and do more global things than local illumination and shadows? Easier to parallelize on specific hardware and with specific software libraries? Don't you think it would make sense to explain the entire procedure like that? That would make the choice to use ray-casting much easier to understand for naive readers like me.

      Thank you for this feedback; we can see that it was misleading to include this in our previous Methods section. We have now reduced the discussion of global illumination models and moved it to the future work section at the end of the Discussion. We have also added a clarification to the end of this document that summarises this point, as it was raised by multiple reviewers (see Changes Relating to Colour and Light Sampling).
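
      The reviewer's comparison in point 2 can be illustrated with a minimal sketch. It is not CompoundRay's implementation or API: the scene is reduced to a hypothetical one-dimensional function of viewing angle, and the function names `sample_by_ray_casting` and `sample_by_many_cameras`, along with all parameters, are assumptions made for illustration. The first function draws ray directions from a Gaussian around the ommatidial axis and averages them; the second renders a small regular "image" and applies explicit Gaussian weights, as in the 'many-cameras' approach.

```python
import math
import random


def sample_by_ray_casting(scene, axis_angle, acceptance_sd, n_samples, rng):
    """Average scene samples along rays drawn from a Gaussian around the axis.

    Because the directions themselves follow the Gaussian, a plain mean of the
    samples estimates the Gaussian-weighted average of the scene.
    """
    total = 0.0
    for _ in range(n_samples):
        angle = rng.gauss(axis_angle, acceptance_sd)  # randomised ray direction
        total += scene(angle)
    return total / n_samples


def sample_by_many_cameras(scene, axis_angle, acceptance_sd, n_pixels, half_width):
    """Render a small regular 1-D 'image' and average it with Gaussian weights."""
    weighted = 0.0
    weight_sum = 0.0
    for i in range(n_pixels):
        # Pixel centres are spread evenly over [-half_width, +half_width].
        offset = -half_width + (2.0 * half_width) * (i + 0.5) / n_pixels
        weight = math.exp(-0.5 * (offset / acceptance_sd) ** 2)
        weighted += weight * scene(axis_angle + offset)
        weight_sum += weight
    return weighted / weight_sum


if __name__ == "__main__":
    # Hypothetical scene: a dark-to-bright edge at angle 0 (radians).
    def edge(angle):
        return 1.0 if angle > 0.0 else 0.0

    rng = random.Random(0)
    print(sample_by_ray_casting(edge, 0.0, 0.05, 20000, rng))
    print(sample_by_many_cameras(edge, 0.0, 0.05, 64, 0.2))
```

      In this toy setting both estimates approach the same value for an ommatidium looking straight at the edge. The practical differences the reviewer asks about (sample counts, randomisation, hardware support) lie in how each estimate is produced, which the sketch does not measure.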

      3) CompoundRay, as far as I understand, currently renders RGB images at 8-bit precision. This may not be sufficient to simulate the vision of arthropod eyes that are sensitive to other wavelengths and at variable sensitivity.

      Thanks for pointing out this easy-to-miss implementation detail. Indeed, you are correct that the native output is at 8-bit level as is standard to match display equipment. However, we note that the underlying on-GPU implementation operates at a 32-bit depth, so exposing this to the higher-level Python API should be possible, which could then be used as you suggest. We view adding enhanced lighting properties including shadows, illumination and higher bit depths so as to better support increased-bandwidth visual sensor simulation as future updates which we have now outlined in the Discussion (line 549-553).
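
      To make the bit-depth point concrete, the following is a minimal sketch of what is lost when a floating-point intensity is reduced to an 8-bit display level. The function `to_8bit` is hypothetical and is not part of CompoundRay's Python API; it only assumes intensities normalised to the range 0 to 1.

```python
def to_8bit(intensity):
    """Map a float intensity in [0, 1] to an 8-bit display level (0-255)."""
    clamped = min(max(intensity, 0.0), 1.0)  # out-of-range values saturate
    return round(clamped * 255)


if __name__ == "__main__":
    # Two distinct float intensities closer together than one 8-bit step (1/255).
    a, b = 0.5020, 0.5030
    print(a != b, to_8bit(a) == to_8bit(b))
```

      Two intensities that differ by less than one display step collapse to the same 8-bit level, so a simulated sensor that needs finer contrast resolution would have to read the higher-precision values before this quantisation step.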

      Reviewer #2 (Public Review):

      In this paper, the authors describe a new software tool which simulates the spatial geometry of insect compound eyes. This new tool improves on existing tools by taking advantage of recent advances in computer graphics hardware which supports high performance real-time ray tracing to enable simulation of insect eyes with greater fidelity than previously. For example, this tool allows the simulation of eyes in which the optical axes of the ommatidia do not converge to a single point and takes advantage of ray tracing as a rendering modality to directly sample the scene with simulated light rays. The paper states these aims clearly and convincingly demonstrates that the software meets these aims. I think the availability of a high-quality, open-source software tool to simulate the geometry of compound eyes will be generally useful to researchers studying vision and visual behavior in insects and roboticists working on bio-inspired visual systems, and I am optimistic that the described tool could fill that role well.

      Thank you for the positive feedback.

      As far as weaknesses of the paper, the most major issue for me is that I could not find any example of why the additional modeling fidelity or speed is useful in understanding a biological phenomenon. While the work is technically impressive, I think such a demonstration would increase its impact substantially.

      An example experiment has been added as requested.

      I can identify a few more, relatively minor, weaknesses: the software tool is not particularly easy to install but I think this is due primarily to the usage of advanced graphics hardware and software libraries and hence not something the authors can easily correct. In fact, the authors provide substantial documentation to help with installation.

      Indeed, we have tried to ease installation as much as possible by providing detailed documentation. This has been updated since initial submission and has proven sufficient for multiple users. We have looked into dockerising the code but, as correctly identified by the reviewer, there are significant challenges due to proprietary hardware and its drivers.

      Another weakness of the tool, which the authors might like to address in the paper, is that there are some aspects of insect vision and optics which are not directly addressed. For example, the wavelength and polarization properties of light rays are hardly addressed despite extensive research into the sensation of these properties. Furthermore, the optical model employed here is purely ray based and would not allow investigating the wave nature of light which is important for propagation from the corneal surface to the photoreceptors in many species.

      Indeed, it is correct that the current implementation does not allow such advanced light modelling features, but as our initial aim was to allow arbitrary surface shapes, this was considered beyond the scope of this work. However, we have added a short description of extensions that the method would allow without significant architectural changes, which include many of those listed by the reviewer. As the renderer simulates light as it reaches the lens surface, it is hoped that further works will be able to use this natural boundary between the eye surface and its internals to build further computational models that use the data generated in CompoundRay as a starting point to then simulate inside-eye light transport.

    1. but before we do that let me talk about something that's even more fundamental um and helps us to understand the progression of thinking through those four schools to the what's 00:42:10 usually considered the most sophisticated the madhyamaka school um and that is the distinction which is really important between existence and intrinsic existence 00:42:23 and the ex and the distinction between no existence and no intrinsic existence so this is these distinctions um if one doesn't fully comprehend the the 00:42:37 madhyamaka system uh not fully comprehend but have some idea of the of the uh madhyamaka system one then usually is not able to make these distinctions so 00:42:49 let's talk about them for a moment um so existence um we when we talk about existence we talk about our ordinary understanding of what's real okay that things are 00:43:03 objects uh things are you know they may be in relationship but what's in relationship are two different distinct objects or entities that are in relationship and that's kind of our normal understanding of existence 00:43:15 so lacking inherent existence or intrinsic existence begs the issue to understand what is intrinsic existence okay and that's the 00:43:27 object of negation for the buddha for nagarjuna and for all those following in this tradition of nagarjuna the uh the madhyamaka school and so 00:43:39 that's not so easy to wrap our heads around uh what is intrinsic existence in a way it's so close that we miss it you know it's it's a little bit like you know 00:43:51 staying in a in a new hotel room in a new city waking up and looking for your glasses and you can't find them and then realizing that they're already on your face and so 00:44:05 intrinsic existence is things existing independently things existing uh through relationship um things not not things existing dependently not independently 00:44:19 and so if we look at dependence now we can look at that at several levels and the more obvious levels 
you've mentioned that carlo is cause and effect causality okay but there are also more uh 00:44:33 subtle levels of dependence that the buddha and nagarjuna talk about and are real central to the philosophy so the second level is the relationship between whole and parts and parts to whole it 00:44:46 goes both ways okay that's a a a little bit you know another level if you will of of dependence uh in the particularly you know highlighted by nagarjuna and 00:44:58 then the third level which is the most uh subtle level the subtlest level which is really what we have to start to understand because the opposite of that is this independent or intrinsic 00:45:10 existence okay so this third level we call dependence through designation or sometimes called dependent designation but it's dependence through designation 00:45:22 it's a type of naming or labeling so for example barry we label or name barry my parents gave this name to barry based on a body 00:45:34 okay maybe a little tiny infant body at that time right and also uh in terms of maybe some kind of behaviors or you know how they thought this emotional structure is for this little baby right 00:45:47 he's very calm or he's very you know he's acts out a lot he's very active or you know all those things so upon all that a name is placed in this case barry okay 00:45:59 so that relationship of you know dependence through designation is really what nagarjuna is talking about when we talk about dependence um and so that's very uh 00:46:11 important to understand so the opposite of that coming back to understanding this inherent or intrinsic existence there are many words in english we use synonymous for 00:46:23 ranging not existing intrinsically or inherently or independently or from its own side those are all synonyms um to the tibetan 00:46:36 terminology that i just mentioned um so when people don't have a good appreciation for intrinsic existence and you say then so the second there were two comparisons 00:46:53 
the second comparison is uh non-existence and not inherently existent so when when when when nagarjuna says no inherent existence what often people interpret is no 00:47:07 existence at all and they fall into a nihilism that nothing exists at all so they haven't fully appreciated this notion of um intrinsic existence so they're throwing the baby out with the 00:47:20 bathwater right when we're throwing out or negating uh intrinsic existence that they don't quite understand what that really means they think it's all of existence and therefore they you know think that nothing exists they throw the 00:47:33 baby out with the bathwater so that's that's okay can i interject something before you go ahead and you you you promised us before uh the four schools before uh but but can i 00:47:44 can i make a comment here um of course about you to say because this is free flow so yeah yeah so we you know we gave the title uh 00:47:56 what is real to this uh to this i that seems to me um that's exactly that distinction that that you you made between existence 00:48:09 and intrinsic existence um inherent existence it's a it it's it's uh it's idea that that i found central and and and 00:48:22 essentially essentially useful for me for for the following reason first of all um i mean the notion of reality the notion of existence here are close i mean what what exists is what is real what is that i want to say a couple of things one is 00:48:40 that um we make a distinction between illusory and real in our everyday life uh which it's well founded i mean if i if i see 00:48:53 the chair and there's a mirror there and i see a chair on the other side of the mirror there's a precise sense in which the chair on the other side of the mirror is not real while this chair is real 00:49:06 um this distinction has a meaning because i can sit on the chair i can touch that one but i cannot sit on that and touch that one but 00:49:18 then we realize that some aspects of what is illusory in 
the chair in the mirror also are shared by the chair which i just called real which is also illusory in 00:49:31 some other sense um for instance uh the fact of being a chair uh it's uh cut out and back on so i missed you up until now please could you repeat it oh 00:49:44 uh for where for where did you be speak uh when you were saying this distinction between existence and inherent existence and non-existence and non-inherent existence is 00:49:56 very helpful uh and then after that i lost you yeah i wanted to um make a couple points one is that uh we use a distinction between illusory and real in everyday life for instance we say that 00:50:10 a chair but then i was saying of course then um through science uh we realized that there are illusory aspects in the chair which i just called real as well 00:50:30 but then one is tempted and that's um to say all right so there are many illusory aspects of that chair but there is a a more fundamental level in which uh 00:50:45 there is a description of what is going on there which is a real one and eddington uh made it very very vividly in a well-known uh distinction between the scientific table 00:50:57 and the everyday table when he says look i have two images two tables there there's a table at which i eat which is solid and then there's a table which i view with my scientific eyes which is made by atoms 00:51:09 uh and is not solid there's a lot of emptiness of of not emptiness in nagarjuna's sense completely different sense i i've heard that that emptiness is 99.9 to the 12th 00:51:20 power based in the atom is that right yes yes but that's of course not nagarjuna's emptiness that's just the lack of presence of atoms yeah um and eddington says and people use that 00:51:34 by saying the the the the chair of my uh the chair which i see the solid one is illusory the real chair is the atoms uh this way of using the notion of real and the 00:51:49 notion of um of uh existence so what exists is the atoms uh is dangerously misleading that's what 
00:52:01 i uh because uh it uh um it pushes us to try to resolve the relational and illusory aspect of reality that we see 00:52:15 in terms of some basic fundamental physical reality from which to derive it or in western subjective idealism 00:52:28 in terms and its derivation in terms of some sort of uh fundamental mind or fundamental subject which is a real existing entity 00:52:41 the cartesian mind that is certain of existing itself um or the kantian subject or even the the the fundamentality of the perception 00:52:53 itself in husserl uh and in phenomenology so there is this western need to anchor um the uh what we mean by real on something final 00:53:07 so uh to to realize that there is dependence but then there is some basic grounds on which everything builds up on which to uh on which to sit and this is what i take emptiness 00:53:23 the notion of emptiness nagarjuna's notion of emptiness to be useful uh to to get rid of this urge of finding beyond the uh 00:53:35 the illusory aspect of the world a a basic level which is not um uh real in in in the uh 00:53:47 in the sense of uh uh of of uh uh in which this chair is is real compared to the uh to the chair uh in the mirror but but really the fundamental way so the the the bottom line of the story the 00:54:02 the solid terrain on which to anchor the ultimate um uh uh the end point of the line of dependence the line of dependence ends to some point that's what is real 00:54:15 and and what is this nagarjuna is that that's the wrong question i mean uh it's not only that the chair the table is empty because i can understand it's something else but it's 00:54:26 also that something else is also empty because i can understand it's something else until the point in which there is this emptiness itself it's a it's empty because we shouldn't take it as a 00:54:40 as a fundamental sort of metaphysical principle on which to ground all the rest so this putting this this is yeah just putting this in slightly different 
terminology emptiness is where it allows functionality emptiness is the lack of any kind of essence even on a you know atomic level and i agree with you what you said 00:55:04 that's i think very true um right and this is a look at when we look at the chair versus the reflection of the chair in the mirror it gets a little more complicated because both of them of course lack any 00:55:17 independent existence both okay they're both empty uh in terms of shunyata having said that the metaphor that the buddha used he gave about 10 different 00:55:29 metaphors for you know something to be illusory and one of the important ones that he used was reflection you know he used the reflection of the moon or the full moon in in the still 00:55:41 water that it looks like the moon but in fact of course it's not it's a reflection he used such things as water in a mirage sound of an echo and you know things 00:55:55 like that to illustrate okay now um let me mention two experiments if i may and you correct me where i'm wrong i'm a 00:56:07 pop physicist from the new york times okay um and one is the uh the thought experiment of erwin schrödinger okay the so-called schrödinger's cat paradox 00:56:21 or thought experiment and you have double steel box in which you have a cat there's no doors no windows right and you have a vial of very powerful acid that's 00:56:33 connected to a radioisotope the half-life of the isotope is the same duration as the duration of your experiment your thought experiment so the chance of the cat so if the radioactive material 00:56:46 decays 50 percent chance it you know somehow pulls a lever and the acid spills killing the cat if that radioisotope does not decay there's no spillage of the of the 00:56:59 of the acid and the cat remains alive so quantum physicists call this superposition where the cat is both alive and dead when you crack open this steel box 00:57:13 then um you observe what's inside and then the cat is either dead if the radio 
isotope you know decayed and knocked over the acid or 00:57:25 it's alive it didn't okay and it's it's either or whereas when you can't observe it it's both it's superposition okay second is the double slit you know you you shoot these electrons or photons you 00:57:40 know through two slits in a metal thing and then you have a screen behind and you look at the the pattern and if you have a little camera observation device at the slit level of the slits observing 00:57:52 you find a pattern below on the back on the screen that suggests what passed through the splits were particles whereas if you remove the observation device you have an interference pattern 00:58:05 suggesting what went through this list were waves okay so these two experiments at least in my very uh you know superficial understanding tell us that observer dependence is very 00:58:18 important in terms of reality okay that whether or not there is or isn't or or maybe you can what type of observer you know presence there is very much influences and determines what's real 00:58:31 and so that then uh jumps into the four you know buddhist schools of philosophy and if we go from the so-called least sophisticated up the third one would be the one you alluded to that's somewhat 00:58:45 similar to bishop barkley in the west and other idealists that say that everything is consciousness everything is mine and things that seem to be solid out there in an external reality are nothing more than projections of our 00:58:58 mind and that's actually a very sophisticated philosophy it's a very sophisticated philosophy one of the things it starts to do is it breaks down this notion of a solid external reality 00:59:10 okay but it's con it's it's critique as you have you also mentioned is that it takes the mind you know to be somehow you know uh absolute or ultimate you 00:59:22 know existing and so then the highest if you will most sophisticated school of mediumica says well what the chidoma modulus the mind-only 
school says that's correct up to a point but the criticism is 00:59:36 there's no uh you know absoluteness about the mind either so then you end up with that you accept an external reality you accept a mind but both you know that is every existent thing uh exists 00:59:49 without having any uh exist in relationship without having any independence or objectivity um and so that's very roughly the at least the the the last two of the three buddhist schools the 01:00:03 third one is divided again into prasangika madhyamaka and svatantrika madhyamaka using tibetan terms that are borrowing from the sanskrit um and the prasangika madhyamaka is considered the most 01:00:16 sophisticated where nothing at all has intrinsic existence the whereas the uh svatantrika madhyamaka they say that some uh conventional reality does exist uh 01:00:30 from its own side having some essence uh so there's a little bit of a distinction in the debate there um so just wanted to to mention those things i'd like you to comment

      Kerzin differentiates between existence and intrinsic existence. Intrinsic existence is what the Buddha and Nagarjuna are trying to negate.

      Rovelli makes a good point about a prevalent attitude that science offers a truer perspective than common sense, while Nagarjuna is pointing out that even the scientific explanation is not the final one. For one thing, it implicitly depends on the existence of a reified self who is the ultimate solidified existing agent and final authority, which Nagarjuna negates with his tetralemma.

    1. The lawyer has at his touch the associated opinions and decisions of his whole experience, and of the experience of friends and authorities. The patent attorney has on call the millions of issued patents, with familiar trails to every point of his client's interest. The physician, puzzled by a patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.

      Very interesting, the way it describes different professions. Although there have been some approaches to the memex, none of them have been this universally usable.

    2. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

      Even with current Zettelkasten technology like Logseq, there is no way to create a trail and send that particular trail off to a friend. I wonder what the copyright laws would look like when it comes to sharing excerpts as part of annotated trails like this. Would it be covered under Fair Use? What would a file format or a renderer for this look like?
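
      One possible answer to the file-format question, as a rough sketch only: a trail could be a small JSON document listing ordered steps, each holding a source link, a short excerpt, and a marginal note. The class names `Trail` and `TrailStep` and their fields are hypothetical, invented here for illustration; they are not part of Logseq or any existing tool.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class TrailStep:
    """One stop on a trail (hypothetical structure, for illustration only)."""
    source_url: str   # where the excerpt came from
    excerpt: str      # short quoted passage
    note: str = ""    # the annotator's marginal comment


@dataclass
class Trail:
    """An ordered, shareable list of steps (hypothetical structure)."""
    title: str
    author: str
    steps: list[TrailStep] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts recursively
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Trail":
        data = json.loads(text)
        steps = [TrailStep(**step) for step in data.pop("steps")]
        return cls(steps=steps, **data)
```

      A friend receiving the JSON text could call `Trail.from_json(...)` and merge the steps into their own notes; a renderer would simply walk `steps` in order.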

    3. He can add marginal notes and comments, taking advantage of one possible type of dry photography, and it could even be arranged so that he can do this by a stylus scheme, such as is now employed in the telautograph seen in railroad waiting rooms, just as though he had the physical page before him.

      We have gotten away from written annotations for digital work and I'm not entirely sure it's a good thing. I want to think through the trade-offs of this.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Villalta, Schmitt, Estrozi and colleagues report their results on genome compaction in one of the most complex known viruses, the Mimivirus. This work will be of interest to a broad readership, and particularly to virologists and structural biologists. The authors describe a novel mechanism used by mimivirus to compact and package its 1.2 Mb dsDNA genome. In particular, the mimivirus genome is shown to be packed into magnificent cylinder-like assemblies composed of GMC-type oxidoreductases, presenting yet another remarkable case of enzyme exaptation. By using cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET), the authors determined the structures of such fibers in several relaxation states, which presumably represent different stages of nucleoprotein unpacking upon delivery into host cytoplasm. The authors also suggest (although do not directly visualize) that the lumen of the genomic fibers contains several viral enzymes, most notably, DNA-dependent RNA polymerase, which is necessary for cytoplasmic replication of the mimivirus. Overall, this is an important discovery, which further expands our appreciation of the "inventiveness" of viruses.

      We thank this reviewer for the positive and constructive comments. We now provide some additional data from unpublished follow-up studies, which we hope will help all reviewers assess the quality and reliability of our work.

      I am not an expert on helical reconstructions and cannot evaluate the validity of the models. Thus, my specific comments will focus on aspects of the work with which I am more familiar.

      1) In light of the presented results, it is reasonable to assume that GMC-type oxidoreductases of the mimivirus are very important for the formation of functional virions. However, in a previous study (PMID: 21646533), it has been shown that the genes encoding GMC-type oxidoreductases can be deleted from the virus genome (M4 mutant) without the loss of infectivity. The M4 virions were devoid of the external fibers decorating the icosahedral capsid, but the genome was still packaged. How do the authors reconcile these results with those presented in the present manuscript? This should be addressed in the Discussion section.

      In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers. We managed to extract the genomic fiber of M4 (the isolate without GMC oxidoreductases). The fiber also has a rod-shaped structure but protein composition analysis of the purified fiber shows that different proteins are involved in its assembly.

      We hope the reviewers will allow us to reserve this finding for a subsequent publication.

      2) The authors state that mimivirus encodes two GMC-type oxidoreductases (qu_946 and qu_143) and that both could be fitted into the electron densities. However, I could not understand whether the authors think that the fibers are heteroassemblies of both oxidoreductases or different fibers are composed of different proteins, or only one is used for fiber formation. Please clarify. In case you are not able to distinguish between the two homologs (e.g., due to limited resolution), state so explicitly.

      We cannot discriminate between the two GMC-oxidoreductases due to their close identity (69% identity, 81% similarity) and the resolution of the map. Yet we think that in most cells the qu_946 GMC-oxidoreductase is the most abundant at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the most abundant and, in that case, the genomic fiber is built using qu_143.

      3) I am slightly puzzled by the observed "ball of yarn". It is hard for me to imagine that a cylindrical container/fiber containing a continuous dsDNA genome could be bent or fragmented into bundles because this would break the protein-protein interactions holding the fiber together. In Figures 1C and S1, are these parts of the same fiber or multiple fibers coming out of one capsid? Related to this question - is there evidence (e.g., from qPCR) that Mimivirus carries a single copy of genomic dsDNA per capsid?

      We believe this reviewer should think in terms of packaging. The folded genome is packaged through two lipid membranes (the one lining the capsid interior and the one in the nucleoid) concomitantly with its wrapping by the protein shell ribbon. Thus, there is plenty of space in the nucleoid at the beginning of the packaging and the genomic fiber is gently folded inside. But as more genome needs to be packaged, this compresses the flexible fiber into the nucleoid until it is totally encased in the nucleoid. That also defines the size of the nucleoid in the icosahedral capsid. This tight packaging is exemplified in Fig 1A for instance or the AFM images of the nucleoid enclosed in P3 of this file.

      We provide a more general answer in the answers requested by the editor.

      We think that the entire genome can only be packaged in the capsid through its assembly within the protein shell. We also think the genomic fiber is progressively built on the genomic DNA while it progresses into the capsid, most likely by an energy driven packaging machinery. This process can be compared to bacterial pili assembly, except that pili are built on the surface of the cell, while the genomic fiber is built into a compartment, the nucleoid, forcing it to fold in this compartment, which is only possible due to the high flexibility of the genomic fiber. Thus, the entire genome corresponds to ~40 µm of genomic fiber, which when folded as a ball can entirely fit into the nucleoid. The organization of the genome in a large “tubular structure” and its folding inside the nucleoid compartment has been previously reported by AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15), which the authors refer to as “highly condensed nucleoprotein masses about 350 nm in diameter within the inner membrane sacs of virions”, with the presence of tubular structure they refer to as “thick cables of the nucleic acid” (image P3 herein).

      4) The authors describe the interactions between the monomers in the dimer of qu_946 as well as between qu_946 and DNA. I would also like to see a brief description of protein-protein interactions between subunits within the same helical strand as well as between helical strands, which hold the whole assembly together (i.e., what are the contacts between green subunits as well as between green and yellow subunits shown in Fig 2C). The authors suggest that the shell "would guide the folding of the dsDNA strands into the structure" (L310). To support this statement, the authors could show the lumen of the fiber rendered by electrostatic potential.

      We thank this reviewer for these suggestions. An additional supplementary Table (Table S4) is now provided listing the various contacting residues in each genomic fiber map and for each GMC-oxidoreductase. The number of contacts obviously decreases in the relaxed structure, but even in the compact forms, we noticed there are relatively few intra- and inter-strand contacts, which may also explain the flexibility of the structure. We now provide a new figure 3 in which the lumen of the fiber is rendered by electrostatic potential for the Cl1a map and each of the two GMC-oxidoreductases.

      5) Please provide some background information on the distribution of GMC-type oxidoreductases in other families of giant viruses, so that it is clearer whether the described packaging mechanism is specific to mimiviruses or is more widespread.

      This is a central point, also linked to the question about M4. In fact, like the reviewers, we initially assumed that the GMC-oxidoreductases were essential. Now, we believe it might be premature to assume that GMC-type oxidoreductases are the only type of proteins that can be involved in the scaffolding of the Mimiviridae genomic fibers.

      If this reviewer still thinks this is essential to this manuscript we can provide a multiple alignment of the GMC-oxidoreductases of members of each clade upon request.

      Reviewer #3 (Public Review):

      Since it was presented to the scientific community as a viral entity, mimivirus has the unlimited capacity to cause surprise and admiration. In this manuscript, Villalta, Schmitt, Estrozi, et al. and Abergel present how the mimivirus gigantic genome is organized into the virion. The authors succeeded in developing a protocol to trigger virus genome uncoating followed by genome-associated proteins purification. The presented data indicates that a helical shield composed of two GMC-type oxidoreductases is associated with the mimivirus genome, named genomic fiber. By cryo-EM, and cryo-tomography different forms and stages of the genomic fiber were detailed described, indicating the dynamics of fibers conformational changes, likely related to genome packing and uncoating during the virus replication cycle. In-depth analysis of a substantial number of individual virus fibers revealed that the mimivirus genome is folded and organized inside the aforementioned helical shield, which seems to be novel among giant icosahedral viruses. Proteomics in association with image analysis indicates that mimivirus packed genome forms a channel, which accommodates key enzymes related to early phases of the replication cycle, especially RNA polymerase subunits.

      I must disclose that I am not an expert on structural virology and proteomic analysis. Therefore, I don't feel I can contribute to the improvement of this kind of analysis. That said, I congratulate the authors for their efforts to make the manuscript story understandable to nonexperts.

      We are grateful to this reviewer for these positive comments.

      I have a few suggestions and comments:

      1) Please consider the "nucleocapsid" concept during genomic fiber presentation. I believe it fits in;

      We fully agree and this was why we referred to APBV-1. Obviously, it was not clear and we now explicitly use the word “nucleocapsid” in the text.

      2) The "ball of yarn" analogy is nice, but fig 1C shows several fibers unconnected (free) in one of their ends. I am wondering if it means that the genomic fiber is not a long-single structure covering the whole genome, but a bunch of several independent helical structures covering the whole genome and attached in such "ball of yarn". Like several threads connected. Could the authors clarify that please?

      In the “ball of yarn” structures, there are clearly breaks that give the impression of multiple fibers. Yet, these breaks are due to the multiple steps of the extraction, enrichment and purification treatment. The genomic fiber is built as a long (~40 µm) single structure folded in the nucleoid while it is loaded. As a result, it is tightly packed into the nucleoid and broken into fragments upon release due to the fragilizing treatment. As exemplified in the CryoEM image provided above (P9) on freshly opened capsids, these breaks appear to depend on the treatment. This reviewer could also look at the answer we provided to Reviewer 2 point 3 as this could help clarify how it is possible to package the genomic fiber and subsequently fold it into the nucleoid to the point where it is tightly packed and under pressure.

      3) Considering previously published data on proteomics of viral factories and transcriptomics of mimivirus: is there any temporal association between GMC-type oxidoreductases' peak of expression and genome replication during the viral cycle? what about RNA pol subunits? Are all those proteins highly expressed during the late cycle? or do they reach the peak concomitantly with genome replication? This information can support the discussion on the genome-fibers assembly during the cycle.

      We thank this reviewer for these suggestions. We have now added the time of expression of the proteins involved in the genomic fiber composition throughout the manuscript. We added explicit sentences in the main text both for the GMC-oxidoreductases and RNA polymerase subunits. The RNA polymerase as well as proteins involved in mRNA maturation are in the virion (Table S2 B) and studies by others demonstrate that early transcription takes place in the nucleoid once it is transferred into the host cytoplasm (Reference 24). We also provide the reviewers with a link to the expression data for the different mimivirus genes: http://www.igs.cnrs-mrs.fr/mimivirus/

      4) Taken together, data seem convincing to demonstrate that the virus genome is located inside the helical shield. However, I believe that the authors could better explain why we only see 20 kb fragments in the gel, including in the control (in Fig S2).

      We hope our answers to this comment will convince this reviewer.

      Fig S2 corresponds to a regular 1% agarose gel and not to a PFGE gel. This gel was simply intended to show that there is DNA associated with the genomic fiber, not to show the size of the DNA: the genomic fiber has been broken into pieces, so we do not expect very high molecular weight DNA. I must point out that when the DNA is extracted from Mimivirus capsids using standard kits and pipetting, it also migrates at the top of the gel (Lane 1 in Fig. S2), while it would likely appear as a smear above 20 kb on a PFGE. By contrast, when the viral particles are put into plugs prior to lysis, the genomic DNA migrates at the proper size, as shown in the publication from Boyer et al. 2011 (reference 31), showing the genome of Mimivirus is a linear genome migrating around 1.37 Mb (Fig 1, Panel B, Lane M1). In P9 of this letter, an image of a long (> 6 µm) and flexible fiber is presented.

      Reviewer #4 (Public Review):

      In the manuscript "The giant Mimivirus 1.2 Mb genome is elegantly organized into a 30 nm helical protein shield", the authors show that, when subjected to low pH stress, the Mimivirus particle releases 30nm-diameter filamentous assemblies. These filaments consist of a protein shell that envelopes the Mimivirus genomic DNA. The protein shell is composed of two GMC-oxidoreductases, the same protein that forms the long fibers emanating from the capsid of the Mimivirus.

      Overall, despite being interested in the subject, this scientist was left confused about several aspects of the paper described below. The presentation of the material is also confusing.

      We hope the answers and images we provide to all Reviewers on pages 2 to 12 herein will clarify the various points raised by this reviewer.

      1) The presented data do not allow the estimation of the amount of mimivirus genome organized into 30 nm diameter filaments. Hence, the title of the paper is misleading.

      The entire genome should be packaged in the genomic fiber. This was already observed by others, and we now provide a published AFM image of the nucleoid, extracted from Kuznetsov et al. J. Virol. 2013. See p9 of this letter.

      2) The filamentous structures are a result of extremely harsh treatment of the virus particle, which starts with a 1.5 hour-long incubation at pH 2. Do the filaments actually exist inside the virus particle as the title of the paper implies?

      The 1 h incubation at 30°C and pH 2 was only applied to recover the nucleoids (see material and method section “Nucleoid extraction”) presented in Fig S1A. Acidic treatment was never applied to produce the genomic fiber as we noticed it is sensitive to both temperature and acidic treatment. All steps of the extraction protocol were performed at pH 7.5 (section: “Extraction and purification of the mimivirus genomic fiber”). We must emphasize that the release of the genomic fiber can be seen at the very first step of the extraction protocol (protease treatment). The sample was also checked at each step of the protocol by negative staining TEM to assess the status of the genomic fiber. We had to optimize the protocol: a proteolytic treatment that was too mild opened too few particles, although the released genomic fiber was mostly compact, whereas a treatment that was too harsh opened all particles but left the genomic fiber mostly in the ribbon state. We had to compromise to get a decent amount of compact and relaxing structures to be able to perform the present work. We would like to stress that we could reproducibly obtain the genomic fiber from many preparations and that we could observe it with different virions (including M4), even using different protocols (only the one with the better yield is reported in the manuscript).

      In Figure 1B the genomic fiber can be seen inside a virion and is still encased in the membrane compartment. These structures were not reported in previous cryo-EM analyses of the virions. As said above, they were only reported by AFM studies of the mimivirus particles (Kuznetsov, Y. G. et al. Virology 2010; Kuznetsov YG et al. J. Virol. 2013, Fig 15). See p9.

      Or [might] these filaments [form during] host take over?

      Or [perhaps] these filaments [result from a harsh in vitro treatment] and have nothing to do with either?

      The first two questions can be answered with the help of cryoFIB tomography, which might be beyond the scope of a "paper revision". However, the properties of the two GMC-oxidoreductases in the presence and in the absence of genomic DNA must be examined in greater detail. Can these proteins, by themselves, form similar hollow filaments (or any filaments) when subjected to the same treatment as the virus?

      I personally find it difficult to imagine that such a complex structure could be an artefact of the treatment, for several reasons:

      • It is unlikely that simply putting the GMC-oxidoreductases together with DNA would result in a helical structure where the DNA is folded 5 times and internally lines the protein shell (extended data video 1 of one tomogram). It would be like crystallizing the proteins (in a heterogeneous sample) onto the folded DNA to form a helix with a hollow lumen. The crystallographic studies performed by others on the mimivirus GMC-oxidoreductase did not produce tubular structures either, and they reported 3 crystal forms. They overexpressed the proteins in E. coli and did not report such structures bound to DNA either.

      • Given the presence of compact and relaxed forms, once relaxed the helix cannot go back to a compact state passively by simply rewinding, suggesting that the relaxed forms are the result of decompaction of a constrained structure. This is also supported by the loss of DNA in the relaxed state Cl3. The last steps of unfolding correspond to the loss of one ribbon strand after the other.

      • The intra- and inter-strand contacts between chains are also scarce, supporting an active assembly of the structure. We now provide an additional supplementary Table S4 with the different contacts for the different states of the genomic fiber.

      3) Although the assignment of the qu_946 oxidoreductase to the corresponding cryo-EM density is correct (as the resolution is high enough), I am confused about the other oxidoreductase (qu_143). Where does it fit? Which structure does it form?

      We cannot discriminate between the two GMC-oxidoreductases due to their close identity (69% identity, 81% similarity) and the resolution of the map. Yet we think that in most cells the qu_946 GMC-oxidoreductase is the most abundant at the time of genome packaging (between 2 and 9 times more abundant, according to our proteomic study). In some cells, however, the second GMC-oxidoreductase could become the most abundant and, in that case, the genomic fiber is built using qu_143.

      Equally important, what is going on with the N-terminal 50-residue domain of qu_946? Is there a space for it in the cryoEM map? Is it disordered?

      The N-terminal domain is only present in the fibrils decorating the capsids. As illustrated in Fig S12, MS-based proteomics shows that the peptide coverage of the GMC-oxidoreductases differs depending on whether they compose the fibrils or the genomic fiber. The N-terminal domain is clearly covered when the fibrils (data not shown) or intact virions are analyzed and not covered when the analysis is performed on the genomic fiber. That is why we propose this N-terminal domain could be an addressing signal (see main text) and that a protease could be cleaving it in the case of the genomic fiber assembly.

      Main text: The proteomic analyses provided different sequence coverages for the GMC-oxidoreductases depending on whether samples were virions or the purified genomic fiber preparations, with substantial under-representation of the N-terminal domain in the genomic fiber (Fig. S12). Accordingly, the maturation of the GMC-oxidoreductases involved in genome packaging must be mediated by one of the many proteases encoded by the virus or the host cell.

      Indeed, there is no space to accommodate this domain, as it would prevent the interaction between the protein shell and the DNA and/or increase the diameter of the genomic fiber so much that it could no longer be accommodated in the nucleoid.

      4) The bubblegram analysis is not very convincing. The bubbles appear to correlate with the length or thickness of the structure - the long or overlapped structures form bubbles. The bubbles may not be due to the presence of DNA.

      The point is, as demonstrated by our structural studies, that the relaxed structure has lost the DNA. This is why bubbles cannot be seen in the relaxed broken fibers. On long fibers still in compact form, the DNA is visible in the structure and bubbles can be seen. Yet the evidence for the presence of DNA in the structure is also provided by the agarose gel of the purified genomic fiber and the cryo-EM structures. Bubblegrams are just one additional analysis that was provided.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are sincerely grateful to the reviewers for several key comments that led us to correct some mistakes and better appreciate how to put our findings in the context of recently published data. These changes undoubtedly improved the manuscript.

      Many other reviewer comments seem to equate chaperone binding with a functional chaperone role in de novo folding. These are not the same. Cytosolic chaperones presumably “sample” nearly every protein that is synthesized by cytoplasmic ribosomes. This does not mean that every such protein would misfold if even one of those chaperones failed to bind it. If we want to understand what chaperone mutations might cause human disease due to septin misfolding, for example, it will not be enough to catalog all the chaperones that bind septins. We have already done that. What will help is to understand which chaperones make functional contributions to septin folding and complex assembly. Our study is the first to experimentally address chaperone roles in de novo septin folding, period. We take responsibility for not being sufficiently clear about the goals of our work, and, to emphasize these points, we added one sentence to the Introduction and revised another.

      Another consistent criticism was that the use of the E. coli system, both in vivo and in vitro, limited our ability to gain insight into the folding of septins in eukaryotic cells and led to a “tessellated view”. For example, reviewers claimed that our model about translation elongation rates for Cdc12 was “based mainly on the E. coli system and bioinformatics analysis”. We disagree with this interpretation. Key evidence in support of our model comes from published data in yeast, specifically the much higher density of ribosomes on Cdc12 and the accumulation of ribosomes on the Pro-rich cluster near the Cdc12 N terminus. These are precisely the kinds of “more stringent analysis” in “authentic yeast” (to use Reviewers’ language) that we would have wanted to do to test our model, had they not already been done by others. Without specific suggestions, we struggle to imagine what other kinds of experiments the Reviewers have in mind, apart from a eukaryotic version of a reconstituted cell-free translation system, which Reviewer #1 admits “would be substantially difficult” and “time consuming”. While we are intrigued by the reconstituted eukaryotic cell-free translation system that was published last year (which we mentioned on lines 994-995) and look forward to exploring it in future studies, it is not commercially available and we agree that the amount of effort required to prepare it ourselves is unrealistic for the current study. Most importantly, we do not find in the critiques provided any specific reason why our E. coli-based experiments are intrinsically less “stringent” or “rigorous”.

      Accordingly, we think that, together with the results of multiple new experiments (detailed below), the extensive re-writing and re-ordering that we have done in the revised manuscript will be enough to better emphasize the importance and rigor of our findings and thus to address all of the Reviewers’ specific concerns.

      Reviewer 1 thought that our manuscript “does not even provide new information, since the involvement of CCT and the Hsp70 system is not novel” and thought that “the key finding of this manuscript is how chaperones are involved in the de novo folding of septins, which is not conceptually new because of previous findings, including those of the authors”. Reviewer #3 also stated that “the function of Tric/CCT in septin folding and assembly is well documented”.

      We were quite surprised at this reaction, since we dedicated a significant portion of the original manuscript (lines 68-76 and 319-322) to explicitly discussing the only other paper in the literature that specifically addresses the question of whether or not CCT is required for de novo septin folding. As a reminder, that paper explicitly stated that “it is unlikely that CCT is required to fold septins de novo” and “septins probably do not need CCT for biogenesis or folding”. With regard to involvement of the Hsp70 system, the only existing evidence in the literature on this subject is the aggregation of some septins in ssb1∆ ssb2∆ cells. Like the CCT study, that study did not distinguish whether this was a result of problems during septin synthesis and before septin complex assembly, or, alternatively, whether pre-folded and assembled septins were subject to disassembly, misfolding, and aggregation. Our experiments specifically test the fate of newly-synthesized septins prior to assembly in living cells. Our previous findings documented physical interactions between wild-type septins and multiple chaperones but did not address whether these interactions had any functional relevance. We previously reported functional effects of interactions between chaperones and MUTANT septins but, again, these studies did not address functional chaperone requirements for WILD-TYPE septins. While we did our best to highlight these points in the original document without devoting excessive amounts of text, we accept responsibility for not making these points sufficiently clear and to address this issue we added additional text, including the text quoted above, to the Introduction.

      While Reviewer #3 commented that the manuscript “is overall well presented”, Reviewer 1 thought that the manuscript was “complicated to read” with “no logical connections, just a list of many results” and mentioned that part of the difficulty was “that it contains many negative results”.

      In addition to reorganizing the manuscript, as suggested by the reviewers, we added more text at the beginning and end of nearly every section to even more explicitly state the logical connections between results. In our opinion, negative results of properly controlled experiments are valuable to the research community, and we do not understand what it is about negative results that makes them difficult to read about. Many of the extra experiments we performed were in anticipation of being asked to perform them by reviewers, some of which generated negative results. We are reluctant to remove negative results unless there is a more compelling reason. For example, to address another reviewer concern, we did remove the negative results with the Ydj1–Ssa2 compensatory mutants.

      Reviewer #2: “4) Figure 2: The labeling on the protein structure makes it seem like the exact region for Ydj1 and Hsp70 was experimentally identified, when it hasn’t.”

      We acknowledge that the first sentence of the figure legend (“the colored ribbon follows the color scheme in the sequences at right for overlapping β-aggregation, Ydj1 and Hsp70-binding sites”) could be misinterpreted, since only in the second sentence does it say “Sequence alignments show predicted binding sites”. We corrected this mistake, and added the text “Predicted chaperone binding sites” as the first words in the legend to this figure.

      Reviewer #2: “8) The authors confusingly jump back and forth between different Septins and different chaperone (Ssa1-4, Ydj1, Sis1, Hsp104). We would ask the authors to re-arrange the manuscript, collating all the yeast work in one section and bacterial work in another.”

      We re-arranged the manuscript and put all the yeast work in one section and all the bacterial work in another, with the exception of the studies of individually purified Cdc3 and Cdc12, which we put in between the yeast studies of the kinetics of de novo assembly and the yeast studies of post-translational assembly. Our reasoning is that the studies with the purified proteins demonstrate challenges with maintaining native conformations in the absence of chaperones and other septins, which flows naturally into the yeast studies asking about the ability of “excess” septins to maintain oligomerization-competent conformations in the absence of other septins and when we experimentally eliminate specific chaperones. All of the work actually manipulating E. coli genes/proteins is now together.

      Reviewer #3: “1. The co-translational binding of CCT to nascent polypeptide chains has been studied (Stein et al., Mol Cell 2019). While the authors indicate that septin subunits are engaged co-translationally, they do not comment which ones are interacting with CCT and at which state of translation. This information is crucial and should also be mentioned in the discussion section.”

      We are grateful to the Reviewer for bringing up this point, which we had overlooked. We hadn’t noticed that, in the end, only Cdc3 met the CCT confidence threshold to be included in the supplemental data of the Stein et al. paper. All septins co-purified with CCT in an earlier Dekker et al. proteomic study, so we strongly suspect that the failure of the other septins to meet the confidence threshold in the Stein et al. paper reflects the sensitivity of that assay, rather than a significant difference in how septin GTPase domains interact with CCT. We also hadn’t appreciated that, according to that study, the main sites in the Cdc3 GTPase domain bound by CCT and Ssb are the same. Hence our statement that Ssb bound to septins “earlier” during translation, and CCT bound “later”, was wrong. Instead, the overlapping Ssb and CCT site in Cdc3 turns out to be remarkably consistent with a conclusion from the Stein et al. paper, that CCT binds Rossmann-fold proteins like septins at sites where “early” beta strands have been translated and expose a chaperone-binding surface that later becomes buried by an alpha helix. We corrected our mistake in the text and in our model figure and added: (1) a new supplemental figure with predicted septin structures and a sequence alignment indicating where CCT and Ssb bound; and (2) text discussing the confidence thresholds for “calling” septin-CCT interaction, the Rossmann-fold binding, and how we interpret Ssb and CCT binding to the same site.

      Reviewer #3: “3. Figure 3: It is recommended to also follow Cdc10-GFP and Cdc12-GFP fluorescence. This will on the one hand generalize the presented findings and provide a direct link to other parts of the study (e.g. crosslinking analysis of Cdc10).”

      We carried out the requested experiment for Cdc12, using Cdc12-mCherry rather than Cdc12-GFP because of the formation of non-native foci that we observed with Cdc12-GFP. We also attempted to analyze Cdc10, using an existing GAL1/10-promoter-driven Cdc10-mCherry plasmid that we’d made a few years ago, but it did not behave as expected, with high expression even in the absence of galactose (not shown), which prevented us from performing the requested experiment. We have a Cdc10-GFP plasmid with the inducible MET15 promoter, but this promoter does not provide sufficiently low levels of expression in repressive conditions, so there would be too much expression at the beginning of the experiment for us to accurately follow accumulation thereafter. Instead, we tried the only other plasmid we had with the GAL1/10-promoter controlling a tagged septin: Cdc11-GFP. Above a certain threshold of expression, Cdc11-GFP formed unexpected cortical foci, but we were still able to perform the analysis and found a clear delay in septin ring signal in cct4 cells, providing the requested generalization to other septins, if not Cdc10.

      Reviewer #3 “5. Figure 4C: The finding that only ssb1 but not ssb2 knockouts have an effect on joining of free Cdc12-mCherry subunits into septin rings is puzzling. Similarly, Ssb1 largely acts co-translationally, while in this assay post-translational septin ring assembly is monitored. The authors need to comment on these two points.”

      We did not examine ssb2 knockouts, so we do not know to what the Reviewer is referring in the first point. If the Reviewer means that they are puzzled by the fact that we saw a phenotype in cells in which only SSB1 was deleted and SSB2 remained, we offer two explanations. As can be seen in the Saccharomyces Genome Database entry for SSB1 (https://yeastgenome.org/locus/S000002388/phenotype), there are at least a dozen known phenotypes associated with deletion of SSB1 in cells with wild-type SSB2. We even showed a very clear septin misfolding/mislocalization phenotype in Supplemental Figure 4D. Thus while our findings are new and provide novel insights into Ssb function, they are not unprecedented. The Reviewer is correct that most Ssb is ribosome-bound and thus Ssb1 “largely acts co-translationally” but ~25% of Ssb is not ribosome-associated (PMID: 1394434). Furthermore, the lack of a strong phenotype for ssb1∆ cells in our new kinetics-of-folding experiment (see below), plus the realization that Ssb and CCT both bind the same site in Cdc3, leads us to a new model: Ssb acts both co- and post-translationally in septin folding, but only the post-translational function is associated with a phenotype in ssb1∆ cells, because in that assay we drastically overexpress a tagged septin and thereby exceed the Ssb chaperone capacity that remains when we delete SSB1. This logic also explains the first ssb1∆ phenotype we saw, when overexpressing Cdc10(D182N)-GFP. In the kinetics-of-folding assay, on the other hand, tagged septin expression is much lower and reducing the amount of total Ssb by ~50% (via SSB1 deletion) likely does not compromise Ssb function in folding the tagged septin. We therefore removed our statement that “Ssb dysfunction leaves nascent septins in non-native conformations that are aggregation-prone and unrecognizable to CCT”, revised our model figure accordingly, and added new text and citations to explain our new model.

      Reviewer #3 “Additionally, they should test whether the appearance of septin ring fluorescence is slowed down in ssb1 mutants (as shown for cct4-1 mutant cells in Figure 3B).”

      We agree that slower septin folding in ssb1∆ cells is a prediction of our model, and we performed the requested experiment and include the results in our revised manuscript. The new data show that the appearance of septin ring fluorescence is not delayed in ssb1∆ mutants, which is easily explained by the ability of Ssb2 to chaperone the folding of the low levels of tagged septin that we express in these kinds of experiments (see above).

      Reviewer #3: “7. Figure 5G: The data is not convincing. This reviewer cannot detect a specific Cdc12 band accumulating in presence of GroEL/ES.”

      We re-ran the reactions with fresh reagents and this time ran the gel longer to reduce excess signal from free fluorescent puromycin and the bright Cdc10 bands. We now see a very clear band for full-length Cdc12 in the reaction with added GroEL/ES, fully consistent with our mass spectrometry results. We updated the figure with the new results.

      Reviewer #3: “Furthermore, the activity tests done for the chaperonin system are confusing (Supplemental Figure 7). The ATPase rate (slope!) of GroEL/GroES seems higher as compared to GroEL but according to the authors it should be opposite.”

      In our assays, the ATPase activity is so fast that for our “time 0” timepoint, much of it has already occurred by the time the reaction can be physically stopped and measured. In other words, the handling time is such that we can’t visualize what happened in the earliest stages of the reaction, where the rates could accurately be estimated as slopes. This is obvious from the fact that at time 0, the absorbance for the “GroEL alone” reaction is already more than twice the absorbance for GroEL+ES. We added clarifying text to the figure legend.
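
      The argument above can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that product accumulates along a saturating first-order curve and that nothing can be measured before a fixed handling delay; the rate constants, the delay, and the function names are hypothetical and are not taken from our assays. Under those assumptions, the faster reaction already shows a higher reading at the first measurable timepoint, yet its apparent slope afterwards is lower.

```python
import math


def product(t, k, p_max=1.0):
    """Product formed by time t for a reaction that saturates at p_max.

    Assumed model: first-order approach to a plateau, P(t) = p_max * (1 - exp(-k t)).
    """
    return p_max * (1.0 - math.exp(-k * t))


def apparent_slope(k, dead_time, window):
    """Slope between the first recordable timepoint and one window later."""
    start = product(dead_time, k)
    end = product(dead_time + window, k)
    return (end - start) / window


# Hypothetical rate constants (arbitrary units): one fast and one slow reaction.
fast_k, slow_k = 2.0, 0.5
# Hypothetical handling delay before the nominal "time 0" reading, and the
# interval over which a slope would then be read off the plot.
dead_time, window = 1.0, 1.0

fast_t0 = product(dead_time, fast_k)
slow_t0 = product(dead_time, slow_k)
fast_slope = apparent_slope(fast_k, dead_time, window)
slow_slope = apparent_slope(slow_k, dead_time, window)

print(f"reading at nominal time 0: fast={fast_t0:.3f}, slow={slow_t0:.3f}")
print(f"apparent slope afterwards: fast={fast_slope:.3f}, slow={slow_slope:.3f}")
```

      In this toy model the slope measured after the delay ranks the two reactions in the opposite order from their true initial rates, which is why the nominal time 0 readings are more informative than the slopes.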

      Reviewer #3: “The refolding assay using Rhodanese as substrate is also confusing: What is the activity of native Rhodanese? The aggregated Rhodanese sample seems to have substantial activity that is not too different from a GroEL/ES-treated one. From the presented data it is not clear to the reviewer to which extend GroEL/ES prevents aggregation and supports folding of denatured Rhodanese.”

      We thank the Reviewer for bringing this to our attention, because we mistakenly left out the values for native Rhodanese with the reporter. With regard to the aggregated Rhodanese, we failed to note that this sample contains urea. When the urea absorbance is subtracted, it is clear that the GroEL/ES-treated sample has higher activity. Furthermore, some native enzyme is likely still active within the aggregated sample, explaining the “substantial activity” that the Reviewer correctly notes. We corrected the figure and added clarifying text to the figure legend.

      Reviewer #3: “the study goes astray following aspects that does not seem relevant to this reviewer (e.g. the role of N-terminal proline residues for Cdc12 translation, Fig. 5E/F).”

      We acknowledge that we did a poor job of introducing the N-terminal Pro-rich cluster in Cdc12 with relation to our model of slow Cdc12 translation. Instead, we have revised and reorganized the manuscript to set up these experiments as a direct test of our model: if ribosome collisions on the body of the ORF drive mRNA decay, then decreasing the spacing of those ribosomes should exacerbate the problem, and eliminating the Pro-rich cluster (where published yeast data already show ribosomes accumulate) is the most logical way to test the prediction. Far from being irrelevant, the results fit the prediction perfectly and thus support the model. We expect that this change will highlight the importance of these experiments for the reader.

      Reviewer #2: “1) Fig. 1 Is the folding of Cdc3 being measured in cells lacking chaperones mentioned towards the end of the paper or are the authors referring to the lack of yeast proteins?”

      We are unclear as to what the Reviewer is asking here. The title of Figure 1 states that these are “purified yeast septins” and the figure legend further emphasizes this fact. Additionally, the Coomassie-stained gel in Figure 1A shows a single band, corresponding to purified 6xHis-Cdc3. The proteins were purified from wild-type E. coli cells, so all E. coli chaperones were present when Cdc3 initially folded, but chaperones and all other proteins were removed during the purification and prior to the analysis. We do not know what change to make.

      Reviewer #2 asked “How do the authors account for the septin defect in Ssa4 delete cells in unstressed conditions where Ssa4 would be very low already? According to the authors previous work, Ssa2 and 3 should be able to compensate.”

      We explicitly addressed this point in the original manuscript (lines 893-898). Again, we think here the Reviewer is equating chaperone binding with chaperone function. According to our previous work, Ssa2 and Ssa3 are able to bind septins, but this does not mean that they can fold septins the same way as Ssa4. We cite several papers that discuss the distinct functional roles for the different Ssa proteins. We do not think that additional clarification of this point would strengthen the manuscript.

      Reviewer #3: “6. Figure 5B: It is unclear why Cdc3 is observed in the pulldown of His-tagged Cdc12 (37˚C), although no Cdc12 was isolated under these conditions. How is that possible?”

      That is not possible. As we indicate in the figure legend and with the red asterisk, the only band appearing in that lane is a non-specific band that cross-reacts with the anti-Cdc3 and/or anti-Cdc11 antibodies. This is why it is also present in the “No septins” control lanes. We made the asterisk larger to help accentuate this point.

      Reviewer #3: “Furthermore, the authors observe a specific effect on Cdc12-Cdc11 assembly in the E. coli groEL mutant. How do they rationalize this specific effect as Cdc12-Cdc3 assembly remained unchanged? This observation also seems in conflict with the suggestion of the authors that Cdc12 preferentially recruits Cdc11 before interacting with Cdc3 (page 45, lane 1024).”

      Cdc11 was not expressed in the groEL mutants because no Cdc11 gene was present in those cells, as explained in the body text and indicated in the labeling above the lanes in Figure 5A. The band near the size of Cdc11 is a non-septin protein that bound to the beads in the groEL-mutant cells, as is shown in the immunoblot using anti-Cdc11 antibodies in Figure 5B. Thus there is no conflict to rationalize.

      Reviewer #1: “The only evidence that CCT binds to septin is the list of LC-MS/MS. Western blotting would provide more solid data.” and “2) The cross-linking experiments appears not to have been successful. Why are the Ssas, Ydjs etc not detected here?”

      First, CCT subunits are relatively low-abundance, expressed at 5- to 50-fold lower levels than other chaperone families in the yeast cytosol (see PMID: 23420633). To the Reviewer’s second point, we did in fact detect other chaperones in our crosslinking mass spectrometry experiments, including Ydj1, multiple Ssa and Ssb chaperones, Hsp104, etc., as can be seen in Table S1. However, they were also detected in negative control experiments. This is not surprising, given that these chaperones are among the most common “contaminants” of affinity-based purification schemes (see the CRAPome database at https://reprint-apms.org/). It was for this reason we had to perform so many negative control experiments, which likely produced some false negative results, as some “real” interactions were likely discarded when the same chaperone showed up in our controls. We added a figure panel with a Venn diagram of overlap between experimental and control samples, and text pointing out this caveat of our approach.

      Second, in this experiment we attempted to identify proteins that transiently interact with a specific region of Cdc10 that will later become buried in a septin-septin oligomerization interface. Due to the transient nature of the interaction, we do not expect to detect high levels of crosslinked chaperones. Mass spectrometry is significantly more sensitive than immunoblotting, so there is no guarantee that we would be able to detect a band even if the crosslinking works as desired. Indeed, the crosslinked bands we saw by immunoblot for GroEL were quite faint (see Figure 2F), despite the fact that GroEL and the T7-promoter-driven Cdc10 were among the most abundant proteins in those E. coli cells.

      Third, there is no commercially available, verified antibody recognizing yeast Cct3 with which to perform the requested immunoblot experiment. Since both the N and C termini of CCT subunits project into the folding chamber, it is unwise to use a standard epitope tagging approach, as the tags may compromise function. Indeed, for purification purposes others inserted an affinity tag in an internal loop in Cct3 (PMID: 16762366). We have a yeast strain with Cct6 tagged in an analogous way, but to perform the requested immunoblot experiment with Cct3 would require creating or obtaining the Cct3-tagged strain, deleting NAM1/UPF1, and introducing our Bpa tRNA/synthetase and GST-6xHis-Cdc10 plasmids. Given the sensitivity of detection concerns stated above, we doubt this would help.

      In summary, we prefer not to attempt the requested immunoblot experiments.

      Reviewer #1: “-Fig. 3B ant related Figures: The experiment to see if GFP-tagged septin accumulates in the bud neck is important, but only the graphs after the analysis are shown. The authors should provide the readers with representative examples from imaging data.”

      We are confused, because the images at the bottom of Figure 3A already show what the Reviewer requests. As stated in the figure legend, these are representative examples of the imaging data from a middle timepoint of one of the experiments. It would be nearly impossible (for space reasons) to provide representative images for all of the timepoints for all of the genotypes for all of the experiments. Since in our new experiments we introduce new tagged septins (Cdc11-GFP and Cdc12-mCherry), we also now include representative images of cells expressing these proteins, as well.

      Reviewer #2: “3) If the authors had evidence of chaperone interaction from their previous study, why did they not simply do IPs with fragments of the septins/chaperones?”

      We are unclear why the Reviewer is suggesting IPs after referring to our previous study. IPs are a poor choice for transient interactions, which is why we mostly avoided them in previous studies, and instead used a novel approach (BiFC) to “trap” chaperone–septin interactions. Moreover, we seek to identify chaperones that bind wild-type septins at future septin-septin interfaces on the path towards the native conformation. Fragments of septin proteins would likely misfold and would therefore likely attract chaperones that wouldn’t normally bind the full-length septin. Indeed, our previous studies demonstrated that even a single non-conservative amino acid substitution was sufficient to alter chaperone-septin binding. Thus IPs with fragments of septins or chaperones would be highly unlikely to yield informative results for the questions we seek to answer. We strongly prefer not to attempt these suggested experiments.

      Reviewer #2: “5) While differences between Ssa paralogs are highly interesting, using deletions of Ssas is not useful, given that yeast compensate by overexpressing other paralogs. The yeast GFP Septin assays should be repeated in yeast lacking all Ssas and expressing one paralog on a constitutive promoter (See numerous papers by Sharma and Masison).”

      We disagree that ssa deletions are “not useful”, since if the overexpressed paralogs cannot fulfill the same function as the deleted SSA, then we will see a phenotype. Which we do. Furthermore, we had already obtained and thoroughly tested a strain like the ones mentioned by the reviewer (ECY487, a.k.a. JN516, from Betty Craig’s lab, with ssa2∆ ssa3∆ ssa4∆ and SSA1, which is constitutively expressed, PMID: 8754838), but we found that, as published, it divides slightly more slowly even under the most permissive of conditions. The requested strain cannot be analyzed using our method, because slow accumulation of ring fluorescence could be attributed to other defects unrelated to septin folding. Thus we strongly prefer not to attempt the suggested experiments.

      Reviewer #2: “7) The authors need to clarify the experiment with the Ydj1 D36N and Ssa2 R169H. In Reidy et al, they never fully biochemically test this system and it was never examined for Ssa2-Ydj1. The authors would need to do some fundamental experiments to demonstrate the validity and functionality of this double mutant in yeast.”

      Given that this experiment was unable to generate meaningful data, since the mutations affected the kinetics of induction of the GAL1/10 promoter, we do not think the requested biochemical experiments would add any value to the study. Instead, we removed these studies from the manuscript.

      Reviewer #3: “4. Figure 3B: The difference between wt and cct4-1 cells in appearance of septin ring fluorescence is observed at one timepoint. Since this experiment is considered highly relevant, the authors are asked to include another timepoint to bolster the conclusion that Cdc3-GFP folding and thus septin ring assembly is delayed in the CCT mutant.”

      We carried out new experiments with cct4-1 cells using Cdc12-mCherry and Cdc11-GFP with more timepoints than in our original cct4-1 experiments with Cdc3-GFP. Since these experiments provide the same kinds of results, but at multiple timepoints, we do not see the value in repeating the Cdc3-GFP experiment.

      Reviewer #3: “If Ssb1 functions to maintain Cdc12 in an assembly competent state preventing misfolding, one would expect either enhanced degradation or aggregation of Cdc12-mCherry in ssb1 mutant cells. Did the authors check for such scenario? Septin aggregation has been shown in a ssb1 ssb2 double deletion strain (Willmund et al., 2013), yet the data shown here predict that aggregation might already occur in single ssb1 mutants.”

      We already examined septin aggregation in single ssb1 mutants and showed these data (Supplementary Figure 4D). Indeed, this phenotype was the rationale for testing post-translational septin assembly in ssb1 single mutants. We have seen no evidence of septin degradation in any context (as we mentioned on line 889), so we would not expect it here. While we added new text and a very new citation showing that many “misfolded” conformations of wild-type E. coli proteins avoid aggregation and degradation, we do not think that the suggested experiments would add enough value to the current study to justify the effort, time and expense.

      Reviewer #3: “Fig. 3C: The figure showing septin ring fluorescence does not include error bars. This is crucial, also because the difference between wt and ssa4 mutant cells is not large.”

      There are, in fact, error bars included in the figure, as can be most clearly seen for the final timepoint for the ssa4∆ cells. For most of the other timepoints the error bars are smaller than the data point symbols (the circles and squares). We do not think that adjusting the size or opacity of the symbols to better show the error bars will be sufficiently valuable to justify the effort.

    1. At the same time, like Harold, I’ve realised that it is important to do things, to keep blogging and writing in this space. Not because of its sheer brilliance, but because most of it will be crap, and brilliance will only occur once in a while. You need to produce lots of stuff to increase the likelihood of hitting on something worthwhile. Of course that very much feeds the imposter cycle, but it’s the only way. Getting back into a more intensive blogging habit 18 months ago has helped me explore more and better. Because most of what I blog here isn’t very meaningful, but needs to be gotten out of the way, or helps build towards, scaffolding towards something with more meaning.

      Many people treat their blogging practice as an experimental thought space. They try out new ideas, explore a small space, attempt to come to understanding, connect new ideas to their existing ideas.


      Ton Zylstra coins/uses the phrase "metablogging" to think about his blogging practice as an evolving thought space.


      How can we better distill down these sorts of longer ideas and use them to create more collisions between ideas to create new and innovative ideas? What forms might this take?

      The personal zettelkasten is a more concentrated form of this, and blogging is certainly within the space, as are the somewhat more nascent digital gardens. What would some intermediary "idea crucible" between these forms look like in public, with a simple but compelling interface? How much storytelling and contextualization is needed or not needed to make such points?

      Is there a better space for progressive summarization here so that an idea can be more fully laid out and explored? Then once the actual structure is built, the scaffolding can be pulled down and only the idea remains.

      Reminiscences of scaffolding can be helpful for creating context.

      Consider the pyramids of Giza and the need to reverse engineer how they were built. Once the scaffolding has been taken down and history forgets the methods, it's not always obvious what the original context for objects was, how they were made, or what they were used for. Progressive summarization may potentially fall prey to these effects as well.

      How might we create a "contextual medium" which is more permanently attached to ideas or objects to help prevent context collapse?

      How would this be applied in reverse to better understand sites like Stonehenge or the hundreds of other stone circles, wood circles, and standing stones we see throughout history?

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      General response to the reviewer

      We thank all reviewers for their constructive comments on our manuscript. We were very pleased to see that the reviewers found our study ‘…represent new insight in the field’ (rev#1) and ‘…contains important and exciting novel findings’ (rev#2), and ‘…gives a more detailed perspective on how Src proteins (Src42A in Drosophila) control epithelial stability and the contraction of specific surfaces of epithelial cells’ (rev#3). The reviewers raised a number of specific points, some of which we have already addressed in a preliminary revision of the manuscript. The remaining points will require additional experiments, which we will incorporate in a fully revised version of the manuscript.

      Reviewer #1

      (Evidence, reproducibility and clarity (Required)): Highest priority: 1) The Src42A knockdown and germline clone experiments both cause defects in cellularization (Fig. 2B and 9A), which could result in differences in the state of the blastoderm epithelium (cell size, cell number, structural integrity, organization, etc.) between the experimental and control conditions. In addition, Src42A knockdown appears to affect the size and shape of the egg (Fig. 9A and 9C). The manuscript would be strengthened if the authors included data to demonstrate that the initial structure of the epithelium is mostly normal (quantifications of cell size, number, etc.) in the Src42A RNAi condition, as this would bolster the argument that germband extension, rather than due to indirect effects resulting from the cellularization defects. The authors may have relevant data to do this on-hand, for example using data associated with figures 1, 3, 6, and 9.

      Response:

      The cellularization phenotype of src42A knockdown embryos has a penetrance of about 50% and exhibits variable expressivity. We attempted to characterize this phenotype in detail, but failed to identify any dramatic differences in cellularization of the src42A knockdown embryos compared to wild type. The localization of E-cadherin, in turn, is not affected, but nuclei occasionally drop out of the blastoderm before cellularization is complete. This can result in patches of irregular cellularization, but the blastoderm epithelium in stage 6 embryos did not display major defects in overall structure. We will present additional data on the cellularization phenotypes in the fully revised manuscript. As the referee suggested, we will analyze our data to determine potential effects on the cell size, cell number and overall organization of the blastoderm before germband extension. We plan to present these data as an additional Suppl. Mat. Figure in the full revision.

      Lower priority:

      5) Figure 8 - in my opinion, using a FRAP or photoconversion approach would be a more convincing demonstration of differences in E-cadherin residency times / turnover rate than time-lapse imaging of E-cadherin:GFP alone. Authors should decide whether this improvement is worth the investment.

      Response:

      We thank the reviewer for this comment. While we believe that the data presented in Fig. 8 demonstrate a significant difference in the E-cadherin residence time based on E-cadherin-GFP fluorescence intensity, we agree with the referee that FRAP analyses would provide additional evidence to support our conclusion. For the full revision, we will therefore attempt to perform FRAP experiments on src42A knockdown embryos expressing E-cadherin-GFP and compare the recovery time to the wild type.

      Reviewer #1 (Significance (Required)):

      The manuscript by Backer et al. examines the function of Src42A in germband extension during Drosophila gastrulation. Prior studies in the field have shown that Src family kinases play an important role in the early embryo, including cellularization (Thomas and Wieschaus 2004), anterior midgut differentiation (Desprat et al. 2008), and germband extension (Sun et al. 2017; Tamada et al. 2021). In this study, the authors showed that Src42A was enriched at adherens junctions and was moderately enriched along junctions with myosin-II. They then showed that maternal Src42A depletion exhibits phenotypes, starting with cellularization and including a defect in germband extension. The authors focus on defects in germband extension and found that Src42A was required for timely rearrangement of junctions and that the Src42A RNAi phenotype is enhanced by Abl RNAi. Finally, the authors show that E-cadherin turnover is affected by Src42A depletion.

      Overall, this study provided a higher resolution description of how Src42A regulates the behavior of junctions during germband extension. I thought the authors' conclusions were well supported by the data and represent new insight in the field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: Chandran et al. investigate the role of Src42A in axis elongation during Drosophila gastrulation. Using maternal RNAi and CRISPR/Cas9-induced germline mosaics, they revealed that Src42A is required to contract junctions at anterior/posterior cell interfaces during cell intercalations. Using time-lapse imaging and image analysis, they further revealed the role of Src42A in E-Cad dynamics at cell junctions during this process.

      By analyzing double knockdown embryos for Src42A and Abl, they further showed that Src42A might act in parallel to Abl kinase in regulating cell intercalations. The authors proposed that Src42A is involved in two processes, one affecting tension generated by myosin II and the other acting as a signaling factor at tricellular junctions in controlling E-Cad residence time. Overall, the data are clear and nicely quantified. However, some data do not convincingly support the conclusion, and statistical analyses are missing for an experiment or two. Methods for several quantifications also need improvement in writing. Also, several figures (Figures 6-8) do not match the citation in the text and need to be corrected.

      Page and line numbers were not indicated in the manuscript. For my comments, I numbered pages starting from the title page (Title, page 1; Abstract, page 2, Introduction, pages 3-6; Results, pages 7-14; Discussion, pages 15-18; M&M, 19-23; Figure legends, 28-30) and restarted line numbers for each page. For Figures 6-8 that do not match the citation in the text, I still managed to look at the potentially right panels. All the figure numbers I mention here are as cited in the text. My detailed comments are listed below.

      Response:

      We apologize for the lack of organization of the manuscript and the figure numbering. In the revised version we have added page numbers, line numbers and we corrected the figure numbers.

      Major comments: 1. b-Cat/E-Cad signals at the D/V and A/P junctions in Src42Ai (Figs. 5-6). These data are critical for their major conclusion and should be demonstrated more convincingly.

      In Fig. 5A, the authors said, "When the AP border was cut, the detached tAJs moved slower in Src42Ai embryos compared to control (Fig. 5A)". However, even control tAJs do not seem to move that much in the top panels, and I found the images not very convincing.

      Response:

      We thank the referee for commenting on the lack of clarity in the presentation of the data. The overall movement within the first 10 seconds after the laser cut (determined by the movement of adjacent D/V tAJs away from each other) was about 2 µm in the wildtype, while in the mutant it was 1 µm. Despite this 50% difference, it may be difficult to appreciate from looking at Fig. 5A in our original submission. The yellow lines in Fig. 5A only showed the region of the cut, but did not indicate the movement of the tAJs away from each other, which may have distracted from the actual movement. We will change the annotation and the marks within the figure to visualize the movement much more clearly in the full revision. In the fully revised manuscript, we will also add movies from the experiments, including marks of the tricellular junctions to follow the displacement, as part of the Supplemental Material.

      Based on the genetic interaction between Src42A and Abl using RNAi (Fig. 7), the authors argue that Src42A and Abl may act in parallel. However, the efficiency of Abl RNAi has not been tested. It can be done by RT-PCR or Abl antibody staining. Also, the effect of Abl RNAi alone on germband extension should be tested and compared with Src42A & Abl double RNAi embryos. I expect the experiments can be done within a few weeks without difficulty.

      Response:

      We agree with the referee that it is important to determine the level of depletion in Abl RNAi embryos in order to interpret the genetic relationship between Abl and Src42A. In the full revision of the manuscript, we will follow the advice of the referee and analyze the knockdown, preferably by antibody labeling with an anti-Abl antibody. We will also generate single knockdowns of abl in embryos and determine their effect on germband extension compared to wildtype and src42A/abl double knockdown embryos.

      Minor comments:

      Fig. 2 - Fig. 2B: Higher magnification images of the defective cytoplasm can be shown as insets.

      Response:

      We will add some higher magnification images of the cellularization phenotype in the full revision of the manuscript. In addition, as mentioned in the response to reviewer #1, we will provide a more detailed analysis of the cellularization in src42Ai embryos in the fully revised manuscript.

      • Fig. 2E: A simple quantification of the penetrance of cuticle defects in Src42A mutants and RNAi will be helpful, as shown in Fig. S3.

      Response:

      In the full revision, we will add the quantification of the occurrence of the different classes of cuticle phenotypes.

      Fig. 9 - Fig. 9A: Magnified views of the cytoplasmic clearing can be added as insets.

      Response: As described in our response to the comments made by referee #1, we will add a more detailed analysis of the cellularization phenotype in the full revision.

      Page 14, lines 9-10: More explicit description of the phenotype rather than just "stronger compared to Src42Ai" will be helpful.

      Response:

      In the full revision, we will add a more detailed description of the phenotype and re-analyze and present data on the hatching rate, stage of lethality and cuticle phenotypes.

      Reviewer #2 (Significance (Required)): This work revealed the role of Src42A in regulating germband extension. A previous study suggested the roles of Src42A and Src64 in this developmental process using a partial loss of both proteins (Tamada et al., 2021). Using different approaches, the authors demonstrated a role of Src42A in regulating E-Cad dynamic at cell junctions during Drosophila axis elongation. Most of the analyses were done with maternal knockdown using RNAi, but they successfully generated germline clones for the first time and confirmed the RNAi phenotypes. Overall, this work contains important and exciting novel findings. This work will be of general interest to cell and developmental biologists, particularly researchers studying epithelial morphogenesis and junctional dynamics. I have expertise in Drosophila genetics, epithelial morphogenesis, imaging, and quantitative image analysis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Chandran et al. report on the function of Src42A during cell intercalation in the early Drosophila gastrula. They create a Src42A-specific antibody (there are two Src genes in the fly genome) and examine the localization of Src42A and observe a planar-polarized distribution at cell interfaces. They then measure cell-contractile dynamics and show that T1 contraction is slower after Src42A disruption. The authors then argue that Src42A functions in a parallel pathway to the Abl protein, and that E-cadherin dynamics (turnover) is altered in Src42A disrupted embryos. Src function at these stages has been studied previously (though not to the degree that this study does), and in some respects the manuscript feels a little preliminary (please label figures with figure number!), but after editing this should be a polished study that merits publication in a developmentally-focused journal.

      1) Does the argument that Src42A has two functions fully make sense? Myosin II function is known to affect E-cadherin stability (and vice versa), so it seems that Src42A could affect both MyoII and Ecad by either decreasing Myosin II function/engagement at junctions or by destabilizing Ecad.

      Response:

      We thank the referee for raising an important point that we may not have discussed appropriately in our initial submission. We agree that the reciprocal relationship between actomyosin and E-cadherin might not be reflected unequivocally in our manuscript. As the referee points out, Src42A could affect both MyoII planar localization and E-cadherin dynamics through the same pathway. Previous studies showed that Src is involved in translating the planar polarized distribution of the Toll-2 receptor by recruiting Pi3-Kinase activity to the Toll-2 receptor complex, resulting in planar polarized distribution of MyoII at the A/P interfaces. These data, however, do not address the possibility that a well-known Src target, the E-cadherin/ß-Catenin complex, which is extensively remodeled in germband extension, contributes to the delay in germband extension. The observed defects in both studies can be attributed to both abnormal planar polarization of MyoII and abnormal dynamics of the E-cadherin/ß-catenin complex. In either of these cases, we suggest that Src42A phosphorylates distinct substrates: the Toll-2 intracellular domain in the MyoII planar polarity pathway and the E-cad/ß-Cat complex controlling E-cad dynamics. Given the relationship between MyoII and E-cadherin, however, it is not possible to decide whether these two effects are independent functions of Src42A or are consequences of each other. Since we cannot resolve a possible epistatic relationship between these two potential activities of Src42A, we decided to extend the discussion on this topic by taking both possible scenarios into account and discussing them appropriately. We will add this discussion in the full revision of the manuscript.

      2) One obvious question that arises is the nature of cleavage defects that are mentioned that happen previously to intercalation. For example, is E-cad normal prior to intercalation initiating? How specific are the observed defects to GBE?

      Response:

      Please see our response to referee #1.

      3) Pg. 10, "the shrinking junction along the AP axis strongly reduces its length with an average of 1.25 minute" - what is this measurement? How much is "strongly"?

      Response:

      We thank the referee for pointing out our inappropriate qualitative statement of the experimental data, which was indeed misleading. The measurement of the shrinking junction was based on the time it takes for the AP interface junction between two adjacent vertices on the DV axis to shrink into a single 4-cell vertex. The time for this contraction was on average 1 minute 25 seconds. The data in Fig. 4A’,C show that after 2 minutes in the control embryo 100% of the observed AP junctions have collapsed and the extension of the new DV junction along the AP axis has begun. At the same timepoint of 2 minutes in the src42A knockdown, we show in Fig. 4B’,D that the shrinking of the AP junction interface has still not been completed in 60% of the cases.

      In the full revision, we will remove the qualitative statement and replace it with a correct description of the measurements taken and will refer to the data described in Fig. 4 A-D.

      4) Also pg. 10, "the AP junction was not markedly reduced after 1 minute" - what is the criteria for this statement? X%? 1 minute is very specific, it feels like how much of a reduction/non-reduction should also be specific.

      Response:

      Please see our response to point 3.

      Reviewer #3 (Significance (Required)):

      This study gives a more detailed perspective on how Src proteins (Src42A in Drosophila) control epithelial stability and the contraction of specific surfaces of epithelial cells.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewers #2 and #3 noted that the manuscript was somewhat disorganized, lacking page, line and figure numbers. We also noted that in the submission process the figures were not presented in the correct order. In the preliminary revision of the manuscript, we fixed these problems to facilitate the evaluation of our transferred manuscript by editorial boards.

      In addition, we also addressed issues that the referees mentioned by editing the text according to their comments. We also addressed problems regarding the presentation of the figures and statistical analyses of the data. The following changes were made:

      1. We added page numbers and line numbers.
      2. We added figure numbers to the figure panels.
      3. We corrected ordering of figures in the transferred manuscript.
      4. We addressed the following comments by statistical analyses, editing the text and the figures:

        Regarding comments from Reviewer #1:

      Highest Priority:

      2) There is a discrepancy in the staging of embryos used between some of the analyses, which make it hard to interpret some of the data. For example, characterization of the knockdowns in Fig. 1A and B are based on stages 10 and 15, whereas the majority of the paper is focused on earlier stages 6 - 8 during germband extension (e.g., Fig. 1D). The analysis for Fig. 1B would be more meaningful if it was done on the same stages used for subsequent phenotypic analysis so they can be directly compared.

      Response:

      We thank the referee for pointing out an apparent misunderstanding caused by the description of Fig. 1A,B. The data presented in Fig.1A and 1B do not show RNAi knockdown experiments, but show a comparison between embryos that are heterozygous or homozygous for the loss-of-function allele src42A26-1. These data were intended to demonstrate that zygotic mutants still maintain levels of maternal Src42A protein up until late stages of development. Data for embryos at an earlier stage (stage 5) were shown in the Supplementary Fig. S1E, where no difference in protein levels of Src42A can be observed between heterozygous and homozygous zygotic src42A26-1 embryos.

      At the beginning of the results sections 1 and 2 of the preliminary revised manuscript, we added a sentence to address the referee’s concern, stating that earlier stages exhibit no difference in protein levels and referring to Fig. S1E. We also spelled out more explicitly that the experiment (referring to Fig. 1A,B and S1) was intended to look at zygotic mutants and to demonstrate that our novel Src42A antibody was able to detect the reduction of maternal Src42A protein in mid- to late-stage homozygous zygotic embryos.

      3) There is incongruence between figures in terms of which junctional pools (bAJs vs. tAJs) of beta-catenin and E-cadherin are quantified that makes it difficult to draw comparisons between analyses. For example, pTyr levels are examined for both bAJs and tAJs in Figure 3, however, only tAJs are considered in Fig. 8. Similarly, in some cases planar cell polarity is considered (e.g., comparison of levels at AP vs DV bAJs in Fig. 6 and 9), and in other cases (e.g. Fig. 8) it is not.

      Response:

      We thank the referee for commenting on the different readouts for different pools of cell junctions in our experiments. In our study we considered the effects of src42A RNAi knockdown on both bAJs and tAJs. We decided to present the data for bAJs and tAJs in separate figures for clarity and structure. For example, the data for the effect of src42A knockdown on the planar polarized distribution of E-cadherin at bAJs were presented in Fig. 6, while the effect on E-cadherin residence time at tAJs was presented in Fig. 8. The analysis of pTyr levels considered both pools in order to determine whether src42A knockdown leads to an overall reduction of pTyr levels or to a reduction in a specific junctional pool. From our data we conclude that pTyr levels show a similar reduction in both the bAJs and the tAJs.

      In order to address the reviewer’s comment, we have linked the figures more stringently with the results text of the preliminary revision. We only referred to the reduction in pTyr levels in Fig. 3 to point out that both junctional pools are affected by reduced pTyr in src42Ai embryos. Furthermore, we referred to the individual figure panels when addressing junctional pools and explained in detail the rationale for focusing on particular pools (bAJs or tAJs) in the experiments. For Fig. 6 we point out in the preliminary revised manuscript that we focus the analyses on the known planar polarized distribution of beta-catenin and E-cadherin.

      Lower priority: 1) Introduction, 2nd paragraph - The modes of cell behaviors described to drive cell intercalation leaves out another clear example in the literature - Sun et al., 2017 - which describes a basolateral cell protrusion-based mechanism. While the authors cite this paper later, leaving it out when summarizing the state of the field misrepresents the current knowledge of the range of mechanisms responsible.

      Response:

      We thank the referee for this remark. In the preliminary revision, we have added to the introduction that the cell behaviors associated with germband elongation include apical and basolateral rearrangements of the cells indicating that basolateral protrusions also contribute to the set of mechanisms that drive germ band elongation.

      2) 'defective cytoplasm' - this term is confusing, and could perhaps be replaced with 'cellularization defect', or something similar.

      Response:

      We agree that the term we applied for the cellularization defect may be misleading. The observation we intended to describe with the term was a defect in the cytoplasmic clearing, which occurs in the last syncytial division and at the beginning of the cell formation process. We changed the description of this observation accordingly and now refer to the defect in the preliminary revised manuscript as ‘cytoplasmic clearing defect’.

      3) Tests of statistical significance are not uniformly applied across the figures. For instance, Figures 3G + H indicate statistical significance, but Fig. 3D + E do not. Performing statistical tests throughout the paper, or clearly articulating a rationale when they are not used, would strengthen the manuscript. Specifically, the authors should consider this for Fig. 3D + E, and Fig. 7D + E, to support their arguments that rates of germband extension are different between conditions.

      Response:

      We agree with the reviewer and have provided statistical analysis for the data displayed in Fig. 3D,E and Fig. 7D,E in the preliminary revision of the manuscript.

      4) Page 12 - "We found that Src42A showed a distinct localization at the tAJs (Fig. 1B)": Figure 1B shows a quantification of levels at bAJs, not tAJs.

      Response:

      In the preliminary version of the revised manuscript, we added a quantification of the localization of Src42A at the tAJs as part of Suppl. Fig. S4. In Fig. S4A-C we show that Src42A is enriched at the tAJs in comparison to the bAJs.

      Regarding comments from reviewer #2:

      Major Comments:

      In Fig. 6A, b-Cat signals look fuzzier and dispersed and have more background signals in the control, compared to the Src42Ai background. Also, b-Cat signals in the control image do not seem to show enrichment at the D/V border, as shown in Tamada et al., 2012.

      Response:

      We agree with the referee that the image in Fig. 6A for the control is fuzzier and looks dispersed. This is due to the fixation method that we used. In this experiment we did not apply heat fixation, but used formaldehyde fixation, in which b-catenin protein is maintained in the cytoplasm in addition to the junctional pool, creating the fuzzy cytoplasmic staining. We chose to do this in order to be able to co-immunolabel the embryos with b-catenin and E-cadherin antibodies; the latter staining does not work with the heat fixation applied in the Tamada et al. 2012 paper. Despite the slightly lower quality of the staining, the quantification of the data does show an enrichment and clearly indicated an effect of src42A knockdown on the planar polarized distribution of the E-cad/b-cat complex. In the preliminary revision, we added a note to the figure legend to indicate that the fixation procedure was not optimized for b-catenin junctional staining. In the preliminary revision we also added a quantification of live imaging data recording E-cadherin-GFP in wild-type and src42Ai embryos. We present these additional data in Fig. S5 in the preliminary revision of the manuscript. These data are consistent with the results in Fig. 6 from the immunolabeling and support our conclusion that the E-cadherin AP/DV ratio is increased in Src42A knockdown embryos.

      In Fig. 6B, C, it is not clear how the intensity was measured and how normalization was done. Was the same method used for these quantifications as "Protein levels at bicellular and tricellular AJs" on pages 21-22? Methods should be written more explicitly with enough details.

      Response:

      We thank the referee for pointing out the lack of detail in explaining how the quantification was done. In the preliminary revision of the manuscript, we extended the paragraph entitled ‘Protein levels at bicellular and tricellular junctions’ in the methods section to describe the methods that were applied for each quantification and how the data were normalized.

      Does each sample (experimental repeat) for the D/V border in Fig. 6B match the one right below for the A/P border in Fig. 6C? It should be clearly mentioned in the figure legend. The ratio of the DV intensity to AP intensity will better show the compromised planar polarity of the b-Cat/E-Cad complex.

      Response:

      We thank the reviewer for pointing out a lack of clarity in our presentation. The experimental repeats for each measurement do indeed match, i.e. the measurement of the DV border matches the same adjacent 4-cell pair in the same embryo, and in total 5 distinct embryos were analyzed for each experiment. In the preliminary revision of the manuscript, we explain this detail of the experimental design in the figure legend. In the preliminary revision, we also determined the ratios of DV/AP cell interfaces for b-Cat and E-Cad and added this quantification as panels 6C and 6E for a clearer presentation of the data.

      Minor notes: Page 4, missing comma after "For example"

      Response: The text was edited accordingly.

      Page 4, "inevitable" does not make sense in this context

      Response: We eliminated ‘inevitable’ and replaced it with ‘critical’ to better indicate the importance of Canoe protein for germband elongation.

      Page 7, lines 6-7 - The localization of Src42A in control should be described in more detail and more clearly here.

      Response: In the preliminary revised manuscript, we described the distribution of Src42A in more detail, pointing out its dynamics and differential distribution at distinct plasma membrane domains.

      Supplemental Fig S1 - Fig. S1D: Based on the head structure and the segmental grooves, the embryo shown here is close to late stage 13/early stage 14, not stage 15. - Fig S1E: It will be helpful if the predicted protein band and non-specific bands are indicated by arrows/arrowheads in the figure.

      Response:

      We thank the referee for their careful observation of the embryonic stage. We agree that the embryo was actually at a younger stage. In the preliminary revision, we replaced the images with an example of an older stage. We will also add arrows to mark the specific protein bands in Fig. S1E.

      Page 7, lines 21-22 - "Src42A was slightly enriched at the AP interface" - To argue that, quantification should be provided.

      Response:

      We thank the referee for pointing out a qualitative statement that we made with regard to the distribution of Src42A at the AP cell interfaces. In the preliminary revision of the manuscript, we present an additional quantification of the imaging data of Src42A immunolabeling. In Figure S4A-C, we now present a quantification of the enrichment of Src42A at the tricellular junctions. In addition, the new Fig. S4D,E shows a quantification of the planar polarized distribution of Src42A at the AP cell interfaces.

      Figure 1 - Fig. 1B: Src42A levels should be compared between control (Src42A/+) and Src42A/Src42A for each stage. It currently shows a comparison between Src42A/Src42A of stages 10 and 15.

      Response:

      We thank the referee for the comment. As indicated in our response to referee #1, the point of this analysis was (1) to provide evidence for the specificity of our new anti-Src42A antibody and (2) to demonstrate the presence of a substantial maternal contribution of Src42A protein in zygotic mutants. We do not see the advantage of providing a detailed developmental Western blot analysis, but we provide data in Suppl. Fig. S1E showing that the level of Src42A is unimpaired in stage 6 zygotic src42A[26-1] homozygous mutant embryos.

      • Fig. 1B: The figure legend says, "dotted line represents mean value and error bars," but there are no dotted lines shown in the figure. Also, what p-value is for ****? It should be mentioned in the figure legend. It also says Src42A levels were normalized against E-Cad intensity here (stages 10 and 15). They have shown that E-Cad levels are affected in Src42A RNAi during gastrulation (Fig. 6). Is E-Cad not affected in Src42A26-1 zygotic mutants at stages 10 and 15?

      Response:

      We thank the referee for pointing out inaccuracies in the presentation and the description of Fig.1B. In the preliminary revision, we emphasized the marks on the graph and provide p-values throughout. Regarding the E-Cadherin levels: E-cadherin levels were altered in src42A RNAi knockdown embryos, but not in zygotic mutants, even at later developmental stages.

      Page 8, line 14 - "Embryos expressing TRiP04138 showed reduced hatching rates with variable penetrance and expressivity depending on the maternal Gal4 driver used (Fig. 2B)" - Fig. 2B doesn't seem to be a right citation for this sentence.

      Response:

      We agree with the referee. In the preliminary revised manuscript, we corrected the reference to Figure 2A’, which does show the relationship of hatching rate to the various maternal Gal4 drivers.

      • Fig. 2C: It will be helpful to indicate two other non-specific bands in the figure with arrows/arrowheads with a description in the figure legend.

      Response:

      In the preliminary revision, we added an arrow to mark the band specific for Src42A and asterisks to mark unspecific bands in Fig 2C.

      Page 9, line 9 - This is the first time that the fast and the slow phases of germband extension are mentioned. As these two phases are used to compare the Src42A and Src42A Abl double RNAi phenotypes, they should be introduced and explained better earlier, perhaps in Introduction.

      Response:

      We thank the referee for pointing out that the two phases of germband extension were not introduced. We added a sentence to introduce and define the distinct phases of extension movements in the preliminary revision.

      Fig. 3 - Fig. 3A: It will be helpful to mark the starting and the ending points of germband elongation with different markers (arrows vs. arrowheads or filled vs. empty arrowheads).

      Response:

      In the preliminary revision, we added distinct markers to indicate the start and endpoints of germband elongation to make this figure easier to read.

      • Fig. 3C figure legend: R2 is wrongly mentioned in Fig. 3D, E. Also, R2 (coefficient of determination) needs to be defined either in the figure legend or Materials & Methods.

      Response:

      We thank the referee for pointing out this misleading reference. In the preliminary revision we corrected the reference to R2 in Fig. 3D,E and will describe the definition of R2 in the figure legend.

      • Fig. 3D, E: statistical analysis is missing.

      Response:

      In the preliminary revision, we included a statistical analysis of the data (see ref #1). We changed the figure to indicate the data sets that were analyzed and added the p-values to the figure legend.

      • Fig. 3G and H should be cited in the text.

      Response:

      In the preliminary revision, we added references to Fig. 3G,H in the results section, next to the citation of Fig. 3F.

      • Fig. 3F: It should be mentioned that the heat map is shown for pY20 signals in the figure legend, with an intensity scale bar in the figure.

      Response:

      In the preliminary revision, we added an intensity scale bar to the figure panel and mentioned the relationship to the PY20 signal.

      Fig. 7A: Arrows can be added to mark the delayed germband extension.

      Response:

      In the preliminary revision, we added arrows to mark the anterior and posterior extent of the germband.

      Fig. 8A: It should be mentioned that the heat map is shown for E-Cad signals in the figure legend, with an intensity scale bar in the figure.

      Response:

      In the preliminary revision, we added an intensity scale to the heat map and mention the relationship to the E-cadherin signal in the figure legend.

      Fig. S3G: An arrowhead can be added to the gel image to indicate the band described in the legend.

      Response:

      In the preliminary revision, we added an arrow to annotate the Src42A-specific bands on the Western blot.

      • Fig. 9B: Arrow/arrowheads can be added to show the absence of the signals in the nurse cells.

      Response:

      In the preliminary revision, we added markers to help recognize the reduced signal in the nurse cells and the oocyte.

      • Fig. 9C: Indicate the ending point of the germband extension by arrows.

      Response: In the preliminary revision, we added arrows to mark the anterior and posterior extent of the germband.

      Regarding comments from reviewer #3:

      Minor notes: Page 4, missing comma after "For example"

      Response: The text was edited accordingly.

      Page 4, "inevitable" does not make sense in this context

      Response:

      In the preliminary revision, we eliminated ‘inevitable’ and replaced it with ‘critical’ to better indicate the importance of Canoe protein for germband elongation.

      Description of analyses that authors prefer not to carry out

      Referee #1, point 2, and Referee #2, minor comment on Figure 1. Both referees suggest that Fig. 1A,B should include earlier developmental stages corresponding to the stages examined in the RNAi knockdown experiments.

      Response:

      The referees’ comments are likely based on a misunderstanding. The data that the reviewers are referring to present analyses of the zygotic phenotype of embryos homozygous for the src42A26-1 loss-of-function allele. They are not related to the maternal RNAi knockdown experiments, but were meant to demonstrate the existence and extent of a maternal pool of Src42A protein that persists even to late stages in development. The maternal knockdown mutants are analyzed in detail at the appropriate stages in Fig. 2.

      As described in our response above, we do not feel that a detailed developmental stage Western analysis of wildtype and src42A26-1 embryos would provide significant additional insights. Data for an earlier developmental stage (before germband elongation), as requested by the referees, were provided in Suppl. Fig. S1E.

      Referee #1 Point 6) Figure 8E - showing images of multiple tAJs, rather than z-slices of a single vertex, would better support the claim here, as the assertion is that Src42a levels are different between control and sdk RNAi conditions, and not that it varies in the z-dimension.

      Response:

      The image series of Fig. 8E shows one representative example of multiple tAJs that have been imaged for this experiment (n=6 for wild type and n=10 for sdk RNAi). We think that the presentation of Z-slices for this experiment is important, as the protein distribution needs to be considered for a larger area along the apical-lateral cell interface. In addition, the quantification of the data for multiple tAJs was presented as a graph in Fig. 8F,G. We would therefore rather not change this figure in the revised manuscript.

      Referee #3 suggests that anti MyoII staining should accompany the analysis of tension measurements in the germband.

      As this analysis has already been performed by Tamada et al. 2021, we decided not to reproduce these data, but rather extend the analysis towards tension measurements, which support the findings by Tamada et al. 2021 on a functional level. We do not see the added value of adding MyoII labeling.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: RC-2021-01016

      Corresponding author(s): Dennis Klug

      1. General Statements [optional]

      Dear editor, dear reviewers,

      Thank you very much for the quick review of our manuscript as well as for the constructive criticism and the interesting discussion of our results. Reading the comments, we realized that we may have put too much emphasis on the in vivo microscopy of sporozoites and their interaction with the salivary gland. We believe that the generated mosquito lines can be used to address different scientific questions, the in vivo microscopy of host-pathogen interactions being only one of them. Because of this imbalance, and to address some of the reviewers' comments, we have partially rewritten the manuscript (particularly the introduction). At the same time, we have included additional data on the inducibility of the promoters used, as well as on the functionality of hGrx1-roGFP2 in the salivary glands. Furthermore, we created an additional figure to better present the expression patterns of the trio and saglin promoters within the median lobe, and we expanded the section on in vivo microscopy of sporozoites. We hope that these results further highlight the significance of our study. Accordingly, we have also changed the title of the manuscript to "A toolbox of engineered mosquito lines to study salivary gland biology and malaria transmission" to indicate the broad applicability of the generated mosquito lines, and we have included an additional co-author, Raquel Mela-Lopez, who conducted the redox analysis. We hope that these changes will adequately answer the questions of the reviewers and address any concerns they may have had. We look forward to hearing from you.

      With our kind regards,

      Dennis Klug

      Katharina Arnold

      Raquel Mela-Lopez

      Eric Marois

      Stéphanie Blandin

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      This manuscript reports the generation and characterization of transgenic lines in the African malaria mosquito Anopheles coluzzii that express fluorescent proteins in the salivary glands, and their potential use for in vivo imaging of Plasmodium sporozoites. The authors tested three salivary gland-specific promoters from the genes encoding anopheline antiplatelet protein (AAPP), the triple functional domain protein (TRIO) and saglin (SAG), to drive expression of DsRed and roGFP2 fluorescent reporters. The authors also generated a SAG knockout line where SAG open reading frame was replaced by GFP. The reporter expression pattern revealed lobe-specific activity of the promoters within the salivary glands, restricted either to the distal lobes (aapp) or the middle lobe (trio and sag). One of the lines, expressing hGrx1-roGFP2 under control of aapp promoter, displayed abnormal morphology of the salivary glands, while other lines looked normal. The data show that expression of fluorescent reporters does not impair Plasmodium berghei development in the mosquito, with oocyst densities and salivary gland sporozoite numbers not different from wild type mosquitoes. Salivary gland reporter lines were crossed with a pigmentation deficient yellow(-) mosquito line to provide proof of concept of in vivo imaging of GFP-expressing P. berghei sporozoites in live infected mosquitoes.

      **Major comments**

      Overall the manuscript is very well written with a clear narrative. The data are very well presented. The generation of the transgenic mosquito lines is elegant and state-of-the art, and the new reporter lines are thoroughly characterized.

      This is a nice piece of work that is suitable for publication, although the in vivo imaging of sporozoites is somewhat preliminary and would benefit from additional experiments to increase the study impact.

      We would like to thank the reviewer for his/her appreciation of our manuscript. In the revised version, we have included additional experiments on in vivo imaging of sporozoites, which allowed us to quantify moving and non-moving sporozoites imaged under the cuticle of live mosquitoes. Although this is still a proof of concept, we believe that these new data provide interesting novel insights and better illustrate potential applications.

      The reporter mosquito lines express fluorescent salivary gland lobes, yet the authors only provide imaging of parasites outside the glands. It would be relevant to provide images of the parasite inside the fluorescent glands.

      We have now included images showing sporozoites inside the salivary glands in vivo in Figure 8C and discuss possible ways to further improve resolution and efficiency of the imaging procedure in lines 563-586.

      The advantage of the pigmentation-deficient line over simple reporter lines is not clear, essentially due to the background GFP fluorescent in figure 5C. Imaging of GFP-expressing parasites should be performed in mosquitoes after excision of the GFP cassette under control of the 3xP3 promoter. This would probably allow to document the value of the reporter lines more convincingly.

      Indeed, by incorporating two Lox sites in the transgenesis cassette, we designed the yellow(-)KI line to permit removal of the fluorescent cassette and completely exclude expression of the transgenesis reporter EGFP. Still, EGFP expression in the yellow(-)KI adults is restricted to the eye and ovary, as we now show in Figure 7 supplement 1D. In contrast, no EGFP fluorescence was observed in the thorax area (Figure 7 supplement 1D). Therefore, we believe that the benefit of removing the fluorescence cassette for this study is limited. Moreover, the generation of such a line would take at least 3-4 months before experiments could be performed. Nevertheless, we agree with the reviewer that removal of the fluorescence cassette would be instrumental for follow-up studies. To draw the reader's attention to this issue, we now discuss background fluorescence in lines 378-387.

      Along the same line, it is unclear if the DsRed spillover signal in the GFP channel is inherent to the high expression level or to a non-optimal microscope setting. This is a limitation for the use of the reporter lines to image GFP-expressing parasites.

      We have discussed this problem with the head of the imaging platform at our institute, and we believe that it is not caused by incorrect settings. Rather, it seems to be due to the significant difference in expression between the two fluorescence reporters used. We agree with the reviewer that this is a limitation and now discuss the problem in lines 416-412 and 565-567.

      The authors should fully exploit the SAG(-) line, which is knockout for saglin and provides a unique opportunity to determine the role of this protein during invasion of the salivary glands. This would considerably augment the impact of the study. In this regard, line 131 and Fig S3E: why is there persistence of a PCR band for non-excised in the sag(-)EX DNA?

      We definitely share the reviewer's enthusiasm about saglin and its role in parasite development in mosquitoes. We have thoroughly characterized the phenotype of sag(-) lines with respect to fitness and Plasmodium infection. These results are described in a separate manuscript currently in peer review and available as a preprint on bioRxiv (https://doi.org/10.1101/2022.04.25.489337). Furthermore, in the revised manuscript, we have included additional data on the transcriptional activity of the saglin promoter with respect to the onset of expression and blood meal inducibility (Figure 2). In addition, we have included a completely new Figure 3 to highlight the spatial differences in transcriptional activity of the saglin promoter compared with the trio promoter. These new data are discussed in lines 206-276.

      There might be a misunderstanding in the interpretation of the genotyping PCR. The PCR shown in Figure 1 – figure supplement 3 displays PCR products for different genomic DNAs (sag(-)EX, sag(-)KI and wild type) using the same primer pair. "Excised" refers to sag(-)EX, "non excised" to sag(-)KI and "control" to wild type. Primers were chosen to yield a PCR product as long as the transgene has integrated; only the shift in size between "excised" and "non excised" indicates the loss of the 3xP3-lox fragment. We have now changed the labeling of the respective gel in Figure 1 – figure supplement 3 to make this clearer.

      Did the authors search for alternative integration of the construct to explain the trioDsRed variability?

      We validated trio-DsRed cassette insertion in the X1 locus by PCR. The only way to rule out an additional integration of the transgene would be whole genome sequencing, which we did not perform. Still, we believe that the observed expression patterns are due to locus-specific effects of the X1 locus. Indeed, several lines of evidence point in this direction: (1) transgenesis was performed using the phage ΦC31 integrase, which promotes site-specific integration (attP is 38 bp long and very unlikely to occur as such in the mosquito genome) and for which we never detected insertion at other sites in the genome for other constructs inserted in X1 and other docking lines; (2) additional unlinked insertions would have been easily detected during the first backcrosses to WT mosquitoes that we perform in order to isolate the transgenic line and render it homozygous; (3) we have often observed variegated expression patterns for other transgenes located in the X1 locus in the past, leading us to believe that this locus is subject to variegation influencing the expression of the inserted promoters. Usually, the variation we observe is simpler (e.g. strong and weak expression of the fluorescent reporter placed under the control of the 3xP3 promoter in the same tissues where it is normally expressed), but some promoters are more sensitive to the nearby genomic environment than others, which we believe is the case for trio. Finally, should there be additional insertions of the transgenesis cassette in the genome, they would all have to be linked to the X1 locus, which is unlikely, as we would otherwise have detected them in the first crosses as mentioned above. Thus, although very unlikely, we cannot exclude a single additional and linked insertion possibly explaining the high/low DsRed patterns, but variegation would still be required to explain other patterns. We have mentioned this alternative explanation in the manuscript in lines 522-524.

      Line 254-255. Does the abnormal morphology of SG from aapp-hGrx1-roGFP2 result in reduced sporozoite transmission?

      This is an interesting question. For future experiments, it could indeed be important to test whether the transmission of sporozoites by the generated salivary gland reporter lines is impaired. However, the quantification of the number of sporozoites in aapp-hGrx1-roGFP2 expressing salivary glands did not reveal any significant differences from the wild type (Figure 5 – figure supplement 1B), and these numbers would definitely be sufficient to infect mice. As we have no evidence for reduced invasion of sporozoites in the salivary glands of aapp-hGrx1-roGFP2 and of the DsRed reporter lines, see no good reason to believe that the expression of fluorescent proteins would interfere with parasite transmission, and produced these lines as tools to follow sporozoite interaction with salivary glands, we have not performed transmission experiments.

      Of note, we have now included images of highly infected salivary glands of all reporter lines in Figure 5 – figure supplement 1D to confirm that expression of the respective fluorescence reporter does not interfere with sporozoite invasion. Also, we did not observe that sporozoites avoid salivary gland areas displaying high levels of hGrx1-roGFP2.

      **Minor comments**

      -Line 51: sporogony rather than schizogony

      Schizogony was replaced with sporogony.

      -Line 56: sporozoites are not really deformable as they keep their shape during motility

      This sentence was removed.

      -In the result section, it is not clearly explained where constructs were integrated.

      We have now included the sentence „...with an attP site on chromosome 2L...“ (line 173) and the respective reference (PMID: 25869647) to give more information about the integration site.

      Line 106 and 434-435: for the non-expert reader, it is not clear what X1 refers to, strain or locus for integration?

      X1 refers to both the locus and the docking line. We have rephrased the beginning of the result section (previously line 106) to give more information about the integration site as mentioned above.

      -Line 112-115: the rational of integrating GFP instead of SAG is not clearly explained here, but become clearer in the discussion (line

      We have slightly rephrased the sentence to better explain the reasoning for this procedure (lines 182-184).

      -Line 140: FigS2A instead of S3A

      This mistake was corrected in the revised manuscript.

      -Perhaps mention that GFP reporters (SG) might be useful to image RFP-expressing parasites.

      We have now included an image of the aapp-hGrx1-roGFP2 line infected with an mCherry-expressing P. berghei strain in Fig. 7D.

      -Line 236: the authors cannot exclude integration of an additional copy (as mentioned in the discussion line 367-368).

      As discussed above, we removed „..as a single copy...“ and introduced the possibility of an additional integration linked to X1 (lines 522-524).

      -Line 257-258. The title of this section should be modified as SG invasion was not captured.

      The title was rephrased. It now reads "Salivary gland reporter lines as a tool to investigate sporozoite interactions with salivary glands" (line 356-357).

      -Line 287: remove "considerable number" since there is no quantification.

      This was removed. In addition, we included new data in this section of the manuscript and rephrased the results accordingly (lines 406-427).

      -Line 400-402: Klug and Frischknecht have shown that motility precedes egress from oocysts (PMID 28115054), so the statement should be modified.

      Thank you for this suggestion. The passage was modified accordingly.

      -Line 404: remove "significant number" since there is no quantification.

      This section was rephrased and the phrase "significant number" was removed (lines 406-427).

      -Line 497: typo "transgenesis"

      The typo was corrected in the revised manuscript.

      -FigS1: add sag-DsRed in the title

      Thank you for spotting this inconsistency; we corrected this mistake (line 1134).

      -Stats: Mann Whitney is adequate for analysis in fig 2C but not 2B, where ANOVA should be used (more than 2 groups).

      We have now performed a one-way ANOVA test and adapted the figure and figure legend accordingly.

      Reviewer #1 (Significance (Required)):

      This work describes a technical advance that will mainly benefit researchers interested in vector-Plasmodium interactions. Invasion of salivary glands by Plasmodium sporozoites is an essential step for transmission of the malaria parasite, yet remains poorly understood as it is not easily accessible to experimentation. The development of transgenic mosquitoes expressing fluorescent salivary glands and with decreased pigmentation provides novel tools to allow for the first time in vivo imaging in live mosquitos of the interactions between sporozoites and salivary glands.

      Reviewer's expertise: malaria, Plasmodium berghei, genetic manipulation, host-parasite interactions

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The first achievements of the Klug et al. study are the (i) genetical engineering of the Anopheles coluzzii mosquitoes reared in insectarium, that stably express distinct fluorescent reporters (DsRed and hGrx1-roGFP2 and EGFP) under the putative "promoters" of genes reported to encode proteins expressed differentially in the pluri-lobal salivary glands (Sg) of anthropophilic blood-feeding adult females, (ii) the analysis of the promoter activity - based on the selected fluorescent reporter - with a primary focus on the salivary gland/Sg (including at the Sg lobe level) of the adult female but also considering the preimaginal developmental time with larvae and pupa samples. Of note, some data confirm the already reported time-dependent and blood meal-dependent promoter activity for the related Anopheles species. The last part presents preliminary dataset on live imaging of Plasmodium berghei sporozoites with the aim of highlighting the usefulness of these A. coluzzii transgenic lines to better understand how the rodent Plasmodium sporozoites first colonize and then settle as packed cells in Sg acinar host cells.

      **Major comments**

      The two first objectives presented by the authors have been convincingly achieved with (i) the challenging production of four different lines expressing different single or double reporters chosen by the authors (and appropriately presented in the result text and figure sections), (ii) the careful analysis of the spatiotemporal expression of the DsRed reporter under two "promoters" studied and with regards to the blood feeding event parameter. However, if the reason why the authors have put so much effort in the production of their transgenic mosquitoes is (and as mentioned) to provide a significant improved setting enabling the behavioral analysis of sporozoites upon colonization and survival in the Sg, it seems this part is kind of limited. Likely in relation with this perception is the fact I found the introductory section often confusing and not enough direct to the points: in particular distinguishing the rationale from the necessity to produce appropriate models, and clarifying what is/are the added value(s) offered by these new transgenic lines models when compared to what exist (in Anopheles stephensi) with specific evidence that argue for this knowledge gain. At this stage, it is unfortunately not clear to me, what is the bonus of imaging the Plasmodium fluorescent sporozoites in hosts with fluorescent salivary gland lobes if one can not monitor key events of the Sg-sporozoite interaction that were not reachable without the fluorescent mosquito lines. Furthermore, it should be better explained why the rodent Plasmodium species has been chosen rather Plasmodium falciparum (or other human species) for which A. coluzzii is a natural host; may be just mentioning that this study would serve as a proof of concept but bringing real biological insights would be fine.

      We would like to thank the reviewer for his/her evaluation of our manuscript, which has helped us clarify our manuscript on several points. Our goal here was a proof of concept demonstrating potential applications for the fluorescent salivary gland reporter lines and for the low pigmented yellow(-) line we generated. In vivo imaging of sporozoites in salivary glands is one possible application that we intended to use as proof-of-concept, but we tailored the manuscript too restrictively with this aim in mind and neglected other applications as well as characterization of the biology of salivary glands in general. To improve this, we have included further data on the blood inducibility of the promoters tested (Figure 2), the functionality of roGFP2 in the salivary glands (Figure 5), and the use of the generated lines in the examination and definition of expression patterns of salivary gland proteins in vivo (Figure 6). Accordingly, we have adjusted the entire manuscript to adequately describe all the results presented. We have also rephrased major parts of the abstract and the introduction to better describe the impact of salivary gland biology on the transmission of pathogens, and to explain the anatomy of salivary glands in more detail.

      We agree with the reviewer that it would be desirable to show direct salivary gland-sporozoite interactions in vivo. Still, we believe that having mosquito lines expressing a fluorescent marker in the salivary gland as well as weakly pigmented mosquitoes is a first step toward making this visualization possible, although we cannot yet provide much quantitative data about this interaction.

      1- The three genes and gene products selected by the authors should definitively be more systematically explained, which means for example the authors need to introduce the different mosquito species and the parasite-mosquito host pairs they are then referring to for the promoter/encoded proteins of their interest. In the same vein, I did not find any information as to the choice of the mosquito species (A. Coluzzii) for the current work. I was curious to know what is the advantage since better knowledge was available with Anopheles stephensi with respect to (i) Saglin and its promotor activity, (ii) aap driven dsRed expression (lines already existing) and (iii) sporozoite-gland interaction.

      We have largely reworded the introduction to clarify the rationale for selecting these three promoters while providing a better understanding of salivary gland biology in general.

      The choice of the mosquito species depends, in our opinion, strongly on the perspective and on the experiments to be performed. We agree with the reviewer that the malaria mosquito A. stephensi is a widely used model, based on its robustness in breeding and its high susceptibility to P. berghei and P. falciparum infections. However, in these cases, both vector-parasite pairs are to some extent artificial. Indeed, although it is also a vector of P. falciparum in some regions, A. stephensi mostly transmits P. vivax, which cannot be cultured in vitro. Thus, research efforts on this vector-parasite pair are limited. Also, due to the growing number of observed differences between Anopheles species in their susceptibility to Plasmodium infection and transmission, more research has recently been conducted on African mosquito species. This effect is reinforced by the fact that P. falciparum causes more deaths than any other Plasmodium species infecting humans, making control strategies for species from the A. gambiae complex such as A. coluzzii particularly important. As a result, the number of available genetic tools in A. coluzzii/gambiae has outpaced that in A. stephensi. These include mosquito lines with germline-specific expression of Cas9 for site-directed transgenesis, lines expressing Cre for lox-mediated recombination, and several docking lines. Such tools are, as far as we know, not available in A. stephensi and were essential in reaching our objectives. Docking lines are of particular interest because they allow reliable integration into a characterized locus, which is an advantage over random transposon-mediated integration. Random insertion sites have generally not been characterized in the past, which can cause problems since integrations regularly occur in coding sequences. Docking lines also enable comparison of different transgenes, as they are all integrated in the same genetic environment, although this does not exclude some expression variation, as illustrated in our manuscript. For all these reasons, we have chosen to work with A. coluzzii.

      Concerning the use of the murine malaria parasite P. berghei instead of the human parasite P. falciparum, two reasons motivated our choice. (1) For in vivo imaging of sporozoites, we needed a parasite line that is strongly fluorescent at this stage, and no such line exists for P. falciparum. Actually, no fluorescent P. falciparum line able to efficiently infect A. coluzzii has been reported thus far, as reporter genes have all been inserted in the Pfs47 locus, which is required by P. falciparum for A. coluzzii colonization. (2) Imaging P. falciparum infected mosquitoes, especially with sporozoites in their salivary glands, requires access to a confocal microscope in a biosafety level 3 laboratory. Hence, our objective here was indeed to provide a proof of principle of in vivo imaging of sporozoites in the vicinity of or inside salivary glands using our engineered mosquitoes, and to provide a first analysis of this process using P. berghei as a model of infection. Nevertheless, we agree with the reviewer that the goal should be to work as close as possible to the human pathogen.

      Despite the wide range of topics that this study touches on, we want to try and keep the manuscript as concise as possible. Therefore, we have not discussed the advantages and disadvantages of the different vector-parasite pairs and ask the reviewer to indulge us in this.

      2- To help clarifying the added value of the present study, introducing the species names of the mosquito and the Plasmodium that serve as a model would be appreciated.

      We have now included the name of the Plasmodium species used in line 361. At this position, we now also give more details about the transgene this line is carrying. We now mention the mosquito species used, A. coluzzii, at several positions in the manuscript (e.g. lines 52, 162 and 177).

      3- Since a focus is the salivary gland of the blood feeding female Anopheles sp., a rapid description of the glands with different lobes and subdomains the results and figure 1 nicely refer to, would help in the introduction.

      We now explain the anatomy of female and male mosquito salivary glands in the introduction (lines 119-123). The different lobes are now also indicated in the salivary gland images shown in several figures including Figure 1.

      4- That description could logically introduce the few proteins actually identified with lobe specific or cell domain specific expression (apical versus basal side, intracellular or surface expose, vacuole, duct...) profiles. The context with regards to sporozoite biology would then easily validate the "promoter choice". As a minor remark, I miss the reason why the authors wrote " the astonishing degree of order of the structures (referring to the packing of sporozoites within the Sg acinars) raise the question whether sporozoite can recognize each other". Please clarify since packing/accumulation can be passive due to cell mechanical constraints and explain what this point has to see with the question and experimental work proposed here?)

      We thank you for this suggestion. We have reworded key parts of the introduction to make the reasons for using the three selected promoters clearer. We also mention now other proteins expressed in the salivary glands which have been characterized in more detail because of their effect on blood homeostasis (e.g. anticoagulants) (lines 136-139).

      The mention of stack formation of salivary gland sporozoites served only to clarify that almost nothing is known about the behavior of sporozoites within the salivary glands in vivo and to explain why new methods are needed to make these processes visible. We have now reworded this passage to make this clearer, and we mention that stack formation could also occur due to mechanical constraints, as suggested by the reviewer (lines 101-102, 106-110).

      5- The selection of hGrx1-roGFP2 is quite interesting and justified but there is then no use of this reporter property in the preliminary characterization of the Sg and Sg-sporozoite interaction. Could the authors provide such characterization?

      We have now included data testing the functionality of hGrx1-roGFP2 in the salivary glands. We also show qualitatively that the redox state of glutathione does not change upon infection with P. berghei sporozoites (Figure 6). We now describe and discuss these new data in lines 337-354.

      6- Figure 1: it would be nice to add in the legend at what time the dissection/imaging has been made (age, blood feeding timing?). I would also omit the double mutant trio-Dsred/aapDsred in the main figure (may be supplemental) since the two single mutants Dsred separately together with the double mutant (with different fluorescence) already provide the information. I would suggest to regroup the phenotypic presentation of the transgenic line made in the KI mosquitoes (current figure 5) in the main figure 1.

      We have now added the missing information about the age of dissected mosquitoes and their feeding status in the legend of Figure 1. We also thank the reviewer for the suggestion to replace one image displaying aapp and trio promoter activity in trans-heterozygous mosquitoes with an image of the pigment deficient mutant yellow(-)KI. Still, due to the changes made to the manuscript based on the reviewers' comments in general, we have now included new data highlighting the functionality of the generated salivary gland reporter lines by investigating the redox state of glutathione as well as the expression pattern of the saglin and trio promoters at the single cell level (see Figures 3 and 6). Therefore, it would no longer seem logical to introduce the yellow(-)KI mutant in Figure 1 while further data on this mutant are provided in the last two figures of the manuscript and discussed later in the manuscript (Figures 7 and 8). In addition, we believe that co-expression of different transgenes (carrying fluorescent reporters) in the median and the distal lobes could be interesting for certain applications. Readers who might be interested in combining both transgenes in a cross would likely want to see the outcome to better evaluate the usefulness before experiments are planned and performed. This is especially true because localization as well as expression strength may differ between different fluorescence reporters driven by the same promoter (e.g. the hGrx1-roGFP2 construct appears less bright and more localized to the apex of the distal-lateral lobes than DsRed, while expression of both reporters is driven by the aapp promoter in aapp-hGrx1-roGFP2 and aapp-DsRed, respectively).

      7- Figure 2:

      a) Is there anything known on the Sgs' size change overtime. It seems that between day 1 and 2 there is an increase of size and volume as much as I can evaluate the volume (Fig S4). Could that mean that there is increase in cell number in the lobes and therefore more cells expressing the transgene which would account for the signal intensity increase rather than more transcripts per cell?

      Thank you for this interesting question. The changes in the morphology of the salivary glands in Anopheles gambiae following eclosion have been studied in detail by Wells et al., 2017 (PMID: 28377572), which we now cite in the introduction (line 122-123). According to this reference, cell counts of the salivary gland do not change upon emergence of the adult mosquito. However, we agree with the reviewer that the glands appear smaller and differ in morphology directly after eclosion. We noted that glands of freshly emerged females are more "fragile" during dissections and lack secretory cavities, as reported by Wells et al., 2017. We believe that the increase in size occurs through the formation and filling of the secretory cavities, which has been reported to take place within the first 4 days after emergence (Wells et al., 2017). This is in accordance with our observations that the promoters of the saliva proteins AAPP and Saglin display only weak activity after hatching, or, in the case of TRIO, are not yet active directly after emergence. The timing of the formation of the secretory cavities is also in agreement with our time course experiment (Figure 2), which shows a strong increase in fluorescence intensity in dissected glands within the first 4 days after emergence.

      b) Why choose 24h after the blood meal to assess promoter activity in the Sgs? Do we have any information on how the blood meal impacts Sg development? At this time, anyway, the sporozoites are far from being made. Yoshida and Watanabe 2006 mentioned a significant decrease of Sg proteins post-blood feeding. Could the authors detail their rationale based on the questions they wish to address?

      Thank you for this question. Unfortunately, the data available in the literature on this topic are very sparse, so we could only refer to a few previous publications. The decision to quantify the fluorescence signals as early as 24 hours after blood feeding was based on Yoshida et al, Insect Mol. Biol, 2006, PMID: 16907827. The authors of this study generated the first salivary gland reporter line in A. stephensi by using the aapp promoter sequence to drive DsRed expression, and showed by qRT-PCR that DsRed transcripts increase 1-2 days after blood feeding compared to controls. Consistent with this observation, and because we were concerned that putative changes in protein levels would only be visible for a short period of time, we began quantification one day after feeding. Since we observed significant changes in fluorescence intensity for the aapp-DsRed and sag(-)KI lines 24 hours after blood feeding, we retained the experimental setup and did not change it further. Nevertheless, we agree with the reviewer that different time points could help determine how long the effect lasts, and whether trio expression might also be regulated by blood feeding, but at a later time point. Still, our main objective here was to validate that the ectopic expression of DsRed driven by the aapp promoter in the aapp-DsRed line was indeed induced upon blood feeding as previously reported (PMID: 16907827). This experiment allowed us to confirm the inducibility of aapp in a different way and to show for the first time that saglin, but not trio, is induced one day after blood feeding. Our transgenic lines could be used for follow-up studies investigating the inducibility of salivary gland-specific promoters by different stimuli, or after infection with Plasmodium sporozoites. For example, trio transcription has been shown to increase after infection of the salivary gland by Plasmodium (PMID: 29649443).

      8- Figure 3: The figure is quite informative in terms of subcellular localization. Concerning the section "Natural variation of DsRed expression in trio-DsRed mosquitoes", I think it could be shortened because it is a bit outside the focus of the study.

      We agree with the reviewer that this part of the manuscript stands out somewhat and is not perfectly in line with the remaining results because it does not deal with the salivary gland. Still, we would like to emphasise that in this work we particularly want to show possible applications of the generated mosquito lines to address unanswered questions in host-parasite interactions and salivary gland biology. As a result, this manuscript establishes potentially important tools. For this reason, we feel it is important to mention the natural variation in DsRed expression, as this variation can have a significant impact on crossing schemes (especially with lines inheriting other DsRed-marked transgenes) and experiments (e.g. visualizing DsRed expression by western blot in larval and pupal stages). Furthermore, it is important for the use of the line to show that the transgene is inserted only once, at the expected location, which we try to emphasize with Figure 4 – figure supplement 1 and Figure 4 – figure supplement 2.

      We would also like to note that transgenesis in Anopheles is a relatively young field of research and altered expression patterns of ectopically used promoters have rarely been described so far, although this could have major implications e.g. in the case of gene drives. Therefore, we hope that the data shown will bring this previously neglected observation more into focus and highlight the importance of accurate characterization of generated transgenic mosquito lines.

      9- In contrast the last section of live imaging of P. berghei sporozoites in the vicinity and within salivary gland should be expanded. The 2 sentences summarizing the data are quite frustrating "We also observed single sporozoites moving actively through tissues in a back and forth gliding manner (Fig. 6B, Movie 3) or making contact with the salivary gland although no invasion event could be monitored"

      We have now implemented new data and extended Figure 8 showing the results of the in vivo imaging in a qualitative manner. We have rephrased the result and discussion section accordingly.

      10- I am aware of the technical difficulties of performing live imaging of sporozoites in whole mosquitoes, even when the salivary gland lobe under observation is closely apposed to the cuticle, but that seems to be the final aim of the authors. I looked very carefully at the three movies and I am sorry, but at this stage I could not make a meaningful analysis out of them and could not agree with the conclusions: for instance, the authors specify that sporozoites were undergoing back and forth movements (movie 3), but I do not see that and do not see the Sg contours in the available movies. The authors should also add scale bars and time stamps to their movies. Having an in-depth description with regard to the sub-domain marked by a relevant reporter would strengthen the study, even if images are not collected in the whole mosquito to get higher resolution.

      We thank the reviewer for this comment. We have to admit that parasite imaging in fluorescent salivary glands in vivo is an ambitious goal given the complex biological system we are working with. We believe that the system presented in our manuscript is a first and important step to enable the analysis of the interaction of sporozoites with salivary glands, although in-depth analysis will require further optimization and considerable time, especially to generate quantitative data. Therefore, we have now toned down the significance of our results in this respect and changed the title accordingly. Still, we also provide a more detailed analysis of the data we have already collected (Figure 8 and lines 406-427). Because we focus on the analysis of sporozoites in the thorax area in the revised manuscript, the outlines of the salivary gland are not necessarily visible in the images.

      I am not sure I understand the relevance of this quite condensed sentence in the text. Could the authors rephrase and expand it if they wish to keep the issues they refer to? "The sporozoites' distinctive cell polarization and crescent shape, in combination with high motility, allows them to 'drill' through tissues". I would put more stress on the main unknowns in terms of sporozoite-Sg interactions and the need for the right models to apply informative approaches (i.e. here, imaging).

      We thank you for this suggestion. The sentence mentioned has been removed in its entirety. We have also adjusted the text accordingly and reworded most of the introduction to make the narrative clearer (lines 91-119).

      Of note, it could help to point that the "Sgs is a niche in which the sporozoites which egress from the oocyst could mature and be fully competent when co-deposited with the saliva into the dermis of their intermediary hosts"

      We have now implemented a similar sentence in the introduction (lines 93-98).

      Reviewer #2 (Significance (Required)):

      1- Clear technical significance with the challenging molecular genetics achieved in the mosquito A. coluzzii.

      2- More limited biological significance: fair analysis and gain of knowledge of the spatio-temporal pattern of reporter expression under the selected promoter, but limited significance of the final goal analysis, which concerns Plasmodium sporozoite biology after egress from oocysts

      As stated above, we changed the title to place the focus on the engineered mosquito lines.

      3- Previous reports cited by the authors have used the DsRed reporter and the aapp promoter in another Anopheles species (i.e. A. stephensi, which is also a natural host and vector for human Plasmodium spp.; Yoshida and Watanabe, Insect Mol Biol, 2006; Wells and Andrew, 2019), with significantly better resolved 3D visualization of GFP-fluorescent P. berghei, but in dissected salivary glands and not in whole mosquitoes. The Wells and Andrew publication entitled "Salivary gland cellular architecture in the Asian malaria vector mosquito Anopheles stephensi" in Parasite Vectors, 2015 deserves to be referenced and described.

      Thank you very much for this suggestion. We considered citing Wells and Andrew (PMID: 26627194). However, this reference focuses very specifically on the subcellular localization of AAPP and shows only highly magnified sections of immunostained, dissected and fixed salivary glands. Because we work only with the AAPP promoter, we felt it important to refer to the previously observed expression pattern along the entire salivary gland, as shown in Yoshida and Watanabe (PMID: 16907827). Nevertheless, we have cited two other publications by Wells and Andrew (PMID: 31387905 and 28377572) at various points in the manuscript.

      4- Audience: I would say that this work should be of interest mostly to scientists investigating Plasmodium biology (basic and field research) or the entomology of Diptera.

      5- To describe my fields of expertise, I can refer to my extensive initial training in entomology including at one point in the genetic basis of mosquito-virus interaction. I have also been working for more than 20 years in the field of Apicomplexa biology (Plasmodium and Toxoplasma) and I have long-standing interest in live and static high-resolution imaging.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Klug et al. generated salivary gland reporter lines in the African malaria mosquito Anopheles coluzzii using salivary gland-specific promoters of three genes. Lobe-specific reporter activity from these promoters was observed within the salivary glands, restricted either to the distal lobes or the medial lobe. They characterized localization, expression strength and onset of expression in four mosquito lines. They also investigated the possibility of influences of the expressed fluorescent reporters on infection with Plasmodium berghei and salivary gland morphology. Using crosses with a pigmentation deficient mosquito line, they demonstrated that their salivary gland reporter lines represent a valuable tool to study the process of salivary gland colonization by Plasmodium parasites in live mosquitoes. SG positioning close to the cuticle in 20% of females in this strain is another key finding of this study.

      The key findings from this study are largely quite convincing. The authors have created a suite of SG reporter strains using modern genetic techniques that aid in vivo imaging of Plasmodium sporozoites.

      Vesicular staining within salivary acinar cells should be stated as "vesicle-like" staining unless a co-stain experiment in fixed SGs is conducted using antisera against the marker protein(s) and antisera against a known vesicular marker (e.g. Rab11). It may also be possible to achieve this in vivo using perfusion of a lipid dye (e.g. Nile Red), but this is not necessary. As is, in Fig. 3A, there are images in which it appears that the vesicle-like staining is located both within acinar cells' cytoplasm and in the secretory cavities (e.g. Fig. 3A: aapp-DsRed bottom and middle), and this is fine, but should be more inclusively stated. Fixed staining of the reporter strain SGs would allow for clarification of this point. In previous work, other groups have observed vesicle-like structures in both locations (e.g. PMID: 33305876).

      Thank you very much for this suggestion. Indeed, when we observed the vesicle-like localization, we had similar ideas and considered investigating the identity of the observed particles in more detail. Ultimately, however, we concluded that the localization of DsRed does not play a critical role in the use of the lines as such and believe that a more detailed investigation of the trafficking of the fluorescent protein DsRed is beyond the scope of this study.

      We have thus followed the suggestion of the reviewer and now use the phrase "vesicle-like" throughout the manuscript. In addition, we extended the discussion on the different localizations observed and presented some explanations that might have led to this observation. We also included a new reference that investigated the localization of AAPP using immunofluorescence (PMID: 28377572).

      Morphological variation is extensive among individual mosquito SGs, thought to impact infectivity, and well documented in the literature. The manuscript should be edited to make it much clearer (e.g. n = ?) exactly how many SGs, especially in microscopy experiments, were imaged before a "representative" image was selected from each data point and in any additional experiment types where this information is not already presented. Figure S8 is an example where this was done well. Figure 3A-B is an example where this was not well done. All substantial variation (e.g. "we detected a strangulation..." - line 189) across individual SGs within a data point should be noted in the Results. Because of the genetics and labor involved, acceptable sample sizes for minor conclusions may be small (5-10), but should be larger for major conclusions when possible.

      Thank you for this comment. We have improved this point by specifying precisely the number of samples and of repetitions in the respective figure legends. For example, we have now quantified the proportion of moving sporozoites and report both the number of sporozoites evaluated and the number of microscopy sessions required (see Figure 8). Regarding Figure 3, fluorescence expression and localization in salivary gland reporter lines was actually very uniform in each line. We added the following sentence in the legend of revised figures 3 and 5: “Between 54 and 71 images were acquired for each line in ≥3 independent preparation and imaging sessions. Representative images presented here were all acquired in the same session”.

      Sporozoite number within SGs has been shown to be quite variable across the infection timeline, by mosquito species, by parasite strain, in the wild vs. in the lab, and according to additional study conditions. The authors mention that the levels they observed are consistent with their prior studies and experience, but they did not utilize the reporter strains and in vivo imaging to support these conclusions, instead relying on dissected glands and a cell counter. It is important for these researchers to attempt to leverage their in vivo imaging of SG sporozoites for direct quantification, likely using the "Analyze Particles" function in Fiji. The added time investment for this additional analysis would be around two weeks for one person experienced in the use of the imaging software.

      Thank you for this interesting suggestion. Indeed, it would be beneficial to use an imaging-based approach to quantify the sporozoite load inside the salivary glands. We already used "watershed segmentation" in combination with the "Analyze Particles" function in Fiji on images of infected midguts to determine oocyst numbers. Still, we believe this analysis cannot be applied to images of infected salivary glands, mainly because of differences in the shape and location of the oocyst and sporozoite stages. Sporozoites inside salivary glands form dense, often multi-layered stacks. Because of this close proximity, watershedding cannot resolve them as single particles that could subsequently be counted. Accumulations of sporozoites would therefore be counted as one, likely leading to an underestimation of actual parasite numbers. Furthermore, even if the proximity issue could be resolved, e.g. by performing infections yielding lower sporozoite densities, infected salivary glands prepared for imaging are often slightly damaged, so that sporozoites leak from the gland into the surroundings. These leaked sporozoites are likely not captured in the images used for analysis, potentially leading again to an underestimation of counts. Since these issues are circumvented by the use of a cell counter, we believe that this remains the method of choice for determining sporozoite numbers.
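
      The undercounting argument above can be illustrated with a minimal sketch. It assumes a toy binary image and plain 4-connected component counting; it is not Fiji's "Analyze Particles" or watershed implementation, and the function name and both example grids are hypothetical.

```python
from collections import deque


def count_particles(mask):
    """Count 4-connected foreground blobs in a binary grid ('#' = foreground)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != "#" or seen[r][c]:
                continue
            # New blob found: flood-fill it so it is only counted once.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical toy data: three well-separated objects.
sparse = [
    "##....##",
    "........",
    "...##...",
]
# The same three objects packed end to end: they merge into one blob.
dense = [
    "........",
    ".######.",
    "........",
]

print(count_particles(sparse), count_particles(dense))
```

      In this toy case the three separated objects are counted individually, whereas the packed objects are reported as a single particle, which is the kind of underestimation described for dense sporozoite stacks.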

      Nevertheless, we can understand the reviewer's concern that counts performed with a hemocytometer do not reflect the variability in the sporozoite load of individual mosquitoes. To highlight that all generated reporter lines can have high sporozoite counts, we have now included images of highly infected salivary glands for each line in Figure 7D.

      This manuscript is presented thoughtfully and such that the data and methods could likely be well-replicated, if desired, by other researchers with similar expertise.

      The statistical analysis is appropriate for the experiments conducted. It is currently unclear if some experiments were adequately replicated. That information should be added to the paper throughout where it is missing.

      We do appreciate your comments on our efforts to give all required information for other laboratories to replicate our experiments. We have added the missing information about the number of independent experiments in the respective figure legends wherever appropriate.

      Studies from multiple groups should be more thoroughly referenced when the authors are describing the "vesicle-like" staining patterns observed in SGs from reporter strains (e.g. Fig. 3A). Is this similar to the SG vesicle-like structures observed previously (e.g. PMIDs: 28377572, 33305876, and others)?

      Thank you for this comment. We did not discuss this observation in detail in the first version of our manuscript because the observed localization was rather unexpected, as DsRed was not fused to the AAPP leader/signal peptide. The observed localization is therefore difficult to explain; however, we have expanded the discussion on this (lines 465-482) and now cite one of the proposed references (PMID: 28377572, lines 468-469).

      There are minor grammar issues in the manuscript text (e.g. "Up to date" should be "To date"). The figures are primarily presented very clearly and accurately. One minor suggestion: In cases such as Fig. S2A images 3 and 6, where some of the staining labels are very difficult to read, please move all labels for the figure to boxes located directly above the image.

      We apologize for the grammatical errors we missed in the first version of our manuscript. We have now performed a grammar check of the whole manuscript. We have also increased the font size of the labels in the above figures and tried to make them more readable by moving them above the images.

      The data and conclusions are presented well.

      Reviewer #3 (Significance (Required)):

      This report represents a significant technical advance (improved in vivo reporter strain and sporozoite imaging), and a minor conceptual advance (active sporozoite motility), for the field.

      This work builds on previous SG live imaging studies involving Plasmodium-infected mosquitoes (e.g. Sinnis lab, Frischknecht lab, etc.), addressing one of the major challenges from these studies (reliable in vivo imaging inside mosquito SGs).

      This work will appeal to a relatively small audience of vector biology researchers with an interest in SGs. Many in the field still see the SGs as intractable, instead choosing to focus on the midgut due to ease of manipulation. Perhaps work like this will spark new interest in tangential research areas.

      I have sufficient expertise to evaluate the entirety of this manuscript. Some descriptors of my perspective include: bioinformatics, SG molecular biology, mosquito salivary glands, microscopy, RNA interference, SG infection, and SG cell biology.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Klug et al generated transgenic mosquito lines expressing fluorescent reporters regulated by salivary gland-specific promoters and characterized fluorescent reporter expression level over time, subcellular localization of fluorescent reporters, and impact on P. berghei oocyst and salivary gland sporozoite generation. In addition, by crossing one of the lines (aapp-DsRed) with yellow(-) KI mosquitoes, they open up the possibility of performing in vivo visualization of salivary glands and sporozoites.

      Overall the generation and characterization of these transgenic lines is well-done and will be helpful to the field. However, there are several concerns with the in vivo imaging data shown in Figure 6, which does not convincingly show fluorescent sporozoites in the lobe or secretory cavity of a fluorescent salivary gland lobe. This needs to be addressed. Points related to this concern are outlined below:

      (1) Although the authors mention that the DsRed signal was strong enough to see with GFP channel, it would be more appropriate to show that the DsRed signal from salivary glands and GFP channel image co-localize.

      We now show a merge of the GFP and DsRed signal in Figure 7 – figure supplement 2. The yellow appearance of the salivary gland in the merge likely indicates the spillover of the DsRed signal into the GFP channel. In addition, we discuss the issue in lines 416-412 and 565-567.

      (2) Mosquitoes were pre-sorted using the GFP fluorescence of the sporozoites on day 17-21. From figure 4B, median salivary gland sporozoite number was about 10,000 sporozoites/mosquito on day 17-18. However, in Figure 6A there are no sporozoites in the secretory cavities. They should be able to see sporozoites in the cavities at this time. Can the authors confirm that they can visualize sporozoites in secretory cavities in vivo and perhaps show a picture of this.

      This is entirely correct. We also examined mosquitoes for the presence of sporozoites in the salivary glands and wing joints prior to imaging, as shown in Figure 7B and Figure 7 – figure supplement 2A, to increase the probability that sporozoites could be observed. Nevertheless, the area of the salivary gland that comes to the surface is often small and limited to a few cells that can be imaged with good resolution. Unfortunately, these same cells were often not infected although other regions of the salivary glands must have been very well infected based on the previously observed GFP screening (Figure 7B). In addition, with the confocal microscope available to us, we struggled to achieve the necessary depth to image sporozoites in the cavities of the salivary gland cells. For this reason, we were often able to detect a strong GFP signal in the background, but not always to resolve the sporozoites sufficiently well. Still, we have now included an image showing sporozoites in salivary glands (Figure 8C). However, we believe that the method can be further improved to be more efficient and provide better resolution. We discuss possible ways to further improve the imaging in lines 563-586.

      (3) There is no mention of the number of experiments performed (reproducibility) and no quantification of the imaging data. In the results (lines 287-288), the authors state that sporozoites are present in tissue close to the gland and sometimes perform active movement. How can this be? Do they believe these sporozoites are en route to entering? More relevant to this study would be a demonstration that they can see sporozoites in the secretory cavities of the salivary gland epithelial cells; this should be shown. If they have already performed a number of experiments, I would suggest quantifying the number of sporozoites observed in defined regions. The claim that sporozoites are moving is confounded by the flow of hemolymph. How do they know that the sporozoites are motile rather than being carried by the hemolymph? Perhaps it is premature to jump to sporozoite motility in the mosquito when they have not even shown sporozoite presence in the salivary glands.

      Thank you very much for this comment. We have followed the suggestions of the reviewer and have now quantified the behavior of sporozoites in the thorax area of the mosquito. For the analysis, we only considered sporozoites that could be observed for at least 5 minutes. This analysis revealed that 26% of persistent sporozoites performed active movements, which in most cases resembled patch gliding previously described in vitro. We adjusted the results section accordingly. In addition, we have changed the figure legend to accurately indicate the number of experiments performed. Likewise, we now also provide an image of sporozoites that we assume are located in the salivary gland (Figure 8C). Although we have not yet been able to image and quantify vector-sporozoite interactions extensively (further improvements would be required, as mentioned previously), we believe these results illustrate the potential of the transgenic lines.
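
      The quantification described above (restricting the analysis to sporozoites observed for at least 5 minutes and reporting the proportion that moved actively) can be sketched as follows. This is only an illustration: the `Track` record, the function name and the example values are hypothetical and are not the data or the analysis code used for Figure 8.

```python
from dataclasses import dataclass


@dataclass
class Track:
    observed_min: float  # how long the sporozoite stayed in view, in minutes
    moved: bool          # whether active movement was scored for it


def fraction_moving(tracks, min_observed_min=5.0):
    """Return (number of persistent sporozoites, fraction of them that moved)."""
    persistent = [t for t in tracks if t.observed_min >= min_observed_min]
    if not persistent:
        return 0, 0.0
    moving = sum(t.moved for t in persistent)
    return len(persistent), moving / len(persistent)


# Made-up example records; the 2.5 min track is excluded by the threshold.
tracks = [
    Track(6.0, True),
    Track(2.5, True),
    Track(10.0, False),
    Track(5.0, False),
    Track(7.5, False),
]

n_persistent, frac = fraction_moving(tracks)
print(f"{n_persistent} persistent sporozoites, {frac:.0%} moving")
```

      Reporting the denominator alongside the percentage makes it clear how many sporozoites the proportion is based on.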

      (4) In vivo imaging has been performed with the mosquito positioned sideways. Was this the best orientation? Have you tried other orientations, such as from the front (Figure 5B orientation)?

      It is true that in the abdominal view as shown in Figure 7B the fluorescence in the salivary glands is very well visible. This is mainly due to the fact that in this area the cuticle is almost transparent and therefore serves as a kind of "window". Nevertheless, the salivary glands are not close to the cuticle in this position, which makes good confocal imaging impossible. Imaging always worked best where the salivary gland was very close to the cuticle, and this was always laterally. However, there were differences in the position of the salivary glands in individual mosquitoes, which also led to slight differences in the imaging angle.

      Overall, the text is easy to follow and I have only few suggestions.

      Thank you for this comment.

      In the results section, the authors describe the DsRed expression during mosquito development (lines 194-236) after they describe the subcellular localization of fluorescent reporters. I felt the flow was disrupted. Thus, this part (lines 194-236) could be summarized and moved to line 135. In this way, the results section would flow according to the main figures.

      Thank you very much for this suggestion. We have considered your idea, but based on the changes we have made in response to reviewer comments and new data implemented in the form of two new figures, we believe the current order in the results section is more appropriate. The rationale was primarily to first characterize the expression of fluorescent reporters in the salivary glands of all lines before going into more detail on expression in other tissues of a single line. We then finish with potential applications like in vivo imaging of sporozoite interactions with salivary glands.

      Also, and as mentioned previously (reviewer 2, point 8), we believe it is important to describe the variability of ectopic promoter expression at a given locus with sufficient details, as this has not been characterized thus far despite its importance.

      In the results section, lines 186-190, the authors describe the morphological alteration of the salivary gland in aapp-hGrx1-roGFP2. I would suggest mentioning that this observation was made in only one of the lateral lobes. (I saw that it was mentioned in the figure legend but not in the main text.)

      We believe there has been a misunderstanding. The morphological alteration in salivary glands expressing aapp-hGrx1-roGFP2 was observed in all distal-lateral lobes to varying degrees (quantification in Figure 6E). To include as many salivary glands as possible in the quantification and because in some images only one distal-lateral lobe was in focus, only the diameter of one lobe per salivary gland was measured and evaluated. We have now revised the legend to prevent further misunderstandings.

      In the discussion section, the authors discuss the localization of fluorescent reporters (lines 322-331). When I looked at the aapp-DsRed localization pattern (Figure 3A), the pattern looked similar to that in the previous publication by Wells et al 2017 (https://www.nature.com/articles/s41598-017-00672-0). This publication used an AAPP antibody together with other markers (Figures 4-7) and could be worth referring to in the discussion section.

      Thank you for this suggestion. According to the information available through Vectorbase, we did not fuse DsRed with any coding sequence of AAPP that could potentially encode a trafficking signal. Therefore, it is rather unlikely that the observed DsRed localization in our aapp-DsRed line and the localization observed by AAPP immunofluorescence staining in WT mosquitoes match. This is further exemplified by the cytoplasmic localization of hGrx1-roGFP2 in the aapp-hGrx1-roGFP2 line, where the reporter gene was cloned under the control of the same promoter. For this reason, we had not mentioned this reference in the first version of the manuscript. In the revised manuscript, we have now included the suggested reference (lines 475-476) and extended the discussion of possible reasons for the observed localization pattern.

      In the text, authors describe salivary gland lobes as distal lobes and middle lobe. It would be more accurate to refer to the lobes as the lateral and medial lobes. The lateral lobes can then be sub-divided into proximal and distal portions. I would suggest to use distal lateral lobes, proximal lateral lobes and median lobe as other references use (Wells M.B and Andrew D.J, 2019).

      Thank you for this suggestion. We have corrected the nomenclature for the description of the salivary gland anatomy as suggested throughout the manuscript.

      Overall, the figures are easy to understand and I have following suggestions and questions.

      Figure 1C) It is hard to see the WT salivary gland median lobe. If the authors have a better image, please replace it so that it is easier to compare WT and transgenic lines.

      We have replaced the wild-type images of salivary glands in this figure and labeled the median and distal-lateral lobes accordingly (see Figure 1).

      Figure 2) While it was interesting to observe the significant expression differences between day 3 and day 4, have you checked whether this expression is maintained over time, declines or increases (especially on days 17-21, when the authors perform in vivo imaging)?

      Thank you for this interesting question. We have not quantified fluorescence intensities in older mosquitoes. Nevertheless, we regularly observed spillover of the DsRed signal into the GFP channel during sporozoite imaging, suggesting that expression levels, at least in aapp-DsRed expressing mosquitoes, remain high even in mosquitoes >20 days of age (see Figure 8A). We also confirmed this observation by dissecting salivary glands from old mosquitoes, whose distal lateral lobes always showed a strong pink coloration even in normal transmission light (data not shown).

      Figure 3A) There is no description of "Nuc" in the figure legend. If "nuc" refers to the nucleus, have you stained with a nuclear dye (for example, DAPI)?

      Thank you for spotting this missing information in the legend. Initial images shown in this figure were not stained with a nuclear dye. To test whether the observed GFP expression pattern really colocalizes with DNA, we performed further experiments in which salivary glands from both aapp-hGrx1-roGFP2 and sag(-)KI mosquitoes were stained with Hoechst. We have now included these new data in Figure 3 - figure supplement 1. It appears that GFP is concentrated around the nuclei of the acinar cells, which makes the nuclei clearly visible even without DNA staining.

      Figure 4B) The number of biological replicates in the figure and the legend do not match (in the figure there are 3-5 data points, whereas the legend states 3 biological replicates).

      Thank you for spotting this inconsistency. The number of biological replicates refers to the number of mosquito generations used for experiments. The difference is due to the fact that sometimes two experiments were performed with the same generation of mosquitoes using two different infected mice. We have clarified the legend accordingly to avoid misunderstandings.

      Figure 4C) The number of data points from (B) is 5. However, in (C) only 4 data points are presented.

      We have corrected this mistake. In the previous version, the results of two technical replicates were inadvertently plotted separately in (B) instead of the mean.

      Figure 5) I would suggest including a thorax image of a P. berghei infected mosquito to show both salivary glands and parasites.

      Thank you for this suggestion. Images in Figure 7B (previously Figure 5) were replaced with an infected specimen to show salivary glands (DsRed) and sporozoites (GFP) together.

      Reviewer #4 (Significance (Required)):

      The transgenic lines that the authors created have potential for in vivo imaging of salivary gland and sporozoite interactions. Since the aapp and trio lines have distinct fluorescence expression patterns, they could help elucidate why sporozoites are more likely to invade the distal lateral lobes than the median lobe.

      My areas of expertise are confocal microscope imaging, mosquito salivary gland and Plasmodium infection and sporozoite motility.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The first achievements of the Klug et al. study are (i) the genetic engineering of insectary-reared Anopheles coluzzii mosquitoes that stably express distinct fluorescent reporters (DsRed, hGrx1-roGFP2 and EGFP) under the putative "promoters" of genes reported to encode proteins expressed differentially in the multi-lobed salivary glands (Sg) of anthropophilic blood-feeding adult females, and (ii) the analysis of promoter activity, based on the selected fluorescent reporter, with a primary focus on the salivary gland of the adult female (including at the Sg lobe level) but also considering preimaginal development using larval and pupal samples. Of note, some data confirm the time-dependent and blood meal-dependent promoter activity already reported for related Anopheles species. The last part presents a preliminary dataset on live imaging of Plasmodium berghei sporozoites, with the aim of highlighting the usefulness of these A. coluzzii transgenic lines for better understanding how rodent Plasmodium sporozoites first colonize and then settle as packed cells in Sg acinar host cells.

      Major comments

      The first two objectives presented by the authors have been convincingly achieved with (i) the challenging production of four different lines expressing different single or double reporters chosen by the authors (and appropriately presented in the results text and figures), and (ii) the careful analysis of the spatiotemporal expression of the DsRed reporter under the two "promoters" studied, also with regard to blood feeding. However, if the reason why the authors have put so much effort into producing their transgenic mosquitoes is (as mentioned) to provide a significantly improved setting for the behavioral analysis of sporozoites during colonization of, and survival in, the Sg, this part seems rather limited. Likely related to this perception, I found the introductory section often confusing and not direct enough: in particular, it should distinguish the rationale from the necessity of producing appropriate models, and clarify the added value(s) offered by these new transgenic lines compared to what already exists (in Anopheles stephensi), with specific evidence for this knowledge gain. At this stage, it is unfortunately not clear to me what the benefit is of imaging fluorescent Plasmodium sporozoites in hosts with fluorescent salivary gland lobes if one cannot monitor key events of the Sg-sporozoite interaction that were not accessible without the fluorescent mosquito lines. Furthermore, it should be better explained why a rodent Plasmodium species was chosen rather than Plasmodium falciparum (or another human-infecting species), for which A. coluzzii is a natural host; perhaps just mentioning that this study serves as a proof of concept, while still bringing real biological insights, would be fine.

      1- The three genes and gene products selected by the authors should definitely be explained more systematically, which means, for example, that the authors need to introduce the different mosquito species and the parasite-mosquito host pairs they are referring to for the promoters/encoded proteins of interest. In the same vein, I did not find any information on the choice of mosquito species (A. coluzzii) for the current work. I was curious to know what the advantage is, since better knowledge was available for Anopheles stephensi with respect to (i) Saglin and its promoter activity, (ii) aapp-driven DsRed expression (lines already existing) and (iii) sporozoite-gland interaction.

      2- To help clarify the added value of the present study, introducing the species names of the mosquito and the Plasmodium that serve as a model would be appreciated.

      3- Since a focus is the salivary gland of the blood-feeding female Anopheles sp., a brief description in the introduction of the glands, with the different lobes and subdomains that the results and Figure 1 nicely refer to, would help.

      4- That description could logically introduce the few proteins actually identified with lobe-specific or cell domain-specific expression profiles (apical versus basal side, intracellular or surface-exposed, vacuole, duct...). The context with regard to sporozoite biology would then easily validate the "promoter choice". As a minor remark, I do not understand why the authors wrote "the astonishing degree of order of the structures (referring to the packing of sporozoites within the Sg acinar cells) raise the question whether sporozoite can recognize each other". Please clarify, since packing/accumulation can be passive due to mechanical constraints of the cell, and explain what this point has to do with the question and experimental work proposed here.

      5- The selection of hGrx1-roGFP2 is quite interesting and justified but there is then no use of this reporter property in the preliminary characterization of the Sg and Sg-sporozoite interaction. Could the authors provide such characterization?

      6- Figure 1: it would be nice to add to the legend the time at which the dissection/imaging was performed (age, blood feeding timing?). I would also omit the double mutant trio-DsRed/aapp-DsRed from the main figure (maybe move it to a supplement), since the two single DsRed mutants together with the double mutant (with different fluorescence) already provide the information. I would suggest regrouping the phenotypic presentation of the transgenic line made in the KI mosquitoes (current Figure 5) into the main Figure 1.

      7- Figure 2:

      a) Is anything known about how the size of the Sgs changes over time? It seems that between day 1 and day 2 there is an increase in size and volume, as far as I can evaluate the volume (Fig S4). Could that mean that there is an increase in cell number in the lobes, and therefore more cells expressing the transgene, which would account for the increase in signal intensity rather than more transcripts per cell?

      b) Why choose 24 h after the blood meal to assess promoter activity in the Sgs? Do we have any information on how the blood meal impacts Sg development? At this time, in any case, the sporozoites are far from being formed. Yoshida and Watanabe (2006) mentioned a significant decrease in Sg proteins after blood feeding. Could the authors detail their rationale based on the questions they wish to address?

      8- Figure 3: The figure is quite informative in terms of subcellular localization. Concerning the section "Natural variation of DsRed expression in trio-DsRed mosquitoes", I think it could be shortened because it is somewhat outside the focus of the study.

      9- In contrast, the last section on live imaging of P. berghei sporozoites in the vicinity of and within the salivary gland should be expanded. The two sentences summarizing the data are quite frustrating: "We also observed single sporozoites moving actively through tissues in a back and forth gliding manner (Fig. 6B, Movie 3) or making contact with the salivary gland although no invasion event could be monitored".

      10- I am aware of the technical difficulties of performing live imaging of sporozoites in whole mosquitoes, even when the salivary gland lobe under observation is closely apposed to the cuticle, but that seems to be the final aim of the authors. I looked very carefully at the three movies and, I am sorry, but at this stage I could not make a meaningful analysis of them and could not agree with the conclusions: for instance, the authors specify that sporozoites were undergoing back and forth movements (Movie 3), but I do not see that, nor do I see the Sg contours in the available movies. The authors should also add scale bars and time stamps to their movies. An in-depth description with regard to the subdomain marked by a relevant reporter would strengthen the study, even if the images are not collected in the whole mosquito in order to obtain higher resolution.

      I am not sure I understand the relevance of this quite condensed sentence in the text. Could the authors rephrase and expand it if they wish to keep the issues they refer to? "The sporozoites' distinctive cell polarization and crescent shape, in combination with high motility, allows them to "drill" through tissues". I would place more emphasis on the main unknowns in terms of sporozoite-Sg interactions and on the need for the right models to apply informative approaches (here, imaging).

      Of note, it could help to point out that the "Sg is a niche in which the sporozoites that egress from the oocyst can mature and become fully competent before being co-deposited with the saliva into the dermis of their intermediate hosts".

      Significance

      1- Clear technical significance with the challenging molecular genetics achieved in the mosquito A. coluzzii.

      2- More limited biological significance: fair analysis and gain of knowledge regarding the spatio-temporal expression of the reporters under the selected promoters, but limited significance for the final goal of the analysis, which concerns Plasmodium sporozoite biology after egress from oocysts.

      3- Previous reports cited by the authors have used the DsRed reporter and the aapp promoter in another Anopheles species (i.e. A. stephensi, which is also a natural host and vector for human Plasmodium spp.; Yoshida and Watanabe, Insect Mol Biol, 2006; Wells and Andrew, 2019), with significantly better resolved 3D visualization of GFP-fluorescent P. berghei, but in dissected salivary glands and not in whole mosquitoes. The Wells and Andrew publication entitled "Salivary gland cellular architecture in the Asian malaria vector mosquito Anopheles stephensi" (Parasites & Vectors, 2015) deserves to be referenced and described.

      4- Audience: I would say that this work should be of interest mostly to scientists investigating Plasmodium biology (basic and field research) or the entomology of Diptera.

      5- To describe my fields of expertise, I can refer to my extensive initial training in entomology including at one point in the genetic basis of mosquito-virus interaction. I have also been working for more than 20 years in the field of Apicomplexa biology (Plasmodium and Toxoplasma) and I have long-standing interest in live and static high-resolution imaging.

    1. anticipations is key to 01:08:38 everything and attention is key to everything so every organism does that plants and everything else and it doesn't require a central nervous system 01:08:51 and and you i might add to this that not only is every organism cognitive but essentially every organism organism is cooperative to those cooperation and cognition 01:09:03 go hand in hand because any intelligent organism any organism that can act to better its you know viability is going to cooperate in 01:09:17 meaningful ways with other organisms and you know other species and things like that nice point because um there's cost to communication whether it's exactly whether it's the cost of making the pheromone 01:09:30 or just the time which is super finite or attention fundamentally and so costly interactions through time the game theory are either to exploit and stabilize which is fragile 01:09:42 or to succeed together yeah exactly and and and succeeding together cooperation is is is like everywhere once you once you understand what you're looking 01:09:54 for it's in the biologic world it's like everywhere so this idea that we're you know one one one person against all or you know we're a dog eat dog universe i mean it's you 01:10:08 know in a certain sense it's true obviously tigers eat you know whatever they eat zebras or whatever i mean that happens yes of course but in the larger picture 01:10:19 over and over multiple time scales not just uh you know in five minutes but over evolutionary time scales and uh you know developmental time scales and everything the cooperation is really the rule 01:10:33 for the most part and if you need if any listener needs proof of that just think of who you think of your body i mean there's about a trillion some trillion some cells 01:10:45 that are enormously harmonious like your blood pumps every day or you know this is a this is like a miracle i don't want to use the word miracle because i want to get into 01:10:59 whatever 
that might imply but uh it is amazing awe inspiring the the depth of cooperation just in our own bodies is like that's that's like 01:11:12 evolution must prefer cooperation or else there would never be such a complex uh pattern of cooperation as we see just in one human body 01:11:26 just to give one example from the bees so from a species i study it's almost like a sparring type of cooperation because when it was discovered that there were some workers with developed ovaries 01:11:38 there was a whole story about cheating and policing and about altruism and this equation says this and that equation says that and then when you take a step back it's like the colony having a distribution of ovary activation 01:11:51 may be more ecologically resilient so um i as an evolutionary biologist never think well my interpretation of what would be lovey-dovey in this system must be how it works because that's so 01:12:05 clearly not true it's just to say that there are interesting dynamics within and between levels and in the long run cooperation and stable cooperation and like learning to adapt 01:12:17 to your niche is a winning strategy in a way that locking down just isn't but unfortunately under high um stress and 01:12:29 uh high uncertainty conditions simple strategies can become rife so that's sort of a failure mode of the population

      The human body, or ANY multicellular animal or plant body, is a prime example of cooperation: billions of cells cooperating with each other to regulate the body system.

      The body of any multicellular organism, whether flora or fauna, is an example of exquisite cellular and microbial cooperation. A multicellular organism is itself a superorganism in this sense. And social organisms then constitute an additional layer of superorganismic behavior.

    1. Reviewer #2 (Public Review):

      The research paper presents a modeling approach aimed at disentangling mothers' genetic effects on their offspring into two components: prenatal environment and postnatal environment. Specifically, the authors use SEM on adopted and non-adopted individuals from the UK Biobank and leverage the variation in genetic similarity across different family structures. Because the UK Biobank was not created as an adoption study, they build seven different family structures to include all possible family combinations that can provide information on the two parameters of interest: those representing the prenatal and postnatal environments, respectively. The model is applied to two phenotypes (birthweight and educational attainment) to illustrate it.

      The results indicate an 'expected pattern of maternal genetic effect on offspring birthweight' and 'unexpectedly large prenatal (intrauterine) maternal genetic effects on offspring education attainment'. The authors mention that this result can likely be explained by adopted offspring being raised by biological relatives. They then show simulations supporting this hypothesis.

      We praise the authors for the complex analyses executed and the work done to create the model and make the scripts available to the research community. The models can be a valuable addition to the behavior genetics literature and to researchers' toolkits. We do, however, have a few concerns regarding 1. the meaning of the results, 2. model building decisions and the choice of sample and 3. the way some limitations are addressed. We go into more detail for each of these points.

      1. Interest to study mothers' genetic effects as acting via the prenatal environment or the postnatal environment and the meaning of the parameters tested by the model

      I think this is an interesting question and a useful distinction for a number of phenotypes, and the authors use the adoption design in an innovative way to define and estimate parameters that correspond to this distinction. However, I would suggest that the expressions prenatal environmental effect and postnatal environmental effect (as distinct pathways for mothers' genes to be expressed) seem to be an overstatement.

      The definition of maternal genetic effects (effects of the mother's genotype on the child's phenotype, over and above any genetic transmission) cites Wolf & Wade 2009 (line 56), who mention the more general notion of 'maternal effects', defined as the effect of the mother's genotype, phenotype (or both) on her offspring. I would argue that postnatal maternal genetic effects (as currently defined in the paper) are likely environmental effects and not only 'genetic effects'.

      These environmental effects are indeed partly influenced by mothers' genes, but are also strongly affected by other variables such as culture, generation, SES and education. It is not possible to disentangle these effects in the design(s) used here.

      This consideration can affect the authors' definition of the covariance between an adopted individual's genotype and phenotype as a function of prenatal (but not postnatal) maternal genetic effects (lines 93-94). The authors' current assumption does not consider the potential for environmental modulation of the effect of adoptive mothers' genes (which are not zero for several phenotypes). Postnatal maternal genetic effects are thus also likely to capture and represent environmental differences.

      2. Model building decisions specific to the UK Biobank

      One of the main issues is that the method is tested on a sample that was not built as an adoption design. This forced the authors to make decisions to circumvent this problem and led to important limitations that are not inherent to their method, but to the specific sample they applied it to.

      a. Having adoptive parents who are partly genetically related to the child breaks the logic of the adoption design. Thus, it brings back the genetic confound (passive gene-environment correlation) problem of the usual family-based designs. In their case, it alters their ability to differentiate between prenatal and postnatal environment.

      b. In the section starting on line 426, the authors have included simulations to show how this issue could be addressed. However, this does not change the fact that, in their model as applied to the UK Biobank, the degree of genetic similarity between the adopting parents, the biological parents and the child is unknown.

      c. To address this problem in their analyses of the UK Biobank, the authors used (lines 302 & 417) information on whether children were breastfed (on the basis that this knowledge would be more common if the child was raised by a biological relative) to identify adopted singletons raised by biological relatives. However, this is, at best, a mediocre index of genetic relatedness. I can see other reasons for participants to know whether they were breastfed: because they were adopted at an older age, or because they are still (or have been) in contact with their biological mother. It is also possible, albeit rare, that adoptive parents breastfeed a child with the help of drugs that stimulate milk production. Line 420: the fact that the prenatal maternal estimate became non-significant after removing participants who were breastfed does provide results more in line with what would be expected. But we can't use expected results as a basis for evaluating the validity of the approach. The absence of GxE and rGE are two other strong assumptions of the model whose violation could also produce this kind of unexpected result.

      d. I would suggest discussing the issue of genetic relatedness between adopting parents and offspring in terms of passive rGE, which is a common problem for the estimation of parental effects in every familial design.

      e. Line 291: why use an unweighted PRS for EY3 (Lee, 2018), while the usual way of computing PRS (as a weighted sum of risk alleles) was used for birthweight?

      3. Limitations

      Assess other limitations of their method.
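The contrast between the two scoring schemes raised in point e can be made concrete with a small sketch. It is illustrative only: the function name `polygenic_score`, the dosage coding (0, 1 or 2 copies of the trait-increasing allele) and all numbers are assumptions made for this example, not values or code from the manuscript under review.

```python
from typing import Optional, Sequence


def polygenic_score(dosages: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Return a polygenic score for one individual.

    dosages: copies (0, 1 or 2) of the trait-increasing allele at each SNP.
    weights: per-SNP effect sizes (e.g. GWAS betas); None gives the
             unweighted score, i.e. a plain allele count.
    """
    if weights is None:
        return float(sum(dosages))
    if len(weights) != len(dosages):
        raise ValueError("dosages and weights must have the same length")
    return float(sum(w * d for w, d in zip(weights, dosages)))


# Hypothetical individual genotyped at four SNPs (made-up numbers).
dosages = [0, 1, 2, 1]
weights = [0.5, 0.25, 0.125, 1.0]

unweighted = polygenic_score(dosages)          # every allele counts equally
weighted = polygenic_score(dosages, weights)   # alleles scaled by effect size
```

In the unweighted score a large-effect and a small-effect allele contribute identically, which is why the choice between the two schemes may matter for how well the score proxies the phenotype.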

      a. limitation of the availability of birth father information,

      b. prenatal events uncorrelated with birthmother's genes (disease or accidents),

      c. Inferring prenatal environment effect from higher birth mother correlation compared to birthfather is subject to bias from measurement differences between the two (Loehlin, 2016).

      d. age at which the child is adopted (if the child has been partly raised by birth parents before adoption, it would bias (raise) the estimates of prenatal effects).

      e. evocative rGE not mentioned. It has been shown that parents partly react to children's behaviors. Thus, the estimate of maternal genetic postnatal effects could be biased (lowered) by evocative gene-environment correlation. In other words, the model also assumes no evocative gene-environment correlation.

      Final thoughts:

      1. I would like a better case to be made for why it is important to separate genetic effects into prenatal and postnatal effects.

      2. I would suggest that the authors make a clear distinction between the limits inherent to their sample (UK Biobank) and those inherent to their methodological approach. I see that the method's considerable usefulness is undermined by limits inherent to the sample used. At the same time, I am not aware of any sufficiently large sample of adopted children with the genotypic information needed to compute PRS.

    2. Author Response

      Reviewer #2 (Public Review):

      Summary

      The research paper presents a modeling approach aimed at disentangling mothers' genetic effects on their offspring into two components: prenatal environment and postnatal environment. Specifically, the authors use SEM on adopted and non-adopted individuals from the UK Biobank and leverage the variation in genetic similarity across different family structures. Because the UK Biobank was not created as an adoption study, they build seven different family structures to include all possible family combinations that can provide information on the two parameters of interest: those representing the prenatal and postnatal environments, respectively. The model is applied to two phenotypes (birthweight and educational attainment) to illustrate it.

      The results indicate an 'expected pattern of maternal genetic effect on offspring birthweight' and 'unexpectedly large prenatal (intrauterine) maternal genetic effects on offspring education attainment'. The authors mention that this result can likely be explained by adopted offspring being raised by biological relatives. They then show simulations supporting this hypothesis.

      We praise the authors for the complex analyses executed and the work done to create the model and make the scripts available to the research community. The models can be a valuable addition to the behavior genetics literature and to researchers' toolkits. We do, however, have a few concerns regarding 1. the meaning of the results, 2. model building decisions and the choice of sample and 3. the way some limitations are addressed. We go into more detail for each of these points.

      1) Interest to study mothers' genetic effects as acting via the prenatal environment or the postnatal environment and the meaning of the parameters tested by the model.

      I think this is an interesting question and a useful distinction for a number of phenotypes, and the authors use the adoption design in an innovative way to define and estimate parameters that correspond to this distinction. However, I would suggest that the expressions prenatal environmental effect and postnatal environmental effect (as distinct pathways for mothers' genes to be expressed) seem to be an overstatement.

      The definition of maternal genetic effects (effects of the mother's genotype on the child's phenotype, over and above any genetic transmission) cites Wolf & Wade 2009 (line 56), who mention the more general notion of 'maternal effects', defined as the effect of the mother's genotype, phenotype (or both) on her offspring. I would argue that postnatal maternal genetic effects (as currently defined in the paper) are likely environmental effects and not only 'genetic effects'. These environmental effects are indeed partly influenced by mothers' genes, but are also strongly affected by other variables such as culture, generation, SES and education. It is not possible to disentangle these effects in the design(s) used here.

      Although we have referred to the maternal effects estimated in our manuscript as “prenatal maternal genetic effects” and “postnatal maternal genetic effects”, all of these effects on the offspring are mediated through maternal phenotypes (which, as the reviewer correctly notes, will be influenced by both genes and the environment). In other words, the maternal PRS used in our study proxies some maternal phenotype(s) that then form part of the offspring’s prenatal and/or postnatal environment, which then affects the offspring’s phenotype. We have referred to these effects as maternal genetic effects rather than just maternal effects to emphasize the causal link with the maternal genotype and the fact that we are only proxying that part of the maternal phenotype that is explained by the relevant genetic variation (NB. This is consistent with the Wolf & Wade 2009 definition of maternal effects, i.e. “…the causal influence of maternal genotypes on offspring phenotypes…”). We agree with the reviewer that our model does not attempt to disentangle proportions of variance due to genetic and environmental factors (which is not its purpose).

      This consideration can affect the authors' definition of the covariance between an adopted individual's genotype and phenotype as a function of prenatal (but not postnatal) maternal genetic effects (lines 93-94). The authors' current assumption does not consider the potential for environmental modulation of the effect of adoptive mothers' genes (which are not zero for several phenotypes). Postnatal maternal genetic effects are thus also likely to capture and represent environmental differences.

      Assuming that adopted offspring are not biologically related to their adoptive mothers, adopted individuals’ PRS should not be correlated with adoptive mothers’ PRS. The corollary is that adoptive mothers’ PRS should not influence the covariance between adopted individuals’ PRS and phenotype (i.e. regardless of whether there is environmental modulation of the effect of adoptive mothers’ genes on offspring phenotype). It is true, however, that we do not consider genotype by environment interaction effects in our model, and that this is a limitation of our model. We allude to this important point several times in the Discussion:

      “Those assumptions explicitly encoded in Figure 1 include that the total maternal genetic effect can be decomposed into the sum of prenatal and postnatal components, that genetic effects are homogenous across biological and adoptive families, the absence of genotype x environment interaction…”

      And

      “In contrast, in our design it is more important that genetic effect sizes are homogenous across adopted and non-adopted individuals (i.e. no genotype by environment interaction)…”.

      At the request of the reviewer, we now include additional discussion of GxE and other assumptions of our model in further detail in Supplementary File 17.
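The covariance argument in this response can be illustrated with a small simulation. This is a minimal sketch under a toy additive model chosen for the example, not the authors' SEM: PRS values are standardized, mating is random, adoptive mothers are unrelated to adoptees, and the function name `simulate_cov` and the effect sizes (`direct`, `prenatal`, `postnatal`) are hypothetical.

```python
import random


def simulate_cov(n, direct, prenatal, postnatal, adopted, seed=0):
    """Monte Carlo estimate of cov(offspring PRS, offspring phenotype)."""
    rng = random.Random(seed)
    sum_g = sum_y = sum_gy = 0.0
    for _ in range(n):
        g_mother = rng.gauss(0.0, 1.0)   # biological mother's PRS
        g_father = rng.gauss(0.0, 1.0)   # biological father's PRS
        # Offspring PRS: mid-parent value plus segregation noise (variance 1).
        g_child = 0.5 * (g_mother + g_father) + rng.gauss(0.0, 0.5 ** 0.5)
        # PRS of the mother who raises the child: an unrelated woman for
        # adoptees, the biological mother otherwise.
        g_rearing = rng.gauss(0.0, 1.0) if adopted else g_mother
        y = (direct * g_child + prenatal * g_mother
             + postnatal * g_rearing + rng.gauss(0.0, 1.0))
        sum_g += g_child
        sum_y += y
        sum_gy += g_child * y
    return sum_gy / n - (sum_g / n) * (sum_y / n)


params = dict(n=100_000, direct=0.3, prenatal=0.2, postnatal=0.4)
cov_adopted = simulate_cov(adopted=True, **params)
cov_biological = simulate_cov(adopted=False, **params)

# Expectations under this toy model:
expected_adopted = 0.3 + 0.5 * 0.2               # direct + prenatal / 2
expected_biological = 0.3 + 0.5 * (0.2 + 0.4)    # direct + (pre + post) / 2
```

Under these assumptions the postnatal coefficient drops out of the adoptee covariance, because the rearing mother's PRS is independent of the child's PRS. The sketch says nothing about genotype by environment interaction, which the toy model excludes by construction.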

      2) Model building decisions specific to the UK Biobank. One of the main issues is that the method is tested on a sample that was not built as an adoption design. This forced the authors to make decisions to circumvent this problem and led to important limitations that are not inherent to their method, but to the specific sample they applied it to.

      a) Having adoptive parents who are partly genetically related to the child breaks the logic of the adoption design. Thus, it brings back the genetic confound (passive gene-environment correlation) problem of the usual family-based designs. In their case, it alters their ability to differentiate between prenatal and postnatal environment.

      We agree that the UK Biobank was never designed for this purpose, and that its data regarding adoption are less than perfect. Nevertheless, we think that an important conclusion of our paper is that large-scale biobanks (which, because of their size, contain many hundreds or thousands of adopted individuals) can be used to partition maternal genetic effects into prenatal and postnatal components, provided that good-quality data on the adoption process and/or genetic information on the adoptive parents have been gathered.

      To help address the reviewer’s concerns we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, whether they are specific to the UK Biobank dataset or intrinsic to our method, their consequences on model parameters, and possible options for addressing them.

      b) In the section starting on line 426, the authors have included simulations to show how this issue could be addressed. However, this does not change the fact that, in their model as applied to the UK Biobank, the degree of genetic similarity between the adopting parents, the biological parents and the child is unknown.

      We agree, but we feel it is important to demonstrate (a) that cryptic biological relatedness between adopted individuals and their adoptive parents is a potential issue not only for our study, but also for other studies attempting to utilize this information in the UK Biobank, and (b) that cryptic relatedness can be dealt with effectively through appropriate modelling in our SEM framework (i.e. even if it is not possible with the current data from UK Biobank). The corollary is that we recommend that the UK Biobank (and other large-scale biobanks) attempt to acquire information on adopted individuals and their parents, e.g. through questionnaires.
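How cryptic relatedness could inflate a prenatal estimate can be sketched with a few lines of arithmetic. The sketch assumes a toy additive model (standardized PRS, random mating, direct genetic effect already accounted for) rather than the authors' SEM; the function name `implied_prenatal_effect` and all parameter values are hypothetical.

```python
def implied_prenatal_effect(prenatal, postnatal, frac_related, rho_related):
    """Prenatal effect inferred if all adoptees are assumed unrelated
    to the mothers who raised them.

    frac_related: fraction of adoptees raised by a biological relative.
    rho_related:  PRS correlation between such an adoptee and that relative
                  (e.g. 0.25 for a maternal grandmother under random mating).
    """
    # Average PRS correlation between adoptees and their rearing mothers.
    rho_mean = frac_related * rho_related
    # Covariance contributed by maternal pathways among adoptees:
    #   0.5 * prenatal          (via the biological mother)
    # + rho_mean * postnatal    (via a rearing mother who is sometimes kin)
    maternal_cov = 0.5 * prenatal + rho_mean * postnatal
    # Attributing all of it to the prenatal path (i.e. assuming rho = 0):
    return 2.0 * maternal_cov


# No true prenatal effect, but half of the adoptees raised by a grandmother.
biased = implied_prenatal_effect(prenatal=0.0, postnatal=0.4,
                                 frac_related=0.5, rho_related=0.25)
# With no cryptic relatedness the true value is recovered.
unbiased = implied_prenatal_effect(prenatal=0.0, postnatal=0.4,
                                   frac_related=0.0, rho_related=0.25)
```

In this toy setting a purely postnatal effect shows up as an apparent prenatal effect once some adoptees are raised by relatives, which is the direction of bias discussed above. Modelling the relatedness explicitly would require knowing `frac_related` and `rho_related`, which is the information the response notes is not available in the current data.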

      c) To address this problem in their analyses of UK biobank, authors used (Lines 302 & 417) information regarding whether children were breastfed or not (on the basis that this knowledge would be more common if the child was raised by a biological family relative) to identify adopted singletons raised by biological relatives. However, this is, at best, a mediocre index of genetic relatedness. I can see other reasons for participants to know whether they were breastfed: because they were adopted at an older age, or because they are still (or have been) in contact with their biological mother. It is also possible, albeit rare, that adoptive parents may breastfeed a child via the use of drugs to stimulate milk production. Line 420: the fact that the prenatal maternal estimate became non-significant after removing participants that were breastfed does provide results more in line with what would be expected. But we can't use expected results as a basis to evaluate the validity of the approach. The absence of GxE and rGE are two other strong assumptions of the model whose violation could also produce this kind of unexpected result.

      We agree that (a) the inclusion of adopted individuals whose adoptive parents are biologically related to them is only one possible reason for unexpectedly strong prenatal maternal genetic effect estimates, (b) attempting to remove these individuals from the analysis using a proxy like breastfeeding information is less than perfect. As indicated above, we now discuss in detail alternative explanations for our results including violations of assumptions regarding the absence of GxE and rGE, and other explanations (assortative mating, stratification etc) (see new text in the Discussion and Supplementary File 17).

      d) I would suggest discussing the issue of genetic relatedness between adopting parents and offspring in terms of passive rGE which is a common problem for the estimation of parental effects in every familial design.

      We now include mention of passive rGE in the Discussion:

      “Rather we hypothesize it is possible that our model could have been misspecified in that substantial numbers of adopted individuals in the UK Biobank may have in fact been raised by their biological relatives. This can be thought of as (unintentional) reintroduction of passive gene-environment correlation into the study. In other words, adopted children are brought up by their genetic relatives, who in turn provide the environment in which they are raised. This induces a correlation between adopted individuals’ PRS and their environment.”

      e) Line 291: why use an unweighted PRS for EY3 (Lee, 2018), while the usual way of computing PRS (as a weighted sum of risk alleles) was used for birthweight?

      We thank the reviewer for pointing this inconsistency out. We have now rerun the analyses using weighted and unweighted PRS for both birth weight and educational attainment. The reason for running both sets of analyses is that the GWAS from which the SNPs are selected (and on which the weights are based) contains UK Biobank individuals. This may inflate the overall strength of association between the PRS and outcome through winner’s curse (although not differentially between individuals from adoptive and biological families). In contrast, unweighted scores should be much more robust to this inflation, and so are a useful sanity check on the results.
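      As a rough illustration of the distinction drawn above, the sketch below computes a weighted and an unweighted polygenic score for one individual. It is a minimal example only: the function name `polygenic_score`, the dosage values, and the effect sizes are hypothetical and are not taken from the manuscript or from any GWAS.

```python
def polygenic_score(dosages, weights=None):
    """Sum risk-allele dosages, optionally weighted by per-SNP effect sizes.

    dosages: risk-allele counts (0, 1 or 2) for each SNP in one individual.
    weights: per-SNP effect sizes; if omitted, every SNP counts equally,
             which gives the unweighted score.
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical individual with four SNPs (illustrative values only).
dosages = [0, 1, 2, 1]
betas = [0.5, 0.25, 0.125, 1.0]

weighted = polygenic_score(dosages, betas)  # depends on the effect sizes
unweighted = polygenic_score(dosages)       # simple risk-allele count
```

      Because the unweighted score ignores the effect sizes entirely, any inflation in those estimates cannot propagate into it through the weights, which is why it can serve as a sanity check.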

      3) Limitations

      As our Discussion is already very long, we have created a Supplementary Table (Supplementary File 17) that summarizes some of the main limitations/assumptions of our model, their consequences on model parameters, and possible options for addressing them. We also discuss specific concerns raised by the referee below.

      Assess other limitations of their method.

      a) limitation of the availability of birth father information,

      Our model does not require information on adopted individuals’ birth fathers (although it does require PRS on non-adopted individuals’ birth fathers, which is typically readily available). It does, however, make the assumption that fathers do not contribute prenatally to offspring traits, which we think is a reasonable assumption for the majority of offspring phenotypes. If PRS for adopted individuals’ biological fathers were available, then prenatal paternal genetic effects could be estimated as part of the model. To accommodate the reviewer’s request, we have included and discussed this limitation/assumption in more detail in Supplementary File 17.

      b) prenatal events uncorrelated with birthmother's genes (disease or accidents),

      We agree that our model assumes that maternal genotype is uncorrelated with prenatal environmental factors. We now discuss this assumption/limitation further in Supplementary File 17.

      c) Inferring prenatal environment effect from higher birth mother correlation compared to birthfather is subject to bias from measurement differences between the two (Loehlin, 2016).

      Whilst this is a limitation of adoption designs that estimate prenatal effects using the difference between maternal and paternal correlations with offspring phenotypes, this is not actually a limitation of our model. In our model we do not use (phenotypic) mother-child and father-child correlations (we use PRS-phenotype correlations). Also, in our model, information on the size of the prenatal (and postnatal) maternal genetic effects primarily comes from the difference between the PRS-phenotype covariance in adopted singletons and the PRS-phenotype covariance in non-adopted individuals (i.e. not from the difference between maternal and paternal correlations with offspring phenotypes). We state this in the Introduction and Methods e.g.:

      “Thus, the difference between the genotype-phenotype covariance in adopted and non-adopted singleton individuals provides important information on the likely size of postnatal genetic effects.”

      It is also worth noting, that in our model, the size of the paternal PRS-offspring association does not factor into the estimation of maternal genetic effects (nor does the difference between the maternal PRS-offspring phenotype association and the paternal PRS-offspring phenotype association). Also, our model takes into account if there are differences in the amount of (random) measurement error in adoptive and non-adoptive families.

      d) age at which the child is adopted (if the child has been partly raised by birth parents before adoption, it would bias (raise) the estimates of prenatal effects).

      We agree and now discuss this limitation further in Supplementary File 17.

      e) evocative rGE not mentioned. It has been shown that parents partly react to children's behaviors. Thus, the estimate of maternal genetic postnatal effects could be biased (lowered) by evocative gene-environment correlation. In other words, the model also assumes no evocative gene-environment correlation.

      We agree and now discuss this limitation in Supplementary File 17 (although we note that the effect that evocative rGE will have on the SEM parameters will depend on the direction of the gene-environment correlation).

      Final thoughts

      1) I would like a better case made for why it is important to distinguish genetic effects into prenatal and postnatal effect.

      We have included the following text in the Introduction:

      “Given the increasing number of variants identified in GWAS that exhibit robust maternal genetic effects, a natural question to ask is whether these loci exert their effects on offspring phenotypes through intrauterine mechanisms, the postnatal environment, or both. Indeed, resolving maternal effects into prenatal and postnatal sources of variation could be a valuable first step in eventually elucidating the underlying mechanisms behind these associations (Armstrong-Carter et al. 2020), directing investigators to where they should focus their attention, and in the case of disease-related phenotypes, yielding potentially important information regarding the optimal timing of interventions. For example, the demonstration of maternal prenatal effects on offspring IQ/educational attainment, suggests that if the mediating factors that were responsible could be identified, then improvements in the prenatal care of mothers and their unborn babies which target these factors, could yield useful increases in offspring IQ/educational attainment.”

      2) I would suggest the authors make a clear distinction between the limits inherent to their sample (UK biobank) and those inherent to their methodological approach. I see that the considerable usefulness of the approach is plagued by limits inherent to the sample used. At the same time, I am not aware of the availability of a big enough sample of adopted children with genotypic information available to compute PRS.

      One of the main limitations inherent to our sample (UK Biobank) is the fact that currently we cannot be certain that adopted individuals are not biologically related to their adoptive parents. As we demonstrate, this limitation could be addressed if information were gathered regarding the relationships, which at least in principle could be done relatively easily in the UK Biobank (e.g. by questionnaire, or even better, by genotyping adoptive parents where possible). The SEMs could then be adjusted to take these relationships into account. We discuss this limitation, and many others, in Supplementary File 17, and divide the table according to whether the limitation is primarily a consequence of the dataset (UK Biobank) or the method more broadly.

      We agree with the reviewer that the size of adoption studies is currently limited (e.g. Texas Adoption Project; Colorado Adoption Study etc). Nevertheless, it is likely that the number of adopted individuals available in large-scale Biobanks will increase over time, in which case models like the one espoused in this manuscript will become increasingly useful. Importantly, our method does not require adoptive families in order to partition maternal effects, merely adopted singleton individuals, and reliable information on the biological relatedness (or lack thereof) of their adoptive parents. We feel therefore that it is important that this sort of information be gathered so that the adopted individuals within these large-scale resources can be leveraged to examine interesting questions like the ones discussed in our manuscript.

      We have added these points to the Discussion:

      “We argue that of greater consequence for the validity of our model is that any genetic relationship between adoptive and biological parents is accurately modelled and included in the SEM. Through simulation, we have shown that the consequences of model misspecification depend upon which biological and adoptive parents are related, the nature of this relationship, and the proportion of adopted individuals in the sample who have had their relationship misspecified. Our simulations also showed that correctly modelling this relationship returns asymptotically unbiased effect estimates and correct type I error rates. Clearly, knowing these cryptic relationships in the UK Biobank would allow us to properly model them and better estimate prenatal and postnatal maternal genetic effects using this resource. We emphasize that accurately modelling these relationships does not require that actual genotypes for adoptive and/or biological parents be obtained (although this would be advantageous in terms of statistical power) as our SEM allows us to model these relationships in terms of latent variables. Indeed, as large-scale resources like the UK Biobank become more common, we expect that the number of adopted individuals who have GWAS will also increase, and consequently models like the one espoused in this manuscript will become increasingly useful. High quality phenotypic information on these adopted individuals and their adoptive parents including whether they share any biological relationship will be critical to making the most of these resources.”

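    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think", published in The Atlantic Monthly in July 1945. The event video is in five parts; this is part five, which mainly covers: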

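      1. Lee Sproull's talk: "Information Is Not Enough: Computer Support for Productive Work". Abstract: Any vision of a new technology implies a vision of human beings and their behavior. In this talk, I describe the vision of human behavior associated with the most influential technological vision of personal computing, epitomized by Vannevar Bush's Memex: the vision of the solitary thinker and problem solver. I contrast this vision with an alternative view of how productive human behavior actually occurs: within interdependent social relationships. I review the current state of computer support for social actors and propose an alternative view in which information processing is subordinate to relationship management.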

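      2. Alan Kay's talk: "Simex: the neglected part of Bush's Vision". Abstract: Bush's vision of a hyperlinked 10,000-volume library on a desk has had an enormous influence on the development of personal computing, and it can be realized today (and even surpassed via the Internet). Yet although Bush worked on (analog) computer simulation in the 1930s, he most likely saw no simulation role for the Memex, either from his own work or from the newly built ENIAC. What is missing from Bush's vision, and can it be invented today?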

      3. Day 2 panel discussion.

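    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of Vannevar Bush's seminal article "As We May Think", published in The Atlantic Monthly in July 1945. The event video is in five parts; this is part four, which mainly covers: 1. Raj Reddy: "Bush's Intelligent Systems Revisited". Abstract: In his famous paper "As We May Think", Vannevar Bush offered a vision for creating machines that can interpret pictures, take dictation, understand language, use hyperlinks, and perform associative retrieval from digital libraries. In this talk, we review the progress made on these predictions over the past 50 years.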

  4. Local file
    1. 'I don't think it's anything—I mean, I don't think it was ever put to any use. That's what I like about it. It's a little chunk of history that they've forgotten to alter. It's a message from a hundred years ago, if one knew how to read it.'

      Winston and Julia are examining a glass paperweight in George Orwell's 1984 without knowing what it is or what it was used for.

      This is the same sort of context collapse caused by distance in time and memory that archaeologists face when examining found objects.

      How does one pull out the meaning from such distant objects in an exegetical way? How can we more reliably rebuild or recreate lost contexts?

      Link to:
      - Stonehenge is a mnemonic device
      - mnemonic devices in archaeological contexts (Neolithic carved stone balls)


      Some forms of orality-based methods and practices can be viewed as a method of "reading" physical objects.


      Ideograms are an evolution on the spectrum from orality to literacy.


      It seems odd to be pulling these sorts of insights out of my prior experiences and reading while reading something so wholly "other". But isn't this just what "myths" in oral cultures actually accomplish? We link particular ideas to pieces of story, song, art, and dance so that they may be remembered. In this case Orwell's glass paperweight has now become a sort of "talking rock" for me. Certainly it isn't done in any sort of sense that Orwell would have expected, presumed, or even intended.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to the reviewers

      Manuscript number: RC-2022-01407

      Corresponding author(s): Ivana, Nikić-Spiegel

      1. General Statements

      We would like to thank the reviewers for their careful reading of our manuscript and for their insightful and useful comments. We are happy to see that the reviewers find these results to be of interest and significance. As we understand the reviewers’ reports, their main concerns can be roughly divided into the following categories: 1) providing more quantitative data, 2) interpretation of the Annexin V/PI assay, and 3) additional evidence for calpain involvement. We intend to address these experimentally or by modifying the text, as outlined below.

      2. Description of the planned revisions

      Reviewer #1

      Fig1A/B o SYTO 16 staining suggests slight reshaping of nucleus upon spermine NONOate, showing less blurry punctae. From the SYTO 16 profile, this should be quantifiable.

      By looking at the shown examples and the entire dataset, it appears to us as if neuronal nuclei are shrinking upon spermine NONOate treatment resulting in their less blurry appearance. We are not sure if this is what the reviewer is referring to, but this can also be quantified by measuring changes in neuronal nuclear size. We already have this data from the measurements shown in Fig4 and we intend to show it in the revised version of the manuscript. Line profile measurements are also possible, but the nuclear size quantification might be more suitable for this purpose.

      o There is a subset of neuron nuclei that are SYTO 16 positive. Please quantify the ratio

      We will use our existing dataset to quantify the ratio of NFL positive and SYTO16 positive nuclei.

      FigS1A o Show NeuN with Anti-NFL merged figures

      We will show merged NeuN and anti-NFL images, which might require rearrangement of the existing figures and figure panels. We will do this in the revised manuscript.

      FigS1C o Show quantification and timeline. I want to know whether there is also a plateau reached here.

      As the data shown in the FigS1C do not include NeuN staining, we will do additional experiments and perform proposed quantifications.

      FigS2A-F o Though the statements might be true, selecting one nucleus for a line profile as a statement for the whole dataset seems problematic. Average a larger number of unbiased selected nuclei profiles across multiple cultures to make a stronger statement, or a percentage of positive nuclei as in FigS1b.

      Corresponding images and line profiles are representative of the entire dataset. However, we agree with the reviewer that this is not obvious from the current manuscript version. Thus, to strengthen our findings, we intend to quantify the percentage of positive nuclei as in FigS1b. The only difference will be that instead of NeuN, we will use SYTO16 as a nuclear marker. The reason being that the existing datasets contain images of NFL and SYTO16 and not NeuN.

      FigS3 • There are no fluorescence profiles, no quantification

      As the reviewer suggests, we will quantify the ratio of NFL positive and SYTO16 positive nuclei, and include the quantifications in the revised manuscript.

      General statement: There do seem to be punctated patterns of non-nucleus accumulating NFL fragments. Can they be localized to any specific structure?

      We assume that the reviewer is referring to neuronal/axonal debris. They are present after injury but they do not colocalize with nuclear stains. We will address this in the revised manuscript.

      Fig1C-F • I find it too simplistic to categorize c+f and d+e together. There is a huge difference in the examples of nuclear localization between d and e. To not comment on their distinction (if that is consistent) is problematic. Also, since we don't see a merge with either NeuN or SYTO 16, reader quantification is difficult.

      We thank the reviewer for bringing this up. We will carefully check our entire dataset and we will update the figures and the text accordingly. We will also show the corresponding SYTO16 images, as the reviewer suggested.

      Would the microfluidic device construction allow for time to transport any axonally damaged fragments to the soma?

      Yes, the construction of the microfluidic devices allows the transport of axonal proteins back to the soma. Based on our experiments, it seems that damaged NFL from the axonal compartment could be contributing to the accumulation of NFL fragments in the nuclei. However, this contribution seems to be minimal as we cannot detect nuclear NFL upon the injury of axons alone. Alternatively, it could be that the processing of axonal NFL fragments proceeds differently if neuronal bodies are not injured and that this is the reason we don’t detect the NFL nuclear accumulation upon injury of axons alone. We will discuss this in the revised manuscript.

      Fig2C+D • The statement ".... no annexin V was detected on the cell membrane" needs to be shown more clearly

      We will modify figures to address this comment.

      • Please provide merged AnnexinV/PI images

      We will modify figures to address this comment.

      • The conclusion about 2D, that nuclear accumulated NFL overlaps with PI is not supported by the example image shown. There are plenty of PI positive spots that are not NFL positive and even several NFL positive ones that do not have a clear PI staining. Please quantify and then show a very clear result in order to be able to suggest necrosis as the underlying process.

      We are not sure if we understand the reviewer’s concern correctly. We will try to clarify it here and in the revised text. If necessary, we will tone down our conclusion, but the reason why not all of PI positive spots are NFL positive is most likely due to the fact that not all injured nuclei are NFL positive. We quantified in FigS1 that up to 60% of nuclei under injury conditions show NFL accumulations. That is why we are not surprised to see some PI positive/NFL negative nuclei. And the fact that there are some NFL positive nuclei which appear to be PI negative is most likely related to the fact that the PI binding is affected. In addition, upon closer inspection of NFL and PI panels in Fig2d it can be observed that NFL positive nuclei are also PI positive, albeit with a lower PI fluorescence intensity. We will modify the figure to show this clearly in the revised manuscript.

      FigS5 C+D • If the case is made that nitric oxide damage induces necrosis, then why is it that the AnnexinV example of Staurosporine exposure (which induces apoptosis) looks similar to that of nitric oxide damage in Fig2d and necrosis induction with Saponin looks very different?

      We thank the reviewer for bringing this up. We will try to clarify this in the revised manuscript. Regarding the specific questions, the most likely explanation for why staurosporine-treated neurons look similar to the ones treated with spermine NONOate is that in the late stages of apoptosis the cell membrane ruptures and allows the PI to label nuclei. This is probably the case here, as illustrated by the nucleus in the middle of the image (FigS5c), which shows the fragmentation characteristic of apoptosis. This does not happen in early apoptotic cells due to the presence of an intact plasma membrane. On the other hand, the reason why saponin-treated cultures look different from spermine NONOate-treated ones is that membranes are destroyed by saponin so that the PI can enter the cell. For that reason, there could not have been any AnnexinV binding to the membrane that would correspond to the AnnexinV signal of spermine NONOate-treated neurons. As we will discuss below, we did not try to mimic spermine NONOate-induced injury with saponin treatment. Instead, this was a control condition for PI labeling and imaging. We also used a rather high concentration of saponin, which probably destroyed all the membranes; this was not the case with spermine NONOate treatment. We intend to do additional control experiments to address this.

      • Additionally, does necrosis induction with Saponin also cause NFL fragment accumulation in the nucleus? Please show a co-staining of them. Also, the authors want to make a claim about reduced PI binding in NFL accumulated necrotic cells. In these examples, the intensity of the nuclear stain of PI with Saponin looks dimmer than with Staurosporine. Are the color scalings similar? It might be that the necrotic process itself causes reduced binding of PI and is not related to the presence of NFL.

      With regards to this question, it is important to note that Annexin V and PI imaging was done in living cells. To obtain the corresponding anti-NFL signal as shown in Fig 2c,d we had to fix the neurons, perform immunocytochemistry and identify the same field of view. We tried to do the same procedure after saponin treatment (Supplementary Figure 5d) but the correlative imaging was very difficult due to the detachment of neurons from the coverslip after the saponin treatment. For this reason, we could not identify the same field of view co-stained with NFL. However, other fields of view did not show NFL fragment accumulation. This could also be the consequence of the high saponin concentration that we used as we discuss above. We have also noticed the reduced intensity of PI binding in the nuclei of saponin-treated neurons. However, if the necrotic process itself reduces the binding of PI to the DNA, then all of the neurons treated with spermine NONOate would have an equally low PI signal. In our experiments, only the nuclei which contained NFL accumulations had a low PI signal, while the signal of NFL-negative nuclei was higher (as shown in Fig2d). We would also like to point out again that the saponin treatment was our control of the PI’s ability to penetrate cells and bind the DNA, as well as our imaging conditions, and not the control of the necrotic process itself. This is the reason why we didn’t go into details about neuronal morphology and NFL localization upon saponin treatment. We thank the reviewer for pointing this out since it prompted us to reevaluate what we wrote in the corresponding paragraph of the manuscript. 
      We realized that the confusion might stem from our explanation of the AnnexinV/PI assay controls in lines 196-198 (“Additional control experiments in which neurons were treated with 10 μM staurosporine (a positive control for induction of apoptosis) or with 0.1% saponin (a positive control for induction of necrosis) confirmed the efficiency of the annexin V/PI assay (Supplementary Fig. 5c,d).”). We will modify this portion of the text to clearly state that staurosporine and saponin treatments were controls of the AnnexinV and PI binding to their respective targets and not of the apoptosis/necrosis process. When it comes to the saponin treatment, our intention was only to permeabilize the membranes in order to allow PI penetration and DNA binding and not to induce necrosis or to mimic the effect of the spermine NONOate. We also intend to perform experiments with a lower concentration of saponin to try to address this experimentally in addition to the text modifications.

      Fig3d • Please show similarly scaled images from controls for proper comparison

      We will show similarly scaled images of the control neurons so that they can be properly compared. They were initially not scaled the same for visualization purposes, but we will modify this in the revised manuscript.

      • How do the authors scale the degree and kinetics of induced damage between application of hydrogen peroxide/CCCP and glutamate toxicity? Does glutamate toxicity take longer to affect the cell, not allowing enough time to accumulate NFL fragments in the nucleus?

      It is challenging to scale the degree and kinetics of induced damage with different stressors. That is why we did not intend to do this. Instead, we set different injury conditions based on the published literature. That is why we can only speculate when it comes to this. In this regard, it may be that the glutamate toxicity takes “longer” to affect the cells, even though it is very difficult to compare them on a timescale, especially when considering different mechanisms of action. We will discuss this limitation in the revised manuscript.

      Fig4B • Some groups (like NO and NO + emricasan) have much larger numbers of close to 0 intensity, compared to the control group. Why?

      We were wondering the same when we analyzed the data. The fact that our nuclear fluorescence intensity analysis picked up NFL signal in control neurons which had no nuclear NFL accumulation made us realize that the intensity measured in the nuclei of control group comes entirely from the out of focus fluorescence – from neurofilaments in cell bodies, dendrites and axons (an example can be seen in the FigS6). That is why we presented the corresponding data with a cut-off value based on the control signal (as mentioned in lines 238-240). Since the oxidative injury causes NFL degradation (not only in neuronal soma, but also neuronal processes), the overall fluorescence intensity of the NFL immunocytochemical staining is reduced in injured neurons. We can see that in all of our images. Consequently, there is no contribution of out of focus fluorescent signal to the measured fluorescence intensity in the majority of nuclei. Due to that, the nuclei without NFL accumulation (at least 40% of injured nuclei) will appear to have a close to 0 intensity of the fluorescent signal. We will discuss and clarify this additionally in the revised manuscript.

      • Please add the ratio of above/below threshold (50/50 obviously in controls)

      We will update the figure in the revised manuscript.
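      The requested above/below-threshold ratio can be sketched as follows. This is only an illustration under a stated assumption: the response above says the cut-off is "based on the control signal", and the reviewer's "50/50 obviously in controls" suggests the control median, so the median is used here as a hypothetical cut-off. The intensity values and the helper name `fraction_above` are made up for the example.

```python
from statistics import median


def fraction_above(intensities, cutoff):
    """Return the fraction of nuclei whose intensity exceeds the cut-off."""
    return sum(1 for value in intensities if value > cutoff) / len(intensities)


# Hypothetical nuclear intensities (arbitrary units), not data from the study.
control = [10.0, 12.0, 14.0, 16.0]
injured = [0.0, 1.0, 30.0, 40.0, 50.0]

# Assumption: the cut-off is the median of the control group.
cutoff = median(control)

control_fraction = fraction_above(control, cutoff)
injured_fraction = fraction_above(injured, cutoff)
```

      With a median-based cut-off, the control group splits evenly by construction, so the informative quantity is how far each treated group departs from that even split.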

      • The description of the CTCF value calculation seems a little... muddled? Several parameters are described whereas "integrated density" is not even used. Why not simply mean intensity of nuclear ROI-mean intensity of background ROI?

      We included the integrated density in the description since it is measured together with the raw integrated density and can also be used for the CTCF value calculation. However, since we didn’t use it for the CTCF calculation, we will remove it from the corresponding section of the manuscript. We calculated the CTCF value instead of calculating mean intensity of the nuclear ROI - mean intensity of the background ROI, since the CTCF value also takes into account the area of the ROI and not just the mean intensity.
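      To make the difference between the two measures concrete, the sketch below compares the commonly used CTCF formula (integrated density minus ROI area times mean background) with the simpler mean-intensity difference the reviewer proposes. It assumes the raw integrated density is the sum of pixel values in the ROI and that the area is expressed in pixels; the function names and numbers are hypothetical and do not come from the manuscript.

```python
def ctcf(raw_integrated_density, roi_area, mean_background):
    """Corrected total cell fluorescence.

    Subtracts the background expected over the whole ROI
    (area * mean background) from the summed ROI signal.
    """
    return raw_integrated_density - roi_area * mean_background


def mean_difference(raw_integrated_density, roi_area, mean_background):
    """Mean ROI intensity minus mean background intensity."""
    return raw_integrated_density / roi_area - mean_background


# Two hypothetical nuclei with the same mean intensity (100) but different areas.
small = {"raw_integrated_density": 5000.0, "roi_area": 50.0, "mean_background": 20.0}
large = {"raw_integrated_density": 10000.0, "roi_area": 100.0, "mean_background": 20.0}

small_ctcf, large_ctcf = ctcf(**small), ctcf(**large)
small_mean, large_mean = mean_difference(**small), mean_difference(**large)
```

      In this toy case the mean-intensity difference is identical for both nuclei, whereas the CTCF doubles with the area, which is the sense in which CTCF also takes the ROI area into account. It also shows why a change in nuclear size between conditions would affect CTCF values.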

      • Also, please tell me if the areas for nuclear ROIs change, as I noted for Fig1A/B

      We will include this information in the revised manuscript.

      • To make sure that one of the 3 experimental repeats didn't skew the results, please show the median fluorescence intensity for each individual experiment to clarify that the supposed effect is repeated across experiments.

      We have already noticed that in the earliest of the three experiments overall fluorescence intensity was higher, but this was consistent across all the experimental groups and did not skew the results or affect the overall conclusion. However, we will double-check this and revise the figure.

      • From the text "...and due to the NFL degradation during injury...": this seems to contradict the process? Either the NFL fragment accumulates in the nucleus or it is degraded during injury. And isn't the degradation through calpain what supposedly allows this fragment of NFL to go to the nucleus in the first place? I reckon that the authors are possibly trying to reconcile why there are many close-to-0 intensity nuclei in the NO and NO + emricasan groups, but I don't feel the explanation given here fits.

      As we tried to explain in our response above, we think that the overall degradation of neurofilaments in neurons affects the fluorescence intensity originating from the out of focus neurofilaments. Therefore, the nuclei without NFL accumulation in injured conditions have a close to 0 fluorescence intensity. Additionally, we think that this is not an either/or situation, but that both degradation and nuclear accumulation of NFL happen simultaneously. We also think that degradation of axonal NFL and the transport of its tail domain to the soma will at least partially contribute to the accumulation in the nucleus. In any case, degradation and nuclear accumulation seem to be differentially regulated in individual neurons, as some of them show nuclear NFL accumulation and some not. Furthermore, calpain and other mechanisms could also cause NFL degradation up to the point at which these fragments can no longer be recognized by the anti-NFL antibody leading to the loss of signal. We will try to clarify this in the revised version of the manuscript.

      Fig5 • Does the distribution of this GFP in B match any of the various antibody stainings of different NFL fragments? Perhaps this is still a valid fragment of NFL, just not picked up by any AB?

      The GFP signal in B appears rather homogenous and it does not match any of the various antibody stainings of different NFL fragments. As the reviewer points out, this could also be a valid fragment of NFL fused to GFP that none of our antibodies is recognizing. We will clarify this in the revised manuscript.

      • "... and was indistinguishable from the full-length NFL-GFP." Based on what parameters?

      We will clarify this in the revised text, but we meant in terms of overall neurofilament network and cell appearance, which is commonly used to test the effect of NFL mutations.

      • The authors claim that b is different from d, but I am not convinced. I would like to see a time dependent curve from multiple cells showing a differential change in nuclear and cytosolic GFP signal.

      As we also wrote in the manuscript, in the majority of neurons that were monitored during injury we were not able to detect an increase in the GFP fluorescence intensity in the nucleus. This is what prompted further experiments with NFL(ΔA461–D543)-FLAG. We will clarify this additionally in the revised manuscript and perform line profile intensity measurements to show the difference in nuclear and cytosolic GFP signal.

      • Secondly, the somatic GFP intensity for NFL increases for full length NFL-GFP. How is this explained, if it is only a separation of NFL and GFP? If anything, GFP should float away. And if the answer is that NFL is recruited to the nucleus, you showed that inhibition of calpain activity partially prevents that. So, if calpain activity is necessary for the transport of NFL to the nucleus, then wouldn't it also cut the GFP from NFL before it reaches the nucleus?

      We thank the reviewer for bringing this up and we apologize for the confusion. This can be explained by the fact that the images were scaled in a way that the GFP signal over time could still be seen easily (i.e. differently across different time points which we unfortunately forgot to mention in the figure legend). In the revised manuscript, we will either scale the images the same or we will alternatively show the displayed grey values in individual panels.

      Fig6 • It is recommended to overlap the transfected cells with a stain for endogenous NFL to show that despite the absence of the FLAG-tag, there is still NFL.

      We did not overlap the anti-NFL with anti-FLAG and SYTO16 staining, due to the space constraint and the intent to clearly show the overlap of FLAG and SYTO16 signals in the merged images above the graphs. However, the line profile intensity measurements were done in all three channels and show that despite the absence of FLAG, there is still NFL in the nucleus (Fig6b), or that both FLAG and NFL are present in the nucleus (Fig6d, NFL signal shown in gray). However, as this is not obvious and can easily be overlooked, we will show the endogenous NFL staining overlap in the revised version of the manuscript.

      Fig7 • "...all disrupted neurofilament assembly...": this sounds like the staining for native NFL supposedly shows a distortion due to a dominant negative effect of the expression of these constructs? Please clarify.

      Yes, we were referring to the disruption of neurofilament assembly due to a dominant negative effect of the expression of NFL domains. We will clarify this in the revised version of the manuscript.

      Discussion: • The authors show that after overexpression of the head domain only, it possibly passively diffuses into the nucleus even in the absence of oxidative injury. However, it seems to be suggested as well that the head domain would not be freely floating around were it not for increased calpain activity as a result of oxidative injury in the first place. Therefore, localization of a head domain fragment in the nucleus would still happen more prominently upon oxidative injury, and the fragment would interact with DNA through the putative DNA interaction sites previously identified by Wang et al. Please comment.

      That is correct. Upon injury and calpain cleavage, it is conceivable that a fragment containing the NFL head domain would also be present in the cell and could potentially diffuse to the nucleus and interact with the DNA. However, by staining injured neurons with an antibody that recognizes amino acids 6-25 of the NFL head domain, we were not able to detect an NFL signal in the nucleus (FigS2a,b). It could be that either the NFL head domain does not localize in the nuclei upon injury, or that the fragment localizing in the nucleus does not contain amino acids 6-25 of the NFL head domain. As the putative DNA-binding sites described by Wang et al involve 7 amino acids located in the first 25 residues of the NFL head domain, we would expect to detect it with the aforementioned antibody. However, as that was not the case we speculated that the interaction of NFL and DNA occurs differently in living cells, as opposed to the test tube conditions utilized by Wang et al. We will comment and clarify this in the revised version of the manuscript.

      Reviewer #2

      • Major Comments:

      • The initial data presented in the paper are good: the response to oxidative damage is shown with proper controls, the antibodies to NF-L are tested, etc. (Fig. 1-Fig. 4).

      We thank the reviewer for their positive feedback.

      1. The evidence for calpain involvement in NF-L cleavage during oxidative damage is missing. Provide immunoblot evidence of NF-L cleavage for the full-length NF-L construct and the deletion mutants transfected into cells, perform nuclear and cytoplasmic extract preparations, and show enrichment of the tagged cleaved NF-L fragment in the nuclear fraction.

      We thank the reviewer for their comments and suggestions. Since we saw in our microscopy experiments that calpain inhibition reduced the accumulation of NFL in the nucleus, and since it is known that NFL is a calpain substrate (Schlaepfer et al., 1985; Kunz et al., 2004 and others), we did not perform additional experiments to confirm the involvement of calpain in NFL degradation during injury. However, to strengthen our findings, we intend to perform the suggested experiments and include the results in the revised manuscript.

      1. Show calpain activation during oxidative damage by performing alpha-Spectrin immunoblots to identify generation of the calpain-specific 150-kDa Spectrin fragment and the caspase-specific 120-kDa fragment in these cells. Calpain activation can also be measured by MAP2 level alteration and p35 to p25 conversion. Without this evidence it is very hard to judge whether calpain activity is increased or decreased during oxidative damage and whether these markers are altered by calpain inhibitors.

      To confirm the calpain activation, we intend to perform anti-alpha spectrin and/or anti-MAP2 blots in lysates of control and injured neurons and include the results in the revised manuscript.

      1. The premise that NF proteins are absent in cell bodies and present only in axons is not correct. It has been demonstrated by multiple investigators that NFs are present in the perikaryon and dendrites of many types of neurons (Dahl, 1983, Experimental Cell Research). Dr. Ron Liem's group showed NF protein expression in cell bodies of dorsal root ganglion cells (Adebola et al., 2015, Human Mol Genetics) and also showed that N-terminal antibodies for NF-L, NF-M and NF-H stain rat cerebellar neuronal cell bodies and dendrites (Kaplan et al., 1991, Journal of Neuroscience Research) when NFs are less phosphorylated. Schlaepfer et al. (1981, Brain Research) show staining of cell bodies of cortex and dorsal root ganglion cell bodies with NF antibody Ab150, and Yuan et al., 2009 show the same in mouse cortical neurons with GFP-tagged NF-L.

      We are not sure what the reviewer is referring to since we cannot find a corresponding section in which we claim that NF proteins are absent in cell bodies. We wrote the following “Anti-NFL antibody staining of neurons treated with the control compound showed the expected neurofilament morphology, that is, a strong fluorescence intensity in axons and lower intensity in cell bodies and dendrites (Fig. 1a)” in our results section (lines 119-121), but the claim we were trying to make there was that NF proteins are particularly abundant in axons. We will clarify this in the revised manuscript.

      1. Quantifying NF-L signal or tagged NF-L fragment signals in the cell body by ICC has many problems for making conclusions. It's extremely difficult to have control over levels of proteins in transfected overexpression models and to compare two or three different constructs with each other by ICC. Not every transfected cell expresses the same level of protein, and quantifying it by ICC again has a major problem. This can be addressed if there are stable lines that express equal levels of protein in all cells so that comparisons can be made. Under these circumstances, the hypothesis presented in the study has no strong direct evidence demonstrating that calpain is activated and that the NF-L fragment translocates to the nucleus.

      We agree that the results from overexpression-based experiments should be interpreted with caution as levels of expression vary between the cells. We intend to discuss this in the revised manuscript. However, we find it difficult to experimentally address this comment since we are not sure which specific experiments the reviewer is referring to. With regards to this, we would like to emphasize that most of the initial experiments in which we observed NFL accumulation in the nuclei of injured neurons were based on the ICC labeling of endogenous NFL and didn’t involve its overexpression. This includes labeling of endogenous NFL in various types of neurons, comparing the effects of different types of oxidative injury, as well as testing the effects of calpain inhibition on the observed nuclear accumulation (Figures 1-4; Supplementary Figures 1-6). We later resorted to the overexpression experiments in primary neurons (Figures 5-7; Supplementary Figure 7, 10) to gain more information about the identity of NFL fragment which was detected in the nucleus. Due to the low transfection efficiency of primary neurons, we performed an additional set of overexpression experiments in neuroblastoma ND7/23 cells (Figure 8; Supplementary Figures 8,9) and obtained similar results in a higher number of cells. We agree that having stable cell lines which e.g. express same levels of NFL domains would be a more elegant approach and we intend to make them for our follow-up studies, however the generation of said stable cell lines might be beyond the scope of this revision. Furthermore, looking at our data with overexpression of NFL domains in ND7/23 cells (Supplementary Figure 8,9), it appears to us as if different domains are rather homogenously expressed in different cells. While the expression levels might vary, it seems that they all show the same trend when it comes to their localization (which was the main point of those experiments).

      1. The interpretation that NF-L prevents DNA labeling in cells is a misinterpretation. NFs have a very long half-life compared to other proteins. Due to oxidative damage, DNA is degraded in the cells, but NFs, with their very long half-life, remain visible as NF rings in the dead cells. So, NFs do not prevent DNA labeling; rather, DNA or chromatin is degraded in dead cells.

      We thank the reviewer for their useful insight. DNA degradation could certainly be the reason why we observe a lower fluorescence intensity of the propidium iodide fluorescence in the nuclei of injured neurons. We intend to discuss this in the revised manuscript. However, if the DNA degradation is the only reason for the lower PI fluorescence intensity, then the PI fluorescence intensity would be the same in all injured nuclei. In our experiments, we saw the reduced PI fluorescence intensity in nuclei that contained NFL accumulations and not in other nuclei. Additionally, we observed a reduction of SYTO16 fluorescent labeling of nuclei which contained accumulations of the NFL tail domain, even in the absence of oxidative injury. Due to these reasons we speculated that NFL accumulation in the nucleus might hinder nuclear dyes from interacting with the DNA. But this is only a speculation and we will try to clarify this further in the revised manuscript including alternative explanations.

      Minor comments: 1. In the introduction on page 4 reference is missing for NF transport, aggregation and perikaryal accumulation (on line 93).

      We will add a reference to the revised manuscript.

      1. The statement in discussion on page 14 line 454 for Zhu et al., 1997 study is not accurate. It should be modified to sciatic nerve crush not spinal cord injury.

      We will correct this mistake in the revised manuscript.

      1. What is the size of the calpain cleaved NF-L tail domain? If you perform immunoblots on cell extracts treated with oxidative agents one would know it.

      We will perform immunoblots on cell lysates and incorporate the corresponding results in the revised manuscript.

      1. Authors could make their conclusions clear. This is particularly true for the experiments in Figure 4 panels c and d. It is very difficult to understand the conclusions of the experiments. First state the expectation and then describe whether the expectation is true or different.

      We will do as the reviewer suggested in the revised manuscript.

      1. The ICC images are at extremely low magnification. They should be shown at 100x or 120x so that details of the cell body and the nucleus can be seen.

      Our intention was to show larger fields of view and wherever appropriate insets, but we will try to improve this in the revised manuscript by either zooming in, cropping or adding additional insets with individual cell bodies and nuclei. In general, images were taken with an optimal resolution/pixel size in mind for any of the used objectives (60x/1.4 NA or 100x/1.49 NA) and we can easily modify our figure panels to show more details.

      1. Oxidative damage leads to beaded accumulation of NF-L in neurites and axons. Authors should address this issue.

      We will discuss this in the revised manuscript.

      1. The combination treatment of the inhibitors (last 3 sets of Fig. 4 b) has no statistical significance and should be removed.

      Actually, these differences were statistically significant (Supplementary Table 1). For clarity and as described in the figure legend (line 516: “The most relevant significant differences are indicated with an asterisk”) we showed only a subset of them on the graph, but we will change this in the revised manuscript.

      1. Why do only two antibodies recognize cleaved NF-L? If the antibodies are directed at the tail region, they should recognize it, unless phosphorylation of the tail at Ser473 inhibits antibody binding. In that case an NF-L Ser473-specific antibody (EMD Millipore: MABN2431) may be used to test this idea.

      This is a very good point that we also wonder about. Even if all antibodies are directed at the tail region, exact epitopes are not described for all of them. That also makes it difficult for us to understand and speculate on this. However, we have already ordered the new antibody as suggested by the reviewer and we will experimentally test it.

      **Referees cross-commenting**

      I agree with reviewer #1 about presenting the quantification data for the indicated figures to make the conclusions strong and to see how much variation there is among sampled cells.

      As discussed in our response to reviewer #1, we will provide additional quantifications.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      4. Description of analyses that authors prefer not to carry out

      Reviewer #2, major comment 7. Authors could do chromatin immunoprecipitation (chip) analysis to identify NF-L binding sites on chromatin and perform gel shift assays to show NF-L tail domain binding to specific consensus DNA sequences.

      We thank the reviewer for their suggestion. We are very interested in performing additional experiments and identifying the NFL binding sites on the DNA (either by chromatin immunoprecipitation or DamID-seq) and we intend to perform these experiments as soon as possible. Unfortunately, at the moment we do not have the expertise to perform such experiments in our lab. Instead, this type of follow-up project requires establishing a collaboration which is beyond the scope of this revision.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines EEG responses time-locked to (or "entrained" by) musical features and how these depend on tempo and feature identity. Results revealed stronger entrainment to "spectral flux" than to other, more commonly tested features such as amplitude envelope. Entrainment was also strongest for lowest rates tested (1-2 Hz).

      The paper is well written, its structure is easy to follow and the research topic is explained in a way that makes it accessible to readers outside of the field. Results will advance the scientific field and give us further insights into neural processes underlying auditory and music perception. Nevertheless, there are a few points that I believe need to be clarified or discussed to rule out alternative explanations or to better understand the acquired data.

      We thank the Reviewer for taking the time to evaluate our manuscript and for the positive response. We have now conducted further analyses to strengthen our conclusion that neural synchronization was strongest at slower musical tempi and to rule out an alternative explanation that neural synchronization was strongest for music presented near its own original or “natural” tempo. We also added some points to the Discussion in response to your comments; revised text is reproduced as part of our point-by-point responses below for your convenience. The page and line numbers correspond to the manuscript file without track changes.

      1) Results reveal spectral flux as the musical feature producing strongest entrainment. However, entrainment can only be compared across features in an unbiased way if these features are all equally present in the stimulus. I wonder whether entrainment to spectral flux is only most pronounced because the latter is the most prominent feature in music. Can the authors rule out such an explanation?

      Respectfully, it is not fully clear to us based on the literature that entrainment can only be compared across features fairly when those features are equally present in the stimulus. Previous work in the speech domain has compared entrainment to amplitude envelope vs. spectrogram vs. a symbolic representation of the time of occurrence of different phonemes (Di Liberto et al., 2015). Work in the music domain has compared entrainment to amplitude envelope (and its derivative) vs. features quantifying melodic expectation (surprise and entropy, quantified using a hidden Markov model trained on a corpus of Western music; Di Liberto et al., 2020). In these papers, there was no quantification of the degree to which each feature was present in the stimulus material, and when comparing such qualitatively different features, it is not clear to us how one would do so. Nonetheless, these studies used the resulting TRF-based dependent measures to evaluate which feature best predicted the neural response. Here, although we do not know what acoustic feature might be most present / strongest in music, we believe that we can investigate the degree to which each feature predicts the neural response. In fact, we might argue the reverse of the logic in your comment: that the TRF results actually tell us which feature is perceptually or psychologically the most important in terms of driving brain responses, which may not be fully predictable from the acoustics of those features.

      From a data analysis perspective, we have independently normalized (z-scored) each feature as well as the neural data, as prescribed in Crosse et al., 2021, to try to level the playing field for the musical features we are comparing. Moreover, we made changes in the discussion to acknowledge your concern. The text is reproduced here for your convenience.

      p. 26, l. 489-497: “One hurdle to performing any analysis of the coupling between neural activity and a stimulus time course is knowing ahead of time the feature or set of features that will well characterize the stimulus on a particular time scale given the nature of the research question. Indeed, there is no necessity that the feature that best drives neural synchronization will be the most obvious or prominent stimulus feature. Here, we treated feature comparison as an empirical question (Di Liberto et al., 2015), and found that spectral flux is a better predictor of neural activity than the amplitude envelope of music. Beyond this comparison though, the issue of feature selection also has important implications for comparisons of neural synchronization across, for example, different modalities.”

      2) Spectral analyses of neural data often yield the strongest power at lowest frequencies. Measures of entrainment can be biased by the amount of power present, where entrainment increases with power. Can the authors rule out that the advantage for lower frequencies is a reflection of such an effect?

      Thank you for this insightful comment. In response to your comment and the comments of Reviewer 3, we normalized the TRF correlations, stimulus–response correlations, and stimulus–response coherences by surrogate distributions that were calculated separately for each musical feature and – importantly – for every tempo condition. Following Zuk et al., 2021, we formed surrogate distributions by shifting the relevant neural data time course relative to the stimulus-feature time courses by a random amount. We did this 50 times, and for each shift re-calculated all dependent measures. We then normalized our dependent measures calculated from the intact time series relative to these surrogate distributions by subtracting the mean and dividing by the standard deviation of the surrogate distribution (“z-scoring”). Since the approach of shifting the neural data leaves the neural time series intact, the power spectrum of the data is preserved, but only its relationship to the stimulus is destroyed. After normalization, the plots obviously look a little different, but the main results – a higher level of neural synchronization to slower stimulation tempi and in response to the spectral flux – remain.

      The changes can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section.

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
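      The surrogate normalization described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the random shift is circular, uses a plain Pearson correlation as a stand-in for the dependent measure, and the function names (`pearson`, `circular_shift`, `surrogate_zscore`) and the `min_shift` parameter are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def circular_shift(series, k):
    """Rotate a list by k samples; the values (and hence the power
    spectrum) are unchanged, only their alignment in time moves."""
    k %= len(series)
    return series[-k:] + series[:-k] if k else list(series)


def surrogate_zscore(neural, feature, n_surrogates=50, min_shift=1, seed=0):
    """Z-score the observed stimulus-response correlation against a
    surrogate distribution built from randomly shifted neural data.

    Assumption: shifts are circular and drawn uniformly; the source
    only states that the neural time course is shifted by a random
    amount, 50 times.
    """
    rng = random.Random(seed)
    observed = pearson(neural, feature)
    null = []
    for _ in range(n_surrogates):
        k = rng.randrange(min_shift, len(neural) - min_shift + 1)
        null.append(pearson(circular_shift(neural, k), feature))
    # Subtract the surrogate mean and divide by the surrogate SD.
    return (observed - statistics.fmean(null)) / statistics.stdev(null)
```

      Because each surrogate keeps the neural time series intact, any frequency-specific difference in power is present in both the observed value and the null distribution, so it cancels in the z-score.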

      A related point, what was the dominant rate of spectral flux in the original set of stimuli, before tempo was manipulated? Could it be that the slow tempo was preferred because in this case participants listened to a most "natural" stimulus?

      This is a good point, thank you. We did two things to attempt to address this (see also comment Reviewer 3). First, the original tempo for each song can be found in Supplementary Table 1. To make the table more readable and more comparable with the main manuscript, we have updated the table and now state the original tempi in BPM and Hz. Second, we added histograms of the original tempi across all songs as well as the maximum amount by which all songs were tempo-shifted (i.e., the maximum tempo difference between the slowest (or fastest) version of each song segment compared to the original tempo). These histograms have been added to Figure 1 – figure supplement 2, and are paraphrased here for your convenience (p. 13 l. 265-273): The original tempo of the set of musical stimuli ranges between 1-2.75 Hz. This indeed overlaps with the tempo range that revealed strongest neural synchronization. When songs were tempo-shifted to be played at a slower tempo than the original, they were shifted by ~0.25-1.25 Hz. In contrast, shifting a song to have a faster tempo typically involved a larger shift of ~1-2.25 Hz. Thus, it is definitely possible that tempo, degree of tempo shift, and proximity to “natural” tempo were not completely independent values.
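      The relation between the tempo units in the table, and the size of a tempo shift, is simple arithmetic and can be sketched as below. The helper names are hypothetical and the example values are illustrative, not taken from Supplementary Table 1.

```python
def bpm_to_hz(bpm):
    """Convert beats per minute to beats per second (Hz)."""
    return bpm / 60.0


def hz_to_bpm(hz):
    """Convert beats per second (Hz) to beats per minute."""
    return hz * 60.0


def tempo_shift_hz(original_bpm, presented_hz):
    """Signed tempo shift in Hz: positive when the song was sped up,
    negative when it was slowed down relative to its original tempo."""
    return presented_hz - bpm_to_hz(original_bpm)
```

      Under this conversion, the stated original tempo range of 1-2.75 Hz corresponds to 60-165 BPM.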

      For that reason, to investigate the effects of the amount of tempo manipulation on neural synchronization, we conducted an additional analysis. We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulated tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared them to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Material and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your and Reviewer 3’s comments, we also added this additional point to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempo range of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of the magnitude of tempo manipulation on other tempo conditions.”

      3) The authors have a clear hypothesis about the frequency of the entrained EEG response: The one that corresponds to the musical tempo (or harmonics). It seemed to me that analyses do not sufficiently take that hypothesis into account and often include all possible frequencies. Restricting the analysis pipeline to frequencies that are expected to be involved might reduce the number of comparisons needed and therefore increase statistical power.

      Although we manipulated tempo, and so had an a priori hypothesis about the frequency at which the beat would be felt, natural music is a complex stimulus composed of different instruments playing different lines at different time scales, many or most of which are nonisochronous. Thus, we analyzed the data in two different ways – 1) based on TRFs and 2) based on stimulus–response correlation and coherence. Stimulus–response coherence is a frequency-domain measure, and so it was possible to do exactly as you suggest here and consider coherence only at the stimulation tempo and first harmonic, which we did (Figure 2E-J). However, for the TRF analyses, we followed previous literature (e.g., Ding et al., 2014; Di Liberto et al., 2020; Teng et al., 2021), and considered broader-band EEG activity (bandpass filtered at 0.5-30 Hz). Previous work has shown that the beat in music evokes a neural response at harmonics up to at least 4 times the beat rate (Kaneshiro et al., 2020), so we wanted to leave a broad frequency range intact in the neural data. Despite being based on differently filtered data, we found that the dependent measures from the two analysis approaches were correlated, which suggests to us that neural tracking at the stimulation tempo itself was probably the largest contributor to the results we observed here.
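      Restricting a frequency-domain measure to the stimulation tempo and its first harmonic can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' SRCoh computation: it estimates magnitude-squared coherence from non-overlapping Hann-windowed segments at a single frequency, it takes "first harmonic" to mean twice the stimulation tempo, and all function and parameter names are hypothetical.

```python
import cmath
import math


def dft_coefficient(segment, freq_hz, fs_hz):
    """Single-frequency DFT coefficient of a mean-removed,
    Hann-windowed segment sampled at fs_hz."""
    n = len(segment)
    mean = sum(segment) / n
    coef = 0j
    for i, value in enumerate(segment):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
        phase = cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
        coef += window * (value - mean) * phase
    return coef


def coherence_at(x, y, freq_hz, fs_hz, seg_len):
    """Magnitude-squared coherence between x and y at one frequency,
    averaged over non-overlapping segments of seg_len samples."""
    sxy, sxx, syy = 0j, 0.0, 0.0
    for start in range(0, len(x) - seg_len + 1, seg_len):
        cx = dft_coefficient(x[start:start + seg_len], freq_hz, fs_hz)
        cy = dft_coefficient(y[start:start + seg_len], freq_hz, fs_hz)
        sxy += cx * cy.conjugate()   # cross-spectrum estimate
        sxx += abs(cx) ** 2          # auto-spectrum of x
        syy += abs(cy) ** 2          # auto-spectrum of y
    return abs(sxy) ** 2 / (sxx * syy)


def coherence_at_tempo_and_harmonic(x, y, tempo_hz, fs_hz, seg_len):
    """Coherence at the stimulation tempo and at twice the tempo
    (assumed here to be what 'first harmonic' refers to)."""
    return {
        "tempo": coherence_at(x, y, tempo_hz, fs_hz, seg_len),
        "first_harmonic": coherence_at(x, y, 2 * tempo_hz, fs_hz, seg_len),
    }
```

      Evaluating only these two frequencies yields two values per feature and condition rather than a full spectrum, which is the reduction in comparisons the Reviewer's comment points to.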

      Related to your comment, we added two points to our discussion, which we reproduce here for your convenience.

      p. 24-25, l. 453-461: “Regardless of the reason, since frequency-domain analyses separate the neural response into individual frequency-specific peaks, it is easy to interpret neural synchronization (SRCoh) or stimulus spectral amplitude at the beat rate and the note rate – or at the beat rate and its harmonics – as independent (Keitel et al., 2021). However, music is characterized by a nested, hierarchical rhythmic structure, and it is unlikely that neural synchronization at different metrical levels goes on independently and in parallel. One potential advantage of TRF-based analyses is that they operate on relatively wide-band data compared to Fourier-based approaches, and as such are more likely to preserve nested neural activity and perhaps less likely to lead to over- or misinterpretation of frequency-specific effects.”

      p. 29 l. 564-577: “Despite their differences, we found strong correspondence between the dependent variables from the two types of analyses. Specifically, TRF correlations were strongly correlated with stimulation-tempo SRCoh, and this correlation was higher than for SRCoh at the first harmonic of the stimulation tempo for the amplitude envelope, derivative and beat onsets (Figure 4 - figure supplement 1). Thus, despite being computed on a relatively broad range of frequencies, the TRF seems to be correlated with frequency-specific measures at the stimulation tempo. The strong correspondence between the two analysis approaches has implications for how users interpret their results. Although certainly not universally true, we have noticed a tendency for TRF users to interpret their results in terms of a convolution of an impulse response with a stimulus, whereas users of stimulus–response correlation or coherence tend to speak of entrainment of ongoing neural oscillations. The current results demonstrate that the two approaches produce similar results, even though the logic behind the techniques differs. Thus, whatever the underlying neural mechanism, using one or the other does not necessarily allow us privileged access to a specific mechanism.”

      Reviewer #2 (Public Review):

      Kristin Weineck and coauthors investigated the neural entrainment to different features of music, specifically the amplitude envelope, its derivative, the beats and the spectral flux (which describes how fast spectral changes are) and its dependence on the tempo of the music and self-reports of enjoyment, familiarity and ease of beat perception.

      They use and compare analysis approaches typically used when working with naturalistic stimuli: temporal response functions (TRFs) or reliable components analysis (RCA) to correlate the stimulus with its neural response (in this case, the EEG). The spectral flux seems the best music descriptor among the tested ones with both analyses. They find a stronger neural response to stimuli with slower beat rates and predictable stimuli, namely familiar music with an easy-to-perceive beat. Interestingly, the analysis does not show a statistically significant difference between musicians and non-musicians.

      The authors provide an extensive analysis of the data, but some aspects need to be clarified and extended.

      We thank the Reviewer for taking the time to evaluate and summarize our manuscript and for the great comments. We addressed the concerns and made changes throughout the manuscript, but especially in the introduction and discussion sections about the terminology (neural entrainment and neural measures), musical features of the stimuli, and musical experience of the participants. Below you can find the alterations described in more detail. The page and line numbers correspond to the manuscript file without track changes.

      1) It would be helpful to clarify better the concepts of neural entrainment, synchronization, and neural tracking and their meaning in this specific context. Those terms are often used interchangeably, and it can be hard for the reader to follow the rest of the paper if they are not explicitly defined and related to each other in the introduction. Note that this is fundamental to understanding the primary goal of the paper. The authors clarify this point only at the end of the discussion (lines 570-576). I suggest moving this part to the introduction. Still, it is unclear why the authors use the TRF model and then say they want to be agnostic about the physiological mechanisms underlying entrainment. The choice of the TRF (as well as the stimulus representation) automatically implies a hypothesis about a physiological mechanism, i.e., that the EEG reflects convolution of the stimulus properties with an impulse response. Could you please clarify this point? I might have missed it.

      Thank you for this valuable comment. We agree that it is fundamental to define and uniformly use terminology, and have made changes throughout the manuscript along these lines. First of all, we have changed all instances of “neural entrainment” or “neural tracking” to “neural synchronization”, as we think this term avoids evoking a specific theoretical background or strong mechanistic assumptions. Second, we have moved the Discussion paragraph you mention to the Introduction and expanded it. Specifically, we take the opportunity to address the association between specific analysis approaches (TRFs vs. stimulus–response correlation or coherence) and specific mechanistic assumptions (convolution of stimulus properties with an impulse response vs. entrainment of an ongoing oscillation, respectively). This allowed us to clarify what we mean when we say we prefer to stay agnostic to specific mechanistic interpretations. We are happy to have had the chance to strengthen this discussion, and think it benefits the manuscript a lot.

      We reproduce the new Introduction paragraph here for your convenience.

      p. 5-6, l. 101-123: “The current study investigated neural synchronization to natural music by using two different analysis approaches: Reliable Components Analysis (RCA) (Kaneshiro et al., 2020) and temporal response functions (TRFs) (Di Liberto et al., 2020). A theoretically important distinction here is whether neural synchronization observed using these techniques reflects phase-locked, unidirectional coupling between a stimulus rhythm and activity generated by a neural oscillator (Lakatos et al., 2019) versus the convolution of a stimulus with the neural activity evoked by that stimulus (Zuk et al., 2021). TRF analyses involve modeling neural activity as a linear convolution between a stimulus and relatively broad-band neural activity (e.g., 1–15 Hz or 1–30 Hz; (Crosse et al., 2016, Crosse et al., 2021); as such, there is a natural tendency for papers applying TRFs to interpret neural synchronization through the lens of convolution (though there are plenty of exceptions to this e.g., (Crosse et al., 2015, Di Liberto et al., 2015)). RCA-based analyses usually calculate correlation or coherence between a stimulus and relatively narrow-band activity, and in turn interpret neural synchronization as reflecting entrainment of a narrow-band neural oscillation to a stimulus rhythm (Doelling and Poeppel, 2015, Assaneo et al., 2019). Ultimately, understanding under what circumstances and using what techniques the neural synchronization we observe arises from either of these physiological mechanisms is an important scientific question (Doelling et al., 2019, Doelling and Assaneo, 2021, van Bree et al., 2022). However, doing so is not within the scope of the present study, and we prefer to remain agnostic to the potential generator of synchronized neural activity. 
      Here, we refer to and discuss “entrainment in the broad sense” (Obleser and Kayser, 2019) without making assumptions about how neural synchronization arises, and we will moreover show that these two classes of analysis techniques strongly agree with each other.”

      2) Interestingly, the neural response to music seems stronger for familiar music. Can the authors clarify how this is not in contrast with previous works showing that violated expectations evoke stronger neural responses ([Di Liberto et al., 2020] using TRFs and [Kaneshiro et al., 2020] using RCA)? [Di Liberto et al., 2020] showed that the neural response of musicians is stronger than that of non-musicians, as they have stronger expectations (see point 2). However, in the present manuscript, the analysis does not show a statistically significant difference between musicians and non-musicians. The authors state that they had different degrees of musical training in their dataset, and therefore it is hard to see a clear difference. Still, in the "Materials and Methods" section, they divided the participants into these two groups, which confuses the reader.

      Our findings are consistent with previous studies showing stronger inter-subject correlation in response to music in a familiar style vs. music in an unfamiliar style (Madsen et al., 2019) and stronger phase coherence in response to familiar relative to unfamiliar sung utterances (Vanden Bosch der Nederlanden et al., 2022). We actually don’t think our results (stronger neural synchronization for familiar music) or these previous results are incompatible with work showing that violations of expectations evoke stronger neural responses. This work either manipulated music so it violated expectations (Kaneshiro et al., 2020) or explicitly modeled “surprisal” as a feature (Di Liberto et al., 2020). Thus, we could think of those stronger neural responses to expectancy violations as reflecting something like “prediction error”. Our music stimuli did not contain any violations, and we were unable to model responses to surprisal given the nature of our music stimuli, as we better explain below (p. 27 l. 514-529). Thus, neural synchronization was stronger to familiar music, and we would argue that listeners were able to form stronger expectations about music they already knew. We would predict that expectancy violations in familiar music would evoke stronger neural responses than those in unfamiliar music, though we did not test that here. We now include a paragraph in the Discussion reconciling our findings with the papers you have cited.

      p. 27 l. 514-529: “We found that the strength of neural synchronization depended on the familiarity of music and the ease with which a beat could be perceived (Figure 5). This is in line with previous studies showing stronger neural synchronization to familiar music (Madsen et al., 2019) and familiar sung utterances (Vanden Bosch der Nederlanden et al., 2022). Moreover, stronger synchronization for musicians than for nonmusicians has been interpreted as reflecting musicians’ stronger expectations about musical structure. On the surface, these findings might appear to contradict work showing stronger responses to music that violated expectations in some way (Kaneshiro et al., 2020, Di Liberto et al., 2020). However, we believe these findings are compatible: familiar music would give rise to stronger expectations and stronger neural synchronization, and stronger expectations would give rise to stronger “prediction error” when violated. In the current study, the musical stimuli never contained violations of any expectations, and so we observed stronger neural synchronization to familiar compared to unfamiliar music. There was also higher neural synchronization to music with subjectively “easy-to-tap-to” beats. Overall, we interpret our results as indicating that stronger neural synchronization is evoked in response to music that is more predictable: familiar music and music with an easy-to-track beat structure.”

      Your other question was why we did not see effects of musical training / sophistication on neural synchronization to music, when other studies have. There are a few possible reasons for this. One is that previous studies aiming to explicitly test the effects of musical training recruited either professional musicians or individuals with a high degree of musical training for their “musician” sample. In contrast, we did not target individuals with any degree of musical training, but attempted this analysis in a post-hoc way. For this reason, our musicians and nonmusicians were not as different from each other in terms of musical training as in previous work. Given this, we have opted to remove the artificial split into musician and nonmusician groups, and now only include a correlation with musical sophistication (as you suggest in your next comment), which was also nonsignificant (Figure 5 – figure supplement 2).

      3) Musical expertise was also assessed using the Goldsmiths Musical Sophistication Index, which could be an alternative to the two-group comparison between musicians and non-musicians. Does this mean that in Figure 5, we should see a regression line (the higher the Gold-MSI, the higher the TRF correlation should be)? Since we do not see any significant effect, might this be due to the choice of the audio descriptor? The spectral flux is not a high-level descriptor; maybe it is worth testing some high-level descriptors such as entropy and surprise. The choice of the stimulus features defines linear models such as the TRF, as they determine the hierarchical level of auditory processing, and for testing the musical expertise, we might need more than acoustic features. The authors should elaborate more on this point.

      It is true that the Goldsmiths Musical Sophistication Index serves as an alternative way of investigating the effects of musical expertise on neural synchronization to natural music, and we now include this approach exclusively instead of dividing our sample (see response to the previous comment). Indeed, if musical sophistication had an effect on the TRF correlations in this study, we would see a regression line in Figure 5 – figure supplement 2. Based on our experiment, it is difficult to assess whether the lack of a correlation between neural measures and musical expertise is due to our choice of stimulus features. That is because our experiment was designed to investigate the effects of fundamental acoustic features of music, and it was not possible to calculate high-level descriptors, such as entropy or surprisal, for the music stimuli we chose to work with – the stimuli were polyphonic, and moreover were purchased in a .wav format, so we do not have access to the individual MIDI versions or sheet music of each song that would have been necessary to apply, for example, the IDyOM (Information Dynamics of Music) model. As we cannot rule out that the (lack of) effects of varying levels of musical expertise on TRF correlations is due to our choice of stimulus features, we added this to the discussion.

      p. 28 l. 541-546: “Another potential reason for the lack of difference between musicians and non-musicians in the current study could originate from the choice of utilizing pure acoustic audio-descriptors as opposed to “higher order” musical features. However, “higher order” features such as surprise or entropy that have been shown to be influenced by musical expertise (Di Liberto et al., 2020), are difficult to compute for natural, polyphonic music.”

      4) Regarding the stimulus representation, I have a few points. The authors say that the amplitude envelope is a too limited representation for music stimuli. However, before testing the spectral flux, why not test the spectrogram as in previous studies? Moreover, the authors tested the TRF on combining all features, but it was not clear how they combined the features.

      One of the main reasons that we did not use the spectrogram as a feature was that it wouldn’t be possible to use a two-dimensional representation for the RCA-based measures, SRCorr and SRCoh, so we would not have been able to compare across analysis approaches. However, spectral flux is calculated directly from the spectrogram, and so is a useful one-dimensional measure that captures the spectro-temporal fluctuations present in the spectrogram (https://musicinformationretrieval.com/novelty_functions.html). Thank you for making this important point, we added this explanation to the Materials and Methods section (p. 35 l. 726-727).
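      To make the relation between the spectrogram and spectral flux concrete, the following is a minimal sketch, assuming one common definition of spectral flux (the half-wave-rectified frame-to-frame change in spectrogram magnitude, summed over frequency). The function name, frame length, and hop size are illustrative choices and are not taken from the manuscript.

```python
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Collapse a magnitude spectrogram into a 1-D novelty curve.

    Assumed definition: positive frame-to-frame change in magnitude,
    summed across frequency bins. Parameters are illustrative only.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrogram, shape (n_frames, n_bins).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Change between consecutive frames; keep only energy increases.
    diff = np.diff(magnitude, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)
```

      The output has one value per frame transition, so it can be correlated with a single EEG component in the same way as the amplitude envelope, which a two-dimensional spectrogram cannot.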

      We apologize for not explaining the multivariate TRF approach better. Instead of using only one stimulus feature, e.g., the amplitude envelope, several stimulus features can be concatenated into a matrix (with the dimensions time T × 4 musical features M at different time lags), which is then used as the input for the mTRFcrossval, mTRFtrain, and mTRFpredict functions of the mTRF Matlab Toolbox (Crosse et al., 2016). This is exactly how a 2D feature like the spectrogram would be handled. The multivariate TRF is calculated by extending the stimulus lag matrix (time course of one musical feature at different time lags, T × τwindow) by an additional dimension (time course of several musical features at different time lags, T × M × τwindow). We added an explanation to the Methods section of the manuscript and hope that the approach is now easier to understand:

      p. 39 l. 840-842: “For the multivariate TRF approach, the stimulus features were combined by replacing the single time-lag vector by several time-lag vectors for every musical feature (Time x 4 musical features at different time lags).”
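      The construction of a multi-feature lag matrix can be sketched in a few lines. This is an illustration in Python, not the mTRF Matlab Toolbox code: `lag_matrix` and `fit_trf` are hypothetical helper names, lags are given in samples, and plain ridge regression stands in for the toolbox's regularized fit.

```python
import numpy as np

def lag_matrix(features, lags):
    """Stack time-shifted copies of every feature.

    features: array of shape (T, M); lags: integer sample lags with |lag| < T.
    Returns an array of shape (T, M * len(lags)), zero-padded at the edges.
    """
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]  # a single feature becomes (T, 1)
    T = features.shape[0]
    blocks = []
    for lag in lags:
        shifted = np.zeros_like(features)
        if lag > 0:
            shifted[lag:] = features[:T - lag]
        elif lag < 0:
            shifted[:lag] = features[-lag:]
        else:
            shifted[:] = features
        blocks.append(shifted)
    return np.hstack(blocks)

def fit_trf(features, eeg, lags, ridge=1.0):
    """Ridge-regression weights mapping lagged features to the EEG (sketch)."""
    X = lag_matrix(features, lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)
```

      With one feature the design matrix has as many columns as lags; with four features it simply has four times as many, which is the only difference between the univariate and multivariate cases in this sketch.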

      Reviewer #3 (Public Review):

      Subjects listened to various excerpts from music recordings that were designed to cover musical tempi ranging from 1-4 Hz, and EEG was recorded as subjects listened to these excerpts. The main and novel findings of the study were: 1) spectral flux, measuring sudden changes in frequency, was tracked better in the EEG than other measures of fluctuations in amplitude, 2) neural tracking seemed to be best for the slowest tempi, 3) measures of neural tracking were higher when subjects rated an excerpt as high for ease-of-tapping and familiarity, and 4) their measure of the mapping between stimulus feature and response could predict whether a subject tapped at the expected tempo or at 2x the expected tempo after listening to the musical excerpt.

      One of the key strengths of this study is the use of novel methodologies. The authors in this study used natural and digitally manipulated music covering a wide range of tempi, which is unique to studies of musical beat tracking. They also included both measures of stimulus-response correlation and phase coherence along with a method of linear modeling (the temporal response function, or TRF) in order to quantify the strength of tracking, showing that they produce correlated results. Lastly, and perhaps most importantly, they also had subjects tap along with the music after listening to the full excerpt. While having a measure of tapping rate itself is not new, combined with their other measures they were able to demonstrate that neural data predicted the hierarchical level of tapping rate, opening up opportunities to study the relationship between neural tracking, musical features, and a subject's inferred metrical level of the musical beat.

      Additionally, the finding that spectral flux produced the best correlations with the EEG data is an important one. Many studies have focused primarily on the envelope (amplitude fluctuations) when quantifying neural tracking of continuous sounds, but this study shows that, for music at least, spectral flux may add information that is tracked by the EEG. However, given that it is also highly correlated with the envelope, what additional features spectral flux contributes to measuring EEG tracking is not clear from the current results and worth further study.

      All four of their main findings are important for research into the neural coding of musical rhythm. I have some concerns, however, that two of these findings could be a consequence of the methods used, and one could be explained by related correlations to acoustic features:

      We thank the Reviewer for the very helpful review, the summary, and the great suggestions. We addressed the comments and performed additional analysis. We made changes throughout the manuscript, but especially 1) concerning the potential advantage of the neural response to slower music, 2) the effects of the amount of tempo manipulation on neural synchronization, 3) the SVM-related analysis and 4) the relation between stimulus features and behavioral ratings. The implemented modifications can be found below in more detail. The page and line numbers correspond to the manuscript file without track changes.

      The authors found that their measures of neural tracking were highest for the lowest musical tempos. This is interesting, but it is also possible that this is a consequence of lower frequencies producing a large spread of correlations. Imagine two signals that are fluctuating in time with a similar pattern of fluctuation. When they are correctly-aligned they are correlated with each other, but if you shift one of the signals in time those fluctuations are mismatched and you can end up with zero or negative correlations. Now imagine making those fluctuations much slower. If you use the same time shifts as before, the signals will still be fairly correlated, because the rates of signal change are much longer. As a result, the span of null correlations also increases. This can be corrected by normalizing the true correlations and prediction accuracies with a null distribution at each tempo. But with this in mind, it is hard to conclude if the greater correlations found for lower musical tempos in their current form are a true effect.

      Thank you for this great suggestion. We followed your lead (Zuk et al., 2021), and normalized all measures of neural synchronization (TRF correlation, SRCorr, SRCoh) relative to a surrogate distribution. The surrogate distribution was calculated by randomly and circularly shifting the neural data relative to the musical features for each of 50 iterations. This was done separately for every musical feature and stimulation tempo condition (Figures 2 and 3). After normalization, the results look qualitatively similar and the main results – spectral flux and slow stimulation tempi resulting in highest levels of neural synchronization – persist.

      The changes in the manuscript based on your comment (and the comment of Reviewer 1) can be found throughout the manuscript, but especially on p. 11, l. 210-218, Figures 2-3 and a more detailed explanation in the Methods section:

      p. 39, l. 821-829: “In order to control for any frequency-specific differences in the overall power of the neural data that could have led to artificially inflated observed neural synchronization at lower frequencies, the SRCorr and SRCoh values were z-scored based on a surrogate distribution (Zuk et al., 2021). Each surrogate distribution was generated by shifting the neural time course by a random amount relative to the musical feature time courses, keeping the time courses of the neural data and musical features intact. For each of 50 iterations, a surrogate distribution was created for each stimulation subgroup and tempo condition. The z-scoring was calculated by subtracting the mean and dividing by the standard deviation of the surrogate distribution.”
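      The surrogate normalization described above can be illustrated with a short sketch. It assumes Pearson correlation as a stand-in for the synchronization measure (the manuscript uses SRCorr, SRCoh, and TRF correlations), and the function name and the minimum-shift parameter are hypothetical.

```python
import numpy as np

def surrogate_zscore(neural, feature, n_iter=50, min_shift=1, seed=0):
    """Z-score an observed correlation against circular-shift surrogates.

    Each surrogate circularly shifts the neural time course by a random
    number of samples, which breaks its alignment with the stimulus
    feature while keeping both time courses otherwise intact.
    """
    neural = np.asarray(neural, dtype=float)
    feature = np.asarray(feature, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(neural, feature)[0, 1]
    surrogate = np.empty(n_iter)
    for i in range(n_iter):
        # Upper bound is exclusive, so a full-length (identity) shift is excluded.
        shift = rng.integers(min_shift, len(neural) - min_shift + 1)
        surrogate[i] = np.corrcoef(np.roll(neural, shift), feature)[0, 1]
    z = (observed - surrogate.mean()) / surrogate.std()
    return z, surrogate
```

      Because slowly fluctuating signals yield a wider spread of surrogate correlations, dividing by the surrogate standard deviation penalizes exactly the low-frequency inflation the reviewer describes.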

      If the strength of neural tracking at low tempos is a true effect, it is worth noting that the original tempi for the music clips span 1 - 2.5 Hz (Supplementary Table 1), roughly the range of tempi exhibiting the largest prediction accuracies and correlations. All tempos above this range are produced by digitally manipulating the music. It is possible that the neural tracking measures are higher for music without any digital manipulations rather than reflecting the strength of tracking at various tempi. This could also be related to the authors' finding that neural tracking was better for more familiar excerpts. This alternative interpretation should be acknowledged and mentioned in the discussion.

      Thank you for these important suggestions (see also comment #2 (part 2) from Reviewer 1). First, it is important to say that all music stimuli were tempo manipulated: even if the tempo of an original music segment was, e.g., 2 Hz and the same song was presented at 2 Hz, it was still converted via the MAX patch to 2 Hz again (to make it comparable to the other musical stimuli). Second, it is true that we cannot fully exclude the possibility that the amount of tempo manipulation could have an effect on neural synchronization to music – meaning that less tempo-manipulated music segments (so a stimulation tempo close to the original tempo) could result in higher neural synchronization. However, we have now conducted an additional analysis to address this as best we could.

      We compared TRF correlations for a) songs that were shifted very little relative to their original tempi to b) songs that were shifted a lot relative to their original tempi. We did not have enough song stimuli to do this for every stimulation tempo, but we were able to do the TRF correlation comparison for two illustrative stimulation tempo conditions (at 2.25 Hz and 1.5 Hz). In those tempo conditions, we took the TRF correlations for up to three trials per participant when the original tempo was around the manipulation tempo (1.25-1.6 Hz for 1.5 Hz or 2.01-2.35 Hz for 2.25 Hz) and compared them to those trials where the original tempo was around 0.75–1 Hz faster or slower than the manipulated tempo at which the participants heard the songs (Figure 3 – figure supplement 2). This analysis revealed that there was no significant effect of the original music tempi on the neural response (please see Materials and Methods, p. 40, l. 855-861 and Results p. 13, l. 265-273). In response to your and Reviewer 1’s comments, we also added it to the discussion.

      p. 23-24 l. 427-436: “The tempo range within which we observed strongest synchronization partially coincides with the original tempi of the music stimuli (Figure 1 – figure supplement 2). A control analysis revealed that the amount of tempo manipulation (difference between original music tempo and tempo the music segment was presented to the participant) did not affect TRF correlations. Thus, we interpret our data as reflecting a neural preference for specific musical tempi rather than an effect of naturalness or the amount that we had to tempo shift the stimuli. However, since our experiment was not designed to answer this question, we were only able to conduct this analysis for two tempi, 2.25 Hz and 1.5 Hz (Figure 3 – figure supplement 3), and thus are not able to rule out the influence of tempo manipulation on other tempo conditions.”

      We also provide more information to the reader about the amount of tempo shift that each stimulus underwent. We added two plots to the manuscript that show 1) the distribution of original tempi of the music stimuli and 2) the distribution of the amount of tempo manipulation across all stimuli (Figure 1 – figure supplement 2).

      Their last finding regarding predicting tapping rates is novel and important, and the model they use to make those predictions does well. But I am concerned by how well it performs (Figure 6), since it is not clear what features of the TRF are being used to produce this discrimination. Are the effects producing discriminable tapping rates and stimulation tempi apparent in the TRF? I noticed, though, that these results came from two stages of modeling: TRFs were first fit to groups of excerpts with different tapping rates or stimulation tempo separately, then a support vector machine (SVM) was used to discriminate between the two groups. So, another way to think about this pipeline is that two response models (TRFs) were generated for the separate groups, and the SVM finds a way of differentiating between them. There is no indication about what features of the TRFs the SVM is using, and it is possible this is overfitting. Firstly, I think it needs to be clearer how the TRFs are being computed from individual trials. Secondly, the authors construct surrogate data by shuffling labels (before training) but it is not clear at which training stage this is performed. They can correct for possible issues of overfitting by comparing to surrogate data where shuffling happens before the TRF computation, if this wasn't done already.

      Thank you for noticing this important point. You are absolutely right – when re-analyzing that part of the results based on your comment, we noticed that we had an error in our understanding of the analysis pipeline. Indeed, we first calculated two TRF models for the separate groups (e.g., stimulation tempo = tapping tempo vs. stimulation tempo = 2 × tapping tempo) based on all trials of each group apart from the left-out trial. Next, the resulting TRFs were fed into the SVM, which was used to predict the group. The shuffling of the surrogate data occurred at the SVM training step.

      Based on your comment, we tried several approaches to solve this problem. First, we calculated TRFs on a single-trial basis (instead of using the two-group TRFs as before, only one trial was used to calculate the TRFs) and submitted the resulting TRFs to the SVM. The resulting SVM accuracy was compared to a “surrogate SVM accuracy” which was calculated based on shuffling the labels when training the SVM classifier. Second, we shuffled, as you suggest, the labels not at the SVM training step, but instead prior to the TRF calculation. This way we could compare our “original” SVM accuracies (based on the two-group TRFs) to a fairer surrogate dataset. However, in both cases the resulting SVM accuracies did not perform better than the surrogate data. Therefore, we felt that it is the fairest to remove this part from the manuscript. We are aware that this was one of the main results of the paper and we are sorry that we had to remove it. However, we feel that our paper is still strong and offers a variety of different results that are important for the auditory neuroscience community.

      Lastly, they show that their measures of neural tracking are larger for music with high familiarity and high ease-of-tapping. I expect these qualitative ratings could be a consequence of acoustic features that produce better EEG correlations and prediction accuracies, especially ease-of-tapping. For example, music with acoustically salient events is probably easier to tap to and would produce better EEG correlations and prediction accuracies, which would explain why ease-of-tapping is correlated with the measures of neural tracking. To understand this better, it would be useful to see how the stimulus features correlate with each of these behavioral ratings.

      We agree that our rating-based results could be influenced by acoustic stimulus features (at least for ease of tapping; it is not clear to us why familiarity would be related to acoustics). As it is difficult to correlate stimulus features (time-domain, and one time course per song) with behavioral ratings (one single value per song per participant), we conducted a frequency-domain analysis on the musical features to arrive at a single value quantifying the strength of spectral flux at the stimulation frequency and its first harmonic. We calculated single-trial FFTs on the spectral flux (which was used for the main Figure 5) for the 15 highest- and 15 lowest-rated trials per behavioral category (enjoyment, familiarity, ease to tap the beat) and participant. We compared the z-scored FFT peaks at the stimulation tempo and first harmonic for the top- and bottom-rated stimuli. We did observe significant acoustic differences between top- and bottom-rated stimuli in each category, but the differences were not in the direction that would be expected based on acoustically more salient events leading to better TRF correlations, with the exception of ease of tapping. Easy-to-tap music did indeed have stronger spectral flux than difficult-to-tap music, which is intuitive. However, spectral flux was stronger for more enjoyed music (we did not see any significant differences between TRF correlations of more vs. less enjoyed music; Figure 5C) and for less familiar music (this is the opposite of what we saw for the TRF measures). Overall, given the inconsistent relationship between acoustics, behavioral ratings, and TRF measures, we would argue that acoustic features alone cannot explain our results (Figure 5 – figure supplement 1, p. 21 l. 381 – 387).
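      As a rough illustration of reducing a feature time course to a single value per trial, the sketch below reads the z-scored FFT amplitude at the stimulation tempo and at twice that frequency. The function name is hypothetical, and two details are assumptions rather than facts from the manuscript: that the "first harmonic" is twice the stimulation tempo, and that z-scoring is done across all frequency bins of the trial.

```python
import numpy as np

def peak_amplitudes(feature, fs, tempo_hz):
    """Z-scored FFT amplitude at the tempo and at 2 x tempo (illustrative).

    feature: 1-D time course (e.g., spectral flux); fs: sampling rate in Hz.
    """
    feature = np.asarray(feature, dtype=float)
    feature = feature - feature.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(feature)) / len(feature)
    freqs = np.fft.rfftfreq(len(feature), d=1.0 / fs)
    # Assumed normalization: z-score across all frequency bins.
    z = (spectrum - spectrum.mean()) / spectrum.std()
    idx_tempo = np.argmin(np.abs(freqs - tempo_hz))
    idx_harmonic = np.argmin(np.abs(freqs - 2 * tempo_hz))
    return z[idx_tempo], z[idx_harmonic]
```

      The two returned values per trial can then be compared between top- and bottom-rated stimuli with an ordinary paired test.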

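    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of the publication of Vannevar Bush's seminal article "As We May Think" in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 1, which mainly covers:

      1. Opening remarks by Paul Penfield.

      2. Andy van Dam introduces Vannevar Bush and his career.

      3. Paul Kahn (Memex expert): Visual tour of Bush's work.

      4. Douglas Engelbart: The Strategic Pursuit of Collective IQ. Abstract: To me, the legacy Bush left in "As We May Think" relates directly to the very real and important potential for raising the collective IQ of the social organisms represented by our human organizations. The companies, institutions, and indeed nations that pursue this potential most seriously and effectively will clearly have a strong success/survival advantage. Beyond that, whether humanity as a whole can survive in a healthy and "humane" social, political, economic, and ecological environment may well depend on how soon and how effectively we explicitly pursue this potential.

      A serious pursuit will involve many changes in the way we think, coordinated with many concurrent changes in "the way we work", as well as in the ways we can collaborate, share, take on new roles, exercise new and different sets of skills and methods, and so on. In short, it will involve radically new ways of coupling basic human sensory, motor, mental, and learning capabilities to the task of collectively developing, integrating, and applying knowledge.

      An effective pursuit will require a strategic approach, whose acceptance will surely involve critical shifts in some prevailing paradigms. I would like to describe them, and their relative roles in a candidate "bootstrapping" strategy for pursuing large-scale, significant improvement in collective IQ.

      Technology is only one important element of that strategy; within that element, the key is to accelerate the development of open hyperdocument systems (OHS), with appropriate goals for generic functionality, application domains, interoperability, and scalability. The exciting emergence of WWW/HTML provides an extremely important boost; I would like to describe some candidates for the next stage of evolution toward the OHS goals.

      5. Ted Nelson (Theodor Holm Nelson): Where the Trail Leads. Abstract: Like any concise prophetic work, "As We May Think" supports many interpretations and raises questions of extrapolation. We gather today to pay tribute, and to argue about whose ideas most faithfully express what was originally said.

      Bush foresaw a publicly accessible, rapidly accessible body of connected literature that would allow people to publish connections between materials that already exist. But the structure he foresaw, which he called "trails", is quite different from today's spaghetti hypertext; Bush's structure is based on transclusion rather than links. It deserves detailed study.

      Suitably extrapolated and polished, I believe this idea leads to trans-parallel media (in which connected objects are seen together with their connections), and to the design of a broad copyright arrangement that permits unfettered reuse.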

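    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of the publication of Vannevar Bush's seminal article "As We May Think" in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 3, which mainly covers: 1. Michael Lesk's talk, "The Seven Ages of Information Retrieval". Abstract: In his 1945 article, Vannevar Bush set out a goal of fast access to the contents of the world's libraries, and it looks as if that goal will be achieved by 2010, 65 years later. Its history is thus comparable to that of a person. Information retrieval had its schoolboy phase of research in the 1950s and early 1960s; it then struggled for adoption in the 1970s, but in the 1980s and 1990s it became accepted as free-text retrieval systems came into routine use. For example, my company no longer prints its corporate telephone directory on paper. Now it is moving forward with sound and image retrieval projects, while much of what is now in libraries is being made available electronically. We can look forward to Bush's dream being completed within a single lifetime. 2. Day 1 panel discussion.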

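    1. The MIT/Brown Vannevar Bush Symposium was held at MIT on October 12-13, 1995, to celebrate the 50th anniversary of the publication of Vannevar Bush's seminal article "As We May Think" in The Atlantic Monthly in July 1945. The event video is in five parts; this is Part 2, which mainly covers:

      1. Robert Kahn's talk, "Augmenting Bush's Vision with Digital Technology". Abstract: Although Vannevar Bush described the importance of information sharing in his classic paper "As We May Think", his vision was necessarily limited by the technology of his day. In particular, the digital computing and communication technologies we now take for granted had not even entered his frame of reference. This talk will explore the possible evolution of computing and communications infrastructure and the roles of architecture, technology, and intelligence in that system. Connectivity, together with a virtually unlimited supply of digital objects, generic services, and applications, will stimulate the sharing of ideas, joint activities of many kinds, virtual entities, and teamwork on the network. The role of software agents, both inside and outside the network, will be considered in the context of distributed task execution. Finally, the prospects for intelligent distributed systems will be discussed.

      2. Tim Berners-Lee's talk, "Hypertext and Our Collective Destiny". Abstract: Bush considered the plight of researchers deluged with information they cannot get at. He proposed the MEMEX, a machine that would give rapid access and allow random links between pieces of information. Since then, networks and computers have taken us beyond this visionary idea in speed and convenience. Yet we have not seen great advances in our ability to solve political problems, manage large organizations, or amplify our group intuition. We must do more than empower individuals. We must enable people and machines interacting together to act as a group in new ways. Now that we can make trails through our information, we must create a substrate in which those trails grow into an increasingly meaningful whole rather than a tangled mass. We and our documents can operate together as one large machine, but not as one large mind. Groups of every size must acquire the gifts of intuition, association, and invention that we usually associate with people rather than machines before we can meet Bush's challenge to humanity to "grow in the wisdom of race experience" rather than "perish in conflict".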

  5. Jun 2022
    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to consider more the role of the predator in predator-prey interactions, particularly from a collective locomotion aspect. This is an aspect which at times has been overlooked, with many theories, experiments and models focusing largely on the prey response, independent of how the predator behaves. The major strengths are the (1) excellent writing, (2) quality of the figures, (3) quantity of data, and (4) question tackled. The major weaknesses are (1) the volume of information (as a reader, it is quite hard to distil key points from the sheer volume of what has been presented), (2) the confined captive environment making it difficult to draw comparisons with a wild-type scenario, and (3) lack of clarity about the wider implications of the work outside of the immediate field.

      We thank the reviewer for their thoughtful review and positive comments. To address the weaknesses highlighted by the reviewer, we have revised our manuscript throughout.

      Reviewer #2 (Public Review):

      The manuscript describes a laboratory-based predator-prey experiment in which pike hunt shiner fish as a way to gain insight into the selective pressures driving the evolution of collective behavior. Unlike the predictions of classical theoretical work in which prey on the edge of social groups are considered to be at highest risk of predation, the fish in the center of the school were primarily targeted by the pike. This is because the pike uses a hunting behavior in which it slowly moves to the center of the school, seemingly undetected, until it rapidly attacks prey directly in front of its snout. This study also differs from previous studies in that both the predator and prey motion are examined, and the success of predation attempts was precisely determined. While the study demonstrates why shiners would be under selective pressure to avoid the center of a school, I am not convinced that the results explain why shiners evolved to have schooling behavior.

      The reviewer indeed highlights one of the main findings of our study, that fish closer to the group center are more at risk of being attacked by pike. They also give a proper account of its possible explanation, and highlight some of the main ways in which our study differs from previous work. The reviewer states that our results do not explain why shiners evolved to school. We agree and note that we also don’t claim this anywhere in the manuscript. Rather, we state our study provides important new insights about differential predation risk in groups of prey and highlight the important role of predator attack strategy and decision-making and prey response, with potential repercussions for the costs and benefits of grouping.

      We have considerably revised our introduction to better explain the importance of understanding differential predation risk in animal groups (lines 36-50): A key challenge in the life of most animals is to avoid being eaten. Via effects such as enhanced predator detection (Lima, 1995; Magurran et al., 1985), predator confusion (Landeau and Terborgh, 1986), and risk dilution effects (Foster and Treherne, 1981; Turner and Pitcher, 1986), individuals living and moving in groups can reduce their risk of predation (Ioannou et al., 2012; Krause and Ruxton, 2002; Pitcher and Parrish, 1993; Ward and Webster, 2016). This helps explain why strong predation pressure is known to drive the formation of larger and more cohesive groups (Beauchamp, 2004; Krause and Ruxton, 2002; B. Seghers, 1974). However, the costs and benefits of grouping are not shared equally among individuals within groups, and besides differential food intake and costs of locomotion, group members themselves may experience widely varying risks of predation (Handegard et al., 2012; Krause, 1994; Krause and Ruxton, 2002). Where and who predators attack within groups not only has major implications for the selection of individual phenotypes, and thereby the emergence of collective behaviour and the functioning of animal groups (Farine et al., 2015; Jolles et al., 2020; Ward and Webster, 2016), but also shapes the social behaviour of prey and the properties and structure of prey groups. Hence, a better understanding of the factors that influence predation risk within animal groups is of fundamental importance.

      And in the discussion we now better explain the potential evolutionary consequences of our findings (lines 456-466): Predation is seen as one of the main factors shaping the collective properties of animal groups (Herbert-Read et al., 2017) and has so far generally been seen to drive the formation of larger, more cohesive groups that exhibit collective, coordinated motion (see e.g. Beauchamp, 2004; Ioannou et al., 2012; B. H. Seghers, 1974). Our finding that central individuals are more at risk of being predated could actually have the opposite effect, with schooling having a selective disadvantage and over time resulting in weaker collective behaviour and less cohesive schools. However, we do not deem this likely as selection is likely to be group-size dependent, as discussed above. Furthermore, our multi-model inference approach revealed that, despite more central individuals experiencing higher predation risk, being close to others inside the school was still associated with a lower risk of being targeted. As most prey experience many types of predators, including sit-and-wait predators and active predators that hunt for prey, the extent and direction of such selection effects will depend on the broader predation landscape in which prey find themselves.

      Major strengths of the paper include the precise recording of the location and orientation of all fish at all times during the experiments. This indeed provides a rich dataset that can be used to search for the factors that predict the likelihood of attack and escape with higher statistical power.

      The major concern I have about the manuscript is that the results somewhat contradict the aim of the paper as expressed in the introduction and discussion: that predator-prey interactions explain the emergent evolution of collective behavior. Figure 2C shows that fish in smaller clusters or those that were totally isolated experienced lower rates of predation and were not included in any subsequent analyses. This would suggest that shiners experiencing predation from pike would be under strong selection to avoid schooling behavior altogether. Can you compare the likelihood of predation for individuals in non-central school locations compared to individuals outside of schools altogether? It might be helpful to investigate whether other predators of shiners use predation strategies that target prey on the edge of the school to help explain why schooling could be useful. Did the likelihood of schooling decrease throughout the trials?

      The reviewer makes a good point regarding the observation that pike tended to mainly attack individuals in the main school, questioning whether this would result in a selective disadvantage for schooling. We would like to point out that this result concerns the likelihood of an individual being attacked, not the likelihood of a successful attack. If we look at the latter, we find that 5 out of 8 attacks away from the main school were successful, a ratio that is actually similar to that of the main school. More importantly, to understand how predation risk is linked to group size one needs to look at the per capita risk. If we do that for the group size we used in our study, despite a moderately elevated risk of being predated in a large group, the shiners in the main school still had a considerably lower individual risk of being killed than those that occurred in small sub-groups or were alone. We would like to note that in our study the shiners did not really show proper fission-fusion behaviour and were in one large cohesive school by far the majority of the time. Therefore, we feel our dataset is not suitable for a proper investigation of the role of group size in predation risk.

      We now clarify these points in the discussion (lines 467-471): While the finding that pike were more likely to attack the main school may also appear to indicate a selective disadvantage to school, calculating the per-capita-risk for each individual would actually reveal it is still safest to be part of the main school. Nevertheless, as the shiners in our study rarely exhibited fission-fusion dynamics we feel our dataset is not appropriate to make proper inferences about how predation risk is linked to group size.
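      The per capita argument above can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and chosen only to show the arithmetic; they are not data from the study, and the sketch ignores how much time fish spend in each configuration.

```python
def per_capita_risk(n_attacks: int, n_individuals: float) -> float:
    """Attacks received by a group divided by the number of individuals in it."""
    if n_individuals <= 0:
        raise ValueError("group must contain at least one individual")
    return n_attacks / n_individuals


# Hypothetical counts, NOT values from the study: the main school receives
# most attacks, yet each of its members carries a smaller share of the risk.
main_school = per_capita_risk(n_attacks=90, n_individuals=36)
stragglers = per_capita_risk(n_attacks=10, n_individuals=2)

print(f"main school: {main_school:.2f} attacks per fish")
print(f"stragglers:  {stragglers:.2f} attacks per fish")
```

      In this toy case the main school draws 90% of the attacks, but an individual straggler still faces twice the per capita risk of a fish in the school.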

      We have also slightly extended the relevant sentences in the results to further clarify the clustering results (lines 144-150): We found that, by and large, the shiners were organised in one large, cohesive school at the time of attack and rarely showed fission-fusion behaviour (merging and splitting of schools) during the trials. Only occasionally were there one or two singletons besides the main school (25 attacks) or multiple clusters of more than two fish (12 attacks; Figure 2C), which tended to exist relatively briefly (mean school size: 36.5 ± 0.8). In more than 80% of these cases, pike still targeted an individual in the main cluster (Figure 2C).

      We now also provide more discussion about other predator types being likely to attack central prey (lines 343-354): That predators may actually enter groups and strike at central individuals is not often considered (Hirsch and Morrell, 2011), possibly because it contrasts with the long-standing idea that predation risk is higher on the edge of animal groups (Duffield and Ioannou, 2017; Krause, 1994; Krause and Ruxton, 2002; Stankowich, 2003). However, our finding is in line with the predictions of theoretical work that suggests that the extent of marginal predation may depend on attack strategy and declines with the distance from which the predator attacks (Hirsch and Morrell, 2011). Furthermore, increased risk of individuals near the centre of groups may be more widespread than currently thought. Predators not only exhibit stealthy behavioural tactics that enable them to approach and attack central individuals, as we show here, but may also do so by attacking groups from above (Brunton, 1997) or below (Clua and Grosvalet, 2001; Hobson, 1963; but see Romey et al., 2008), and by rushing into the main body of the group (Handegard et al., 2012; Hobson, 1963; Parrish et al., 1989).

      We furthermore discuss the potential role of group size on the observed effects (lines 441-455): In particular, while group size is not expected to much affect whether ambush predators are likely to attack internal individuals, the specific risk of central individuals could be hypothesized either to decrease with group size, such as if the predator is more likely to attack when surrounded by prey, or to be unaffected by it, such as if the predator actively targets central individuals. Whatever the process, the observed findings are likely for prey that move in groups of somewhat intermediate size; for very large groups, such as the huge schools encountered in the pelagic, ambush predators may simply not be able to attack the group centre due to spatial constraints. More generally, the tendency for predators to attack the centre of moving groups may depend on the medium in which the predator-prey interactions occur. As in the air there is potential for (fatal) collisions, and on land it is physically difficult for predators to enter groups and predators’ size advantage tends to be more limited, predators may be less likely to go for the group centre as compared to in aquatic or mixed (e.g. aerial predator hunting aquatic prey) systems. Hence, the important interplay we highlight between predator attack strategy and prey response may have different implications across different predator-prey systems and warrants concerted further research effort.

      Finally, in response to the reviewer’s question whether the likelihood of schooling decreased through the trials, we did not see a change in packing fraction (median nearest-neighbour distance) with repeated exposure to the pike, but shiners increasingly avoided the area directly in front of the pike’s head (lines 182-186): While the shiners did not show a change in their packing fraction (median nearest-neighbour distance) with repeated exposure to the pike (F1,52 = 1.81, p = 0.185), they increasingly avoided the area directly in front of the pike’s head (Appendix 2 – Figure 1A), resulting in the pike attacking from increasingly further away (target distance: F1,52 = 45.52, p < 0.001, see Appendix 2 – Figure 1B,C). See also Appendix 2.
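      For readers unfamiliar with the measure, the following is a minimal sketch of how a median nearest-neighbour distance could be computed for a single video frame. It assumes 2D coordinates per fish; the function name and the toy coordinates are hypothetical and this is not the authors' tracking pipeline.

```python
import math
from statistics import median


def median_nearest_neighbour_distance(positions):
    """Median, over individuals, of the distance to the closest other individual.

    positions: sequence of (x, y) coordinates for one frame.
    """
    if len(positions) < 2:
        raise ValueError("need at least two individuals")
    nearest = []
    for i, p in enumerate(positions):
        # Distance from individual i to every other individual; keep the smallest.
        nearest.append(
            min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
        )
    return median(nearest)


# Toy frame with three individuals on a line (hypothetical coordinates).
frame = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
result = median_nearest_neighbour_distance(frame)
print(result)
```

      A smaller value indicates a more tightly packed school; comparing it across successive exposures is one way to ask whether schooling tendency changed.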

      I am also curious whether tank size affects the behavior of the fish, both of the shiners and the pike. The pike seem to be approximately 1/3 the shortest length of the tank, and 6 inches of depth have constrained the movement to be mostly in the 2D plane. A lack of open space might limit the pike's ability to hunt in any way other than this stealthy strategy. Has this stealthy hunting strategy been described in other experiments in larger or more naturalistic conditions? Does open space affect the shiners' propensity to school? Although the manuscript describes that shiners tend to school near the surface of water, does the shallow depth affect the pike's behavior? The manuscript states that some pike never attacked -- were these the largest in the study?

      While the tank is small relative to the real world, we decided on this size of ~2 m² based on previous experimental work on predator-prey dynamics. As we stated in the methods of the original manuscript (lines 543-545), we expect that had a much larger space been used, pike would still have shown the same approach and attack behaviour linked to their stealthy attack strategy. The stealthy hunting behaviour of pike and similar predators, and their ability to thereby get very close to their prey, has been described elsewhere (see e.g. references on lines 332-344 of the original manuscript).

      We now better explain the potential limitation of the arena size in the discussion (lines 472-480): Laboratory studies on predator-prey dynamics like ours do, of course, have their limitations. Although the size of the arena we used (~2 m²) is in line with behavioural studies with large schools of fish (e.g. Sosna et al., 2019; Strandburg-Peshkin et al., 2013) and experiments with live predators attacking schooling prey (Bumann et al., 1997; Magurran and Pitcher, 1987; Neill and Cullen, 1974; Romenskyy et al., 2020; Theodorakis, 1989), compared to conditions in the wild the prey and predator had limited space to move. However, as pike are ambush predators they tend to move relatively little to search for prey and rather rely on prey movement for encounters (Nilsson and Eklöv, 2008). Increasing tank size would have made effective tracking extremely difficult, or impossible, and while a much larger tank is expected to considerably increase latency to attack, we expect it to have relatively little effect on the observed findings.

      We agree that the shallow depth of the tank is a limitation of our study and may have somewhat restricted the pikes’ natural behaviour, although pilot experiments showed that the pike exhibited normal movements and attack behaviours. Fish were tested in very shallow water so that we could acquire detailed individual-based tracking of the schools as well as compute features related to the visual field of the fish. We would also like to note that both shiners and pike can often be found in the littoral zone and occur in very shallow water only a few tens of cm deep (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018), with some experimental work furthermore showing that pike may actually prefer shallow water (Hawkins et al., 2005). We don’t think that increasing the depth of the tank would have considerably changed the predatory behaviour of the pike, as the pike would be expected to still use their stealthy approach to get close to their prey even if the prey school were more three-dimensional.

      We now provide a much more extensive discussion of the limited depth used in the discussion (lines 480-494): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. Shiners naturally school in very shallow water conditions as well as near the surface in deeper water in the wild (Hall et al., 1979; Krause et al., 2000b; Stone et al., 2016) and pike also primarily occur in the shallow littoral zone, sometimes only a few tens of cm deep (Pierce et al., 2013; Skov et al., 2018). Furthermore, pilot experiments showed the pike did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). Other recent work on predator-prey dynamics did not find a considerable impact of adding the third dimension to their analyses (Romenskyy et al., 2020). Still, the water depth used is a limiting factor of our study and in the future this type of work should be extended to deeper water while still keeping track of individual identities over time. We expect that adding the third dimension would not change the stealthy attack behaviour of the pike and would therefore still put more central individuals most at risk, but possibly attack success would be reduced because of increased predator visibility and prey escape potential in the vertical plane, which remains to be tested.

      We did not observe a relationship between pike size and tendency to attack.

      Reviewer #3 (Public Review):

      While it has long been clear that animals in groups (e.g., fish schools) benefit in terms of safety in numbers, there has also been a keen interest in which animals in the group are at higher versus lower risk (e.g., those in front, or along the edges) and how that might depend on the predator's attack strategy. This study addresses these important predator-prey details using a common predatory fish (northern pike) attacking schools of prey fish (golden shiners). A strength of the study is that it uses cutting-edge video tracking and computational/statistical methods that allow it to quantify and follow each fish's (1 predator and 40 prey in a group) spatial position, relative spacing, orientation and even each individual's visual field and movement throughout each of 125 attacks. Most (70%) of these attacks were successful, but many were not. The variation in attack success allowed the investigators to do statistical analyses to identify key predator and prey behaviors that are associated with successful vs. unsuccessful attacks.

      The study yielded numerous interesting insights. While conventional wisdom pictures predators initiating an attack from outside of the group, thus putting individuals at the group's edge at greatest risk, this study found that pike typically approached the school of prey head-on, both in terms of the group's orientation and direction of movement, and often stealthily moved within the group before initiating an attack. To understand which prey individual was targeted by the predator, the highly quantitative video analyses examined 11 measures of each individual prey's position and orientation at the time that the pike initiated its attack. Of course, pike showed a strong tendency to target one of the 3 closest prey, particularly prey that were more or less directly in front of the pike. However, contrary to conventional wisdom, the analysis showed that targeted prey were closer to the center than the edge, and that an individual's position and orientation relative to other nearby prey also played an important role in whether it might be targeted by the predator. Not surprisingly, analyses showed that targeted prey were more likely to escape if they were further from the predator's head and if they exhibited higher maximum acceleration. Interestingly, during the actual strike, on average, the predator accelerated to a speed about 50% faster than the velocity of the targeted prey.

      A limitation of the study (that the authors describe and discuss) is that it was conducted in a tank with no spatial refuges whereas in nature, pike are often found in areas with vegetation, and schools of prey can often potentially respond to the presence of a predator by moving towards refuge (e.g., vegetation). Also, the study was done in very shallow water (6 cm) -- likely shallower than many, if not most, natural predator-prey interactions for these species. In deeper water, the predator-prey interaction might be better analyzed in three dimensions (i.e., also accounting for variation in vertical height in the water), though the authors argue that this conventional idea is not necessarily true.

      Overall, this study provides an impressive example of how the use of modern technology and statistical analyses allows us to better describe and understand the fine-scale behaviors that affect an interaction of high importance for ecology and evolution.

      We thank the reviewer for the care and attention put in their review and their detailed objective assessment of our study.

      Regarding refuge use, it is true that in the wild pike are often found in areas with vegetation, but it is predominantly younger pike that seek refuge among vegetation from predators themselves, including from cannibalism by larger pike (see Skov & Lucas, 2018, Chapter 5). Vegetation is also used by pike as background camouflage rather than a refuge per se, but due to their elongated body and narrow frontal profile pike are able to approach and ambush prey when no vegetation is available, as we show in our study. During pilot experiments we did provide pike with refuges, but as they never used them, and as refuges would have provided hiding places that would have considerably impacted our ability to investigate predation risk within the schools, no refuges were provided during the experiment.

      We have now added an explanation about not using refuges in the discussion (lines 495-502): For our experiments we used a testing arena without any internal structures such as refuges. This was a strategic decision as providing a more complex environment would have impacted the ability of the shiners to school in large groups and would have led fish to hide under cover. Although studying predator-prey dynamics in more complex environments would be interesting in its own right, it would not have allowed us to study the questions we are interested in about the predation risk of free-schooling prey. Furthermore, pilot experiments indicated that the pike never used refuges (consistent with previous work, see Turesson and Brönmark, 2004), so they were not further provided during the actual experiment.

      Regarding the shallow depth of the tank, we now better acknowledge this limitation and explain our reasoning (lines 480-482): In terms of water depth, fish were tested in very shallow water. This was primarily done to be able to keep track of individual identities and compute features related to the visual field of the fish. We would also like to note that both shiners and pike spend much of their lives in the littoral zone and occur in very shallow water only a few tens of cm deep (see e.g. Krause et al., 2000b; Pierce et al., 2013; Skov et al., 2018). Although the limited vertical space may have restricted the pikes’ natural behaviour to some extent, they did exhibit normal swimming and attack behaviour with attack speeds and acceleration comparable to previous work (Domenici and Blake, 1997; Walker et al., 2005). We now better discuss the limitation of the shallow depth used in the discussion on lines 477-494 (see also our responses above).

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, He and collaborators analyse eight samples from six patients with acral melanoma through single-cell RNA sequencing. They describe the tumour microenvironment in these tumours, including descriptions of interactions among distinct cell types and potential biomarkers. I believe the work is thoroughly done, but I have identified a few concerns in their depiction and interpretation of their results.

      Strengths:

      1) One of the few available single-cell studies of acral melanoma, including a non-European cohort of patients.

      2) Data will be very useful to study the immune landscape of these rare tumours.

      3) Data include adjacent tissue, primary tumours and a metastatic sample, covering all disease stages.

      4) Analyses seem to be carefully done.

      Things to improve:

      1) Figures need much more description to be understandable, in particular, axes should be clearly labeled and the colour code should be specified

      Thank you for your generous comments and suggestions. We have improved the completeness of some figures and added some figure legends. We believe this will further improve the quality of our manuscript.

      2) In some places, I would recommend the authors soften their interpretation of their analyses (for example, when they suggest targeting TNFRSF9+ T cells as a novel therapy), as these are nearly all bioinformatic in a small number of samples

      As for the conclusions of TNFRSF9, we indeed provided a possibility that TNFRSF9 may serve as a novel therapy. We made some changes to soften the statement. In addition, we have added instructions and explanations in the Discussion section.

      3) I don't think the experiments add much to the literature, as these test already known oncogenes on a common, non-acral melanoma cell line.

      Thanks for your comments regarding the experiments included in our study. We have pointed out this deficiency in the Discussion section, and made some experimental changes. For example, we have removed the TWIST1-related experiments from the main Results section and shown them only as non-focus work in the Supplementary Figure.

      It is difficult for us to obtain AM cell lines. No commercial AM cell lines can be purchased from ATCC or ECACC. AM cell lines are more difficult to establish and there are few reports on methods for establishing primary acral melanoma cell cultures (PMID: 22578220, PMID: 17488338). Some Japanese and Chinese researchers have isolated primary AM cells (e.g., PMID: 17488338, PMID: 22578220, PMID: 34097822), but due to customs policy and the COVID-19 epidemic, we could not receive them within a short period. Moreover, these studies also stated their limitations; namely, that the stability during serial passaging had not been evaluated. Therefore, it may be very time-consuming to obtain operable AM cell lines for functional assays. However, our research group would like to have the opportunity to isolate and culture primary cells in subsequent studies, and improve relevant experiments according to your valuable suggestions. Many thanks again for your comments.

      Reviewer #2 (Public Review):

      The study presented by Zan He et al. dissects the main interactions between malignant and stromal cells present in acral melanoma samples and in adjacent tissues using single-cell RNA sequencing. The study describes factors that allow communication between the different cell types, with a special focus on macrophages, lymphocytes and fibroblasts, along with malignant cells. Factors playing a role in cell-cell communication are identified and suggested to be relevant prognostic markers and/or attractive therapeutic targets.

      Historically, the study of acral melanomas has been neglected due to the low incidence among people of European descent, and this has left an important gap of knowledge in the field and hindered the development of effective therapies to control the disease. Therefore, studies that address this unmet need in melanoma research are very important and should be encouraged. This includes single-cell sequencing studies that allow one to study the complexity of tumours, including microenvironment features that influence the development and effectiveness of certain types of treatment. The present study contributes information on how cells interact in the acral melanoma microenvironment and this could be a first step toward better understanding how these interactions influence acral melanoma development, progression, and therapy response.

      However, there are a few points that should be carefully considered. The authors use 3 adjacent tissues (which in theory are composed of normal skin next to a cancer lesion), 4 primary tumor samples, and one lymph node metastasis as a model to study tumor progression. Adjacent tissue is not considered a stage of tumour progression and the sample size is too small to rule out sample-dependent effects. The study is descriptive in nature and could better contextualize the findings regarding what is known for other subtypes of melanomas or other tumours. This is especially important to help readers understand why it would be relevant to study cutaneous melanomas located in acral skin. It would be helpful to explain how different it is from non-acral cutaneous melanoma, and what this study adds compared to other single-cell studies from cutaneous acral and non-acral melanomas.

      Thank you for your generous comments. It is not accurate to represent the adjacent tissue samples as ‘tumour progression’, and our study was not intended to focus on the tumour developmental process. We have revised the related description in the text. Tumour adjacent tissues (ATs) have always been a focus of research on TMEs. Some studies suggest that normal tissues adjacent to cancer carry many mutations and clonal expansions and may be in a pre-cancerous state (PMID: 33004515), and many single-cell studies of tumours have also sampled paired para-cancer tissues (e.g., PMID: 29988129; PMID: 35303421).

      The small sample size limits the generality of the results, as we pointed out in the Discussion section. Most acral melanoma (AM) patients opt for surgical resection at an early stage to avoid the possibility of metastasis. Hence, we rarely encounter patients with lymph gland (LG) metastases. We collected only one metastatic sample, because such samples are very rare in the clinic. However, the sample is of high quality, with a high cell activity in the single-cell suspension after dissociation (95.30%) and a rich amount of tumour cells and other stromal cells. Therefore, we added its sequencing data to the overall analyses, hoping to contribute to the comprehensiveness of resources and research.

      It is important to link this study with what is known for other subtypes of melanomas. We have now supplied a comparison of AMs with non-acral skin cutaneous melanomas (CMs), using published data. Your comments and advice are very helpful to us, and we believe that the current manuscript is more comprehensive and complete.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors estimate growth curves ('nomograms') for hippocampal volume (HV) using Gaussian process regression applied to UK Biobank data and evaluate the influence of polygenic scores for HV on the estimated centile curves. By taking this into account, the centile scores are shifted up or down accordingly. The authors then apply this to the ADNI cohort and show that subjects with dementia mostly lie in the lower centiles, but this does not improve the prediction of transition from mild cognitive impairment to dementia.

      This paper is reasonably well written and the finding that centile curves for different phenotypes are sensitive to genetic features will be of interest to many in the field, albeit perhaps somewhat unsurprising given the polygenic score evaluated here is for the same phenotype under investigation (i.e. HV). I think using centiles derived from nomograms/normative models for precisely assessing both current staging and progression of neurological disorders is a highly promising direction. Regarding this manuscript, I have a few comments about the methodology and interpretation of results, which I will outline below.

      • My most significant concern is that it appears that the assumption of Gaussian residuals is violated by the HV phenotypes that the authors fit their GP to. For example, in figure 2, the distribution is clearly skewed, and the lower centiles, in particular, are poorly fit to the data. First, please provide additional metrics to assess the fit and calibration of these models quantitatively (the latter can be done e.g. via Q-Q plots).

      Thanks for pointing this out. We are sorry for causing this confusion. The skew in the figure appears because the scatter plot overlaid on the GP-generated nomogram shows ADNI samples of all diagnoses, not the UKB training data used for the GP. The lower centiles are mainly occupied by the participants with AD or MCI (see the new plots in Figure 5). In addition, the healthy subjects from ADNI do indeed fit the model reasonably well. We have added a supplementary figure to show just the healthy subjects and have made the following edits in the text to address the confusion:

      Lines 143-149: “Nomograms of healthy subjects generated using the SWA and GPR method displayed similar trends (Figure 2; Supplementary Figure S8). … This extension allowed 86% of all diagnostic groups from the ADNI to be evaluated versus 56% in the SWA Nomograms (Figure 2; Figure 2 – Figure Supplement 2).”

      Lines 159-170 (description of figure 2): “Figure 2: Comparing Nomogram Generation Methods. Nomograms produced from healthy UKB subjects using the sliding window approach (SWA) (red lines) and gaussian process regression (GPR) method (grey lines) … The benefits of this extension can be seen with scatter plots of ADNI subjects of all diagnoses overlayed (E, F… A similar figure with only the Cognitively Normal ADNI subjects can be found in Figure 2 – Figure Supplement 2”

      Second, I think if the authors wish to make precise inferences about the centile distribution for the reference model, then the deviation from Gaussianity ought to be accommodated in some manner. There are several options for this, including different noise models (e.g. Gamma, inverse Gamma, SHASH, etc), variable transformation, or quantile regression. One option that could be useful in the context of Gaussian process regression is the use of likelihood warping (see e.g. Fraza et al 2021 Neuroimage and references therein) which was originally developed for GP models. I would recommend the authors pursue one of these routes and provide metrics to properly gauge the fit.

      This is an excellent point. However, we believe that given that the training data indeed follows a Gaussian distribution (see new Figure 4 – Figure Supplement 3; reproduced below) across the relevant strata (sex, PGS) and across age groups, such modifications are not required.

      • Related to the above, it is likely that the selection of subjects with high/low polygenic scores for HV changes the shape of the distribution. It is currently impossible to assess this because no data points are shown in these cases. Please also add this information, along with comparable quantitative metrics to those for the models above.

      Thank you for bringing this up. We have now added a new supplementary figure with the shape of these distributions along with the Shapiro-Wilk test results for each of them. As can be seen, the Shapiro-Wilk tests detect mild deviation from Normality in some cases. However, given the size of the strata (N>2000), this is not surprising. Moreover, if multiple-testing correction were applied across the 48 comparisons, none of the tests would be significant at the corrected threshold (P<0.001).
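
      The corrected threshold quoted above is consistent with a Bonferroni correction of a 0.05 significance level over 48 tests. The response does not name the correction method, so both the method and the 0.05 level are assumptions in this minimal sketch; the function names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag which p-values fall below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed family-wise alpha of 0.05 across the 48 comparisons mentioned above.
threshold = bonferroni_threshold(0.05, 48)
print(f"{threshold:.5f}")  # 0.00104
```

      Under these assumptions, a test would need P below roughly 0.001 to remain significant, which matches the threshold stated in the response.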

      • How did the authors handle site effects? There appears to be no adjustment for the fact that the ADNI data are acquired from different sites that were not used during the estimation of the normative models. I would expect to see this dealt with properly (e.g. via fixed or random effects included in the modelling) or at the very least a convincing demonstration that site effects are not clearly biasing the results.

      We agree that site effects are a major issue; we have rerun the application experiments after adjusting the ADNI volumes with NeuroCombat. The results did not change significantly, but we have replaced all reported results with the updated ones. In addition, we noted this in the methods section:

      Lines 442-445: Finally, we used NeuroCombat 1 to adjust across ADNI sites and harmonize the volumes with the UKB Dataset. To do this we modelled 58 batches (UKB data as one batch and 57 ADNI sites as separate batches) and added ICV, sex, and diagnosis (assigning all UKB as Healthy and using the diagnosis columns in ADNI) to retain biological variation.

      • How do the authors interpret the finding that the relationship between the polygenic scores and HV is different in the cohorts they consider (i.e. bimodal in UKB and unimodal in ADNI)? Does this call into question the appropriateness of the subsampled model for the clinical cohort?

      While we do see a bimodal distribution in UKB, the effect is not very strong, as the other reviewers commented. Therefore, we have de-emphasized this aspect. One reason may be that we detect the slightly bimodal aspect in UKB because of greater statistical power due to the large sample size (one order of magnitude). Another factor is the SNP data used, i.e., differences in genotyping platform and imputation. This is also the reason why integrating PGS directly into the predictive model comes with additional challenges. We have addressed this topic briefly in our discussion: Lines 390-392: “Lastly, a recent study of PGS uncertainty revealed large variance in PGS estimates63, which may undermine PGS based stratification; hence a more sophisticated method of building PGS or stratification may improve results further.”

      • Perhaps the authors can comment on (or better, evaluate) how this genetic shift could be accommodated in normative models (e.g. the possibility of including polygenic risk scores as predictor variables in the normative model). This would remove the need for post hoc adjustment and would allow more precise control over the adjustment than just taking the upper/lower xxx % of the PGS distribution as is done in the current manuscript.

      We agree that integrating the genetics directly into the normative models is a great idea, and this is the direction we will explore in future work. However, PGS themselves are prone to show ‘site’ effects that depend on the genotyping method that was used as well as on the quality of genotyping and imputation. As a consequence, using the ‘raw’ PGS scores in predictive models brings its own challenges. Therefore, we feel that the current framework is simpler at this point and illustrates the potential of PGS when combined with normative models.

      • Related to my point above, it is perhaps unsurprising that the polygenic score for the HV phenotype influences the centile distribution. I think the paper would benefit considerably by also evaluating other polygenic scores (e.g., APOE4 as in some of the prior cited references). It would be interesting to compare the magnitude and shape differences for these adjustments. The authors can consider this an optional suggestion.

      Our rationale for focusing on HV PGS was that we sought to improve the accuracy of the normative model. Genetics influences HV, and this is a first attempt to adjust for this in the normative modeling framework. Indeed, APOE-e4 has a sizable effect on HV. However, this is most likely mediated by nascent accelerated neurodegeneration, i.e., Alzheimer’s disease. Thus, in our view, focusing on APOE-e4 would mean focusing on a disease effect. We address this issue briefly in the discussion (Lines 326-334). As a sensitivity analysis, we did indeed test other PGS, such as AD and Whole-Brain-Volume, and found that these do not affect the normative models for HV.

      Reviewer #3 (Public Review):

      Given the large variation in and high heritability of hippocampus volume in the population, taking out known variation in the healthy population is a nice way of reducing heterogeneity, and a step forward towards using normative models in clinical practice. The dataset the nomograms are based on is large enough to do so even when stratified by polygenic scores for hippocampal volume, and these provide interesting information on the role of genetics in hippocampus volume.

      There are however several concerns regarding the applicability of the models to the ADNI dataset. First, the lack of overlap in the age range between the dataset the model is trained on and the application to subjects that are outside that age range is questionable. The authors prefer Gaussian process regression (GPR) over a sliding window-based approach using the argument that the former allows for predictions in a larger age range but extrapolation beyond the reach of the data is usually not valid. The claim that Supplementary Figure 6 shows accurate extension beyond these limits is in my opinion not justified. If anything, we can be rather certain that the extensive growth of the hippocampus up to age 48 is not realistic (see e.g. Dima et al., 2022).

      As mentioned already in response to reviewer #1, this was a miscommunication on our side. We only used the ADNI samples that were within the age range of the models they were being plotted against. The GPR model did not require smoothing at the edges of the age-range and thus can support a wider age range than the SWA. This is why we stated that the extension of the nomograms enabled more of the ADNI dataset to be used, i.e., because otherwise these samples were outside the range of the model and could not be used.

      We have changed the following lines in the manuscript to make this idea explicit:

      Lines 477-478 (end of GPR methods section): “For both SWM and GPR models, we only tested the ADNI samples that lay within the age range of each model respectively.”

      Regarding the accurate extension claim, we have edited the line (411-412) in the discussion so that it now reads:

      Lines 347-348 “In fact, our GPR model can potentially be extended a few years beyond those limits”

      Thank you for pointing out the discrepancy in the hippocampal growth around 48 with the results by Dima et al. 2022. Although the sample sizes of the two studies are similar, the data availability in UKB for ages 45-50 is rather sparse (N<100; see new Figure 4 – Figure Supplement 3). Thus, the observed growth is likely due to under-sampling. The growth effect has been observed in other studies using UKB data (refs. 7, 8). We have noted this in the discussion:

      Lines 354-356: “However, there is a possibility that our results suffer from edge effects. For example, we suspect that the peak noted in the male nomogram is likely due to under-sampling in the younger participants.”

      Second, the drop in mean 'percentile' difference between high and low polygenic scoring individuals when one uses genetically adjusted nomograms seems nice, but this difference is currently just a number and the reader cannot see whether it is significant or clinically relevant.

      We have now provided a new figure (Figure 5) that shows the boxplots behind those numbers. The MCI-to-AD conversion analyses in the ADNI explored the clinical benefit of genetically adjusted nomograms. However, adjusted and unadjusted percentiles performed equally well. In the discussion we argue that the MCI stage is already too late and earlier stages may benefit from the increased precision:

      Lines 373-378: “However, despite this sizable effect, genetically adjusted nomograms did not provide additional insight into distinguishing MCI subjects that remained stable or converted to AD. Nonetheless, the added precision may prove more useful in early detection of deviation among CN subjects, for instance in detecting subtle hippocampal volume loss in individuals with presymptomatic neurodegeneration.”

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors show that the turnover of centriole components is necessary for proper centriole maintenance within Drosophila cultured cells (during prolonged cell cycle arrest) and within Drosophila oocytes, where centrioles are normally degraded prior to fertilisation. They highlight Ana1 as an important player in centriole maintenance. The authors begin with a candidate screen to identify core centriole proteins that are required to properly maintain centrioles. They then focus on Ana1, given that its depletion had the strongest effect, and show that its depletion leads to a reduction in the levels of centriole components in Drosophila oocytes. They show that the previously observed ability of centriole-targeted Polo to counteract centriole loss depends at least in part on Ana1 and that targeting Ana1 to centrioles also counteracts centriole loss. The authors conclude that Ana1 is a component of the PCM-promoted centriole integrity pathway.

      Major comments

      1. The authors say that Plk4 depletion does not lead to centriole loss, but there are significant differences in centriole number between the control and Plk4 depletion cells in Fig 1F and S1D. Please comment.
      2. One of the main results is that depletion of centriole components leads to a reduction in centrosome numbers when measured 8 days after S-phase arrest. I wonder whether a restriction of centriole duplication could add to this effect? Any cells that were in G2 or M phase when the drugs were added would presumably progress into the following S-phase and duplicate their inherited centrioles, but not if centriole duplication proteins had been depleted. It's true that Plk4 depletion leads to a relatively mild centriole loss phenotype, but can the authors be sure that this is not due to variations in the efficiency of different RNAi constructs? Perhaps the authors can show that Plk4 depletion efficiently prevents centriole duplication under otherwise normal conditions.
      3. The authors show that Ana1 depletion has the strongest effect, but this could in theory be due to differences in RNAi efficiency. I don't expect the authors to show the efficiency of all RNAi constructs, but they could state in the text that this is a caveat e.g. "...although we cannot rule out the possibility that differences in RNAi efficiency lead to the observed differences in severity of phenotype..."
      4. A key conclusion is that core centriole components turn over to some extent and that the incorporation of new molecules is necessary for centriole maintenance. This is a very interesting and important point and so it would be nice to have more direct data to support it. This could be done in different ways, including transfecting fluorescently tagged centriole components after S-phase arrest and showing that some molecules become incorporated into the centrioles, or by performing FRAP experiments. Of course, it is possible that the turnover is so low that the incorporated fluorescent molecules cannot be detected...
      5. The authors show that depletion of Ana1 from oocytes leads to a reduction in the intensity of centriole markers. They do not measure centrosome numbers, as the centrosomes cluster too tightly. The authors therefore can't be certain that Ana1 depletion leads to a reduction in centrosome numbers. The authors could show this by inhibiting centrosome clustering while depleting Ana1. There is a recent BioRxiv paper showing that centrosome clustering can be inhibited by depletion of Kinesin-1.
      6. In Figure 3B the authors show that expression of GFP-Polo-PACT partially rescues the effect of "all PCM" depletion, but this seems strange given that Polo's role is presumably to recruit PCM (which has been depleted). Can the authors comment? Also, it would make sense to test whether GFP-Polo-PACT can rescue centriole loss after the depletion of Ana1 alone (not Ana1 and all PCM). If Ana1 has a role in recruiting Polo (either directly or indirectly), which has been shown previously in mitotic cells, then there should be a rescue to some extent.
      7. In Fig4A,C, the authors say that γ-tubulin levels at centrosomes increase when GFP-Polo is forced onto the centrosomes - the graph seems to show a big increase, but the pictures do not...? Are the authors measuring total levels at all centrosomes? If so, I think they should be measuring the average at individual centrosomes. Also, why is the level of GFP alone not much higher when expressed with GFPnanoPACT (Fig 1B)? Presumably GFP should be recruited to the centrosomes by GFPnanoPACT.
      8. The authors show that tethering Ana1-GFP to the centrioles counteracts centriole loss in oocytes (Fig4G). They say that the centrosomes are most likely inactive because they don't recruit PCM, but they have only looked at γ-tubulin, which is a downstream component of the PCM. I think it is important to check whether Polo is recruited, given that tethering Polo to centrioles also counteracts centriole loss and that a recent paper showed that Ana1 has a role in recruiting Polo to centrosomes (Alvarez-Rodrigo et al., 2021). The authors also say that these centrosomes do not organise microtubules but do not show the data.
      9. The authors propose that Ana1 is downstream of the PCM, and so over-expressing Ana1 should at least partially rescue centriole loss after PCM depletion. But I don't really agree with this. If Ana1 relies on the PCM then how would its overexpression manage to rescue the phenotype in the absence of the PCM? The finding that over-expressing Ana1 partially rescues centriole loss may instead suggest that Ana1 is either upstream of the PCM or part of an independent pathway. Indeed, the authors show that depletion of both the PCM and Ana1 has a stronger effect than either depletion individually - this is indicative of two independent pathways.

      Minor comments

      1. When the authors say that the centriole wall and cartwheel components are "dynamic" I think that they need to make it clear that this "dynamicity" is not very fast. Using the term dynamic tends to suggest rapid turnover (like in the PCM). Perhaps the authors could use the term "slow exchange" or something similar.
      2. The authors currently use a 0 or 1 centriole categorisation - it would be nice to see the breakdown of what percentage of cells have 0, 1, 2, or >2 centrioles, perhaps in a supplementary excel file.

      Significance

      How centrioles are eliminated in certain cells is an interesting question and the data presented is also relevant to understanding centriole biology in general, because it seems that some apparently very stable structural proteins actually turn over. It is widely known that PCM proteins turn over relatively quickly, but core centriole proteins are considered to be stably incorporated. The data will therefore raise interest in the centrosome field. I do, however, feel that for the authors to make this point more strongly it would be good to show this more directly. Overall, this is a very interesting paper that is well written. The data is well presented and supports the conclusions that centriole components turn over and that Ana1 is involved in maintaining centriole integrity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript Pimenta-Marques et al. build on their previous work addressing how centrioles are stabilized and maintained or destabilized and disassembled, depending on the cell type and developmental context. Using Drosophila cell culture and oogenesis as an in vivo model for centriole destabilization, they identify the centriole wall protein Ana1 as a central player in centriole stability. Its presence is required for the maintenance even of mature centrioles, suggesting that there is continued turnover of centriole structural components.

      Major comments:

      1. The experiments and results are very well described and most of the conclusions are supported by the data. One aspect needs clarification though. It is not clear to this reviewer how the authors envision the regulation and mechanism by which Ana1 functions in centriole stability. The data suggest that it can stabilize centrioles independent of PCM (Fig. 3B, 5B), yet the authors claim in the results and discussion that it functions downstream of PCM. As presented, this does not make sense. I would argue the opposite: it may function upstream or in parallel to the PCM. Related to the above, the last sentence of the intro states: "Finally, we found that both Polo and the PCM require ANA1 to promote centriole structural integrity." This is shown for Polo, but where is the data showing that PCM requires ANA1 for promoting centriole stability?
      2. I have a concern regarding the number n used for statistics in the quantifications. In many cases it seems that the number n of cells etc. was used (e.g. n>100 cells) rather than the number of experiments (e.g. n=3). The statistics should measure variability between experimental repetitions, not between cells etc. If statistics were indeed not done on experiments and would have to be changed, some of the observed effects may not be statistically significant and would require additional experimental replicates, which would increase the time needed for revision.
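
      The distinction raised in the second comment can be sketched as follows: per-cell measurements are first collapsed to one summary value per experiment, so that n is the number of experimental repetitions rather than the number of cells. The data and function name below are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev


def replicate_means(measurements_by_experiment):
    """Collapse per-cell measurements into one mean per experimental repetition."""
    return [mean(cells) for cells in measurements_by_experiment]


# Hypothetical centriole counts per cell from three independent experiments.
control = [
    [2, 2, 1, 2],  # experiment 1
    [2, 1, 2, 2],  # experiment 2
    [2, 2, 2, 1],  # experiment 3
]

per_experiment = replicate_means(control)

# n for statistics is the number of experiments, not the number of cells.
n = len(per_experiment)
spread = stdev(per_experiment)  # variability between repetitions
print(n)  # 3
```

      Any downstream significance test would then be run on the per-experiment values, which is what determines whether additional replicates are needed.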

      Minor comments:

      1. I would advise the authors to improve the presentation of the figures. In particular the labels are in many cases very small and difficult to read. Readability is also reduced by the use of bold font in the labels and a mix of various font sizes within single figure panels.
      2. The result section could be shortened/become more readable by moving several paragraphs to the intro or discussion.
      3. The introduction is quite long and some parts read more like an introduction of a review on the topic.

      Significance

      This is a nice, focused study on the requirements underlying centriole stability and maintenance. The first part identifies the cartwheel, the centriole wall, and the PCM as important for centriole maintenance. The remaining parts identify and focus on the essential role of ANA1 in this process. This is an important finding, since the mechanisms underlying centriole stability and maintenance are poorly understood, yet highly relevant. Some cell types inactivate and/or disassemble centrioles during differentiation and this is likely important to their function. Providing more mechanistic insight, for example, regarding the relationship between ANA1 and PCM recruitment or the regulation of ANA1's centriole function by Polo, would have further strengthened the study. The audience interested in this work will be cell and developmental biologists. My expertise is in centrosome biology and microtubule organization.

      Referees cross-commenting

      I agree with the additional points raised by the other reviewers. I still think that overall the paper is fine and most things could be addressed in a reasonable time frame. The work does not provide much mechanism though. In this regard, the confusing placement of ANA1 downstream of PCM would be the only mechanistic aspect, and it seems the authors got it wrong, at least based on the provided data. Here, additional experiments could elucidate these relationships further, but if this is not the goal, text changes could also address this and it would remain a smaller, more focused study.

    1. Peer review report

      Reviewer: Yulia Karmanova

      Institution: Research Centre Kairos

      email: yulia.karmanova@gmail.com


      General assessment

      In my honest opinion the topic of intercultural competence (ICC) should be of great interest not only to researchers involved in linguistics and pedagogics but to a general reader as well. By developing ICC, which represents a set of skills needed when encountering people from various backgrounds, one can learn valuable communication skills and flexibility in behaviour, and become more aware of a lack of one’s tact and tolerance.

      The manuscript is well written in an engaging and lively style, it provides excellent context about linguistic cues of ICC that will help educators steer and stimulate the ICC development of their students.

      The manuscript cites relevant and sufficient literature that provides a very useful resource for current practitioners.

      I do not identify fundamental flaws in the manuscript, there is nothing illogical or irrational, although I have a few suggestions for minor improvements. Please see my comments below for further details.


      Essential revisions that are required to verify the manuscript

      No essential revisions. The manuscript clearly describes the research methods of data collection and analysis as well as other meaningful parameters. Section number 3 (Research Method and Results) is recipe-like, the study can be reproduced.

      The data collected for the research is impressive: 1,635 blogs (on average 400 words each) written by 672 students majoring in Hotel Management.

      The data and analysis provided in the manuscript are clear and logical. No additional experiments are needed to validate the results presented in the manuscript.

      Discussion and conclusion section aligns with objectives stated in the first section.

      The authors of the manuscript made a valuable contribution by identifying linguistic markers for ICC in the language use of students blogging about intercultural experiences: I-perspective lexemes, insight verbs and quantifiers. These language cues make ICC more «tangible» and as a result provide teachers with concrete tools for giving students more targeted ICC assessments in their reflective writing tasks. By giving certain linguistic prompts to students, educators may form a more thoughtful and personalised approach in describing their intercultural experience.


      Other suggestions to improve the manuscript

      The content of the manuscript is scientifically sound but has minor shortcomings that could be improved by further revisions.

      I do agree with the limitations of the research mentioned by the authors, especially with the lack of the explanatory value of a significant difference in frequency of use of the linguistic markers which I think can be resolved in future studies of this topic.

      I suggest that the authors involve more assessors in their future research. Two lecturer-researchers and three senior students were involved in the process, which I assume is not enough for such large-scale research. A bigger team of professional assessors could make a valuable contribution when analysing the data and resolving emerging research questions.

      I would also recommend providing the manuscript with brief comments on the meanings of the parameters in column 4 (Table 3, 4, 5, 6) for readers’ clarity. What do t, p and n.s. stand for?

      I believe that the manuscript would benefit from correcting minor inaccuracies. I would recommend to:
      • replace «his» with gender neutral «their», page 6: In these blogs, the language use of students serves as a vehicle of information on the students’ development of ICC, offering the reader concrete cues – henceforth referred to as linguistic markers – of his reflective learning process.

      • add a space between that and are, page 19: In order to bring more focus to our research, we initially focused on word categories thatare characteristic of properties that can be linked to ICC and cultural sensitivity, such as openness, self- relativity, curiosity and reflection or analytical thinking.

      • add missing parentheses, page 22: Deardorff, D. 2006. Identification and Assessment of Intercultural Competence as a Student Outcome of Internationalisation. Journal of Studies in International Education, 10 (3), 241-266.

      All in all, I find the topic of the manuscript fascinating and the research question relevant and essential to the field.


      Decision

      Verified manuscript: The content is scientifically sound, only minor amendments (if any) are suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by O'Herron et al. describes an all-optical method combining optogenetic stimulation and 2-photon microscopy imaging to simultaneously manipulate and monitor brain microvasculature contractility in three dimensions. The method itself, which represents a microvasculature-targeted variation on a theme previously elaborated for simultaneous stimulation and monitoring of ensembles of neurons, employs a spatial light modulator (SLM) to create three-dimensional activation patterns in the brains of cranial window-model transgenic mice expressing the excitatory opsin, ReaChR, in mural cells (smooth muscle cells and pericytes) under control of the PDGFRβ promoter. The authors demonstrated that, by splitting a single 1040-nm stimulating beam into multiple beamlets using an SLM, this system is capable of optogenetically activating ReaChR at discrete depths in the neocortex, depolarizing mural cells and producing highly localized constrictions in targeted, individual microvessels. Using this system to investigate the kinetics of optogenetic-induced contraction and sensory-evoked dilation, the authors found that the onset of optogenetically evoked contraction was much more rapid than that of sensory-evoked dilation, concluding that the observed lag between sensory stimulation and vascular response does not reflect intrinsic limitations of mural cell contractile mechanisms but is instead attributable to the time course of neurovascular coupling mechanisms. They further found that by titrating the stimulation duration they could completely negate the vasodilatory response to a concurrent sensory stimulus.

      1) The red-shifted opsin, ReaChR, represents an improvement over opsins used in previously described 3D neuronal activation/monitoring systems. In particular, brief single-photon stimulation (100 ms) of ReaChR led to rapid, robust arteriole constrictions throughout the activation volume, whereas a previous generation ChR2 opsin required stimulation for seconds to achieve slowly appearing constrictions.

      Thank you for pointing out this key takeaway from our manuscript. In Figure 9 of the revised manuscript, we provide a comparison of ReaChR-induced vasoconstriction, with data previously collected across microvascular zones using line-scanning in ChR2-expressing mice. These data show how ReaChR produces faster and more potent vasoconstriction in alpha-SMA expressing SMCs and ensheathing pericytes, but has similar effects on the slow contraction with capillary pericytes.

      2) Single-photon stimulation was capable of completely stopping blood flow in a "first order pre-capillary branch". (Not clear what is meant by the phrase "pre-capillary branch"; anatomically, penetrating arterioles feed capillary branches.) While this speaks to the effectiveness of the method, it also highlights potential supraphysiological effects of stimulation and the importance of titrating stimulus intensity/duration to achieve physiologically meaningful responses.

      We have removed the term “pre-capillary” to avoid causing confusion, and now use the term arteriole-capillary transition to denote the alpha-SMA positive segment that lies between the penetrating arteriole (0th order) and the alpha-SMA low/negative capillaries (>4th order). The rationale for this terminology is provided in our new review (PMID: 34672718), which explains why the transitional zone should be considered a separate vessel type that is not arteriole and not capillary.

      We agree with the reviewer that titration of stimulation power/duration will be important and will depend on the application. We addressed this point by performing measurements of arteriole diameter with graded laser powers (Figures 5 & 7). There are many parameters to explore, but for the purposes of this manuscript, we clarify that the effect is titratable and that users should define physiological ranges in their specific circumstances, which may differ based on the experimental goals, age of mice, arteriolar size and vascular zone, and other factors.

      We also note that some applications may want to mimic pathophysiological levels of constriction, for example to mimic the effects of arterial vasospasm after subarachnoid hemorrhage, or ensheathing pericyte contraction with MCAo stroke (PMID: 26119027), or to examine the neural consequences of transient small vessel occlusion.

      3) In assessing effects of laser power, the authors assert that "increasing the laser power only slightly expanded the range of constriction". This seems a bit of an overstatement, given that increasing power (30-fold) had a greater effect on the spread (3x) than the magnitude (2x) of the response.

      Thank you for pointing this out. We have re-worded this section to avoid the overstatement and to emphasize the results more clearly on the spatial spread of constriction relative to laser power.

      The difference images in Figures 4B-C, G-H demonstrated that there was very limited spread of the constriction beyond the stimulation spots. We tested the effect of laser power on the spatial spread of constriction by stimulating with a broad range of power levels. We found that increasing the laser power led to a small increase in the spread of constriction. For example, a 30-fold increase in power (from 5 mW to 150 mW total power) led to ~3-fold increase in the spread of constriction (from ~25 µm to ~75 µm) (Figure 5A-H).
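
      The fold changes quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the approximate values from the text (5 to 150 mW, ~25 to ~75 µm), and the scaling exponent assumes a power-law relation between spread and power, which is an assumption made here for illustration and not a claim of the manuscript.

```python
import math

def fold_change(before, after):
    """Ratio of the final value to the initial value."""
    return after / before

# Approximate values quoted in the response (total power in mW, spread in µm).
power_fold = fold_change(5.0, 150.0)
spread_fold = fold_change(25.0, 75.0)

# Hypothetical summary: if spread scaled as power**k, k would be
# log(spread fold) / log(power fold). This is an assumed model, not a fit.
exponent = math.log(spread_fold) / math.log(power_fold)

print(f"power: {power_fold:.0f}-fold, spread: {spread_fold:.0f}-fold, k = {exponent:.2f}")
```

      Under that assumed model the exponent is well below 1, which is one way of expressing that spread grows much more slowly than power.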

      4) The suggestion that penetrating brain arterioles possess a mechanism for upstream conduction of constrictive responses is intriguing (although this intrigue is tempered by the lack of experimental support for the operation of such a mechanism in the brain microvasculature).

      We are also intrigued by this hypothesis, which was supported by some evidence from a recent study of retinal vasculature. Kovacs-Oller et al. showed, using neurocytin tracer injections into capillary pericytes, that the pericytes are linked through gap junctions and that there is upstream directional diffusion of tracer. Further, they showed that electrical stimulation of a pericyte could lead to directional constriction from capillaries back to the arteriole in the retina (PMID: 32566247). The planar orientation of retinal vasculature makes this phenomenon easier to see. However, the 3D architecture of cortical vasculature is more challenging to study, particularly since the propagation along arterioles occurs along the Z axis, where spatiotemporal resolution of imaging is limited.

      Given our new data on the effects of laser power on axial spread (see reply to points 10-13 below) and the difficulty in separating active propagation from out-of-focus activation, we think there is not sufficient evidence to claim that penetrating arterioles are propagating the signal through some active process. Further experiments, including studies of the mechanisms involved, will be needed to address this hypothesis. Therefore, we have removed any discussion of potential propagation of the signal, and instead focus on the relationship between laser power and axial resolution of activation.

      5) The authors' premise for comparing contractile kinetics with sensory-evoked kinetics is flawed. In attempting to use the kinetics of optogenetic-induced constriction to infer something about the kinetics of sensory-evoked dilation, they are implicitly assuming that the kinetics of contraction and dilation processes intrinsic to mural cells are the same. This is highlighted by their use of the phrase "kinetics of the vasculature", which elides the possibility that dilation and contraction kinetics intrinsic to mural cells are different. Support for this latter possibility is provided by a previous report on renal afferent arterioles showing that the kinetics of myogenic constriction in arterioles are "substantially faster" than those of dilation (PMID: 24173354). Thus, their data do not rule out the possibility that the delay between sensory stimulation and vascular response reflects a slower intrinsic dilatory response rather than the time course of neurovascular coupling mechanisms. Furthermore, arterioles have an internal elastic lamina (IEL), which also determines the rates and degree of constriction and dilation. The IEL ends with the arterioles, and vessels with ensheathing contractile pericytes (and downstream) lack the constraints of the IEL.

      We thank the reviewer for this constructive critique. We agree that there are many issues in comparing kinetics between sensory evoked dilation and our optogenetic constriction. We have re-worded this section to avoid any mechanistic implications in the discussion of the kinetics of the different processes. However, we wish to still incorporate the details about the rapid kinetics of constriction to highlight the utility of the approach to intervene/perturb sensory-evoked responses, given that contraction can be titrated and precisely timed. We discuss the utility of this approach further below.

      6) It's not at all clear how overriding sensory-evoked dilation with optogenetically generated constriction provides a means for distinguishing neural activity from vascular responses. In particular, it is not clear how performing this maneuver while monitoring neuronal activity can provide the suggested insight into "aspects" of functional hyperemia that are essential to neuronal function beyond the relatively trivial observation that there is a point at which blood flow is too low to support continued neuronal activity.

      Thank you for raising this point. We have added more detail to our thoughts on why over-riding functional hyperemia could provide insight into the dependence of neural activity on the blood flow increase. Neural circuits are extremely complex with many different sub-types of neurons playing different roles. These subtypes have been shown to have different metabolic sensitivities and thus, may be differentially affected by blocking functional hyperemia (PMID: 26284893). This could lead to altered circuit activity which could have profound consequences for neural processing. Additionally, the energy budgets of different cellular functions within neurons are quite different (PMID: 22434069) and reducing available energy by blocking functional hyperemia could lead to differing degrees of dysfunction across important cellular processes (e.g. re-establishing the membrane potential, recycling neurotransmitters) which could again have important consequences for neural coding. Furthermore, it has been shown that there is a steep gradient of oxygen moving away from penetrating arterioles, and so neurons at greater distances from vessels may be differentially affected by blocking the hyperemic response (PMID: 21940458).

      7) With the exception of vasculo-neural coupling, where it would be the method of choice, the technology described leaves the impression of a capability in search of an application. That said, the ability to control blood flow to the point of completely stopping it may ultimately have applications in pathological settings.

      In addition to our response above on the utility of over-riding arteriole dilation during functional hyperemia, we have added to the discussion more potential uses of the technique. These include: (1) To be able to manipulate blood flow without using pharmacology or having to induce neural activity could be useful for a variety of studies involving intrinsic reactivity and compliance of vessels in both health and disease. (2) The different microvascular zones have distinct contractile kinetics. There are details that remain unstudied, such as the kinetics of different sized vessels, their location in the network, their identity as collateral arterioles or pial arterioles. Vascular optogenetics can dissect the contractile characteristics of different vessel types, similar to probing a circuit board. (3) Studies of the physiological significance of vasomotion, with respect to brain clearance of metabolic waste products. Being able to directly drive vasomotion and alter its amplitude and frequency will be an important tool for studies in this field. (4) Functional hyperemia is also impaired in many diseases, but this dysfunction could arise from impaired activity of neurons, astrocytes, or vessels. Therefore, a method to disentangle specific changes to blood vessels in vivo could be useful for understanding the vascular contributions to such diseases.

      Reviewer #2 (Public Review):

      The manuscript by O'Herron et al. describes a new technique for all-optical interrogation of the vasculature in vivo. They expressed the optogenetic actuator ReaChR in vascular smooth muscle. They activated ReaChR using single-photon or 2-photon absorption. In both cases, they observed rapid and reversible constriction (presumably, due to Ca increase). Single-photon activation produced widespread constriction; two-photon activation allowed targeting of individual vessels. Using a commercial 2-photon system with a spatial light modulator on the photoactivation 1040-nm beam, they demonstrated localized constriction simultaneously at multiple points along small and large cerebral arterioles, each targeted by an individual beamlet. Overall, this is a very interesting paper that clearly lays out the methodology and experimental design and carefully considers a number of potential limitations and pitfalls. This paper will serve as a valuable resource for a large community of eLife readers interested in cerebrovascular physiology in health and disease as well as in neurovascular coupling and interpretation of noninvasive imaging.

      Given the chronic nature of the optical window, it is not clear why imaging was done under anesthesia. This point requires explanation. There is a concern that targeting of the vessel wall is not possible in awake animals due to brain motion. If so, that would be a serious limitation of the methodology.

      To ensure that our method is compatible with awake experiments, we have added awake data to the manuscript (Figure 10). We show that individual vessels can be independently targeted in the awake animal and the outcomes are not profoundly different from those in the anesthetized state. As with all awake experiments, due diligence must be taken to ensure the preparation is as stable as possible, and the occasional trial may have to be removed if motion artifacts are too large.

      Reviewer #3 (Public Review):

      Strengths: In the vascular field, previous implementations of optogenetics to constrict and dilate blood vessels have used either single-photon full-field and fiber illumination, or alternatively confocal and 2-photon scanning of individual vascular segments with raster scanning. The former is limited in spatial precision, activating multiple vessels over a large area, whereas raster scanning is not ideal for accumulating currents and often results in slow temporal precision. Spatial light modulator (SLM) generated diffraction patterns to achieve patterned illumination have become increasingly used in neuroscience to achieve reliable 2-photon activation of targeted neuron populations. Here the authors use this technology to depolarize and constrict smooth muscle cells in vivo. By imaging and stimulating with 2 laser lines and different optical paths they are able to stimulate opsin-expressing cells and image simultaneously, which is advantageous. By using the red-shifted opsin ReaChR for their experiments, it is possible to combine this approach (cautiously) with imaging many of the classically used 2-photon fluorophores and genetic indicators, with excitation spectra <1040nm. Future work using variations of the technique is likely to gain valuable insight into neurovascular biology.

      Weaknesses: A major limitation of the current study is that although the authors achieve high spatial precision of ReaChR activation in the xy plane, the axial precision appears extremely poor compared to what would have been expected. For example, in Fig. 5-1 (using a 0.8NA, 16x objective), the authors achieve equivalent levels of surface arteriole constriction even when the SLM is focused 200um above the brain, and even larger constrictions as they initially move the focus away from the imaging plane. Although the axial spatial resolution appears better with the 1.1NA - 25X objective, such a large point spread function largely limits the utility of the technique, as there will always be a concern as to whether the effects are spatially specific and not due to activation of vascular cells above and/or below the site of interest. The experiment that the authors have presented on axial precision is extremely important as it outlines a very important limitation of the technique (which is likely power dependent), but it remains to be completely characterized and understood. One possibility is that the power levels used by the authors are already above saturation, a problem raised by Rickgauer and Tank (2009)- PMID: 19706471, and therefore they may be able to refine the axial precision by using lower power. Further controls would be valuable to understand the precise cause of this large axial spread as it doesn't quite add up with the diameter of the bleach spot shown in figure 5-1D (some suggestions outlined in recommendations to the authors).

      We agree with the reviewers on this point. We conducted several new experiments to help elucidate the limits of axial resolution. First, we have dropped the comparison between objectives with different NAs. This comparison led to unnecessary confusion, and it is common knowledge that lower NA objectives will have poorer resolution in the axial plane. We now mention this as a factor to consider, but have removed it from the figures. Second, we have shown, as the reviewer suggests below, that the stimulation power used has a dramatic effect on the axial spread of constriction (Figure 6E and Figure 7). Low powers indeed show a narrower axial spread. However, we typically use higher powers (near or above 100 mW) to generate large constrictions in penetrating arteries, and we also include these levels to show the greater axial spread they cause. In summary, we confirm with lower powers the 3D precision of the two-photon optogenetic technique, and we show that higher powers can be used to broadly constrict penetrating arterioles for studies seeking to modulate blood flow in columns of cortical tissue supplied by penetrating arterioles.

      Regarding the stated inconsistency with the bleached spots, we think this mostly has to do with the difference between photo-bleaching fluorescent material (requiring lots of laser power) and photo-activating opsin channels (which can be done with much less power for very sensitive opsins). Additionally, the slide we bleached is optimally activated at ~800nm and so our 1040 nm stimulation required enormous power to burn the spot.

      The current version of the paper also lacks adequate quantification of the results as it is composed primarily of representative examples, which limits a proper assessment of reproducibility and variability of the effects.

      We agree that showing population averages will be more informative to the field. In the original submission, we showed mostly examples because, in the early data, we explored the large parameter space (size and number of spots, position on vessels, duration and intensity of stimulation; if a stimulation train, the duration, number, and inter-pulse interval of stimulation) rather than picking one set of conditions. However, we have now collected new data where parameters were typically the same and included population average plots in the figures that previously had only individual examples (Figures 2G,I, 4I,M, 4-1C, 5I, 6E,F, 7, 11-2) as well as the new data (Figures 8, 9, 10).

    1. Author Response

      Reviewer #1 (Public Review):

      LaRue, Linder and colleagues present an automation (GLO-Bot) and analysis pipeline building on the previously developed GLO-Roots, which makes use of a constitutively expressed luciferase gene to image plant roots in thin soil containers (rhizotrons). After validation of the system using a set of 6 accessions, the authors then take advantage of the increased throughput to phenotype root system architecture (RSA) of 93 natural Arabidopsis accessions and perform genome-wide association to identify polymorphic genomic regions that are associated with specific RSA traits. I appreciate that the authors made all data available via zenodo.

      The authors succeeded in automating the GLO-Roots system. Overall, the GLO-Bot appears to be a nice platform to collect time-lapse images of root growth in soil-substrate using rhizotrons. The automation of the GLO-Roots system using the GLO-Bot is well described, although not in sufficient detail to be rebuilt by interested researchers, e.g. the software controlling the robot is not described or made available, precluding wide adoption of the method. The image processing pipeline is clearly described in the methods and in Figure 2. The pipeline is open source and available for use and appears to work well overall, although in some cases the vector representation of the root system appears to be incomplete.

      We thank reviewer #1 for raising these concerns. We have now made the general code for the software available (GitHub: https://github.com/rhizolab/rhizo-server). In addition, we uploaded the rhizotron laser cutting files (Zenodo DOI: https://doi.org/10.5281/zenodo.6694558) that would facilitate rebuilding the robot.

      We understand the concerns about the vector representations of the root system.

      These root system structures visible on the GLO-Bot images are indeed disconnected in many locations, due to variability in the reporter’s intensity and obstruction of the light path by soil particles. For traits like root angle, the disconnected nature of the root system is much less impactful as this method naturally uses “segments” of the root as individual elements for angle measurements.

      The authors then present a quantitative analysis of RSA using a set of 93 accessions, with 6 replicates per accession, generating a large dataset on the diversity of RSA in Arabidopsis. Using average angle per day, the authors identify SNPs that significantly associated with angle at 28 days after sowing, and they describe a correlation between this trait and the mean diurnal temperature range at the site where the accession was originally collected. The main weakness of the manuscript in its current form are some details of the quantitative genetic analysis. In my opinion the quantitative genetic analysis would benefit from additional quality control as there are peculiarities in the dataset that was used as the basis for GWAS.

      We understand the concerns from reviewer #1 about the quantitative genetic analysis. Ultimately, we performed the analyses in the way we explained in the paper with careful consideration. We have added descriptions of the rationale for choosing certain methods, which we hope elucidate why we did the analyses in the way we did. We hope this paper serves as a resource for others to pursue additional studies on traits relevant to their research.

      Reviewer #2 (Public Review):

      Therese LaRue and colleagues have developed a second generation of the GLO-Roots system that had been developed in their lab and published in 2015. Importantly, the new system (GLO-Bot) and the analysis of the resulting images has now been largely automated and therefore provides a throughput allowing for genetic studies. In an impressive endeavor the authors have transformed more than 100 diverse accessions that had been selected using sensible criteria with the luciferase construct, which then allowed the RSA of these accessions to be measured using the GLO-Bot system. On a set of 6 diverse accessions, the authors carefully identify meaningful RSA traits that they then quantified in the accessions of a larger panel of almost 100 accessions. They also benchmarked the new imaging processing tools against gold-standard manual tools. Overall, they show that the data acquisition and analysis is reproducible and reasonably accurate. They then proceeded to conduct GWAS using the RSA traits and identified several significantly associated candidate SNPs. Finally, they correlated the RSA with environmental variables and found interesting correlations that are consistent with prior studies.

      Strengths:

      The manuscript presents interesting root phenotyping technology, a comprehensive atlas of RSA under rhizotron lab conditions in Arabidopsis, candidate genes potentially underlying RSA traits, and interesting associations of RSA and climate variables. This will be inspiring and useful to many other researchers and has the potential to be explored further in future studies.

      We thank the reviewer for the encouraging feedback.

      Weaknesses:

      Some aspects of the data analyses are not well described and should be described in more detail. The trait data is heavily processed to "breeding values" and it is a bit unclear when unprocessed and processed trait data is used and why. Also, limitations and caveats are not discussed sufficiently. For instance, the authors should present and discuss the issues and caveats of measuring RSA that was generated in thin and not very wide soil sheets using the GLO-Bot system, when natural growth in soil is usually largely unconstrained. Moreover, the analysis of potential candidate genes from the GWAS is not very well developed. Finally, the trait data was not available with the manuscript and a major impact of a resource like this will come from the data being fully available to the community.

      We appreciate the broad comments on the manuscript and have tried to address them through the specific responses below. Overall we believe the approaches we used are effective but with specific caveats and have used the revision as a means of better communicating the limitations of the approaches chosen.

      Reviewer #3 (Public Review):

      The authors provide a thorough description of a method to transform plants to be bioluminescent upon application of the required substrate such that roots are visible on the windows of rhizoboxes. They have expanded on previous work by automating the imaging process with a robot that moves rhizoboxes to an imager where images are captured. They have improved the image analysis pipeline to be mostly automated, with a user presumably needed to run various scripts in batch mode on directories of images. One novel aspect of the image analysis pipeline is in using image subtraction to subtract the root system at the previous time point from the current one in order to identify new growth.
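
      The image-subtraction idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `new_growth`, the threshold parameter, and the tiny binary "images" are all hypothetical, and it assumes the two time points are already aligned pixel for pixel.

```python
def new_growth(previous, current, threshold=0):
    """Mark pixels whose intensity rose by more than `threshold`
    between two aligned images given as nested lists."""
    return [
        [1 if (curr - prev) > threshold else 0
         for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]

# Hypothetical 3x3 root masks at two consecutive time points (1 = root signal).
day1 = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
day2 = [[0, 1, 0],
        [0, 1, 0],
        [0, 1, 1]]

growth = new_growth(day1, day2)
for row in growth:
    print(row)
```

      Only the pixels that appear in the later image but not in the earlier one are retained, which is the sense in which subtraction isolates new growth.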

      We thank the reviewer for highlighting the strengths of the manuscript.

      Overall, I think the authors provide a great amount of detail where needed in the methods, but one recommendation to increase reproducibility is to give more information about the actual root traits measured. For example, one concern would be if root length is only summing pixels without considering diagonal pixels having a length of square-root of two, sqrt(2).

      This is a valid concern. Rather than just summing the pixels, the length of the segments is actually calculated using the “Feret Diameter” (or caliper length) function in ImageJ, which does take diagonals into consideration.
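
      The difference between counting pixels and measuring a caliper length can be illustrated with a toy segment. The sketch below is a simplification under stated assumptions: it approximates the caliper length as the largest distance between pixel centres, whereas ImageJ computes Feret's diameter on the selection boundary, so absolute values will differ slightly. The function names and the 11-pixel segments are hypothetical.

```python
import math
from itertools import combinations

def pixel_count_length(pixels):
    """Naive length estimate: the number of pixels in the segment."""
    return len(pixels)

def feret_diameter(pixels):
    """Largest Euclidean distance between any two pixel centres
    (a simplified stand-in for a caliper-length measurement)."""
    return max(math.dist(p, q) for p, q in combinations(pixels, 2))

# Two hypothetical straight segments, each 11 pixels long.
horizontal = [(i, 0) for i in range(11)]
diagonal = [(i, i) for i in range(11)]

for name, seg in (("horizontal", horizontal), ("diagonal", diagonal)):
    print(name, pixel_count_length(seg), round(feret_diameter(seg), 2))
```

      Both segments contain the same number of pixels, but the diagonal one is longer by a factor of sqrt(2) when measured as a distance, which is the discrepancy the reviewer raises.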

      While the methodological aspects of the paper are compelling, the authors have furthered the significance through a biological application for genetic analysis among accessions of Arabidopsis and correlating root traits to climatic 'envirotypes' or data from the origin site of the respective accession. This genetic analysis would be furthered by greater consideration of time series analysis and multi-trait analysis, which is possible in GEMMA. The authors could consider genetic analysis of the PCA traits as well. Given the novelty of this type of time-series, multi-trait data - the authors can reach further here.

      Absolutely, PCA approaches to disentangle the phenotype space would be highly interesting to investigate further, which we started in Supplemental Figure 8. This figure decomposes all the data points, including replicates and temporal values of the same replicate. PC1 therefore mostly captures how plants change over time, while PC2 seems to capture the main trade-off of wide/horizontal vs deep/vertical root architectures that we describe throughout the text. We could make use of this PC space to quantify the average value per genotype in PC2 and utilize this value for GWA, although it is not obvious how replicated and temporal measurements behave in PCA and what the consequences would be when computing a genotype value. There will definitely be interesting work that we aim to pursue in this direction in the future.

      Regarding the additional capabilities of GEMMA, we are not aware of a subtool that is able to analyze time series directly in GEMMA, but we will look into it. The multi-trait analysis in GEMMA is also interesting. We have utilized the multi-trait feature in the past, but this is limited to very few traits. We have 8 time points, thus 8 traits. For reference, when we have run multi-trait LMM with 2 traits, we have typically seen runtimes of ~9 days in large clusters. New tools continue to emerge in the field of quantitative genetics, such as the use of summary statistics of multiple GWAs to gain new insights, which we will pursue in the future. We have added possible future directions to the discussion section (page 14).

      As far as the general structure of the manuscript, I struggled with the results mixing in the methods such that I was never sure if the lack of detail in methods there would be addressed later, along with the mixture of discussions. Perhaps these are personal choices, but the methods were also after supplemental. I simply ask the authors to consider the reader here by being honest with my own experience reading this manuscript.

      We appreciate this comment of reviewer #3. Since this is a “Tools and Resources” article, we believe that a substantial part of the results section should include the methods that were applied. The methodology mentioned in the results section should always help the reader to understand the illustrated results in the figures. If readers would like to apply certain methods, however, more details can be found in the materials and methods section. We apologize if this was not always successful and led to confusion. In the final formatted version, all supplemental figures would be linked to the main figures so that the materials and methods section would follow the discussion.

      Overall, I believe this manuscript advanced root phenotyping by providing relatively high-throughput (imaging is slow due to the long exposure times) data and doing the time-series, multi-trait genetic mapping. The authors mention imaging shoots but no data is presented - presumably, it would be interesting to tie that in but there may be reasons not to. The authors could also discuss more the advantages of this approach relative to color imaging that has also advanced significantly since the original GLO-Root paper was released. Last, I am not sure the description of the 6 accessions study adds much value to the paper, and probably many other preliminary studies were done to prototype. Overall, this is fantastic and substantial work presented in a compelling way.

      Unfortunately, the shoot images that were taken did not have sufficient quality for further analysis and due to technical problems, the set of shoot images is not complete. We removed the part of shoot imaging from the text. It now reads: “Inside the imaging system, the rhizotrons were rotated using a Lambda 10-3 Optical Filter Changer (Sutter Instrument®, Novato, CA). If it was the first imaging day or a designated luciferin day (every six days), GLO-Bot added 50 mL of 300 μM D-luciferin (Biosynth International Inc., Itasca, IL) to the top of each rhizotron immediately before loading the rhizotron into the imager.”

      The advantage of the GLO-Roots method over color imaging is clearly that the GLO-Roots method can capture a more complete image of root systems with finer roots (like Arabidopsis). We have added the possibility of using RGB imaging for bigger root systems to the discussion section (page 13).

    1. Is maintenance a privilege?

      I think in many ways it has become a privilege. In an age when practical skills and ability to repair are relatively rare, and when it is often cheaper (in money and time) to buy new, I think maintenance is a privilege. Can we share it, teach people to fish so to speak? Perhaps knowing how to maintain simply isn't enough; slim margins of personal time may not be best spent maintaining things (as opposed to maintaining oneself).

    1. Reviewer #1 (Public Review):

      The key question that Huang et al. are addressing is which approach, paratransgenesis, transgenesis, or the combination of both, is the most promising to combat malaria, killing parasites without affecting the mosquito host. They explored this question by generating a transgenic mosquito line secreting two effector molecules in the midgut and salivary glands, and infecting mosquitoes with Serratia bacteria expressing effector molecules. Their major finding is that a combination of both strategies has the highest inhibition of parasite development compared to transgenesis or paratransgenesis alone. This is further confirmed by mouse infections with a rodent malaria model showing that a combination of both strategies inhibits transmission to naïve mice.

      This study is comprehensive and provides significant information on the possible use of these approaches for malaria control. The effects on parasite development are clear and convincingly confirm that these strategies have the potential for reducing malaria transmission. It cannot be ruled out, however, that the more pronounced effects on parasite development of the combined approaches may be due to differences in the fitness of these mosquitoes rather than a true additive or synergistic action between transgenesis and paratransgenesis. Another limitation is that the authors do not show when parasites are killed and do not provide direct evidence of the role of the bacterial-expressed factors in the killing mechanism.

      The authors show very convincingly that transgenic mosquitoes (all possible combinations) have comparable fitness to wild types. However, these fitness studies are lacking in Serratia-infected mosquitoes, and in the transgenic-paratransgenic combination. Are those mosquitoes as fit as WT? Fitness costs could negatively affect parasite development indirectly, rendering the comparison between the treatments impossible (and negatively impacting this possible strategy). These are key controls that need to be added to the manuscript in order to support the finding that the combination is the best approach.

      It is surprising that the Sg/E line inhibits oocyst development given it uses a salivary gland promoter. The authors hypothesize that this is most likely explained by mosquitoes ingesting saliva with the blood meal. This hypothesis is interesting but needs to be tested by determining the presence of Scorpine and MP2 protein in the blood bolus. Also, at what stage are parasites killed?

      While the authors test the expression levels of Scorpine and MP2 by qRT-PCR and western blot in transgenic mosquitoes, they did not test levels in paratransgenic ones. In which tissues are these factors produced in Serratia-infected mosquitoes? Are Scorpine and MP2 produced in the midguts and/or salivary glands? And at what level? A quantitative comparison of scorpine and MP2 protein levels in transgenic and paratransgenic mosquitoes is important to determine whether levels are correlated to the effects on parasite development.

      Related to this, the engineered Serratia bacteria appear to express 5 effector molecules rather than just MP2 and Scorpine. This obviously can affect the results and also makes a direct comparison less meaningful, but we couldn't find any information on the other effectors, or on whether they are expressed and potentially responsible for the observed anti-parasitic activity.

      More information about the experimental setup is needed. The authors used a piggyBac approach that has led to multiple insertions in some of the mosquito lines. Which lines did they use for the experiments? This is not clear in the manuscript. If lines with multiple insertions were used, this should be stated, and the feasibility of maintaining them (and their efficacy) over different generations should be discussed.

      Oocyst and sporozoite data are not normally distributed, and therefore presenting the median instead of the mean is more informative. Furthermore, the statistical analyses done do not appear to be appropriate for this data. The authors need to either FDR-correct for multiple comparisons or do a Kruskal-Wallis test with post hoc testing. It would also be important to do statistical analyses on the prevalence.
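      One of the two corrections named above, FDR control over a set of pairwise comparisons, can be sketched as follows. This is a minimal illustration of the Benjamini-Hochberg procedure, not the authors' analysis; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons of oocyst counts.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

      A comparison would then be called significant when its adjusted value falls below the chosen FDR level (for example 0.05).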

      When discussing the ethical consequences of this approach, the authors should also discuss the possible effects of QF2, scorpine, and MP2 secretions in humans upon a blood feed.

      The authors show Serratia vertical transmission over three generations, but as the CFUs decrease over multiple generations, they should discuss whether low levels of Serratia can still block parasite development. In general, the manuscript lacks a thorough discussion of the limitations of this study.

      The discussion around line 280 should be more nuanced. I don't think the word 'protected' can be used as mice were not immunized but were simply not infected.

    1. Reviewer #1 (Public Review):

      The authors look at a few different nematode species to compare the dynamics of anaphase. They find that in some species the spindle oscillates transversely in anaphase, and in other species it does not. They ask what accounts for this different behavior. To address this question, they use ablation of the central spindle, and conclude from the result, correctly, that after the ablation the centrosomes are pulled to the opposite poles of the cell in all species. However, the magnitude, half-time and initial velocity of the recoil differ.

      To understand what accounts for the quantitative difference, the authors

      1) use a simple viscoelastic model of a constant force, F, pulling against a spring (with constant stiffness k), while the object moves through the viscous medium.

      2) estimate the cytoplasmic viscosity from tracking yolk granules,

      3) estimate parameters F and k from fitting the exponential recoil curves. They find that the greatest correlation between having transverse oscillation or not is with lower or higher viscosity, not with magnitude of the force or stiffness of the spring.
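      For reference, the model summarized in steps 1) to 3) can be written out as follows. This is a sketch under stated assumptions: overdamped motion, a drag coefficient \gamma proportional to the cytoplasmic viscosity, and a displacement x(t) measured from the position at the time of the cut. The symbols \gamma, x, \tau, x_\infty, v_0 and t_{1/2} are introduced here for illustration and may not match the authors' notation.

```latex
% Force balance: viscous drag against a constant force F and a spring of stiffness k
\gamma \frac{dx}{dt} = F - k x, \qquad x(0) = 0

% Solution: exponential recoil with time constant \tau
x(t) = \frac{F}{k}\left(1 - e^{-t/\tau}\right), \qquad \tau = \frac{\gamma}{k}

% Quantities read off the recoil curve
x_\infty = \frac{F}{k}, \qquad
v_0 = \left.\frac{dx}{dt}\right|_{t=0} = \frac{F}{\gamma}, \qquad
t_{1/2} = \tau \ln 2 = \frac{\gamma \ln 2}{k}
```

      Under these assumptions, once \gamma is fixed by the viscosity estimate, the fitted time constant gives k = \gamma / \tau and the initial velocity gives F = \gamma v_0.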

      Two major problems with this study can be identified:

      1) Meaning and significance: It is not clear whether the transverse oscillations have a functional significance. In fact, they are more likely than not simply a byproduct of the complex nonlinear mechanics of the mitotic spindle. It is important to understand what we can learn about the spindle mechanics from these oscillations, but there may be no evolutionary significance here. If the authors were asking how, in many different species, the spindle scales with the cell size in the same way (as was done in Farhadifar et al 2020, which the authors do not cite) despite large parameter variations, that would be a different story. But asking which parameter change is responsible for the behavior change is less meaningful.

      2) The study is not convincing, mainly because the model used for the fit is overly simplistic. The force is not constant, the spring stiffness is not constant, the mechanics is not, etc. There are a few different, very complex models, of the anaphase spindle with transverse oscillations - comparing to simulations of these models would be more convincing. Also, I am not quite sure whether the volume fraction of yolk is a useful parameter. Does not measuring MSD give us the diffusion coefficient and viscosity directly? I think using the factor depending on the volume fraction artificially inflates the viscosity differences. Lastly, I do not understand the theoretical argument based on comparison with Nedelec's model: in that model, increasing viscosity only slowed the oscillations down, not abolished them.
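      The point about reading viscosity directly from the MSD can be illustrated with a short sketch. It assumes purely diffusive motion of a spherical granule tracked in two dimensions (so MSD = 4Dt) and the Stokes-Einstein relation; the function names, granule radius and temperature are hypothetical, and active or confined motion in a real embryo would violate these assumptions.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def diffusion_coefficient(lags_s, msd_m2, dims=2):
    """Least-squares slope through the origin of MSD = 2 * dims * D * t."""
    numerator = sum(t * m for t, m in zip(lags_s, msd_m2))
    denominator = sum(t * t for t in lags_s)
    return numerator / denominator / (2 * dims)


def stokes_einstein_viscosity(d_m2_per_s, radius_m, temperature_k=293.15):
    """Viscosity (Pa*s) of the medium around a sphere with diffusion coefficient D."""
    return BOLTZMANN * temperature_k / (6 * math.pi * d_m2_per_s * radius_m)


# Hypothetical track: lag times in seconds and MSD values in square metres.
lags = [0.5, 1.0, 1.5, 2.0]
msd = [4 * 1e-14 * t for t in lags]  # synthetic data with D = 1e-14 m^2/s
d = diffusion_coefficient(lags, msd)
print(d, stokes_einstein_viscosity(d, radius_m=0.5e-6))
```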

      In short, much more thorough investigation would be needed to understand which differences between the species account for the presence or absence of the oscillations, and one may question whether the answer would have a deep impact on our understanding of spindle mechanics.

    1. Reviewer #2 (Public Review):

      A summary of what the authors were trying to achieve:

      The authors have developed an approach to prediction of T cell receptor:peptide-MHC (TCR:pMHC) interactions that relies on 3D model building (with published tools) followed by feature extraction and machine learning. The goal is to use structural and energetic features extracted from 3D models to discriminate binding from non-binding TCR:pMHC pairs. They are not the first to make such an attempt (e.g., Lanzarotti, Marcatili, Nielsen, Mol. Imm. 2018), but they provide a detailed critical evaluation of the approach that sets the stage for future attempts. The hope is that structure-based approaches may have better power to generalize from limited training data and/or to model unseen pMHCs.

      An account of the major strengths and weaknesses of the methods and results:

      The authors first report (section 4.1) that their structural and energetic features contain information on binding mode, highlighting complexes with reversed binding polarity, for example, and partly discriminating MHC class I from MHC class II structures. This is encouraging but not terribly surprising. Also, with regard to MHC I vs II discrimination, it is not clear how the class II peptides are registered with respect to one another. This needs to be done by alignment on MHC and mapping of structurally-corresponding peptide positions, since the extent of N- and C-terminal peptide overhangs varies between structures and is largely irrelevant to the docking mode. Interactions between the TCR and MHC are ignored in the feature extraction process; it's possible that including these interactions could improve performance. The authors state: "To be noted that not all structures could be successfully modelled by TCRpMHC models, and so we could not submit them to the feature extraction pipeline." It's unclear what effect this could have on the results: if the modeling failures are cases of structures for which no good CDR templates could be identified, then perhaps this could bias the results.

      Section 4.2 reports a negative result: unsupervised learning applied to the extracted features is unable to discriminate binding from non-binding complexes. This suggests that there is not likely to be a simple energetic feature, such as overall binding energy, that reliably discriminates the true binders. In Section 4.3, the authors turn to supervised learning, in which training examples inform prediction by a classifier. One finding is that the pure-sequence approach using Atchley-factor encoding of the TCR:pMHC outperforms the structure-based approaches, though not by much. A combined model incorporating Atchley factors and structural features does slightly better. These results are a little hard to interpret because we don't know how challenging the 10-fold internal cross-validation is. It doesn't sound like there is any attempt to avoid testing on TCR:pMHCs that are nearly identical to TCR:pMHCs in the training sets, and the structural database is highly redundant, containing many slight variants of well-studied systems. It's also not clear how overlap between the template database used for 3D modeling and the testing set was handled; my guess is that since the model building is an external tool this was not controlled. Together, these factors may explain why the results on independent test sets are, for the most part, significantly worse than the cross-validation results. Another take-home message from the independent validation is that the sequence-only method seems to outperform the sequence+structure or structure-only methods. Although these are described as "out-of-sample validation", it's not clear how different these independent TCR:pMHC examples are from the structure dataset on which the model was trained.
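      One way to make the cross-validation harder, in the sense discussed above, is to keep near-identical TCR:pMHC complexes together in the same fold. The sketch below assumes that each complex already carries a group label (for example a cluster ID from sequence-similarity clustering, which is not shown); the function name and labels are hypothetical, and this is not the procedure used in the manuscript.

```python
from collections import defaultdict


def grouped_folds(group_labels, n_folds):
    """Assign sample indices to folds so that every group stays in one fold."""
    groups = defaultdict(list)
    for index, label in enumerate(group_labels):
        groups[label].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold.
    for label in sorted(groups, key=lambda g: (-len(groups[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(groups[label])
    return folds


# Hypothetical cluster labels for seven complexes.
labels = ["A", "A", "B", "C", "C", "C", "D"]
for k, fold in enumerate(grouped_folds(labels, n_folds=2)):
    print(k, fold)
```

      Each fold then serves once as the held-out set, so no test complex has a near-duplicate in the corresponding training set.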

      Sections 4.4 and 4.5 report that prediction accuracy varies significantly across epitopes, and this is in part determined by sequence similarity to the structural database (which provides templates for modeling and also constitutes the training set for the model). In section 4.6, the authors determine that the model does not appear to be able to predict binding affinity (as opposed to the binary decision, binding versus non-binding). Finally, in section 4.7 the authors benchmark the predictor against two publicly available, sequence-based predictors. When predicting for epitopes present in their training sets, all methods do reasonably well, with the edge going to the sequence-based ERGO method. When predicting for epitopes not present in their training sets, none of the methods perform very well. The authors state that "these results suggest that the structure-based models developed in this study perform as well as the state-of-the-art sequence-based models in predicting binding to novel pMHC, despite learning from a much smaller training set." This may be true, but the predictions themselves are not much better than random guessing (AUROCs around 0.5-0.6).
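      To make the AUROC scale concrete: the AUROC equals the probability that a randomly chosen binder is scored above a randomly chosen non-binder, so 0.5 corresponds to random guessing. A minimal sketch of that pairwise definition follows; the function name and example scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical binding labels (1 = binder) and predictor scores.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```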

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      I'm doubtful that the proposed methods will form the basis of a practical prediction algorithm. In the absence of ability to generalize to unseen epitopes, simpler sequence-based approaches that leverage the ever-growing dataset of TCR:pMHC interactions seem preferable. I still think the study has value as a template and roadmap for future efforts, and a baseline for comparison. For me, a key unanswered question is whether the model-derived structural features are just a different, slightly noisier way of memorizing sequence, or actually contain orthogonal information that can enhance predictions. It might be possible to gain insight into this question by looking more carefully at the impact of model-building accuracy on performance (the authors use sequence similarity as a proxy, but this is confounded by overlap between the training set and the template set used for modeling). If model-building really adds something, it seems plausible that it does so by accurately capturing physical features of the true binding mode.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      As stated above, I think the present work will have a positive impact on the field of TCR:pMHC prediction by critically evaluating the structure-based approach (and also by testing two previously published methods on independent data). I am less convinced of the utility of the specific methods than of the overall conceptual framework, evaluation procedures, and training/testing sets.

      Any additional context you think would help readers interpret or understand the significance of the work:

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** Techniques to probe the local environment of membrane proteins are sparse, although the influence of lipids on membrane protein function has been known for many years. Therefore, the paper by Umebayashi et al. is important. The environment-sensitive dye Nile red (NR) coupled to a membrane protein is an appropriate sensor for monitoring the local membrane fluidity. Linking of Nile red to the receptor via a flexible tether was achieved with the acyl carrier protein (ACP)-tag method. Experiments showed that, depending on the ACP site, a certain linker length is required to have NR inserted in the membrane and thus be an effective sensor for lipid disorder. This technology could be of general usability to study the environment of membrane proteins in the context of their function. As an example, the technique allowed insulin-induced membrane disorder in the close vicinity of the insulin receptor to be observed. Further, results suggested that tyrosine activity is required for this disorder to happen. The experimental results appear to be complete and controls were made.

      **Major comments:** 1) Sometimes technical terms are used without explanation: What is the GP value? What is ACP-IR? The spectrum was measured in number of ROIs? The reader can work out those abbreviations, but it would be nice to have them defined.

      We have made a list of abbreviations.

      2) Fig. 1d) is confusing. The ACP-IR labelling is evident in 3 panels, but there is no difference in the color (emission spectra of 1992-ACP-IR vs 2031-ACP-IR should be visible??). The DAPI staining is very different. When doing the latter, how difficult is it to get the staining equal?

      The differences in spectra cannot be seen because we used pseudo colors for display of the DAPI and CoA-PEG-NR staining. The reviewer’s comments about the unequal DAPI staining is correct. The reason for this is most likely that the cell membrane is unequally permeabilized by PFA treatment. As the point of this figure is just to show that the plasma membrane is labeled, dependent upon the expression of the ACP-tagged insulin receptor, we don’t think that the variable intensities of the DAPI staining is important. DAPI is simply used to indicate the position of the cells.

      3) How can one interpret Fig. 4: a) Control goes over 4 frames, at 240" insulin is added, and 10 frames should show a fluctuation difference?

      We showed 4 frames after control treatment to demonstrate that control treatment caused no significant change. We expected that insulin treatment would produce clear changes in the GP images; however, these changes, while visible in the GP images, are difficult for the untrained observer to see. This is the reason why we used the ZNCC method in the subsequent figures to better visualize the changes.

      b) A color shift from blue to green is visible after insulin addition. But it is faint - difficult to assess from the pseudo color scheme. What does 1000 pixel top/1000 pixel bottom mean in c)? Is it an attempt to better visualize the fluctuation? It is difficult to recognize a difference before and after adding insulin. d) It seems that the kymograph set should show this. What is the color scale? Why is 3 so untypical, i.e., no change? Box 6 is also peculiar: the left side does not show a strong change upon insulin administration, the right side does. Why?

      We appreciate the helpful comments for improving our manuscript.

      As pointed out, the change of GP value is extremely small before and after insulin addition, so it is difficult to fully visualize the change with normal pseudo-color expression. To deal with this, we adopted the following two methods to visualize minute changes.

      1) Visualization of local changes of the statistical GP value shown by ZNCC throughout the time-lapse images (Fig. 6 and Fig. S2B).

      2) Visualization of the top/bottom 1000 pixels of the sorted ZNCC values in each image (Fig. 7 and Fig. S2C). The top 1000 pixels are the ones that showed the largest changes. The bottom 1000 pixels are the ones that showed the smallest changes.

      With these visualizations, we found that the level of the response to the insulin signal was spatially and temporally heterogeneous in the membrane.

      As for the color scale, in order to clarify the meaning of the difference of color, we have added the description about the relationship between the color and the ZNCC value in the results section.

      4) How is the kymogram calculated? The legend says 'The horizontal dimension represents the averaged ZNCC inside the rectangular area, and the vertical dimension represents time'. The averaged ZNCC is a single value, so it is not clear why the kymogram shows a variation from left to right. May it be the ZNCC was averaged just vertically?

      We apologize that we did not provide information regarding making the kymograph.

      In the yellow rectangular area (Fig. 6B), the ZNCC values of the pixels with the same x coordinate value were vertically averaged, which were represented as the horizontal direction of the kymograph. That is, one horizontal line of the kymograph holds the spatial distribution of the ZNCC value along the horizontal direction of the membrane, and the vertical direction shows their time changes. To make it easier to understand, we refined the description about the kymograph in the legend of Fig. 6.
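      The column-wise averaging described above can be sketched as follows. This is a minimal illustration that assumes each ZNCC image is a list of rows of numbers and that the rectangle is given by half-open pixel ranges; the function names and the toy data are hypothetical and are not taken from the authors' analysis code.

```python
def kymograph_row(zncc_image, x0, x1, y0, y1):
    """Average ZNCC over y for every x in the rectangle [x0, x1) x [y0, y1)."""
    height = y1 - y0
    return [
        sum(zncc_image[y][x] for y in range(y0, y1)) / height
        for x in range(x0, x1)
    ]


def kymograph(zncc_frames, x0, x1, y0, y1):
    """Stack one averaged row per time point: rows are time, columns are x."""
    return [kymograph_row(frame, x0, x1, y0, y1) for frame in zncc_frames]


# Toy time-lapse of two 2 x 3 ZNCC images.
frames = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]],
]
print(kymograph(frames, x0=0, x1=3, y0=0, y1=2))
```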

      5) When calculating cross-correlation values on images, they need to be aligned. What fraction of the total image does the selected 19x19 box represent? As described, I imagine that a rolling CC over 19x19 pixels is calculated over an image from the time lapse series comparing it with the reference Iave(x,y). Compared to the 3x3 median filtered CP image, the ZNCC image should then be much more blurred??

      Below we provide more information regarding the calculation of ZNCC.

      Each local window for the ZNCC calculation is set to 19x19 pixels centered on every single pixel, excluding the edges of the image. The ZNCC value calculated in that window is assigned to the center pixel of that window. After that, a new window centered on the adjacent pixel is set and a new ZNCC value is calculated. That is, the calculation window is slid across the whole image. Because the calculated ZNCC value is assigned only to the center pixel of the window, and not to all the pixels of the window, there is no blurring effect like that of median filtering.

      The figure below shows a schematic view of our ZNCC calculation.

      Schematic view of our ZNCC calculation
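      The sliding-window calculation described above can also be sketched in code. This is a minimal illustration, assuming images stored as lists of rows, a reference image of the same size (the reviewer's reading is a time-averaged image), and the standard definition of zero-mean normalized cross-correlation. The function names are hypothetical, returning 0.0 for a flat window is an arbitrary choice, and edge pixels are left as None.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length value lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation undefined, 0.0 chosen arbitrarily
    return sum(x * y for x, y in zip(da, db)) / denom


def zncc_map(image, reference, half=9):
    """Slide a (2*half+1)-pixel square window; store each ZNCC at the window center."""
    height, width = len(image), len(image[0])
    out = [[None] * width for _ in range(height)]
    for cy in range(half, height - half):
        for cx in range(half, width - half):
            rows = range(cy - half, cy + half + 1)
            cols = range(cx - half, cx + half + 1)
            window_a = [image[y][x] for y in rows for x in cols]
            window_b = [reference[y][x] for y in rows for x in cols]
            out[cy][cx] = zncc(window_a, window_b)  # only the center pixel is set
    return out
```

      With the default `half=9` the window is 19x19 pixels, matching the size given in the response.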

      **Minor comment:** On page 16 supplementary is not spelled properly.

      corrected

      Reviewer #1 (Significance (Required)):

      The key point of this paper is convincing and the new technology appears to have a lot of potential. It can be applied to study membrane protein function in the context of its environment, the lipid bilayer.

      Membrane fluidity measurements have been developed (e.g., using fluorescent probes like laurdan). However, the trick to link a probe like nile red by ACP technology to the insulin receptor and to observe its activity is quite new.

      A most recent description of such a technology is in TrAC Trends in Analytical Chemistry Volume 133, December 2020, 116092.

      This is an interesting review, but not directly impacting on our work.

      **Referees cross-commenting**

      All comments are constructive and important. The paper is important but needs to be amended as proposed.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): **Summary:** In this manuscript, authors generated an ACP-attached Nile Red probe in order to specifically label Insulin receptor in the membrane. Owing to this specificity, one can measure the lipid membrane properties around a specific protein in the membrane. **Major comments:**

      For the conclusions in the manuscript to be convincing, in my opinion, these additional data need to be added. Some of these are new experiments, and some are detailed analysis of existing data. The new experiments are not for new line of investigation, instead it is to confirm their statements and conclusions. The major point is the reliability of spectral shift. In usual environment sensitive probes, it is certain that they are in the membrane whatever is done to the membrane. However, when the probe is attached to a protein, it is not trivial to have the same confidence that the probe is always inside the membrane, and it is in the same plane of the membrane. 1992-ACP-IR is a good example; authors state that it binds to the protein outside the membrane, but when there is cholesterol addition and -maybe more interestingly- cholesterol removal, the dye still reacts and changes its emission (even PreCT changes its emission quite a bit at the 570 nm region). This is a clear indication of a change in localization of the probe upon some changes in the membrane. This implies that observed spectral shifts may not be due to lipid packing differences, but due to localization of the probes. For this reason, it is crucial to know where any environment sensitive probe localize in the membrane with respect to membrane normal, and this knowledge is more important for this probe. Related to this, the spectral difference upon insulin treatment and activation of insulin receptor could be due to changes in probe's localization in the membrane. Especially because authors show in Fig1e, the spectra can change depending on the probe localization. Relatedly, quantum yield of NR should be significantly different when it is inside vs outside membrane. Authors should show QY for 1992-ACP-NR and 2031-ACP-NR with different PEG lengths and upon insulin treatment.

      We understand the logic of the request to measure the QY, since the QY of Nile red is much higher in organic solvents than in aqueous solutions, so it might be predicted that the QY of Nile red is higher in a lipid bilayer than when covalently bound to the protein in an aqueous environment. However, this argument depends upon the mechanism for the increase in quantum yield when going from aqueous to a non-polar solution. One possible explanation is based on the intrinsic properties of the dye under the two conditions. The alternative explanation would be that the dye would aggregate (be insoluble) in aqueous solution and therefore either not fluoresce or self-quench. In this case, we believe that the latter is the explanation because we and others have previously shown the turn-on properties of the probe when binding to proteins (SNAP-tag and others). It is not simple to measure QY in the cell under a microscope, but we have done something similar, shown in supplementary figure 4. We labeled the three ACP-receptor complexes with PEG11-Nile red and co-stained with antibody to the Insulin Receptor. We then calculated a relative quantum yield. There was very little difference between the relative quantum yields, so we conclude that it is not the environment of the probe that affects the quantum yield under these conditions, but the fact that it is covalently attached to a protein and incapable of forming aggregates. What distinguishes these constructs is the emission spectrum, not the quantum yield. In supplementary Table 2 we also did QY measurements in vitro and we could reproduce the increase of quantum yield by association with liposomes or in organic solvents. We tested whether non-covalent association with a protein would increase the QY by incubation with the lipid binding protein, BSA, in PBS. This was not the case, strongly pointing to the conclusion that it is the covalent association with the protein that increases the QY, not mere association with a protein. We believe that our demonstration of changes in fluorescent spectra with changes in cholesterol, large changes in fluorescent spectra with linker length for the 1992 construct and voltage sensitivity using patch-clamp prove that the Nile red is reporting on the membrane environment under the conditions we propose.

      **Minor comments:**

      • Fig 1d requires quantification

      We do not agree on this. This is simply to show that the labeling is dependent upon expression of the relevant ACP-IR constructs. There is no detectable labeling of the control.

      • Voltage sensitivity of different PEG length of 2031-ACP probe should be added.

      We have added this data in figure 2 panel E.

      • Fig 3a graph should show all data points, not only bar graphs. Also, the band in 3a for +CoA-PEG-NR is dimmer than other bands, is it specific to this particular gel since quantification does not show any difference?

      There is no significant difference.

      • Fig 4d, colour code is needed.

      Done

      • Fig 5b and Fig3d are basically the same experiments in terms of control measurement, why is the difference in 3b is 0.04 GP unit while it is 0.007 GP unit?

      We explain in the MS, but have improved the title of the Y-axis in the Fig. 5b graph so that the difference in what is plotted is clear.

      • Why is inhibitor data so noisy? We should discuss.

      We don’t know the exact reason why the inhibitor data are noisy, but we speculate that the actin cytoskeleton and phosphoinositide-dependent signaling could affect membrane stability, so that the membrane environment would fluctuate in the presence of latrunculin B or PI3K inhibitor.

      Reviewer #2 (Significance (Required)): Overall, this is a very useful approach, and this line of research will yield very useful tools to shed light on how lipids surrounding proteins can change their function. Major advance of the paper is the new chemical biology tool. There is also biological data on how insulin can change the insulin receptor's membrane environment which is contradictory to some old literature claiming that InsR becomes more "rafty" upon insulin treatment (e.g., PMID: 11751579).

      If this type of tagging proves robust and reproducible (limitations and concerns listed above and below), it could be used by other researchers to tag their protein of interest and investigate the lipid environment around those proteins.

      The downside of this method is that the probe requires ACP tag, a relatively less used tag than others in biology, therefore researchers interested in using this probe should have their proteins with ACP tag. Moreover, the linker length and ACP-tag position are quite crucial parameters (and probably should be optimized for each protein). Longer PEG lengths cannot report on changes efficiently (Fig3b), while shorter lengths are prone to artefacts as they can go out of membrane (Fig1 and Fig2). This might limit its widespread use.

      The reason for using the ACP tag is that neither the SNAP tag nor the HALO tag worked. The tethered Nile Red preferred to bind to the tag rather than inserting into the membrane.

      **Referees cross-commenting** I agree with all comments and concerns of other reviewers. I see the usability and potential of this new technology along with its limitations as all three reviewers pointed out.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): See below. No concerns on any of these issues.

      Reviewer #3 (Significance (Required)): **Critique:** This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity. This is a carefully done study that is well-controlled, clever in design and well-described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) events, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      **Specific Comments:** (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      This has been discussed in the revised version.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder. It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      There is a long list of attractive post-signaling events of the insulin receptor and how this works in different cell types that could be tested. We believe that this is beyond the scope of this study and we encourage others to do this.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      We determined localization to AP2 adaptor containing clathrin coated pits at the cell surface and showed that during the time-course of the experiment there is no significant change in co-localization or evidence for endocytosis (new figure 9). Therefore, we decided not to do the clathrin inhibitor blocking experiment because we believe that it could only lead to indirect effects.

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      This is highly unlikely given the fact that fluidification of the membrane environment is found with all length linkers. Given the intervals in increases in linker length on the 2031 construct, which is the closest to the membrane, it is very difficult to conceive that any of the ones larger than 5 PEGs restrict significantly the membrane insertion of the dye.

      **Referees cross-commenting**

      I think we have a consensus opinion

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      See below. No concerns on any of these issues.

      Significance

      Critique:

      This MS reports a proof-of-principle for using site-directed environmentally sensitive probe technology to assess the local membrane environment of a receptor tyrosine kinase (IR) upon activation. This technology addresses a major gap in our arsenal of tools to study the mechanisms of membrane signaling as the parameters of interest are biophysical parameters rather than purely biochemical ones. How to do this with spatial and temporal resolution is a major challenge. This study builds on previous work by the Riezman group that develops an extrinsic labeling system to tether Nile Red to specific sites on the ectodomain of a signaling receptor and then probe local membrane environments as a function of receptor activity.

      This is a carefully done study that is well controlled, clever in design, and well described. Although the major issues to which such a general technology could contribute involve intracellular (and not extracellular) events, the advances described will be of general interest -- particularly that local membrane order decreases when IR becomes activated. Specific comments for the authors' consideration follow:

      Specific Comments:

      (i) As a general comment, the authors are measuring extracellular plasma membrane leaflet properties that may or may not translate to what is happening in the local inner leaflet environment. A general reader may well miss the significance of this. This point needs to be more explicitly emphasized in the Discussion.

      (ii) Why not treat cells with a PLC inhibitor to block PIP2 hydrolysis and ask if that inhibits membrane disorder? It is PIP2 hydrolysis/resynthesis that regulates the actin cytoskeleton at signaling receptors and this seems an attractive candidate for study.

      (iii) The data acquisition time is at least 4 min which is long enough for activated receptors to be recruited to sites of endocytosis. Can the authors exclude the possibility that what they are measuring isn't reflective of such spatial reorganization? Does a clathrin inhibitor block the observed change in local membrane order for activated IR?

      (iv) Receptor activation is accompanied by other transitions such as dimerization, etc. Can the authors exclude the possibility that what they are measuring is related to changes in depth of insertion of the NR probe into the plasma membrane outer leaflet that is a consequence of IR conformational transitions associated with activation?

      Referees cross-commenting

      I think we have a consensus opinion

    1. Author Response

      Reviewer #1 (Public Review):

      This study addresses the important question of understanding the cellular physiology of cholinergic interneurons in the striatum. These interneurons play a key role in learning and performance of motivated behaviors, and are central to movement disorders, psychiatric disease, and addiction. Their unique physiology, which includes tonic pacemaking activity and active conductances that shape integration of dendritic inputs, is critical to their function but is still incompletely understood. The authors cleverly integrate a series of innovative electrophysiological and optical approaches to gain insight into dendritic physiology of these neurons. Their creative approach yields some interesting and novel findings. However, there are technical and conceptual concerns that need to be addressed before these results can be readily interpreted. Some refinement of analysis and presentation, and potentially some additional experiments, will therefore be required to strengthen the conclusions and facilitate interpretation of the results.

      We believe that with several new sets of experiments and simulations, we have successfully refined the analysis and addressed the technical and conceptual problems. Indeed, we strengthened the conclusion with a novel pharmacological experiment that provided model-independent evidence of proximal-only boosting.

      Major concerns:

      1) This manuscript focuses on how the differential physiology of proximal and distal dendrites contributes to physiological activity and integration of inputs in cholinergic interneurons, suggesting that NaP and HCN currents act in concert to selectively boost inputs onto proximal dendrites (from thalamus), relative to inputs onto distal dendrites (from cortex). The results presented in Figures 1-4 are consistent with a distinct physiology of proximal-vs-distal dendrites based on purely electrical properties. Indeed, Figure 5 initially appears consistent with this model as well, since thalamic inputs (onto proximal dendrites) are boosted by an NaP conductance, while cortical inputs (onto distal dendrites) are not. This raises a key conceptual question: why are cortical inputs onto distal dendrites not boosted? Any depolarization of distal dendrites must pass through proximal dendrites before reaching the recording electrode at the soma. Shouldn't this signal be subject to the same active and passive conductances, and consequently the same boosting that shapes thalamic inputs onto proximal dendrites?

      You are absolutely right in the case of a linear model (passive or quasi-linear). However, for a nonlinear system, there can be preferential boosting of proximal inputs. The new Appendix 2 addresses this point with computer simulations.

      2) The quasi-linear approach to characterizing active and passive membrane properties is promising, and the choice of a cable-based model is well supported. However, the model itself is rather opaque, which limits confidence in the interpretation of the results. Additional analysis and description should be presented to alleviate concerns about whether the experimental data, which has a limited number of measurable values, may be over-fit by a model with too many free parameters. For example, why is the radius of the dendrite a free parameter that is allowed to vary in the full field vs proximal experiment (Lines 253-256) - and isn't it a serious red flag that the value returned for proximal dendrites is smaller than for the full field? Additional tables (e.g. fixed and free parameters and how they were determined), and figures (plots of how those parameters influence the fits, and how the parameters interact with one another) would considerably strengthen confidence in the conclusions drawn by the authors.

      Thank you very much for this comment. We have added in the new ms a table with all the parameters fit in the various figures, and have discussed the possible pitfalls of overfitting. Most importantly, we have provided a new appendix (#1) to the manuscript that explains the effects of the various model parameters in a systematic fashion, beginning with passive dendrites, followed by the effects of boosting and then the effect of restorative currents that give rise to resonances. This appendix addresses the questions raised by the reviewer regarding how the various parameters influence the fits.

      We apologize if we created confusion with respect to the meaning of the parameter r. It does not represent the radius of the dendrites (which is not explicitly represented at all, only implicitly through the space constant) but rather the electrotonic range of illumination. We indeed find that the fits consistently estimate a value of r for the proximal illumination which is smaller than that estimated for the full-field illumination, as it should.

      Finally, our new pharmacological demonstration of differential boosting in the case of proximal vs. full-field illumination (see above) is entirely independent of the quasi-linear model fit. So for the main thrust of the ms, which is to demonstrate a proximal localization of nonlinearities and its correspondence to the spatial localization of excitatory afferent inputs, this is now achieved, at least vis-à-vis the NaP current, independently of the quasi-linear model. However, we still find the model useful as it is used to estimate the distribution of HCN currents and provides a framework to think about how to manipulate dendritic nonlinearities experimentally.

      3) Technically, the use of ChR2 to modulate dendritic currents is creative. While the authors rightly acknowledge that activation/deactivation kinetics of the ChR2 channel will contribute to filtering, this important point should be expanded with additional analysis and potentially with new experiments. Of particular concern is the transition of ChR2 channels to an inactivated state over the comparatively long oscillating light pulse in Figure 3. Inactivation of ChR2 is prominent over this timescale and would precisely co-vary with the shift in oscillation frequency. To address this, the authors should present a direct measurement of this inactivation and account for it in their analysis of the chirp data. Alternatively, the chirp stimulus could be presented backwards (starting at high frequency), so that comparison of forwards-vs-backwards chirp recordings could disentangle this artefact. Either one or both of these additional experiments would be critical for interpreting the roll-off in photocurrent responses at high frequencies reported in Figure 3.

      Touché! You were spot on with this critique and we were wrong. We have now conducted several new experiments (that appear in the main text and in Figure 3 and all its supplements) that show that including ChR2 kinetics explicitly in the model fits actually makes the fits more self-consistent and removes some of the glaring differences between the results from the somatic voltage perturbations (Figures 1–2) and the optogenetic illumination (Figure 3). So as per your request, we have now presented a direct measurement of the deactivation (Figure 3–figure supplement 1) and we have played the “chirp” backwards (Appendix 1–figure 2) to address the issue of inactivation.
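      The forwards-vs-backwards chirp comparison discussed above can be illustrated with a minimal sketch. It assumes a linear frequency sweep and uses hypothetical values for the start and end frequencies, duration, and sample rate; none of these come from the manuscript, and the function name `linear_chirp` is introduced here only for illustration.

```python
import math

def linear_chirp(f_start, f_end, duration, sample_rate):
    """Return samples of a sine whose frequency sweeps linearly from f_start to f_end (Hz)."""
    n = int(duration * sample_rate)
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Phase is the time integral of the instantaneous frequency.
        phase = 2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples

# Hypothetical stimulus parameters, chosen only for illustration.
forward = linear_chirp(f_start=0.5, f_end=20.0, duration=10.0, sample_rate=1000)

# Playing the same waveform in reverse starts at high frequency and ends at low frequency.
backward = forward[::-1]
```

      Because a slow process such as channel inactivation depends on elapsed time rather than on stimulus frequency, it would affect the late part of both recordings, whereas a genuine frequency-dependent roll-off would follow the high-frequency segment wherever it occurs.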

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      First we would like to express our deep gratitude to the reviewers for thoroughly and fairly reviewing our work.


      Reviewer #1:

      Major Concerns

      1. A major concern I have is with the use of DAPT to modulate Notch signaling, and investigate the impact on integrins, Yap, cadherins, etc. Gamma-secretase, the target of DAPT, cleaves not only Notch receptors, but also IntegrinB1, Nectins, Cadherins, Ephrins and more. This recent review lists 149 substrates (Guner & Lichtenthaler Seminars in Cell & Developmental Biology 2020). The risk that some of the results reflect DAPT impact on IntegrinB1, Cadherins etc themselves is significant. The authors should validate their findings with more specific modulation of Notch activity, for example with a Notch blocking antibody, with siRNA, or with SAHM1.

      We agree with the reviewer's comment and will add additional key experiments using SAHM1 as alternative inhibitor of Notch activity.

      Furthermore, EGTA was used to "acutely destabilize VE-Cadherin". But EGTA chelates Calcium, which is essential for Notch structure, and EGTA is thus a well-known activator of Notch signaling (see e.g. Rand MD et al. (2000) Calcium depletion dissociates and activates heterodimeric notch receptors. Mol Cell Biol). The authors rightfully describe and cite this paper, but the use of EGTA nonetheless confounds interpretation. The authors check for NICD levels (at what timepoint?) but the staining is cytoplasmic (also not labelled in the figure per se, but described in the figure legend? - please label the staining in the panel). And in any case, NICD is very short-lived and nuclear staining cannot be taken as a hallmark of signaling activity, in particular if staining is performed at a time point at which the receptor and NICD may have been exhausted/depleted. The authors should validate these observations/conclusions with the Notch reporter to conclusively demonstrate whether or not EGTA activates Notch in their system.

      To test whether transient treatment with EGTA causes Notch activation we will repeat this experiment with Notch reporter activity as readout.

      Trans-endocytosis of NECD on different substrates: the authors suggest that trans-endocytosis of NECD by Dll4 increases on softer substrates. But the authors also show that soft substrates lead to spreading out of cells, which could confound interpretation (i.e. overlapping membranes rather than internalization). The authors could validate trans-endocytosis by FACS: check if red Dll4+ cells contain more NECD. It is also not clear to me in this experiment whether the authors are looking at green NECD, or Notch1 full length, since they write "overlap of Notch1 and Dll4", which would not reflect trans-endocytosis but interactions at the cell surface for both cells. Please also define "overlay intensity", or explain further.

      We will validate the trans-endocytosis by flow cytometry. In addition, we now describe the procedure for microscopic analysis more clearly (methods section, p 4; results section, p 17-19).

      The authors conclude their introduction with a statement that mechanosensitivity of Notch is linked to endocytosis, but their conclusion from Fig 6C was that Notch stiffness-dependence was independent of endocytosis, using the rhDll4..?

      We have now rephrased this sentence.


      Minor concerns

      1. In the introduction, the authors describe Dll3 as a Notch ligand that activates Notch signaling in trans. To my knowledge, Dll3 has only been described as a cis-inhibitor of Notch signaling. (I think this may have arisen during repeated edits of the manuscript!)

      This has now been corrected in the current version.

      In the introduction, the authors state that Notch1, Dll4 and Jag1 control angiogenesis, but then they only describe what Notch1/Dll4 do in the next few sentences. Perhaps one sentence to describe the role of Jag1 would help avoid the feeling of being "left hanging".

      This has now been corrected in the current version.

      Data presentation: please show all bar graphs with the individual replicates (dotplots).

      We have now changed all bar graphs into scatter plots.

      Data analysis/normalization: many graphs represent normalization of values in multiple steps which are not described in the methods/legends/results. For example, Notch reporter gene activity (Fig 1A) is Firefly divided by Renilla, and presumably normalized to the control condition at 1 (or an average of 1 for the three controls?). This is not explained. Also, it is not clear whether the data reported for the Control condition are Huvec on rhDll4 compared (normalized) to Huvec on control substrate (and similar for each other condition). What controls are included in this experiment? Please provide the full data to provide insight into the magnitude of activation by Dll4 itself. Perhaps "Control" is without rhDll4? But the bar underneath A/B implies this rhDll4 was used in all conditions.

      We have edited our manuscript accordingly to avoid these ambiguities.
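      The two-step normalization the reviewer describes (Firefly divided by Renilla, then scaled so that the control condition averages 1) can be sketched as follows. This is one plausible reading of the reviewer's description, not the authors' confirmed procedure; the function name and all luminescence readings are hypothetical.

```python
from statistics import mean

def relative_reporter_activity(firefly, renilla, control_ratios):
    """Firefly/Renilla ratio of one well, expressed relative to the mean control ratio."""
    ratio = firefly / renilla
    return ratio / mean(control_ratios)

# Hypothetical raw luminescence readings (firefly, renilla) for three control wells.
control_wells = [(1200.0, 400.0), (1500.0, 500.0), (900.0, 300.0)]
control_ratios = [f / r for f, r in control_wells]

# Normalizing the controls to their own mean gives an average of exactly 1.
normalized_controls = [r / mean(control_ratios) for r in control_ratios]

# A hypothetical treated well, reported as fold change over the control mean.
treated = relative_reporter_activity(4500.0, 500.0, control_ratios)
print(treated)
```

      Stating which wells form `control_ratios` (for example, cells on a control substrate without rhDll4) is exactly the information the reviewer asks to see in the legend.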

      Statistics: data should be presented as means +/- standard deviation, not standard error of the mean (see for example Barde & Barde Perspect Clin Res. 2012): "SEM quantifies uncertainty in estimate of the mean whereas SD indicates dispersion of the data from mean. As readers are generally interested in knowing the variability within sample, descriptive data should be precisely summarized with SD."

      We now use SD instead of SEM.
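      The distinction between SD and SEM quoted above can be made concrete with a short sketch. The replicate values are hypothetical and serve only to show that the SEM shrinks with the number of replicates while the SD describes the spread of the data.

```python
from math import sqrt
from statistics import stdev

def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = stdev(values)             # sample SD, uses n - 1 in the denominator
    sem = sd / sqrt(len(values))   # SEM = SD / sqrt(n)
    return sd, sem

# Hypothetical replicate measurements.
replicates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sd, sem = sd_and_sem(replicates)
print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      For any sample with more than one replicate, the SEM is smaller than the SD by a factor of sqrt(n), which is why error bars drawn as SEM look tighter than the underlying variability.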

      Statistics: In the Methods section, the authors state that one-way ANOVA was followed by Dunnett's multiple comparison test, and two-way ANOVA was followed by Tukey's multiple comparison test. Dunnett is used to compare every mean to a control mean, while Tukey is used to compare every mean with every other mean. Fig 1 describes using Dunnett for Fig 1B, but the end of the legend says Tukey was used. However, Fig 1A,C show internal pairwise comparisons to plastic. Please be sure to explain which statistics were used where, and why, and if plastic was set as the comparator, please be explicit about this. Fig 3 uses "Sidak's corrected two-way ANOVA" and "Sidak's multiple comparison test"? I think Sidak is a method to correct alpha or p for multiple comparisons, as stated in the first instance, but it is not described why this was used here, and not in other analyses, and whether the authors then applied Tukey's post-hoc test as described in the methods section? Similar comments for Fig 6. It is counter-intuitive that the plastic-1.5 kPa PDMS difference with no error-bar overlap in 1A would be 1-star significance, while the plastic-70 kPa difference with almost overlapping error bars in 1B would be 4-star significance. Please check/show values. In Fig 1B Figure legend, the authors write "Data is presented in a bar plot and compared with the integrin β1 intensities without DAPT treatment", but this is not the statistical comparison presented. Fig 3B shows a very minor difference with overlapping error bars as 3-star significance? Is this correct?

      We have checked all statistical issues and corrected where necessary. Since the sample size and variance were homogeneous in all comparisons we now uniformly use ANOVA and Tukey's multiple comparison test as post hoc to keep things simple.
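      The difference between the two post hoc schemes mentioned by the reviewer lies in which pairs of group means are compared. The sketch below only enumerates those pairs; it does not perform either test. The group labels are taken from the substrates named in the review, and the helper names are hypothetical.

```python
from itertools import combinations

def dunnett_pairs(groups, control):
    """Dunnett-style design: every non-control group is compared with the control only."""
    return [(control, g) for g in groups if g != control]

def tukey_pairs(groups):
    """Tukey-style design: every group is compared with every other group."""
    return list(combinations(groups, 2))

groups = ["plastic", "70 kPa", "1.5 kPa", "0.5 kPa"]

many_to_one = dunnett_pairs(groups, control="plastic")   # k - 1 comparisons
all_pairwise = tukey_pairs(groups)                       # k * (k - 1) / 2 comparisons
print(len(many_to_one), len(all_pairwise))
```

      With four groups this gives three comparisons against plastic versus six all-pairwise comparisons, so the choice of scheme determines both which differences can be reported and how strongly the significance threshold is corrected.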

      How much nuclear NICD (NICD intensity) is there in control conditions? (Control missing from Fig 1B, D).

      We will repeat the experiment and compare the NICD levels with those in non-activated cells on plastic.

      A DAPI counterstaining for 1B/D right panels would facilitate evaluation of whether NICD nuclear intensity is increased. The same applies for nuclear YAP assessment in Fig 3B. I assume a nuclear counter-stain was done for quantification of nuclear NICD intensity, and nuclear YAP intensity, but this is not described in the Materials and Methods, please add a description of how intensity was quantified, and provide nuclear counterstain images. (Also, what is the unit on the y-axis of "intensity" graphs? Arbitrary units (a.u.)?)

      The counterstaining method with Hoechst as well as the use of the nuclear staining for quantitative analysis of images are now described in the Methods section and where needed in the figure legends. The y-axis of the intensity graphs now has a dimension (a.u.). We decided against overlay of the nuclear staining with the NICD or YAP images for graphical reasons (visibility of the respective staining).

      How much "overall" integrin B1 is there in DAPT-treated conditions in Fig 2C? (related to the concept that DAPT could be cleaving integrin B1, it could be depleted at 24 hours..?)

      We will additionally add this experiment and validate the effect of Notch inhibition on the overall integrin level by the alternative inhibitor SAHM1.

      More details regarding the analysis procedure need to be added to the Methods Section. Were cells segmented and then mean intensity estimated for the whole cell? Was this done by means of Intensity Ratio Nuclei Cytoplasm Tool plugin for Fiji alone? Were images background corrected, corrected for inhomogeneous illumination, normalized? In the case of Integrin beta 1 active, the expression seems to be patterned, was intensity expressed as mean intensity of every pixel corresponding to cytoplasm? For VE Cadherin staining, how was intensity estimated (only pixels corresponding to membrane were considered or every pixel of the cell)? Many figures originate from a confocal microscope: were z-stacks acquired and then maximum projections done? Were z-stacks acquired and then fluorescence quantified in 3D images? Was a single plane acquired or analyzed, and if that is the case, how was this plane chosen?

      The requested information has now been inserted in the respective results and method sections.

      In Fig 4A, how is VE-Cadherin intensity quantified? As an average per field of view? Or per cell? And if per cell, how was each cell delineated? And if not per cell, how were equal cell numbers ensured? In FRAP experiment, how was intensity quantified? Was it per cell, per field of view or per region? Was each bleached region analyzed separately, or each cell? The datapoints should be either added to Figure 4C or as supplementary to assess the fitting. How many bleached regions per cell were done and how many cells were analyzed? In FRAP experiment, was bleaching done with an increased pixel dwell time? Was laser intensity increased? Do you have an estimation of laser power (not percentage) or flux?

      These issues are now described in more detail in the respective figure legend.

      Figure S2 is not referenced in the manuscript - I think a reference to "Figure S3" in the NECD transendocytosis section (no page numbers or line numbering) should be to Fig S2 instead?

      Sorry for this mistake! We corrected this now.

      In Figure 5A NICD nuclear intensity is normalized somehow (normalization not explained), and stiffness no longer appears to regulate NICD levels as shown in Figure 1B.

      We have now described the normalization better in the figure legend. The difference to the results in Fig. 1B is that in Fig. 5A the cells were not activated by Dll4 sender cells or rhDll4 (endogenous Notch activity). This is now stated more clearly.

      Fig 6B: From the immuno at right there is a clear stiffness-dependent difference in Transferrin uptake. How were "single cell uptake" and "number of particles" quantified? (How were cell bodies identified?) Uptake could also be verified with FACS.

      In this point, we disagree with the reviewer: we really do not see a systematic difference in intensities between the different substrates. The process of image analysis is now better described in the figure legend. The result was so clear that we did not use FACS as complementary approach.

      Fig 6C: there appear to be very different numbers of cells in the brightfield image at right. Are the 70, 1.5, and 0.5 kPa Notch reporter activities different from one another or only different from plastic? Might these results reflect cell density/increased Notch signaling due to more cell-cell contacts?

      Unfortunately, with decreasing stiffness the PDMS gels become optically more and more cloudy, giving the false impression of a higher cell number. We tried to circumvent this by changing contrast and brightness of the images, but to no satisfying effect. We now mention this issue in the figure legend.

      How was the Dll4 coating of the different substrates done?

      The coating of the substrates is now described under a specific subheading in the Methods section.

      It would be helpful to describe the composition of Collagen G (Collagen I) in the text (it is a risk to expect vendor information to remain available indefinitely).

      The role and composition of the Collagen G coatings was included in the text (p 7). Further information on the manufacturer of the product used is included in the methods section.

      Please list catalog numbers for all reagents, and dilutions used for antibodies.

      We have added this information wherever possible.

      Instead of using red and green for images, maybe cyan, yellow and/or magenta could be used to help the reader see what is being shown (especially if the reader might be color blind).

      We will of course adhere to the respective policy of the publishing journal, once the manuscript is accepted.

      Packages and tools such as Intensity Ratio Nuclei Cytoplasm Tool plugin for FIJI should be referenced.

      We have now referenced respective tools.

      Reviewer #2:

      Major comments:

      Is there a difference in the growth rate of cells on softer vs. stiffer gels that could affect cell morphology/signaling pathways?

      This is an important point and we will perform additional respective experiments.

      Nuclear localization of NICD and YAP would be good to validate with western blot.

      Quantification of Western Blots (especially after nuclear isolation) is, at least in our hands, much less sensitive and reliable than quantitative imaging. We do not think that this experiment would strengthen our study.

      In Figure 3 and Figure 5, siRNA experiments would strengthen the data. DAPT is not only an inhibitor of Notch but affects other proteins as well. This should be stated.

      A similar point was raised by Reviewer#1 with the suggestion to use SAHM1 as an alternative to DAPT. As suggested we will add these experiments.

      How was the mean VE-cadherin branch length determined? This term often refers to angiogenesis assay/sprout formation and maybe another one should be considered here to describe VE-cadherin junction morphology.

      Add to all figure texts how many cells were used for the analyses.

      The cell number is now added wherever appropriate.

      In Fig. 6C the cell morphology of HUVECs look abnormal in comparison to other images and should be re-done.

      In contrast to all other experiments the cells were not confluent in this case. The different morphology is a sign of the lack of neighbours, not of some problem with the cells.

      Was all the data normally distributed and thus ANOVA was used? Please add more details on the statistics part. Did you remove outliers?

      As also suggested by Reviewer #1, we have added more information on statistics and streamlined this. The data are normally distributed; outliers were not removed.

      MTT assay of DAPT would need to be presented as it can be cytotoxic. Cells are not well visible in Fig 2C with DAPT. DAPI and F-actin staining would help to see the cell morphology.

      We will add respective data on cell viability after DAPT (and SAHM1) treatment in a revised version of the manuscript.

      Minor comments:

      Please clarify how coating with rhDll4 is done as this was unclear at least for this reviewer.

      The coating of the substrates is now described under a specific subheading in the Methods section.

      HUVECs are known to be hard to transfect. Please provide data on transfection efficiencies of all transiently transfected cells.

      We did not systematically monitor transfection efficiencies in this context, since there was always an internal control (e.g. co-reporter in the reporter gene assay) or the data were obtained by single-cell-based quantification. Generally, we achieve transfection efficiencies around 30% with HUVECs.

      Reviewer #3:

      Major comments:

      1) The authors use recombinant Dll4 or Dll4-expressing ("sender") cells to activate Notch in co-cultured cells. This is per se fine; however, one might over-estimate all other observed downstream effects as endogenous Notch activity is lower. It would be important to see how naïve HUVEC or other primary endothelial cells respond to changes in stiffness. qPCR of Notch target genes such as Hey1, Hey2, Hes5, Dll4 is frequently used as a readout of Notch activity in this context. Also, the Notch transcriptional reporter assay might be a suitable read-out.

      In Fig.5A we show data on endogenous Notch activity (- EGTA) on substrates with different stiffness. In this case NICD levels in the nucleus do not differ. It will definitely be interesting to repeat this experiment based on the reporter gene assay.

      2) As the authors mention in the Discussion, cell density could be of utmost importance given the fact that Notch signaling usually is assumed to be an in trans signaling event between adjacent cell membranes. However, other signaling modes (in cis, cis inhibition, JAG1 vs DLL4 ratio) might also be important. As such, the authors should carefully document and report on cell density in all experiments. Secondly, the authors should use other conditions such as sparse cell density and thirdly the authors should measure transcriptional effects of stiffness on Notch ligand expression.

      In all experiments (with the exception of Fig. 6C) we used confluent cells. With the sparse cells (Fig. 6C) we also observe stiffness dependency. Investigating Notch ligand expression is definitely a good idea and will be investigated in the revised manuscript.

      3) The authors need to compare stiffness in their model with physiological conditions in developing tissues and ideally also in tumors, which often have increased tissue stiffness.

      Good point! We have now integrated such comparisons in the Discussion.

      4) Is Notch activation due to changes in stiffness dependent on the presence of ligands or could it be that (unspecific) binding of Notch receptors to ECM could trigger cleavage just by conformational change?

      Since there is no stiffness dependent response on collagen (Fig. 6C, left panel), an effect of unspecific binding is highly unlikely.

    1. Author Response

      Reviewer #1 (Public Review):

      In this article, the authors investigated the role of sleep and brain oscillations in visual cortical plasticity in adult humans. The authors tested the effect of 2 hours of monocular deprivation (MD) on ocular dominance measured by binocular rivalry. In the main MDN session, MD was performed in the late evening, followed by 2 hours of sleep, during which EEG was measured. After the sleep session, ocular dominance was measured, which was followed by 4 hours of sleep, then ocular dominance was measured again in the morning. The results show that the effect of MD was preserved 6 hours after MD. The effect of MD correlated with sleep spindle and slow oscillation measures. The questions asked by the study are timely and findings are important in understanding the visual cortical plasticity in human adults, but I have some concerns regarding the experimental design, analysis, and interpretation of the results, which are listed below.

      Thank you for the positive summary of our results.

      • The authors investigated EEG activities in the central and occipital regions. The results of the relationship between slow oscillations / sleep spindles and deprivation index are very interesting. However, it appears that the activities were averaged across hemispheres in the occipital region. Previous studies (e.g. Lunghi et al., 2011; Binda et al., 2018) have demonstrated that MD is associated with up-scaling of the deprived eye and with down-scaling of the non-deprived eye (page 11). I wonder whether sleep slow oscillations and / or spindles are modulated locally in the deprived occipital region? To answer the first question raised by the authors (how MD affects subsequent sleep), wouldn't it be important to compare between deprived vs. non-deprived regions?

      In humans, the purely monocular recipient cortical regions are very small and represent only the very far visual periphery. These regions are impossible to locate by EEG and are difficult to locate even with high-resolution fMRI (ref to Koulla CB). Visual cortical organization is based on the visual field map: neurons whose visual receptive fields lie next to one another in visual space are located next to one another in cortex, forming one complete representation of contralateral visual space, independently of the eye from which the visual information comes. However, at finer scales ocular dominance columns exist and Binda et al (2018) showed that in adult humans MD boosts the BOLD response to the deprived eye, changing ocular dominance of V1 vertices, consistent with homeostatic plasticity. All these are well-known facts to the visual community, and we believe it is not worthwhile to discuss them.

      • To answer the second question (how sleep contributes to consolidation of visual homeostatic plasticity), the authors compared the deprivation index between two sessions, the main MDN and a control MDM session. The experimental designs for these two sessions were quite different. For example, MD was conducted in the evening in MDN, whereas it was conducted in the morning in MDM. Since there may be circadian effects on plasticity (Frank, 2016), the comparisons between these sessions may not be sufficient in investigating the effect of sleep itself (it could be merely due to circadian effect).

      Thank you for raising this important issue. We performed the dark exposure experiment in the morning because we wanted to minimize the occurrence of sleep during the two hours spent by participants lying down in complete darkness. Preventing sleep under these conditions in the late evening would have been extremely challenging. In order to investigate a possible influence of the circadian rhythm on visual homeostatic plasticity and its decay over time, we have performed an additional experiment. In this experiment, we have tested the effect of 2h of monocular deprivation in the same participants either early in the morning or late at night (at a time of the day comparable to the MDnight and MDmorn conditions in the main study). We report the results of this control experiment in the supplementary materials (Figure S2). We found that the effect of monocular deprivation follows a similar timecourse for the two conditions (ocular dominance returns to baseline levels within 120 minutes after eye-patch removal). Moreover, we also report that the effect of MD is slightly (but significantly) larger in the morning, compared to the evening. The results of this experiment rule out a contribution of circadian effects and reinforce the evidence of a specific effect of sleep in maintaining visual homeostatic plasticity.

      • The authors argue that NREM sleep consolidates the effect of MD. However, consolidation may last days to months or even years (Dudai et al., 2015). Since the effect is gone in 6 hours or so, it may be difficult to interpret it as consolidation. Although the findings of the effects of sleep on ocular dominance plasticity are interesting, the interpretations of the results may need to be clarified or revised.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. Having said that, we would like to point out that the MD boost in amblyopic patients gets consolidated for up to one year and increases across night sleep as we reported in Lunghi, Sframeli et al (2019). Although these data strongly suggest that real consolidation may occur, we agree with the reviewer that our data did not directly address this question and changed the manuscript accordingly.

      Reviewer #2 (Public Review):

      This manuscript is an interesting follow up on a substantial literature on the role of sleep in promoting critical period ocular dominance plasticity, and the role of sleep in promoting adult V1 plasticity following presentation of a novel visual stimulus. For nearly all of that literature (i.e. coming from cats and mice), the focus has mainly been on Hebbian mechanisms. The authors here propose to advance the field by investigating plasticity in adult human V1, which the authors consider to be homeostatic rather than Hebbian, and which the authors consider to be a form of sleep-dependent consolidation. This is an exciting goal, and the overall study designs and control will test the effects of brief MD and subsequent sleep or wake in the dark on V1 processing for the two eyes.

      Thank you for the positive commentary on our study.

      However, the outcomes of the study suggest that the changes observed in V1 across sleep may actually be the opposite of consolidation - rather it is decay of an effect on V1 function caused by prior wake experience (MD), which disappears over subsequent hours.

      We thank the reviewer for raising this issue. We agree that the data show a substantial delay in the decay process of the MD effects after the removal of the patch. The present data indicate that specifically the sleep condition and not merely darkness would be responsible for the maintenance of the MD-induced effect during the night. Therefore, we gladly adhere to the request and propose to say that sleep stabilizes/maintains the effects of MD as long as sleep itself persists. We have revised the entire MS through the various sections to handle this important aspect and to consider that a classic correlate of memory consolidation during sleep (spindles density) also turns out to be associated with maintenance of the MD-induced ocular dominance effect.

      The authors claim differences due to sleep, but there is not a direct statistical comparison between sleep and awake-in-the-dark controls.

      We now directly compare the effect of monocular deprivation and its decay after two hours in the sleep vs dark exposure condition (MDnight vs MDmor). We now plot the results of the two conditions in the same graph (Figure 2). We found a significant interaction effect between the factors TIME (before and after) and CONDITION (MDnight and MDmor), indicating a specific role of sleep in prolonging the decay of short-term monocular deprivation.
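
      As an illustration of what such an interaction tests, the following is a minimal sketch, assuming a fully within-participant 2x2 design (the same participants in both conditions, which this response does not state explicitly). Under that assumption, the TIME x CONDITION interaction is equivalent to a one-sample t-test on each participant's difference of differences. The function name and the numbers are hypothetical, not the study's data or analysis code.

```python
import math
from statistics import mean, stdev


def interaction_t(night_before, night_after, morn_before, morn_after):
    """t statistic and degrees of freedom for the 2x2 within-participant interaction."""
    # Per-participant decay of the deprivation index in each condition.
    decay_night = [b - a for b, a in zip(night_before, night_after)]
    decay_morn = [b - a for b, a in zip(morn_before, morn_after)]
    # Interaction contrast: how much more the effect decays without sleep.
    contrast = [m - n for m, n in zip(decay_morn, decay_night)]
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / math.sqrt(n))
    return t, n - 1


# Hypothetical deprivation indices for four participants.
t_stat, df = interaction_t(
    night_before=[0.20, 0.30, 0.25, 0.35],
    night_after=[0.18, 0.28, 0.20, 0.30],
    morn_before=[0.20, 0.30, 0.25, 0.35],
    morn_after=[0.05, 0.10, 0.05, 0.20],
)
```

      A positive statistic here would mean a larger decay in the dark-exposure condition than in the sleep condition; the squared statistic corresponds to the interaction F value of a repeated-measures ANOVA on the same data.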

      There is also no quantification of sleep architecture across the sleep period, to determine whether REM or NREM play a role.

      We have provided a summary table of sleep architecture in the revised version of the Supplementary Materials. The table shows descriptive statistics of sleep architecture on MDnight and CN. Also, we report the result of the paired comparison between the nights and the Spearman correlations between the deprivation indices (DI before and DI after) and the changes between the nights in sleep architecture. Tests indicate that MD does not produce any main effect on the sleep architecture and that there are no substantial associations found between sleep architecture parameters and deprivation indices. Thus, it appears that changes in SSO and spindle frequency and amplitude did not lead to an alteration in the amount of N2 or N3 sleep, as we might expect. At the beginning of the Results section we refer to the table and to the lack of statistically significant effects.

      Finally, while there are tests of changes in NREM oscillations with previous plasticity in wake, there are no direct tests of changes across sleep - i.e. the very changes that could be considered consolidation.

      We thank the reviewer for stimulating us to investigate whether there are any NREM parameters whose change within the sleep cycle can be related to the degree of plasticity maintenance observed at the end of the two hours of sleep.

      For this aim, we 1) partitioned SSO and spindle events into tertiles according to their occurrence time, 2) estimated the average measures of events belonging to the first and last tertile, and considered the variation between tertiles as an estimate of the changes across sleep. We then tested whether there is a consistent relationship between measures of individual retained plasticity (DI after) and changes in SSO and sleep spindles across sleep.
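
      The tertile procedure described above can be sketched as follows. This is a minimal illustration, assuming equal-count tertiles of events ordered by occurrence time and a Spearman correlation across participants; the function names are hypothetical and do not come from the authors' analysis code.

```python
def tertile_change(event_times, event_values):
    """Mean of the last-tertile events minus mean of the first-tertile events."""
    # Order event measures (e.g. spindle amplitude) by occurrence time.
    ordered = [v for _, v in sorted(zip(event_times, event_values))]
    size = len(ordered) // 3  # events per tertile; any remainder stays in the middle
    if size == 0:
        raise ValueError("need at least three events")
    return sum(ordered[-size:]) / size - sum(ordered[:size]) / size


def average_ranks(xs):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

      In use, `tertile_change` would be computed once per participant and per measure, and the resulting values correlated with the individual DI after sleep via `spearman`.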

      We did the across sleep analysis of the SSO and spindles measurements and as previously explained none of the parameters showed associations across sleep with the individual DI after sleep. We report these results in the supplementary materials (Figure S8).

      Finally, it is also not clear that the decay of response changes is due to homeostatic plasticity - it could be just that: decay of plasticity that occurred previously. The terminology used - e.g. consolidation, homeostatic vs. Hebbian - doesn't seem well founded based on the data.

      Thank you for raising an important point. In our study homeostatic plasticity refers to the effect of short-term monocular deprivation (so the plasticity occurred before sleep). We have rephrased the interpretation of our results in terms of stabilization/maintenance rather than consolidation of plasticity.

      About homeostatic vs Hebbian plasticity, there is fairly broad agreement in the literature that the effects are indeed different. We now make clear in the text that Hebbian plasticity is usually associated with the boost of the signals most successful in driving a neuronal response or a behavior. Here the MD produced a boost of the unused, and probably silent, eye, and as such the boost is very difficult to explain in terms of Hebbian plasticity. We now make this clear in the introduction.

      Reviewer #3 (Public Review):

      In this study, Menicucci et al. induced plastic changes in ocular dominance by applying an eye-patch to the dominant eye (monocular deprivation, MD). This manipulation resulted in a shift toward even more dominance of the deprived eye, as assessed through a binocular rivalry protocol. This effect was stabilized during sleep whereas it quickly decreased in waking (in the dark). The authors interpret the MD effect as the result of cortical plasticity over primary visual areas and its maintenance during sleep as the consolidation of these changes. The authors thus connect their work to the literature on sleep consolidation. They further show that the magnitude of the MD effect is positively correlated with sleep markers that are involved in memory consolidation (slow oscillations and sleep spindles).

      However, I first have conceptual issues with this study. Indeed, previous findings on the replay of memories during sleep and their consolidation were mostly obtained in hippocampus-dependent forms of learning. Here, I do not really see what it is that would be replayed. Thus, I struggle to understand how rhythms, such as sleep spindles, that have been linked to the transfer of hippocampal memories to the neocortex, would be mechanistically associated with low-level plastic changes restricted to primary visual areas. In addition, the effects were observed over occipital electrodes, where sleep spindles are far fewer and lower in amplitude than in other cortical regions. Furthermore, the association between MD-related plasticity and slow oscillations is interesting but, since these slow oscillations organize sleep slow waves, the lack of correlation with slow waves is surprising.

      We agree with the reviewer that many of our results are indeed surprising, especially those related to the involvement of the spindles, and for these reasons we believe that eLife would be the appropriate journal to present our work. At present, the fact that sleep spindles have been associated mainly with mediating the transfer of memory does not exclude a more general involvement in other sensory functions.

      Connected to these conceptual issues, I think the present work has some important methodological limitations. First of all, the analyses included a rather small number of participants, which could make some analyses, in particular correlational analyses, severely underpowered.

      We thank you for stimulating us to emphasize this limitation. In the section Participants within Materials and methods we pointed out that, given the complexity of the experimental design, the need to account for the complexity of sleep expressed through different parameters, the sample size used, and the need to correct for multiple tests, only associations characterized by a strong effect size could be highlighted.

      Secondly, the approach used to explore the correlation between plasticity and sleep features focused on a subset of electrodes (ROI) defined a priori. It is therefore difficult to draw conclusions about the specificity of the results. Given the topographical maps provided by the authors, I am wondering if a more exhaustive analysis of the effect at the electrode level could not yield more robust findings.

      The need for ROIs is based on the interindividual variability of brain structures, in particular the large anatomical variability of V1 orientation implying a variably oriented dipole and a variable maximal representation of visual potentials over electrodes from Oz to CPz. Moreover, we have to cope with the volume conduction effect that limits EEG spatial resolution.

      With these limitations in mind, we very gladly adhere to the reviewer's request to evaluate the effects on individual electrodes in more detail. To this end we have prepared supplementary figures which show boxplots and scatterplots for the electrodes inside the ROIs to evaluate main effects and associations, respectively.

      Finally, given the number of features tested, I think it is important to clarify the strategy used to correct for multiple comparisons.

      We thank the reviewer for highlighting an unclear point. In the revised version of the Statistical analyses section, we have provided missing details of the procedure used for handling false positives due to multiple testing. Basically, we applied the FDR correction for each question we asked.

      For example, “at which time points does dominance remain significantly different from baseline?” or, “which EEG feature and in which area of the scalp shows changes significantly dependent on plasticity induced by monocular deprivation?” For each of these questions, we made a group of tests (for the first example, dependent on the number of points at which ocular dominance was assessed until the morning; for the second example, on the number of EEG features examined multiplied by the number of areas in which they were assessed) to which Benjamini & Hochberg's FDR correction was then applied.
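
      The per-question correction described above can be illustrated with a short sketch of the Benjamini-Hochberg step-up procedure applied separately to each family of tests. The family names, p-values, and the 0.05 level are hypothetical placeholders, not values from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where it is significant under BH FDR."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Every test ranked at or below max_k is declared significant.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, grouped by the question they answer.
families = {
    "dominance_vs_baseline_by_timepoint": [0.001, 0.012, 0.030, 0.200],
    "eeg_feature_by_scalp_area": [0.004, 0.020, 0.045, 0.300, 0.600, 0.800],
}
# The correction is applied within each family, never across families.
results = {name: benjamini_hochberg(p) for name, p in families.items()}
```

      Because the threshold depends on the number of tests in the family, the same raw p-value can survive correction in a small family and fail in a larger one, which is why the grouping of tests into questions matters.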

    1. Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1 and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework: The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition which is indeed typically assessed in a Y- or a T-maze in rodents) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different, are therefore reliant on at least partially dissociable neural circuits and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest. For the working-memory task we do not know the duration of the delay but this really is critical information; by definition, performance in such a task is delay-dependent, and this is not explored in the paper.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. Trials with poor performance thus provide essential feedback, which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      2) Training history vs learning sets vs behavioral flexibility: The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility given that they must identify the new rule while refraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      3) Calcium imaging data versus interventions: The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue: does this really reflect training history, or that a new rule now must be implemented, or something else? I don't really get how this correlative approach can help to address this issue.

    1. Reviewer #1 (Public Review): 

      This study compares concentrations of immune mediators in vaginal samples of young women who report having had or report not having had vaginal sex. The study finds that the concentration of many immune markers is higher in samples of women who report having had sex than in samples of women who report not yet having had sex. While the results are interesting and suggestive, I do not believe this result necessarily indicates that vaginal sex increases levels of these immune mediators (a causal relationship) and that the evidence presented here is strong enough to draw this conclusion. 

      This study presents many methodological strengths. The sample size is amply sufficient to achieve high statistical power for this research question. A particular strength of this analysis is the relatively large number of participants who provided paired before and after sex samples. These samples are particularly valuable because stronger conclusions can be drawn from them, as their comparison is less likely to be confounded by unmeasured confounders. The statistical methods are largely appropriate for the research question, with the use of random effects to account for the correlation in multiple measures per participant. 

      The reason I would not draw causal conclusions from this analysis is that there is a high potential for unmeasured confounding of the association between sex and the concentration of immune mediators. The variables that were included in the multivariable analysis were for the most part not confounders, so the authors cannot claim that their results are free from potential confounding. Confounders are in general variables which are common causes of both the exposure of interest (vaginal sex) and the outcome (level of immune markers), and which are not on the causal pathway and are not a downstream effect of the outcome (inverse causality). The only included variable that is a potential confounder is age. Most other variables (pregnancy, contraception, Nugent score, Chlamydia infection, and HSV-2 seropositivity) are either potential mediators of the effect of sex or downstream effects of the level of immune markers. It does not follow that adjustment for these variables would necessarily lead to an underestimation of the causal effect, as it is possible some of these variables have complex relationships with immune mediators, so it is difficult to predict how adjusting for these variables would influence results. Some of these variables are also potentially colliders, so adjustment for them may lead to bias (see an introduction to this topic in Holmberg MJ, Andersen LW. Collider Bias. JAMA. 2022;327(13):1282-1283. doi:10.1001/jama.2022.1820). There is no consideration of general social determinants of health that are more likely to be confounders because they potentially influence both sexual behavior and the immune system: socioeconomic status, ethnicity, education, employment, housing, food security, access to health care, etc. There is overwhelming evidence that young people who are sexually active tend to have very different socioeconomic characteristics than young people who are not sexually active. 
      It is therefore difficult to assess whether the higher level of immune markers in women who are sexually active truly represents a causal effect of sex or simply reflects differences in the type of women who have sex. 

      The paired analysis also suggests that the main analysis is likely to be confounded. The evidence from the paired analysis is much stronger than the evidence from the unpaired main analysis because the paired analysis inherently adjusts for many unmeasured confounders that lead to women having sex by a certain age; the differences in paired samples are likely much closer to the causal effect of sex than the differences from the unpaired samples. We see that, in the paired analysis, the differences in levels of immune mediators before and after sex are systematically much smaller and non-significant for most immune markers. This suggests to me that the main analysis is confounded and overestimates the effect of sex on immune markers. If there is a causal effect, it is likely to be much smaller than the one estimated in the main unpaired analysis. 
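
      The argument that a paired comparison removes stable unmeasured confounders can be illustrated with a small simulation. This is a sketch on purely synthetic data: the effect size, the noise levels, and the way the confounder drives exposure are all arbitrary assumptions, and the model ignores any effect of time between samples.

```python
import random
from statistics import mean

random.seed(0)
TRUE_EFFECT = 0.2  # hypothetical causal effect of the exposure on the marker
N = 5000           # number of simulated participants

unexposed, exposed_after, paired_diffs = [], [], []
for _ in range(N):
    u = random.gauss(0, 1)  # stable unmeasured confounder
    # The confounder raises both the marker level and the chance of exposure.
    becomes_exposed = u + random.gauss(0, 1) > 0
    before = u + random.gauss(0, 0.5)  # marker level before any exposure
    if becomes_exposed:
        after = u + TRUE_EFFECT + random.gauss(0, 0.5)
        exposed_after.append(after)
        paired_diffs.append(after - before)  # confounder cancels within person
    else:
        unexposed.append(before)

# Between-group comparison: mixes the causal effect with the confounder gap.
unpaired_estimate = mean(exposed_after) - mean(unexposed)
# Within-person comparison: the stable confounder drops out.
paired_estimate = mean(paired_diffs)
```

      In this toy setting the unpaired estimate is several times larger than the true effect, while the paired estimate stays close to it, which mirrors the pattern the reviewer describes.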

      The authors argue that the smaller effects seen in the paired analysis might be due to an effect of time, where samples closer to the start of sex show smaller differences. However, I would need more evidence to be convinced of this. Notably, they use a spline analysis in Figure 4 to show the effect of time since vaginal sex. However, I would have liked to see the p-values for the time-dependent spline effect, in order to see whether the data supports that a difference in slopes before and after sex significantly improves the model. I suspect many of the splines are not significant and may not lend strong support to the hypothesis that time since sex has an effect. It is however difficult to assess this visually without a formal test. 

      While the results from the systematic review and meta-analysis are interesting and show that at least two other studies have shown similar results, I wonder whether these other studies do not have similar issues of confounding. The other previous studies have even fewer paired samples, so are likely to have weaker evidence than the current study. 

      In summary, I think this study has some important methodological strengths in terms of sampling and study design. However, I believe the interpretation of the results should be more tempered and cautious; while there are differences in levels of immune markers in women who have had and not had sex, there is not to my mind sufficient evidence that this difference is the result of a causal effect of initiation of vaginal sex, as there is likely to be some collider bias and unmeasured residual confounding in the analysis.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Radtke et al. use a model of helminth infection in IL-4-IRES-eGFP (4get) mice, in which transcription at the Il4 locus is reported by eGFP, in order to define the transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in the mesenteric lymph nodes (mLN) and lungs. By infecting 4get mice with the hookworm Nippostrongylus brasiliensis, which is well described to induce a robust type 2 immune response, the authors isolated and sorted eGFP+CD4+ T cells from the mLN and lungs at 10 days post infection and performed single cell RNA-seq analysis using the 10X Chromium platform. Transcriptional profiling of activated CD4+ T cells with scRNA-seq has been performed in a murine model of allergic asthma, including the lung and lung-draining lymph nodes, but this study involved unbiased capture of all activated CD4+ T cells (Tibbitt et al., Immunity, 2019). Radtke et al. have used a distinct model with Nippostrongylus brasiliensis and have focused on sorting Il4-licensed, CD4+ T cells, allowing for a greater number of captured CD4+ T cells with a "type 2" lymphocyte program for single cell analysis. Furthermore, this study sought to identify distinct and overlapping transcriptional signatures and clonal relatedness between Il4-licensed, CD4+ T cells in two "distant" tissues. In support of such an approach, there is growing evidence for tissue-specific and model-specific features of CD4+ T cell differentiation (Poholek, Immunohorizons, 2021; Hiltensperger et al., Nature Immunol, 2021; Kiner et al., Nature Immunol, 2021).

      Upon dimension reduction, the authors found mLN- and lung-specific clusters, including two juxtaposed clusters that form a "bridge" between the mLN and lung compartments, suggesting immigrating and/or emigrating cells. Consistent with previous studies, the dominant lung cluster (L2) exhibited unique expression of Il5 and Il13, enhanced IL-33 and IL-2 signaling, and exhibited an effector/resident memory profile. The authors did find a small cluster in the mLN (ML4) with an effector/resident memory signature that also expressed CCR9, suggesting the potential for homing to the gut mucosa. Whether this population is specific to the mLN or would also be found in the lung-draining lymph nodes remains unclear. In the mLN, the authors also describe an iNKT cell cluster with CCR9 expression and a CD4+ T cell cluster with a myeloid gene signature, but the significance of these populations remains unclear.

      The authors then use RNA velocity analysis to infer the developmental trajectory of Il4-licensed, CD4+ T cells from the two tissue sites. Consistent with previous studies, the authors found that T cell proliferation was associated with fate decisions. Furthermore, among the two lung CD4+ T cell clusters, L1 represents highly differentiated, effector Th2 cells while L2, which is juxtaposed to the mLN clusters, represents a population likely entering the lung with the potential to differentiate into L1 cells.

      Next, the authors perform TCR repertoire analysis. The authors identified a broad TCR repertoire with the majority of distinct TCRs being found in only one cell. Among the TCRs found in more than one cell, a substantial number of clones can be found in both tissue sites, which is consistent with the findings that individual CD4+ T cells clones can produce different types of effector cells (Tubo et al., Cell, 2013). The authors find significant overlap of clones between the mLN and lung. In addition, they also identify clones enriched in a particular site and suggest that this represents local expansion. However, an alternative possibility is that certain CD4+ T cell clones are expanded at a particular site because the specific TCR preferentially instructs a particular cell fate. For example, fate-mapping of individual naïve CD8+ T cells suggests that certain T cell clones exhibit a greatly heightened capacity to form tissue-resident memory T cells over other cell fates (Kok et al., J Exp Med, 2020). Lastly, the authors analyze CDR3 sequences, finding the most abundant CDR3 motif belonging to the invariant TCRa chain of iNKTs. Among conventional CD4+ T cells, the abundant CDR3 motifs were not restricted to an exact TCRa/TCRb combination beyond a slight preferential usage of the Trbv1 gene. While TCR repertoire analysis allows for defining clonal relatedness among Il4-licensed, CD4+ T cells, the importance and relevance of the above findings to the in vivo type 2 immune response remain unclear.

      There are several limitations of the study:

      (1) The authors use the term "Th2 cells" to describe all Il4-licensed, CD4+ T cells. While CD4+ T helper cell nomenclature has evolved, Th2 cells and Tfh2 cells are generally used to describe distinct subsets driven by unique transcriptional programs (Ruterbusch et al., Annu Rev Immunol, 2020). While previous data suggested that Tfh2 cells are precursors to effector Th2 cells, subsequent studies support a model in which Tfh2 and Th2 cells represent distinct developmental pathways and should be designated as distinct subsets (Ballesteros-Tato et al., Immunity, 2016; Tibbitt et al., Immunity, 2019). Consequently, the authors' broad use of "Th2 cells" and a description of "Th2 cell heterogeneity" includes CD4+ T cell subsets with distinct developmental pathways that include canonical Th2 cells as well as Tfh2 and iNKT cells. The clarity of the manuscript would be improved by describing eGFP+CD4+ cells as Il4-licensed, CD4+ T cells rather than Th2 cells.

      We thank the reviewer for the helpful comment and state now that our IL-4 reporter positive population also includes cells that don’t meet the Th2 criteria in the introduction (lines 76-78).

      (2) The authors used perfused lungs to isolate Il4-licensed, CD4+ T cells for scRNA-seq of "Th2 cells" in the lung tissue. However, previous studies indicate that leukocytes, including CD4+ T cells, in lung vasculature are not completely removed by perfusion, which confounds the interpretation of a tissue cell profile due to contaminating circulating cells (Galkina, E et al., J Clin Invest, 2005; Anderson, KG et al., Nat Protoc, 2014). This is particularly true in the lung and relevant as the authors found a lung cluster (L2) with a circulating signature and suggested that L2 may represent recent immigrant "Th2 cells". Thus, it is unclear whether the L2 cluster identifies immigrant Th2 cells or simply reflects circulating Th2 cells trapped in the lung vasculature. The study would benefit from using intravascular staining to discriminate cells within the lungs from those in the circulation (Anderson, KG et al., Nat Protoc, 2014) for the proper isolation of Il4-licensed lung CD4+ T cells to truly define immigrant "Th2 cells" within the lung parenchyma.

      Following the reviewer's suggestion, we performed an intravascular staining to discriminate cells within the lungs from those in the circulation (new Figure 2—figure supplement 1). According to the vascularity staining method (with slightly increased time between i.v. and sacrifice compared to Anderson, KG et al., Nat Protoc, 2014 for higher probability of successful staining), the L2 lung cluster is a mixture of circulating cells and immigrating cells, which we describe in the text (lines 210-213). The finding that the cells from the vasculature and the cells we classified as “migrating” seem to cluster together based on the similarity of their expression profiles on our UMAP further supports the classification of the L2 tissue fraction as “recent immigrants”. We thank the reviewer for this helpful comment, which improved the quality of the manuscript.

      (3) The authors describe T cell exchange/trafficking across organs. However, in general, inter-organ trafficking refers to lymphocyte trafficking between distinct non-lymphoid tissues, rather than trafficking between lymph nodes and peripheral tissues (Huang et al., Science, 2018). Rather than inter-organ trafficking, the authors have described shared and distinct features of Il4-licensed, CD4+ T cells from a draining lymph node of one organ (gut) and a distant non-lymphoid organ (lung). The experimental approach used makes interpretation of some of the findings challenging. Specifically, canonical effector Th2 cell differentiation is well described to occur via two checkpoints, including the draining lymph node and the peripheral (non-lymphoid) tissue (Liang et al., Nature Immunol, 2011; Van Dyken et al., Nature Immunol, 2016; Tibbitt et al., Immunity, 2019). In the draining lymph node, Th2 cells acquire the capacity to express IL-4 alone, but do not complete effector Th2 cell differentiation until trafficking to the inflamed peripheral tissues and receiving additional inflammatory signals. Consequently, it is unclear whether the differences identified in the mesenteric lymph node and lungs simply reflect well-described differences between the two Th2 cell checkpoints or organ-specific differences (gut vs lung). Il4-licensed, CD4+ T cells from the intestinal mucosa and lung-draining lymph node would also be needed to truly define organ-specific differences during helminth infection.

      Following the reviewer's suggestion, we now avoid the term “inter-organ trafficking” and have replaced it with “at distant sites” in the title. As the reviewer points out, we chose to compare a lymphoid and a non-lymphoid organ to obtain a broad picture of Th2 developmental stages in Nb infection. The limited overlap in clusters on the UMAP shows that expression profiles differ strongly between MLN and lung. However, this observation is not in conflict with the cells of the two organs being at different developmental stages. We added information to highlight this in the manuscript (lines 99-101). Lung and MLN (rather than medLN and MLN) were selected to enable analysis of the clonal relatedness and distribution of T cells at distant sites. As part of the revision, we additionally provide newly generated single-cell sequencing data that compare medLN and MLN cells at day 10 after Nb infection and find that UMAP clusters largely overlap between medLN and MLN (new Figure 1—figure supplement 3). This suggests that there is no broad medLN/MLN site-specific signature that would force the medLN and MLN cells to cluster apart. Adding the newly generated medLN/MLN data to the lung/MLN UMAP based on shared anchors (Stuart et al. Cell. 2019) also leads to a clear separation between all LN and lung cells, supporting the view that cells do not cluster according to a site-specific respiratory tract vs intestinal tract signature but more likely according to developmental stage (new Fig. 1C,D). Exceptions are defined effector clusters that show signs of a site-specific signature (L1 expresses Ccr8, MLN4 and MLN6 express Ccr9; differences are also suggested by the clustering described in lines 247-252). A phenotype similar to the one observed at the transcriptional level is seen when we cluster medLN/MLN and lung cells by flow cytometry using surface markers suggested by the scRNAseq data, extending the analysis to medLN at the protein level (new Fig. 3). 
It would also have been interesting to include lamina propria T cells, as the reviewer suggested, but we were not able to extract high-quality cells at day 10 after Nb infection, which is a common limitation of the Nb model.

      (4) The study includes a single time point (day 10) whereas Tibbitt et al. performed scRNAseq in the lung and lung-draining lymph node at multiple time points during type 2 immunity (Tibbitt et al., Immunity, 2019). As a result, it remains unclear how similarities or differences between the mesenteric lymph node and lung response would change over the duration of helminth infection, especially given the helminth life cycle involves multiple infection stages.

      As part of the revision, we screened for surface marker expression in the single-cell sequencing dataset at the transcript level and stained for these markers at the protein level (new Fig. 3 and Figure 3—figure supplement 1). This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) by flow cytometry during Nb infection. We compared medLN, MLN and lung. The dynamics of the response in the medLN and the MLN seem similar, with a small delay in the MLN compared with the medLN.

      Nb, with its relatively well-defined migratory path through the body, provides a relevant, complex model antigen that is naturally present in the respiratory tract and the intestine during infection. However, this complexity and relevance often come with limitations. While stage 4 larvae are found in lung and gut and certainly provide a shared antigen basis between both sites (migration stage from lung to intestine; Camberis et al. Curr Protoc Immunol. 2003), we also think that a reasonable number of antigens are shared between different larval stages and that antigens (either actively secreted or from dying larvae) are systemically distributed. There are probably immunogenic differences between larval stages, but analyzing these is beyond the scope of the manuscript.

      While Tibbitt et al., for example, nicely define cell clusters with a limited number of cells, they do not include any TCR analysis or clonal information. Not much was known about the expansion of T cells in the different clusters within one organ and between organs, and we provide relevant data in this regard. Furthermore, HDM as an allergy model might invoke different Th2 differentiation pathways; for example, Tfh13 cells are found in allergic settings but not in worm models (Gowthaman U, Science. 2019). With our approach at the single-cell level, we were able to show effective distribution of a number of T cell clones in a highly heterogeneous immune response, and to describe and functionally validate successfully expanded clones / expanded TCR chains later on (e.g. new Fig. 6). This kind of analysis has not been performed for a worm model before.

      (5) The study analyzed one scRNA-seq experiment that included two mice without validation via flow cytometry or other method to infer a role of a particular finding to the type 2 immune response in vivo.

      As noted above, we screened for surface marker expression in the single-cell sequencing dataset at the transcript level and measured these markers at the protein level by flow cytometry, as the reviewer suggested. This allows us to follow the populations defined by scRNAseq longitudinally (d0, d6, d8, d10) during Nb infection (new Fig. 3). Furthermore, we added a newly generated set of scRNAseq data, which confirms and extends findings made in the initial sequencing experiment (Fig. 1C,D and Figure 1—figure supplement 3). We also included validation experiments based on the TCR analysis: we retrovirally expressed three TCRs from our study and confirmed Nb-specific expansion for one of them in vivo (new Fig. 6 and Figure 6—figure supplement 1).

    1. The third UDL principle is to provide multiple means of expression and action. We find it helpful to think of this as the principle that transcends social annotation: at this point, students use what they’ve learned through engagement with the material to create new knowledge. This kind of work tends to happen outside of the social annotation platform as students create videos, essays, presentations, graphics, and other products that showcase their new knowledge.

      I'm not sure I agree here as one can take other annotations from various texts throughout a course and link them together to create new ideas and knowledge within the margins themselves. Of course, at some point the ideas need to escape the margins to potentially take shape with a class wiki, new essays, papers, journal articles or longer pieces.

      Use of social annotation across several years of a program this way may help to super-charge students' experiences.

    1. “How might we, both individually and as a society, creatively generate new visions of what it means to grow old?”

      I agree with Minha's assessment of the project. Her research question is phrased perfectly for the overall topic of these combined videos. I can't stop, and I think I won't stop, thinking about what it truly means for me to age. Each voice represents a background that provides a resource for both the voice owner and the audience to answer this question. Aging for me means being more cautious with words and actions. I consciously do this because I see everyone around me go through this process and talk about it. Aging for me means looking at my grandparents and thinking about what I will do and what I will look like when I reach their age. I thought about this question a few times when I was much younger, then there was a long period of me not worrying about it at all, and in college the question came back to me more frequently. I often ask myself if my future kids/grandkids (if I ever have them) would care about me, and life after death is something that seems to have been in my head for the longest time. Aging for me means carrying new responsibilities. I know that there are things that were acceptable when I was one year younger and became inapplicable for me the year after, and vice versa. "What does it mean to age?" is repeatedly asked throughout the video, motivating us to give it a try and craft our own response. This research question summarizes well, and gives a bigger and better understanding of, the purpose that these 'storytellers' and collaborators embed in this project. Like taylortots, I may revisit this project from time to time with newer perspectives about the definition of growing old. Thank you for the insightful post!

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the relationships between humans' heartbeats and their ability to perceive objects using touch.

      Strengths: This study is a large and sophisticated one, with great attention to detail and systematic analysis of the resulting data. The hypotheses are clear and the study was carried out well. The presentation of the data visually is very informative. With such a large and high-quality set of data, the conclusions that we can draw should be clear and strong.

      Weaknesses: The main drawbacks for me were first, exactly how the data were analysed, and second that there seem to be too many results reported to get an overall view of what the study has found.

      First, there are always a number of choices that researchers can make when analysing their data. Too many choices in fact. So we always need to see a consistent, principled, and transparent account of how those choices were made and what the effects on the data were. At present, I think this needs to be improved, partly in the justification of the analyses that were done; partly by re-doing some analyses and the presentation of results.

      Second, I admit to being a little lost when trying to understand all of the analyses - why they were done, what choices were made, and what the findings were. In some cases, it felt a little bit like the analyses were decided on only quite late - after exploring the data. One clear way to address this would be to divide the main results into two kinds: confirmatory (those that the authors expected to do before the study was run), and exploratory (those that the authors decided to do only after seeing the data). This would be both good practice and would help to focus the reader on what are the most critical findings.

      Achievements: I think the presentation of results needs to be strengthened before I can decide whether the aims are achieved.

      Impact: This will also depend on the revision of the results.

      We thank the Reviewer for these comments. In the original manuscript we thought we had been clear as to which analyses were planned and which were exploratory. The planned analyses are in keeping with the previous studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021; Grund et al. 2021). The only exploratory analysis was the inclusion of touch variance as a covariate. We had not expected that participants would differ so much in how long they held their touch.

      Reviewer #2 (Public Review):

      In this article, the authors set out to discover whether the cardiac cycle influences active tactile discrimination, to better understand the putative relationship between interoception rhythms and exteroceptive perception. While numerous articles have looked at these relationships in the passive domain, here the authors designed an innovative active sensing task to better understand the interaction of sensorimotor processes with the cardiac rhythm.

      The authors report a series of consecutive analyses. In the first, they find that while active discriminative touch is not modulated by the cardiac cycle, non-discriminative touch is such that the start, median duration, and end time of touches are shifted forward along the cardiac cycle towards diastole. Next, the authors examined the proportion of total start and end touches within systole versus diastole and found that across both discrimination and control conditions, touch was roughly 10-25% more likely to terminate during diastole. Further, examining the median holding time, the authors found that touches initiated during systole were lengthened in duration, consistent with a perceptual inhibition by this phase. This last effect appeared to be greatest for the highest stimulus difficulty levels, further supporting the notion that some cardiac inhibition of sensory processing may be at stake. Finally, when examining physiological responses, the authors found that cardiac inter-beat intervals were lengthened during active touch, consistent with the hypothesis that the brain may exploit strategic cardiac deceleration to minimize inhibitory effects.

      Overall, the key effects of the manuscript are fascinating and robust. A major strength of the approach here is the task itself, which utilizes a well-controlled stimulus with multiple levels of task difficulty, as well as an elegant positive control condition. This enabled the authors to look rigorously at difficulty and stimulus condition interactions with the cardiac phase. This clearly pays off in the analyses, as the authors are able to construct a more informative story about how precisely cardiac timing events modulate perception.

      Statistically speaking, I found the overall approach to be rigorous and sound. The study is well powered for a psychophysical investigation of this nature, and the interpretation of results is based on robust effects in the presence of a strong positive control.

      We thank the reviewer for these positive comments on the original version of this paper.

      Reviewer #3 (Public Review):

      The manuscript presents a carefully designed and well-controlled study on active tactile perception and its relationship to internal bodily rhythms - the cardiac cycle. This work builds on previous studies which also showed that active perception/voluntary actions occur in certain phases of the cardiac cycle, but the previous research failed to show/was not designed to show the significance of these synchronizations for perception or behaviour. To my knowledge, this is the first report that seems to experimentally show that active perception in the cardiac diastole leads to behavioural advantages - better tactile discrimination.

      The manuscript itself is very clearly written, the introduction is concise but sufficient, while the results section is very well organised and I especially like how the authors guide the reader through the analysis and additional steps taken to understand the findings even better.

      Yet, despite careful study design, effective visualisations, and elegantly constructed story, there are some analytical choices that, in my opinion, are not sufficiently justified or explained (e.g., selecting a diastolic window equal in length to the duration of systole, instead of using the whole duration of diastole). Such analytical decisions could have (at least some) effects on the obtained results and thus conclusions drawn.

      We thank the Reviewer for these comments. The analyses referred to here were planned, and specifically the choice of the windows for defining systole and diastole was identical to that in the studies in the literature on which this study was based (Al et al. 2020; Al et al. 2021).
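The windowing choice discussed in this exchange (a diastolic window equal in length to systole, rather than the whole of diastole) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function name `cardiac_phase`, the fixed `systole_len`, and the placement of the diastolic window at the end of the R-R interval are not taken from the manuscript, whose actual procedure may differ.

```python
from bisect import bisect_right

def cardiac_phase(event_time, r_peaks, systole_len):
    """Label an event as 'systole', 'diastole', or 'other'.

    Assumptions (hypothetical, for illustration): r_peaks is a sorted list
    of R-peak times in seconds; systole is the first systole_len seconds
    after an R peak; diastole is an equal-length window at the end of the
    same R-R interval. Times between the two windows are labelled 'other'.
    """
    i = bisect_right(r_peaks, event_time) - 1
    if i < 0 or i + 1 >= len(r_peaks):
        return None  # event is not inside a complete R-R interval
    start, end = r_peaks[i], r_peaks[i + 1]
    # If the interval is shorter than 2 * systole_len the windows overlap;
    # in that case this sketch gives precedence to systole.
    if event_time - start < systole_len:
        return "systole"
    if end - event_time <= systole_len:
        return "diastole"
    return "other"

phases = [cardiac_phase(t, [0.0, 0.8, 1.6], 0.3) for t in (0.1, 0.4, 0.7, 1.7)]
```

Using the whole of diastole instead would amount to dropping the `'other'` category, which changes how many events fall into each phase and is why the choice can affect the results.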

    1. Reviewer #2 (Public Review):

      Context: The authors propose a new analysis of an already well-studied conceptual model of adaptation to a new environment. Individual genotypes are characterized by some (breeding value for) phenotype under Gaussian stabilizing selection (meaning that fitness is a Gaussian function of phenotype, centered around some optimum value). The scenario assumed is that an isolated population of fixed size is initially at equilibrium (between mutation, selection and genetic drift). This population is diploid and sexual with many unlinked loci acting additively on phenotype (across loci and between homologous chromosomes). This view simplifies the analysis but is also not inconsistent with various empirical analyses of locus-specific effects on quantitative traits (the empirical support is discussed and reviewed in both introduction and discussion).

      Then a change in the environment induces a shift in the optimum without affecting any other parameter (strength of selection, population size, mutation effects, existing phenotypes), see figure 1. We wish to know how the population responds to this change, both in terms of phenotype distributions, and the underlying genetic basis (how alleles of various effects change in frequency and contribute to the phenotypic response).
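For readers who want the scenario in symbols, the setup described above can be written as follows. The notation is assumed here for illustration and may differ from the manuscript's: $z$ is the phenotype, $\theta$ the optimum, $V_s$ the width of the fitness function (larger $V_s$ means weaker selection), and $\Lambda$ the size of the shift.

```latex
% Gaussian stabilizing selection around the optimum \theta
W(z) = \exp\!\left(-\frac{(z-\theta)^2}{2V_s}\right)

% Environmental change: only the optimum moves; V_s, population size,
% and mutational effects are left unchanged
\theta \;\longrightarrow\; \theta + \Lambda
```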

      This process has been at the core of the modelling of adaptation for more than a century, as it is maybe the most natural conceptual framework to describe adaptation to a new environment (a "niche shift" so to speak). It is relevant to both the study of demographic/ecological and phenotypic responses to changing conditions, and to the genomics of the changes associated with this process. However, in spite of this long history (reviewed in introduction in broad lines), we do not have an exact mathematical description of this process. The reason is that the problem is in fact very complex: the genome is a sea of various genes, each bearing various alleles (depending on the individual), that further interact mutually by selection (even though loci are additive on phenotype), because fitness is not a linear function of phenotype. The simple population genetics with two alleles and one locus seem far away...

      I think it is fair to say that the main route to handle this problem, in predominantly sexual species, has been through the approximations of quantitative genetics. There, each locus is assumed of small effect and linkage disequilibrium between them is neglected. This has led to empirically testable, and often quite accurate, predictions on the response to selection in terms of mean phenotypic change. Yet, even under this broad approximation strategy, there are various ways to derive predictions, each neglecting one force or another (genetic drift most of the time), or looking at the process over short or longer timescales.

      Aim and achievements: The authors place their work within this broad framework, but set out to derive new approximations that are intended to cover several of the existing approaches as subcases, and especially to handle genetic drift effects in finite (large) populations, and short vs. longer timescales. I believe they succeed quite well in doing so: they provide clear approximation methods (mostly in the appendix) and substantial simulations to show their accuracy. The derivations are fairly technical, but most of the time they manage to give an intuition of where they come from and illustrate this intuition via figures in the main text. They produce a prediction of two main observable dynamics: that of the (breeding value for) phenotype itself (its mean over time, variance, third moment), and that of the genetic contribution of various loci and alleles along the genome (depending on the allelic effect on phenotype). They also describe two timescales where the dynamics are fairly different: a short timescale where the mean phenotype is shifting (quite rapidly, over tens to hundreds of generations) towards the new optimum, and a longer timescale where the higher moments, and mostly the genetic basis, change while the mean phenotype merely wanders in a narrow vicinity of the new optimum. The connection between the two timescales is important, as it is the slight differences in allele fates during the first one that result in differences in long-term behavior in the longer one (illustrated in figure 3).

      The main achievement on the phenotypic response is mostly to reobtain previous approximations under somewhat different or broader assumptions. This is not useless, as it may explain why these known predictions (the "Lande model") are surprisingly robust to deviations from the required conditions (e.g. figure 2). However, I think that some extra exploration of the parameter space (away from the required conditions) would make it possible to see when the Lande model does fail on mean phenotype dynamics over short timescales, as anticipated. Whether this range is relevant remains open to empirical measurement. Therefore, the main contribution of this ms is not on phenotypic responses but on the underlying genetic basis, and what we may expect to observe when measuring QTLs or GWAS between two populations separated by an environmental shift in the past: are there many loci each contributing a limited difference, or fewer loci contributing most of it? In that respect, eqs 20-21 and 25-26-27, and figures 5 and 6, display the main findings and their check by simulations. These findings, although stemming from quite elaborate derivations, yield a fairly simple and yet accurate outcome, at least in the parameter range studied. Various other parameter sets are also checked against simulations in the appendix, and the simulation code is made available for any further check (as exploring all the possible parameters is a fairly daunting task, probably for an article of its own).
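The kind of mean-phenotype prediction referred to above as the "Lande model" can be sketched numerically. This is a minimal sketch under textbook assumptions that are not taken from the manuscript: Gaussian fitness of width `v_s`, normally distributed phenotypes with variance `v_p`, and a constant additive genetic variance `v_a`. The function and parameter names are hypothetical.

```python
def lande_mean_trajectory(shift, v_a, v_s, v_p, generations):
    """Distance of the mean phenotype from the new optimum, per generation.

    Assumes (for illustration) that the per-generation response is the
    additive variance times the selection gradient, with the gradient
    equal to distance / (v_s + v_p) under Gaussian stabilizing selection.
    """
    distance = shift          # right after the shift, the mean lags by `shift`
    trajectory = [distance]
    for _ in range(generations):
        beta = distance / (v_s + v_p)   # selection gradient towards the optimum
        distance -= v_a * beta          # response to selection this generation
        trajectory.append(distance)
    return trajectory

traj = lande_mean_trajectory(shift=2.0, v_a=1.0, v_s=49.0, v_p=1.0, generations=10)
```

Under these assumptions the distance to the optimum shrinks by a constant factor `1 - v_a / (v_s + v_p)` each generation, which is the rapid short-timescale phase the review describes. The sketch ignores drift and changes in variance, which are exactly the effects the manuscript is said to address.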

      Limits: I believe the main limit of this work is fairly explained in the discussion: to achieve mathematical tractability (a full numerical treatment being inherently impossible given the many parameters), many simplifying assumptions must be made (simple fitness landscape, simple effect of the environmental change, simple demography etc.). This means that it is possible that empirical observations will differ from the predictions for various reasons. However, quantitative genetics has already proven reasonably robust and accurate in predicting observed phenotypic dynamics using comparable approximations, so it is not madness to hope that the same will happen concerning the genetic basis of adaptation. Also, I would suspect that the methods proposed in the appendix will most likely extend fairly easily to some deviations from the model's assumptions: a change in phenotypic variance with the new environment (a form of plasticity), in the width of the fitness function, or in the population size, without too much effect on the main conclusions. Still, some other limits may not be overcome as easily (e.g. pleiotropy among multiple traits, or a non-stationary optimum), but it seems (a priori) that part of the approach could still be adapted for these situations. The main "wall-hitting" limit of the paper is inherent in the very basis of the approach, namely assuming mild changes occurring in numerous, weakly linked polymorphic loci as opposed to strong changes occurring in fewer, more tightly linked loci. These limits are all fairly described in the discussion.

      Overall, this paper is not an easy read, but not by lack of clarity, rather because the problem at hand is complex, and there is a lot of material to describe. Each part flows quite well in my opinion, but there are many parts to read.

      Potential impact: I believe that because it yields relatively simple analytic outcomes (at least the predictions in main text), the paper could be useful to data analysis, mostly in the field of genomics of adaptation where it may provide testable predictions for GWAS and QTL data. It could also be used to infer genetic distributions (v(a),f(a)) from observed QTL or GWAS data, if the model is deemed valid.

      In the field of theoretical population genetics, it may also provide a methodology to capture sexual adaptation dynamics in other contexts by mixing various approximation methods: connecting distinct timescales, and connecting deterministic approximations for phenotype with diffusion approximations for allele frequencies. This may not be the first time of course (see e.g. "stochastic house of cards" and its extensions), but here it is used in the context of adaptation dynamics rather than equilibria, for the first time I think.

    1. Rule #7: Predict the future. The same way you would predict what's going to happen in the next season of your favorite show. Is Beechum going to kill the President? Figure out what you think is going to happen in the future based on the details of what's happened in the past. This can however very quickly lead into the mistake of Historicism. The predictive power of History has a limit. While it is true that you can identify historical trends and make educated guesses as to them happening again if the conditions are met, I don't think it's necessarily a good thing to rely on it. The reason is that we then begin to look for similarities and lose sight of other factors which may change the situation, and we run the risk of attempting to formulate historical laws. In this sense I agree with Karl Popper, in that these 'laws' aren't falsifiable or testable; unlike the sciences, history does not have this luxury.

      Predict with the objective of validating and changing your mental model, trying to be as comprehensive as possible. Try to think about the powers that rule and how their opposition can react to their actions.

      Do not extend your prediction too far. The further out you try to predict, the more the chance of a critical error in your prediction grows, exponentially.

      Some propms always have reveal valuable and memorable information about the period.

      Changing Your Mental Model, Not To Get IT

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting and well-performed study that adds to the literature base. The authors investigated the role of a discrete brain pathway in binge drinking of alcohol. They adopted a multidisciplinary approach that overall suggested that alcohol-induced changes at synapses of anterior insula (AI) cortex inputs to the dorsolateral striatum (DLS) maintain binge drinking. Further, they suggest this may be a biomarker for the development of alcohol use disorder (AUD).

      Strengths:

      1. Extends previous studies and builds further evidence for AI→DLS involvement in aberrant alcohol intake.

      2. Adopts elegant approaches to isolate the defined connections. This included in vivo optogenetic stimulations (both open and closed loop), recording of defined synapses in slice preparations, applying in vivo optogenetic stimulation parameters to isolated brain slices

      3. Well-controlled for the most part, although at times the authors assert "specific" effects without unequivocal proof. For example, the insula also projects to the ventral striatum and this pathway has been implicated in regulation of alcohol intake in rodent models (Jaramillo et al., 2018), and is activated in heavy drinking humans during high threat related alcohol cue presentation (Grodin et al., 2018).

      4. Measures the microstructure of drinking behavior in subjects.

      5. Employed an artificial neural network and machine learning to interrogate data. After training the network it could predict both the fluid consumed (water vs alcohol) and the virus type based on drinking microstructure data.

      6. Applied a series of behavioral tests to confirm that stimulating the defined pathway was not in and of itself reinforcing, anxiogenic or altered locomotion.

      Weaknesses:

      1. Only used male mice, in humans binge drinking in females is a major problem and rates of AUD between males and females have been converging in recent times (Grant et al., 2015).

      We took age-matched female mice that were injected with AAV-ChR2 into AIC and had them undergo the same 3 weeks of Drinking in the Dark, to replicate the male data displayed in Figure 1 with an experimental focus on AIC inputs. We then performed whole-cell patch-clamp electrophysiology in DLS brain slices from these female mice. We measured optically evoked input-output responses (oEPSCs), NMDA/AMPA current ratios (oNMDA/oAMPA), and paired-pulse ratios (oPPR). These data are presented in supplemental figure 4. In contrast to males, we did not observe any effect of alcohol consumption on AIC inputs into the DLS of female mice. We also combined the male and female datasets to determine statistically whether there were sex differences for these specific measures, as indicated by a main effect and/or a sex x fluid interaction. We report these statistics in the text from lines 180 to 195, where we note that we did not find a sex x fluid effect for oEPSCs but did find a sex x fluid effect for our oNMDA/oAMPA synaptic plasticity measure. This finding further justifies conducting the behavioral experiments and circuit manipulations solely in male mice.

      While this is a fascinating sex difference and important data for the field, this manuscript is not specifically about exploring sex differences per se. We believe we have done our due diligence and correctly reported the existence of sex differences, or the possibility of sex differences, but the electrophysiological findings that we later modulate in vivo are only present in males. We point out that future work is needed to determine the contribution of circuit-specific changes in females at these synapses. Ultimately it will take much more work to fully elucidate sex difference circuit-specific mechanisms that we feel are far beyond the scope of this manuscript.

      2. At times over-interpreted, especially with regards to specificity.

      We are not exactly sure what the reviewer is referring to with “regards to specificity,” but we have done our best to address what we think they are asking and hope that we have adequately addressed this critique. We added sentences (lines 173-178) regarding alcohol-induced plasticity at other inputs to DLS that were not tested and (lines 442 - 446) how we are not sure whether these synapses control consumption of other non-alcohol substances (but point out our prior sucrose drinking data from Muñoz et al., Nat. Comm. 2018).

      3. Lacks a mechanism, although the authors do acknowledge this.

      This is just a first step towards discovering a mechanism. We previously identified an unusually alcohol-sensitive synapse and are now elucidating its behavioral role and some associated plasticity at that synapse that may be part of a mechanism. With our new single session alcohol data to compare our 3 week drinking data to, we are closer to beginning the process of discovering a mechanism. Additional work that is beyond the scope of this manuscript is needed.

      4. I would like some more discussion about the potential for this to be a biomarker in humans.

      We have removed language in the body of the manuscript and expanded on the implications of our findings at the end of our results and discussion from lines 514 to 548.

      Reviewer #3 (Public Review):

      Haggerty et al. assess how the projection from the agranular insular cortex to the dorsolateral striatum contributes to binge drinking in mice. The authors use whole-cell patch-clamp electrophysiology to examine synaptic adaptations following binge drinking (Drinking-in-the-Dark) in male mice, finding a constellation of changes that include increased AMPA and NMDA receptor function at insula synapses onto striatal projection neurons. They go on to assess a causal role for this projection in regulating binge drinking using optogenetics, finding that stimulating insula->striatal transmission in vivo reduces total ethanol consumed during DID, along with several specific behavioral measurements of drinking microstructure. One of the most interesting of these findings is a decrease in "front-loading", or drinking during the very beginning of the session, a phenotype that has been associated with problematic drinking and alcohol use disorder in humans. Finally, the authors use machine learning to build a predictive model that can reliably discern stimulated mice from controls. These studies improve our understanding of the neurocircuitry that mediates binge drinking and synaptic and circuit adaptations that occur following binge drinking. Experiments are blinded and performed in a rigorous manner, including physiological validation experiments in support of the in vivo optogenetic manipulation. Despite many strengths, there are significant limitations and gaps in the electrophysiology studies included in this version of the manuscript. As acknowledged by the authors, there are curious findings that are seemingly at odds with each other, and further studies addressing cell type specificity and/or feedforward inhibition would significantly improve the interpretation of this work. 
Furthermore, the manuscript would be significantly improved by an expanded Introduction containing more specific background information along with a standalone Discussion to place these findings within the broader literature. Lastly, a major limitation of these studies is the low number of mice used for the in vivo optogenetic control experiments and the exclusion of female mice throughout.

      Major concerns:

      1) Expanded Introduction and Discussion. The Introduction does not discuss and/or downplays historical literature investigating neuroadaptations following binge drinking. Studies examining changes in glutamate receptor function within striatal circuits should be discussed in greater detail, rather than the broad pass and review citation included. Behavioral studies examining how the function of the insula and DLS regulate ethanol exposure should also be discussed, especially including work examining the insula to accumbens pathway. It would also be worthwhile to reference human studies implicating the insula and DLS in AUDs.

      We have expanded the introduction and discussion to include these topics.

      2) It is difficult to form a comprehensive picture of the electrophysiological changes reported in Figure 1. The data seem to indicate increased AMPAR function, even more increased NMDAR function, decreased glutamate release probability, and decreased population spikes. These conflicting findings are acknowledged and there are two possible factors mentioned in the manuscript - differential engagement of MSN populations and changes in feedforward inhibition through local interneurons. I disagree with the authors' dismissal of potential MSN subtype-specific effects contributing to these discrepancies. Although AIC inputs innervate D1 and D2 MSNs comparably under control conditions, it is quite possible that the pathways are differentially altered following DID, as has been observed in many reports of alcohol or drug exposure (e.g. Cheng et al. Biological Psychiatry 2017). On the other hand, I wholeheartedly agree with the authors that AIC-driven feedforward inhibition through local interneurons (or even MSNs) could explain the curious divergence between the synaptic and population-level changes depicted in Figure 1. I think additional experiments to help connect the dots are critical in interpreting the changes described in this manuscript. The authors could consider targeted recordings from specific cell types (e.g. D1, D2, and/or interneurons), measurements of AMPA/NMDA receptor subunit stoichiometry, and/or additional experiments in conditions where feedforward transmission is blocked (e.g. PTX or TTX/4AP).

      The reviewer has excellent points that will help elucidate a mechanism. Many of these suggestions are planned experiments in our laboratory, but are, in our opinion, beyond the scope of the present manuscript. Please see our response to Reviewer #2’s 3rd stated weakness. We have revised the text to incorporate some of the points raised here.

      3) N=2 mice in the ICSS experiment in Figure 4J is not sufficient to interpret, and including error bars on this data set is misleading. There also appears to be a difference in distance traveled between GFP and ChR2 mice in Figure 4C, but statistics are not reported. It is also hard to understand what that might mean given the way these data are normalized.

      For this revised manuscript we reran this experiment with 6 animals per group and updated Figure 4 I and J and the accompanying methods section titled “Intracranial self-stimulation” to reflect the change. We also note that the new, correctly powered experiment confirmed the previous claim that AIC inputs to the DLS do not modulate operant responding behaviors.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors have used every possible combination and permutation of treatments at different stages of diapause and post diapause development in the mouse and used conditional gene knockouts at different stages to tease out the interactions of Foxa2 with Msx1 and LIF in the reactivation and implantation process in mice. The authors extend diapause further after treatments with progesterone and an estrogen-degrading chemical to show that this will prolong diapause in the presence of Msx1. Overall this study advances our knowledge of the cross-talk between uterine endometrium and the blastocyst during and after the remarkable phenomenon that is diapause.

      Strengths

      Demonstrating that Msx1 is critical to maintaining diapause, and that diapause is maintained in Foxa2 deficient mice have clarified their interactions. It is interesting that LIF triggers implantation on day 8 but cannot support the pregnancy to full term. Suppression of the estrogen effects by progesterone or fulvestrant increases the duration of diapause. Demonstrating that Foxa2 induces diapause via interactions with MSX1 shows Foxa2 plays such an important role in the control of diapause and adds another 'cog' to the complex wheel of its control.

      Weaknesses

      There is an assumption that everyone will understand the various manipulations that are done in this study - some effort needs to be made to clarify each experimental stage. How long are the embryos viable after the extension of the diapause by the various manipulations?

      The very positive review by a well-known expert in the field of diapause is reassuring, and we agree with her suggestions to improve the quality of the manuscript. As recommended, we now provide a scheme to summarize our findings to illustrate the length of embryo dormancy (see Fig. 7).

      Reviewer #3 (Public Review):

      Matsuo et al. have authored a manuscript describing the effects of depletion of the forkhead box gene, Foxa2, on embryogenesis and gestation in the mouse. The effects of this treatment are the induction of the diapause arrest in the development of the embryo and consequent dormancy. The manuscript is well prepared, and the figures, for the most part, are didactic and interpretable. Although the conclusions are interesting, the principal weaknesses of the manuscript are the lack of novelty and the perceived absence of some controls and follow-up experiments.

      Controls and Follow-ups:

      1) The Cre/lox system depletes rather than deletes genes. Although in situ data are presented, these are not judged to be quantitative. The usual qPCR analysis of tissues could have established the quantity of depletion. Stupid but can be done. This is important because the frequency of implantation sites in both Cre/lox models (lines 111-113) may be attributable to the residual expression of Foxa2.

      The Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ mouse models used in the current study have been used in previous studies (refs 7 and 8 in the manuscript). The deletion efficiency of Foxa2 in Foxa2f/fPgrCre/+ mice was examined by RT-PCR and IHC (figure 2 in ref 7), while the deletion efficiency in Foxa2f/fLtfCre/+ mice was examined by IHC (figure S1 in ref 8). The deletion efficiency has been proven by hundreds of publications since the generation of Pgr-cre mice in 2005 and Ltf-cre mice in 2014.

      Although these mouse lines have been used before, we confirmed the deletion of Foxa2 at the beginning of our study at protein levels (fig 1c) and RNA levels (fig 1d). We understand that the reviewer is trying to link the observation that some of the knockout animals still carried implantation sites on day 8 of pregnancy with the possibility that the deletion of Foxa2 is not complete. However, it is not uncommon to observe such phenotypes that are not fully penetrant even in systemic knockout mouse models. Nonetheless, we now provide real time PCR results of uterine Foxa2 on day 4 of pregnancy in all mouse models used in the current manuscript in the new supplemental figure 1.

      2) The most novel and salient finding of the present study is that the depletion of Foxa2 results in embryos that are in a state that "morphologically resembled dormant blastocysts". A useful experiment would have been to transplant these embryos to normal recipients or to culture them in vitro to determine whether they were capable of reactivation from the dormant state.

      Whether dormant embryos in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be reactivated is the main question we studied. The results in figures 4-6 address this question. The blastocysts in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri can be activated on day 4 as shown in figure 4b. Without any support, blastocysts in Foxa2f/fLtfCre/+ uteri can still be reactivated on day 8 (figure 4b). In the following experiments and results shown in figures 5 and 6, we tried to improve the uterine environment by supplementing progesterone and estrogen. Dormant embryos were successfully reactivated by a LIF injection and the pregnancies proceeded to full term.

      This reviewer suggests using normal recipients to test the reactivation of dormant embryos. Given that dormant embryos can be reactivated in a knockout uterine environment, embryo transfer experiments using normal recipients are an additional measure to test the integrity of embryonic dormancy. The embryo transfer experiments may be a futile attempt in our studies for the following reasons.

      The numbers of mated mutant females that yield blastocysts are relatively meager and so are the numbers of blastocysts recovered, especially from diapausing donors. It is well known that implantation rates after blastocyst transfer are compromised due to the surgical trauma and anesthesia. Therefore, the results from these experiments may not provide meaningful information.

      Furthermore, during the pandemic our mouse colonies were drastically reduced, and we are still recovering from this downturn during this “New Normal”. Notably, pregnancy rate fluctuates throughout the year even if mice are housed in a controlled environment, and pregnancy rate is often relatively poor in mutant mice, which of course depends on the genetic background and diets (DOI: 10.1126/scisignal.aam9011). Most importantly, viability of diapausing embryos is amply evident from our experiments (Figs. 4-6).

      3) Figure 3C indicates that embryos recovered on Day 8 had an extensive proliferation of ICM cells, but not trophoblast. Previous studies have explored the progression of entry and exit from diapause in the mouse (DOI: 10.1093/biolre/ioz017) showing that reactivation of the embryo from diapause commences in the ICM and then proceeds to the trophoblast. It therefore may be possible that proliferation in the trophoblast is not suspended, but rather that the recovered blastocyst has resumed development and that mitotic activity has not yet reached the trophoblast.

      It is common to see Ki67 expression in the ICM of dormant embryos. Figure 4D from the paper quoted by this reviewer presents Ki67 staining on embryos undergoing diapause at different stages. In our study, we showed Ki67 staining on dormant embryos collected on day 8, which equals D7.5 in their figure. Our data in figure 3C are consistent with the observation shown there. Without LIF, embryos remain dormant in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri.

      4) In Figure 4B, neither the Ltf nor the Pgr Cre treated uteri appear normal on Day 8. This is not consistent with the conclusion in lines 170 et seq. of the manuscript. It is difficult to discern normality from Figure 4C, but it is clear that the PgrCre-lox uterus does not conform to the controls. It is later noted that there is edema in the uteri at this time in the Day 8-treated PgrCre/lox mice (lines 217-218).

      We have clarified our description.

      Lines 173-176: Notably, implantation sites with a normal appearance were observed in Foxa2f/fLtfCre/+ uteri when LIF was given on day 8 of pregnancy (Figure 4b), albeit Foxa2f/fPgrCre/+ uteri with edema have only faint blue bands. Histology of implantation sites confirmed this observation.

      In line 217, we stated that “the uterine edema in Foxa2f/fPgrCre/+ females two days after LIF injection on day 8…”. Figure 4B showed that Foxa2f/fPgrCre/+ uteri with edema have some very faint blue bands, suggesting an implantation-like reaction. But we do not think they are real implantation sites, which is confirmed by figures 4c and e.

      5) In Figure 6B, the implantation sites appear substantially smaller in mice of both mutant genotypes. Supplemental Figure 4 suggests that this is not the case. It is unclear whether the samples chosen for figures are representative of the uteri and whether variation in the size of implantation sites was observed.

      In figure 6B, the Foxa2f/f uteri samples were collected on day 10 of pregnancy, which is the same day on which Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ tissues were collected. Since embryos implanted in Foxa2f/f uteri on the night of day 4 but in Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ uteri on day 8 after LIF injections, the implantation sites are bigger in Foxa2f/f uteri. However, in supplemental figure 4 the implantation sites were collected from Foxa2f/f females on day 6 of pregnancy, and these are similar in size to implantation sites collected from Foxa2f/fPgrCre/+ and Foxa2f/fLtfCre/+ females 2 days after LIF injection.

    1. Reviewer #2 (Public Review):

      Transgenerational effects (TE) (usually defined as multigenerational effects lasting for at least three generations) have generated a lot of interest in recent years but the adaptive value of such effects is unclear. In order to understand the scope for adaptive TE we need to understand i) whether such effects are common; ii) whether they are stress-specific; and iii) if there are trade-offs with respect to performance in different environments. The last point is particularly important because F1, F2 and F3 descendants may encounter very different environments. On the other hand, intergenerational effects (lasting for one or two generations) are relatively common and can play an important role in evolutionary processes. However, we do not know whether intergenerational and transgenerational effects have the same underlying mechanisms.

      This study makes a big step towards resolving these questions and strongly advances our understanding of both phenomena. Much of the previous work on mechanisms of multigenerational effects has been conducted in C. elegans and this work uses the same approach. They focus on bacterial infection, Microsporidia infection, larval starvation and osmotic stress. I did not quite understand why the authors chose to focus on P. vranovensis rather than P. aeruginosa PA14 that has been used in previous studies of transgenerational effects in C. elegans. However, this is a minor point because I guess they were interested in broad transgenerational responses to bacterial infection rather than in strain-specific ones. The authors used different Caenorhabditis species, which is another strength of this study in addition to using multiple stresses.

      They found 279 genes that exhibited intergenerational changes in all Caenorhabditis species tested, but most interestingly, they show that a reversal in gene expression corresponds to a reversal in response to bacterial infection (beneficial in two species and deleterious in one). This is very intriguing! This was further supported by similar observations of osmotic stress response.

      They also report that intergenerational effects are stress-specific and that they have deleterious effects in mismatched environments, and, importantly, when worms were subject to multiple stresses. It is quite likely that offspring will experience a range of environments and that several environmental stresses will be present simultaneously in nature. I really liked this aspect of this work as I think that tests in different environments, especially environments with multiple stresses, are often lacking, which limits the generality of the conclusions.

      Another interesting piece of the puzzle is that beneficial and deleterious effects could be mediated by the same mechanisms. It would be interesting to explore this further. However, this is not a real criticism of this work. I think that the authors collected an impressive dataset already and every good study generates new research questions.

      Given these findings, I was particularly keen to see what comes of transgenerational effects. The general answer was that there aren't many, and the authors conclude that all intergenerational effects that they studied are largely reversible and that intergenerational and transgenerational effects represent distinct phenomena. While I think that this is a very important finding, I am not sure whether we can conclude that intergenerational and transgenerational effects are not related.

      In my view, an alternative interpretation is that intergenerational effects are common while transgenerational effects are rare. Because intergenerational effects are stress-specific, transgenerational effects could be stress-specific as well.

      Perhaps different mechanisms regulate intergenerational responses to, say, different forms of starvation (e.g. compare opposing transgenerational responses to prolonged larval starvation (Rechavi et al. doi:10.1016/j.cell.2014.06.020) and rather short adulthood starvation (Ivimey-Cook et al. 2021 https://doi.org/10.1098/rspb.2021.0701)). Perhaps some (most?) forms of starvation generate only intergenerational responses and do not generate transgenerational responses. But some do. Those forms of starvation that generate both intergenerational and transgenerational effects could do so via the same mechanisms and represent the same phenomenon. I am by no means saying this is the case, but I am not sure that the absence of evidence of transgenerational effects in this study necessarily suggests that inter- and trans-generational effects are different phenomena.

      The only real concern was the lack of phenotypic data on F3 beyond gene expression. Ideally, I would like to see tests of pathogen avoidance and starvation resistance in F3. However, given the amount of work that went into this study, the lack of a strong signature of potential transgenerational effects in gene expression, and the fact that most of these effects were shown previously to last only one generation, I do not think this is crucial.

      It would be very interesting to compare gene expression and other phenotypic responses in F1 and F3 between P. vranovensis and PA14. Also, it would be interesting to test the adaptive value of intergenerational and transgenerational effects after exposure to both strains in different environments. This would be very informative and help with understanding the evolutionary significance of transgenerational epigenetic inheritance of pathogen avoidance as reported previously. Why is the response to P. vranovensis erased while the response to PA14 is maintained for four generations? Are nematodes more likely to encounter one species than the other? Again, however, this is not something necessary for this study.

      The main strengths of this paper are i) use of multiple stresses; ii) use of multiple species; iii) tests in different environments; and iv) simultaneous evaluation of intergenerational and transgenerational responses. This study is the first of its kind, and it provides several important answers, while highlighting clear paths for future work. Excellent work and I think it will generate a lot of interest in the community.

    1. Reviewer #2 (Public Review):

      This paper seeks to test the extent to which adaptation via selective sweeps has occurred at disease-associated genes vs genes that have not (yet) been associated with disease. While there is a debate regarding the rate at which selective sweeps have occurred in recent human history, it is clear that some genes have experienced very strong recent selective sweeps. Recent papers from this group have very nicely shown how important virus interacting proteins have been in recent human evolution, and other papers have demonstrated the few instances in which strong selection has occurred in recent human history to adapt to novel environments (e.g. migration to high altitude, skin pigmentation, and a few other hypothesized traits).

      One challenge in reading the paper was that I did not realize the analysis was exclusively focused on Mendelian disease genes until much later (the first reference is not until the end of the introduction on pages 7-8 and then not at all again until the discussion, despite referring to "disease" many times in the abstract and throughout the paper). It would be preferred if the authors indicated that this study focused on Mendelian diseases (rather than a broader analysis that included complex or infectious diseases). This is important because there are many different types of diseases and disease genes. Infectious disease genes and complex disease genes may have quite different patterns (as the authors indicate at the end of the introduction).

      The abstract states "Understanding the relationship between disease and adaptation at the gene level in the human genome is severely hampered by the fact that we don't even know whether disease genes have experienced more, less, or as much adaptation as non-disease genes during recent human evolution." This seems to diminish a large body of work that has been done in this area. The authors acknowledge some of this literature in the introduction, but it would be worth toning down the abstract, which suggests there has been no work in this area. A review of this topic by Lluis Quintana-Murci [1] was cited, but diminished many of the developments that have been made in the intersection of population genetics and human disease biology. Quintana-Murci says "Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation-selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection", and argues that positive selection signals should be rare among Mendelian disease genes. Several other examples come to mind. For example, comparing Mendelian disease genes, complex disease genes, and mouse essential genes was the major focus of a 2008 paper [2], which pointed out that Mendelian disease genes exhibited much higher rates of purifying selection while complex disease genes exhibited a mixture of purifying and positive selection. This paper was cited, but only in regard to its findings on complex diseases.
A similar analysis of McDonald-Kreitman tables [3] was performed around Mendelian disease genes vs non-disease genes, and found "that disease genes have a higher mean probability of negative selection within candidate cis-regulatory regions as compared to non-disease genes, however this trend is only suggestive in EAs, the population where the majority of diseases have likely been characterized". Both of these studies focused on polymorphism and divergence data, which target older instances of selection than iHS and nSL statistics used in the present study (but should have substantial overlap since iHS is not sensitive to very recent selection like the SDS statistic). Regardless, the findings are largely consistent, and I believe warrant a more modest tone.

      There are some aspects of the current study that I think are highly valuable. For example, the authors study most of the 1000 Genomes Project populations (though the text should be edited since the admixed and South Asian populations are not analyzed, so all 26 populations are not included, only the populations from Africa, East Asia, and Europe are analyzed; a total of 15 populations are included in Figures 2-3). Comparing populations allows the authors to understand how signatures of selection might be shared vs population-specific. Unfortunately, the signal that the authors find regarding the depletion of positive selection at Mendelian disease genes is almost entirely restricted to African populations. The signal is not significant in East Asia or Europe (Figure 2 clearly shows this). It seems that the mean curve of the fold-enrichment as a function of rank threshold (Figure 3) trends downward in East Asian and European populations, but the sampling variance is so large that the bootstrap confidence intervals overlap 1. The paper should therefore revise the sentence "we find a strong depletion in sweep signals at disease genes, especially in Africa" to "only in Africa". This opens the question of why the authors find the particular pattern they find. The authors do point out that a majority of Mendelian disease genes are likely discovered in European populations, so is it that the genes' functions predate the Out-of-Africa split? They most certainly do. It is possible that the larger long-term effective population size of African populations resulted in stronger purifying selection at Mendelian disease genes compared to European and East Asian populations, where smaller effective population sizes due to the Out-of-Africa Bottleneck diminished the signal of most selective sweeps and hence there is little differentiation between categories of genes ("drift noise").
It is also surprising to note that the authors find selection signatures at all using iHS in African populations while a previous study using the same statistic could not differentiate signals of selection from neutral demographic simulations [4].
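
To make the point about confidence intervals overlapping 1 concrete, the following is a minimal sketch of a fold-enrichment estimate with a percentile bootstrap. It is not the authors' pipeline: the function names, the 0/1 encoding of whether a gene falls in a candidate sweep, and the example counts are all assumptions for illustration. It also resamples genes independently, whereas a block bootstrap would resample contiguous genomic blocks to respect clustering of neighboring genes.

```python
import random

def fold_enrichment(disease_hits, control_hits):
    """Ratio of sweep frequency in disease genes to that in control genes.

    Each argument is a list of 0/1 flags (hypothetical encoding):
    1 if the gene overlaps a candidate sweep, 0 otherwise.
    """
    d = sum(disease_hits) / len(disease_hits)
    c = sum(control_hits) / len(control_hits)
    return d / c if c > 0 else float("nan")

def bootstrap_ci(disease_hits, control_hits, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fold enrichment."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each gene set with replacement, keeping its size fixed.
        d = rng.choices(disease_hits, k=len(disease_hits))
        c = rng.choices(control_hits, k=len(control_hits))
        if sum(c) == 0:
            continue  # ratio undefined for this replicate; skip it
        stats.append(fold_enrichment(d, c))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi

# Made-up counts: 5 of 100 disease genes and 10 of 100 control genes in sweeps.
disease = [1] * 5 + [0] * 95
control = [1] * 10 + [0] * 90
estimate = fold_enrichment(disease, control)  # point estimate of 0.5
low, high = bootstrap_ci(disease, control)
depleted = high < 1.0  # call a depletion only if the whole interval is below 1
```

Under this reading, a downward-trending mean curve supports a depletion only when the upper bound of the interval stays below 1; with few sweep genes per category the interval is wide, which is the situation described for the East Asian and European populations.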

      The authors find that there is a remarkably (in my view) similar depletion across all but one MeSH disease class. This suggests that "disease" is likely not the driving factor, but that Mendelian disease genes are a way of identifying where there are strongly selected deleterious variants recurrently arising and preventing positively selected variants. This is a fascinating hypothesis, and is corroborated by the finding that the depletion gets stronger in genes with more Mendelian disease variants. In this sense, the authors are using Mendelian disease genes as a proxy for identifying targets of strong purifying selection, and are therefore not actually studying Mendelian disease genes. The signal could be clearer if the test set is based on the factor that is actually driving the signal.

      One of the most important steps that the authors undertake is to control for possible confounding factors. The authors identify 22 possible confounding factors, and find that several confounding factors have different effects in Mendelian disease genes vs non-disease genes. The authors do a great job of implementing a block-bootstrap approach to control for each of these factors. The authors talk specifically about some of these (e.g. PPI), but not others that are just as strong (e.g. gene length). I am left wondering how interactions among other confounding factors could impact the findings of this paper. I was surprised to see a focus on disease variant number, but not a control for CDS length. As I understand it, gene length is defined as the entire genomic distance between the TSS and TES. Presumably genes with larger coding sequence have more potential for disease variants (though number of disease variants discovered is highly biased toward genes with high interest). CDS length would be helpful to correct for things that pS does not correct for, since pS is a rate (controlling for CDS length) and does not account for the coding footprint (hence pS is similar across gene categories).

      The authors point out that it is crucial to get the control set right. This group has spent a lot of time thinking about how to define a control set of genes in several previous papers. But it is not clear if complex disease genes and infectious disease genes are specifically excluded or not. Number of virus interactions was included as a confounding factor, so VIPs were presumably not excluded. It is clear that the control set includes genes not yet associated with Mendelian disease, but the focus is primarily on the distance away from known Mendelian disease genes.

      Minor comments:

      On page 13, the authors say "This artifact is also very unlikely due to the fact that recombination rates are similar between disease and non-disease genes (Figure 1)." However, Figure 1 shows that "deCode recombination 50kb" is clearly higher in disease genes and comparable at 500kb. The increased recombination rate locally around disease genes seems to contradict the argument formulated in this paragraph.

      1. Quintana-Murci L. Understanding rare and common diseases in the context of human evolution. Genome Biol. 2016 Nov 7;17(1):225. PMCID: PMC5098287
      2. Blekhman R, Man O, Herrmann L, Boyko AR, Indap A, Kosiol C, Bustamante CD, Teshima KM, Przeworski M. Natural selection on genes that underlie human disease susceptibility. Curr Biol. Elsevier BV; 2008 Jun 24;18(12):883-889. PMCID: PMC2474766
      3. Torgerson DG, Boyko AR, Hernandez RD, Indap A, Hu X, White TJ, Sninsky JJ, Cargill M, Adams MD, Bustamante CD, Clark AG. Evolutionary processes acting on candidate cis-regulatory regions in humans inferred from patterns of polymorphism and divergence. PLoS Genet. Public Library of Science (PLoS); 2009 Aug;5(8):e1000592. PMCID: PMC2714078
      4. Granka JM, Henn BM, Gignoux CR, Kidd JM, Bustamante CD, Feldman MW. Limited evidence for classic selective sweeps in African populations. Genetics. Oxford University Press (OUP); 2012 Nov;192(3):1049-1064. PMCID: PMC3522151

    1. Author Response:

      Reviewer #1:

      The manuscript by Bellio and colleagues is based on the experimental model of T. cruzi infection in WT, MyD88-/- and IL-18-/- mice previously described by the same group in a 2017 eLife publication. The main message of the current study is that, in addition to IFN-g+ Th1 effectors, T. cruzi infection induces an even larger population of cytotoxic CD4+ T cells.

      The characterization of the cytotoxic CD4+ T cells is well documented. The data shown are convincing. However, since Burel et al. (2012) described the existence of a similar population in humans infected with P. falciparum (an intracellular pathogen), the authors should modify the statement (line 35-36) in the abstract.

      First, we would like to thank Reviewer #1 for the positive comments on our work.

      Please note that our statement in the abstract, “Here, for the first time, we showed that CD4CTLs abundantly differentiate during mouse infection with an intracellular parasite”, refers to mouse experimental models of parasite infection and not to human studies. We could not find any article with Burel JG as first author published in 2012; we believe that Reviewer #1 is referring to a study published in 2016 (Burel et al. PLoS Pathog. 2016 Sep 23;12(9):e1005839), in which a population of CD4 T cells with cytotoxic properties was described in humans after primary exposure to blood-stage malaria parasites. Please note that the finding of the important role of T-cell intrinsic IL-18R/MyD88 signaling for the development of a strong CD4CTL response is also part of the main message of our manuscript.

      Similarly, the title "Cytotoxic CD4+ T cells… predominantly infiltrate Trypanosoma cruzi-infected hearts" is an overstatement. If cytotoxic CD4+ T cells outnumber the IFN-g-secreting population 10:1 (in lymphoid tissue), their higher representation in hearts of infected mice is not a selective phenomenon but rather expected.

      We would like to thank Reviewer #1 for this comment, giving us the opportunity to clarify this point. Of note, we were not referring to the ratio of CD4CTL to Th1 cells, but to the frequency of CD4CTL among all the CD4+CD44+ (activated/memory) T cells. In fact, as shown in Figure 7-figure supplement 2, (now added to the revised ms), we found that the frequency of GzB+ cells among all activated/memory CD4+CD44hi T cells is significantly increased in the heart compared to the frequency of GzB+ among CD4+CD44hi T cells found in the spleen. Please also note that the frequency of CD4+ T cells expressing both GzB and PRF also increases in the heart compared to the spleen (Fig. 7F, middle panel and Fig. 1D left panel). We are now including this information in the revised manuscript, clarifying this point.

      My major concern is that the function of these cells remains undefined. Are they beneficial or detrimental for the host? It appears that the authors themselves could not make up their minds. The GzB+ CD4+ T cells protect but do not decrease the parasite load (Fig 6G).

      Our results in the mouse model of infection with T. cruzi, employing the adoptive transfer of WT CD4+GzB+ T cells to the susceptible Il18ra-/- mouse strain, indicate a clear beneficial role of CD4CTLs in the acute phase of experimental T. cruzi infection. Survival was significantly extended in the group of mice receiving sorted CD4+GzB+ cells, although parasite load did not decrease (Figure 6G). We would like to comment here that, in order to be beneficial to the host, an immune response does not always have to decrease the pathogen load. In fact, in certain circumstances, hindering an excessive inflammatory response (which can lead to host death) is an advantage for the host, even if this does not reduce pathogen numbers. The advantage conferred to the host by regulating the inflammatory response was probably also exploited in pathogen/host co-evolution, giving rise to chronic infections, where the host can survive for a longer period and the pathogen increases its chances of transmission (Schneider DS & Ayres JS., 2008, Nat Rev Immunol;8(11):889; Medzhitov R, et al, 2012, Science; 335(6071):936). Therefore, the results shown in Figure 6G are fully compatible with a potential regulatory role exerted by CD4CTLs, previously proposed by other authors (Mucida et al, Nat. Immunol. 2013), and point to the beneficial role of CD4CTLs for the host in the acute phase of infection with T. cruzi, probably by contributing to the decrease of immunopathology, the detrimental side of an exacerbated immune response, as discussed. Also favoring this hypothesis, the frequency of CD4CTLs expressing immunoregulatory molecules is increased when compared to other activated CD4+T cell subsets (Figure 3 and new Figure 7-figure supplements 3 and 4). Please see our complete discussion on this subject in the revised manuscript.

      On the other hand, during the chronic phase of the disease, the persistence of the immune response against the parasite might involve functional changes in the CD4 T cell response. This hypothesis could explain the association found between CD4CTLs and cardiomyopathy in chronic Chagas patients. Therefore, a beneficial role for CD4CTLs in the acute phase is totally compatible with the hypothesis that, during the chronic response in a persistent infection, CD4CTLs might acquire a detrimental role, contributing to immunopathology. Of note, several studies in the literature have shown a beneficial role for Th1 cells during the acute phase of infection with T. cruzi, while the Th1 response has also been associated with a pathologic outcome during the chronic phase of Chagas disease (reviewed in Ferreira et al, 2014 World J Cardiol 2014 6(8):7820 and in Fresno & Girones, 2018, Front.Immunol. 9;351). Therefore, it is not implausible that the CD4CTL subpopulation could also display different roles in the acute versus the chronic phases of the infection with T. cruzi. However, at present, this hypothesis remains speculative, as stated in the manuscript discussion. An extensive investigation of the role of CD4CTLs, as well as of the immunoregulatory mechanisms acting in chronic Chagas patients, needs to be conducted to fully answer this question, which is beyond the scope of the present work. Nevertheless, we acknowledge that the alternative possibility remains, in which the higher levels of CD4CTLs in chronic patients reflect elevated parasite burden and/or inflammation in the heart, without a direct involvement of this cell subset in the pathology. Please see our answer to Reviewer #2 on this topic and the inclusion of discussion clarifying this point in the revised manuscript.

      Are they terminally differentiated or "exhausted" effectors? GzB+ CD4+ T cells can be found in the hearts of chronically infected mice, but we do not know if they are specific for pathogen or self Ags. Do they express the markers of exhaustion on day 14 in the heart?

      1) We have commented in the first version of the manuscript that one of the limitations of our work is the fact that very few CD4 epitopes of T. cruzi presented by I-Ab have been described so far, and this limits the investigation on the specificity of CD4CTLs in our model. This is a very interesting and important question, which, however, is not possible to address in the present work.

      We would like to thank Reviewer #1 for the suggestion of performing a broader analysis on the expression of immunoregulatory markers associated with exhaustion and/or terminal differentiation, which adds to the understanding of CD4CTL biology in the model of acute infection with T. cruzi. Whether GzB+CD4+ T cells are terminally differentiated or "exhausted" effectors is an interesting and debated question. It was initially hypothesized that since exhausted T cells share features with terminally differentiated T cells, this would suggest a developmental relationship between these cell states (Akbar, A.N. & Henson, S.M., 2011 Nat. Rev. Immunol.11:289; Blank, C.U. et al, 2018, Nat.Rev.Immunol,19:665). However, subsequent studies showed that exhausted T cells seem to be derived from effector cells that retain the capacity to be long-lived (Angelosanto, J.M. et al., 2012, J. Virol. 86: 8161). In the first version of our manuscript, we investigated the expression of several markers associated with exhaustion such as 2B4, Lag-3, Tim-3 and CD39, besides the downregulation of CD27 on GzB+ CD4+ T cells (Figures 1E, 3B, 3D-E and 5E). In general, cells losing the expression of CD27 have been characterized as Ag-experienced further differentiated cells (Takeuchi and Saito, 2017, Front.Immunol. 8:194). Our finding that, unlike GzB-negative cells, most GzB+CD4+ T cells had lost the expression of CD27, suggested to us that CD4CTLs present in the spleen of mice infected with T. cruzi might be further differentiated T cells (Figure 3E). The transcription factor Blimp-1 controls the terminal differentiation of cells in a variety of immunological settings and its high expression in CD4+ and CD8+ T cells is associated with the expression of immunoregulatory markers (Chihara, N. et al, 2018, Nature 558:454). The observed high expression of Blimp-1 by GzB+CD4+ T cells (Figure 5D) is also compatible with the hypothesis that CD4CTLs are terminally differentiated. 
Of note, most of the exhaustion studies were performed on CD8+ T cells and it is still not well established if this phenomenon is equally regulated in CD4+ T cells. We have now extended the investigation on the expression of terminal differentiation/exhaustion markers, including PD-1 staining, on GzB+PRF+ CD4+ T cells in the spleen and in the heart of infected mice. Results in Figure 7-figure supplement 3 show that CD44hiGzB+PRF+ CD4+ T cells compose the subset of activated cells in which the higher frequency of cells expressing these markers is found, both in the spleen and in the heart, at day 14 pi. The only exception was the equal ratio of cells expressing PD-1, and at equivalent levels, when comparing CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells in the spleen. Non-significant differences in the percentages of cells expressing PD-1 among CD44hiGzB-PRF- and CD44hiGzB+PRF+ CD4+ T cells were found in the heart. However, the intensity of expression of the PD-1 marker (MFI) was significantly higher among CD44hiGzB+PRF+ compared to CD44hiGzB-PRF- CD4+ T cells infiltrating the heart. Furthermore, we also compared the frequency of CD44hiGzB+PRF+ CD4+ T cells expressing Lag-3, Tim-3, CD39 and PD-1, and their corresponding MFI values, between the spleen and the heart (Figure 7-figure supplement 4). Of note, while MFI values of Tim-3, CD39 and PD-1 expression were increased on CD4CTLs (CD44hiGzB+PRF+) in the heart compared to CD4CTLs in the spleen, Lag-3 expression levels were decreased on CD4CTLs infiltrating the cardiac tissue. Despite exhaustion being often seen as a dysfunctional state, it is important to note that the expression of these inhibitory molecules allows strongly activated T cells to persist and partially contain chronic viral infections without causing immunopathology and that highly functional effector T cells can also express such inhibitory receptors (reviewed in Wherry, E.J., 2011, Nat. Immunol.,12:492; Blank, C.U. et al, 2018, Nat. Rev. Immunol., 19:665).
Interestingly, only PD-1, but not Lag-3, Tim-3 or CD39 expression is upregulated on CD8CTLs in the heart relative to the spleen, an indication that the T. cruzi-infected cardiac tissue is a less so-called exhaustion-inducing environment compared to certain tumors (Figure 7-figure supplement 4). It is known that many immunomodulatory molecules, including Lag-3, Tim-3, PD-1 and CD39, are co-expressed as part of a module composing a larger co-inhibitory gene program, which is expressed in both CD4+ and CD8+ T cells under certain activation conditions, driven by the cytokine IL-27 (Chihara, N. et al, 2018, Nature 558:454). The opposing behavior of Lag-3 expression, which is downmodulated on CD4CTLs in the heart in comparison to the spleen, indicates that CD4CTLs infiltrating the heart are not typically exhausted cells. Of note, a recent study has shown that exhausted CD8+T cells can partially reacquire phenotypic and transcriptional features of T memory cells, in a process that includes the downmodulation of Lag-3 expression (Abdel-Hakeem, M.S. et al, 2021, Nat.Immunol., 22:1008). As requested, these new data were included (Figure 7-figure supplements 3 and 4) and discussed in the revised manuscript.

      The factors that control differentiation of cytotoxic CD4+ T cells are the same as for IFN-g- Th1 cells. MyD-88-/- and IL-18-/- mice significantly lack both populations and succumb to T. cruzi infection. In their 2017 eLife publication, this group reported that survival of infected MyD-88-/- and IL-18-/- mice can be rescued by adoptive transfer of purified total WT CD4+ T cells, which was attributed entirely to their ability to secrete IFN-g (at least in the case of MyD-88-/- recipients). In the current study, the authors only used infected IL-18-/- recipients and show that this time transfer of GzB+ CD4+ T cells is sufficient to confer the protection. When compared with the old data, the rescue of the infected IL-18-/- with only GzB+ CD4+ T cells looks weaker (2 surviving animals out of 10 pooled from 2 experiments), strongly suggesting that IFN-g Th1 cells do play a significant role. It is unclear when the parasite load in Fig 6G was evaluated. It would be good to show deltaCT values for individual mice.

      We thank Reviewer #1 for the opportunity to clarify the point on the protective role of Th1 cells and CD4CTLs during T. cruzi infection and to better discuss our data. Please note that we do not question the beneficial role of Th1 cells in this infection model. In our paper published in 2017 in eLife, we showed that the adoptive transfer of IFN-g-deficient CD4+ T cells does not result in a decrease of parasite loads in susceptible recipient mice. These results are totally in agreement with the known beneficial role of Th1 cells during infection with T. cruzi, through the microbicidal action of IFN-g, which was also described by other groups.

      The new information that our present study brings is that the adoptive transfer of GzB+CD4+ T cells with poor (GzB-YFP+) or no (Ifng-/-) capacity for IFN-g secretion also significantly extended survival of infected Il18ra-/- mice, which have lower levels of both Th1 cells and CD4CTLs compared to WT mice (Figure 6G and Figure 6-figure supplement 2). Please note that 3 (not 2) out of 10 mice that received GzB+CD4+ T cells survived. We stated in our discussion that, together, our present and past data demonstrate that both Th1 cells and CD4CTLs are important for improving survival, although through different mechanisms, since adoptively transferred GzB+CD4+ T cells (as well as Ifng-/- CD4+ T cells) were not capable of reducing parasite load but nevertheless extended survival.

      Following the guidelines of the Animal Care and Use Committee, in order to prevent/alleviate animal suffering, all laboratory animals found near death must be euthanized. Therefore, parasite load in the hearts was evaluated in mice found in a moribund condition (a severely debilitated state that precedes imminent death, as defined in Toth, L.,2000; ILAR J, 41:72), presenting unambiguous signs that the experimental endpoint had been reached. We have now included 2^DeltaCT values for individual mice in Figure 6G, as requested.

      Because donor IFN-g-/- CD4+ T cells do express IFN-gR (Supp Fig 6-2), IFN-g produced by IL-18-/- host cells could enhance the activity and/or help expand cytotoxic CD4+ T cells among the IFN-g-/- CD4+ donor population. To directly test the protective role of cytotoxic CD4+ T cells in the absence of IFN-g, the authors should treat infected IL-18-/- mice that have received IFN-g-/- CD4+ T cells with anti-IFN-gamma Ab.

      It is known that IFN-g is critically important for resistance against infection with T. cruzi. Accordingly, Ifng-/- mice are extremely susceptible, dying at early time points of infection (Campos, M. et al, 2004, J.Immunol, 172:1711). Of note, IFN-g production by other cell types, and not only by CD4+ T cells, is relevant for resistance against infection, as demonstrated for CD8+ T cells (Martin D & Tarleton R. Immunol Rev. 2004, 201:304). In our present work, we performed experiments where Ifng-/- CD4+ T cells were adoptively transferred to susceptible Il18ra-/- mice, with the goal of testing whether the transferred cells would be able to confer some increment in the survival time of infected mice, despite not being able to decrease parasite loads, a direct consequence of their deficiency in IFN-g production, as previously shown (Oliveira et al., 2017, eLife). In fact, this turned out to be the case and we showed that the transfer of purified Ifng-/- CD4+ T cells extended survival (Figure 6-figure supplement 2). Of note, our data demonstrate that the percentage of GzB+CD4+ T cells is not affected in the total absence of IFN-g, since Ifng-/- mice display the same frequency of this cell population as found in WT mice (Figure 4B). The increased survival of adoptively transferred mice is compatible with a regulatory function of GzB+CD4+ T cells, which additionally express several immunoregulatory molecules, as shown. Whether IFN-g produced by the host is enhancing the activity and/or expanding cytotoxic CD4+ T cells among the transferred T cell population is not an essential point here, since we were not aiming to test the protective role of cytotoxic CD4+ T cells in the total absence of IFN-g in the host mice.

      The intracellular cytokine staining in this study appears to be suboptimal. Instead of stimulating with PMA/ionomycin in the presence of Golgi block, Roffe et al. (2012) stimulated lymphocytes with anti-CD3 prior to adding Brefeldin A, an important technical difference which may explain the rather low frequencies of IFN-g+ and IL-10+ cells in this study.

      We respectfully disagree with Reviewer #1 on this point. The frequency of IFNg+ CD4+ and IL-10+CD4+ T cells in the spleen of mice infected with T. cruzi Y strain obtained in our experiments is in the same range as what was previously described by other research groups investigating the immune response to this parasite, including studies that have employed anti-CD3 stimulation and brefeldin A, such as Jankovic, D. et al, 2007, JEM 204:273 (Fig.S1), cited in our manuscript (page 9, lines 218-219), among others (Nihei J et al, 2021, Front. Cell. Infect. Microbiol.11:758273; Martins GA et al, 2004, Microbes Infect 6:1133 – Fig.6B; Hamano S. et al, 2003, Immunity, 19:657- Fig. 2A). In the present work, we used the combination of monensin and brefeldin A after PMA/iono treatment, and found the same frequency of IFN-g+CD4+ T cells described in a previous study of our group, where staining was performed after incubation of splenocytes with parasite-derived protein extract and brefeldin A alone (Oliveira AC et al., 2010, PLoSPath 6(4):e1000870 –Fig. 8D). On the other hand, please note that the study cited by Reviewer #1 (Roffe et al., JI 2012) employed a different strain of T. cruzi, the Colombiana strain, which differs in several aspects from the Y strain used in our work. Colombiana induces a different pathology, with distinct kinetics. In that study, intracellular IFN-g and IL-10 detection was performed at a much later time point of infection (day 30 pi), and in cells infiltrating the heart, not the spleen. In summary, frequencies of IFN-g and IL-10 secreting CD4+ T cells described in our manuscript are comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups in articles in prestigious journals, cited above.

      Reviewer #2:

      In this work, Professor Bellio and her colleagues provide compelling evidence to show unusually strong induction of cytotoxic CD4 T cells (CD4CTLs) in Trypanosoma cruzi-parasitized mice. Using genetic models and mixed bone marrow chimeras they dissect the signals responsible for CD4CTL induction in this infection and identify T cell-intrinsic IL-18R/MyD88 signaling as the key inducer. The CD4CTLs that clonally expand in T. cruzi infection outnumber CD4 cells with typical Th1 profile (IFN-γ secretion) and bear the hallmarks of CD4CTLs described in other model systems and in humans. Utilizing GzmbCreERT2/ROSA26EYFP reporter mice, the authors show that adoptive transfer of CD4 cells that have made GzB can increase the survival of T. cruzi parasitized Il18ra-/- mice. Finally, the authors describe a clear correlation between the frequency of CD4CTLs in the circulation of patients with T. cruzi-induced chronic Chagas cardiomyopathy, implying a pathogenic role for these cells in chronic disease.

      The findings reported here are an important addition to the understanding of both the origin of CD4CTLs and their potential role in host protection or disease. The evidence provided in support of the main claims is very strong and the association between CD4CTLs and Chagas disease quite intriguing. There are, however, some aspects of the work that would benefit from further clarification or experimental support, so that alternative interpretations of the data can be excluded.

      The defining characteristic of CD4CTLs that separates them from other CD4 subsets is the production of granzymes and perforin and, by extension, the ability to kill target cells in a granzyme/perforin-dependent manner. In contrast, all T cells can kill target cells via alternative mechanisms that are not dependent on granzyme/perforin, for example through expression of TNF family members. It would appear that much, if not most, of the killing activity of T. cruzi-induced CD4CTLs can be attributed to FasL (Fig. 1B). FasL-mediated killing is not restricted to CD4CTLs and as the title of one of the cited studies (Kotov et al., 2018) states, "many Th cell subsets have Fas ligand-dependent cytotoxic potential". It would be important to ascertain if expression of granzyme/perforin by CD4CTLs in T. cruzi infection is also associated with granzyme/perforin-dependent cytotoxicity. This affects the direct and indirect in vitro cytotoxicity assays, as well as the interpretation of in vivo protection.

      Similarly, the protective effect of transferring GzmbCreERT2/ROSA26EYFP reporter-positive cells to Il18ra-/- mice may not be necessarily mediated in a granzyme/perforin-dependent manner or by CD4CTLs for that matter. The reporter will mark cells that express GzB at the time of tamoxifen administration but does not guarantee that these cells will continue to express GzB or that they will prolong survival of recipients in a granzyme/perforin-dependent manner.

      While the authors provide evidence that GzB-producing cells are largely distinct from IFN-γ-producing cells, the reporter-positive cells may still contain genuine Th1 cells. Given Th1 cells have been previously found necessary for protection of Il18ra-/- mice in the T. cruzi model, can a role for Th1 cells in this transfer model be formally excluded? The authors do convincingly demonstrate that IFN-γ itself is not essential for protection, but that does not leave granzyme/perforin-dependent as the only other alternative. For example, the experiment described in Fig. 6G lacks an important control, the transfer of reporter-negative cells. What would the conclusion be if reporter-negative (but T. cruzi-specific) cells proved as protective as reporter-positive cells?

      We would like to thank Reviewer #2 for the positive comments on our study and for giving us the opportunity to better discuss and clarify the relevant points raised in this review.

      (i) Concerning the role of GzB/PRF in cytotoxicity: as explained in more detail in our next answer to Reviewer #2, we have now shown that the cytolytic activity of the CD4 T cell subset differentiating in the murine T. cruzi-infection model is totally dependent on a GzB- and PRF-mediated mechanism.

      (ii) Concerning a possible role for Th1 in the adoptive transfer experiments: please note that the parasite load is not decreased by the adoptive transfer of CD4+GzB+ T cells (Figure 6G). Additionally, we showed that the adoptive transfer of Ifng-/- CD4+ T cells also extended the survival of infected mice (Figure 6-figure supplement 2), but did not decrease parasite levels (Oliveira et al., 2017). These results exclude a role for Th1 cells, which are known to exert an important microbicidal function through the production of IFN-g, as previously demonstrated by us (Oliveira, 2017) and other groups. Together, our present and past data support the notion that both Th1 cells and CD4CTLs are important for extending survival, although through different mechanisms. Our results are in accordance with an immunoregulatory role played by CD4CTLs, likely through the GzB/PRF/FasL-mediated killing of infected APCs in an IFN-g-independent manner, although it is not possible to attribute the beneficial role of the adoptively transferred CD4CTLs exclusively to their cytolytic function, as discussed in the revised manuscript. Of note, we also show here that most CD4+GzB+PRF+ T cells express high levels of immunomodulatory molecules, raising the possibility that the beneficial role of adoptively transferred CD4CTLs might rely on the concerted action of their cytolytic function and immunomodulatory activity. Please see the full discussion on this point in the revised version of the manuscript.

      (iii) Concerning the adoptive transfer of GzB-EYFP-negative cells: unfortunately, GzB-EYFP-negative cells cannot be employed as a control, since in the GzmBCreERT2/ROSA26EYFP mouse lineage, only 1-3% of total splenic CD4+ T cells express EYFP after induction by tamoxifen (Figure 2-figure supplement 3). This contrasts with the 10-40% of GzB+ and PRF+ cells among CD4+ T lymphocytes observed by intracellular staining. Consequently, the majority of the CD4+GzB+ T population is EYFP-negative in this system and thus cells sorted as “GzB-EYFP-negative”, based on the absence of expression of EYFP, would not be bona-fide GzB-negative cells. If it were possible to sort GzB reporter-negative cells, Th1 cells would be among the sorted cells and upon adoptive transfer they would secrete IFN-g and, consequently, decrease the parasite load in recipient mice (Oliveira, 2017). However, in the absence of the proposed immunoregulatory action of CD4CTLs, Th1 cells transferred alone might also increase pathology and, consequently, it is possible that they would not extend survival, albeit diminishing parasite load. It is expected that higher levels of extended survival would be attained when both Th1 cells and CD4CTLs are transferred, as discussed in the manuscript and in answer (ii) above. Importantly, please note that one current hypothesis is that CD4CTLs differentiate from Th1 cells and, therefore, the adoptive transfer of Th1 cells would not guarantee that Th1-derived CD4CTLs do not develop in vivo, unless specially engineered mouse strains, not available at present, were employed for these experiments.

      Reviewer #3:

      By modelling Trypanosoma cruzi infection in mice, the authors highlighted the presence of a subset of CD4 T cells expressing canonical markers and transcription factors of CTLs and capable of exerting antigen specific and MHC class II restricted cytotoxic activity. Mechanistically, using KO mice, the authors have shown that myd88 expression is required for strengthening the CD4 CTLs phenotype during the infection.

      Moreover, by investigating the presence of a previously published CD4 CTLs gene signature in a mixed bone marrow chimera settings they highlighted a cell intrinsic role for Myd88 in imprinting the signature. The study also identifies Il18R as a myd88 upstream receptor potentially responsible for CD4 CTLs development by showing that lack of IL18R phenocopied myd88 deficiency in failing to promote a CD4 CTLs phenotype.

      Finally, by showing the direct correlation between perforin expressing CD4 T cells in Chagas infected individuals and parameters of heart disfunction the authors hinted at a possible involvement of CD4CTLs in a clinical setting.

      -The core finding of the paper, providing the first evidence of CD4 CTLs development in a mouse model of intracellular parasite is well supported by the data. The expression of markers correlated to CD4 cytotoxicity in other settings and gene signatures fits well the phenotype described and suggests possible common features for CD4 CTLs development across infection with different pathogens.

      This manuscript will boost the knowledge over the involvement of non canonical CD4 types in the immune responses to parasites. Moreover the finding that CD4 CTLs are the predominant phenotype in organs important for viral replication implies an involvement of these cells in the development of the pathology that will have to be taken into account in future studies.

      • The understanding of the parental relationship between CD4CTLs and Th1 remains unclear and is complicated by the low numbers of IFNg (regarded as a hallmark of functional Th1) producing CD4 T cells detected in the model. IFN-g production by CD4 is lower than 10% even when achieved by PMA/Iono stimulation and half of Gzb+ CD4 stain positive for the cytokine. On the other hand the putative transcription factor of Th1 development, Tbet, is expressed by all Gzb positive CD4s. This discrepancy and the low number of IFNG+ should be better discussed by the authors.

      First, we would like to thank Reviewer #3 for the constructive criticism on our manuscript. Regarding the apparent discrepancy on the frequencies of IFN-g+ and Tbet+ CD4+ T cells in our model, please first note that the percentage of IFN-g+ CD4+ T cells detected in the present study is comparable to the ones found in the spleen of mice infected with the same or similar strains of T. cruzi and reported by other groups (please see our complete answer to Reviewer #1 on this topic). That said, we think that the apparent discrepancy between the expression of T-bet and the low fraction of GzB+CD4+ T cells producing IFN-g is a very interesting question. It is known that T-bet is a key transcription factor associated with the development of IFN-g-producing CD4+ T cells and that it also coordinates the expression of multiple other genes in CD4+ T cells and in other cell types. Also, T-bet can interact with other proteins, resulting in the induction or inhibition of key factors in T cell differentiation (reviewed in Hunter, 2019, Nat. Rev. Immunol, 19:398). Importantly, it has been shown that during the late stages of Th1 cell activation, T-bet recruits the transcriptional repressor Bcl-6 to the Ifng locus to limit IFNg transcription (Oestreich, 2011, JEM, 208:1001). Therefore, T-bet action is not limited to transactivation of the Ifng gene, but can also act as part of a negative-feedback loop to limit IFN-g production in certain cells. We do not believe that Bcl-6 is playing a role in CD4+GzB+ T cells in our model, since we found that the majority of CD4+GzB+ T lymphocytes express Blimp-1 (Figure 5D), and Blimp-1 and Bcl-6 are known to be reciprocally antagonistic transcription factors.

      However, the possibility remains that another repressor factor is downregulating Ifng gene transcription in the majority of T-bet+ CD4+GzB+ T cells, with the participation of T-bet or not. Of note, Blimp-1 was shown to be a critical regulator for CD4 T cell exhaustion during infection with T. gondii, and CD4+ T cells deficient in Blimp-1 produced higher levels of IFN-g in infected mixed-bone marrow chimeric mice reconstituted with WT and Blimp-1 conditional knock-out cells (Hwang, S., 2016, JEM 213:1799). Furthermore, Blimp-1 attenuates IFN-g production in CD4 T cells activated under nonpolarizing conditions and chromatin immunoprecipitation showed that Blimp-1 binds directly to a distal regulatory region in the Ifng gene (Cimmino, L. et al. 2008, JI 181:2338). We have also shown that, like Blimp-1, Eomes is expressed by around 60% of the GzB+CD4+ T cells (Figure 2G). It is known that Eomes controls the transcription of cytotoxic genes and promotes IFN-g production in CD8+ T cells, binding to the promoter of the Ifng gene. Interestingly, Eomes was also shown to participate in the induction of immunoregulatory/exhaustion receptors, such as PD-1 and Tim-3. Furthermore, deficiency of Eomes led to increased cytokine production (Paley, M.A. et al., 2012, Science 338: 1220). More recently, evidence in favor of the participation of Eomes in the repression of IFN-g production in TCR-gamma-delta T cells was also published (Lino, C. et al.,2017, EJI 47:970). Therefore, these studies indicate the complex control of the Ifng gene, in which T-bet, Eomes, Blimp-1 and possibly other TFs might play concerted roles. We think it would be interesting to investigate the role of Eomes and/or Blimp-1 in the repression of the Ifng gene in GzB+CD4+ T cells. Kinetic studies on the expression of these TFs may contribute to a better understanding of the parental relationship between CD4CTLs and Th1 cells, a fundamental question that is not yet completely understood. 
A comment on this subject was included in the revised manuscript.

      On the same note, while the confirmation of a CD4 CTLs gene signature in the model is very convincing, it must be noted that the one used as a reference was obtained by performing single cell RNA seq , taking into account only IFNg+ CD4 cells and then comparing Gzb+ and Gzb- negative in the setting. The authors are instead using bulk RNA seq and comparing populations of cells that would have none VS low levels of Th1. In this view, while the confirmation of the CD4 CTLs signature is striking, addressing the relative relationship with Th1 cells is complicated. Using Gzb YFP reporters in the setting could help improving the resolution between the 2 subsets.

      Our analysis clearly demonstrated the presence of the CD4CTL signature among WT CD4+ T cells, and its absence among Myd88-/- CD4+ T cells from the same mixed-BM chimeric mice. Together with our past work (Oliveira, 2017) and results included in the present manuscript, this analysis contributes strongly to demonstrating the importance of T-cell intrinsic IL-18R/MyD88 signaling for the development of a robust CD4CTL response to infection with an intracellular parasite. Although these results argue in favor of a common origin for CD4CTLs and Th1 cells during infection, an interesting point is that Ifng-/- mice display the same percentage of GzB+CD4+ T cells as WT mice (Figure 4B), suggesting that GzB+CD4+ T cells might emerge independently of IFN-g-dependent Th1 cells. Therefore, the possibility remains that not all CD4CTLs are derived from the putative terminal differentiation of Th1 cells but that, instead, a divergence between the Th1 and CTL differentiation programs might occur at an earlier step. Although addressing this fundamental question goes beyond the possibilities of the present study, we believe that our results make an important and substantial contribution to the understanding of the biology of CD4CTLs in response to infection and highlight the importance of IL-18R/MyD88 signaling for the reinforcement and/or stabilization of CD4+ T cell commitment into the CD4CTL phenotype. Regarding the use of GzB-YFP reporters, please see our answer below.

      • The dependency on the Myd88/IL18r axis to promote CD4 CTLs is well characterized and the prolonged survival rate of IL18r-/- after the adoptive transfer of Gzb YFP+ CD4 is very convincing. However, instead of using PBS as control the authors could have used YFP- or total CD4 cells for the task. While in a previous publication it was already shown that protection was achieved by transferring the total CD4 population, comparing GzB+ versus GzB- would have added useful insights into the amount of protection conferred by the subtypes and the relative roles of CD4 CTLs and Th1 in the model. Parasitemia could also be reassessed in this view.

      We have already discussed the impossibility of sorting bona-fide GzB-negative cells from the reporter mouse strain available. Please see our complete answer to Reviewer 2 on this issue (iii) in this point-by-point letter. Moreover, due to the low percentage of GzB-EYFP cells labeled in the tamoxifen-treated reporter mice, a high number of mice is necessary for performing these adoptive transfer experiments. Unfortunately, due to the COVID-19 pandemic and its consequences on our animal facility, at present it is impossible to repeat this experiment including total CD4+ T cells within a reasonable time. However, we have already shown in our past study (Oliveira, 2017) that the transfer of total WT CD4+ T cells to Il18ra-/- mice increased survival and lowered parasite load. On the other hand, our current data demonstrate that the adoptive transfer of GzB+CD4+ T cells increases survival but does not change the parasite load (Figure 6G). Therefore, these data strongly support that GzB+CD4+ T cells act in an IFN-g-independent way and, hence, differ from Th1 in the effector mechanism employed for extending survival of the recipient mice. In summary, our results favor the notion that CD4CTLs and Th1 cells have complementary roles, both being able to extend survival of recipient mice, although only Th1 are effective in lowering parasite load.

    1. Author Response

      Reviewer #1 (Public Review):

      The results are quite interesting and potentially have important therapeutic implications. Nevertheless, in the current form there are several weaknesses that diminish the strength of the findings.

      1) As the authors note, they do not provide direct evidence for the ultimate conclusion of the study that assembly with β2a and β2e subunits are necessary for CaV2.3 channels to contribute to pacemaking in SN DA neurons. The authors state siRNA knockdown experiments in SN DA neurons are technically challenging. Nevertheless, shRNA knockdown studies in SN neurons have been previously published. Such a study is critical to provide direct evidence for what would be a very important and impactful finding.

      Please refer to our detailed response to essential revision 1 above.

      2) Relative contribution of CaV1.3 (L‐type) and CaV2.3 channels to pacemaking in SN DA neurons. As the authors note, a phase III clinical trial for the L‐type channel blocker, isradipine, showed no efficacy for neuroprotection, even though some mice studies suggested this might be efficacious. On the other hand, the authors' previous work with CaV2.3 knockout mice suggest inhibition of this channel would be more appropriate for a neuroprotective response. It would be useful to get a direct comparison of the impact of isradipine and SNX‐482 on pacemaking in SN DA neurons (Figs. 1 and 2). If their impacts on pacemaking (and Ca2+ oscillations) are similar it would suggest something beyond the pacemaking Ca2+ influx could be responsible for neuroprotection (e.g. changes in NCS‐1 expression as previously suggested by the authors).

      The question about the relative contribution of Cav1.3 and Cav2.3 on pacemaking is complex due to the finding that different results have been obtained regarding the role of L‐type channels on pacemaking. In Cav1.3 knockout mice pacemaking frequency is normal (7, 8). Inhibition (of Cav1.2 and Cav1.3) by dihydropyridine Ca2+ channel inhibitors (e.g. isradipine, nimodipine) was found to inhibit pacemaking in some (e.g. 9‐11) but not in all (8, 12) reports. This seems to be dependent on experimental conditions, but the reasons for these discrepancies are currently unclear. Similarly, we find inhibition of pacemaking by SNX‐482 in cultured midbrain neurons (this paper) but, as previously reported, not in Cav2.3‐deficient mice (1). While this toxin is well suited to isolate Cav2.3‐mediated Ca2+ current components, effects on pacemaking in DA neurons have to be interpreted with more caution because (as clearly outlined in our original MS and our previous paper, 1), SNX‐482 is also a potent inhibitor of Kv4.3 channels. We consider this limitation even more in the discussion of SNX‐482 effects on pacemaking in cultured neurons (data now moved to Suppl Fig. 5) in the revised MS (end of page 15, top of page 16), although the SNX‐482 changes suggest an involvement of Cav2.3 for AP generation.

      Although we acknowledge the relevance of the question raised by the reviewer, based on our previous findings (1) the absence of an obvious role of Cav2.3 for pacemaking in SN DA neurons (despite their role for Ca2+ transients) as an experimental read‐out prevents a straightforward approach to study the contribution of different β‐subunits and their splice variants for this process.

      3) The slice recording data (Fig. 9) are confusing and raise concerns about adequacy of pharmacological isolation of CaV2.3 currents in this preparation. The accuracy of interpretation of the data in Fig. 9 rests critically on the idea that the cocktail of CaV channel blockers given successfully isolates CaV2.3 currents. Yet, the amplitudes of the exemplar currents shown for plus or minus the CaV channel blocker cocktail are almost the same. This cannot be due to CaV2.3 providing the dominant current in the slice preparation since addition of SNX‐482 only decreased Ca2+ current amplitude by 13% (Suppl Fig. 5). It is not clear to me why the steady‐state activation and inactivation curves experiments were not conducted in the cultured neuron preparation (Figs. 1 and 2) where there seems to be better control of pharmacological block of different Cav channel isoforms.

      We have now performed the isolation of SNX‐482‐sensitive currents not only in the cultured neuron preparation as suggested but, in addition, also in SN DA neurons. The latter experiments gave essentially identical steady‐state inactivation parameters as compared to our "R‐type" current (current remaining in the presence of all other channel blockers). This now also allows a direct comparison of SNX‐482‐sensitive current properties in cultured neurons and in slices (see response above). We now also specifically discuss previous reports of SNX‐482‐sensitive R‐type components in the introduction to allow comparison of these reports with our findings. Please also note that in our legend to Fig. 9A (original MS, now Fig. 6) we have explicitly stated that recordings of "similar amplitudes were chosen" to facilitate comparison of current kinetics. We still think that this makes sense and kept this part of the figure but now strengthened this point even more in the figure legend (Fig. 6).

      4) While the transcript data show that β2a and β2e are present in SN DA neurons, numerically they would still represent only a minority of the beta subunits present (<25%). I don't think sufficient thought has been given to this in the discussion of the results. Unless there is some preferential association of CaV2.3 with β2a and/or β2e, there would be a mix of channels with the majority incapable of supporting pacemaking in SN DA neurons. Given this, one would not necessarily expect that the gating characteristics of CaV2.3 would be the same as what is obtained with reconstituted channels in tsA201 cells where all the channels are assembled with β2a or β2e (see point #5 below).

      We now give this important point more thought in the discussion and mention that our data would imply such a preferential association of Cav2.3 with β2a and/or β2e and provide possible explanations. In addition, as in the original MS, we also provide alternative interpretations (Discussion, pg 14, 2nd and 3rd paragraph).

      5) The V0.5,inact of putative CaV2.3 channels in SN DA neurons of ‐52.4 mV was said to be 'very similar' to the value of ‐40 mV that was observed in tsA201 cells. A difference of +12 mV in voltage‐dependence gating of ion channels is substantial and should not be brushed off. A more nuanced interpretation would be that in SN DA neurons CaV2.3 likely associates with other beta subunits in addition to β2a and β2e and so one would not necessarily expect the V0.5,inact to be the same as what is observed in reconstituted channels in tsA201 cells.

      The V0.5,inact of ‐52.4 mV refers to the control current. We correctly stated that the V0.5,inact of R‐type current was ‐47.5 mV (as also shown in Table 3), i.e. only about 7 mV more negative than in tsA‐cells. We have now rephrased this section because we also included the new inactivation data of SNX‐482‐sensitive currents in cultured neurons and in SN DA neurons recorded in slices (Discussion, page 13, 2nd paragraph). As suggested, we no longer describe these values (difference ~5 mV) as "very similar".

      Reviewer #2 (Public Review):

      This reviewer is very enthusiastic about the work but notes that most of the conclusions are based on data obtained by overexpressing Cav2.3 and accessory subunits in a heterologous expression system. The authors make a good argument for cross‐correlation between data in tsA‐201 cells and dopaminergic neurons, but it is unclear that the results will translate from one system to another. More data may be needed to do so (the reviewer does understand that these are challenging experiments), which the authors acknowledge in a section about the study's limitations. Based on this, it seems that the title is misleading without additional data supporting the role of Cav2.3 in dopaminergic neurons. Along the prior line, statements linking the study results to potential pathological implications seem a big stretch not supported by current data, and therefore should be eliminated.

      An issue with this manuscript is that the narrative and organization of the data are difficult to follow. The reviewer understands that the authors are weaving a complex story that involves using multiple techniques and approaches. Still, the way the data is organized and described makes the reader go back and forward to compare and contrast results constantly. This is further complicated by the fact that some experiments are done in dopaminergic neurons and others in tsA‐201 cells (the identity of the cell type used should be made clearer), the order of some figures is not appropriate (Supp Fig 1 for example) and some figure panels are not discussed (Supp Fig 5E to 5J).

      The MS has been completely rewritten, based on the additional SNX‐482 experiments we have now performed both in the cultured DA neurons and in the midbrain slices. We therefore also moved data on effects on the spontaneous activity of cultured neurons by SNX‐482 into the supplement to make the key results easier to follow. The identity of neurons is indicated in all headers of table and figure legends to identify cell types. We also changed the title to “β2‐subunit alternative splicing stabilizes Cav2.3 Ca2+ channel activity during continuous midbrain dopamine neuron‐like activity” to attenuate our previous statement regarding a role in dopaminergic midbrain neurons.

    1. Author Response

      Reviewer #1 (Public Review):

      As we lack empirical data on the response of most species to environmental changes, developing predictive tools based on traits that are easier to access or infer may help us develop better management tools. This is the case even for terrestrial mammals, a rather well studied group but with a large study bias towards temperate Europe and North America. This study uses maximum longevity, litter size and body mass to predict the sign and size of the relationships between annual temperature and precipitation anomalies and population growth rates, using the Living Planet database for times series of abundance and Chelsa for weather anomalies. The authors use a Bayesian framework to relate the size and absolute magnitude of the relationships between detrended population growth rates and weather anomalies, the framework accounting for the uncertainty in estimates as well as phylogenetic dependencies. They did not find any systematic effects -- on average the slopes of the relationships were close to 0 -- but the magnitude of the coefficients decreases for species with high maximum longevity and low litter size. Therefore, this study points to possible predictions of the magnitude of the response to weather variability using simple demographic indices such as longevity and litter size. The study has clear limitations that are common to similar "meta-regressions" using publicly available databases, but they are not ignored when discussing the results. One would hope that such limitations would lead to improving the quality of such databases, both in terms of taxonomic and geographic coverage as well as quality of data.

      We would like to thank Reviewer 1 for their overall positive feedback and constructive comments on the method and our predictions. We have now included complementary analyses based on high-quality subsets (≥ 20-year records; using life history traits estimated from structured population models), have clarified our set of hypotheses and discussed our results accordingly. Detailed responses are given below.

      I would like to challenge the authors in terms of why one would expect relationships of a given sign or magnitude. First with respect to sign of relationships, even for the same species and the same weather parameters, one could expect different signs depending on where the study is done with regards to the climatic niche. If one is close to the warm (or wet) edge, any positive temperature (or precipitation) anomalies would probably have a negative effect, but the reverse would happen when close to the cold or dry edge. There are studies showing such demographic and growth rate variability differences. I therefore find it hard to interpret the sign of such weather anomalies and what it tells us about the "effect" of weather variability.

      We think that this is an important point to discuss with respect to the importance of within-species variability in population dynamics. Certainly, from the results L203-206 it is clear that populations of the same species can have responses of differing signs. It is also interesting to note that this may be the result of a population’s position in the climatic niche. However, aside from exploring this for species with long-term demographic monitoring across the range, we do not feel that exploring this was in the scope of the current study across species. We agree fully however that adding this perspective to studies of how populations are responding to changing climates is critical. As well as the paper mentioned below by Gaillard et al. (2013), recent work in Plantago lanceolata with extensive spatial replication has also begun to reveal these within-range dynamics as a function of latitudinal or climatic gradients (Römer et al. 2021). We have added further discussion of this to the manuscript L330-340. We believe that this point adds to the context of our results highlighting variability within-species. In addition, we have clarified in the introduction that no clear directional responses of populations to weather anomalies were expected among and within species L133-135.

      Römer, G., Christiansen, D. M., de Buhr, H., Hylander, K., Jones, O. R., Merinero, S., ... & Dahlgren, J. P. (2021). Drivers of large‐scale spatial demographic variation in a perennial plant. Ecosphere, 12(1), e03356.

      Second with regards to the magnitude, it is clear that the maximum growth rate is strongly linked to maximum longevity and litter size -- slow species have a much lower maximum rate of growth than fast species. So, one would expect that variability of population growth rates is larger in fast species than slow species, and therefore the magnitude of their response to environmental variability. Now the question might also be whether weather variability explains a smaller or larger proportion of the variability in population growth rates -- that is, does weather have a relatively larger influence in fast species than slow species? You might have the answer but with the multiple standardizations of the response and predictor variables it is not obvious (that is, when you standardize the response and predictor variables, coefficients are correlations, but this is across species, not for a given population).

      The reviewer raises a very interesting and important point on whether the patterns we observe are simply a result of larger variability in growth rates in short-lived species. We have two responses to this point: 1) while there is indeed larger variation in the population growth rates of short-lived species, we believe that this variability is likely an evolved life-history strategy in response to the environment, and thus a key component of patterns we observe, 2) we also feel that our use of models that included annual effects, and state-space models with explicit process-noise terms, account for any confounding effect of this variation.

      To address the first point in more detail, we expect that life-histories (and thus population dynamics) are evolved responses to the environment (Stearns, 1992). For ‘fast’ organisms therefore, their intrinsic life-history strategy results in boom-bust population dynamics relative to ‘slow’ species. This is clearly observable in transient or non-asymptotic dynamics, where short-lived species more often have short-term population dynamics with a greater magnitude (Stott et al. 2011). On this point, we therefore argue that this variation in population growth is part of what we are trying to capture. Anomalies in the weather are therefore expected to act more strongly in ‘fast’ species. Following this point and the comments of Reviewer #3, we have now included more explicit hypotheses in terms of life-history L133-144.

      For the second point, while we may expect this variability to be the result of dynamics we are trying to capture, this does not preclude other sources of variation in population size confounding the patterns we could observe. For example, hunting pressure may influence both short-term population variability and long-term trends. As a result, we aimed to capture this residual variation using auto-regressive terms for year in our GAMs. While these terms do not explicitly model variability in population growth, they do account for a component of the trend, with variation (error around the trend, which is expected to be larger for fast species), and auto-regressive components of population change. Moreover, we did additional analyses using a state-space modelling approach. In the state-space approach, process noise, which in our case would equate to variability in population growth, is explicitly modelled and accounted for. We therefore believe that our analyses account for residual variability in population growth rates. State space models were also highly correlated with our auto-regressive GAMs, and we can therefore conclude that we do not expect that this variability influences our findings. We have now asserted this in the Methods section L531-535.
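
The distinction between process noise and observation error in a state-space model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' model: log abundance follows a random walk with a fixed growth rate, the noise variances are treated as known rather than estimated, and the state is recovered with a scalar Kalman filter.

```python
import random


def simulate(n_years, growth, process_sd, obs_sd):
    """Simulate true and observed log abundance (hypothetical toy model)."""
    state = 0.0
    true_states, observations = [], []
    for _ in range(n_years):
        # Process noise: real year-to-year variability in population growth.
        state += growth + random.gauss(0, process_sd)
        true_states.append(state)
        # Observation error: imperfect counting of the true population.
        observations.append(state + random.gauss(0, obs_sd))
    return true_states, observations


def kalman_filter(observations, growth, process_sd, obs_sd):
    """Scalar Kalman filter for the random walk with drift simulated above."""
    q, s = process_sd ** 2, obs_sd ** 2
    mean, var = observations[0], s  # start from the first observation
    filtered = [mean]
    for y in observations[1:]:
        pred_mean = mean + growth          # predict next state
        pred_var = var + q                 # process noise inflates uncertainty
        gain = pred_var / (pred_var + s)   # weight given to the new observation
        mean = pred_mean + gain * (y - pred_mean)
        var = (1 - gain) * pred_var
        filtered.append(mean)
    return filtered


def mse(estimates, truth):
    """Mean squared error between two equally long sequences."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


random.seed(7)
true_states, observations = simulate(200, growth=0.02, process_sd=0.1, obs_sd=0.5)
filtered = kalman_filter(observations, growth=0.02, process_sd=0.1, obs_sd=0.5)

raw_error = mse(observations, true_states)
filtered_error = mse(filtered, true_states)
```

In this toy setting the filtered series tracks the simulated state more closely than the raw observations do, because the two noise sources are modelled separately. In a real analysis the growth rate and both variances would be estimated from the data.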

      Stearns, S.C., 1992. The evolution of life histories (No. 575 S81).

      Stott, I., Townley, S. and Hodgson, D.J., 2011. A framework for studying transient dynamics of population projection matrix models. Ecology Letters, 14(9), pp.959-970.

      Your analyses remove trends -- that is, climate or other systematic change as opposed to weather anomalies (yearly differences) -- and trends might be the main concerns in terms of conservation. This is made clear in the discussion but perhaps not as much in the introduction where you seem to focus on climate change (the title reflects this well, however, as you mention weather, not climate). This confusion between weather and climate is often made in the literature, when reference is made to climate effects rather than weather effects.

      We agree with the reviewer that climate and weather are often conflated in ecological studies. We apologise for this oversight in the introduction, and agree that the narrative and link to weather was not made explicit in the previous version. Following this point and the suggestions of Reviewer #3, we have now restructured large sections of the introduction to improve the clarity of our hypotheses. To address this point, we have now included specific introduction of different components of climate that species populations may respond to, including short-term extreme weather patterns as we explore in this study. Please find this revised section L80-97.

      Finally, I would like to see a measure of how good is the prediction you can make using traits. You may have "significant effects" but not helping much in terms of prediction (see PB Adler et al. 2011 in Science, for an example with species richness and productivity).

      On this point we disagree with the reviewer. The core of our analysis framework was to examine the predictive performance of models. We do not report any significant effects, and instead use Bayesian inference. Throughout the analysis framework, we used explicit tests of out-of-sample predictive performance with leave-one-out cross validation (Vehtari et al. 2017). This is asserted in the manuscript title and results section when introducing our spatial analysis L188-191. Cross validation was combined with model selection to test the predictive performance of a set of candidate models with respect to base models excluding predictors of interest. This predictive performance framework was not applied to examine the directional effects (question 1), as these models did not contain key predictors. However, model selections using predictive performance were done throughout questions 2 and 3, to explore spatial and life-history effects. We highlight this point in both the results L188-191 and methods sections L608-615. In the case of life-history, we found that out-of-sample predictions were improved relative to the base model when including univariate life-history traits, and thus life-history traits aid in predicting weather responses.
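
The logic of comparing a base model with a model that adds a predictor of interest, judged by out-of-sample performance, can be sketched as follows. This is a simplified stand-in and not the Bayesian procedure of Vehtari et al. (2017): it uses exact leave-one-out refitting with ordinary least squares, and the data, the variable names `longevity` and `response`, and the effect size are all synthetic assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_mse(xs, ys, use_trait):
    """Mean squared prediction error under leave-one-out cross validation."""
    errors = []
    for i in range(len(xs)):
        # Hold out observation i and fit on the remaining data.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if use_trait:
            intercept, slope = fit_line(train_x, train_y)
            prediction = intercept + slope * xs[i]
        else:
            # Base model: intercept only, no trait predictor.
            prediction = sum(train_y) / len(train_y)
        errors.append((ys[i] - prediction) ** 2)
    return sum(errors) / len(errors)


# Synthetic data (assumption): responses shrink as standardized longevity grows.
random.seed(1)
longevity = [random.uniform(-2, 2) for _ in range(60)]
response = [0.5 - 0.3 * x + random.gauss(0, 0.2) for x in longevity]

base_error = loo_mse(longevity, response, use_trait=False)
trait_error = loo_mse(longevity, response, use_trait=True)
```

A lower held-out error for the trait model than for the base model is what "improved out-of-sample prediction" means in this simplified setting. The Bayesian version replaces squared error with the expected log predictive density and avoids refitting for every held-out point.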

      We did not explore the relative predictive performance of life-history traits with respect to other traits such as dietary specialisation, which have been shown to be important in climate responses (Pacifici et al. 2017). We believe that this would have been out of scope for the purpose of the current study, where we aimed to test specific hypotheses established in life-history theory.

      Pacifici, M., Visconti, P., Butchart, S.H., Watson, J.E., Cassola, F.M. and Rondinini, C., 2017. Species’ traits influenced their response to recent climate change. Nature Climate Change, 7(3), pp.205-208.

      Vehtari, A., Gelman, A. and Gabry, J., 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and computing, 27(5), pp.1413-1432.

      Reviewer #2 (Public Review):

      Jackson et al. present a global analysis of the effects of life history on the response of terrestrial mammal populations to weather, showing that litter size and longevity significantly alter how populations respond to anomalies in temperature and rainfall. The topic is highly interesting, as it has implications for what data we should monitor to make more reliable predictions about species' responses to climatic change, and how we should prioritise which species to conserve by identifying those which might be at greatest risk.

      The authors comprehensively validate their results with substantial secondary analyses, and I believe that their assertions are supported by the results presented here. Whilst global scale analyses such as this provide useful generalities, they should be taken as that: an investigation of the general trends observed across large spatial scales, and caution should be taken extrapolating too far away from the species which have been analysed for this study.

      We thank the reviewer for their positive feedback, and agree with not drawing too many generalities from our findings. In the first paragraph of the discussion L253-262, we now explicitly refer to the results in the context of mammal population-dynamics/conservation.

      Reviewer #3 (Public Review):

      In this study, the authors aim to investigate how mammalian species are likely to respond to climate change. To this end, they investigate the effects of weather anomalies on the growth rates of mammalian populations. They use long-term population records for 157 terrestrial mammals from the Living Planet database. They explore three different questions using a two-step modelling approach: (1) whether temperature and precipitation anomalies have significant effects on population growth rates across species; (2) whether responses differ among species and biomes; and (3) whether life-history traits explain species responses to weather anomalies.

      The work undertaken in this manuscript is of broad appeal in the field and has the potential to inform conservation. Overall, the methodology is sound and the modelling framework robust; the authors took care to test the robustness of their models by fitting alternative sets of models. The two-step design of this study is interesting and the choice of the study system is relevant for the questions the authors aim to tackle. The authors also paid attention to some important points that are at times overlooked such as resolving taxonomy before running their analyses. I also appreciated the fact that the authors made their code available.

      We thank the reviewer for their positive feedback on the manuscript, which highlights many of our key goals with the paper.

      I nevertheless think that, in its present form, the main weakness of this manuscript is the clarity of the writing, the framing of the study and the overall flow. I found the manuscript at times a bit difficult to follow. That said, I think there is much scope for the authors to improve it. First, I think the work would benefit from better explanation of the underlying hypotheses. Second, in some places I think the authors go into a lot of details at the expense of clarity. As such, I think the authors should strive to better balance clarity with detailed information (notably in the results and methods; adding summary sentences, for example, could help clarify these sections). Third, I think there is room for improvement in the narrative and the flow of the introduction and the discussion. Finally, I think stronger justifications are sometimes required regarding specific points of the analysis.

      I believe that the conclusions of this work are supported by the data and the analyses, and think they are of interest and relevant to the field. However, I think the discussion should highlight the main limitations of the study. In particular, I think the biases in the data should be discussed, and notably whether these biases are expected to affect the results (and if so, in what way).

      To conclude, I think that beyond the aforementioned weaknesses of this study, the results and the methods are of interest for the field. I think the modelling framework is applicable to other study systems and relevant to the field as well.

      We warmly thank the reviewer for their positive words and thorough constructive feedback. We have extensively re-worked large sections of the manuscript (particularly the discussion and introduction) based on these points, and done our best to address all of them. Generally, we have strived to improve the clarity and succinctness of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Guggenmos proposes a process model for predicting confidence reports following perceptual choices, via the evidence available from stimuli of various intensities. The mechanisms proposed are principled, but a number of choices are made that should be better motivated - I develop below a number of concerns by order of importance.

      I’d like to thank the reviewer for their thorough and excellent review. This is not an empty phrase: the review substantially improved the manuscript.

      1) Lack of separability of the two metacognitive modules.

      Can the author show that the proposed model can actually discriminate between the noisy readout module and the noisy report module? The two proposed modules have a different psychological meaning, but seem to similarly impact the confidence output. Are these two mutually exclusive (as Fig 1 suggests), or could both sources of noise co-exist? It will be important to show model recovery for introducing readout vs. report at the metacognitive level, e.g., show that a participant best-fitted by a nested model or a subpart of the full model, with a restricted number of modules (some of the parameters set to zero or one), is appropriately recovered? (focusing on these two modules) This raises the question of how the two types of sigma_m are recoverable/separable from each other (and should they both be called sigma_m, even if they both represent a standard deviation)? If they capture independent aspects of noise, one could imagine a model with both modules. More evidence is needed to show that these two capture separate aspects of noise.

      Testing the separability of the two noise types (readout, report) is a great idea and I have now performed a corresponding recovery analysis. Specifically, I have simulated data with both noise types for different regimes of sensory and metacognitive noise. As shown in the new Figure 7—figure supplement 6, the noise type can be precisely recovered in the most typical regimes.

      I now refer to this analysis in the subsection 2.4 Model recovery (Line 521ff):

      “One strength of the present modeling framework is that it allows testing whether inefficiencies of metacognitive reports are better described by metacognitive noise at readout (noisy-readout model) or at report (noisy-report model). To validate this type of application, I performed an additional model recovery analysis which tested whether data simulated by either model are also best fitted by the respective model. Figure 7—figure supplement 6 shows that the recovery probability was close to 1 in most cases, thus demonstrating excellent model identifiability. With fewer trials per observer, recovery probabilities decrease expectedly, but are still at a very good level. The only edge case with poorer recovery was a scenario with low metacognitive noise and high sensory noise. Model identification is particularly hard in this regime because low metacognitive noise reduces the relevance of the metacognitive noise source, while high sensory noise increases the general randomness of responses.”

      In principle, both noise modules can co-exist and model inversion should be possible (though mathematically more complicated). On the other hand, I anticipate that parameter recovery would be extremely noisy in such a scenario. For this work, I decided to not test this possibility as it would add a lot of complexity, with a high probability of ultimately being unfeasible.
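
The general shape of a model recovery analysis, as described in this response, can be sketched with a toy example. The two candidate models below are placeholders chosen only because they have closed-form maximum-likelihood fits (Gaussian versus Laplace noise); they are not the noisy-readout and noisy-report models, and the function names and sample sizes are assumptions.

```python
import math
import random


def gaussian_loglik(data):
    """Maximized log-likelihood of a Gaussian (mean and variance fitted)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(data):
    """Maximized log-likelihood of a Laplace (location and scale fitted)."""
    n = len(data)
    ordered = sorted(data)
    mid = n // 2
    median = ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])
    scale = sum(abs(x - median) for x in data) / n
    return -n * (math.log(2 * scale) + 1)


SIMULATORS = {
    "gaussian": lambda: random.gauss(0, 1),
    "laplace": lambda: random.choice((-1, 1)) * random.expovariate(1.0),
}
FITTERS = {"gaussian": gaussian_loglik, "laplace": laplace_loglik}


def recovery_probability(true_model, n_trials, n_datasets):
    """Fraction of simulated datasets best fitted by the generating model."""
    hits = 0
    for _ in range(n_datasets):
        data = [SIMULATORS[true_model]() for _ in range(n_trials)]
        # Both models have two parameters, so log-likelihoods compare directly.
        best = max(FITTERS, key=lambda name: FITTERS[name](data))
        hits += best == true_model
    return hits / n_datasets


random.seed(0)
recovery = {
    name: recovery_probability(name, n_trials=300, n_datasets=100)
    for name in SIMULATORS
}
```

Each entry of `recovery` is the proportion of datasets for which the generating model also fitted best. Lowering `n_trials` makes the two models harder to tell apart, mirroring the drop in recovery with fewer trials per observer that the quoted passage reports.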

      2) The trade-off between the flexibility of the model (modularity of the metacognitive part, choice of the link functions) and the generalisability of the process proposed seems in favor of the former. Does the current framework really allow to disambiguate between the different models? Or at least, the process modeled is so flexible that I am not sure it allows us to draw general conclusions? Fig 7 and section 3 of the results explain that all models are similar, regardless of module of functions specified; Fig 7 supp shows that half of participants are best fitted by noisy readout, while the other half is best fitted by noisy report; plus, idiosyncrasies across participants are all captured. Does this compromise the generalisability of the modeling of the group as a whole?

      This is a fair point and I understand the question has two components: a) is the model too flexible, potentially preventing generalized conclusions? b) is the flexibility of the model recoverable?

      Regarding a), I should emphasize that the manuscript (and toolbox) provides a modeling framework, rather than a single specific model. In other words, researchers applying the framework/toolbox must make a number of decisions: which noise type? which metacognitive biases should be considered? which link function? To ensure interpretability / generalizability, researchers have to sufficiently constrain the model. Due to this framework character, it makes sense that the manuscript is submitted under the Tools & Resources Article format rather than the Research Article format.

      On the other hand, I agree that it is the duty of the manuscript introducing the framework to provide all necessary information to help the researcher make these decisions. This is where the reviewer’s point b) is critical and I hope that with the new parameter and model recovery analyses in the present revision (see other comments) I meet this requirement to a satisfactory degree.

      To clarify the scope and aim of the paper, I now put a new subsection in front of the example application to the data from Shekhar and Rahnev, 2021 (Line 534ff):

      “It is important to note that the present work does not propose a single specific model of metacognition, but rather provides a flexible framework of possible models and a toolbox to engage in a metacognitive modeling project. Applying the framework to an empirical dataset thus requires a number of user decisions: which metacognitive noise type is likely more dominant? which metacognitive biases should be considered? which link function should be used? These decisions may be guided either by a priori hypotheses of the researcher or can be informed by running a set of candidate models through a statistical model comparison. As an exemplary workflow, consider a researcher who is interested in quantifying overconfidence in a confidence dataset with a single parameter to perform a brain-behavior correlation analysis. The concept of under/overconfidence already entails the first modeling decision, as only a link function that quantifies probability correct (Equation 6) allows for a meaningful interpretation of metacognitive bias parameters. Moreover, the researcher must decide for a specific metacognitive bias parameter. The researcher may not be interested in biases at the level of the confidence report, but, due to a specific hypothesis, rather at metacognitive biases at the level of readout/evidence, thus leaving a decision between the multiplicative and the additive evidence bias parameter. Also, the researcher may have no idea whether the dominant source of metacognitive noise is at the level of the readout or report. To decide between these options, the researcher computes the evidence (e.g., AIC) for all four combinations and chooses the best-fitting model (ideally, this would be in a dataset independent from the main dataset).”
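
      The model-selection step of this exemplary workflow can be sketched as follows. This is only a minimal illustration and not the toolbox's interface: the helper names `aic` and `select_by_aic`, the model labels, and the negative log-likelihood values are hypothetical placeholders. In practice the values would come from fitting each candidate model to the data.

```python
def aic(neg_log_lik, n_params):
    """Akaike information criterion: 2k + 2 * (negative log-likelihood)."""
    return 2 * n_params + 2 * neg_log_lik


def select_by_aic(fits):
    """Return the label of the candidate model with the lowest AIC.

    `fits` maps a model label to (negative log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda label: aic(*fits[label]))


# Placeholder numbers for illustration only (not results from any dataset).
# Each candidate has two free metacognitive parameters: noise and one bias.
candidate_fits = {
    ("noisy-readout", "multiplicative evidence bias"): (512.3, 2),
    ("noisy-readout", "additive evidence bias"): (515.9, 2),
    ("noisy-report", "multiplicative evidence bias"): (508.7, 2),
    ("noisy-report", "additive evidence bias"): (511.4, 2),
}
best = select_by_aic(candidate_fits)
print(best)
```

      Because all four candidates have the same number of free parameters here, the AIC ranking coincides with the likelihood ranking. The penalty term only matters once candidates differ in complexity.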

      In addition, the website of the toolbox now provides a lot more information about typical use cases: https://github.com/m-guggenmos/remeta

      3) More extensive parameter recovery needs to be done/shown. We would like to see a proper correlation matrix between parameters, and recovery across the parameter space, not only for certain regimes (i.e. more than fig 6 supp 3), that is, the full grid exploration irrespective of how other parameters were set.

      The recovery of the three metacognitive bias parameters is displayed in Fig 4, but what about the other parameters? We need to see that they each have a specific role. The point in the Discussion "the calibration curves and the relationships between type 1 performance and confidence biases are quite distinct between the three proposed metacognitive bias parameters may indicate that these are to some degree dissociable" is only very indirect evidence that this may be the case.

      A comprehensive parameter recovery analysis is indeed a key analysis that was missing in the first version of the manuscript. I have now performed several analyses to address this, and rewrote and extended section 2.3 on parameter recovery. The new parameter recovery analysis was performed as follows (Line 455ff):

      “To ensure that the model fitting procedure works as expected and that model parameters are distinguishable, I performed a parameter recovery analysis. To this end, I systematically varied each parameter of a model with metacognitive evidence biases and generated data. Specifically, each of the six parameters (σs, ϑs, δs, σm, 𝜑m, δm) was varied in 500 equidistant steps between a sensible lower and upper bound. The model was then fit to each dataset. To assess the relationship between fitted and generative parameters, I computed linear slopes between each generative parameter (as the independent variable) and each fitted parameter (as the dependent variable), resulting in a 6 x 6 slope matrix. Note that I computed (robust) linear slopes instead of correlation coefficients, as correlation coefficients are sample-size-dependent and approach 1 with increasing sample size even for tiny linear dependencies. Thus, as opposed to correlation coefficients, slopes quantify the strength of a relationship. Comparability between the slopes of different parameters is given because i) slopes are – like correlation coefficients – expected to be 1 if the fitted values precisely recover the true parameter values (i.e., the diagonal of the matrix) and ii) all parameters have a similar value range which makes a comparison of off-diagonal slopes likewise meaningful. To test whether parameter recovery was robust against different settings of the respective other parameters, I performed this analysis for a coarse parameter grid consisting of three different values for each of the six parameters except σm, for which five different values were considered. This resulted in 3⁵·5¹ = 1215 slope matrices for the entire parameter grid.”
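
      The grid size and the slope measure described above can be illustrated with a short sketch. Two assumptions are made here: the quoted text only says "(robust) linear slopes", so the Theil-Sen estimator (median of pairwise slopes) is used as one plausible choice, and the "fitted" values are simulated as generative values plus noise rather than obtained from an actual model fit. The parameter names in `n_levels` are plain-text stand-ins for the symbols in the text.

```python
import itertools
import random
import statistics


def robust_slope(x, y):
    """Theil-Sen estimator: the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return statistics.median(slopes)


# Coarse grid: three levels for five parameters, five levels for sigma_m.
n_levels = {"sigma_s": 3, "theta_s": 3, "delta_s": 3,
            "sigma_m": 5, "phi_m": 3, "delta_m": 3}
grid = list(itertools.product(*(range(n) for n in n_levels.values())))
n_slope_matrices = len(grid)  # 3**5 * 5**1

# Toy recovery check for one parameter: simulated "fitted" values are the
# generative values plus noise, with an occasional gross outlier.
rng = random.Random(0)
generative = [i / 199 for i in range(200)]
fitted = [g + rng.gauss(0, 0.05) for g in generative]
for k in range(0, 200, 20):
    fitted[k] += 1.0  # stand-in for occasional fitting failures

slope = robust_slope(generative, fitted)
print(n_slope_matrices, round(slope, 2))
```

      A slope near 1 on the diagonal of the slope matrix indicates that the fitted values track the generative values, and the median-based estimator keeps a handful of failed fits from distorting that summary.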

      In addition, I ran supplementary analyses assessing a case with fewer trials, a model with confidence biases, and models with mixed evidence and confidence biases. For details about these analyses, I kindly point the reviewer to section 2.3. Together, these new analyses demonstrate that parameter recovery works extremely well across different regimes and for all model parameters, including the metacognitive bias parameters mentioned in the reviewer’s comment.

      1.8: It would be important to report under what regimes of other parameters these simulations were conducted. This is because, even if dependence of Mratio onto type 1 performance is reproduced, and that is not the case for sigma_m, it would be important to know whether that holds true across different combinations of the other parameter values.

      I now repeated this analysis for various settings of other parameters and include the results as new Figure 6—figure supplement 2. While the settings of other parameters affect the type 1 performance dependency of Mratio (with some interesting effects such as Mratio > 1), parameter recovery of sigma_m is largely unaffected. The same basic point thus holds: Mratio shows a nonlinear dependency with type 1 performance, but sigma_m can be recovered largely without bias under most regimes (the main exception is a combination of low sensory noise and high metacognitive noise under the noisy-readout model, which is also mentioned in the manuscript).

      Is lambda_m meaningfully part of the model, and if so, could it be introduced into the Fig 1 model, and be properly part of the parameter recovery?

      I have now reworked the part about metacognitive biases to make it more consistent and to introduce lambda_m on equal footing with the other metacognitive bias parameters. I now distinguish between metacognitive evidence biases (the two main bias parameters of the original model, phi_m and delta_m) and metacognitive confidence biases, i.e. lambda_m and a new additive confidence bias parameter kappa_m. The schematic presentation of the model framework in Figure 1 has been updated accordingly.

      This change also complies with reviewer 2, who rightfully pointed out that the original model framework put much stronger emphasis on bias parameters loading on evidence than on confidence. The metacognitive confidence bias parameters are now also part of the parameter recovery analyses (Figure 7—figure supplement 2).

      While it is still feasible to combine the two evidence-related bias parameters and lambda_m – as queried by the reviewer – not all mixed combinations of evidence- and confidence-related bias parameters perform well in terms of model recovery (in particular, combining all four parameters; cf. Figure 7—figure supplement 3). Hence, a decision on the side of the modeler is required. I comment on this important aspect at the end of the section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery can become unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios one or two metacognitive bias parameters are a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

      4) An important nuance in comparing the present sigma_m to Mratio is that the present model requires that multiple difficulty levels are tested, whereas instead, the Mratio model based on signal detection theory assumes a constant signal strength. How does this impact the (unfair?) comparison of these two metrics on empirical data that varied in difficulty level across trials? Relatedly, the Discussion paragraph that explained how the present model departs from type 2 AUROC analysis similarly omits to account for the fact that studies relying on the latter typically intend to not vary stimulus intensity at the level of the experimenter.

      I thank the reviewer for this comment, which made me realize that I incorrectly assumed that my model requires multiple stimulus difficulty levels. The only parameter that would require multiple stimulus intensities is the sensory threshold parameter, but for this parameter I already state that it requires additional stimulus difficulties close to threshold (Line 147ff). Apart from that, I have now tested extensively that the model works just fine with constant stimuli. My reasoning mistake (iirc) was related to the fact that I fit a metacognitive link function, which I thought would require variance on the x-axis; but of course there is already plenty of variance introduced through noise at the sensory level, so multiple difficulty levels are not required to fit the metacognitive level. I have now removed the relevant references to this requirement from the manuscript.

      Nevertheless, I agree that it is interesting to perform the comparison between Mratio and sigma_m also for a scenario with constant stimuli. See both the new Figure 6—figure supplement 1 with constant stimuli, and the (updated) main Figure 6 with multiple stimulus levels for comparison.

      The general point still holds also for constant stimuli: Mratio is not independent of type 1 performance. Thus, the observed dependence on type 1 performance is not due to the presence of varying stimulus levels. I now reference this new supplementary figure in Result section 1.8 (Line 389).

      5) 'Parameter fitting minimizes the negative log-likelihood of type 1 choices (sensory level) or type 2 confidence ratings (metacognitive level)'. Why not fitting both choices and confidence at the same time instead of one after the other? If I understood correctly, it is an assumption that these are independent, why not allow confidence reports to stem from different sources of choice and metacognitive noise? Is it because sensory level is completely determined by a logistic (but still, it produces the decision values that are taken up to the metacognitive level)?

      The decision to separate the two levels during parameter inference was deliberate. I now explain this choice in the beginning of Result section 2 (Line 416ff):

      “The reason for the separation of both levels is that choice-based parameter fitting for psychometric curves at the type 1 / sensory level is much more established and robust compared to the metacognitive level for which there are more unknowns (e.g., the type of link function or metacognitive noise distribution). Hence, the current model deliberately precludes the possibility that the estimates of sensory parameters are influenced by confidence ratings.”

      Indeed, I would regard it as highly problematic if the estimates of sensory parameters were influenced by confidence ratings, which are shaped by a manifold of interindividual quirks and biases and for which computational models are still in a developmental stage. Yet, from a pure simulation-based parameter recovery perspective, in which the true confidence model is known, using confidence ratings would indeed make sensory parameter estimation more precise (because of the rich information contained in continuous confidence ratings which is lost in the binarization of type 1 choices).

      6) Fig 4 left panels: could you clarify the reasoning that due to sensory noise, overconfidence is expected, instead of having objective and subjective probability correct aligning on the diagonal? Shouldn't the effects of sensory noise average out? In other words, why would the presence of sensory noise systematically push towards overconfidence rather than canceling out on average?

      As an intuitive explanation consider the case that no signal is present in a stimulus, e.g., a line grating in a clockwise/counterclockwise orientation discrimination task with an angle of 0 degrees. Since there is no true information in the stimulus, type 1 performance will be at chance level irrespective of sensory noise.

      However, sensory noise matters for the metacognitive level. Assuming no sensory noise (i.e., sigma_s = 0), the observer’s stimulus/decision variable would be zero and thus confidence would be zero. Thus, confidence would exactly match type 1 performance. Yet, assuming the presence of sensory noise, the stimulus estimate (“decision value”) will be always different from point-zero, if ever so slightly. While the average estimate of the stimulus variable across trials will indeed cancel out to zero, each individual trial will be different from zero (in either direction) and hence also the confidence will be different from zero in each trial. Since confidence is unsigned, the average confidence will be greater than zero and thus give the impression of an overconfident observer.

      Note that this explanation was implicitly included in the paragraph on the 0.75 signature of confidence (“When evidence discriminability is zero, an ideal Bayesian metacognitive observer will show an average confidence of 0.75 and thus an apparent (over)confidence bias of 0.25. Intuitively this can be understood from the fact that Bayesian confidence is defined as the area under a probability density in favor of the chosen option. Even in the case of zero evidence discriminability, this area will always be at least 0.5 − otherwise the other choice option would have been selected, but often higher.”, Line 257ff).
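
      The intuition in the preceding paragraphs can be checked with a small simulation. This is a sketch under stated assumptions, not the manuscript's implementation: sensory noise is Gaussian with a known standard deviation `sigma_s` (an arbitrary illustrative value), the prior over the stimulus is flat, and Bayesian confidence is taken as the posterior probability that the chosen option is correct. The rescaling to a 0-to-1 confidence scale as 2p - 1 is likewise an assumption of this sketch.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)
std_normal = NormalDist()
sigma_s = 0.5       # assumed sensory noise; any positive value gives the same result
n_trials = 200_000

# Zero-signal stimulus: the decision value is pure sensory noise.
decision_values = [rng.gauss(0.0, sigma_s) for _ in range(n_trials)]

# Posterior probability that the chosen option (the sign of the decision value)
# is correct, i.e. the area under the posterior on the chosen side.
p_correct = [std_normal.cdf(abs(y) / sigma_s) for y in decision_values]

mean_signed_evidence = fmean(decision_values)   # cancels out to about 0
mean_p_correct = fmean(p_correct)               # about 0.75, although accuracy is 0.5
mean_confidence = 2 * mean_p_correct - 1        # about 0.5 on a 0-to-1 scale

print(round(mean_signed_evidence, 3), round(mean_p_correct, 3), round(mean_confidence, 3))
```

      The value 0.75 follows because the normal CDF of a standard normal variable is uniform on [0, 1], so the CDF of its absolute value is uniform on [0.5, 1] and averages 0.75. The signed evidence averages out, but the unsigned confidence does not.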

      7) The same analysis as Fig 6 but for noisy readout instead of noisy reports do not show the same results: both sigma_m and m-ratio vary as a function of type 1 performance. Does this mean that the present model with readout module does not solve the issue of dependency upon type 1 performance?

      I refer to this in the Result section: “The exception is a regime with very high metacognitive noise and low sensory noise under the noisy-readout model, in which recovery becomes biased” (Line 391ff). Indeed, the type 1 performance dependency of sigma_m recovery in this edge case is not as good as in the noisy-report model. However, note that recovery is stable across a large range of d’ including the range typically aimed for in metacognition experiments (i.e., medium performance levels to ensure sufficient variance in confidence ratings).

      It is also important to point out that a failure to recover true parameters under certain conditions is not a failure of the model, but a reflection of the fact that information can be lost at the level of confidence reports. For example, if sensory noise is very high, the relationship between evidence and confidence becomes essentially flat (Figure 3), producing confidence ratings close to zero irrespective of the level of stimulus evidence. It becomes increasingly impossible to recover any parameters in such a scenario. Vice versa if sensory noise is extremely low, confidence ratings approach a value of 1 irrespective of stimulus evidence, and the same issue arises. In both cases there is no meaningful variance for an inference about latent parameters. This issue is more pronounced in the noisy-readout case because it requires an inversion of precisely the relationship between evidence and confidence.
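
      The loss of information at the two extremes can be made concrete with a toy link function. The function below is an assumption chosen for illustration (confidence as 2 * P(correct) - 1 under Gaussian sensory noise), not necessarily the link function used in the manuscript, and the evidence levels and noise values are arbitrary.

```python
import math


def confidence(evidence, sigma_s):
    """Confidence on a 0-to-1 scale, 2 * P(correct) - 1, for Gaussian sensory noise."""
    return math.erf(abs(evidence) / (sigma_s * math.sqrt(2)))


evidence_levels = [0.1, 0.2, 0.4, 0.8]
spreads = {}
for sigma_s in (0.01, 0.3, 50.0):
    ratings = [confidence(e, sigma_s) for e in evidence_levels]
    # Range of confidence across evidence levels: the variance available for inference.
    spreads[sigma_s] = max(ratings) - min(ratings)
    print(f"sigma_s={sigma_s}: ratings={[round(r, 3) for r in ratings]}, "
          f"spread={spreads[sigma_s]:.3f}")
```

      With very low or very high sensory noise, the ratings collapse towards 1 or 0, respectively, and barely differ between evidence levels. Only in the intermediate regime does confidence carry usable information about the underlying evidence.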

      8) In Eq8, could you explain why only the decision values consistent with the empirical choice are filtered. Is this an explicit modeling of the 'decision-congruence' phenomenon reported elsewhere (eg. Peters et al 2017)? What are the implications of not keeping only the congruent decision values?

      I apologize, this was a mistake in the manuscript. The integration is over all decision values, not just those consistent with the choice. I corrected it accordingly.

      Reviewer #2 (Public Review):

      This paper presents a novel computational model of confidence that parameterises links between sensory evidence, metacognitive sensitivity and metacognitive bias. While there have been a number of models of confidence proposed in the literature, many of these are tailored to bespoke task designs and/or not easily fit to data. The dominant model that sees practical use in deriving metacognitive parameters is the meta-d' framework, which is tailored for inference on metacognitive sensitivity rather than metacognitive biases (over- and underconfidence). This leaves a substantial gap in the literature, especially as in recent years many interesting links between metacognitive bias and mental health have started to be uncovered. In this regard, the ReMeta model and toolbox is likely to have significant impact on the field, and is an excellent example of a linked publication of both paper and code. It's possible that this paper could do for metacognitive bias what the meta-d' model did for metacognitive sensitivity, which is to say have a considerable beneficial impact on the level of sophistication and robustness of empirical work in the field.

      The rationale for many of the modelling choices is clearly laid out and justified (such as the careful handling of "flips" in decision evidence). My main concern is that the limits to what can be concluded from the model fits need much clearer delineation to be of use in future empirical work on metacognition. Answering this question may require additional parameter/model recovery analysis to be convincing.

      I thank the reviewer for these encouraging and constructive comments!

      Specific comments:

      • The parameter recovery demonstrated in Figure 4 across range of d's is impressive. But I was left wondering what happens when more than one parameter needs to be inferred, as in real data. These plots don't show what the other parameters are doing when one is being recovered (nor do the plots in the supplement to Figure 6). The key question is whether each parameter is independently identifiable, or whether there are correlations in parameter estimates that might limit the assignment of eg metacognitive bias effects to one parameter rather than another. I can think of several examples where this might be the case, for instance the slope and metacognitive noise may trade off against each other, as might the slope and delta_m. This seems important to establish as a limit of what can be inferred from a ReMeta model fit.

      This is an excellent point and was also raised by reviewer #1. See major comment 3 of reviewer #1 for a detailed response. In short, I now provide comprehensive analyses that demonstrate successful parameter recovery across different regimes and both noise types (noisy-readout, noisy-report). See Figure 7.

      Regarding the anticipated trade-offs between the confidence slope (now referred to as multiplicative evidence bias) and metacognitive noise / delta_m (now additive evidence bias), there is a single scenario in which this becomes an issue. I describe this in the Results section as follows (Line 480ff):

      “Here, the only marked trade-off emerges between metacognitive noise σm and the metacognitive evidence biases (𝜑m, δm) in the noisy-readout model, under conditions of low sensory noise. In this regime, the multiplicative evidence bias 𝜑m becomes increasingly underestimated and the additive evidence bias δm overestimated with increasing metacognitive noise. Closer inspection shows that this dependency emerges only when metacognitive noise is high – up to σm ≈ 0.3 no such dependency exists. It is thus a scenario in which there is little true variance in confidence ratings (due to low sensory noise many confidence ratings would be close to 1 in the absence of metacognitive noise), but a lot of measured variance due to high metacognitive noise. It is likely for this reason that parameter inference is problematic. Overall, except for this arguably rare scenario, all parameters of the model are highly identifiable and separable.” In my experience, certain trade-offs in specific edge cases are almost inescapable for more complex models. Overall, I think it is fair to say that parameter recovery works extremely well, including the ‘trinity’ of metacognitive noise / multiplicative evidence bias / additive evidence bias.

      • Along similar lines, can the noisy readout and noisy report models really be distinguished? I appreciate they might return differential AICs. But qualitatively, it seems like the only thing distinguishing them is that the noise is either applied before or after the link function, and it wasn't clear whether this was sufficient to distinguish one from the other. In other words, if you created a 2x2 model confusion matrix from simulated data (see Wilson & Collins, 2019 eLife) would the correct model pathway from Figure 1 be recovered?

      Great point. I introduced a new subsection 2.4 “Model recovery”, in which I demonstrate successful recovery of noisy-readout versus noisy-report models. See also my response to the first comment of Reviewer #1, which includes the new model recovery figure and the associated paragraph in the manuscript. The key new figure is Figure 7—figure supplement 6.
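
      For readers unfamiliar with the procedure, the general logic of a 2x2 model confusion matrix can be sketched with two toy models. The Gaussian and Laplace models below are stand-ins chosen because their maximum-likelihood fits have closed forms. They are not the noisy-readout and noisy-report models, and the numbers of datasets and trials are arbitrary assumptions.

```python
import math
import random
import statistics


def simulate(model, n, rng):
    """Draw n samples from a toy 'gaussian' or 'laplace' model."""
    if model == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The difference of two unit exponentials follows a Laplace(0, 1) distribution.
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def max_log_lik(model, x):
    """Maximized log-likelihood under each toy model (closed-form MLE)."""
    n = len(x)
    if model == "gaussian":
        mu = statistics.fmean(x)
        var = sum((v - mu) ** 2 for v in x) / n
        return -0.5 * n * (math.log(2 * math.pi * var) + 1)
    med = statistics.median(x)
    b = sum(abs(v - med) for v in x) / n
    return -n * (math.log(2 * b) + 1)


def confusion_matrix(models, n_datasets=200, n_trials=200, seed=0):
    """Estimate P(best-fitting model | generating model), selecting by AIC."""
    rng = random.Random(seed)
    n_params = 2  # both toy models have a location and a scale parameter
    result = {}
    for generating in models:
        wins = {m: 0 for m in models}
        for _ in range(n_datasets):
            data = simulate(generating, n_trials, rng)
            aic = {m: 2 * n_params - 2 * max_log_lik(m, data) for m in models}
            wins[min(aic, key=aic.get)] += 1
        result[generating] = {m: wins[m] / n_datasets for m in models}
    return result


matrix = confusion_matrix(["gaussian", "laplace"])
for generating, row in matrix.items():
    print(generating, row)
```

      Each row gives the proportion of datasets from one generating model that were best fitted by each candidate. Values close to 1 on the diagonal indicate that the two models are identifiable from data of that size.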

      • Again on a similar theme: isn't the slope parameter rho_m better considered a parameter governing metacognitive sensitivity, given that it maps the decision values onto confidence? If this parameter approaches zero, the function flattens out which seems equivalent to introducing additional metacognitive noise. Are these parameters distinguishable?

      Indeed, the parameter recovery analysis shows a slight negative correlation between the slope parameter (now termed multiplicative evidence bias) and metacognitive noise (Figure 7). As the reviewer mentions, this is likely caused by the fact that both parameters lead to a flattening/steepening of the evidence-confidence relationship. For reference, in the empirical dataset by Shekhar & Rahnev, the correlation between AUROC2 and the multiplicative evidence bias is almost absent at r = −0.017. Critically, however, while an increase of the metacognitive noise parameter σm will ultimately lead to a truly flat/indifferent relationship between evidence and confidence, the multiplicative evidence parameter 𝜑m only affects the slope (i.e., asymptotically confidence will still reach 1). This is one reason why parameter recovery for both σm and 𝜑m works overall very well. The differential effects of σm and 𝜑m are now better illustrated in the updated Figure 3:

      Also conceptually, the multiplicative evidence parameter 𝜑m plausibly represents a metacognitive bias, with either interpretation that I suggest in the manuscript: as an under/overestimation of the evidence or as an over/underestimation of one’s own sensory noise, leading to under/overconfidence, respectively. In sum, I think there are strong arguments for the present formalization and interpretation.

      • The final paragraph of the discussion was interesting but potentially concerning for a model of metacognition. It explains that data on empirical trial-by-trial accuracy is not used in the model fits. I hadn't appreciated this until this point in the paper. I can see how in a process model that simulates decision and confidence data from stimulus features, accuracy should not be an input into such a model. But in terms of a model fit, it seems odd not to use trial by trial accuracy to constrain the fits at the metacognitive level, given that the hallmark of metacognitive sensitivity is a confidence-accuracy correlation. Is it not possible to create accuracy-conditional likelihood functions when fitting the confidence rating data (similar to how the meta-d' model fit is handled)? Psychologically, this also makes sense given that the observer typically knows their own response when giving a confidence rating.

      While I agree of course that metacognitive sensitivity quantifies the confidence-accuracy relationship, a process model is a distinct approach and requires distinct methodology. Briefly, the current model fit cannot be improved upon, as it is based on a precise inversion of the forward model. Computing accuracy-conditional likelihoods would lead to biased parameter estimates, because it would incorrectly imply that the observer has access to the accuracy of their choice. While the observer knows their choice, as the reviewer correctly notes, they do not know the true stimulus category and hence not their accuracy.

      I argue in the manuscript that both approaches (descriptive meta-d’, explanatory process model) have their advantages and disadvantages. The concept of meta-d’ / metacognitive sensitivity does not care why a particular confidence rating is the way it is, or whether an incorrect response is caused by sensory noise or by an attentional lapse. On the one hand, this implies that one cannot draw any conclusions about the causes and mechanisms of metacognitive inefficiency, which could be perceived as a major drawback. In this respect, it is a purely descriptive measure (cf. last comment of Reviewer #1). On the other hand, because it is descriptive, it can simply compare the confidence between correct and incorrect choices and thus, in a sense, capture a more thorough picture of metacognitive sensitivity; that is, being metacognitively aware not only of the consequences of one’s own sensory noise (as in typical process models), but also of all other sources of error (attentional lapses, finger errors, etc.). I now added an additional paragraph in which I summarize the comparison of type 2 ROC / meta-d’ and process models along these lines (Line 800ff):

      “In sum, while a type 2 ROC analysis, as a descriptive approach, does not allow any conclusions about the causes of metacognitive inefficiency, it is able to capture a more thorough picture of metacognitive sensitivity: that is, it quantifies metacognitive awareness not only about one’s own sensory noise, but also about other potential sources of error (attentional lapses, finger errors, etc.). While it cannot distinguish between these sources, it captures them all. On the other hand, only a process model approach will allow to draw specific conclusions about mechanisms – and pin down sources – of metacognitive inefficiency, which arguably is of major importance in many applications.”

      • I found it concerning that all the variability in scale usage was being assumed to load onto evidence-related parameters (eg delta_m) rather than being something about how subjects report or use an arbitrary confidence scale (eg the "implicit biases" assumed to govern the upper and lower bounds of the link function). It strikes me that you could have a similar notion of offset at the level of report - eg an equivalent parameter to delta_m but now applied to c and not z. Would these be distinguishable? They seem to have quite different interpretations psychologically: one is at the level of a bias in confidence formation, and the other at the level of a public report.

      I substantially reworked the section about metacognitive biases, including an additive metacognitive bias (κm) also at the level of confidence. The previous version of the manuscript already included a multiplicative bias parameter loading onto confidence (previously referred to as ‘confidence scaling’ parameter, now multiplicative confidence bias λm), but it was considered optional and e.g. not part of the parameter recovery analyses.

      My previous emphasis on biases that load onto evidence-related variables was due to a more principled interpretation (e.g. ‘underestimation of sensory noise’), but I agree that metacognitive biases must not necessarily be principled and may be driven e.g. by the idiosyncratic usage of a particular confidence scale. Updated Figure 1 sketches the new, more complete model.

      Is a mix of evidence- and confidence-related metacognitive bias parameters distinguishable? I tested this in Figure 7—figure supplement 3.

      The slope matrices show that e.g., the model suggested by the reviewer (two evidence-related bias parameters 𝜑m and δm + an additive confidence-based bias parameter κm) is to some degree dissociable, although slight tradeoffs start to emerge with such a complex model. By contrast, a mix of only one evidence-related and one confidence-related bias parameter is much more robust. In general, I thus recommend using at most two metacognitive bias parameters, which are selected either based on a priori hypotheses or on a model comparison. I comment on the necessity of choosing one’s bias parameters in a new paragraph in section 1.4 about metacognitive biases (Line 276ff):

      “Finally, note that the parameter recovery shown in Figure 4 was performed with four separate models, each of which was specified with a single metacognitive bias parameter (i.e., 𝜑m, δm, λm, or κm). Parameter recovery is more unreliable when more than two of these bias parameters are specified in parallel (see section 2.3; in particular, Figure 7—figure supplement 3). In practice, the researcher thus must make an informed decision about which bias parameters to include in a specific model (in most scenarios 1 or 2 metacognitive bias parameters is a good choice). While the evidence-related bias parameters 𝜑m and δm have a more principled interpretation (e.g., as an under/overestimation of sensory noise), it is not unlikely that metacognitive biases also emerge at the level of the confidence report (λm, κm). The first step thus must always be a process of model specification or a statistical comparison of candidate models to determine the final specification (see also section 3.1).”

    1. Author Response

      Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If, indeed, the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      We thank reviewer #1 for the suggestions on the kinetics of prestin and previous literature.

      Although there are no data (to the best of our knowledge) on electromotility (eM) in isolated basal murine OHCs, a more thorough review of the existing literature on the topic suggests that the assumed parameters are indeed a reasonably conservative estimate of eM in situ.

      Additionally, the OHC parameters are pessimistic enough to account for a doubling of effective capacitance due to NLC.

      Regarding the fallacy of composition, we are puzzled that the reviewer interpreted it as a “scorning” of the OHC biophysics, which is obviously important for cochlear function. The point raised is simple and rather obvious: the fact that a system is built from low-pass filters does not mean that the system is itself a low-pass filter. This is illustrated by the analogy, familiar to electrical engineers, that high- and band-pass filters are often built by cascading and mixing the responses of low-pass filters. The “fallacy of composition” therefore lies in the conclusion that since eM is “low-pass”, it can’t possibly contribute to high-frequency amplification. Strikingly, this conclusion is often based on measured vibrations near the OHCs showing transfer functions with >30 dB peak-to-tail ratio, which are somewhat consistent with the inner workings of cochlear models. That is, we are criticizing one specific interpretation of the biophysical data, and certainly not suggesting that collecting and analyzing the data in the first place is unimportant.
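
      The filter analogy above can be sketched numerically. The following is a minimal illustration, assuming ideal first-order low-pass stages and two hypothetical corner frequencies (100 Hz and 10 kHz) that are not taken from the manuscript; it is not a cochlear model. It only shows that subtracting low-pass responses can yield a high-pass or band-pass response.

```python
def low_pass(f_hz, corner_hz):
    """Complex response of an ideal first-order low-pass filter at f_hz."""
    return 1.0 / (1.0 + 1j * f_hz / corner_hz)


def high_pass(f_hz, corner_hz):
    """High-pass response built as 'input minus low-pass output'."""
    return 1.0 - low_pass(f_hz, corner_hz)


def band_pass(f_hz, low_corner_hz, high_corner_hz):
    """Band-pass response built as the difference of two low-pass filters."""
    return low_pass(f_hz, high_corner_hz) - low_pass(f_hz, low_corner_hz)


# Hypothetical corner frequencies, chosen only for illustration.
LOW_CORNER_HZ = 100.0
HIGH_CORNER_HZ = 10_000.0

for f in (1.0, 1_000.0, 1_000_000.0):
    gain = abs(band_pass(f, LOW_CORNER_HZ, HIGH_CORNER_HZ))
    print(f"{f:>10.0f} Hz  |H| = {gain:.4f}")
```

      The combined response is small far below and far above the two corner frequencies and close to unity between them, even though every building block is a low-pass filter.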

      Reviewer #2 (Public Review):

      In the inner ear, the cochlea transforms sound-induced vibrations into electrical signals that are sent to the brain. Cochlear outer hair cells (OHCs) are thought to amplify these vibrations, but it is unclear how amplification works. Sound-induced vibrations modulate the current entering an OHC, which drive its receptor potential, causing the OHC to change length. The change in length owing to the receptor potential variation, known as the OHC's electromotile response, depends on the size of the receptor potential. However, the receptor potential decreases with increasing sound frequency, because of the resistance (R) and capacitance (C) of the OHC's membrane. This paper addresses the RC problem, limitations on high-frequency amplification owing to the OHC's receptor potential decreasing with frequency.

      The authors use a well-known simplification of the RC problem and some back-of-the-envelope calculations to argue that OHCs can amplify sufficiently well at high frequencies to match experimental data, despite the decrease in their receptor potentials. They argue that changes to OHC properties along the cochlea allow them to amplify at high frequencies and that OHCs reduce noise and distortion. They argue against OHCs as being cochlear impedance regulators and that OHCs do not limit cochlear tuning.

      Figure 1 and Equations 1-6 are useful teaching tools but are not novel. The back-of-the-envelope calculations use these equations and a limited number of data points from the literature. There are many prior models that show amplification despite the RC problem, but they are not analyzed or discussed in much detail.

      How RC OHC filtering reduces noise without reducing the signal is not explained. The type of noise calculation done in Appendix 1 is well-known and the application is again a rough back-of-the-envelope calculation. Most of the statements about noise are not fleshed out or supported by calculations.

      The discussion about tonotopic variations has little new data. Fig. 2 uses two data points from the literature and an unpublished data point from a colleague. The fact that BM displacement is smaller at the base than at the apex is well known. There is speculation that reduced OHC motion is "effectively counteracted" by gradients in OHC capacitance and MET current, but no evidence is presented.

      The discussion about distortions is pedagogical but is again speculation without new or strong-supporting evidence. Fig. 3 argues that OHCs might reduce high-frequency distortions, but don't limit the cochlear amplifier. The plots shown are either well-known consequences of filtering or a summary of the authors' previous model data.

      The arguments against OHCs as regulators and that they don't limit tuning are not well fleshed out, speculative, and unsupported by new calculations or data.

      This paper does not clarify OHC operation or the RC problem, because it mixes speculation, limited data, and topics that are not clearly related to the problem.

      We agree with reviewer #2 that there are no new physics principles elucidated here, and that most of the discussion relies on simple calculations. But we believe that such simple calculations are the missing piece (absent in the literature) that allows one to appreciate the magnitude of the problem under examination, a magnitude typically inflated by focusing on quantities whose physical significance is uncertain. In other words, we believe that the simplicity of the calculations and physical reasoning is not a bug, but a feature of the paper.

      We believe that in his criticism regarding various topics of discussion presenting little or speculative new evidence, this reviewer might not have fully considered that most of the evidence provided here is fundamentally a physics-based review of the recent experimental data, incidentally the same type of data previously employed to argue that the RC problem is dramatic in the first place. Likely we didn't convey this message clearly enough in the manuscript.

      While the arguments against OHCs as regulators are not all new, they are often ignored (or perhaps forgotten) and we believe there is a value in synthesizing them all in one place. The support for these arguments comes from fundamental hydrodynamic principles, previous modeling studies, and most importantly from OCT data collected over the last 6 years. Of course, the discussion on the plausibility of suggested mechanisms lacking a concrete proposal cannot be 100% “analytic”.

      About noise and signal amplification, the missing piece perhaps is that distributed internal noise sources (e.g., thermal and shot noise) are independent of each other and hence spatially incoherent. While the manuscript doesn’t specifically deal with signal vs. noise amplification in cochlear models, spatially distributed amplification is known to boost signals more than internal noise—a principle universally used in telecommunications and addressed in >60-year-old literature.
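
      The principle invoked here can be illustrated with a toy calculation. The sketch below assumes N identical sources that each contribute the same unit signal (coherent) plus independent, zero-mean, unit-variance Gaussian noise (incoherent). The function name, the source counts and the number of trials are hypothetical choices made for illustration, and the sketch is not a model of the cochlea or of the manuscript's calculations.

```python
import math
import random


def summed_amplitudes(n_sources, n_trials, seed=0):
    """Return (signal_sum, noise_rms) after summing n_sources contributions.

    Each source adds an identical unit signal and an independent
    unit-variance Gaussian noise sample.
    """
    rng = random.Random(seed)
    signal_sum = float(n_sources)  # identical contributions add linearly
    squared = 0.0
    for _ in range(n_trials):
        noise_sum = sum(rng.gauss(0.0, 1.0) for _ in range(n_sources))
        squared += noise_sum ** 2
    noise_rms = math.sqrt(squared / n_trials)  # grows roughly as sqrt(N)
    return signal_sum, noise_rms


for n in (1, 4, 16, 64):
    signal, noise = summed_amplitudes(n, n_trials=2000)
    print(f"N={n:3d}  signal={signal:5.1f}  noise_rms={noise:5.2f}  "
          f"SNR={signal / noise:5.2f}  sqrt(N)={math.sqrt(n):5.2f}")
```

      Under these assumptions the summed signal grows in proportion to N while the summed noise grows roughly as the square root of N, so the amplitude signal-to-noise ratio improves by about sqrt(N).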

      Reviewer #3 (Public Review):

      This paper discusses the effect of the low-pass filtering between outer hair cell transducer current and receptor voltage. The filter's cut-off frequency (where the response is down to 0.71 of its maximum) can be quantified by the resistance and capacitance of the hair cell's basolateral membrane. The capacitance value is determined mainly by the lipid membrane and is augmented by the charge movement of the piezoelectric prestin molecule, which endows the OHC with its electromotile properties. The OHC's capacitance (C) value is pretty well known. The resistance (R) is determined mainly by K+ channels in the basolateral membrane, a value that is also known reasonably well. The low-pass cut-off frequency is equal to (2pi*RC)^-1 and has a value of ~1 to a few kHz - a value that has both experimental and theoretical support. The low-pass filtering of membrane voltage is important because the cell responds to membrane voltage by shortening and lengthening - this electromotility is thought to be key to the cochlea's operation and in particular to cochlear amplification, the process that enhances the magnitude and tuning of the cochlea's passive response to sound. However, the auditory system works to 80 kHz and even higher in some animals. Thus, it has been posed (let's say by team A) that the RC cut-off frequency value of a few kHz makes electromotility too slow to operate "cycle-by-cycle" up to several 10s of kHz. The article under review, representing team B, supports "cycle-by-cycle" action, arguing that the several kHz cut off frequency is not a problem and is even an advantage.
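
      As a rough numerical companion to the cut-off formula (2pi*RC)^-1, the sketch below computes the corner frequency and the first-order attenuation at a higher frequency. The resistance and capacitance values are hypothetical placeholders chosen only so that the corner falls in the "~1 to a few kHz" range mentioned above; they are not measurements from the paper or the review.

```python
import math


def corner_frequency_hz(resistance_ohm, capacitance_farad):
    """Cut-off frequency of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)


def low_pass_gain(f_hz, corner_hz):
    """Magnitude of a first-order low-pass response relative to its DC value."""
    return 1.0 / math.sqrt(1.0 + (f_hz / corner_hz) ** 2)


# Hypothetical values for illustration only (not taken from the manuscript).
R_OHM = 10e6       # assumed membrane resistance, 10 megaohm
C_FARAD = 10e-12   # assumed membrane capacitance, 10 picofarad

fc = corner_frequency_hz(R_OHM, C_FARAD)
print(f"corner frequency: {fc:.0f} Hz")
print(f"gain at corner:   {low_pass_gain(fc, fc):.3f}")

# Attenuation at a much higher frequency, expressed in decibels.
gain_50k = low_pass_gain(50_000.0, fc)
print(f"gain at 50 kHz:   {gain_50k:.4f} ({20 * math.log10(gain_50k):.1f} dB)")
```

      With any other choice of R and C the same two functions apply; only the product RC sets the corner frequency.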

      The arguments put forward in favor of cycle-by-cycle action are:

      1. The size of the motions, even with the low-pass-filtered attenuation, is as large as or larger than those measured in the cochlea at high frequencies.

      2. Noise is often increasing as frequency decreases, thus low-pass-filtering is actually good, to reduce the predominantly low frequency noise.

      3. Harmonic distortion is at supra-CF frequencies, so it's good if the hair cell is low-pass-filtering to reduce harmonics.

      These three points are reasonable, and the quantification relating to statement 1 is convincing. However, the quantification associated with point 2 is muddled. The hair cell voltage signal is expressed in volts, but the noise value is given in terms of the current mediated by 1-5 channels. A quantitative comparison should be made, with signal and noise expressed in the same units, preferably volts and volts/root(Hz), with a bandwidth estimated. The appendix attempts to be more quantitative and something like that short appendix should be incorporated into the paper. If a quantitative comparison in standard units is not possible with current data, that can be stated and underscores that we really don't know whether the noise is a problem for cycle-by-cycle amplification. Point 3 is reasonable and nicely illustrated in Fig. 3B. I did not get anything from Fig. 3A and the corresponding discussion on page 8 lines 320-335. Panels C and D were under-explained and could be removed, and the caption's reference to "short wave hydrodynamics" was also under-explained.
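
      The kind of like-for-like comparison requested here could look like the following minimal sketch. It assumes a flat (white) noise voltage density over the chosen bandwidth, and every number in it is a hypothetical placeholder rather than a measured OHC value.

```python
import math


def noise_rms_volts(density_v_per_rt_hz, bandwidth_hz):
    """RMS noise voltage for a flat density integrated over bandwidth_hz."""
    return density_v_per_rt_hz * math.sqrt(bandwidth_hz)


def snr_db(signal_rms_v, noise_rms_v):
    """Signal-to-noise ratio in decibels for two RMS voltages."""
    return 20.0 * math.log10(signal_rms_v / noise_rms_v)


# Hypothetical placeholders, not data from the manuscript.
SIGNAL_RMS_V = 10e-6          # assumed receptor-potential signal, 10 microvolt
NOISE_DENSITY = 100e-9        # assumed noise density, 100 nV/root(Hz)
BANDWIDTH_HZ = 1_000.0        # assumed analysis bandwidth, 1 kHz

noise_v = noise_rms_volts(NOISE_DENSITY, BANDWIDTH_HZ)
print(f"noise: {noise_v * 1e6:.2f} uV rms over {BANDWIDTH_HZ:.0f} Hz")
print(f"SNR:   {snr_db(SIGNAL_RMS_V, noise_v):.1f} dB")
```

      Expressing both quantities in volts makes the dependence on the assumed bandwidth explicit: narrowing the bandwidth by a factor of four halves the RMS noise.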

      The arguments put forward to challenge gain control mechanics, which employ DC shifts to set effective operating conditions:

      4. Operation based on DC and quasi-DC operating points is sensitive to noise, which as noted above is often increasing as frequency decreases.

      5. Operation that employs a DC shift for operating point is likely to work in such a way to reduce stiffness, which has been shown to be inconsistent with active cochlear responses. For example, stiffness reduction would reduce traveling wave wavelength and thus alter the response phase and timing to a degree that has not been observed experimentally. This has long been known and relevant papers are cited.

      Point 4 was not convincing to me because the motions related to setting operating conditions could be larger than the nanoscale cycle-by-cycle response motions - thus these operating point motions could be above the noise values that seem limiting to cycle-by-cycle amplification. Point 5 is a nice reminder of the conclusion that, based on experimental findings and physics-based basic cochlear models, the cochlear amplifier must work by means of energy injection. This point was made clearly by Kolston (well cited in this paper) and later supported by other work.

      The present paper is informative in many ways and offers useful insights for further exploration. It is nicely written and illustrated. Because the signal and noise values are not quantified, the basic claim, that the cochlea amplifier can amplify a noisy signal effectively, is not convincing and that basic question is still unsettled. Overall, the paper would be improved if the claims and arguments were presented more tightly, with fewer digressions, and more modestly.

      We thank reviewer #3 for the many comments and suggestions.

      We agree that plotting the spectral density of a “near-threshold” OHC signal vs. inherent electric noise results in much simplification. Regarding noise and signal amplification, previous work on transmission lines points out that amplification is the way to increase SNR along the line.

      We believe that part of the ongoing confusion is that the problem is not how OHCs can amplify a “noisy signal” (the cochlea amplifies “noisy” sounds much as it amplifies pure tones) but how OHCs can amplify signals in the presence of internal noise. Amplification and detection are two distinct things, and signal amplification does not rely on detection. Detection is an intrinsically nonlinear decision process (e.g., signal present/absent). Amplification in relevant frequency ranges is what allows signals to be detected in the real world (e.g., radio receivers). The cochlea (as portrayed by classic theories) does not seem exceptional in this regard.

      We agree that the effect of noise on DC responses is not very clear in the manuscript. Although it is difficult to make quantitative statements on a hypothesis that lacks a concrete mechanistic proposal, ~63% of (inherent) electric noise power is confined below the RC corner frequency, i.e., the frequency band of the regulatory OHC. In the presence of (unavoidable) flicker and brown noise (e.g., Brownian motion of stereocilia), this percentage can only increase. Conversely, in the frequency band of OHC cycle-by-cycle amplification, the noise power is only a tiny fraction of the total.

    2. Reviewer #1 (Public Review):

      The paper correctly identifies two biophysical properties that may impact an OHC contribution to cochlear amplification. These are the membrane RC time constant and prestin kinetics. The RC problem was identified by Santos-Sacchi 1989 (1) based on measures of OHC membrane capacitance, electromotility (eM) and published OHC resting and receptor potential data. At issue was a 20 dB disparity between threshold BM measures and eM when the resting potential (RP, ~ -70 mV) is displaced from the voltage at maximal eM gain or peak NLC (Vh; ~ -40 mV). If RP were actually at Vh then the problem would not have been identified, assuming that prestin's voltage-responsiveness were frequency-independent, which was not in question at that time. Over the last two decades several groups have found prestin performance to be low pass. Isolated OHCs, macro-patch and OHCs in situ cochlear explants all show this low pass behavior. To date, no manipulations of load have pushed the voltage responsiveness to frequency-independent. This manuscript tries to avoid the kinetics issue and attempts to focus on the RC problem that has been dealt with extensively since 1989, including at that time a suggestion that the RC problem points to the dominance of the stereocilia bundle (2).

      The authors suggest that kinetics of prestin is not addressed in the current manuscript, but this is not the case. In ignoring the paper from Santos-Sacchi and Tan 2018 (3), reliance on Frank et al.'s (4) data explicitly utilizes their kinetic results. OHC84 (so-called short cell, 51 um long) is essentially frequency-independent after microchamber voltage roll-off correction. The authors choose 1 nm/mV gain at 50 kHz to work with in their arguments. As it turns out, the corrected eM of OHC84 is wrong since it does not fix the reported 23 kHz microchamber voltage roll-off. While OHC65 is appropriately fixed, OHC84 is over compensated. Gain at 50 kHz should be about half the chosen gain. This is not the most problematic issue for their arguments, however.

      In Santos-Sacchi and Tan 2018 (3) we show that low frequency (near DC) eM gain for OHCs averaging 55.3 um long is about 15 nm/mV. This indicates, as noted in that paper, that the resting potential of OHC84 was far shifted from Vh, accounting for its wide-band frequency response. If, indeed, the authors still maintain that OHC eM is frequency-independent, à la Frank et al. (and in disregard of other publications where, to the contrary, eM gain would be far less at 50 kHz - see (5, 6)), then the eM gain at 50 kHz should be closer to 15 nm/mV; large enough, I think, to make their RC problem exercise overkill. That is, even in 1989 such a gain would not have suggested an RC problem. This is assuming that the normal resting potential is at Vh. Of course, at Vh membrane capacitance would be about twice that of linear capacitance (due to peak NLC) - the cell time constant does not discriminate against source of capacitance. All in all, isolated OHC biophysics that provides the voltage dependence and the kinetics of prestin cannot be ignored to deal with the RC problem in isolation. Doing so will give a false sense of how the cochlea works, and will encourage others to neglect, without rationale, published pertinent data, as with the Sasmal and Grosh 2019 (7) model where the OHC is treated as a frequency-independent PZE device.

      Finally, to scorn the significance of component characteristics comprising the whole cochlea, e.g., based on isolated OHC biophysics or prestin's cryo-EM structure, as a fallacy of composition suffers itself from hasty generalization. Of course, knowing the biophysics of single OHCs informs on the system response. Otherwise, the prestin KO would have been an unfunded goal, never allowed to pass beyond a system modeler's review. Indeed, the authors would have none of the "carefully" chosen data to present their RC counter argument. Pertinent, published biophysical characteristics must be included in any critical discussion on OHC performance. For that matter, cochlear modelers must follow the same rule.

      1. J. Santos-Sacchi, Asymmetry in voltage-dependent movements of isolated outer hair cells from the organ of Corti. J. Neurosci. 9, 2954-2962 (1989).
      2. A. J. Hudspeth, How the ear's works work. Nature 341, 397-404 (1989).
      3. J. Santos-Sacchi, W. Tan, The Frequency Response of Outer Hair Cell Voltage-Dependent Motility Is Limited by Kinetics of Prestin. J. Neurosci. 38, 5495-5506 (2018).
      4. G. Frank, W. Hemmert, A. W. Gummer, Limiting dynamics of high-frequency electromechanical transduction of outer hair cells. Proc. Natl. Acad. Sci. U. S. A. 96, 4420-4425 (1999).
      5. J. Santos-Sacchi, D. Navaratnam, W. J. T. Tan, State dependent effects on the frequency response of prestin's real and imaginary components of nonlinear capacitance. Sci. Rep. 11, 16149 (2021).
      6. J. Santos-Sacchi, W. Tan, Complex nonlinear capacitance in outer hair cell macro-patches: effects of membrane tension. Sci. Rep. 10, 6222 (2020).
      7. A. Sasmal, K. Grosh, Unified cochlear model for low- and high-frequency mammalian hearing. Proc. Natl. Acad. Sci. U. S. A. 116, 13983-13988 (2019).

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This is a well-executed and interesting study addressing a still controversial issue in clathrin-mediated endocytosis, namely the nature of curvature generation during formation of endocytic clathrin coated vesicles. The authors have applied new techniques to this old question, including state-of-the-art high resolution 3D single-molecule localization microscopy (SMLM, i.e. super-resolution microscopy), a new maximum-likelihood based fitting framework to fit complex geometric models into localized point clouds (Wu et al., 2020, bioRxiv) and mathematical modeling leading to a new cooperative curvature model of clathrin coat remodeling and temporal reconstruction of CCP structural dynamics based on the distribution of static super-resolution images. This is an important contribution, but will it resolve the controversy of constant curvature vs constant area for CCP invagination? I doubt it. In some ways the controversy is somewhat contrived and, as this paper shows, the answer is unlikely to be either/or. Below are some specific comments, in somewhat random order, from someone (a curmudgeon?) who has reviewed and/or carefully read these papers since 1980. Points that the authors should address are in bold. All can be addressed with modifications to the text, as the one experiment I asked for (quantification of clathrin recruitment) is impossible with this approach.

      • I wonder how many people who cite Heuser's 1980 paper have ever read it carefully. Indeed, many of the observations made here were also made by Heuser. Below, for example, is a summary I wrote, but then removed from a review as it was too lengthy: "While Heuser favored the model that CCPs assemble first as flat structures and then rearrange during invagination, he was also careful to note several caveats. First, he observed that the edges of CCPs were 'ragged', likely reflecting sites of assembly of new polygons and that pentagons were more abundant at the edges. Thus, he argued that 'if even a few of these edge pentagons were destined to become completely surrounded with hexagons, it would be necessary to conclude that some degree of curvature can be built into coats as soon as they form'. Second, by examining tilted sections he observed that 'even the flattest baskets have a small degree of inward curvature, and many were complete hemispheres'. Finally, he cautioned that his images were snap-shots and a precursor-product relationship could not, therefore, be unambiguously established and that the very large flat lattices he observed might well 'prove to be some sort of dead end'. We now know that fibroblasts, in particular, have large numbers of static flat clathrin plaques."

      Thus, many of the authors' conclusions, i.e. that 'completely flat clathrin coats are rare' (pg 12, although they're not numbered), and that curved structures can be seen to emerge from the edges of flat lattices (see Supplemental Figure 1a, 3 examples on the right), are indeed consistent with Heuser's observations. In many ways, Heuser's 1980 paper is used as a straw man argument for the constant area model. The authors should more accurately cite and acknowledge this seminal paper.

      Response: We thank the reviewer for this insightful and constructive input on the interpretation of the constant area model (CAM). We have revised the discussion (Page 14, Lines 397-402), citing Heuser’s observations more carefully, along the lines eloquently suggested by the reviewer. We agree that the strict interpretation of the CAM is misleading, and that early evidence already suggested it is a flawed approximation of the endocytic mechanism (now further mentioned on Page 15, Lines 429-431).

      • As Heuser did in his 1980 classic, the authors here would do well to note several caveats related to their analyses. These include:

      Like Heuser, they have assembled static images to create a pseudotemporal model, albeit using a much more quantitative approach. Nonetheless, it seems that this assumes only a single, stereotypic pathway for CCV formation. How good is this assumption? We know from dynamic imaging that there exists significant heterogeneity in both the kinetics and the molecular composition of CCPs. The authors should acknowledge this limitation.

      Response: We agree with the reviewer that the lack of direct temporal information is a clear limitation of our approach.

      We now introduce this limitation on Page 16, Lines 474-484, where we discuss the disadvantage of reconstructing an average trajectory based on static images. Here, the assumption of a single, stereotypic pathway of endocytosis is addressed. We cannot exclude the possibility of slight mechanistic variations being averaged out using our approach. However, we want to highlight the fact that our approach seems sensitive enough to distinguish between structures that originate via endocytosis, and structures that derived from a different pathway, potentially from the Golgi.

      We further address the kinetic variability in terms of abortive events on Page 14, Lines 405-411, and discuss their effect on the mechanistic interpretation of our results. Generally speaking, abortive events are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      • The method, which required that they 'optimized the sample preparation to densely label clathrin at endocytic sites', involves labeling cells to near saturation with rabbit polyclonal antibodies to both clathrin light chains and clathrin heavy chains followed by detection with a second polyclonal donkey anti-rabbit. This gives 20 nm of additional and presumably flexible linker on the label. How might this affect the measurements and modeling? The Wu et al. paper, which BTW has not been peer-reviewed, shows high precision fitting of the nuclear pore structure, but using endogenously tagged NUP-95, not two layers of antibodies. The authors will need to discuss this limitation; it is my biggest concern regarding the analysis shown.

      Response: We acknowledge the limitations imposed by indirect immunolabelling and formulated a hypothesis on how this could affect our model fit (mentioned on Page 13, Line 363, illustrated in Supplementary Figure 6). A larger linkage error between label and target molecule would increase the distribution of localizations around the true underlying structure. As LocMoFit fits our spherical model directly to the localization coordinates, it is able to take this distribution into account, and will weigh the fit results based on the uncertainty of the localization estimation. A uniform distribution of labels around the true underlying structure should therefore be fitted accurately even at larger linkage error. A non-uniform labeling could occur should e.g. the densely crowded space between the coat and the plasma membrane not allow for the diffusion of the antibody to the clathrin epitopes. In that case, labeling would be one-sided, and instead of the true underlying structure, LocMoFit would optimize the spherical model to the highest probability density of label around +10 nm from the true clathrin coat. This would result in an overestimation of the radius by the model, which we could correct by subtracting 10 nm from the experimentally determined radius. This was done in Supplementary Figure 6 for the hypotheses of (1) uniform displacement by the antibodies; (2) biased displacement of the antibodies towards the cytosol; and (3) biased displacement of the antibodies towards the plasma membrane. Whilst we see that the fitting parameters scale with the corrected radii, the mechanistic interpretation of partial flat pre-assembly on the membrane, and subsequent bending and surface area growth, still holds true.

      • One reason for continued controversy in this field is the lack of any attempt to resolve findings obtained using different methods. Can a parsimonious explanation be found, or are there artifacts or misinterpretations of previous findings that can explain the discrepancies? Any valid model should fit all of the valid data. For example, the authors fail to cite a recent paper by Willy et al. in Dev Cell (PMID 34774130), which has been on bioRxiv since 2019 (doi: https://doi.org/10.1101/715219). Here, similar to the present study, the authors used high resolution SIM-TIR to analyze ~1000 CCPs in 3 different cell lines (sadly non-overlapping with the cells used herein) and in Drosophila embryos to quantitatively test the two models. They conclude that their findings unambiguously support a constant curvature model. The authors would do the field a favor if they carefully read this paper and identified areas of commonality (i.e. that curvature is detected at early stages in both cases) and possible explanations for the discrepancies. Certainly, they should not ignore it.

      Response: We agree with the reviewer on the importance of consolidating findings from different studies to converge to a generally accepted mechanism of clathrin coat formation. We had indeed cited Willy et al in the introduction, but agree that further discussion of their findings should be included. We therefore discuss their findings in more detail, also in comparison to our work, on Page 17, Lines 502-511. We agree that we reach contradictory conclusions, which we think lies at least in part with the way that Willy et al. analyze their data. Willy et al. acquire 2D projections of the endocytic clathrin structures, whose size is just at the limit of their image resolution. They then compare their projected sizes to a purist constant area model, which assumes that a coat has to grow to its entire surface as an entirely flat structure and then instantaneously snaps to an increased curvature, resulting in a sudden drop of the projected area (footprint). As we and others (e.g. Bucher et al 2011, Heuser, 1980) have observed, completely flat lattices are rare, and curvature is initiated before final surface area is acquired. We do not agree that the absence of a purist constant area model implies that clathrin mediated endocytosis follows a constant curvature trajectory. Instead, we imagine that our cooperative curvature model is likely to fit well with the observations of Willy and colleagues.

      • An important body of evidence that is not considered in their model or discussion is that derived from live cell imaging. In addition to the heterogeneity mentioned above, studies have shown that clathrin addition to CCPs (i.e. the growth phase) is complete within the first ~20-30 s, followed by a plateau phase of variable length (0 to >100 s) (Loerke et al, PMID 21447041). Both the current study and the Willy et al study admit that they may not be able to detect the earliest intermediates in CCP assembly. Indeed, in this study the smallest surface area CCPs are only 2-fold smaller than the largest CCPs, suggesting that over half of the triskelions have been recruited before a CCP can be distinguished from the background of clustered, nonspecifically-bound antibodies. Could the authors be monitoring events during the plateau phase and not the earliest events? Regardless, the findings are important as they address the nature of curvature generation during this plateau phase. While monitoring curvature generation during early events in CME, a recent study (Wang et al., eLife, PMID 32352376) showed that the acquisition of curvature within the first 20 s of CCP assembly was a distinguishing feature between abortive and productive events. The authors might discuss how these studies on CCP dynamics might (or might not) inform their models.

      Response: We thank the reviewer for this very insightful comment and discuss this hypothesis on Page 16-17, Lines 485-511. We suggest that part of the initiating/growth phase observed in live-cell dynamics falls into the fast, flat assembly that we are unable to capture with our approach. It is challenging to clearly identify at which point in real-time we are detecting our earliest sites. We would however argue that the plateau phase in real-time could coincide with curvature generation and final addition of triskelia at the lattice rim. The variability in the duration of this plateau phase could therefore result from variable recruitment speed of triskelia and other factors during the finalizing of the vesicle neck.

      • The authors advertise 'quantitative' description of clathrin coated structure and indeed their measurements and models are quantitative; but there is no measure of intensity/numbers of triskelions and CCP growth: an important piece of quantitative data. I expect this is impossible with indirect immunofluorescence but should be considered as a limitation of the approach. Indeed, to my knowledge no one has yet quantitatively measured curvature generation in parallel to clathrin addition at CCPs (closest is Saffarian and Kirchhausen, PMID 17993495), but they don't discuss the relationship.

      Response: We agree with the reviewer that quantifying the number of triskelia would be an essential piece of information to correlate area growth and curvature generation with dynamic information retrieved from fluorescence intensity in live-cell studies. Unfortunately, the indirect immunolabelling approach used in this work complicated this quantification, and direct comparison between number of localizations and fluorescence intensity cannot be made. However, we do observe a correlation between coat surface area and number of localizations in our data and show this in the newly added Supplementary Figure 7. This allows us to formulate the hypothesis on Page 16-17, Lines 485-511, which suggests that the plateauing of fluorescence intensity coincides with curvature generation and final triskelia addition to the coat rim. We further highlight the necessity of capturing both high spatial and temporal resolution simultaneously, to ultimately overcome this limitation.

      • On page 7 equation 1, you assume a constant growth rate for addition of triskelia, but later describe that the rate might be cooperative (as the number of edges increases). How would this affect your modeling?

      Response: We formulate the surface area growth rate of the clathrin coat to be proportional to the rim length, with a constant rate. We consider the cooperativity between clathrin molecules to affect the rate of curvature generation. The more molecules are present, the more the entire coat is inclined to bend. We rephrased that section to emphasize this distinction (Page 8, Line 217).

      Minor points:

      • Can you indicate in the first paragraph of the results that you are using indirect immunofluorescence with rabbit anti-CLCA, anti-CHC and detection with donkey anti-rabbit for labeling, to augment the rather vague statement 'we optimized the sample preparation to densely label clathrin at endocytic sites'?

      Response: We added a clear indication of the labelling strategy used in this work on Page 4, Lines 109-110.

      • I'm not comfortable with the conclusion on page 5 that your data 'indicates that at the time point of scission, the clathrin coat of nascent vesicles is still incomplete'. Other explanations might be the relative kinetics of scission vs CCP growth (i.e. these structures are too transient to detect), or that deeply invaginated pits are sheared off the membrane during sample preparation (there is evidence that most biochemically isolated CCVs are derived from sheared CCPs).

      Response: We extended the explanation for the absence of fully closed vesicles with the hypotheses mentioned by the reviewer on Page 5, Lines 159-161.

      • Bottom of page 5, can you briefly mention what data is shown in Supplemental Figure 2 (ie. Figure 2D and examples of likely non-endocytic CCPs shown in Supplemental Figure 2). When I read this, I questioned your speculation.

      Response: We clarified the cross reference to (now) Supplementary Figure 3 accordingly on Page 6, Lines 184-185.

      • Can you indicate N CCPs from N cells in the data in Tables 2-3 for fibroblasts and U2OS cells? Do you observe and have to ignore a larger number of flat/clustered CCPs in the fibroblasts?

      Response: We indicated the number of cells and sites per data set in the Table captions on Page 36, Lines 51; 959; and 967. We did not quantify the number of flat/clustered, plaque-like structures in our data sets. During data acquisition, we specifically selected cells with a minimal number of these structures present, and even within these cells chose an area in the periphery exhibiting a low number of plaques. Our data is therefore not ideal to reliably quantify plaque density between different cell lines. Qualitative observations showed that whilst we had to disregard a few cells from the U2OS and SK-MEL-2 cell lines due to high plaque formation, the 3T3 fibroblasts were relatively straightforward to image, as few cells showed high plaque density. A recent study by Hakanpää et al., 2022 (bioRxiv) showed the decreased formation of plaques when cells were seeded on fibronectin. The fact that fibroblasts secrete their own fibronectin agrees well with our observations of relatively few 3T3 cells exhibiting extensive plaque formation.

      • The last 3 paragraphs of the Introduction are results. The Introduction might best be used to review literature in more detail, discuss the reasons why uncertainty still exists and perhaps indicate how the methods applied here will help.

      Response: We re-wrote the last 3 paragraphs of the introduction, now clearly stating the knowledge gap in the field, and what methods would be required to bridge it (Page 3, Lines 80-102).

      Reviewer #1 (Significance (Required)):

      This is another excellent addition to a growing list of papers seeking to define the process of curvature generation at endocytic clathrin coated pits. In my opinion, its impact would be increased by better integrating the results presented here with other studies and methods, including the recent paper by Willy et al and the large body of literature on coated pit dynamics, some of which might be relevant in interpreting results, or at least placing them in a real vs pseudo-temporal perspective. The methods introduced and the quality of imaging, modeling and quantification further increase the study's significance. The findings will be of interest to those in the CME field, those studying membrane curvature generation in other contexts, those modeling CME, vesicle formation and curvature generation and those using SMLM to discern the structure of macromolecular assemblies.

      Reviewer expertise: Clathrin-mediated endocytosis (Sandra Schmid)

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they successfully achieved the goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area/curvature model with their own data, the authors highlighted that neither of the prevailing models perfectly explained what they observed and proposed a 'cooperative curvature model'. With the novel model, the authors were able to reconstruct the clathrin coat remodeling in different cell lines and concluded that the simultaneous bending and assembly of the clathrin coat is a homogenous property of endocytosis.

      The experiments and analytical procedures are well-designed and performed, and the manuscript is well-organized. The conclusion 'cooperative curvature model' was deduced from a large amount of data analysis and clearly stated in the text. I would like to recommend its publication if the following issues are clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of the cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of the CME. Considering the fact that the bending energy of the plasma membrane is dependent on the membrane tension, they may be able to find some relationships between mechanical properties of the cell cortex and CME.

      Response: We thank the reviewer for this comment and very much agree that the relationship between mechanical properties and structural adaptation of the endocytic machinery is a highly interesting question. We came to the same conclusion and are therefore exploring this relationship at the moment. This is however not a straightforward task, and the complex nature of plasma membrane mechanics necessitates careful experimental design. It is therefore outside the scope of this publication. We do think this point further highlights the potential of the method presented here, as it allows the investigation of additional principles in clathrin-mediated endocytosis mechanics. We do hope to share our insights on this topic soon.

      In Figure 4, the authors estimated the progression of the CME using the frequency distribution of theta. However, I wonder how they handled the events which were aborted in the middle of the CME. It has been suggested that some CME events are aborted during the initial step of the CME. The authors should consider (at least discuss) those abortive events, which can disturb the analysis.

      Response: Generally speaking, abortive events (now discussed on Page 14, Lines 405-411) are characterized as dim and short-lived structures in live-cell acquisitions. As the earliest structures in our data set already contain half the final coat area, we are most likely not capturing these abortive events in the first place (potential technical reasons for not capturing earlier structures are discussed on Page 14, Lines 385-395).

      Abortive events throughout the later process of endocytosis would, according to our data, still follow the same mechanistic trajectory as other sites. They could potentially slightly skew our pseudotime analysis, as they would result in an overestimation of specific endocytic stages. The overall mechanistic insight of our work would not be greatly affected, as curvature generation would still occur according to the same trajectory. Due to the low impact on our overall results we do not discuss these late abortive events further.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans Golgi could be responsible for the small disconnected set of data points corresponding to the vesicles with larger curvatures.

      Response: We extended our explanation for the presence of non-endocytically derived structures in our data set on Page 6, Lines 184-189. We further extended the supplementary information with an additional experiment (Supplementary Figure 4), highlighting the absence of AP2-positive structures within the disconnected population. As AP2 is a specific marker for CME, these results further solidify our hypothesis. Further experiments would be required to determine their exact origin, and are outside of the scope of this publication.

      Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as is mentioned in the discussion, clathrin has been proved to have curvature sensing ability which could be further amplified several-fold by adapter proteins (Zeno et al., 2021). So, it seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?

      Response: Whilst our assumption states the growing of clathrin coat on flat membranes, we do not restrict our model to an intercept through 0, and it would therefore still hold true even in the case of growth starting on slightly bent membranes. The impact of the preference of clathrin for curvature is considered as a potential mechanistic explanation for the positive feedback in curvature generation described by our model. We therefore already cite the reference mentioned by the reviewer on Page 8, Line 224.

      As we do observe flat structures in our data set (discussed more in detail now on Page 14, Lines 396-404), we still think the assumption of early flat growth holds true.

      Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.

      Response: That is correct, an oversight on our part. We changed the cross-reference.

      Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Response: We changed the cross-reference. We were addressing a subsection of Supplementary Figure 8.

      Reviewer #2 (Significance (Required)):

      From a vast amount of microscopic images and data analysis, the manuscript gives a clear model of the progress of the CME, which integrates two opposing models: the constant area and constant curvature models. This is significant progress in our understanding of the molecular mechanism of CME, and will attract many researchers in the field of cell biology. From the viewpoint of my expertise (molecular imaging of plasma membrane and endocytic processes), this manuscript has significant impact on the related research fields.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors used single-molecule localization microscopy of clathrin in fixed cells (2 human cell lines, one mouse) to capture snapshots of a clathrin-mediated endocytosis (CME), fitted these localizations to a geometric model of a forming vesicle, and used these fitted measurements to test existing models of clathrin-mediated vesicle formation before refining their own. Specifically, the closing angle, a measure of vesicle completeness, was used as a proxy for growth-stage of the vesicle such that the many captured snapshots could reconstruct a pseudo-timeline with an unknown parameterization of time on closing angle. Two standard models of CME vesicle formation, where the surface area is kept constant or where the curvature is kept constant, were examined and determined to be incommensurate with the pseudo timelines of curvatures and surface area. The authors then describe their own model for CME vesicle formation, in which neither surface area nor curvature are constant in evolution of the vesicle, and cooperative forces are hypothesized to non-linearly modulate the curvature-growth as a function of closing angle. Additionally, by binning snapshots and then aligning, scaling, and azimuthally smoothing each bin, they reconstruct representations of distinct endocytic stages.

      Major comments:

      Most results are quite convincing, and the authors do a nice job of displaying examples of SMLM data, both with fit results as well as example clathrin assemblies that are too far removed from their budding-vesicle model to be included for analysis, for example. It is also worth noting that the clathrin images themselves appear to be very high-quality - clearly, as detailed in the methods, attention was given to each step of the imaging and reconstruction process.

      While the presented cooperative curvature model seems reasonable and surely fits the curvature-, surface area-, and rim length-vs. closing angle data better than the simplistic constant surface-area and constant curvature models, it also has more parameters, namely: gamma (the initial rate of curvature change with closing angle) and H_0 (the final preferred curvature). It would be appropriate to calculate an information criterion (e.g. Bayesian), using an assumption of Gaussian-distributed errors (presumably the data fitting in R was least squares, so this would match) to justify the additional parameters.

      Response: This is an important observation by the reviewer. Indeed, our model uses one more parameter compared to the models we compare it with. To justify this, we performed the calculation as suggested by the reviewer, and found that the cooperative curvature model (CoopCM) indeed results in the lowest BIC (Supplementary Notes). We therefore are confident that out of the three models tested in this work, our CoopCM fits best to the underlying experimental data (Page 8, Lines 232-235).
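
      The model comparison requested by the reviewer can be sketched as follows. This is a minimal illustration of the Bayesian information criterion for a least-squares fit under the Gaussian-error assumption the reviewer names, using the standard form BIC = n ln(RSS/n) + k ln(n), up to an additive constant shared by all models fitted to the same data. The function name and the residual values are hypothetical and are not taken from the manuscript or its Supplementary Notes.

```python
import math


def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit, assuming Gaussian-distributed errors.

    residuals: observed minus fitted values for every data point.
    n_params:  number of free parameters of the fitted model.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical residuals from two models fitted to the same eight data points.
residuals_simple = [2.0, -1.5, 1.8, -2.2, 1.9, -1.7, 2.1, -2.0]  # fewer parameters
residuals_richer = [0.4, -0.3, 0.5, -0.4, 0.3, -0.5, 0.4, -0.3]  # one more parameter

bic_simple = bic_least_squares(residuals_simple, n_params=1)
bic_richer = bic_least_squares(residuals_richer, n_params=2)

# The model with the lower BIC is preferred; the k*ln(n) term penalizes
# the extra parameter, so it must be paid for by a lower RSS.
preferred = "richer" if bic_richer < bic_simple else "simple"
```

      Only differences in BIC between models fitted to the same data are meaningful, which is why the shared constant can be dropped.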

      A related issue relates to the error in the extracted value of the closing angle from a single 3D reconstruction - the error distribution should be quantified for this very important parameter. The errors in the other parameters extracted from the fits are less important, but would enhance the paper.

      Response: We thank the reviewer for pointing out the importance of the estimation error of the key parameter closing angle. To address this point, based on the geometrical model, we simulated clathrin-coated structures with closing angles evenly distributed across the entire range (0-180°). This realistic simulation represents the data quality (e.g., localization precision and labeling efficiency) of the experimental data (corresponding methods are included in Pages 22-23, Lines 679-706). The result of fitting these structures using LocMoFit shows an unbiased estimation with small spread of the error (overall STD = 2.82°; see the newly included Supplementary Figure 2a).

      Pseudo-temporal sorting on closing angle makes sense and I appreciate the authors mentioning potential caveats to the monotonicity, etc. However, a comment about the impact of closing angle errors on the pseudo-time determinations would be helpful. The agreement of theta-rank plots with the hypothesized sqrt(t) scaling is reassuring.

      I additionally appreciate the robustness of fitting a geometric structure from localizations rather than relying on pseudo-temporal sorting on clathrin count extracted from localization-merging of multi-blinking emitters.

      Response: The pseudo-temporal sorting is based on the precisely estimated closing angle, and therefore is also precise, as the distribution of the fitted closing angle has no significant distortion compared to the expectation (Supplementary Figure 2b).
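
      The idea behind pseudo-temporal sorting on closing angle can be sketched in a few lines. The sketch assumes that fixed-cell snapshots sample sites uniformly in time and that the closing angle increases monotonically, so that the normalized rank of each angle estimates the fraction of the process elapsed. The function name, the synthetic data and the scaling theta = 180° · sqrt(t) used to generate them are illustrative assumptions, not the authors' pipeline.

```python
import math
import random


def pseudotime_from_angles(theta_deg):
    """Assign each closing angle its normalized rank in (0, 1]."""
    n = len(theta_deg)
    order = sorted(range(n), key=lambda i: theta_deg[i])
    pseudotime = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        pseudotime[idx] = rank / n
    return pseudotime


# Synthetic check: draw "true" times uniformly, convert them to closing
# angles with a square-root law, then recover time from the ranks alone.
rng = random.Random(0)
true_t = [rng.random() for _ in range(2000)]
theta = [180.0 * math.sqrt(t) for t in true_t]

estimated_t = pseudotime_from_angles(theta)
max_deviation = max(abs(e - t) for e, t in zip(estimated_t, true_t))
```

      Because only the ordering of the angles enters, a small unbiased error in theta mainly swaps the ranks of neighbouring sites rather than distorting the overall trajectory.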

      The authors did a nice job of qualifying their more speculative claims, in particular I appreciated their mentioning the possibility that smaller clathrin coats could be below their detection limit.

      The authors state a set of data points in suppl. figure 2D (and suppl. Fig 3A-C) are "likely" small clathrin-coated vesicles from the trans Golgi. I appreciate the examples rendered in that figure so a reader can appraise, but if they have my background they might not know how reasonable exclusion of this data is from model testing. This claim could be rephrased or the rationale expanded upon to justify the Golgi hypothesis.

      Response: We agree with the reviewer and further expanded on our hypothesis on the origin of the structures within the disconnected cloud of data points (Page 6, Lines 184-189). We further performed an additional experiment (Supplementary Figure 4), where we simultaneously imaged the clathrin coat at high resolution, and the CME specific AP2 complex tagged with GFP at diffraction limited resolution. We observed that there were no AP2-GFP positive structures present in the disconnected cloud of our data set, and conclude that these structures indeed must originate via a different pathway.

      The data and methods are presented such that they could be reproduced, and replicating their experiment in multiple cell lines, across multiple species, would seem to be adequate replication. As mentioned above, the statistical analysis of whether the model complexity is justified by improved goodness of fit is currently missing but can readily be checked and added.

      Minor comments:

      Last paragraph of the introduction, positive feedback is mentioned but not the slowing down as preferred curvature is realized (inclusion of which might help foster a clearer understanding of the model early on).

      Response: We now mention the slowing down towards a preferred curvature in our introduction on Page 3, Lines 100-102.

      In Fig. 1, please state in the figure caption what is being displayed in the two large panels and what is the color map. Is this the 3D data from the overlapping elliptical Gaussians projected on the plane in a "hot" map? Further, in the top right small panels, are the x-y images projections of all z, or measured at a specific z?

      Response: We adjusted Figure 1 and the figure caption to clearly explain what is shown in each superresolution panel. The exact details for image rendering, including the color map and Gaussian blurring of the localization coordinates, are now described in the methods on Page 21, Lines 625-627. Ultimately, the x-y images represent an enlarged view of the projections as visible in the previous two panels. We hope that the rephrasing of the Figure 1 legend clarifies this accordingly.

      In Eqn. (1), epsilon is not defined.

      Response: The definition is mentioned on Page 8, Line 210, right before the equation, same as for kon.

      For the theta-rank plots (Fig4 B, SFig D-F ii) moving the theta(t)=sqrt(t) red curves behind sorted theta data would make the data easier to see.

      Response: We adjusted the Figures according to the reviewer's suggestion.

      "Laser" in sentence about the speckle reducer should probably be plural.

      Response: We corrected this grammar mistake, and changed “laser” to “lasers” on Page 20, Line 586.

      I would like to see the "custom" algorithm based on redundant cross-correlation for drift correction briefly described.

      Response: We added an explanation of the algorithm used for the drift correction on Pages 20-21, Lines 611-617.

      A legend for supplemental figure 3 A-C would be nice.

      Response: We added a legend for the various models in (now) Supplementary Figure 5, and further made some clarifications in the figure caption.

      If the definition of the abbreviation flat-to-curved-transition as FTC was explicit I missed it.

      Response: As we do not use this abbreviation anywhere else in the manuscript, we removed it from the Supplementary Note to avoid confusion.

      Resolution of 20 and 30 nm (laterally and axially, respectively) was quoted once towards the beginning of the manuscript as being an improvement resulting from the localization method described in Li et al., 2018. Resolution can be difficult to speak about precisely, but the methods section would seem to indicate that localizations are filtered at 20 nm lateral localization precision (potentially 30 nm axially?), and I think the authors could consider rephrasing to depict this unless I am missing elsewhere a description of the resolution metric being used.

      Response: The original 20 and 30 nm resolution (laterally and axially) was calculated based on the median localization precision values in x-y and z for a representative image, using the FWHM approach (described in Methods Page 21, Lines 621-624). After consideration of the reviewer's question, we found the modal value to be a better quantity to calculate the resolution, and changed this in the text accordingly (Page 4, Lines 113-115, and Methods Page 21, Lines 621-624).
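
      The difference between a median-based and a mode-based resolution estimate can be illustrated with a short sketch. It assumes that the "FWHM approach" means converting a localization precision sigma into the full width at half maximum of a Gaussian, FWHM = 2·sqrt(2·ln 2)·sigma ≈ 2.355·sigma, and it takes the modal value as the center of the most populated histogram bin. The function names, the bin width and the precision values are hypothetical and are not the authors' data or code.

```python
import math
from collections import Counter
from statistics import median

# Conversion factor between Gaussian standard deviation and FWHM (~2.355).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_resolution(sigma_nm):
    """Resolution estimate as the FWHM of a Gaussian with std sigma_nm."""
    return FWHM_FACTOR * sigma_nm


def modal_value(values, bin_width):
    """Center of the most populated histogram bin of the given width."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    best_bin = max(bins, key=bins.get)
    return (best_bin + 0.5) * bin_width


# Hypothetical lateral localization precisions (nm) with a long right tail.
precisions_nm = [3.2, 3.4, 3.6, 3.8, 5.0, 7.5, 12.0]

resolution_from_median = fwhm_resolution(median(precisions_nm))
resolution_from_mode = fwhm_resolution(modal_value(precisions_nm, bin_width=1.0))
```

      For a right-skewed precision distribution the mode sits below the median, so the two choices give different resolution values from the same localizations.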

      Reviewer #3 (Significance (Required)):

      Proteins involved with inducing curvature in membranes are in general very exciting targets for localization microscopy, yet still for many systems questions remain unanswered. The authors tackle one such question in this manuscript. In other, unresolved, discussions, the posed hypotheses are quite similar to the simplistic models surpassed in this work (e.g. that curvature scales linearly with local protein copy number, or that surface area scales linearly with local protein copy number). The idea of cooperativity may be useful for others to consider, and the authors additionally demonstrate a seemingly smooth workflow using their separately described tools (primarily LocMoFit; Wu et al. 2021).

      I myself am not an expert on CME or vesicle trafficking. My background is primarily in SMLM method development and SMLM / fluorescence image analysis. From my perspective, the novelty of the biological conclusions appears to be the authors' specific cooperative model and the presence of two structural states which are enriched (closing angle 70° and 130°). As referenced, and authors F. Frey and U. S. Schwarz nicely present in Bucher et al. 2018, the constant curvature and constant surface area models are known to be inaccurate descriptions of CME evolution, and further it is also known that clathrin first assembles small flat structures before beginning to curve the membrane. However, the 3D super-resolution imaging and direct evaluation of a 3D model geometry in this work is a nice extension of the 2D super-resolution imaging and projection evaluation in the authors' previous work studying endocytosis through ensemble averaging in yeast (Mund et al. 2018) as well as the analysis on projections in Bucher et al. 2018. Fully 3D treatment of the clathrin structures allows the authors to orient asymmetric assemblies such that they are averaged out in their ensemble reconstruction, and as they point out the molecular specificity afforded by a fluorescence-based technique ensures unbiased segmentation of clathrin-involved endocytic sites. In other words, while this work does not describe a technical advance not already described elsewhere, it sets a nice example for those researching protein-membrane interactions of how to leverage the right tools to clearly and directly answer their questions. With their additional work to make these tools extensible to other geometries, multiple color channels, etc., I expect their work to inspire quality studies in other systems. That significance is complementary to their proposal of a reasonable model for the geometric evolution of CME.

      References:

      Maximum-likelihood model fitting for quantitative analysis of SMLM data, Yu-Le Wu, Philipp Hoess, Aline Tschanz, Ulf Matti, Markus Mund, Jonas Ries, bioRxiv 2021.08.30.456756; doi: https://doi.org/10.1101/2021.08.30.456756

      Bucher, D., Frey, F., Sochacki, K.A. et al. Clathrin-adaptor ratio and membrane tension regulate the flat-to-curved transition of the clathrin coat during endocytosis. Nat Commun 9, 1109 (2018). https://doi.org/10.1038/s41467-018-03533-0

      Markus Mund, Johannes Albertus van der Beek, Joran Deschamps, Serge Dmitrieff, Philipp Hoess, Jooske Louise Monster, Andrea Picco, François Nédélec, Marko Kaksonen, Jonas Ries, Systematic Nanoscale Analysis of Endocytosis Links Efficient Vesicle Formation to Patterned Actin Nucleation, Cell, 174, 4, (2018). https://doi.org/10.1016/j.cell.2018.06.032.


    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this article, the authors aimed to investigate the dynamics of the clathrin lattice during clathrin-mediated endocytosis (CME). Overall, they successfully achieved the goal by observing a large number of clathrin spots from several cell lines with 3D single-molecule localization microscopy (SMLM). With the help of this high-resolution imaging technique, they were able to describe the physical properties of each spot and reconstruct the assembly and remodeling of the clathrin coat. Moreover, by comparing the constant area/curvature model with their own data, the authors highlighted that neither of the prevailing models perfectly explained what they observed and proposed a 'cooperative curvature model'. With the novel model, the authors were able to reconstruct the clathrin coat remodeling in different cell lines and concluded that the simultaneous bending and assembly of the clathrin coat is a homogenous property of endocytosis. The experiments and analytical procedures are well-designed and performed, and the manuscript is well-organized. The conclusion 'cooperative curvature model' was deduced from a large amount of data analysis and clearly stated in the text. I would like to recommend its publication if the following issues are clarified.

      Major comments:

      1. The authors compared the morphological dynamics of clathrin-coated pits among three different cell lines (SK-MEL-2, U2OS, and 3T3) and found slight differences. As U2OS cells were derived from bone tissue, they have different mechanical properties (membrane tension, elasticity of the cortical layer, etc.). It would be interesting to consider those mechanical properties in understanding the morphology (Figure 2) and progress (Figure 4) of CME. Considering that the bending energy of the plasma membrane depends on membrane tension, the authors may be able to find some relationships between the mechanical properties of the cell cortex and CME.
      2. In Figure 4, the authors estimated the progression of CME using the frequency distribution of theta. However, I wonder how they handled events that were aborted partway through CME. It has been suggested that some CME events are aborted during the initial step. The authors should consider (or at least discuss) these abortive events, which could distort the analysis.

      Minor comments:

      1. Page 5, result section 2. The authors should further explain why vesicles from the trans-Golgi could be responsible for the small disconnected set of data points corresponding to the vesicles with larger curvatures.
      2. Page 7, line 6. The authors assumed that the clathrin coat starts growing on a flat membrane. However, as mentioned in the discussion, clathrin has been shown to have a curvature-sensing ability that can be amplified several-fold by adaptor proteins (Zeno et al., 2021). So, it seems that clathrin prefers a highly curved membrane over a flat one. Is it still reasonable to make this assumption?
      3. Page 9, result section 4. In the sentence: "we effectively generated the average trajectories of how curvature, surface area, projected area and lattice edge change during endocytosis in SK-MEL-2 cells (Figure 4B-E)." Here I think the authors are describing Figure 4C-F.
      4. Page 11, discussion. In the sentence: "A deviation of the cross-sectional profile from a circle is nevertheless preserved in the averaging (Supplementary Figure 5)." I didn't see supplementary figure 5 in the article.

      Significance

      Based on a vast number of microscopic images and extensive data analysis, the manuscript gives a clear model of the progress of CME that integrates two opposing models: the constant area and constant curvature models. This is major progress in our understanding of the molecular mechanism of CME and will attract many researchers in the field of cell biology. From the viewpoint of my expertise (molecular imaging of the plasma membrane and endocytic processes), this manuscript has a significant impact on the related research fields.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors attempt to optimize the FluoroSpot assay to allow for the assessment, at the B-cell level, of cross-reactive antibodies targeting conserved epitopes shared by multi-allelic antigens and of antibodies specific to a unique antigen variant. This is a critical aspect to consider when identifying targets of a broad range of cross-reactive antibodies for vaccine development, and the antigen VAR2CSA used in this work is one that will benefit from the method described in the manuscript.

      Overall, this is a method manuscript with extensive detail of the assay validation process. The description of the assay performance steps using first monoclonal antibodies and later hybridoma/immortalized B cells was important for understanding conditions that can influence the antigen-antibody interactions in the assay. This multiplex approach can assess the cross-reactivity of antibodies to up to four allelic variants of an antigen, with the possibility of exploring the affinity of an antibody to a particular variant using the RSV measurements. The validation of the assay with PBMC from malaria-exposed donors, both men and women (the latter having naturally acquired high titers of antibodies to VAR2CSA during pregnancy), is a strength of this work, as this is in the context of polyclonal antibodies with more heterogeneous antibody binding specificities.

      The ability of the assay to detect cross-reactive antibodies using all four tags appears highly variable, even in the context of a monoclonal antibody targeting the homologous antigen labelled with all 4 tags.

      We understand the concern for variability, but we think that in general the assay was very consistent. Regardless of the configuration used, we detected strikingly comparable number of spots/well, especially when the homologous antigen labelled with four tags was used (Figure 2A). Similar consistency has been previously reported when a similar assay was used to study cross-reactivity in dengue-specific antibodies.

      Overall, it appears that the assessed antibody reactivity with TWIN tagged antigens was relatively low and this needs to be explained and discussed as the current multiplex method, as it is, might just be optimized for study of cross-reactive antibodies to 3 antigens.

      The LED380 (used to detect and visualize the TWIN tag) indeed gave more background than the other three detection channels. We normally observed a ring of fluorescence at the edge and the middle of the wells, accompanied by lower intensity of the spots. These two characteristics are apparent in the figures and RSV plots presented in the manuscript. To reduce these issues, we tried substituting the TWIN tag with a BAM tag detected with a peptide-specific antibody (data not presented). However, that approach did not improve the readout, and we therefore decided to keep the TWIN-StrepTactin pair for all the experiments. Importantly, even with these issues, routine manual inspection of the wells confirmed that the Apex software automatically and efficiently counted “real” spots, giving us confidence in the performance of the assay. We acknowledge that exclusion of the LED380 data would lead to higher assay accuracy. However, it would reduce the ability to assess broad antibody cross-reactivity, which was the primary objective of our study. We have added text briefly discussing this to the revised manuscript (lines 154-160).

      As acknowledged by the authors, the validation of this assay on PBMC from only 10 donors (7 women and 3 men) is a caveat to the conclusion and increasing this number of donors (the authors have previously excelled in B cells analyses of PfEMP1 proteins and would have PBMC readily available) will strengthen the validity of this assay.

      We thank the reviewer for this comment and agree the number of donors tested is far from sufficient to provide any conclusive evidence regarding frequencies of VAR2CSA-specific and cross-reactive B cells in the context of placental malaria. However, we firmly believe that the validation of the assay, which was the objective of the study, is sufficient, especially because we included human B-cell lines isolated from donors naturally exposed to VAR2CSA-expressing parasites. Future studies including more donors and full-length VAR2CSA antigens are certainly warranted. As the performance of the assay has now been validated (this manuscript) to our satisfaction, we are indeed planning such studies.

      Reviewer #2 (Public Review):

      The manuscript describes the development of a laboratory-based assay as a tool designed to identify individuals who have developed broadly cross-reactive antibodies with specificity for regions that are common to multiple variants of a given protein (VAR2CSA) of Plasmodium falciparum, the parasite that causes malaria. The assay has potential application in other diseases for which the question of acquisition of antibody-mediated immunity, either through natural exposure or through vaccination, remains unresolved.

      From a purely technical/methodological viewpoint, the work described is of high quality, relying primarily on the availability of custom-designed, in-house-derived protein and antibody reagents that had, for the most part, been validated through use in earlier studies. The authors demonstrate a high degree of rigour in the assay development steps, culminating in a convincing demonstration of the ability to accurately and reproducibly quantify cross-reactive antibody types under controlled conditions using well-characterized monoclonal antibodies.

      In a final step, the authors used the assay to assess the content of broadly cross-reactive antibodies in samples from a small number of malaria-exposed African men and women. Given that VAR2CSA is a parasite-derived protein that is exclusively and intimately involved in the manifestation of malaria during pregnancy, with specific localisation to the maternal placental space, the premise is that antibodies, including those with cross-reactive specificities, should be almost exclusively detectable in samples from women, either pregnant at the time of sampling or having been pregnant at least once. The assay functioned technically as expected, identifying antibodies predominantly in women rather than men, but it failed to identify broadly cross-reactive antibodies in the women's samples used, only revealing antibodies with specificity for just one of the different variants used. The latter result could have two mutually non-exclusive explanations. On the one hand, the small number of women's samples (7) screened in the assay could simply be insufficient, demanding the use of a much larger panel. On the other hand, for technical reasons the assay involves the use of only relatively restricted parts of the VAR2CSA protein, and this particular aspect may represent its primary limitation. In earlier work, the authors did identify broadly cross-reactive antibodies in samples from African women, but that work relied on the use of the whole VAR2CSA protein present in its natural state embedded in the membrane of the infected red cell, or as a complete protein produced in the laboratory. The important point being that the whole protein likely interacts with antibodies that recognize protein structures that the isolated smaller parts of the whole protein used in the assay fail to reproduce, and that the cross-reactive antibodies identified recognize these structures that are conserved across different VAR2CSA variants. The authors recognize these potential weaknesses in their discussion of the results. It is also possible that VAR2CSA variants expressed by parasites from geographically-distinct regions (Africa, Asia, South America) are themselves distinct, and this aspect could also have affected the outcome, since the variant protein sequences used in the assay were derived from parasites originating in these different regions.

      The assay could find application in the malaria research field in the specific context of assessments of antibody responses to a range of different parasite proteins that are, or have been, considered candidates for vaccine development but for which their extensive inherent allelic polymorphism has effectively negated such efforts.

      We thank the reviewer for the kind evaluation. We fully acknowledge the need for more comprehensive studies to assess the robustness of the pilot data regarding antibody cross-reactivity after natural exposure in the present study, which was aimed to document the performance of the complicated multiplexed assay rather than to provide such evidence. As mentioned above, we are currently planning such a study. We also acknowledge the need to assess the degree of cross-reactivity to full-length antigens rather than domain-specific components of them. This is obviously particularly true for large, multi-domain antigens such as PfEMP1 (including VAR2CSA). Such an exercise is complicated by the need for appropriately tagged antigens. We are intrigued by the apparent discrepancy between the degree of antibody cross-reactivity in depletion experiments using individual DBL domains of VAR2CSA (low cross-reactivity) versus full-length VAR2CSA antigens (very substantial cross-reactivity) reported by Doritchamou et al., and are keen to apply our approach to explore that finding. Therefore, as also mentioned above, we are currently planning a study employing tagged full-length VAR2CSA allelic variants as well.

    1. Reviewer #1 (Public Review):

      The authors test whether neurons in V1 show "multiplexing", which means that when two stimuli A and B are presented inside their receptive fields (RFs), the neuronal response fluctuates across trials between coding one of the two, leading to a bimodal spike count histogram. They find evidence for this "mixture" model response in a subset of V1 neurons. They next test whether the spike count noise correlations (Rsc) vary between pairs of neurons that prefer the same versus different stimuli, and show that Rsc is positive for neurons that prefer the same stimulus but negative for neurons that prefer different stimuli.
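The "mixture" idea summarized above can be illustrated with a minimal sketch. It assumes, for illustration only, that spike counts are Poisson distributed and that on each "AB" trial the neuron responds with either the A-like or the B-like rate with fixed probability. The rates, the 50/50 mixing weight, and all function names are hypothetical and are not taken from the manuscript under review.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k spikes at mean rate lam (log-space for stability)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def mixture_pmf(k, lam_a, lam_b, p_a=0.5):
    """Spike-count probability if each trial is A-like with probability p_a, else B-like."""
    return p_a * poisson_pmf(k, lam_a) + (1 - p_a) * poisson_pmf(k, lam_b)


def count_modes(pmf_values):
    """Count strict local maxima in a list of probabilities."""
    modes = 0
    for i, value in enumerate(pmf_values):
        left = pmf_values[i - 1] if i > 0 else -1.0
        right = pmf_values[i + 1] if i < len(pmf_values) - 1 else -1.0
        if value > left and value > right:
            modes += 1
    return modes


# Hypothetical, well-separated mean spike counts for stimulus A alone and B alone.
LAM_A, LAM_B = 4.5, 30.2
spike_counts = range(0, 81)

# "Mixture" response: trial-to-trial switching between the A-like and B-like rate.
mixture = [mixture_pmf(k, LAM_A, LAM_B) for k in spike_counts]

# Alternative: a single Poisson at the average of the two rates (no switching).
single = [poisson_pmf(k, (LAM_A + LAM_B) / 2) for k in spike_counts]

print("modes in mixture histogram:", count_modes(mixture))
print("modes in single-rate histogram:", count_modes(single))
```

The point of the comparison is that only the switching model produces a bimodal spike-count distribution, and that the separation of the two single-stimulus rates determines how visible the two modes are.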

      While this paper shows some intriguing results, I feel that there are a lot of open questions that need to be addressed before convincing evidence of multiplexing can be established. These points are discussed below:

      1. The best spike count model shown in Figure 2C is confusing. It seems that the number of "conditions" is a small fraction of the total number of conditions (and neurons?) that were tested. Supplementary Figure 1 provides more details (for example, the "mixture" corresponds to only 14% of total cases), but it is still confusing (for example, what does WinProb>Min mean?). From what I understood, the total number of neurons recorded for the Adjacent case in V1 is 1604, out of which 935 are Poisson-like with substantially separated means. Each one has 2 conditions (for the two directions), leading to 1870 conditions (perhaps a few less in case both conditions were not available). I think the authors should show 5 bar plots - the first one showing the fraction for which none of the models won by 2/3 probability, and then the remaining 4 ones. That way it is clear how many of the total cases show the "multiplexing" effect. I also think that it would be good to only consider neurons/conditions for which at least some minimum number of trials are available (a cutoff of say ~15) since the whole point is about finding a bimodal distribution for which enough trials are needed.

      2. More RF details need to be provided. What was the size of the V1 RFs? What was the eccentricity? Typically, the RF diameter in V1 at an eccentricity of ~3 degrees is no more than 1 degree. That is not large enough for 2 Gabors of size 1 degree each to fit inside the RF. How close were the Gabors? I am confused about the statement in the second paragraph of page 9 "typically only one of the two adjacent gratings was located within the RF" - I thought the whole point of multiplexing is that when both stimuli (A and B) are within the RF, the neuron nonetheless fires like A or B? The analysis should only be conducted for neurons for which both stimuli are inside the RF. When studying noise correlations, only pairs that have overlapping RFs, such that both A and B are within the RFs of both neurons, should be considered. The cortical magnification factor at ~3-degree eccentricity is 2-2.5mm/degree, so we expect the RF center to shift by at least 2 degrees from one end of the array to the other.

      3. Eye data analysis: I am afraid this could be a big confound. Removing trials that had microsaccades is not enough. Typically, in these tasks the fixation window is 1.5-2 degrees, so that if the monkey fixates on one corner in some trials and another corner in other trials (without making any microsaccades in either), the stimuli may nonetheless fall inside or away from the RFs, leading to differences in responses. This needs to be ruled out. I do not find the argument presented on pages 18 or 23 completely convincing, since the eye positions could be different for a single stimulus versus when both stimuli are presented. It is important to show that the eye positions are similar in "AB" trials for which the responses are "A" like versus "B" like, and these, in turn, are similar to when "A" and "B" are presented alone.

      4. Figures 5 and 6 show that the difference in noise correlations between the same preference and different preference neurons remains even for non-mixture type neurons. So, although the reason for the particular type of noise correlation was given for multiplexing neurons (Figure 3 and 4), it seems that the same pattern holds even for non-multiplexers. Although the absolute values are somewhat different across categories, one confound that still remains is that the noise correlations are typically dependent on signal correlation, but here the signal correlation is not computed (only responses to 2 stimuli are available). If there is any tuning data available for these recordings, it would be great to look at the noise correlations as a function of signal correlations for these different pairs. Another analysis of interest would be to check whether the difference in the noise correlation for simply "A"/"B" versus "AB" varies according to neuron pair category. Finally, since the authors mention in the Discussion that "correlations did not depend on whether the two units preferred the same stimulus or different", it would be nice to explicitly show that in figure 5C by showing the orange trace ("A" alone or "B" alone) for both same (green) and different (brown) pairs separately.
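The tabulation requested in point 1 (five bars, including the cases where no model wins by 2/3 probability, computed only for conditions with enough trials) could be sketched as follows. This is a minimal illustration under stated assumptions: the input format, the placeholder model labels other than "mixture", and the toy numbers are hypothetical, and the ~15-trial cutoff is the value suggested in the comment rather than one used by the authors.

```python
from collections import Counter

# "mixture" is named in the review; the other three labels are placeholders.
MODELS = ("mixture", "model_2", "model_3", "model_4")
WIN_THRESHOLD = 2 / 3   # a model "wins" only above this probability
MIN_TRIALS = 15         # suggested minimum number of trials per condition

# Upper bound on conditions from the review: 935 neurons x 2 directions.
MAX_CONDITIONS = 935 * 2  # 1870


def classify(condition):
    """Return the winning model, 'none' if no model wins, or 'excluded' if too few trials."""
    if condition["n_trials"] < MIN_TRIALS:
        return "excluded"
    best = max(MODELS, key=lambda name: condition["probs"][name])
    return best if condition["probs"][best] > WIN_THRESHOLD else "none"


def tabulate(conditions):
    """Fraction of retained conditions in each of the five categories."""
    labels = [classify(c) for c in conditions]
    kept = [label for label in labels if label != "excluded"]
    counts = Counter(kept)
    categories = ("none",) + MODELS
    if not kept:
        return {name: 0.0 for name in categories}
    return {name: counts[name] / len(kept) for name in categories}


# Toy input: one dict per condition with trial count and model probabilities.
toy_conditions = [
    {"n_trials": 20, "probs": {"mixture": 0.80, "model_2": 0.10, "model_3": 0.05, "model_4": 0.05}},
    {"n_trials": 30, "probs": {"mixture": 0.40, "model_2": 0.40, "model_3": 0.10, "model_4": 0.10}},
    {"n_trials": 10, "probs": {"mixture": 0.90, "model_2": 0.05, "model_3": 0.03, "model_4": 0.02}},
    {"n_trials": 25, "probs": {"mixture": 0.10, "model_2": 0.10, "model_3": 0.70, "model_4": 0.10}},
    {"n_trials": 18, "probs": {"mixture": 0.50, "model_2": 0.30, "model_3": 0.10, "model_4": 0.10}},
]

fractions = tabulate(toy_conditions)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

Reporting the "none" category alongside the four models makes the denominator explicit, so the share of all tested conditions that show the mixture pattern can be read directly from the plot.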

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their constructive comments and are pleased that all reviewers share our opinion, that the present study “makes an important contribution to the molecular architecture of mitochondria”, is in addition “an important advancement in our understanding of the mechanism by which Cqd1 regulates CoQ distribution” and will “thereby appealing to the broad readership of the journals”. We are convinced that addressing the important points raised by the reviewers will further strengthen the manuscript and result in additional significant insights in the molecular function of Cqd1.

      Reviewer #1:

      The major concerns affecting the conclusions are: 1) Experimental evidence is lacking on the contribution of contact site formation by Cqd1 to the effects on mitochondrial architecture and respiration-dependent growth. Determining the effects of the overexpression of the kinase-dead mutant on mitochondrial morphology and contact site formation with Por1-Om14 can address that.

      We thank reviewer #1 for raising these important points. Indeed, the various functions of Cqd1 might be independent from each other and so far we cannot distinguish between them. As suggested by the reviewer we will analyze the effect of overexpression of CQD1 in the Δups1 deletion mutant and make use of the point mutant in the conserved ATP binding domain which cannot complement the phenotype of the Δups1 Δcqd1 double deletion mutant. We generated a yeast mutant strain expressing Om14-3xHA in the absence of wild type Cqd1. Expression of the cqd1(E330A) mutant in the Om14-3xHA background and subsequent immunoprecipitation will allow us to test whether ATP binding is also essential for contact site formation. Preliminary experiments showed that the overexpression of cqd1(E330A) in the Δcqd1 deletion background results in a growth defect comparable to that caused by overexpression of CQD1 WT. Therefore, we think it might be more promising to analyze the interaction of Om14 and Cqd1 E330A at wild type level in order to avoid pleiotropic effects.

      In addition, we will further characterize the cqd1(E330A) mutant by analyzing the effect of its overexpression on mitochondrial morphology, cell growth and assembly of MICOS and F1FO ATP synthase in the Δcqd1 deletion background.

      2) Related to point #1, Cqd1 overexpression in Δups1 cells could have addressed whether the role of Cqd1 in contact sites and mitochondrial architecture is independent of its role in CoQ distribution and phospholipid metabolism. Further characterization of the effects of the kinase-dead Cqd1 mutant on CoQ distribution, contact sites, mitochondrial architecture and phospholipid metabolism might help discern how these activities can be separated.

      We agree that the related points 1) and 2) raised by reviewer #1 are important and addressed our plans in the response on point 1).

      3) It is unclear how both Cqd1 overexpression and deletion induce mitochondrial fragmentation. Performing live cell imaging with a mitochondrial photoactivatable GFP to measure mitochondrial fusion rates could help discern the causes of fragmentation. It is possible that overexpression induced fragmentation by activating fission without changing fusion, while deletion induced fragmentation by blocking fusion.

      We thank reviewer #1 for bringing up this point. Perhaps our explanation in this respect was too short. Fig. 4E shows that deletion of CQD1 does not result in altered mitochondrial morphology; however, deletion of CQD1 in the Δups1 background leads to virtually complete fragmentation of the mitochondrial network. This is likely due to inhibition of mitochondrial fusion through disturbed processing of the fusion protein Mgm1 (see Fig. 4D). In contrast, overexpression of CQD1 does NOT result in formation of small mitochondrial fragments, but in formation of huge mitochondrial clusters which in addition contain a large proportion of ER membranes. So, we don’t think that this phenotype is related to either enhanced fission or reduced fusion. We will clarify this point in the text of the revised manuscript.

      Minor comment:

      1) Figure 4 claims that mitochondrial function is impaired by ups1 deletion, which Cqd1 deletion exacerbates. However, no respiration data is shown in figure 1, only measurements of mitochondrial architecture are shown. Thus, oxygen consumption measurements are needed to claim effects on mitochondrial function.

      We did not want to claim that mitochondria lose respiratory competence upon simultaneous deletion of CQD1 and UPS1. Actually, our results indicate that the Δups1 Δcqd1 double deletion mutant grows like wild type on complete medium containing glycerol. Therefore, respiration is not impaired in this mutant. However, mitochondrial function is not restricted to ATP production by oxidative phosphorylation. The reviewer probably refers to Figure 4, where we show that mitochondrial biogenesis and dynamics are impaired in the Δups1 Δcqd1 double deletion mutant; the heading of the legend summarizes this as "mitochondrial function". We will be more precise in the revised version on this point and add a panel showing growth of the mutant strain on non-fermentable carbon source to avoid any further confusion.

      2) Some Western blots lack quantifications and statistical analyses of independent experiments.

      It is correct that some quantification and the respective statistics were missing in the initially submitted manuscript. We will add the requested information in the revised version of the manuscript.

      Reviewer #2:

      I have the following concerns for the authors to consider. (1) Although biochemical evidence shows that Cqd1 is likely a factor that forms CS structures in mitochondria, it would make the manuscript stronger if the authors can observe uneven distribution of Cqd1 in the mitochondrial membranes (assessed by fluorescent microscopy or ideally high-resolution microscopy) and the presence of Cqd1 in the region of close apposition of the OM and IM by immunogold labeling for electron microscopy.

      Two independent lines of evidence show that Cqd1 is a novel contact site protein: (i) it is found in the contact site fraction in density gradients (Fig. 6A), and (ii) it can be co-immunoprecipitated with outer membrane proteins (Fig. 6G, H, I). Furthermore, the co-IP is supported by cross-links of expected size (Fig. 6F). In sum, we feel that this is solid evidence to support our claim that Cqd1 is present in mitochondrial contact sites. However, it still might be interesting to check an uneven distribution of Cqd1 in mitochondria, as suggested by the reviewer. We will do this by 3D deconvolution fluorescence microscopy.

      (2) Since the structural characterization of Cqd1 is important to understand its interactions with the OM proteins and other UbiB protein kinase-like family proteins, Coq8 and Cqd2, take different orientations, the membrane topology of Cqd1 should be experimentally analyzed. The authors state, "two hydrophobic stretches can be identified in the Cqd1 sequence, of which the first one (amino acids 125-142) might be a bona fide transmembrane segment" (lines 97-100); then is Cqd1 a single membrane spanning protein or two-membrane spanning protein?  

      Unfortunately, it was not possible to test the location of the N terminus experimentally because an N-terminally tagged variant of Cqd1 (tag inserted between presequence and mature part) turned out to be unstable. We consider it very unlikely that the second hydrophobic stretch is a transmembrane domain as it is rather short (only 11 amino acids). Furthermore, several Cqd1 homologs in other fungi, including Yarrowia lipolytica, Aspergillus niger and Schizosaccharomyces pombe, are lacking the second hydrophobic stretch. Therefore, we propose that the major part of Cqd1 including the protein kinase-like domain is exposed to the intermembrane space. We will point out this more clearly in the revised manuscript.

      (3) The authors state, "conserved GxxxG dimerization motif (amino acids 504‐508)" (Fig. 1A caption), but this description needs a reference. The GxxxG motif was proposed to mediate transmembrane helix-helix association (https://doi.org/10.1006/jmbi.1999.3489), which is not consistent with the membrane topology proposed by the authors.

      We thank reviewer #2 for this comment. It is correct that GxxxG motifs are usually present in transmembrane α-helices. However, there is information available indicating that these motifs may also be present in soluble proteins, where they stabilize dimeric interactions, for instance in the homodimeric Holliday-junction protein resolvase (Kleiger et al., 2002; doi: 10.1021/bi0200763). However, as this point is not critical for our conclusions, we will remove the discussion of the GxxxG motif from the revised manuscript.

      (4) What is the role of the kinase activity of Cqd1 in the CS formation? The effects of overexpression of Cqd1 (Fig. 7) should be tested for its E330A mutant.

      We also thank reviewer #2 for raising this important point similar to reviewer #1. Please see our response to point 1) of reviewer #1.

      (5) Is there stoichiometric as well as quantitative information on the 400 kD complex consisting of Cqd1, Por1 and Om14? Does the stoichiometry and amount of the complex depend on the growth condition? Does the complex contain other Por1 interacting IM proteins like Mdm31?

      We appreciate that reviewer #2 points out this important aspect. It might well be that the amount of the Cqd1 containing complex depends on growth conditions since its presence might be important for phospholipid homeostasis, CoQ distribution and mitochondrial architecture and morphology which for sure strongly depend on growth conditions. Therefore, we will try to analyze the amount of the Cqd1 complex present in mitochondria isolated from yeast cells grown on different media by BN-PAGE. So far we do not have any information on the stoichiometry of this complex and we feel that an analysis would go beyond the scope of this study. We agree with reviewer #2 that Mdm31 is an obvious candidate for an interaction partner of Cqd1. We actually tested this by co-immunoprecipitation using Cqd1-3xHA or Mdm31-3xHA. However, none of these approaches resulted in successful co-isolation of the potential interaction partner. We will mention this result in the revised manuscript.

      (6) For Fig. 7E, the authors state, "consistently, we observed dramatically increased mitochondria‐ER interactions Cqd1 overexpression", but this observation could be due to secondary effects because overexpression of Cqd1 itself already caused abnormal morphology of mitochondria.

      We thank reviewer #2 for bringing up this important point. To check whether the increased mitochondria‐ER interactions are a secondary effect due to altered mitochondrial morphology we will analyze the mitochondria‐ER interactions in other mitochondrial morphology mutants by fluorescence microscopy. This will reveal whether abnormal mitochondrial morphology generally leads to disturbed ER structure.

      (7) Since an antagonistic role of Cqd2 to Cqd1 was proposed, the results of the experiments for Cqd1 can be compared with those for Cqd2. For example, what is the effect of overexpressing Cqd2 instead of Cqd1 in Fig. 7? What is the lipid composition of the cqd1Δ cqd2Δ double deletion mutant cells (is the decreased PA level recovered)? Lines 424-425: In summary, overexpression of Cqd1 causes severe phenotypes on growth, formation of mitochondrial structural elements, and mitochondrial architecture and morphology. Is this phenotype affected by overexpression of Cqd2?

      This point raised by reviewer #2 is very interesting. Our preliminary experiments and previously published data (Tan et al., 2013) indicate that overexpression of Cqd2 is also toxic and results in the formation of huge mitochondrial clusters. Therefore, we will extend our study and analyze the effect of overexpression of CQD2, either alone or in combination with overexpression of CQD1.

      Reviewer #3:

      1) The central point of the paper is that Cqd1 is part of a novel contact site between the inner and the outer membrane. Om14 and Por1 were identified as outer membrane components of this contact site by immunoprecipitation. The data look convincing but they were generated from targeted experiments to test the involvement of suspected proteins. Ideally, one would like to see a cross-linking mass spectrometry (XL-MS) experiment that identifies the physical interactions of Cqd1 without bias.

      We thank reviewer #3 for acknowledging the presented data as convincing. Considering the significant amount of experiments planned for the revised version of the manuscript, we hope that reviewer #3 agrees that this point is not essential.

      2) Could an analogous blot of the MICOS complex be added to Figure 6D?

      Of course, we are happy to include BN-PAGE analysis showing the running behavior of MICOS next to the Cqd1 containing complex in Fig. 6D.

      3) In the Introduction, a host of contact sites is mentioned, which are partly from older papers. I'm not sure whether this is the accepted view of the field. Also, newer data suggest that the permeability transition pore is derived from complex V rather than ANT, CK, and VDAC. The authors should double check in order to represent the current state of the art.

      We thank reviewer #3 for this comment. We will update this part according to the more recent literature.

    1. Author Response

      Reviewer #2 (Public Review):

      First, I want to congratulate the author team on this manuscript, which I read with great pleasure. I think this will be a fine addition to the literature!

      The present MS by Clement et al. provides a comprehensive overview of the brain shapes of lungfishes. Besides previously known/described brain endocasts, the work includes models and descriptions of previously undescribed taxa. Notably, all CT data are deposited online following best practices when working with digital anatomy. The specimen sample is impressive, especially as the sampled material is housed in museums all over the world. Although the sample size may seem numerically low (12 taxa), this actually is a comprehensive sample of fossil (and extant) lungfishes in terms of what's preserved in the first place.

      The study at hand has several goals: (1) The description of lungfish brains for taxa that were previously undescribed; (2) the quantification of aspects of brain shape using morphometric measurements; (3) the characterization of brain shape evolution of lungfishes using exploratory methods that ordinate morphometric measurements into a morphospace.

      The provided 3D data and descriptions will serve as valuable comparisons in future lungfish work. This type of data is imperative for palaeontological studies in general, and the anatomical information will be extremely valuable in the future. For example, anatomical characters related to brain architecture have been shown to be informative about phylogeny in the past, and the presented data may inform future phylogenetic studies. The quantification of brain shape via (largely linear) measurements is relatively simplistic, and can thus only detect gross trends in brain shape evolution among lungfishes. The authors describe several such trends, such as high variation in the olfactory brain region in comparison to other parts of the brain. The results and interpretations drawn by the authors are supported by their data, and the approach taken is valid, even if more sophisticated shape quantification methods (e.g. 3D landmarking) and analytical methods (e.g. explicit phylogenetic comparative methods) are available, which could provide additional insights in the future.

      We agree with Reviewer #2 that 3D geometric morphometrics could have provided more sophisticated analytical methods. However, geometric morphometrics has some limitations with regard to the type of data that we analysed: (1) low sample size and (2) missing/incomplete data. Comprehensive coverage of the brain shape would have required numerous landmarks (and semilandmarks) to represent its complexity.

      First, our sample size (12 taxa) is low (although it is an impressive sample size when considering the type of data). Although there is no universal rule concerning the ratio “number of specimens / number of landmarks” (Zelditch et al., 2012), ideally the sample size should be two to three times the number of landmarks. Thus, with a sample size of 12 we could have used ca. 4-6 landmarks, which is very limited for describing complex shapes. In addition, in order to use geometric morphometrics (2D or 3D), the landmarks should be present on all the specimens. Because of the partial completeness of the studied fossils, the brain endocasts are not uniformly known for each species. Incomplete and deformed specimens prompt the removal of potential landmarks from analyses. Even using right-left reflection of the endocasts, most specimens do not share all neurocranial information.

      We agree with Reviewer #2 that a phylogenetic PCA could have provided interesting analytical perspectives. Phylogenetic PCA is available for standard PCA, but it is uncertain whether it can be used with Bayesian PCA and InDaPCA (the latter method was published very recently, and we have not found much literature about it). We did not find an adaptation of phylogenetic PCA to either the BPCA or the InDaPCA; we even contacted Liam Revell, who created the phylogenetic PCA, about this issue.

      The presented results and interpretations in this regard must be seen as a preliminary assessment of lungfish brain evolution, but it is clearly written and generally well performed.

      A potential shortcoming of the paper is the lack of explicit hypothesis testing, which is not problematic per se, but puts limits on the conclusions the authors can draw from their data.

      We decided to address the issues using exploratory methods rather than testing hypotheses. It is a more conservative approach, since this is the first quantitative analysis of dipnoan endocasts. Future analyses will be able to formulate hypotheses based on our interpretation of our exploratory approach. We hope to stimulate such hypothesis testing when further dipnoans are added in the future. However, one has to remember that ossified neurocrania are known in Devonian dipnoans and that only one partially ossified neurocranium is known from a Carboniferous dipnoan; the remaining dipnoans have cartilaginous neurocrania, which limits the sample size from which endocast data could be gathered.

      For example, the authors state that different anatomical parts of the labyrinth (particularly, the utricle with respect to the semicircular canals or saccule) may show modular dissociation from other labyrinth modules, based on the polarity of eigenvalue signs of the PCA analysis. I think this is fine as a first approximation, but of course there are explicit statistical tools available to test for modularity/integration, such as two-block partial least squares regression analysis (Rohlf & Corti 2000, Syst. Biol.). I don't see the lack of usage of such methods as problematic, because you cannot do everything in one paper, and the authors remain careful in their interpretation.

      We agree with Reviewer #2 that different geometric morphometrics methods have been developed to look at variational modularity; one of the co-authors (RC) has published several papers on patterns of morphological integration and modularity in fishes (see Larouche, Cloutier & Zelditch, 2015, Evol. Biol.; Lehoux & Cloutier, 2015, J. Exp. Zool. Mol. Dev. Evol.; Larouche, Zelditch & Cloutier, 2018, Sci. Rep.). Interesting a priori hypotheses of brain modules could have been formulated and tested for modularity using, for example, the Covariance Ratio (CR) and distance matrix approaches. However, the low sample size and the incompleteness of the data remain major constraints on testing modularity. We would nonetheless endeavour to use such methods in future work as more complete material becomes available.

      It may be advisable, however, to add the odd sentence or statement about how some findings are preliminary or hypothesized, and that these should receive further treatment and testing using other methods in the future. I think this approach is actually very rewarding, because then you can inspire future work by outlining outstanding research problems that arise from the new data presented herein.

      We have now included an additional sentence early in the Discussion section stating: “We acknowledge that our investigation of lungfish brain evolution as elucidated from morphometric analysis of cranial endocasts is still preliminary in several respects. We hope that our study can inspire future work on the neural evolution of both fossil and extant lungfish.”

      In the following, I comment on a few aspects of the manuscripts. These represent instances where I had additional thoughts or ideas on how to slightly improve various aspects of the manuscript.

      1) Presentation of PCA results

      The authors provide several PCA analyses (preliminary analyses on partial matrices, BPCA, InDaPCA), and are very explicit about the procedures in general. For instance, I appreciate that they explicitly state using correlation matrices for PCA analyses due to the usage of different measurement units among their data.

      Visually, the BPCA and InDaPCA are presented in figures 2 and 3, whereas the preliminary partial matrix PCAs are only reported as supplementary figures. While I don't object to any of this, I find the sequence of information given in the results section suboptimal.

      The figures have now been substantially reorganised to include more within the main body text and not as Supplementary Information, and we hope that this improves the sequence of information within the manuscript.

      The authors start by discussing the partial matrix analyses, although none of these analyses are visually/graphically depicted in the main text figures, and although their results do not seem to be of real importance for the narrative of the discussion. The other two PCA analyses actually are presented afterwards and separately, but they convey some common signals, particularly that the major source of variation seems to be a decreasing olfactory angle with increasing olfactory length, and a scaling relationship between all linear measurements (which all have the same eigenvector signs on the first PC axis). I wonder if an alternative way of presenting the PCA results would be better for this particular MS. For example, the authors could give "first level observations" first ("PCA analyses agree in X, Y, Z"), and then move to second order observations ("Morphospace of BPCA has some interesting taxon distribution with regard to chirodipterids"; "InDaPCA axis projections continuously retrieve clustering of specific variables"). I suspect this would shorten the text somewhat and could serve as a clearer articulation of the take-home messages?

      Following Reviewer #2's suggestion, we now provide “first level” observations based on the standard PCA. We have also added some further comments on the species distribution in the morphospaces.

      2) Selection of PC axes for interpretation

      You describe how you use the broken-stick method to decide how many PC axes are retained for the interpretation of results, which I agree is a good procedure. However, I have a few questions regarding this. First, in line 331 (description of InDaPCA) you state that the first three axes are non-trivial "based on the screeplot" - which got me confused because it sounds a bit like eyeballing off the screeplot. Have you used the broken stick method for all your PCA analyses?

      Originally, we used both the screeplot and the broken-stick method; however, we are now solely using the broken-stick method to determine the number of non-trivial axes. We agree with Reviewer #2 that this method is more rigorous than the screeplot. Our choice is greatly inspired by the studies of Jackson (1993, Ecology) and Peres-Neto et al. (2005, Computational Statistics & Data Analysis). We have now edited the text so that our methods are clearer (and removed the text relating to the screeplot such as “based on the screeplot…”).

      The second question relates to the results of the broken stick method, which I did not find reported. Unless I am mistaken, for the xth axis, the method sums the fractions 1/i (where i = x..n; n = number of axes), and divides this sum by n to get a value of expected variation per axis. This number is then compared with the actual value of variance explained by the axis. So for the 1st of 17 axes, the broken-stick expectation is (1 + 1/2 + .. + 1/17) / 17. If you apply this to your BPCA, the third axis' value (i.e., (1/3 + ... + 1/17)/17) is 0.114, which is smaller than the reported 0.120 that PC3 explains. Thus, following the broken stick method, PC3 does explain more variation than expected (and should thus be retained, contra your comment in line 311 which refers to two non-trivial axes)?
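
      For reference, the broken-stick expectation described in this comment can be written out as a worked equation. The symbols b_k (expected fraction for axis k) and n (number of axes) are notation introduced here for illustration and are not taken from the manuscript.

```latex
\begin{align*}
b_k &= \frac{1}{n}\sum_{i=k}^{n}\frac{1}{i}, \qquad k = 1,\dots,n \\[4pt]
% worked values for n = 17 axes, as in the reviewer's example
b_1 &= \frac{1}{17}\sum_{i=1}^{17}\frac{1}{i} \approx \frac{3.4396}{17} \approx 0.202 \\
b_3 &= \frac{1}{17}\sum_{i=3}^{17}\frac{1}{i} \approx \frac{1.9396}{17} \approx 0.114 \;<\; 0.120
\end{align*}
```

      The last line reproduces the reviewer's figure: the expected value for the third of 17 axes (about 0.114) is smaller than the 0.120 reported for PC3.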

      We thank Reviewer #2 for the insightful evaluation of our paper and for taking the time to validate each step of our analyses. Indeed, we agree with Reviewer #2 that, based on the broken-stick method, the third axis is non-trivial. The value for the third axis is 1.0531310. Thus, we are presenting these results as well as discussing the three PCA projections (axis 1 versus axis 2, axis 2 versus axis 3, axis 1 versus axis 3).

      Related to this potential issue is the presentation of the BPCA results in Fig. 2: You present loadings of three PC axes, although only the first two are considered in morphospace bi-plots and although the text also mentions only two non-trivial axes. If the third axis is indeed non-trivial, then the loading-presentation could be retained in the figure, but then the authors should consider showing a PC1 vs. PC3 plot in addition to the currently presented biplot showing the first and second axis only. If the third axis indeed is trivial, as currently suggested by the text, then showing the loadings is unnecessary.

      We consider showing a biplot of PC1 vs PC3 unnecessary as those shown (PC1 vs PC2) already account for 83.4% of the variation captured. We have edited these figures so that the loadings related to PC3 have also now been omitted.

      It would be great if you could clarify the usage/application of the broken stick method for all your PCAs. An easy way to report the results may be to add a row to each of your PCA loading tables in the supplements, in which you divide the actual value of variation explained by the value expected under the broken stick method. This way, all axes which explain more variation than expected by the stick method have values larger than 1, and axes which explain less have values lower than 1.
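
      The reporting scheme suggested here can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example variance fractions are hypothetical, and it is not the code used by the authors.

```python
def broken_stick(n):
    """Expected fraction of variance for each of n axes under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def broken_stick_ratios(explained):
    """Observed / expected variance per axis; values above 1 exceed the expectation."""
    expected = broken_stick(len(explained))
    return [obs / exp for obs, exp in zip(explained, expected)]


def nontrivial_axes(explained):
    """1-based indices of axes whose observed variance exceeds the broken-stick value."""
    ratios = broken_stick_ratios(explained)
    return [k for k, ratio in enumerate(ratios, start=1) if ratio > 1.0]


if __name__ == "__main__":
    # Hypothetical explained-variance fractions for a four-axis PCA (illustration only).
    explained = [0.60, 0.25, 0.10, 0.05]
    for k, ratio in enumerate(broken_stick_ratios(explained), start=1):
        print(f"PC{k}: observed/expected = {ratio:.3f}")
    print("Non-trivial axes:", nontrivial_axes(explained))
```

      The ratios returned by `broken_stick_ratios` correspond to the extra table row the reviewer proposes, so they can be appended directly to a loading table.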

      We have taken this suggestion from Reviewer #2 on board and have now recalculated all values for the broken-stick method for each analysis; we also provide broken-stick values in their respective loading tables in the SI.

      3) Missing commentary on allometry

      In basically all PCA analyses, the first PC axis seems to be dominated by allometric size effects, given that all linear measurements have the same eigenvalue signs. The authors do acknowledge this (lines 314-316; 335-336), but offer no further comment on size effects/allometry.

      We agree that normally the first axis represents variation related mainly to size changes and shape changes related to size (allometry). However, we are reluctant to assume that our first axis corresponds to evolutionary allometry. Among others, Klingenberg & Zimmermann (1992) and Klingenberg (1996) used standard PCA (or multi-group PCA) to disentangle evolutionary and ontogenetic allometry (as well as static allometry), mainly by analysing multiple specimens for each group (or species) in order to have a better partitioning of the covariance. Since our sample is limited to 12 species, each represented by a single specimen (except for Dipterus), it would be difficult to clearly discriminate variation associated with allometry. Even in a case of ontogenetic allometry, a sample size of 12 would be too limited to draw unambiguous conclusions about any such variation.

      For example, it would be interesting to see how the linear measurements scale with overall head size. Similarly, the authors note that the semicircular canal measurements covary strongly, as do the utricle and saccule height/length measurements (paragraph line 346). Basically, it seems that the semicircular canal measurements scale with one another: as one gets bigger, so does the other. It is interesting that the utricle does not seem to follow the same scaling pattern as the saccule and semicircular canals, and it would be good to hear if the authors think that there is a functional implication for this. Increases in utricular/saccular/semicircular canal sizes are usually explained by increased sensitivity. So is an increased utricular size a compensatory development to decreased semicircular canal + saccule size to retain an overall level of sensitivity, or is it maybe related to a relative change of importance of the specific functions, e.g. increased importance of linear accelerations in the horizontal plane with a simultaneous decrease of importance of angular and vertical accelerations?

      We thank Reviewer #2 for this suggestion about the scaling of endocast measurements with overall head size. Our original study design also included measurements of dermal skulls, but we omitted these from the final version as the material available was far too incomplete to allow meaningful analyses. It is a topic that some of us (AC, RC) have already discussed as a potential future project.

      With respect to the functional implications of the modular dissociation of the labyrinths, we have expanded the final paragraph of the “implications for sensory abilities” within the Discussion, and similarly added the sentence “However, we acknowledge that it is difficult to determine if increased relative utricular size results from greater reliance of sensitivity in the horizontal plane alone, or if it expands to compensate for e.g. relative stagnation of the sacculus + semicircular canals in some way. Further studies, such as investigation of neuronal densities in extant lungfish labyrinths, may potentially in part clarify this uncertainty in future.”

      4) Labyrinth size

      With the above-mentioned utricular exception, labyrinth size measurements, particularly on the semicircular canals, seem to imply that there is a relatively consistent scaling relationship between the canals. When one canal gets larger, so do the others, perhaps thereby retaining canal symmetry across different absolute labyrinth sizes. Labyrinth size in tetrapods is often interpreted in relation to body size/mass or head size (e.g. Melville Jones & Spells 1963, Proc. R. Soc. Lond. Biol. Sci.; Spoor & Zonneveld 1998, Yearb. Phys. Anthr.; Spoor et al. 2002, Nature; Spoor et al. 2007, PNAS; Bronzati et al. 2021, Curr. Biol.), as deviations from the expected labyrinth size per head size indicate increased or decreased relative labyrinth sensitivities. Large relative head sizes of birds and (within) mammals have generally been interpreted as indicative of "active" or "agile" behaviour, although doubt has been cast on these relationships recently (e.g., Bronzati et al. 2021). Increased sampling of relative labyrinth size from various vertebrate groups would be important to better understand labyrinth size-function relationships. Melville Jones & Spells (1963) have shown that fishes have large labyrinth sizes compared to most tetrapods, but they don't have lungfish data, and the large labyrinth sizes of fishes have often remained uncommented on in tetrapod works. I think this study offers a fantastic opportunity to provide comparative labyrinth size data for lungfishes. In this regard, it would be really interesting to quantify labyrinth size relative to head size, and show a respective (phylogenetic) regression analysis. Ideally, the size of the labyrinth could be quantified along the arc lengths of the semicircular canals, but other ways are also conceivable (for example a box volume of labyrinth size from the existing measurements, contrasted with a box volume of the skull, i.e. height × width × length).

      Firstly, many thanks for the suggested reading of Bronzati et al. (2021). While we consider a labyrinth-to-skull size regression analysis to be a worthwhile suggestion, we have chosen not to include one in this study, firstly because there is no phylogenetic regression based on the new methods that we are using, and secondly because it forms the basis of another study currently underway by some of the authors.

    1. Reviewer #1 (Public Review):

      In this study, the authors aimed to address the important question of the mechanism of deep brain stimulation (DBS) in treating Parkinson's disease, based on a mouse model that the authors established previously.

      The strength of the study lies in 1) avoiding the interference from stimulation artefacts that affects electrophysiological recording techniques, and 2) examining effects on cell-type- or projection-specific targets.

      However, there are several critical problems in this study. First, the low temporal resolution and the averaged population signal (rather than signals from individual neurons) of the fibre photometry data prevent an analysis of the effects of DBS on the target areas that is deep enough to draw useful conclusions. Thus, all interpretations were based on an average rise in GCaMP-reported calcium signals with rather low temporal resolution. As a result, important readouts that were analysed in many previous studies, such as the firing patterns (e.g. rhythmic) or synchrony among neurons, were missed by this approach. Take one example. The conclusion that antidromic activation is excluded as a possible mechanism is based partly on the lack of good correlation of the averaged calcium signal with the behavioral improvement. However, such a lack of correlation is also evident between the averaged calcium signal and the improvement in movement behavior under 60 Hz and 100 Hz stimulation (Figure 2). While a higher average calcium signal is observed under 60 Hz DBS than under 100 Hz DBS, the improvement in motor behavior is lower than that induced by 100 Hz DBS. This highlights the severe limitation of the fibre photometry data in revealing the therapeutic mechanism of DBS.

      Second, there is no clear elucidation of the pathological changes revealed by the fibre photometry in PD mice to illustrate what is normal and what is abnormal, and how the DBS rectifies the abnormal changes. For example, when we need to interpret the effect of the DBS on calcium activities in the subthalamic nucleus (STN), the substantia nigra pars reticulata (SNr) and the primary motor cortex (M1), what abnormal GCaMP signal did the authors find, compared with healthy control mice? Without such information, it is difficult to get a sense of what an increase in GCaMP signal in STN, SNr and M1 means with respect to motor control, and therefore what it means with respect to the effect of DBS. In the specific context of a peak (actually a biphasic waveform) of the calcium activity in the PD animal, it is puzzling that a surge of STN activity is correlated with movement onset, while in principle it should result in movement termination. Therefore, it is critical to know if there is such a correlation in healthy animals. If yes, this may not indicate a pathological change that needs to be rectified by DBS. If no, how the pathological appearance of such a change leads to parkinsonian motor symptoms (akinesia, bradykinesia etc.) must be established.

      Third, it is well known that clinical DBS employs stimulation of at least 120 Hz. In fact, the authors had also demonstrated in their previous report that the optimal stimulation frequency in the mouse model is around 180 Hz. But the present study utilised clearly suboptimal frequencies (60 and 100 Hz only) to address the mechanism. It is possible that different mechanisms or combinations of mechanisms may take place under different stimulation frequencies. As such, any conclusion drawn from this study may not represent the whole picture.

      Given the above consideration, I do not think that the authors have achieved the aim of their study, as the results cannot convincingly support their conclusions.

    1. Reviewer #3 (Public Review):

      Zadbood and colleagues investigated the way key information used to update interpretations of events alters patterns of activity in the brain. This was cleverly done by the use of "The Sixth Sense," a film featuring a famous "twist ending," which fundamentally alters the way the events in the film are understood. Participants were assigned to three groups: (1) a Spoiled group, in which the twist was revealed at the outset, (2) a Twist group, who experienced the film as normal, and (3) a No-Twist group, in which the twist was removed. Participants were scanned while watching the movie and while performing cued recall of specific scenes. Verbal recall was scored based on recall success and on evidence for descriptive bias toward two ways of understanding the events (specifically, whether a particular character was or was not a ghost). Importantly, this allowed the authors to show that the Twist group updated their interpretation. The authors focused on regions of the Default Mode Network (DMN) based on prior studies showing responsiveness to naturalistic memory paradigms in these areas and analyzed the fMRI data using intersubject pattern similarity analysis. Regions of the DMN carried patterns indicative of story interpretation. That is, encoding similarity was greater between the Twist and No-Twist groups than in the Spoiled group, and retrieval similarity was greater between the Twist and Spoiled groups than in the No-Twist group. The Spoiled group also showed greater pattern similarity with the Twist group's recall than the No-Twist group's recall. The authors also report a weaker effect of greater pattern similarity between the Spoiled group's encoding and the Twist group's recall than between the Twist group's own encoding and recall. Together, the data all converge on the point that one's interpretation of an event is an important determinant of the way it is represented in the brain.

      This is a really nice experiment, with straightforward predictions and analyses that support the claims being made. The results build directly on a prior study by this research group showing how interpretational differences in a narrative drive distinct neural representations (Yeshurun et al., 2017), but extend an understanding of how these interpretational differences might work retrospectively. I do not have any serious concerns or problems with the manuscript, the data, or the analyses. However, I have a few points to raise that, if addressed, would make for a stronger paper in my opinion.

      1) My most substantive comment is that I did not find the interpretive framework to be very clear with respect to the brain regions involved. The basic effects the authors report strongly support their claims, but the particular contributions to the field might be stronger if the interpretations could be made more strongly or more specifically. In other words: the DMN is involved in updating interpretations, but how should we now think about the role of the DMN and its constituent regions as a result of this study? There are a number of ideas briefly presented about what the DMN might be doing, but it just did not feel very coherent at times. I will break this down into a few more specific points:

      While many of us would agree that the DMN is likely to be involved in the phenomena at hand, I did not find that the paper communicated the logic for singularly focusing on this subset of regions very compellingly. The authors note a few studies whose main results are found in DMN regions, but I think that this could stand to be unpacked in a more theoretically interesting way in the Introduction.

      Relatedly, I found the summary/description of regional effects in the Discussion to be a bit unsatisfying. The various pattern similarity comparisons yielded results that were actually quite nonoverlapping among DMN regions, which was not really unpacked. To be clear, it is not a 'problem' that the regional effects varied from comparison to comparison, but I do think that a more theoretical exploration of what this could mean would strengthen the paper. To the authors' credit, they describe mPFC effects through the lens of schemas, but this stands in contrast to many other regions which do not receive much consideration.

      Finally, although there is evidence that regions of the DMN act in a coordinated way under some circumstances, there is also ample evidence for distinct regional contributions to cognitive processes, memory being just one of them (e.g., Cooper & Ritchey, 2020; Robin & Moscovitch, 2017; Ranganath & Ritchey, 2012). The authors themselves introduce the idea of temporal receptive windows in a cortical hierarchy, and while DMN regions do appear to show slower temporal drift than sensory areas, those studies show regional differences in pattern stability across time even within DMN regions. Simply put, it is worth considering whether it is ideal to treat the DMN as a singular unit.

      2) I think that some direct comparison to regions outside the DMN would speak to whether the DMN is truly unique in carrying the key representations being discussed here. I was reluctant to suggest this because I think that the authors are justified in expecting that DMN regions would show the effects in question. However, there really is no "null" comparison here wherein a set of regions not expected to show these effects (e.g., a somatosensory network, or the frontoparietal network) in fact do not show them. There are not really controls or key differences being hypothesized across different conditions or regions. Rather, we have a set of regions that may or may not show pattern similarity differences to varying degrees, which feels very exploratory. The inclusion of some principled control comparisons, etc. would bolster these findings. The authors do include a whole-brain analysis in Supplementary Figure 1, which indeed produced many DMN regions. However, notably, regions outside the DMN such as the primary visual cortex and mid-cingulate cortex appear to show significant effects (which, based on the color bar, might actually be stronger than effects seen in the DMN). Given the specificity of the language in the paper in terms of the DMN, I think that some direct regional or network-level comparison is needed.

      3) If I understand correctly, the main analyses of the fMRI data were limited to across-group comparisons of "critical scenes" that were maximally affected by the twist at the end of the movie. In other words, the analyses focused on the scenes whose interpretation hinged on the "doctor" versus "ghost" interpretation. I would be interested in seeing a comparison of "critical" scenes directly against scenes where the interpretation did not change with the twist. This "critical" versus "non-critical" contrast would be a strong confirmatory analysis that could further bolster the authors' claims, but on the other hand, it would be interesting to know whether the overall story interpretation led to any differences in neural patterns assigned to scenes that would not be expected to depend on differences in interpretation. (As a final note, such a comparison might provide additional analytical leverage for exploring the effect described in Figure 3B, which did not survive correction for multiple comparisons.)

      4) I appreciate the code being made available and that the neuroimaging data will be made available soon. I would also appreciate it if the authors made the movie stimulus and behavioral data available. The movie stimulus itself is of interest because it was edited down, and it would be nice for readers to be able to see which scenes were included.

      To sum up, I think that this is a great experiment with a lot of strengths. The design is fairly clean (especially for a movie stimulus), the analyses are well reasoned, and the data are clear. The only weaknesses I would suggest addressing are with regards to how the DMN is being described and evaluated, and the communication of how this work informs the field on a theoretical level.

    1. we need to treat one another with respect despite our differences like this is like an aspiration for people 00:41:01 right except for they thought it was in the bottom quarter of stuff for everybody else so what happens if i'm like i i would like to get back to treating other people's respect but i don't think 00:41:12 they care about that for me back to that ambiguous interactions that we have all the time i'm gonna read disrespect into most everything i see right and so i i think it's really critical like 00:41:25 like i talked about this as like congruence right this need for our private selves and our public selves to be as as closely aligned as possible we've known for a long time that that's that's a critical part of fulfillment 00:41:37 and self-actualization i mean how how do you get there you're the expert on that like how do you how do you get there if you have a divided self like my private self is different than my public self like so we know that at an individual 00:41:48 level but given the the fact of collective illusions i believe this idea of congruence may be the most important thing you can do for other people right because it doesn't help anyone when we misread each 00:42:00 other so profoundly

      Congruence is the antidote to collective illusion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank all reviewers for their very helpful comments. We feel that the comments pointed to a few main issues that we could remedy. First, we found that many comments and concerns could be addressed with work from our previous paper (doi.org/10.1101/2020.11.24.396002). To fix this, we added additional descriptions of experiments done previously and additional citations. We discussed more in depth an experiment that shows that ciliary membrane and membrane proteins can indeed come from the cell body plasma membrane, we talked more about how we determined that the actin puncta are representative of membrane remodeling functions like endocytosis, and we discussed some of the mechanistic insights provided by our previous work that are applicable here. We hope that this helps to answer several of the reviewer questions. Second, there were a few experiments we thought would be useful to add. These are represented in bold in our responses below. Briefly, we added a measure of internalization or endocytosis in the drp3 mutant, we added some images of cilia to the phalloidin figure to orient readers’ views of the cell, we added some additional mechanistic insight (supplemental figure 3), and we added an axoneme stain to confirm that the axoneme was extending (supplemental figure 4). Finally, we fixed some of our wording in the paper to represent our findings more accurately. Together, we hope that these revisions will address the reviewer concerns.

      Additionally, we added some data that we collected while waiting on reviews. We investigated the requirement for myosin in this pathway and include this data in the supplement.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The current manuscript by Bigge et al. demonstrated that the chemical inhibition of GSK3 causes ciliary elongation in Chlamydomonas reinhardtii. They show that lithium induced ciliary lengthening is majorly due to GSK3 inhibition. Consistent with earlier reports, they show that new protein synthesis is not required for lithium induced ciliary elongation. The authors report that targeting endocytosis either by using chemical inhibitors (dynasore and CK-666) or genetic mutants (drp3 and Arpc4) does not cause lithium induced ciliary elongation. They further reveal enhanced actin dynamics in lithium treated cells and such activity is lost in Arpc4 mutants. Based on these results, the authors concluded that endocytic pathways may be involved in lithium induced ciliary lengthening. The results are interesting, and this work is important in understanding more about ciliary length regulation. However, more experimental evidence addressing the current interpretation that endocytic pathways may be involved in lithium induced ciliary lengthening is required.

      Major comments: 1 The authors use chemical inhibitors as major tools for their study. However, the specificity of these inhibitors is a concern. How specific are these GSK3 inhibitors such as LiCl? Can authors show that LiCl mediated ciliary lengthening is due to inhibition of GSK3? Authors used BFA and Dynasore to show that not the Golgi, but the endocytosis derived membrane is required for ciliary lengthening. Again, here the specificity of these inhibitors is a concern. Especially as Dynasore has been shown to have non-specific effects.

      We agree that the specificity of chemical inhibitors can be a concern. This is why we used 4 separate inhibitors of GSK3, each showing elongation of cilia and an increase in actin puncta (suggesting an increase in actin dynamics at the membrane). While these different inhibitors may have different off-target effects, their intended target, GSK3, is the same, suggesting that the shared phenotype from each inhibitor is conserved. The ability of LiCl to affect GSK3 activity in Chlamydomonas was also investigated in depth with a kinase assay and a western blot in Wilson, 2004 (doi: 10.1128/EC.3.5.1307-1319.2004). To address the off-target effects of Dynasore, we employed the drp3 mutant to confirm genetically what we saw from the chemical inhibition. We also show in our previous paper that Dynasore and PitStop2 have similar effects in Chlamydomonas, both of them inhibiting the internalization of a dye-labelled membrane, suggesting that they both function to block endocytosis (doi.org/10.1101/2020.11.24.396002). While no mutant or alternative inhibitor is available to look at the effects of BFA, this inhibitor and its effects on cilia have been well-characterized in Dentler, 2013 (doi.org/10.1371/journal.pone.0053366).

      Does inducing/enhancing endocytosis independent of GSK3 by other means have any effect on ciliary length regulation?

      Our concern with the proposed experiment is that even if elongation requires endocytosis, not all endocytosis will necessarily lead to ciliary elongation. For example, endocytosis could occur for other purposes, like nutrient uptake, that will have no effect on cilia. The plasma membrane to cilium pathway may be a targeted pathway triggered by specific disruptions. Therefore, we don’t feel that the proposed experiments will add to our model.

      The major claim of this paper is that LiCl mediated ciliary lengthening is due to enhanced endocytosis. Although authors showed that inhibition of endocytosis results in reduced ciliary length, it is important to show if GSK3 inhibition by LiCl (or any other inhibitor) causes any increased cellular endocytosis? Similarly, what is the effect of GSK3 mutants on endocytosis?

      We show an increase in actin dynamics at the membrane and actin puncta following treatment with LiCl and the other GSK3 inhibitors. We show here and in our previous paper (doi.org/10.1101/2020.11.24.396002) that these puncta are likely endocytic based on the timing of their appearance and the proteins required for puncta formation (including the Arp2/3 complex and Clathrin) (Figure 7, previous paper). We updated our latest version to reflect the data we have already collected and presented as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). Thus, we stained cells with phalloidin to visualize filamentous actin and these endocytosis-like punctate structures when cells are treated with GSK3 inhibitors.”

      A phenotypic mutant of GSK3 does not currently exist in Chlamydomonas, and methods of reliably introducing mutants in Chlamydomonas do not currently exist. Thus, we used the array of GSK3 inhibitors.

      Are these endocytic processes enhanced specifically at/or around the cilium during the ciliary lengthening process?

      Based on our phalloidin staining data, these processes are primarily enhanced near the cilium, but puncta also exist throughout the cell. To more clearly show this and in response to a comment from reviewer 2, we added a set of images with brightfield to demonstrate where the dots are in relation to cilia. We also added arrows to the images in the figure to point out the apex of the cell as determined by the filamentous actin structures in the cells.

      Authors claim that drp3 is a target of GSK3 and, similar to the canonical dynamin, functions in endocytosis. While, it is an important observation, experiments are required to show the role of drp3 in endocytosis and also to show that it is indeed a target of GSK3.

      To address this comment, we are employing an experiment that was designed in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 5B-E). This experiment uses a lipophilic membrane dye, FM4-46FX. The dye binds to the membrane but is unable to enter the cell alone. It is quickly endocytosed and results in vesicular-like structures within the cell. We added a panel to Figure 3 where we do this experiment in wild-type and drp3 mutant cells. This shows that endocytosis is affected by the mutation in DRP3. The discussion of this new data is summarized in the text as follows:

      “Additionally, we showed that this DRP is required for internalization of a lipophilic membrane dye, FM4-46FX through endocytosis. This dye binds to the membrane but is unable to enter the cells on its own and must be endocytosed. In wild-type cells it is quickly endocytosed and visible as puncta within the cell (Figure 3F, H) (Bigge et al. 2020). However, in drp3 mutants the amount of dye endocytosed is significantly lower (Figure 3G-H), suggesting that DRP3 is required for optimal endocytosis in these cells.”

      Mechanistic insights into how endocytosis/actin dynamics regulate ciliary lengthening would be interesting to see. Further, it is interesting to see if the ciliary signaling defects caused by abnormal ciliary length can be rescued by inhibition of endocytosis.

      In our previous paper (doi.org/10.1101/2020.11.24.396002), we dive into the mechanisms tying together actin dynamics, endocytosis, and cilia. We find that Arp2/3 complex-nucleated actin networks are required for endocytosis to reclaim ciliary membrane and membrane proteins from a pool in the plasma membrane for the rapid early stages of ciliary assembly. We believe that this is a similar mechanism to what is occurring when cells elongate following lithium treatment. This is because there are several parallels in phenotypes:

      -The Arp2/3 complex is required for both ciliary assembly (Figure 1, previous paper) and ciliary elongation resulting from lithium treatment. In the case of ciliary assembly, treating with cycloheximide to block the synthesis of new protein fully eliminates regrowth in the absence of the Arp2/3 complex, suggesting this Arp2/3 complex dependent mechanism in early ciliary assembly does not involve new protein synthesis (Figure 2, previous paper). Similarly, the process of ciliary elongation in response to lithium does not require new protein synthesis.

      -A burst in actin dynamics/actin puncta occurs immediately following deciliation during early regrowth and during growth initiated by lithium treatment. We know these puncta are Arp2/3 complex and clathrin dependent (Figures 4 and 7, previous paper).

      -Both initial ciliary assembly or ciliary maintenance and elongation of cilia due to lithium treatment require endocytosis (Figures 5, 7-8, previous paper) but do not require Golgi-derived membrane (Figure 3, previous paper).

      -Also in the previous paper, we find that this mechanism is required for the internalization and relocalization of a ciliary membrane protein for mating (Figure 6, previous paper). We also find that ciliary membrane proteins move from the plasma membrane to the cilia during ciliary assembly (Figure 7-8, previous paper).

      This is summarized in the text as follows:

      In the introduction we added:

      “Previous data from our lab suggest that the Arp2/3 complex and actin are involved in reclaiming material from the cell body plasma membrane that is required for normal ciliary assembly (Bigge et al. 2020). We show that the Arp2/3 complex is required for the normal assembly of cilia and for endocytosis of both plasma membrane and plasma membrane proteins in various contexts. Further, we find that deciliation triggers Arp2/3 complex-dependent endocytosis by observing an increase in actin puncta immediately following deciliation (Bigge et al. 2020).”

      And in the discussion we added:

      “Previous work has shown that while the Golgi is required for ciliary maintenance and assembly (Dentler 2013), it is not the only source of membrane. Instead, we found that membrane reclaimed through actin and Arp2/3-complex dependent endocytosis is required for ciliary assembly or growth from zero length (Bigge et al. 2020). More specifically, we found that the Arp2/3 complex is required for normal ciliary maintenance and ciliary assembly, especially in the early stages when membrane and protein are needed quickly. The Arp2/3 complex is also required for the internalization of membrane and a specific ciliary membrane protein required for mating. Further, we show that endocytosis-like actin puncta form immediately following deciliation in an Arp2/3 complex and clathrin-dependent manner, and that membrane from the cell body plasma membrane can be reclaimed and incorporated into cilia (Bigge et al. 2020). This led us to question whether that same mechanism might be required for ciliary elongation from steady state length induced by lithium treatment.”

      Minor comments: 1. The paper needs a thorough proof reading as it harbors many spelling mistakes, grammatical errors, and poor sentence formation in multiple instances.

      The paper was thoroughly read, and spelling mistakes and grammar were fixed.

      Supplemental Figure S2A and S2B should be quoted separately from S2C and S2D.

      This was updated in the latest version of the paper.

      On page 6, paragraph 2, the authors wrote: "To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRPs (Supplemental Figure 2)." No data is shown in S2 with regard to this. Either the data need to be shown or the text should be changed to avoid confusion.

      The text was changed in a way to avoid confusion.

      It would be nice to see if GSK3 can actually phosphorylate DRP3.

      This would be interesting; however, there is currently no simple way to test this. There is not an antibody for DRP3 that shares enough of its immunogen sequence with the Chlamydomonas DRP3 sequence to use for a western blot.

      The authors observe that arpc4 mutants do not form actin puncta upon LiCl treatment. Could this phenotype be rescued by complementing with WT ARPC4?

      We showed in our previous paper (doi.org/10.1101/2020.11.24.396002) that the actin puncta could be rescued by re-expression of wild-type ARPC4 (Figure 4).

      The concentration of inhibitors is described differently in the text and figure legends (for example Fig. 4A)

      In the figure legend of figure 4, the concentration of 6-BIO was accidentally reported as 100 µM instead of the correct value (100 nM) as it was throughout the rest of the paper. This was addressed in the latest version.

      The p values are not significant in some of the figures. (Fig. 4D & Fig. 5C)

      P values were provided for all comparisons in an effort to be transparent and so that readers could draw their own conclusions about the data.

      Reviewer #1 (Significance (Required)):

      The current manuscript by Bigge et al. demonstrates that endocytosis is required for GSK3 inhibition mediated ciliary lengthening. Maintenance of proper length of cilia is crucial and its dysregulation results in pathogenesis. This work takes the field forward and helps in our understanding of how ciliary length is regulated. This work is of interest to researchers working in the field of ciliary biology as well as to those working on endocytosis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors show in this study that Lithium and other GSK3-beta inhibitors induce cilia elongation in Chlamydomonas. They further demonstrate that inhibition of endocytosis by Dynasore prevents the induced elongation of cilia. They speculate that a Dynamin-related protein might be involved in this process, and determine 9 Dynamin related proteins (DRPs) in Chlamydomonas of which DRP3 shows the highest sequence similarity. Lithium-induced ciliary elongation is prevented in DRP3 mutants supporting the authors' hypothesis and indicating that DRP3 might be a GSK3-beta target, similar to some animal Dynamins. Since Dynamins interact with the F-actin regulator ARP2/3-complex, and because F-actin reorganization is observed in cells after GSK3-beta inhibition, they test the induction of ciliary elongation in arpc4 mutants and after blocking the ARP-complex by CK-666. Indeed, F-actin remodeling and cilia elongation were prevented after loss of ARP-complex function. The induction of ciliary elongation and F-actin remodeling also correlates with the emergence of strong F-actin punctae in cells, and the authors interpret that as induction of Dynamin-dependent endocytosis (also addressed in a current preprint from the group). From that, they conclude that endocytosis is required for delivering membrane to the growing cilium and that this is required for the observed effects. While this claim is somewhat supported by a lack of cilia elongation inhibition after treatment to prevent protein synthesis or Golgi function, direct evidence for membrane delivery to the cilium, the need for membrane delivery for ciliary elongation, and presence of bona fide endocytotic vesicles is sadly missing.
Therefore, this study sheds new light on an important process in ciliary functional regulation and also furthers our understanding on why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Main points: 1. The authors need to demonstrate that new membrane is delivered in the process to the growing cilium. E.g. this could be done by membrane stains (pulse) and static or live-cell imaging analysis in untreated, GSK3-beta inhibitor treated and in mutants.

      In our previous paper (doi.org/10.1101/2020.11.24.396002), we do an experiment similar to the one described here (Figure 8, previous paper). We biotinylated all surface proteins, then removed the cilia (and therefore all labelled ciliary surface proteins) and allowed them to regrow. We then isolated the new cilia and probed for biotinylated proteins because any biotinylated proteins must have come from the surface of the cell. We found that the cilia did contain membrane proteins from the surface of the cell. This experiment shows that membrane and membrane proteins derived from the plasma membrane are entering growing cilia during regeneration. We added a description of this experiment to the text as follows:

      “Conversely, when treated with Dynasore to inhibit endocytosis, cilia could not elongate to the same degree as untreated cells (Figure 3A-B), implying endocytosis is required for lithium-induced elongation and that endocytosis requires dynamin. This is consistent with results from our previous studies which show that ciliary membrane and membrane proteins are delivered from the cell body plasma membrane to the cilia. In an experiment first performed in Dentler 2013 and then later in Bigge et al. 2020, we biotinylated all cell surface proteins. Then, we deciliated cells and allowed cilia to regrow. We then isolated cilia and probed for biotinylated proteins. Any biotinylated proteins present must have come from the cell body plasma membrane, and we found that indeed biotinylated proteins exist in the newly grown cilia, suggesting that ciliary membrane and membrane proteins can be recruited from the cell body plasma membrane (Dentler 2013; Bigge et al. 2020).”

      However, this experiment cannot be done in the case of lithium because cilia are not removed, meaning they will already contain labelled surface proteins. Additionally, cells do not regrow cilia in the presence of lithium, meaning that we cannot add a regeneration step. Regardless, work from our previous paper described above does establish that ciliary membrane and membrane proteins are able to come from the cell body plasma membrane as the reviewer requested.

      Along the same line, the authors need to demonstrate that the punctae are truly endocytotic vesicles. For that uptake assays/stains could be used and additional markers. Furthermore, there are multiple modes of endocytosis (e.g. Clathrin) besides Dynamin. The authors should determine if blocking other modes of endocytosis has similar or divergent effects on cilia elongation.

      In our previous paper (doi.org/10.1101/2020.11.24.396002) we supplement the actin puncta data with membrane labelling to show that the puncta are likely endocytic pits (doi.org/10.1101/2020.11.24.396002, Figure 5). We also show that the puncta require both the Arp2/3 complex and active clathrin to form, further suggesting that they are endocytic (Figure 7, previous paper). We added this to the paper as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      Additionally, Dynamin is required for most forms of endocytosis, including clathrin mediated endocytosis. In the previous paper (doi.org/10.1101/2020.11.24.396002), which we cite here, we do a deep dive into which endocytic proteins are present in Chlamydomonas. We found that clathrin mediated endocytosis is the most highly conserved of the endocytic processes we looked at (Figure 5, previous paper).

      We did add a new figure to this paper (Figure 4) using a dye that labels membrane in lithium treated cells. This dye binds to the plasma membrane but is unable to enter cells by itself and must be endocytosed. We found that during the first 30 minutes of lithium treatment there is increased membrane dye internalization.

      No cilia are actually shown in the study. I personally would like to see what these cilia look like, especially in relation to the sites of F-actin remodeling and punctae formation. What comes first? Please also provide an axoneme staining to confirm elongation of the ciliary core. What happens to the tubulin pool when cilia cannot elongate any more? Is it accumulating at the ciliary base?

      We added a panel demonstrating where the puncta are in relation to cilia in Figure 4 with a brightfield overlay. We also look at the appearance and timing of these puncta more in depth in our previous paper (doi.org/10.1101/2020.11.24.396002, Figure 7). We find that puncta form immediately following deciliation and start to return to normal following about 10 minutes of regrowth. We think that this mechanism of ciliary elongation in lithium is similar to what occurs during those early steps of ciliary assembly, suggesting that the dots likely form very early on.

      We also included axoneme staining in Supplemental figure 4. We show that the axoneme does continue to elongate with the cilia. After about 90 minutes, the cilia actually stop growing and detach from the cells (doi: 10.1128/EC.3.5.1307-1319.2004, doi: 10.1247/csf.12.369). However, we are interested in the more acute mechanisms that result in ciliary elongation.

      The authors also claim that the method of GSK3 inhibition is not important. It would be more correct to say that the mode/drug of GSK3 inhibition is not important, but discuss how some of the minor variance between treatments could be explained (incl. the timeline and temporal dynamics of the diverging effects; and the dose-dependency as low concentrations of BIO seem to induce shortening but high doses induce elongation of cilia).

      We further discussed this in the text as follows:

      “The minor variances between the drugs could be explained by the timeline in which we tested cilia (90 minutes) or the exact dosages we used. An example of this is 6-BIO where treatment with a low dose of 100 nM caused ciliary lengthening, but treatment with a higher concentration of 2 µM reportedly caused ciliary shortening (Kong et al. 2015). Together, the data suggest that the mode of inhibition by chemical targets of GSK3 is not important for ciliary lengthening. Whether GSK3 was inhibited via competition for ATP binding or phosphorylation, cilia were able to elongate.”

      They propose here a positive effect of F-actin build up in cilia length regulation, while most studies to date report ciliary shortening to correlate with increased F-actin at the ciliary base. I believe that this is not highlighted and discussed enough, which I find reduces the overall quality of the paper (but is easy to improve). It might be also interesting to test if other F-actin inducers/stabilizers have the same effect?

      This is addressed in the discussion in the latest version in depth as follows:

      “One important detail to point out is that Chlamydomonas differ from mammalian cells in that they have a cell wall. The stability awarded by the cell wall means that Chlamydomonas does not require a cortical actin network as mammalian cells do. Thus, in Chlamydomonas, we are able to investigate actin dynamics and functions without the interference of the cortical actin network. This also means that some of the effects we see might be masked in mammalian cells by the presence of the cortical actin network and the effect that it has on ciliary assembly and maintenance.”

      We also added a section to the introduction to address this concern early on so that readers will have this difference in mind as they read the paper:

      “Additionally, unlike mammalian cells, Chlamydomonas lacks a cortical actin network which simplifies the relationship between cilia and actin and makes this an ideal model to study such interactions.”

      Also, F-actin inducers/stabilizers do not typically have the same effect because the filamentous actin needed for these processes must be dynamic, or able to undergo rapid depolymerization and repolymerization as needed during this fairly quick timeframe. This is demonstrated in Avasthi, 2014 (doi.org/10.1016/j.cub.2014.07.038). Cells were treated with several actin targeting inhibitors including LatB, which results in depolymerization of filaments, and Jasplakinolide, which results in stabilization of filaments. In both cases, ciliary regeneration is impaired, suggesting that actin must be dynamic for its functions related to cilia.

      Minor points: 1. In many Figures, the x-axis is labeled "Number of values", but I think that maybe number of observations might be more appropriate.

      We discussed this point and decided to change the axis titles to “Number of cilia”.

      The authors often use the word "normally" elongating, but in all cases the elongation is induced = abnormal situation. Maybe the authors could use a different term.

      We originally used “normally” because there are times when elongation is defective but not absent. In the latest version we changed this to “elongation consistent with untreated wild-type cells” or something along those lines.

      It is puzzling as to why DRP3 was chosen, while DRP2 actually is most similar in terms of domain composition. Maybe they could discuss that. They also could explain a bit better how the mutants were generated in which a "cassette was inserted early in the gene". What kind of disruption is expected?

      DRP3 was chosen because it has the highest sequence identity (and similarity). DRP2, while containing all domains, has low overall sequence conservation. DRP3 is also the only DRP that showed a potential GSK3 target site when investigated with ScanSite4.0. This was all made clearer in the text as follows:

      “Chlamydomonas contains 9 DRPs with similarity to a canonical dynamin (DRP1-9). Despite lacking 2 of the canonical dynamin domains, the DRP with the highest sequence similarity and identity to canonical dynamin is DRP3 (Supplemental Figure 2C-D). To determine if GSK3 could be a potential kinase for this protein, we employed ScanSite4.0, which confirmed that of the 9 DRPs of Chlamydomonas, the only one with a traditional GSK3 target sequence was DRP3.”

      The representative images in Figure 4A do not really seem to match the quantifications.

      The quantitative data suggest that these different treatments have increased dots, which we believe the representative images do show. LiCl and CHIR99021 have the most dots, while 6-BIO and Tideglusib have more dots than untreated cells, but fewer than LiCl and CHIR99021.

      line 109: "of-targets" should be off-targets

      Fixed in the latest version, thanks for pointing this out.

      line 141: "delivery form the Golgi" should be FROM the Golgi

      Fixed in the latest version, thanks for pointing this out.

      line 160: "was DRPs" should be was DRP3

      Fixed in the latest version, thanks for pointing this out.

      line 204/205: the sentence starting "Thus, we phalloidin..." should be rephrased. It sounds not quite correct

      Fixed in the latest version, thanks for pointing this out.

      line 209: Figure 4A should refer to Figure 4B

      Fixed in the latest version, thanks for pointing this out.

      line 211: "times or rapid ciliary" should be of rapid ciliary...

      Fixed in the latest version, thanks for pointing this out.

      line 257: "in lithium." Should be in lithium treated cells

      Fixed in the latest version, thanks for pointing this out.

      Reviewer #2 (Significance (Required)):

      This study sheds new light on an important process in ciliary functional regulation and also furthers our understanding on why GSK3-beta inhibition induces elongated cilia in many cell systems, but I am not convinced that the conclusions are actually supported by the data, as the two key points in question were not experimentally addressed at this point.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Chlamydomonas maintains relatively regular length of cilia (flagella). However, when the cell is exposed to high concentration of lithium ions, it elongates cilia further. In this work, Bigge and Avasthi made experiments to build a potential hypothesis of molecular mechanism of this unusual cilia elongation. Their hypothesis is (1) cilia elongation is triggered, depending on supply of extra membrane (not proteins), (2) membrane is supplied from plasma membrane by clathrin-dependent endocytosis (not from Golgi), (3) this endocytosis contains Arp2/3 complex, (4) GSK3 downregulates Arp2/3 dependent endocytosis and (5) GSK3 is suppressed by lithium. They conducted well-organized experiments to prove each step. While some of them are indirect, their hypotheses were supported experimentally in outline.

      (1) is undoubted, since the authors demonstrated that inhibition of protein production by cycloheximide did not influence cilia elongation.

      (2) The authors clearly demonstrated that source of ciliary membrane for elongation is plasma membrane and not Golgi by examining specific inhibitors' effect. They also showed protein transfer from plasma membrane to cilia, by biotinylating surface proteins in the cell, deciliating and growing cilia and detecting biotinylated proteins in cilia. This part rather characterizes initial growth of cilia, not elongation. Therefore this result must be properly described in the context of this work (which is elongation of cilia).

      This comment was particularly helpful as it also helps us address some of the comments from the other reviewers. We updated the description of this experiment in the context of this work in the latest version as follows:

      “Further, they rely on proteins typically thought to be involved in endocytosis including the Arp2/3 complex and clathrin, and they form at times when it makes sense for endocytosis to be occurring, like immediately following deciliation when membrane and protein must be recruited to cilia in a timeframe too short for new protein and membrane synthesis, sorting, and trafficking (Bigge et al. 2020). To provide additional evidence that these are endocytic puncta, we also showed that a corresponding increase in membrane internalization occurs during this same timeframe using a fluorescent membrane dye that is endocytosed in wild-type cells (Bigge et al. 2020).”

      For (3)-(4), they visualized Arp2/3 localization, showing highly condensed Arp2/3. They interpreted these particles as sign of clathrin endocytosis. Since so far such an endocytosis particle has not been reported in Chlamydomonas, the authors confirmed that DRPs are target of GSK3 to indirectly show GSK3 influences formation of endocytosis. This reviewer thinks the author should be able to directly confirm endocytosis for example by electron microscopy (of traditional epon-embedded and stained cells).

      We visualized Arp2/3 complex-dependent filamentous actin localization. We provide DRP3 as a potential target of GSK3, but do not report that it is the target that results in increased endocytosis or increased ciliary length. We agree that electron microscopy would be ideal to visualize endocytosis in these cells. However, we feel this is outside the scope of this current work. We do have plans to look at endocytosis in Chlamydomonas using electron microscopy in the future and hope that the increased context from the previous data is sufficient at this time.

      (5) was elegantly proved by multiple drugs (all known as inhibitor of GSK3), including lithium.

      After fixing these points, this manuscript will be ready for publication.

      Minor points: Line188-191: not clear. What are *** and ****?

      Fixed in the latest version, thanks for pointing this out.

      Line262-264: It would be helpful to describe the initial cilia growth of the arpc4 cells.

      We agree that this would be helpful information, and included more of a description of how ciliary growth is affected by loss of Arp2/3 complex function in the latest version: “Specifically, we found that the Arp2/3 complex is required for reclamation of membrane from a pool in the plasma membrane during the rapid growth that occurs during early ciliary assembly”.

      Line321: it should read as follows. Cang 2014; carlsson and Bayly 2014). While we...

      Fixed in the latest version, thanks for pointing this out.

      Line329: were -> where

      Fixed in the latest version, thanks for pointing this out.

      Line365-366: Lithium-treated cells are not motile. Any thought why? Maybe protein production is not necessary for apparent cilia elongation, but necessary for elongation of functional cilia.

      This is an interesting idea. However, even when protein production is allowed to proceed, Lithium-treated cells are not motile. This is a ciliary dysfunction, and in fact, after about 90 minutes incubation with lithium, the cilia of these cells start to crash out or fall off, demonstrating that these are not healthy cells or healthy cilia.

      Reviewer #3 (Significance (Required)):

      This work is an important step toward the understanding of cilia elongation and thus growth mechanism. It will attract wide audience who have interest in cell biology and motility. My expertise is about motile cilia and their 3D structure.

    1. Author Response

      Joint Public Review:

      Strengths: The study represents a step forward in relating immune responses to infection outcomes that are of urgent interest to public health, especially the timing of shedding and frequency of supershedding events. Nguyen et al.'s model provides a useful framework for understanding the links between immune effectors and infection outcomes, and it can be expanded to encompass further biological complexity. The study system is a good choice, given the ubiquity of both helminth and bacterial infections, and experimental infections of rabbits provide a useful point of comparison for past work in mice.

      We appreciated these general comments.

      Limitations: The present study does not explicitly account for differences in helminth infection dynamics across the two species represented in the data, nor does it include feedbacks between the bacterial and helminth infections. Nguyen et al. therefore show the limits of what can be learned from focusing on the bacterial and immune dynamics alone, and this study should serve to motivate further work that can build on this modeling approach to produce a more comprehensive view of the interactions among species infecting the same host. Future studies examining the impact of helminth infection intensity would be tremendously useful for assessing the potential of anthelminthics to reduce the prevalence of bacterial respiratory diseases. Finally, subsequent studies may need to look beyond the factors examined here to understand why shedding varies so much through time for individual hosts.

      We agree that focusing only on the bacterial infection is a limitation in this study. We followed a parsimonious approach and decided to concentrate on B. bronchiseptica shedding in the four types of infection. While we do have data on the dynamics of infection of the two helminth species, adding these data would have been an enormous amount of work and too much to present in a single paper. Yet, we have already investigated some of these bi-directional effects using the BT group (Thakar et al. 2012 Plos Comp. Biol.) and plan to keep working on these rich datasets in the future.

      We also agree that it is important to understand the rapid variation in Bordetella shedding observed, which appears to be a common feature in many other host-pathogen systems. This requires a completely new set of experiments on infection and shedding at the local tissue level.

      Specific comments

      Definition of supershedding: A major stated goal of the MS is to investigate the effect of coinfection by helminths on supershedding. In order to compare animals with different coinfections, it is therefore necessary to have a common definition of supershedding. At present, the authors use a definition that depends on which arm of the experiment the animals belong to. This complicates the analysis and clouds its interpretation.

      We value this comment and see the implication of using different datasets to quantify supershedding. To overcome this problem, we now propose a slightly different approach where we pool the four infections together and calculate a common 99th or 95th percentile threshold. This common threshold is then used to calculate the number of hosts with at least one supershedding event above this cut-off, for every type of infection. Therefore, while the threshold is the same, the percentage of hosts with supershedding events varies among infection groups.
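
      The pooled-threshold calculation described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the nested dictionary layout, the function name `supershedding_summary`, and the choice to pool every observation (zeros included) before taking the percentile are all assumptions made for the example.

```python
import numpy as np

def supershedding_summary(shedding_by_group, percentile=99.0):
    """Pool all infection groups, derive one threshold, then count hosts per group.

    shedding_by_group: {group_name: {host_id: [CFU/s values]}} (hypothetical layout).
    Returns the common threshold and, per group, the percentage of hosts with
    at least one shedding event above that threshold.
    """
    # Pool every observation from every host in every group (assumption: zeros included).
    pooled = np.concatenate([
        np.asarray(series, dtype=float)
        for hosts in shedding_by_group.values()
        for series in hosts.values()
    ])
    threshold = np.percentile(pooled, percentile)

    summary = {}
    for group, hosts in shedding_by_group.items():
        # A host counts once, no matter how many events exceed the threshold.
        n_super = sum(1 for series in hosts.values() if np.max(series) > threshold)
        summary[group] = 100.0 * n_super / len(hosts)
    return threshold, summary
```

      Because the threshold is computed once from the pooled data, differences between groups in the returned percentages reflect the hosts rather than a group-specific cut-off.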

      Inconsistent approach: Within each experimental treatment, the data display variability on at least three levels: (i) within animals, day-to-day shedding displays variability on a fast timescale; (ii) within animals, infection status varies more slowly over the course of infection; (iii) between animals, there is variation in both (i) and (ii). The authors' model seems well-designed to handle this variability, but the authors are strangely inconsistent in their use of it. To be specific, to account for level (i), the authors very sensibly adopt a zero-inflated model for the shedding data, whereby the rate of shedding (colony-forming units per second, CFU/s) is assumed to arise from a mixture of a quantitative process (which we might think of as intensity of potential shedding) and an all-or-nothing process (which might arise, for example, if some discrete behavior of the animal is necessary for shedding to occur at all). The inclusion of the all-or-nothing process necessitates an additional parameter, but it allows the non-zero shedding data to inform the model. To account for level (ii), the authors use a four-dimensional deterministic dynamical system. Three of the four variables are related to the measured components of the immune response. The fourth is related to the aforementioned potential shedding. Level (iii) is accounted for using a hierarchical Bayesian approach, whereby the individual animals have parameters drawn from a common prior distribution. This approach seems very well designed to address the authors' questions using the data at hand. However, they fail to exploit this, in at least three ways. First, even though the model appears designed specifically to allow for non-shedding animals, the authors exclude animals on an ad hoc basis. Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate. Third, despite the fact that the model appears specifically designed to account for variability at each of the three levels, they do not give enough information to allow the reader to judge whether the model does in fact do a good job of partitioning this variability.
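
      The mixture the reviewer describes for level (i) can be illustrated with a small log-likelihood. This is only a sketch under stated assumptions: the lognormal distribution for non-zero shedding, the parameter names `p_shed`, `mu` and `sigma`, and the function name are hypothetical and are not taken from the manuscript. With a continuous distribution for the positive part, every zero is attributed to the all-or-nothing process.

```python
import math

def zero_inflated_loglik(observations, p_shed, mu, sigma):
    """Log-likelihood of shedding rates under a two-part mixture.

    With probability (1 - p_shed) no shedding occurs and the observation is 0.
    With probability p_shed the rate is drawn from a lognormal(mu, sigma)
    (assumed here purely for illustration).
    """
    total = 0.0
    for y in observations:
        if y == 0:
            # All-or-nothing process: the animal did not shed at this time point.
            total += math.log(1.0 - p_shed)
        else:
            # Quantitative process: lognormal density evaluated at the observed rate.
            log_density = (-math.log(y * sigma * math.sqrt(2.0 * math.pi))
                           - (math.log(y) - mu) ** 2 / (2.0 * sigma ** 2))
            total += math.log(p_shed) + log_density
    return total
```

      In this form the zeros inform only `p_shed`, while the non-zero observations inform `mu` and `sigma`, which is why including non-shedding animals does not complicate the fit of the quantitative part.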

      Please see comments to each specific matter below.

      Exclusion of animals: In view of the fact that the model the authors describe can account for variability on all three levels, it is strange that they exclude animals that shed too little or not at all. It would be preferable were the authors to base their conclusions on all the data they collected rather than on a subset chosen a posteriori. It is true that the non-shedders will have no information about the time-course of shedding; on the other hand, including them does not complicate the analysis, and it does allow for estimation of the all-or-nothing probability in a coherent fashion. In particular, the fact that coinfection appears to have an impact on whether animals shed at all is itself directly related to the authors' central questions. More generally, ad hoc exclusion of data raises concerns about the repeatability of the experiments that, in this case, appear entirely avoidable.

      Rabbits that were infected but never shed were excluded from all our original analyses and continue to be excluded in our updated version. Our focus is on the dynamics of shedding, and including animals that do not shed is not informative to our objective. Moreover, these animals do not provide meaningful information on rabbits that are infected but do not shed, since this is a very small number (n=7) to draw meaningful conclusions across four types of infection. Rabbits with three or fewer shedding events larger than zero (i.e. CFU/s>0) were originally excluded from the modeling and continue to be excluded. This decision was motivated by technical reasons of model convergence and our commitment to generate meaningful results; in other words, it is difficult to fit a model, and provide robust results, on a time series with only three points larger than zero, irrespective of the number of zero points in the time series.

      In summary, our subset of animals was not chosen a posteriori but based on clear objectives (i.e. pattern of shedding between and within types of infections), a rigorous approach and reliable results. We have further clarified our approach in the Results and Material and Methods.

      Incomplete description of the analysis: The description of the statistical analysis will not be complete until sufficient information is provided to allow the interested reader to decide for him- or herself whether the conclusions are warranted and for the motivated reader to reproduce the analysis. In particular, it is necessary to specify all priors fully. At present, these are not described at all, except in vague, and even incoherent, ways. Also, it is necessary to provide details of the MCMC performed. Specifically, the authors should describe the MCMC sampler and show their MCMC convergence diagnostics. Finally, it is good practice to display both the priors and the posteriors: it is impossible to assess the posteriors without an understanding of the priors.

      We have carefully revised our approach and results and now provide a complete description of our analysis with additional/new details on Parameter calibration, Model fitting, Model validation and Model selection in Material and Methods, and Appendix (Appendix-3 and 4). Specifically, we have included all priors, along with all posteriors, for the four types of infection in Table 2. We have also explained how the MCMC simulations were performed and how model convergence diagnosis was assessed (section ‘Parameter calibration and Model fitting’). In Appendix-3 we also show the parameter MCMC trace plots for the four types of infection.
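
      One common convergence diagnostic of the kind the reviewers request is the Gelman-Rubin potential scale reduction factor. The sketch below computes the classic (non-split) version for a single parameter; whether the authors used this statistic is not stated in the response, so it is shown only as an example of what such a diagnostic reports. The function name and the array layout are assumptions.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for one parameter.

    chains: array-like of shape (n_chains, n_draws), one row per MCMC chain.
    Values close to 1 suggest the chains are sampling the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape

    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n_draws * chain_means.var(ddof=1)       # B: between-chain variance

    # Pooled estimate of the posterior variance.
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(var_hat / within))
```

      Trace plots show the same information visually: chains that overlap give a value near 1, while chains stuck in different regions inflate the between-chain term.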

      Second, rather than display the shedding data in the form recommended by the model, they display log(1+CFU/sec), which is arbitrary and problematic. Its arbitrariness stems from the fact that this quantity is sensitive to the units used for shedding rate.
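
      The unit sensitivity raised in this comment can be checked numerically. The short example below uses made-up shedding rates (not data from the study) and compares the spacing between points under log10(1 + x) and under a plain log10(x) when the same rates are expressed per second and per minute.

```python
import numpy as np

# Made-up shedding rates, for illustration only.
rates_per_second = np.array([0.01, 0.1, 1.0, 10.0])
rates_per_minute = rates_per_second * 60.0   # same quantities, different units

# Spacing between consecutive points under log10(1 + x).
gaps_log1p_s = np.diff(np.log10(1.0 + rates_per_second))
gaps_log1p_min = np.diff(np.log10(1.0 + rates_per_minute))

# Spacing between consecutive points under a plain log10(x).
gaps_log_s = np.diff(np.log10(rates_per_second))
gaps_log_min = np.diff(np.log10(rates_per_minute))

print("log10(1+x), per second:", gaps_log1p_s)
print("log10(1+x), per minute:", gaps_log1p_min)
print("log10(x),   per second:", gaps_log_s)
print("log10(x),   per minute:", gaps_log_min)
```

      Under a plain logarithm a change of units shifts every point by the same constant, so the spacing is unchanged; under log10(1 + x) the spacing itself changes, which is the sense in which the displayed quantity depends on the units. A plain logarithm cannot display zeros, however, which is presumably why the offset was added.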

      A clear feature of our shedding data is that there is large variation in the level of shedding both within and between hosts. Because of this, data were presented as log(1+CFU/s) to reduce the skewness of the datasets, and thus the variance, and facilitate the visualization of the experimental and simulated results. The use of data in the form of CFU/s would have made the visualization much harder, especially at low shedding where a large fraction of the data come from.

      The practice of displaying the data on a log-scale is appropriate when the underlying process is exponential or when the amount of relative variation is large, including when representing rates. This practice is widely used when modeling infectious diseases and describing biomedical results. A typical example is the overdispersion of macroparasite infections in host populations, or the large variation in the size of outbreaks by microparasite infections; these data are often described on a log-scale. An example closer to our case is the study on influenza-bacteria coinfection by Smith et al. 2013 Plos Pathogens. Given the nature of our data, we found that plotting the level of shedding on a log-scale was the most effective way to represent our results.

      Model adequacy: The authors' argument rests on the model's ability to adequately account for the data. The authors need to provide some evidence of this, in one form or another. Ultimately, the question is whether the data are a plausible realization of the model. The authors should show simulations from the model (including the measurement error and not merely the deterministic trajectories) and compare these simulations to the data. In particular, it seems worryingly possible that the fitted model is capable of capturing certain averages in the data while, at the same time, failing to describe the infection progression for any of the actual infected animals.

      As previously reported, we have now provided full details on model fitting and model convergence in the sections ‘Parameter calibration and Model fitting’ and ‘Model validation’ in Material and Methods, and ‘Model validation’ and ‘Model convergence’ in Appendix (Appendix-3 and 4).

      Regarding the evidence that the data are a plausible realization of the model, we have moved the original figure S1 into the main text (now figure 5). This figure shows the good fit of the model to neutrophil, IgA and IgG, both using individual and group data from every infection. We have also revised the quality of the plot to highlight individual simulations. To avoid too much crowding, the 95% CIs for every individual are not reported; however, in Appendix-1 we provide the posterior parameter estimations and their 95% CIs, for every individual and as a group average, for the three co-infections (simulations for B rabbits were performed at the group level only).

      In the new figure 6 (original figure 5), we have now included the individual trajectories (without 95% CIs to avoid overcrowding), alongside the group trends, for the neutralization rates of neutrophils, IgA and IgG, which are the important parameters regulating infection and where the CIs are large enough to show the individual data. The other rates have too narrow CIs to single out individual trajectories and, thus, we only reported the group trends.

      In the revised figure 7 (original figure 6) we have revised the quality of the plots to highlight individual trajectories, in addition to the median trend, but have not included the individual 95% CIs, again to avoid overcrowding.

      Finally, the main text associated to these figures has been updated accordingly.

      Confusion of correlation and causation: At various points, the authors succumb to the temptation to interpret their model literally and to interpret the correlations they observe as evidence for a causal linkage between the three immune components they measure, bacterial shedding, and coinfection. They should be more careful and circumspect in the description of their results.

      We have thoroughly revised the presentation and discussion of the results to avoid the overinterpretation of the findings.

      Additional Issues:

      Eqs 1-4. These equations are not mechanistic in any meaningful sense. Essentially, they posit the existence of exponential time-lags between the three immunity variables, and a simple linear killing relationship between each of the variables and pathogen load. To interpret the equations literally risks making unwarranted conclusions. For example, any physiological variable correlated with any of the three variables in the model might equally well be credited with the influence on shedding attributed to IgA, IgG, or neutrophils.

      This work tests the hypothesis that neutrophils, IgA and IgG affect the dynamics of B. bronchiseptica infection and, in turn, bacterial shedding. Of course, there are many other immunological mechanisms that could contribute to the pattern observed and that can be tested, as there are many other variables correlated with these dynamics that do not play any role in these patterns, as noted by the reviewer. We follow a parsimonious approach by focusing on three immune variables previously identified as important in regulating Bordetella infection. To avoid excessive complexity and allow model tractability, our informed decision was to simplify the relationship between immunity and infection, without losing the important role of the immune variables selected. Finally, by referring to previous work by others and us, we do note that the immune mechanisms described can be much more complex.

      l 456. Do the authors account for the variability in time spent with plates? Implicitly, the assumption is made that the amount of time a rabbit spends with a plate, i.e., the decision as to whether to engage in a behavior that will terminate the plate interaction, is independent of everything else. This raises the question: Does the time spent per plate correlate with anything?

      We always recorded the amount of time spent with the plate, and every rabbit had a maximum interaction time of 10 minutes. Rabbits are very inquisitive, and we rarely had animals that did not interact or had to remove the plate because they were chewing the media; usually animals used the entire 10 minutes. Analyses do account for the interaction time and are presented as Colony Forming Unit/second (CFU/s). As noted in the Material and Methods section ‘Observation model’: ‘The probability of having a shedding event is independent of time since inoculation, in that shedding can occur anytime during the experiment and anytime during the interaction with the petri dish’. This assumption is based on our observations of rabbit behavior during the trials.

    1. Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.
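
      The self-comparison dotplot idea summarized above can be sketched in a few lines. This is an illustration of the general technique only: the exact-identity scoring, the sliding-window density used as a stand-in for the image-processing step, and the function names are assumptions, not the procedure used in the manuscript.

```python
import numpy as np

def self_dotplot(sequence):
    """Return an N x N matrix with 1 where residue i equals residue j, else 0."""
    residues = np.frombuffer(sequence.encode("ascii"), dtype=np.uint8)
    return (residues[:, None] == residues[None, :]).astype(np.uint8)

def windowed_density(matrix, window):
    """Fraction of matching cells in each window x window block along the diagonal.

    Low-complexity stretches appear as dense square blocks on the diagonal,
    so a high density marks a candidate region (hypothetical stand-in for the
    image-processing step).
    """
    n = matrix.shape[0]
    return np.array([
        matrix[i:i + window, i:i + window].mean()
        for i in range(n - window + 1)
    ])

# Toy sequence with a glutamine run, for illustration only.
toy = "ACDEFQQQQQQGHIKL"
density = windowed_density(self_dotplot(toy), window=6)
start = int(np.argmax(density))
print(start, toy[start:start + 6])
```

      Comparing two different regions works the same way with an off-diagonal block, which is what allows regions to be grouped by similarity to one another.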

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs and offer a number of anecdotal inferences and observations - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach

      The authors do a fantastic job of explaining this, but I'm left wondering, if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? i.e. the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but, it's not clear to me that this is just because it combines a two-step operation into a one-step operation. It would be useful I think to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).

      LCRs from repeat expansions

      I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) also seems worth considering. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      Minor points

      Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with - while it is certainly possible that a single lysine tract interacts with a single partner simultaneously (meaning the tract contributes a valence of 1) I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al PLoS CompBio (2018) and Cascarina et al PLoS CompBio 2020.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to create a machine learning framework for analyzing video recordings of animal behavior, which is both efficient and runs in an unsupervised fashion. The authors construct Selfee from recent computational neural network codes. As the paper is methods-focused, the key metrics for success would be (1) whether Selfee performs similarly or more accurately than existing methods, and more importantly (2) whether Selfee uncovers new behavioral features or dynamics otherwise missed by those existing methods.

      Weaknesses:

      Although the basic schematics of Selfee are laid out, and the code itself is available, I feel that material in between these two levels of description is somewhat lacking. Details of what other previously published machine learning code makes up Selfee, and how those parts work would be helpful. Some of this is in the methods section, but an expanded version aimed at a more general readership would be helpful.

      Thanks for the suggestions. We expanded the paragraphs describing training objectives and AR-HMM analysis. We also revised Figure 2C for clarity, and we have added a new figure, Figure 6, to describe how our pipeline works in detail. We also added detailed instructions for Selfee usage on our GitHub page.

      *The paper highlights efficiency as an important aspect of machine learning analysis techniques in the introduction, but there is little follow up with this aspect.

      Our model only had a more efficient training process compared with other self-supervised learning methods. We also found our model could perform zero-shot domain transfer, so training may not even be necessary. However, we did not mean that our model was superior in terms of data efficiency or inference speed. We have revised some of the claims in the Discussion.

      *In comparing Selfee to other approaches, the paper uses DeepLabCut, but perhaps running other recent methods for more comprehensive comparison would be helpful as well.

      We compared Selfee feature extraction with features from FlyTracker and JAABA, two widely used software packages. We also visualized the tracking results of SLEAP and FlyTracker to complement the DeepLabCut experiment.

      *Using Selfee to investigate courtship behavior and other interactions was nicely demonstrated. Running it on simpler data (say, videos of individual animals walking around or exploring a confined space) might more broadly establish the method's usefulness.

      We used Selfee with the open field test (OFT) of mice after chronic immobilization stress (CIS) treatment. We demonstrated our pipeline, from data preprocessing to all the data mining algorithms, with this experiment, and the results were added to the last section of Results.

      Reviewer #2 (Public Review):

      Jia et al. present a CNN based tool named "Selfee" for unsupervised quantification of animal behavior that could be used for objectively analyzing animal behavior recorded in relatively simple setups commonly used by various neurobiology/ethology laboratories. This work is very relevant but has some serious unresolved issues for establishing credibility of the method.

      Overall Strengths: Jia et al have leveraged a recent development "Simple Siamese CNNs" to work for behavioral segmentation. This is a terrific effort and theoretically very attractive.

      Overall Weakness: Unfortunately, the data supporting the method is not as promising. It is also riddled with incomplete information and lack of rationale behind the experiments.

      Specific points of concern:

      1) No formal comparison with pre-existing methods like JAABA which would work on similar videos as Selfee.

      We added some comparisons with JAABA and FlyTracker extracted features, and also visualized FlyTracker and SLEAP tracking results aside from DeepLabCut. This result is now in the new Table 1. To avoid tracking inaccuracy during intensive interactions and potentially inappropriately tuned parameters, we used a peer-reviewed dataset focused on wing extension behavior only. Our results showed that Selfee performs competitively with the other methods.

      2) For all Drosophila behavior experiments, I'm concerned about the control and test genetic background. Several studies have reported that social behaviors like courtship and aggression are highly visual and sensitive to genetic background and presence of "white" gene. The authors use Canton S (CS) flies as control data. Whereas it is unclear if any or all of the test genotypes have been crossed into this background. It would be helpful if authors provide genotype information for test flies.

      We have added a detailed sheet about their genotype in this version. The genetic information of all animals can also be found on the Bloomington fly center by the IDs provided. In brief, five fly lines used in this work are in the CS background: CCHa2-R-RAGal4, CCHa2-R-RBGal4, Dop2RKO, DopEcRGal4 and Tdc2RO54. We did not back cross other flies into the CS background for three reasons. First, most mutant lines are compared with their appropriate control lines. For example, in the original Figure 3B (the new Figure 4B), CCHa2-R-RBGal4 > Kir2.1 flies contained the wildtype white gene, so the comparison with CS flies would not cause any problem. TrhGal4 flies were in white background, and so were other lines that had no phenotype. At the same time, in the original Figure 3G to J (the new Figure 4G to J), we used w1118 as controls for TrhGal4 flies, which were all in mutated white background. Second, in the original Figure 4F and G (the new Figure 5F and G), we admitted that the comparison between NorpA36, in mutated white background, and CS flies was not very convincing. Nevertheless, the delayed dynamic of NorpA mutants was reported before, and our experiment was just a demonstration of the DTW algorithm. Lastly, our work focused on the methodology of animal behavior analysis, and original videos were provided for research replications. Therefore, even if the behavioral difference was due to genetic backgrounds, it would not affect the conclusion that our method could detect the difference.

      3) Utility of "anomaly score" rests on Fig 3 data. Authors write they screened "neurotransmitter-related mutants or neuron silenced lines" (lines 251-252). Yet Figure 3B lacks some of the most commonly occurring neurotransmitter mutants/neuron labeling lines (e.g. Acetylcholine, GABA, Dopamine, instead there are some neurotransmitter receptor lines, but then again prominent ones are missing). This reduces the credibility of this data.

      First of all, this paper did not intend to conduct new screening assays; rather, we used pre-existing data from the lab to demonstrate the application of Selfee. Previous work in our lab focused on the homeostatic control of fly behaviors, so most lines listed here were originally used to test the roles of neuropeptides or neurons in nutrient and metabolism regulation, such as CCHa-related lines, a CNMa mutant, and Taotie neuron silenced flies. Some other important genes were not included in this dataset. Some of the most common transmitters are not included for two reasons. First, common neurotransmitters usually have a very global and broad effect on animal behaviors, and even if there is any new discovery, it could be difficult to interpret the phenomenon due to the large number of disturbed neurons. Second, most mutants of those common neurotransmitters are not viable, for example, paleGal4 as a mutant for dopamine, Gad1A30 for GABA, and ChATl3 for acetylcholine. However, we did perform experiments on serotonin-related genes (SerT and Trh), octopamine-related genes (Tdc and Oamb), and some other viable dopamine receptor mutants.

      4) The utility of AR-HMM following "Selfee" analysis rests on the IR76b mutant experiment (Fig4). This is the most perplexing experiment! There are so many receptors implicated in courtship and IR76b is definitely not among the most well-known. None of the citations for IR76b in this manuscript have anything to do with detection of female pheromones. IR76b is implicated in salt and amino acid sensation. The authors still call this "an extensively studies (co)receptor that is known to detect female pheromones" (lines 310-311). Unsurprisingly, the AR-HMM analysis doesn't find any difference in modules related to courtship. Unless I'm mistaken, the premise for this experiment is wrong and hence not much weight should be given to its results.

      We have removed the Ir76b results from the Results. The demonstration of AR-HMM is now done with a mouse open field assay.

      Reviewer #3 (Public Review):

      This paper describes a machine learning method applied to videos of animals. The method is end-to-end and requires very little pre-processing such as image segmentation or background subtraction. The input images have three channels, mapping temporal information (live-frames). The architecture is based on twin deep neural networks (a Siamese network) and does not require human annotated labels (unsupervised learning). However, labels can still be used if they are produced, as in this case, by the algorithm itself - self-supervised learning. This flavor of machine learning is reflected in the name of the method: "Selfee." The authors convincingly apply Selfee to several challenging animal behavior tasks, which results in biologically relevant discoveries.

      A significant advantage of unsupervised and self-supervised learning is twofold: 1) it allows for discovering new behaviors, and 2) it doesn't require human-produced labels.

      In this case of self-supervised learning the features (meta-representations) are learned from two views of the same original image (live-frame), where one of the views is augmented in several different ways, with the hope that the deep neural network (ResNet-50 architecture in this case) learns to ignore such augmentations, i.e. learns meta-representations invariant to natural changes in the data similar to the augmentations. This is accomplished by utilizing a Siamese Convolutional Neural Network (CNN) with the ResNet-50 version as a backbone. Siamese networks are composed of twin deep nets, where each member of the pair is trying to predict the output of the other. In applications such as face recognition they normally work in the supervised learning setting, by utilizing "triplets" containing "negative samples." These are the labels.

      However, in the self-supervised setting, which "Selfee" is implementing, the negative samples are not required. Instead the same image (a positive sample) is viewed twice, as described above. Here the authors use the SimSiam core architecture described by Chen, X. & He, K (reference 29 in the paper). They add Cross-Level Discrimination (CLD) to the SimSiam core. Together these two components provide two Loss functions (Loss 1 and Loss 2). Both are critical for the extraction of useful features. In fact, removing the CLD causes major deterioration of the classification performance (Figure 2-figure supplement 5).
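      The SimSiam part of the objective described above can be sketched in a few lines. This is a minimal NumPy illustration of the symmetric negative-cosine loss from the SimSiam paper, not Selfee's actual code; the names `negative_cosine`, `simsiam_loss`, `p1`, `z1`, `p2`, `z2` are hypothetical, the CLD term (the second loss) is omitted, and the stop-gradient on the target branch cannot be expressed in plain NumPy, so it is only noted in a comment.

```python
import numpy as np

def negative_cosine(p, z):
    """Mean negative cosine similarity between predictions p and targets z.

    p, z: arrays of shape (batch, dim). In SimSiam the target z is wrapped
    in a stop-gradient during training; NumPy has no autograd, so that step
    is only implied here.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -(p * z).sum(axis=1).mean()

def simsiam_loss(p1, z1, p2, z2):
    """Symmetric loss: each view's prediction is compared with the other view's projection."""
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)

# Two views whose embeddings agree perfectly reach the minimum of the loss.
same = np.array([[1.0, 0.0], [0.0, 2.0]])
loss_identical = simsiam_loss(same, same, same, same)

# Orthogonal embeddings give a loss of zero.
a = np.array([[1.0, 0.0]])
b = np.array([[0.0, 1.0]])
loss_orthogonal = simsiam_loss(a, a, b, b)
```

      No negative samples appear anywhere in this loss, which is the point made in the paragraph above: only two views of the same frame are compared.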

      The authors demonstrate the utility of Selfee by using the learned features (meta-representations) for classification (supervised learning; with human annotation), discovering short-lasting new behaviors in flies by anomaly detection, and analyzing long time-scale dynamics by AR-HMM and Dynamic Time Warping (DTW).

      For the classification the authors use k-NN (flies) and LightGBM (mice) classifiers and they infer the labels from the Selfee embedding (for each frame), and the temporal context, using the time-windows of 21 frames and 81 frames, for k-NN classification and LightGBM classification, respectively. Accounting for the temporal context is especially important in mice (LightGBM classification) so the authors add additional windowed features, including frequency information. This is a neat approach. They quantify the classification performance by confusion matrices and compute the F1 for each.
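      To make the classification pipeline described above concrete, here is a minimal sketch under stated assumptions: per-frame embeddings are labelled by k-NN, the labels are smoothed by a majority vote over a centered time window (21 frames in the fly case), and per-class F1 is read off a confusion matrix. How the authors actually combine the temporal context with the embeddings may differ; the function names and the majority-vote smoothing are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query embedding by majority vote among its k nearest training embeddings."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])

def window_vote(labels, window=21):
    """Smooth a per-frame label sequence with a centered majority vote (assumed scheme)."""
    half = window // 2
    out = np.empty_like(labels)
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        out[i] = np.bincount(labels[lo:hi]).argmax()
    return out

def f1_per_class(y_true, y_pred, n_classes):
    """Return the confusion matrix (rows: true, columns: predicted) and per-class F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    denom = precision + recall
    f1 = np.where(denom > 0, 2 * precision * recall / np.maximum(denom, 1e-12), 0.0)
    return cm, f1

# Toy data: two well-separated clusters standing in for two behaviors.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.array([0, 0, 1, 1])
pred = knn_predict(train_x, train_y, np.array([[0.1, 0.2], [5.2, 5.1]]), k=1)
smoothed = window_vote(np.array([0, 0, 1, 0, 0]), window=3)
cm, f1 = f1_per_class(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), n_classes=2)
```

      The window vote removes the isolated single-frame label in the toy sequence, which is the kind of flicker that temporal context is meant to suppress.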

      Overall, I find these classification results compelling, but one general concern is the criticality of the CLD component for achieving any meaningful classification. I would suggest that the authors discuss in more depth why this component is so critical for the extraction of features (used in supervised classification) and compare their SimSiam architecture to other methods where the CLD component is implemented. In other words, to what degree is the SimSiam implementation an overkill? Could a simpler (and thus faster) method be used - with the CLD component - instead to achieve similar end-to-end classification? The answer would help illuminate the importance of the SimSiam architecture in Selfee.

      We added more about the contribution of the CLD loss in the last paragraph of "Siamese convolutional neural networks capture discriminative representations of animal posture", the second section of Results. Further optimization of neural network architectures is discussed in the Discussion section. As for why CLD is so important, there are two main reasons. First, all behavior frames are so similar that it is not easy to distinguish them from each other. In the field of so-called self-supervised learning without negative samples, researchers use either batch normalization or similar operations to implicitly utilize negative samples within a mini-batch. However, when all samples are quite similar, this might not be enough. CLD uses explicit clusters to utilize negative samples within a mini-batch; in the words of its authors, “Our key insight is that grouping could result from not just attraction, but also common repulsion”, which provides more powerful discrimination. The second reason is what the authors argued in the CLD paper: CLD is very powerful in processing long-tailed datasets. As shown in the original Figure 2—figure supplement 5 (the new Figure 3—figure supplement 5), behavior data are highly unbalanced. As explained in the CLD paper, CLD counters long-tailed distributions in two ways. One is that it scales up the importance of negative samples within a mini-batch from 1/B to 1/K by k-means; the other is that the clustering operation can relieve the imbalance between the tail and head classes within a mini-batch. Here I quote: “While the distribution of instances in a random mini-batch is long-tailed, it would be more flattened across classes after clustering.” This is also visualized in Fig5 of the CLD paper.

      To the best of our knowledge, SimSiam is the simplest method that works with CLD. In the original CLD paper, the authors combined the CLD method with other popular frameworks including BYOL and MoCo v2. However, those popular frameworks are more complicated than SimSiam networks. We attempted to combine CLD with BarlowTwins but failed. As the author of CLD suggested on Github: “Hi, good to know that you are trying to combine CLD with BarLowTwins! My concern is also on the high feature dimension, which may cause the low clustering quality. Maybe it is necessary to have a projection layer to project the high-dimensional feature space to a low-dimensional one.” In terms of speed, there are two major parts. For inference, only one branch is used, so efficiency is determined mainly by the CNN backbone. In theory, light backbones like MobileNet would work, but ResNet50 is already fast enough on a modern GPU. As for training, the major computational cost aside from the CNN backbone comes from the Siamese branches: two branches mean twice the computation. Nevertheless, CLD relies on this kind of structure, so even if the learning framework were simpler than SimSiam, it would be unlikely to achieve a faster training speed. As for other structures, I think this new instance learning framework (https://arxiv.org/abs/2201.10728) could possibly achieve a similar result with fewer data and in a shorter time. However, this powerful method could be used with CLD. We might try it in the future.

      One potential issue with unsupervised/self-supervised learning is that it "discovers" new classes based, not on behavioral features but rather on some other, irrelevant, properties of the video, e.g. proximity to the edges, a particular camera angle, or a distortion. In supervised learning the algorithm learns the features that are invariant to such properties, because human-made labels are used and humans are great at finding these invariant features. The authors do mention a potential limitation, related to this issue, in the Discussion ("mode splitting"). One way of getting around this issue, other than providing negative samples, is to use a very homogeneous environment (so that only invariance to orientation, translation, etc, needs to be accomplished). This has worked nicely, for example, with posture embedding (Berman, G. J., et al; reference 19 in the manuscript). Looking at the t-SNE plots in Figure 2 one must wonder how many of the "clusters" present there are the result of such learning of irrelevant (for behavior) features, i.e. how good is the generalization of the meta-representations. The authors should explore the behaviors found in different parts of the t-SNE maps and evaluate the effect of the irrelevant features on their distributions. For example, they may ask: to what extent does the distance of an animal from the nearest wall affect the position in the t-SNE map? It would be nice to see how various simple pre-processing steps might affect the t-SNE maps, as well as the classification performance. Some form of segmentation, even very crude, or simply background subtraction, could go a very long way towards improving the features learned by Selfee.

      In the new Figure 3—figure supplement 1, the visualization demonstrates that our features contain a lot of physical information, including wing angles, the distance between animals, and positions in the chamber. “Mode-split” can be partially explained by those features. We did perform background subtraction and image cropping for mouse behaviors, where we found them useful.

      The anomaly detection is used to find unusual short-lasting events during male-male interaction behavior (Figure 3). The method is explained clearly. The results show how Selfee discovered a mutant line with a particularly high anomaly score. The authors managed to identify this behavior as "brief tussle behavior mixed with copulation attempts." The anomaly detection analyses were also applied to discover another unusual phenotype (close body contact) in another mutant line. Both results are significant when compared to the control groups.

      The authors then apply AR-HMM and DTW to study the time dynamics of courtship behavior. Here too, they discover two phenotypes with unusual courtship dynamics, one in an olfactory mutant, and another in flies where the mutation affects visual transduction. Both results are compelling.
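      For readers unfamiliar with DTW, the idea can be sketched as follows. This is the textbook dynamic-programming form for two 1D sequences with an absolute-difference cost, offered as an illustration only; the variant, cost function, and inputs used in the manuscript may differ, and `dtw_distance` is a hypothetical name.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1D sequences.

    cost[i, j] holds the minimal accumulated cost of aligning the first i
    samples of a with the first j samples of b.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# A sequence and a time-stretched copy of it align with zero cost,
# which is why DTW can compare courtship dynamics that unfold at different speeds.
stretched = dtw_distance([0, 1, 2], [0, 1, 1, 2])
delayed = dtw_distance([0, 0, 1, 2], [0, 1, 2])
offset = dtw_distance([0], [3])
```

      A delayed or stretched time course is absorbed by the warping path, so what remains in the distance is the difference in shape rather than in timing; the warping path itself then carries the timing information.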

      The authors explain their usage of DTW clearly, but they should expand the description of the AR-HMM so that the reader doesn't have to study the original sources.

      We expanded the section that talks about AR-HMM mechanisms.

    1. Author Response

      Reviewer #1 (Public Review):

      This work offers a simple explanation to a fundamental question in cell biology: what dictates the volume of a cell and of its nucleus, focusing on yeast cells. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. The novelty resides in an effort to provide actual numbers experimentally.

      In this work, Lemière and colleagues combine physical modeling and quantitative measures to establish the basic principles that dictate the volume of a cell and of its nucleus. By doing so, they also explain an observation reported many times and in many different types of cells, of a proportionality between the volume of the cell and of its nucleus. The central message is that all this can be explained by an osmotic equilibrium, using the classical Van't Hoff's Law. This is because, in yeast cells, while the cell has a wall that can contribute to the equilibrium, the nucleus does not have a lamina and there is thus no elastic contribution in the force balance for the nucleus, as the authors show very nicely experimentally, using both cells and protoplasts and measuring the cell and nucleus volume for various external osmotic pressures (the Boyle Van't Hoff Law for a perfect gas, also sometimes called the Ponder relation). This was performed before for mammalian cells (Finan et al.), as cited and commented in the discussion by the authors, showing that mammalian cells have no significant elastic wall (linear relation) while the nucleus has one (non linear relation). This is well explained by the authors in the discussion. It is one of the clearer experimental results of the article. Together, the data and model presented in this article offer a simple explanation to a fundamental question in cell biology. In this matter, the principles are indeed seemingly simple, but what really counts are the actual numbers. While this article sheds some light on this aspect, it does not totally solve the question. The experiments are very well done and quantified, but some approximations made in the modeling are questionable and should at least be discussed in more length.

      Overall, this article is extremely valuable in the context of the recent effort of the cell biology and biophysics communities to understand the fundamental question of what dictates the size of cells and organelles. I have a few concerns detailed below. Importantly, there are many very interesting points of the article that I am not discussing below, simply because I completely agree with them.

      1) The main concern is about the assumption made by the authors that the small osmolytes do not count to establish the volume of the nucleus. It was shown that small osmolytes such as ions are a vast majority of the osmolytes in a cell (more than ten times more abundant than proteins for example, which represent about 10 mM, for a total of 500 mM of osmolytes). This means that just a small imbalance in the amount of these between the nucleus and cytoplasm might have a much larger effect than the number of proteins, which is the osmolyte that authors choose to consider for the nuclear volume.

      The point of the authors to disregard small osmolytes is that they can freely diffuse between the cytoplasm and the nucleus through the nuclear pores. They thus consider that the nuclear volume is established thanks to the barrier function of the nuclear envelope, which would retain larger osmolytes inside the nucleus and that the rest is balanced. This reasoning is not correct: for example, the volume of charged polymers depends on the concentration of ions in the polymer while there is no membrane at all to retain them. This is because of an important principle that the authors do not include in their reasoning, which is electro-neutrality.

      Because most large molecules in the cell are charged (proteins and also DNA for the nucleus), the number of counterions is large, and is probably much larger than the number of proteins. So it is hard to argue that this could be ignored in the number of osmotically active molecules in the nucleus. This is known as the Donnan equilibrium and the question is thus whether this is actually the principle which dictates the nuclear volume.

      The question then becomes whether the number of counterions differs between the cytoplasm and the nucleus, and more precisely whether the difference is larger than the difference considered by the authors in the number of proteins.

      How is it possible to estimate this number? One of the numbers found in the literature is the electric potential across the nuclear envelope (Mazzanti, Physiological Reviews 2001). The number is between 1 and 10 mV, with more cations in the nucleus than in the cytoplasm. This number could correspond to many more cations than the number of proteins, although the precise number is not so simple to compute and the precision of the measure matters a lot, since there is an exponential relation between the concentrations and the potential.
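      The exponential relation mentioned here can be illustrated with the equilibrium (Nernst/Boltzmann) relation for a freely permeant ion. The sketch below is an assumption-laden illustration, not a calculation from the manuscript: it treats a monovalent cation at equilibrium, takes the quoted 1 to 10 mV as the magnitude of a potential with the nucleus negative relative to the cytoplasm, and assumes a temperature of 298.15 K. The function name `cation_ratio` is hypothetical.

```python
import math

R = 8.314462618    # gas constant, J/(mol K)
F = 96485.33212    # Faraday constant, C/mol

def cation_ratio(delta_psi_mv, z=1, temperature_k=298.15):
    """Equilibrium concentration ratio (nucleus / cytoplasm) of a cation of valence z.

    delta_psi_mv is the magnitude of the potential difference in mV, with the
    nucleus assumed negative relative to the cytoplasm (an assumption of this sketch).
    """
    return math.exp(z * F * (delta_psi_mv * 1e-3) / (R * temperature_k))

ratio_1mv = cation_ratio(1.0)
ratio_10mv = cation_ratio(10.0)
print(f"1 mV  -> nuclear/cytoplasmic cation ratio {ratio_1mv:.3f}")
print(f"10 mV -> nuclear/cytoplasmic cation ratio {ratio_10mv:.3f}")
```

      Under these assumptions the enrichment ranges from a few percent at 1 mV to tens of percent at 10 mV. Applied to a pool of small ions in the hundreds of mM, either end of that range corresponds to a concentration difference comparable to or larger than the roughly 10 mM of protein cited above, and the strong dependence on the measured potential is why its precision matters so much.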

      This point above is simply made to explain that the authors cannot rule out the contribution of small osmolytes to the nuclear volume and should at least leave this possibility open in the discussion of their article.

      As a conclusion, I totally agree with equation 3 which defines the N/C ratio, but I think that the Ns considered might not be the number of large macromolecules which cannot pass the nuclear envelope, but rather the small ones. Whether it is the case or not and what is actually the important species to consider depends on the actual numbers and these numbers are not established in this article. It is likely out of the scope of the article to establish them, but the point should at least be discussed and left open for future studies.

      We appreciate these excellent points made by the reviewer and their numerous consultants. We amend the discussion of colloid osmotic pressure in the text to reflect these points.

      2) The authors refer to the notion of colloidal pressure, discussed in the review by Mitchison et al. This term could be confusing and the authors should either explain it better or just not use it and call it perfect gas pressure or Van't Hoff pressure. Indeed, what is meant by colloidal pressure is simply the notion that all molecules could be considered as individual objects, independently of their size, and that it is then possible to apply the Van't Hoff Law just as if they were a perfect gas, hence the notion of 'colloidal' pressure, which would be the osmotic pressure of all the individual molecules. The authors might want to discuss, or at least mention, that it is a bit surprising that all these crowded large macromolecules would behave like a perfect osmometer and that the Van't Hoff law applies to them. Alternatively, it could be simpler to consider that what actually counts for the volume is mostly small freely diffusing osmolytes, to which this law applies well, and which are much more numerous.

      3) Very small point: on page 7 the authors refer to BVH's Law (Nobel, 1969). It is not clear what they mean. If they refer to the Nobel prize of Van't Hoff, it dates from 1901 (he died in 1911) and not 1969. I am not sure if there is something in one of the Nobel prizes delivered in 1969 which relates to this law. I checked but it does not seem to be the case, so it is probably a mistake in the date.

      The citation is correct. It's a JTB paper by Park S. Nobel describing the BVH relation in biology.

      4) On page 11, bottom, the result of the maintenance of the N/C ratio in protoplast is presented as an additional result, while it is a simple consequence of the previous results: both the cell and nuclear volume change linearly with the external osmotic pressure, so it is obvious that their ratio does not change when the external pressure is changed.

      This result was not trivial. Although both cell and nuclear volumes change linearly with the inverse of the external osmotic concentration in protoplasts, it was not obvious whether the two volumes change in the same proportion (i.e. the same slope on the BVH graph).
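      The distinction drawn in this exchange can be written out explicitly. The following is a sketch in assumed notation (not taken from the manuscript): $V_{i,0}$ is the volume of compartment $i$ at the reference external concentration $C_0$, and $b_i$ is its non-osmotic volume. The BVH relation makes each volume linear in $C_0/C$, but the ratio is constant only if the two compartments have the same normalized non-osmotic volume.

```latex
\begin{align}
V_i(C) &= b_i + \left(V_{i,0} - b_i\right)\frac{C_0}{C},
  \qquad i \in \{\mathrm{cell}, \mathrm{nuc}\} \\
\frac{V_{\mathrm{nuc}}(C)}{V_{\mathrm{cell}}(C)}
  &= \frac{b_{\mathrm{nuc}} + \left(V_{\mathrm{nuc},0} - b_{\mathrm{nuc}}\right) C_0/C}
          {b_{\mathrm{cell}} + \left(V_{\mathrm{cell},0} - b_{\mathrm{cell}}\right) C_0/C} \\
\frac{V_{\mathrm{nuc}}(C)}{V_{\mathrm{cell}}(C)}
  &= \frac{V_{\mathrm{nuc},0}}{V_{\mathrm{cell},0}} \ \text{for all } C
  \iff \frac{b_{\mathrm{nuc}}}{V_{\mathrm{nuc},0}} = \frac{b_{\mathrm{cell}}}{V_{\mathrm{cell},0}}
\end{align}
```

      Linearity of both curves alone therefore does not guarantee a constant N/C ratio; the normalized slopes (equivalently, the normalized non-osmotic volumes) must also match.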

      Another result, not commented by the authors, is that this should be true only in protoplasts, since in whole cells, the cell wall is affecting the response of the cell volume, but not the nucleus, so the ratio should change.

      In whole cells, the N/C ratio is in fact also maintained, consistent with the model. This result is now clarified in the manuscript (Figure 1C and D plus Figures 3D and S1C).

      5) The results in Figure 5, with the inhibition of export from the nucleus, are presented as supporting the model. It is not really clear that they do. First the effect is very small, even if very clear. Again, the numbers matter here, so the interpretation of this result is not really direct and more calculation should be made to understand whether it can really be explained by a change of number of proteins. The result in panel F is even more problematic. The authors try to argue that the nucleus transiently gets denser, based on the diffusion of the GEMs and then adapts its density. It rather seems that it is overall quite constant in density, while it is the cell which has a decreasing density, maybe, as suggested by the authors, because there are fewer ribosomes in the cytoplasm, so protein production is reduced. This could have an indirect effect on the number of amino acids (which would then be less consumed). A recent article by Neurohr et al (Trends in cell biology, 2020) suggests that such an effect can lead to cell dilution, in yeast, because the number of amino acids increases. In this particular case, this increase would affect the nuclear volume rather than the cell volume because of the presence of the cell wall and the rather small change.

      We agree that there are different possible interpretations for these results. We have carefully reconsidered the interpretation and have rewritten the entire text for Figure 5.

      6) Page 16: it seems to me that the experiments presented in the chapter lines 360 to 376, on the ribosomal subunits, simply confirm that export is impaired, and they do not really contribute to confirm the hypothesis of the authors that it is the number of proteins in the nucleus which counts.

      We agree. We highlight the ribosomal subunit proteins as they are very abundant nuclear shuttling proteins that provide a good example for the dynamics of nuclear protein accumulation.

      The next paragraph with the estimation of the number of proteins in the nucleus and cytoplasm and how they change relatively upon export inhibition also appears to mostly demonstrate that export has been inhibited.

      The authors propose to use the number they find, 8%, to compare it to the change in the N/C ratio, which is of the same order. Given how small these numbers are, and the precision of such measures, it is very hard to believe that these 8% are really precise at a level which could allow such a comparison. The authors should really estimate the precision of their measures if they want to claim that. It is more likely that what they observe is a small but significant change in both cases; a small change means it is small compared to the total, so it is a fraction of it, and it is measurable, which means it is more than just a few percent, which is usually not possible to measure. So it means that it is in the order of 10%. This is the typical value of any small but measurable change given a method for the measure which can detect changes around 10%. In conclusion, these numbers might not prove anything.

      It could also be that the numbers match not just by chance, but that the osmolyte which matters is, for this type of experiment, changing in proportion to the amount of proteins (which would be possible for counter ions for example). But determining all that requires precise calculations and additional measures. It is thus more a matter of discussion and should be left more open by the authors.

      We agree that these measurements are not so precise. We have carefully reworded this section and removed these specific comparisons.

      Reviewer #2 (Public Review):

      The goal of the paper is to test the idea that colloidal osmotic pressure controls nuclear growth as suggested by Tim Mitchison in a recent review.

      In fleshing out the idea, Lemiere and colleagues develop a simple mathematical model that focuses on the forces generated by the movement of macromolecules across the nuclear-cytoplasmic boundary, ignoring any contribution of ions or small molecules which they assume equilibrate across the nuclear envelope. In testing this model, they focus their quantitative analysis on the response of cells that lack a wall (protoplasts) to osmotic shocks and to perturbations of nuclear export, protein synthesis and symmetric cell division. They also analyse the motion of small 40nm particles to test how diffusion is affected by these perturbations in both compartments.

      Their analysis leads them to make some important observations that suggest that the system is even simpler than they might have hoped, since under the conditions tested nuclei (which lack lamins) behave as ideal osmometers. That is, the nuclei and cytoplasm grow and shrink in concert following sudden osmotic shocks. This suggests that the tension in the nuclear envelope, which gives nuclei their spherical shape, plays no role in constraining nuclear size.

      While most of the paper's claims are well supported by their data under the assumptions of the model, there are a few claims that are less convincing.

      For example, while their data are consistent with the idea that cells regulate their nuclear/cytoplasmic size ratio using an adder type mechanism, in which a fixed ratio of nuclear and cytoplasmic proteins is synthesised per unit time as cells grow, this has not been rigorously put to the test. In addition, while the diffusion analysis is very interesting, it does not fully support the authors' simple model linking diffusion, molecular crowding and colloidal osmotic pressure, something that could be more thoroughly discussed in the manuscript.

      We added new data showing that slowing growth rate leads to a proportionate decrease in N/C ratio correction. This strengthens this portion of the paper.

      We have added an improved discussion of the GEMs data and its limitations.

      Reviewer #3 (Public Review):

      This manuscript by Lemière and colleagues presents a view on how nuclear size is set by simple physical principles. The first part of the work describes a theoretical framework with the nucleus and the cell as two nested osmometers. Using fission yeast as a model, the authors then show that protoplasts and nuclei behave as ideal osmometers, i.e. show linear changes in volume upon change in external osmotic pressure. Consequently, the nuclear to cell volume ratio remains constant upon osmotic changes, but increases upon block of nuclear export, which leads to higher nuclear protein contents. Measurements of diffusion in the cytoplasm and nucleoplasm back these data. Finally, in the last part of the manuscript, the authors show that nuclear growth through a passive osmotic model can explain the previously described homeostasis of nuclear volume.

      The manuscript is clearly written, and the data are clean and overall solid. I very much liked the simple view on the phenomenon of constant nuclear to cytosol ratio and the mix of modelling and experiments supporting the model that nuclear size is set passively by osmotic principles.

      There are however a few points that are slightly at odds with the model and/or require further explanation to make the model compelling and discuss it in view of previous findings.

      1) Isn't the finding that diffusion rates are faster in the nucleus (line 298, Fig S4C), indicating lower crowding in the nucleus, at odds with the finding that the non-osmotic volumes are similar in the two compartments? If the nucleus is less crowded, does this not suggest a lower pressure than the cytosol? I would also like to see this finding appear in Figure 4, which only reports on the normalized diffusion rates in both nuclei and cytosol.

      We have added this figure to the main Figure 4, as requested. We agree that this raises some interesting questions. Our current interpretation is that composition of the nucleoplasm and cytoplasm are different and therefore affect GEMs diffusion and colloid osmotic pressure slightly differently.

      2) Similarly, I don't understand the observed change in diffusion rates of GEMs upon LMB treatment (Fig 5F). If the nucleus behaves as an ideal osmometer, then any change in protein density between the nucleus and the cytosol, leading to change in osmotic pressure, will lead to a change in nuclear size that should re-equilibrate the osmotic pressures between the two compartments. The prediction would thus be that, if LMB treatment does not change overall protein concentration, at equilibrium there is no change in either osmotic pressure or density as measured by GEM diffusion rates. This is indeed illustrated by the constant normalized non-osmotic volume of the nucleus after LMB treatment. Is the change in diffusion rates perhaps only transient until a new steady state is reached? Or is there a change upon total protein content in the cell after LMB treatment?

      3) In the experiments labelling proteins with FITC, are the reported values really those of protein concentrations or rather protein amounts? Isn't the enlargement of the nucleus upon LMB treatment compensating for this increase in amounts, returning the nucleus to a similar concentration as before treatment? A change in concentration is not in agreement with the reported constant non-osmotic volume of the nucleus.

      These measurements of intensity are of concentrations. We add in the text this prediction that changes in concentration will be compensated for by swelling in nuclear volume and now interpret the data in light of this prediction. We add new data that total FITC staining for protein and RNA shows no change in concentration in compartments, consistent with this model.

      4) The authors state that "a previous paper proposed a model for N/C ratio homeostasis based upon an active feedback mechanism (Cantwell and Nurse, 2019)" (lines 471-472). My understanding of this previous study is that nuclear size was proposed to be set by a limiting component, itself proportional to cell volume. No feedback was postulated. This previous model is in fact not too different from what the authors propose here, with the previously proposed limiting component now corresponding to the nuclear macromolecules that produce colloid osmotic pressure and thus set nuclear size. Though the present study goes significantly further in presenting the passive role of osmosis in setting nuclear size, it is a misrepresentation to portray this previous model as fundamentally different. Furthermore, it is not clear whether the new osmotic pressure-based model produces a better fit than the previous 'limiting component model'. Figure 7E here is very similar to Fig 4I in Cantwell and Nurse 2019, but it is difficult to judge the similarity of the fits.

      The Cantwell and Nurse paper tested two models. The first was based upon nuclear growth being a fraction of cell growth. This model is qualitatively similar to ours. However, they discarded this initial model because it fitted poorly with their data. They then went on to propose a second model, which contains a critical equation in which nuclear growth rate is a function of the N/C ratio, i.e. the system is sensing the N/C ratio and adjusting nuclear growth rate as a function of the N/C ratio. In other words, this is a feedback mechanism. The Cantwell paper does not describe this "feedback" term explicitly in the text, but it is clearly present in the equations. Therefore, our model, which lacks any feedback term, is fundamentally different from the Cantwell limiting component model.

      We show that our model fits our data much better than the Cantwell model. We believe that the different views in these studies arise from differences in the experimental data. These differences may arise from two technical differences: 1) Their use of binning could be responsible for flattening the nuclear growth rate as a function of the nuclear volume at start. 2) Their estimates of cell and nuclear volumes using a 2D image and geometric assumptions may be less accurate than our automated 3D volume method.

      5) If nuclear size is set purely by osmotic regulation, how do you explain that mutants in membrane regulation (such as nem1 and spo7, see Kume et al 2017; or lem2, see Kume et al 2019) previously shown to have an enlarged nucleus, display increased nuclear size?

      This is an interesting question that we are currently pursuing. It is likely that these mutants affect multiple processes besides nuclear envelope expansion. For example, at least some of these mutants have altered chromatin organization, which could cause an increase in colloid osmotic pressure. There may also be significant defects in chromosome segregation, which lead to the production of different-sized nuclei with abnormal numbers of chromosomes. Some of the N/C ratio defects reported in these papers may arise from their 2D measurement methods, which are not accurate for misshapen nuclei. In our preliminary results, lem2 mutants do not have N/C ratio defects.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented in the first part of the study are convincing. However, it is unclear whether each step of cell elongation and alignment, cell migration, cell dedifferentiation and regenerative response, is required for fin regeneration following amputation. As indicated in the discussion, the authors cannot provide evidence for the requirement of migration or dedifferentiation for the overall success of fin regeneration. Such limitations should be more clearly stated.

      We have modified the title and abstract to avoid overstating the requirement of the particular responses to successful regeneration. Furthermore, we have stated the limitations of our study more clearly in the discussion.

      We have removed the word “requires” from the title; it now reads: Zebrafish fin regeneration involves generic and regeneration-specific osteoblast injury responses

      In the discussion we state the limitations on page 21 as follows:

      “Unfortunately, currently existing tools to block dedifferentiation are either mosaic (activation of NF- κB signalling using the Cre-lox system) or cannot be targeted to osteoblasts alone (treatment with retinoic acid). Due to these limitations in our assays, we can currently not test what consequences specific, unmitigated perturbation of osteoblast dedifferentiation has for overall fin / bone regeneration. Conversely, the interventions presented here that specifically perturb osteoblast migration are limited as they act only transiently, that is they can severely delay, but not fully block migration. Furthermore, while interference with actomyosin dynamics reduces regenerative growth, we cannot distinguish whether this is caused by the inhibition of osteoblast migration or due to other more direct effects on cell proliferation and tissue growth. Thus, an unequivocal test of the importance of osteoblast migration for bone regeneration requires different tools.”

      In the second part of the study, the term trauma needs to be clarified or reconsidered. A trauma model would imply that healing is impaired. Evidence for a non-healing phenotype is lacking and is expected in support of a trauma model.

      We apologize if our use of the term trauma has caused confusion. We have simply used it interchangeably with “injury”. We have now removed all references to “trauma” in the text.

      The authors describe the process of fin regeneration that may share common features with bone regeneration in other species. In the absence of direct evidence of common mechanisms between fin regeneration and bone regeneration in other systems, the authors should remain focused on "fin regeneration" in their conclusions rather than referring to "bone regeneration" and "bone formation" in more general terms.

      We have rephrased the conclusion to have it more centred on bone regeneration in the fin. The relevant parts of the discussion now read on page 25 as follows:

      In conclusion, our findings support a model in which zebrafish fin bone regeneration involves both generic and regeneration-specific injury responses of osteoblasts. Morphology changes and directed migration towards the injury site as well as dedifferentiation represent generic responses that occur at all injuries even if they are not followed by regenerative bone formation. While migration and dedifferentiation can be uncoupled and are (at least partially) independently regulated, they appear to be triggered by signals that emanate from all bone injuries. In contrast, migration off the bone matrix into the bone defect, formation of a population of (pre-) osteoblasts and regenerative bone formation represent regeneration-specific responses that require additional signals that are only present at distal-facing injuries. The identification of molecular determinants of the generic vs regenerative responses will be an interesting avenue for future research.

      Reviewer #2 (Public Review):

      The study by Sehring et al. depends on an extensive and thoroughly acquired collection of data points in combination with a robust and rigorous statistical analysis. I see that the authors have spent a lot of effort into this and I am overwhelmed by the number of analyzed data points that again depend on careful measurements at the cellular level in a more or less intact tissue. However, since just a fraction of cells has been chosen to be incorporated into the statistical analysis, there is a certain risk of a biased selection. I think the reader of the paper would appreciate a somewhat clearer picture of how the authors get to their final numbers, starting from the original image data. This appears of particular importance when it comes to determining the elongation of cells and the angular deviations from the proximo-distal axis. In many cases (e.g. Fig.2 A, B, D and E), the reader has to take those numbers without seeing any primary image data. A practicable solution to that issue would be to complement the accompanying Excel sheets of raw data with corresponding image material. This should show an overview of a representative sample for the dedicated experiment, together with some appropriate magnifications of analyzed cells including the axes along which those measurements have been performed. Also, it would be important to state within the methods section of the paper whether the measurements have been done manually using Fiji or whether a certain automated Fiji plug-in has been used for this part of the analysis.

      Osteoblasts line the bony hemirays on the inner and outer surface (see Figure 1A), and for quantifications of osteoblast morphology, we analysed the osteoblasts of the outer layer of one hemiray (the hemiray facing the objective in whole mount imaging). While we have no direct evidence for this, we think it is reasonable to assume that osteoblasts in the other “sister” hemiray behave the same, and we have anecdotal evidence that osteoblasts on the inner surface of the hemirays also migrate and dedifferentiate. Thus, we don’t think that restriction of the analysis to one hemiray and the outer surface introduces bias.

      For measurement of morphology, we used a transgenic line expressing a fluorescent protein (FP) in osteoblasts in combination with Zns5 antibody labelling. Zns5 is a pan-osteoblastic marker which localizes to the cell membrane. Therefore, combination of a cytosolic FP labelling with the membrane labelling by Zns5 provides solid definition of single cell outlines. For general morphology studies and drug intervention studies, we used bglap:GFP transgenics. In the transgenic intervention studies (manipulation of NF-kB signalling), mCherry is expressed together with CreERT2 under the osterix promoter and used as cytosolic labelling of osteoblasts. Our analyses are always based on segments, e.g. we present data for segments 0, -1, -2. Within these segments all FP+ Zns5+ cells were included in the analysis, and cells along the whole proximodistal axis of a segment were measured. Measurements were performed manually, and the analyst was blinded. With these set-ups, not just a fraction but all FP+ Zns5+ osteoblasts present in the analysed segments were included in the analysis, and thus no selection was necessary that could have introduced bias. As suggested by Reviewer #2, we have added representative sample images to the accompanying Excel sheets of raw data for the dedicated experiments. Within these, the axes along which the measurements have been performed are indicated.

      We have expanded the description of the analysis in the method section. It now reads on page 36 as follows:

      “To quantify osteoblast cell shape and orientation, the transgenic line bglap:GFP in combination with Zns5 AB labelling was used. Osteoblasts of the outer layer of one hemiray (facing the objective in whole fin mounting) were imaged and analysed. As Zns5 localizes to the plasma membrane of all osteoblasts, the combination of both markers provides solid definition of single cell outlines. All GFP+ Zns5+ cells with such a defined outline within an analysed segment were included into the analysis, and cells along the whole proximodistal axis of a segment were measured. In the transgenic intervention studies, mCherry is expressed under the osx promoter and was used as cytosolic labelling of osteoblasts. Using Fiji (Schindelin et al., 2012), the longest axis of a FP+ Zns5+ cell was measured as maximum length, the short axis as maximum width, and the ratio calculated. Simultaneously, the angle of the maximum length towards the proximodistal ray axis was measured for angular deviation. All measurements were performed manually, with the analyst being blinded.”
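
      The two read-outs described in this methods passage can be sketched as follows. This is a minimal illustration and not the authors' workflow, which was performed manually in Fiji: it assumes that the long axis is available as two endpoint coordinates, that the ratio is expressed as width/length, and that the angular deviation is folded into the range 0 to 90 degrees because a cell axis has no direction. The function name and its parameters are hypothetical.

```python
import math


def shape_metrics(long_axis_start, long_axis_end, max_width, ray_axis=(1.0, 0.0)):
    """Return (width/length ratio, angular deviation in degrees).

    long_axis_start, long_axis_end: (x, y) endpoints of the longest cell axis.
    max_width: length of the short axis, in the same units as the coordinates.
    ray_axis: direction vector of the proximodistal ray axis (assumed input).
    """
    dx = long_axis_end[0] - long_axis_start[0]
    dy = long_axis_end[1] - long_axis_start[1]
    max_length = math.hypot(dx, dy)
    ray_norm = math.hypot(ray_axis[0], ray_axis[1])
    if max_length == 0 or ray_norm == 0:
        raise ValueError("axes must have non-zero length")

    # The absolute value of the dot product makes the angle independent of
    # which endpoint was clicked first, so the result lies in [0, 90] degrees.
    cos_angle = abs(dx * ray_axis[0] + dy * ray_axis[1]) / (max_length * ray_norm)
    angle_deg = math.degrees(math.acos(min(1.0, cos_angle)))

    return max_width / max_length, angle_deg


# A cell 10 units long and 2 units wide, aligned with a horizontal ray axis.
ratio, angle = shape_metrics((0.0, 0.0), (10.0, 0.0), 2.0)
```

      With this convention, an elongated cell aligned with the ray has a low ratio and an angle near zero, whereas a roundish cell has a ratio near 1.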

      Along the same line, it would strengthen the statement provided by the statistical diagram in Fig.3A if the authors could show images of cells from segment -1 and -2 for all three experimental conditions. In particular, since the depicted segment -1 osteoblasts look rather roundish than elongated (compare with Fig.1 C and D, images and width/length ratio).

      As suggested by the reviewer, we have added representative sample images of cells in segment -1 to the figure; the images already present in the previous version of the figure were from segment -2 (new data in Figure 4A). As can be seen from the graphs, there is a certain range of morphology within each segment / assay, with an obvious overlap between the segments. This can make it difficult to recognize the difference between the segments by looking at the images alone, and we have therefore added arrowheads to highlight examples of roundish and elongated cells. Yet, as mentioned above, all cells were included in the analysis.

      In regards to the biology itself, Sehring and colleagues claim that the complement system is required for injury-induced directed osteoblast migration. To strengthen this point it would be beneficial if the authors could show that the central complement components C3 and C5 are indeed expressed at the amputation site where the dedifferentiated pre-osteoblasts migrate to. It would be interesting to learn about the localization of C3 and C5 expression in the conventional amputation as well as the double-injury condition. Apparently, the RNAscope-based in situ hybridization seems to work quite well in the Weidinger lab.

      Complement precursor proteins are thought to be mainly expressed in the liver and distributed throughout the body via the circulation. Injury would then result in local production of the activated C3a and C5a peptides via a cascade of proteolytic processing. Unfortunately, we lack the tools to detect the C3 and C5 precursor proteins or the mature cleavage products of the complement factors, which mediate the biological function of the cascade (e.g. antibodies against the zebrafish proteins / peptides). We have also attempted RNAScope for c5a and c3a.1 in fins, but these did not produce any specific staining; the results of these experiments therefore remained inconclusive and we have not included them in the manuscript.

      However, we analysed expression of the RNA coding for the precursors of the complement factors c5 and the six zebrafish paralogs of c3 using qRT-PCR on liver, non-injured fins and fins at 6 hpa (samples derived from segment -1 plus segment 0). These new data can be found in Figure 5B. Compared to the expression levels in the liver, expression in non-injured fins could hardly be detected. Interestingly, c5 and c3a.5 levels were upregulated in injured fins, but they remained very low compared to expression in the liver: for example, c5 differs by about 17 Ct values, i.e. it is about 2 to the power of 17 (roughly 130,000) times more highly expressed in the liver than in the injured fin. These results are consistent with the idea that the majority of complement factors that are activated after injury are derived from precursors that are expressed in the liver and distributed via the circulation to the fin, as is considered standard for the complement system. Interestingly, however, local production might contribute as well.
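
      The conversion from a Ct difference to a fold difference quoted above can be checked with a short calculation. This is a sketch under the usual simplifying assumption that the template doubles in every PCR cycle (amplification factor of 2); the function name and the `amplification_factor` parameter are hypothetical, and no normalization to a reference gene is modelled.

```python
def fold_difference(delta_ct: float, amplification_factor: float = 2.0) -> float:
    """Fold difference in template abundance implied by a Ct difference.

    Assumes the same amplification factor per cycle for both samples.
    """
    return amplification_factor ** delta_ct


# A difference of 17 cycles, as stated for c5 in liver versus injured fin.
liver_vs_fin = fold_difference(17)  # 131072.0, i.e. roughly 130,000-fold
```

      If the true amplification efficiency is below 100%, the factor per cycle is smaller than 2 and the implied fold difference is correspondingly lower.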

      Overall our new data support our conclusion that the complement system is an important regulator of osteoblast migration in vivo, since the receptors are present in osteoblasts (see also response to the next issue), while systemic and local expression can provide the precursors for injury-induced production of the activated factors that might act as guidance cues.

      To judge whether this osteoblast's migratory response is cell-type specific and cell-autonomous it would be good to know if c5ar1 and c3ar are solely expressed in osteoblasts, or rather broadly within tissue lining the hemirays.

      While we had already shown that c5aR1 is expressed in osteoblasts, we have now added additional RNAscope in situ analysis for c5aR1 showing that the receptor is also expressed in other cell types (new data in Figure 5 – figure supplement 1A). We have also attempted RNAScope for c3aR in fins, which however did not produce specific staining, thus remained inconclusive; we have not added these data to the manuscript. However, we established fluorescent activated cell sorting from bglap:GFP transgenic fins, which gives us an additional tool to analyse to which extent expression is specific to osteoblasts. By qRT-PCR analysis we found that c5aR1 and c3aR are expressed in both GFP+ osteoblasts and other cells that are GFP– (these will mainly represent epidermis and fibroblasts, to a lesser extent endothelial and other cell types). These new data can be found in Figure 5 – figure supplement 1B.

      While our qRT-PCR data and the c5aR1 RNAScope results show that the complement receptors are not specifically expressed in osteoblasts, we do not consider this result to be in conflict with our model that the complement system regulates osteoblast migration. Other cell types migrate after fin amputation as well, which is best described for epidermal cells (Chen et al., Dev Cell 2016, 10.1016/j.devcel.2016.02.017), but likely also occurs for fibroblasts (Poleo et al., DevDyn 2001, doi: 10.1002/dvdy.1152), and it is conceivable that the complement system plays a role in regulating these events as well.

      Reviewer #3 (Public Review):

      Weaknesses:

      1) The major conclusions on osteoblast dedifferentiation and migration are solely based on a bglap:GFP strain, which does not allow a pulse-chase approach in injury responses. Specificity of this strain to osteoblasts is also doubtful because as many as 20% of GFP+ cells are in proliferation. Specificity of bglap:GFP to mature osteoblasts is a major concern. Important caveats associated with this reporter strain are not carefully considered.

      To address these comments, we have performed several additional experiments as described below. In addition, we would like to refer the reviewer to our previous papers, where we have analysed the process of osteoblast dedifferentiation (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016). Using transgenic reporters and immunofluorescence we have shown in these previous papers that osteoblasts in the non-injured fin express Bglap but not the pre-osteoblast marker Runx2 (and are thus by our definition differentiated). We apologize if we failed to explain the logic of our approach in this manuscript; we have restructured the results to clarify it, as indicated below.

      We have also performed the following additional experiments.

      1) To confirm the specificity of the bglap:GFP line for mature osteoblasts, we have performed three experiments:

      a) immunofluorescence against Runx2 on 7 dpa regenerates, at a stage where blastema proliferation at the distal tip of the regenerate produces new osteoblast progenitors, while in more proximal (older) regions osteoblasts have already started to differentiate and new bone matrix has formed. We found that Runx2 is expressed in distal regions in pre-osteoblasts, while bglap:GFP is only expressed in proximal regions in osteoblasts which do not express Runx2. Thus, during the formation of new bony segments in regenerative growth, bglap:GFP is activated in mature osteoblasts and the labelled population does not include osteoblast precursor cells. These new data are found in Figure 2 – figure supplement 2B.

      b) we have refined and expanded our methods and are now able to determine the expression patterns of markers of the osteoblast differentiation status with single cell resolution using RNAScope in situ hybridization. Using this, we can now show that at 1 day post amputation, in segment -2 of the fin stump, which represents a segment equivalent to the non-injured state, since no dedifferentiation occurs here, bglap:GFP+ cells do not express endogenous runx2a. These new data are found in Figure 1 – figure supplement 1A.

      c) Using RNAScope, we can show that cyp26b1, a gene associated with dedifferentiated osteoblasts, is likewise not detected in bglap:GFP+ cells in segment -2 at 1 dpa (new data in Figure 1 – figure supplement 1B).

      Together, these data confirm that the bglap:GFP line is specific for differentiated osteoblasts, and does not label osteoblast progenitors. See the response to issue 2 below for how we describe these new data in the revised version of the manuscript.

      2) Regarding the proliferation of bglap:GFP osteoblasts: In the experiment the reviewer refers to (now Figure 5 – figure supplement 3A), we make use of the persistence of the GFP protein in the bglap:GFP line to detect dedifferentiated osteoblasts. Thus, at the time of analysis, when these GFP+ cells proliferate, they are not differentiated anymore. We can show this as follows:

      Although bglap expression is downregulated during osteoblast dedifferentiation and thus also GFP levels eventually drop in the transgenic line, we can nevertheless use this line to trace osteoblasts, since GFP protein persists for up to three days in cells that shut down endogenous bglap and also bglap:GFP transgene transcription. While we have already shown this previously (Knopf et al., Dev Cell 2011, doi: 10.1016/j.devcel.2011.04.014; Geurtzen et al., Development 2014, doi: 10.1242/dev.105817; Mishra et al. Dev Cell 2020, doi: 10.1016/j.devcel.2019.11.016), we have now also used RNAScope to confirm this. We analysed the expression of GFP on protein and RNA level in the bglap:GFP line. In bglap:GFP fish, in a mature segment in non-injured fins the regions close to the joints are devoid of cells expressing GFP (Figure 1G). Yet after amputation, we observe GFP+ cells in this distal part of segment -1 (Figure 1G, D). RNAscope in situ shows that these GFP+ cells are negative for gfp RNA (new data in Figure 1D). Thus, the observed fluorescence is due to the persistence of the GFP protein and not due to a potential upregulation of the transgene (Figure 1E).

      Importantly, we have now also added data describing the proliferative state of bglap:GFP+ osteoblasts. First, in the non-injured fin, bglap:GFP+ cells are non-proliferative (new data in Figure 5 – figure supplement 2B). After amputation, proliferation can be detected in GFP+ cells at 2 dpa (Figure 5 – figure supplement 2B), and proliferation is restricted to segment -1 and segment 0 (new data in Figure 5 – figure supplement 2C). As we show in Figure 1B, at 2 dpa, dedifferentiation as defined by bglap downregulation is not complete in segment -1, rather here a mixture of cells with different bglap levels are found. We have thus combined EdU labelling with RNAscope against bglap in segment -1 to analyse to which extent bglap and EdU anticorrelate. These data show that EdU is hardly ever incorporated into cells expressing high levels of bglap, while the majority of the proliferating osteoblasts are dedifferentiated, as they express only low levels of bglap (new data in Figure 5 – figure supplement 2D). Together, these data show that mature osteoblasts are non-proliferative, and upon amputation, when they are dedifferentiated, they become proliferative. Thus, the absence of proliferation in bglap:GFP+ cells in the non-injured fin adds to the evidence that this line is specific for mature osteoblasts, but due to the persistence of the GFP protein it can be used to analyse dedifferentiated osteoblasts.

      These data are described on page 14 of the manuscript as follows:

      “In the non-injured fin, bglap:GFP+ osteoblasts are non-proliferative, but upon amputation osteoblasts proliferate at 2 dpa (Figure 5 – figure supplement 2A, B). Proliferation is restricted to segment -1 and segment 0 (Figure 5 – figure supplement 2C), and RNAscope in situ analysis of bglap expression revealed that the majority of EdU+ osteoblasts have strongly downregulated bglap (Figure 5 – figure supplement 2D). Inhibition of C5aR1 with PMX205 had no effect on osteoblast proliferation in segment -1 at 2 dpa (Figure 5 – figure supplement 3A). Furthermore, upregulation of Runx2 was not changed by PMX205 treatment (Figure 5 – figure supplement 3B), and regenerative growth was not affected in fish treated with either W54011, PMX205 or SB290157 (Figure 5 – figure supplement 3C). We conclude that the complement system specifically regulates injury-induced osteoblast migration, but not osteoblast dedifferentiation or proliferation in zebrafish.”

      3) To support our conclusion that osteoblasts migrate, we performed time-lapse imaging using a transgenic line expressing the photoconvertible protein kaede in osteoblasts (entpd5:kaede). Local photoconversion of only the proximal half of a segment allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F and they are described on page 7 of the revised manuscript as follows: To trace osteoblasts, we used the transgenic line entpd5:kaede (Geurtzen et al., 2014), in which Kaede fluorescence can be converted from green to red by UV light (Ando et al., 2002). We photoconverted osteoblasts in the proximal half of segment -1, while osteoblasts in the distal half remained green (Fig. 1F). At 1 dpa, red osteoblasts were found in the distal half (Fig. 1F), showing that photoconverted osteoblasts had relocated distally.

      2) The authors poorly define dedifferentiation. They use reduced bglap:GFP or bglap mRNA expression as a sole criterion for dedifferentiation. The authors state that NF-kB and retinoic acid can inhibit osteoblast dedifferentiation. However, this simply reflects the well-described fact that these signals promote osteoblast differentiation.

      We define dedifferentiation as the reversion of a mature cell into an undifferentiated progenitor-like status. This involves the following characteristics: 1) the expression of markers of the differentiated state is downregulated; 2) early lineage markers are re-expressed; 3) the cells become proliferative; and 4) they have the ability to re-differentiate into mature cells. Based on this definition, the downregulation of an osteoblast-specific marker can be used as a read-out for osteoblast dedifferentiation. Bglap is an established marker for mature osteoblasts (Kaneto et al., 2016 doi.org/10.1186/s12881-016-0301-7; Yoshioka et al., 2021 doi: 10.1002/jbm4.10496; Kannan et al., 2020 doi: 10.1242/bio.053280; Sojan et al., 2022 doi.org/10.3389/fnut.2022.868805; Valenti et al., 2020 doi.org/10.3390/cells9081911). While we use downregulation of bglap expression as our main read-out for osteoblast dedifferentiation in our experimental interventions (actomyosin inhibition, retinoic acid treatment, complement inhibition), we have expanded our methods to characterize osteoblast dedifferentiation, and have re-arranged our manuscript to show these data at the beginning of the results.

      Already in the previous version of the manuscript we have shown that endogenous bglap is strongly expressed in segment -2 (the segment that does not respond to fin amputation and thus represents the non-injured state), while it is downregulated in a graded manner in segment -1 and segment 0 (the segments where dedifferentiation happens). We have now moved these data to the re-designed Figure 1B. In addition to bglap, we can now show that entpd5, a gene required for bone mineralization, is strongly expressed in osteoblasts of segment -2, while it is massively downregulated in segment -1 and segment 0. These new data can be found in Figure 1C. Thus, entpd5 is another differentiation marker whose loss characterizes osteoblast dedifferentiation. Importantly, we can confirm by RNAScope that the pre-osteoblast marker runx2a is absent in mature segments but is upregulated in segment 0 and segment -1 at 1 dpa (new data in Figure 1 – figure supplement 1A). Similarly, cyp26b1, an enzyme shown to regulate dedifferentiation, is upregulated in segment 0 and segment -1, but not expressed in segment -2 (new data in Figure 1 – figure supplement 1B). Furthermore, we have repeated all experiments where we have previously quantified dedifferentiation upon experimental interventions using downregulation of bglap:GFP (actomyosin inhibition, retinoic acid treatment, complement inhibition). We can now fully confirm the previous conclusions using the more rigorous quantification of dedifferentiation by RNAScope analysis of endogenous bglap levels. We have replaced all bglap:GFP data with the new bglap RNAScope data. These new data are found in Figure 3F, Figure 3 – figure supplement 1A, Figure 4B and Figure 5F.

      Overall, we support our conclusion that osteoblasts dedifferentiate by the loss of the two differentiation markers bglap and entpd5, the upregulation of the pre-osteoblast marker runx2a and the dedifferentiation-associated gene cyp26b1, and the fact that osteoblasts become proliferative. We hope that the reviewer considers this sufficient evidence.

      In mammals, the available literature relatively convincingly concludes that NF-kB signaling negatively regulates osteoblast differentiation (Yao et al., 2014, doi: 10.1002/jbmr.2108; Swarnkar et al., 2014 doi.org/10.1371/journal.pone.0091421, Chang et al., 2009, doi.org/10.1038/nm.1954). Yet in zebrafish osteoblasts, we have previously shown that NF-kB signaling is active in mature osteoblasts and needs to be downregulated for dedifferentiation to occur (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Importantly, in our previous work we showed that at least during fin regeneration, NF-kB signalling is not involved in osteoblast differentiation (Mishra et al., 2020, 10.1016/j.devcel.2019.11.016). Specifically, osteoblasts in which Nf-kappaB signaling is enhanced or inhibited differentiate completely normally during the later stages of fin regeneration in the fin regenerate. Hence, our findings with the Nf-kappaB intervention studies done in this manuscript, where we look at osteoblasts in the stump within 1 dpa, cannot be explained by them affecting osteoblast differentiation.

      For retinoic acid signalling, multiple roles in bone development and repair have been described in mammals. For zebrafish osteoblasts, it was shown that during the outgrowth phase of bone regeneration, retinoic acid negatively regulates osteoblast differentiation in the blastema (Blum & Begemann, 2015, 10.1242/dev.120204). Yet importantly, it also negatively controls the dedifferentiation of osteoblasts in the stump right after amputation (Blum & Begemann, 2015, 10.1242/dev.120204). Thus, the effects we observe at the early timepoints we analyse in our intervention studies (retinoic acid treatment) are due to the effect on osteoblast dedifferentiation.

      We have added a short definition of dedifferentiation to the results section (page 6). There it reads as follows:

      “We have previously shown that osteoblasts dedifferentiate in response to fin amputation, that is they revert from a mature, non-proliferative state into an undifferentiated progenitor-like state, which includes loss of bglap expression and upregulation of the pre-osteoblast marker runx2 (Knopf et al., 2011; Geurtzen et al., 2014).”

      In addition, we have restructured the results to describe our use of tools and the new data on page 6 of the revised manuscript as follows:

      Using RNAScope in situ hybridization, we can now show that downregulation of bglap occurs in a graded manner and that entpd5 expression is similarly downregulated during dedifferentiation (Figure 1B, C). At 1 day post amputation (1 dpa), expression of entpd5 and bglap remains high in segment -2, but gradually decreases towards the amputation plane and is almost entirely absent from segment 0, with entpd5 downregulation being more pronounced (Figure 1B, C). While RNA expression of these genes is downregulated within hours after injury, GFP or Kaede fluorescent proteins (FPs) expressed in bglap or entpd5 reporter transgenic lines persist for up to three days, even though transgene transcription is shut down rapidly as well (Knopf et al., 2011). We can confirm these earlier findings using the more sensitive RNAScope in situs. In bglap:GFP transgenics at 2 dpa, gfp RNA and GFP protein colocalized to the same cells in segment -2, where osteoblasts do not dedifferentiate (Fig. 1D). In contrast, in the distal segment -1 GFP protein was present, but barely any gfp transcript could be detected (Fig. 1D). Thus, persistence of FPs in reporter lines can be used for short-term tracing of dedifferentiated osteoblasts (Fig. 1E). At 1 dpa, bglap:GFP+ cells upregulated expression of the pre-osteoblast marker runx2a and of cyp26b1, an enzyme involved in retinoic acid signalling (Blum and Begemann, 2015), which regulates dedifferentiation (Figure 1 – figure supplement 1A, B). Both markers were exclusively upregulated in segment -1 and segment 0 at 1 dpa, but were absent in segment -2. Together, these data show that osteoblasts in segment -1 and segment 0 lose expression of mature markers and gain expression of dedifferentiation markers.

      3) The authors do not rigorously demonstrate that mature osteoblasts indeed migrate. What they showed in this study is simply cell shape changes.

      We have the following evidence for osteoblast migration:

      1) bglap:GFP+ cells relocate from the centre of segments towards the amputation plane (after fin amputations) or towards both injuries in the hemiray model. In this revised manuscript we show that transgene expression is not upregulated in these regions, but that GFP fluorescence there must be due to relocation of cells in which GFP protein persists (new data in Figure 1D, E; see also response to “Weaknesses, issue 1” above).

      2) Using the entpd5:kaede transgenic line, which is expressed in mature osteoblasts throughout segments, we have photoconverted only the proximal half of a segment, which allowed us to trace these photoconverted osteoblasts. This revealed that converted cells appear in the distal part of the segment within 1 dpa, which can only be explained by relocation of the cells. These new data can be found in Figure 1F.

      3) Already in the previous version of the manuscript, we have performed live imaging to track single cell behaviour. Using double transgenic fish expressing both GFP and kaede in osteoblasts, we deliberately only partly converted kaedeGreen to kaedeRed, which resulted in different hues for each osteoblast. This distinct colouring facilitates observing single cells. Video 1 shows the directed movement of cell bodies relative to their surroundings within 2 hours (see also Figure 2 – figure supplement 1A).

      4) Osteoblasts display the typical cell shape changes associated with active migration (elongation along the axis of migration, extension of dynamic protrusions), data in Figure 2.

      Together, we think these are convincing data supporting the conclusion that osteoblasts actively migrate.

      4) The hemiray removal model is highly innovative, but this part of the study is not very well connected to the rest of the study.

      We have rephrased the first sentence of the hemiray paragraph to make the connection more perceptible. It now reads as follows:

      In response to fin amputation, all osteoblast injury responses are directed towards the amputation plane; that is, dedifferentiation is more pronounced distally, osteoblasts migrate distally, and the proliferative pre-osteoblast population forms distal to the amputation plane. We wondered how osteoblasts respond to injuries that occur proximal to their location. To test this, we established a fin ray injury model featuring internal bone defects.

    1. We are not sorry for him—we learn that, not to be sorry for the dead. But for ourselves? This terror is always so fresh, so unexampled.

      This is quite a bold statement, especially for an opening paragraph. It makes the reader stop and think, potentially reflecting on their own life. It also allows us to connect with the narrator as they think about the terror they may have experienced in their own lives.

    1. Author Response

      Reviewer #1 (Public Review):

      Using a Tet-off system, Kir2.1 was expressed (or not) during the key time of callosal development from E15 to P15. Restoring activity either by adding Dox during a critical period from P6 to P15 or using DREADDs from P10-14 could rescue the callosal projection to the cortex, whereas later restoration of activity (with Dox) was not successful. Did this successful rescue lead to normal activity? Calcium imaging in animals with Kir2.1 showed low levels of any kind of activity, both highly correlated and low correlation, but P6-13 Dox treatment partially restored only low-correlation activity and not high-correlation activity at P13. The effects of DREADDs on activity were not similarly measured, though they were effective for at least partially restoring the callosal projection.

      Overall this study builds on earlier findings regarding the importance of neuronal activity in the formation of a normal callosal projection, using in utero electroporation which is particularly well suited for this subject. It makes the case very compellingly that near-normal callosal connectivity can be produced if activity is permitted during a critical period window from P6 or P10 to P15, though the exact timing of this window is imprecise because the elimination of Kir expression was not systematically quantified. For transmembrane proteins like channels it can often take many days for protein expression to completely abate.

      We thank the reviewer for their positive evaluation and the constructive comments. Based on the comment on Kir expression, we conducted new experiments using pTRE-Tight2Kir2.1EGFP, with which EGFP signals reflect localization of over-expressed Kir2.1, and examined when the expression of Kir2.1EGFP went down after Dox treatment at P6. At P6 (before Dox treatment), the signals of Kir2.1EGFP (stained with anti-GFP antibody) were observed in the periphery of the soma and along dendrites, implying that Kir2.1EGFP was transported to the cellular membrane. At P10 and P15 (4 days and 9 days after Dox treatment), Kir2.1EGFP signals were not observed in the periphery of the soma or along dendrites. We noted that low-level green signals were observed in the central part of the cell body. These may stem from low-level expression of Kir2.1EGFP in nuclei or cytosol even after Dox treatment. Alternatively, and more likely, these may reflect bleed-through of RFP signals into the GFP channel. Overall, we confirmed that Kir2.1 proteins that were localized to the cellular membrane were largely down-regulated. We described these observations in detail in the figure legend of Figure 1-figure supplement 3, and added the result as Figure 1-figure supplement 3.

      I found the quantification of the callosal projection to be rather minimal and the normalization approach not entirely transparent. For example does activity from P10-15 restore the full normal PATTERN of callosal connectivity or merely the density of input overall?

      We thank the reviewer for this comment. Based on the comment, we added analyses of the pattern of callosal projections: the width of the callosal axon innervation zone in layers 2/3 and 5, and densitometric line scans across all cortical layers. Our original quantification showed that the density of callosal axons reaching their target layer (i.e. cortical layer 2/3) is almost recovered in the P6-P15 Dox condition (Figure 1B-D), but the new analyses suggest that some aspects of callosal axon projections (the width of the innervation zone in layers 2/3 and 5 (Figure 1-figure supplement 4A,B), and the lamina-specific innervation pattern (Figure 1-figure supplement 4C)) might be only partially recovered. We have added these new results as Figure 1-figure supplement 4. In a future study, we would like to assess the effect of the manipulations at finer resolution by 3D morphological reconstruction of axons of individual neurons.

      Also in the discussion it would be nice to more clearly establish whether activity is thought to be maintaining a projection already formed by P10 or permitting the emergence of such a pattern.

      Thank you for the suggestion. We have added thorough discussions about this point as follows. Page 7, lines 198-208:

      “In the previous study, we showed that callosal axons could reach the innervation area almost normally under activity-reduction, and that the effects of activity-reduction became apparent afterwards (Mizuno et al., 2007). Callosal axons elaborate their branches extensively in P10-P15 (Mizuno et al., 2010), and axon branching is regulated by neuronal activity (Matsumoto and Yamamoto, 2016). It is likely that activity is required for the processes of formation, rather than the maintenance of the connections already formed by P10, but the current study employed massive labeling of callosal axons which is not suited to clarify this. In addition, the restoration of activity in the Tet-off (Figure 1) or DREADD (Figure 2) experiment may not completely rescue the ramification pattern of individual axons. Single axon tracing experiments (Mizuno et al., 2010; Dhande et al., 2011) would be required to clarify this. Nonetheless, our findings suggest that callosal axons retain the ability, or are permitted, to grow and make region- and lamina-specific projections in the cortex during a limited period of postnatal cortical development under an activity-dependent mechanism.”

      The calcium imaging is a valuable validation of the Kir expression approach, but the study here appears to overinterpret what may simply be an intermediate level of activity restoration rather than a specific restoration of L events, as it seems that L events would be the most likely to occur under conditions of reduced overall activity. One possibility is that the absence of H events at P13 in the calcium imaging is due to residual Kir expression creating a drag on high-level network activation rather than any more complicated change in patterned spontaneous activity/connectivity. The conclusions from this study regarding the permissive role of activity during a critical window and the lack of a requirement for highly correlated activity are valuable, even if somewhat imprecise on both counts. The authors should probably refrain from use of the term patterned activity given that this was measured but not systematically compared to unpatterned spontaneous activity.

      We thank the reviewer for this constructive comment. Based on this comment, we removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. For example, in the Discussion, we revised as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

      Reviewer #2 (Public Review):

      Tezuka et al. use in vivo manipulations of spontaneous activity to identify the activity-dependent mechanisms of callosal projection development. Previous research of the authors' and other labs had shown that overexpressing the potassium channel Kir2.1, which reduces activity levels in the developing cortical network, blocks the formation of callosal connections almost entirely.

      The current manuscript corroborates and extends these previous discoveries by:

      1) Demonstrating that the effect of Kir overexpression can be rescued by pharmacogenetic network activation using DREADDs.
      2) Revealing the requirement of network activity for the development of callosal projections during a particular developmental time window and by
      3) Directly relating perturbed callosal development to the actual changes in activity patterns caused by the experimental manipulations.

      Thus, this paper is important for our understanding of the role of neuronal activity in the development of long-range connections in the brain. In addition it provides strong evidence for a role of specific activity patterns in this process.

      In general, the approach is very straightforward and the results clearly interpreted. Nevertheless, there are a few points to consider.

      We thank the reviewer for these positive and supportive comments.

      1) It is not clear in which cortical area(s) the in vivo 2-photon recordings were performed and in how far cortical areas that actually receive/send callosal projections were included or not in the analysis.

      In response to this comment, we revised the text in the method section as follows.

      “We aimed to record spontaneous neuronal activity in putative binocular zones in V1 (2.5 mm lateral of midline and 1 mm anterior of the posterior suture). Since the boundaries between V1 and higher visual areas, AL/LM are not as obvious as those in adult, our recordings likely contained juxtaposed lateral monocular V1 and AL/LM as well.”

      Based on our colleagues’ unpublished observations, V1 and AL/LM can be distinguished solely by spontaneous activity patterns even before eye-opening. They also found that the frequencies of spontaneous activity are similar across mono/binocular regions of V1 and AL/LM (Murakami, Ohki, et al. unpublished). Thus, our results should hold even with the variability in recording sites.

      2) It is not discussed what the duration of the CNO effect is. Do daily injections rescue activity patterns for 24 hours or a significant proportion of this period?

      In response to this critical comment, we revised the text in the method section as follows.

      “A previous study showed that an intraperitoneally injected CNO was effective (in terms of increasing activity) for about 9 hrs (Alexander et al., 2009). The “partial rescue” effect we observed (Figure 2) may suggest that activity was not fully restored during 24 hrs by our daily CNO injections.”

      Reviewer #3 (Public Review):

      The manuscript by Tezuka adds to an emerging story about the role of activity in the formation of callosal connections across the brain. Here, the authors show that they can use a TET system to switch off the activity of an exogenous potassium channel, in order to probe when activity might be necessary or sufficient for the formation of callosal connections. The authors find that artificial restoration of activity with DREADDs is sufficient to rescue the formation of callosal connections, and that there is a critical period (somewhere between P5-P15) where activity must occur in order for the connections to form within the cortex. Finally, the authors show that when the potassium channel is removed during the critical period, the cortex exhibits activity, but few highly synchronous events. These results indicate that it is activity in general and not specifically highly synchronous activity that is necessary for the final innervation of the callosal cortex.

      In general, the study is well done, and the writeup is polished, well summarized. The figures are solid. There are only a few criticisms/suggestions.

      We thank the reviewer for the positive evaluation.

      Major issue: Have the authors demonstrated a requirement for "patterned spontaneous activity"?

      The authors claim variously in the abstract ("a distinct pattern of spontaneous activity") and in the results (pg 6, "our observations indicate that patterned spontaneous activity") and discussion (pg 6, "we demonstrated that patterned spontaneous activity") that it is "patterned" spontaneous activity that is key for the formation of callosal connections. However, when I was reading the paper, I came to the opposite conclusion: that any sufficiently high spontaneous activity is sufficient for the formation of these connections.

      The authors showed that relieving the KIR expression from P5-15 allows the connections to form; however, in Figure 4, the authors show that the nature of the activity produced in the cortex (in terms of mixtures of H and L events) is very different. Nevertheless, the connections can form. Further, the authors showed that increasing activity when KIR is expressed using DREADDs restores the connections. The pattern of activity produced by this DREADDs + KIR expression is likely to be very different from the pattern of activity of a typically-developing animal. In total, I thought that the authors demonstrated, quite nicely, that it is just the presence of sufficient activity that is key to the innervation of the contralateral cortex. (It's not cell autonomous, as the authors showed before; there seems to be a "sufficient activity" requirement).

      Therefore, I think the authors should remove references to the requirement of patterned activity and instead say something about sufficiently high activity (or some characterization that the authors choose). I think they've shown quite nicely that a specific pattern of the spontaneous activity is not important.

      We thank the reviewer for this very important insight and interpretation. After considering all the currently presented data again, we have come to agree with the interpretation stated by the reviewer. We removed the term “patterned activity” throughout the manuscript and revised the title, abstract, introduction, results, and discussion extensively. Nevertheless, we would not completely discard the possibility that specific patterns of spontaneous activity, such as L-events, could potentially have some active contribution to the development of projection circuits, and would like to further address this in future study.

      For example, in the Discussion, we revised the text as follows.

      “We have shown that the projections could be established even without fully restoring highly synchronous activity (Figure 4). L events, but not H events, were present in P13 cortex after Dox treatment at P6. L events may be sufficient for the formation of callosal projections. Alternatively, any form of activity with certain level(s) (i.e., “sufficiently” high activity with no specific pattern) could be permissive for the formation of callosal connections.”

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Sasaki et al titled "Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes" employed conditional GWAS to identify trans-regulators of mCHG levels in Arabidopsis natural accessions, after controlling for mCHH. Using loss of function mutants for couple of these genes, the authors also tested their effects on mCHG levels.

      Overall, this manuscript makes a nice contribution. I suggest the following improvements to enhance the quality of this manuscript.

      Comments:

      1. MSI1 has been shown to be copurified with TCX5, a component of the DREAM complex. The DREAM complex transcriptionally regulates CMT3, MET1, and DDM1 in a cell-cycle-dependent manner (ref: Yong-Qiang Ning, 2020, Nature Plants). Tcx5/6 double mutants have ectopic gain of TE and genic mCHG. It would be nice to refer to this paper and add to the MSI1 part accordingly.

      Absolutely: thanks for suggesting this!

      Multifaceted regulation of mCHG levels seems to be evident from this and previous studies. Why would such complex pathways evolve to regulate mCHG? Bewick et al 2016 and Wendte et al 2019 showed that lack of CMT3 or ectopic expression of CMT3 can influence CG gene body methylation (gbM). One possibility is that these five factors regulate CHG to maintain it at a level that is just enough to target TEs. Irrespective of the functional relevance of gbM, differences in the levels of these five factors might result in erroneous gbM. It would be interesting to look at the rates of gbM and the number of gbM genes in the natural accessions carrying one to four mCHG-decreasing alleles, and also in the one line from the Iberian peninsula carrying polymorphisms in all five genes.

      Yes, the connection between CHG and gbM is very interesting and deserves more attention. We looked for an effect of cumulative mCHG-decreasing alleles on gbM, but there was no association with gbM, and none is really expected given the stable epigenetic inheritance of gbM. The Iberian peninsula line carrying all decreasing alleles did have slightly lower gbM levels, but it is impossible to exclude the effects of population structure. Since we have nothing to add beyond speculation, we prefer not to go into this topic.

      The authors mentioned that a significant peak for mCHG|mCHH on RdDM-targeted transposons was located 196 bp downstream of MIR823a and not on the mature miRNA. Therefore, this cannot directly impair miR823 base pairing with CMT3 mRNA transcripts and its cleavage. Moreover, natural accessions carrying the alternative MIRNA823 allele show reduced CMT3 and mCHG levels, meaning higher miR823 levels? Does this 196 bp downstream region contain any regulatory feature that affects miR823 transcription? Or does this region still fall in the primary miRNA hairpin region? A single nucleotide change in the pri-miRNA can have a significant impact on its secondary structure, which can impede DICER processivity and, in effect, the levels of mature miR823 molecules. It will be beyond the scope of this paper to pin down the exact mechanism. But a simple stem-loop RT-PCR for miR823 levels in reference and alternative accessions would be informative (on accessions that grow at the same speed). Perhaps the authors can at least model SNP-induced pri-miRNA secondary structure variations using Vienna RNAFold (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) and present MFE values (minimum free energy) for representative accessions.

      Stem-loop qRT-PCR for MIR823a expression would indeed be helpful to confirm allelic effects. However, comparing lines with wildly different genetic backgrounds is fraught with difficulty due to trans-effects. Furthermore, MIR823a is expressed specifically during embryogenesis, and the expression quickly decreases after the early heart stage (Papareddy et al., 2021). Thus, we would need to extract microRNA from embryos at exactly the same developmental stage, from lines that may develop at different speeds. Most likely, time-series data would be required, and generating such data is a massive undertaking. As noted in the paper, we did measure MIR823a expression by stem-loop qRT-PCR for several lines carrying reference and alternative alleles but the results were inconclusive. A proper study of this is beyond the scope of this paper.

      Testing predicted effects on RNA secondary structure, on the other hand, is eminently feasible. As suggested, we used Vienna RNAFold for the region, including the GWAS peak. Since the SNP is linked to a 35 bp deletion (shown in S4A), it is closer to the MIR823A coding region than 196 bp. However, the results indicate that the SNP (Chr3:4496626) is not within the stem-loop. It remains possible that this SNP tags multiple SNPs in the annotated stem regions. This is now mentioned.

      Figure 1A can be made more reader friendly. Perhaps this can be broken down into correlation plots for individual conditions or tissue types. In addition, it might be good to add individual r-square values for each of them instead of compound r-square.

      We respectfully disagree, since the main point of the figure is the overall correlation and heterogeneity, rather than the correlation within sub-sets. Instead of splitting the plot, we changed color contrasts to make it easier to read.

      Page 3, Paragraph 1 from line 3 to end of paragraph. The authors wrote "Much of this variation is due to differences in the environment (including tissue, which can be viewed as a cellular environment)". A possible explanation is that these two tissues have different mitotic indices (fraction of cells dividing and non-dividing; flowers have more dividing cells, leaves have more non-dividing and endoreduplicated cells), which explains the non-CG variation. I would suggest that the authors change the text to this and refer to the Filipe Borges et al 2021 Current Biology paper.

      This is certainly a possibility, although higher mCHG levels in flower buds presumably also reflect higher CMT3 expression during embryogenesis (Feng et al. 2020; Gutzat et al. 2020; Papareddy et al. 2021). We now mention both explanations and cite Borges et al. (2021).

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      There are three points here. First, we disagree that the GWAS results are confirmatory. Sure, only one of our associations is connected with a novel gene, but the fact that the four other genes apparently harbor major polymorphisms is a new finding that contributes to our understanding of the function of this trait (and, possibly, these genes). Second, while it is possible that we emphasize statistical methodology too much, we do this for clarity, not to claim that what we are doing is novel. Third, we are similarly not interested in defining what is polygenic and what isn’t, but rather in putting the results in the context of other studies. We have changed the writing in various places to make it clearer (and hopefully less distracting/pedantic).

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      We agree, and have even written papers on this very subject. We were surprised by this comment as we felt we had included lengthy sections (see also comment above) about methodology, emphasizing that multi-trait analysis is a good idea in principle. One of our purposes here is to provide a beautiful example demonstrating this. We have tried to make these points clearer.
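The conditioning idea discussed above can be illustrated with a minimal sketch, assuming simulated data and a plain least-squares model rather than the mixed model used in an actual GWAS. All names (`residualize`, `conditional_effect`, the simulated `mchh` and `mchg` values, and the effect sizes) are hypothetical and chosen for illustration only; nothing here reproduces the authors' pipeline. The sketch relies on the Frisch-Waugh-Lovell result: the SNP coefficient in a regression of the trait on SNP plus covariate equals the slope obtained after regressing the covariate out of both the trait and the SNP.

```python
import random
from statistics import mean


def slope(y, x):
    """Least-squares slope of y on x (a model with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residualize(values, covariate):
    """Residuals of values after a least-squares fit on covariate plus intercept."""
    b = slope(values, covariate)
    mv, mc = mean(values), mean(covariate)
    return [(v - mv) - b * (c - mc) for v, c in zip(values, covariate)]


def conditional_effect(trait, snp, covariate):
    """SNP effect on trait conditional on covariate (Frisch-Waugh-Lovell)."""
    return slope(residualize(trait, covariate), residualize(snp, covariate))


# Simulated example (assumed values): a shared background drives both
# mCHH and mCHG, while one SNP affects mCHG only with effect 0.5.
rng = random.Random(0)
n = 2000
snp = [rng.choice([0, 1]) for _ in range(n)]
shared = [rng.gauss(0, 1) for _ in range(n)]
mchh = [s + rng.gauss(0, 0.1) for s in shared]
mchg = [2.0 * s + 0.5 * g + rng.gauss(0, 0.1) for s, g in zip(shared, snp)]

unconditional = slope(mchg, snp)                # scan on raw mCHG
conditional = conditional_effect(mchg, snp, mchh)  # scan on mCHG | mCHH
print(f"unconditional estimate: {unconditional:.3f}")
print(f"conditional estimate:   {conditional:.3f}")
```

In this simulated setting the conditional estimate is expected to be much less noisy than the unconditional one, because the shared background variation is absorbed by the covariate. Population structure and kinship, which a real analysis would need to model, are deliberately left out.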

      The finding that this conditional GWAS yields again a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right-hand side of the equation of their GWAS model, and subsumed it under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.

      Again we agree, and fail to see why the reviewer thinks we do not. Nowhere do we claim that the overall covariance has a simple basis, and we explicitly state that it is the conditional mCHG variation that has an oligogenic basis. We did write that “univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic”, which was imprecise, and arguably erroneous. The word “erroneously” has been removed in the revision.

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al, which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is neither complex nor polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      The phrase “seem to realize” is unwarranted and unnecessary sarcasm. Given that we cite the two century-old papers that first demonstrated that it was possible to decompose complex traits into Mendelian ones, it should be obvious that we understand what we have done. That our writing could have been better is another matter. As noted above, the word “erroneously” has been dropped, and we have also changed the second sentence to make it obvious that this is obvious. We suspect that whether one finds this part of the Discussion “distracting” or not depends on training and background — our objective was to explain our results to readers who (unlike us and the reviewer) are not well-versed in quantitative genetics.

      Specific comments

      1. A large part of the manuscript focuses on SNPs that are enriched for a priori genes and that fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set "false discoveries", which may or may not be true. Moreover, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (+ an LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields both in terms of conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.

      The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that that all methylation genes are known. A novel candidate that was perhaps never identified in mutants screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constraint, will never become a discovery.

      Comments 1 and 9 largely overlap, and so we moved 9 here for clarity and respond to both at the same time. We agree that the enrichment analysis should be explained in this article as well, so as to save the reader from finding the supplement to an old paper. A new section has been added to Methods. In this section, we also try to preempt some of the misunderstandings in the reviewer's comments.

      First, our approach is indeed generally applicable. Whether it is useful depends on what you want to do, and yes, the utility will depend on the quality of the independent data, but note that the a priori gene set does not have to be genes: you could use this approach to compare coding vs non-coding regions of the genome, for example.

      Second, we are not trying to “sell” our approach (or anything else for that matter).

      Third, the approach does not label GWAS hits that are not within the a priori set as false discoveries: it says nothing about these hits.

      Fourth, we are not sure what is meant by a ‘“natural” stopping point for going below GWAS thresholds’, but our approach does provide a simple way to explore how FDR (in the a priori set!) depends on the threshold used.

      Fifth, the proposed alternative of “targeted GWAS” (non-genomewide association, as it were) is not equivalent, because our approach was not designed to increase power by alleviating the multiple testing burden, but rather to rigorously demonstrate that there is a signal in the data when faced with uncalibrated p-values. That it can also be used to explore sub-significant associations is a nice side-effect that we exploit here.

      Sixth, we do not assume that all methylation genes are known, nor is our goal to find them all.
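      To make the third and fourth points concrete, here is a minimal sketch of how an enrichment-based FDR bound could be explored across thresholds. It is not the authors' code: the function name, the toy data, and the specific bound (FDR among a priori hits is at most 1/enrichment, which follows if false hits fall in the a priori set at the background rate) are assumptions made for illustration only. Note that hits outside the a priori set are never labelled; they only enter through the total hit count.

```python
import numpy as np

def enrichment_and_fdr_bound(p_values, in_a_priori_set, threshold):
    """Hypothetical helper: enrichment of a priori SNPs among hits, and the
    implied upper bound on FDR *within the a priori set* (assumed: 1/enrichment)."""
    hits = p_values < threshold
    n_hits = hits.sum()
    n_hits_in_set = (hits & in_a_priori_set).sum()
    if n_hits == 0 or n_hits_in_set == 0:
        return None  # nothing to say at this threshold
    background = in_a_priori_set.mean()   # fraction of all SNPs in the set
    observed = n_hits_in_set / n_hits     # fraction of hits in the set
    enrichment = observed / background
    return enrichment, min(1.0, 1.0 / enrichment)

# Toy data (invented): 100 SNPs, the first 10 lie near a priori genes.
in_set = np.zeros(100, dtype=bool)
in_set[:10] = True
p = np.full(100, 0.5)
p[:5] = 1e-6      # five strong hits inside the a priori set
p[10:15] = 1e-6   # five strong hits elsewhere in the genome

# Explore how the bound depends on the threshold used.
for threshold in (1e-4, 0.9):
    print(threshold, enrichment_and_fdr_bound(p, in_set, threshold))
```

      Lowering or raising the threshold in the loop is the "simple way to explore" the dependence mentioned above; the bound becomes uninformative (close to 1) once hits are no longer enriched for the a priori set.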

      With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.

      While this is a sensible suggestion, the focus of this paper is on mCHG, and refining the mCHH measurement would essentially amount to re-doing all analyses.

      I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?

      Yes, this was in the paper, but we only mention it in the Discussion (and Fig S13) as the results were only of methodological interest (as expected, they were very similar).

      The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".

      Done.

      The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM-targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?

      The sentence has been changed to make this clearer.

      A few lines below, they write "...huge". Please rephrase.

      Done.

      The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.

      We are not convinced that the double or triple mutants show non-additivity. Adding up effects in Figure 1 works pretty well. As for our GWAS results, it is clear that small effects (like the ones in our GWAS) will always tend to look additive for simple mathematical reasons. This does not mean that no interactions exist, and we emphasize this in the paper. We also have an example of non-linearity when it comes to TE activity. This is now also emphasized.

      The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.

      The sentence following the one quoted is “In essence, we sought to simplify a complex trait by breaking it into constituent parts”, which is very much part of the motivation. As the reviewer noted above, it is not surprising that a conditional analysis turns out to be more powerful. The comment may have arisen from the statement “This insight is the basis for this paper”, which is misleading: there is no insight here, just a very obvious hypothesis, which turned out to be correct. We have changed the writing to make this clearer.

      The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?

      fas2 induces mCHG hypermethylation in CMT2-targeted TEs, presumably via a complex that also involves MSI1. It is marked in Fig. 1B. We have rephrased the sentence to make this clearer.

      The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss of function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).

      We actually generated CRISPR/Cas9 mutants only for MIR823A (Table S5). For JMJ26, a T-DNA insertion line was available, and results based on this line and on rescue lines provided sufficient evidence. To clarify this, we corrected the subsection titles.

      In section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Indeed: it is meant ironically. It is obvious, yet people do it.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Sasaki et al. carried out a conditional GWAS analysis of TE-CHG methylation in Arabidopsis thaliana natural accessions. They revealed multiple associations with SNPs in known DNA methylation genes. A new finding is the association found proximal to JMJ26, which had no previously described role in the maintenance/establishment of RdDM-targeted transposons. The authors validate the JMJ26 association using a loss-of-function mutant of JMJ26, which essentially recapitulates the GWAS effect, suggesting that JMJ26 is likely causal. An important point of the study is that the associations detected with conditional GWAS have not been seen in previous univariate (i.e. unconditional) GWAS, probably due to a lack of power. At the sub-genome-wide threshold the authors discovered further, albeit weaker, associations that were also highly enriched for known DNA methylation genes.

      Overall impression:

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs. What I personally found very distracting throughout the manuscript was the strong emphasis on the methodological aspect; that is, the conditional GWAS, which is really not new. Furthermore, the conceptual/philosophical discussion about what is a complex trait or what can be called polygenic was slightly pedantic and distracted from the biological message.

      A conceptual comment:

      • The conditional GWAS presented here is conceptually very similar to conditional QTL mapping approaches where candidate loci are included, a priori, as covariates in the model, and a scan is performed to search for additional modifiers. It is known that this approach increases power because the scan is performed on the residual trait variation (having accounted for the effect of candidate loci). This is also the idea behind MQM mapping, although in the latter the inclusion is not restricted to candidate loci. Instead of including candidate SNPs as covariates, the authors include TE-CHH methylation levels as a covariate, as it is highly correlated with TE-CHG methylation. By doing this, the authors essentially "control" for any SNP affecting the covariance between CHG and CHH, even if these SNPs (and their genetic architecture) remain unknown. Hence, the conditional scan is mainly on the residual variation in TE-CHG methylation that is unique to this context (i.e. independent of CHH). That additional TE-CHG associated loci pop up in this scan is perhaps not so surprising.

      The finding that this conditional GWAS again yields a handful of loci that explain a considerable part of the trait (now residual trait) variation leads the authors to suggest that the genetic architecture underlying non-CG methylation of TEs is not "polygenic". I think this is semantics. All the authors have done is relegate any causal SNPs underlying the covariance between TE-CHG and TE-CHH to the right hand side of the equation of their GWAS model, and subsume them under the predictor "TE-CHH methylation levels". That is, the genetic architecture underlying this covariance is still unknown, difficult to identify and probably highly polygenic.
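      The power argument in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses ordinary least squares rather than the mixed model a real GWAS would use, and the sample size, effect sizes, variable names, and the single shared factor driving both methylation contexts are all invented for illustration.

```python
import numpy as np

def snp_t_statistic(phenotype, snp, covariate=None):
    """t-statistic of the SNP effect in an OLS fit, optionally conditioning
    on one covariate (a simplified stand-in for a GWAS mixed model)."""
    n = len(phenotype)
    columns = [np.ones(n), snp]
    if covariate is not None:
        columns.append(covariate)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    residuals = phenotype - design @ beta
    sigma2 = residuals @ residuals / (n - design.shape[1])
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1] / np.sqrt(cov_beta[1, 1])

rng = np.random.default_rng(1)
n = 1000
snp = rng.integers(0, 2, size=n).astype(float)  # 0/1 genotypes, as in inbred lines
shared = rng.normal(size=n)                     # unknown factors moving both contexts
mchh = shared + 0.3 * rng.normal(size=n)
mchg = shared + 0.3 * snp + 0.3 * rng.normal(size=n)  # SNP affects mCHG only

t_univariate = snp_t_statistic(mchg, snp)         # scan of mCHG alone
t_conditional = snp_t_statistic(mchg, snp, mchh)  # scan of mCHG given mCHH
print(t_univariate, t_conditional)
```

      In this toy setting the covariate absorbs most of the shared variance, so the same SNP effect is tested against a much smaller residual. The shared factor itself stays on the right-hand side and is never mapped, which is the reviewer's point about its architecture remaining unknown.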

      The authors essentially decompose a complex trait into parts and map genetic architectures for each part. Although each part seems less complex and more oligogenic than polygenic, when putting all the parts back together, I would argue we are getting close to a complex trait with a polygenic architecture. The study by Hüther et al, which the authors also cite, is another example of how a complex trait can be decomposed into parts. In reference to one of the authors' GWAS associations, they say "...this association was also recently found by Hüther et al. (2022) using GWAS for unconditional mCHG levels of individual transposons. The MIR823A polymorphism appears to almost exclusively affect mCHG (Figs. S2, S3), primarily targeting the same transposons as a CMT3 knock-out...". In the case of Hüther et al., the complex TE-CHG methylation trait is simplified by selecting specific TEs, a priori, that are differentially methylated in CMT3 knock-out lines. One could go on like this, and continue to peel away this complex trait. But, again, this does not mean that the overall TE-CHG methylation trait is not complex or polygenic. It spirals down into a discussion of what is actually meant by "complex" or "polygenic", which is an interesting discussion but, in the case of this manuscript, takes away from the biological message. My point is perhaps best reflected in the following statement from the discussion section: "Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude, erroneously, that the trait was simply too polygenic (Kawakatsu et al., 2016)." But a few lines below the authors seem to realize what they have actually done: "We believe that, by controlling for mCHH, we have effectively simplified the trait, revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation."

      Specific comments

      • A large part of the manuscript focuses on SNPs that are enriched for a priori genes and fall below the genome-wide significance threshold. While I see the reasons for doing this in this particular manuscript, I do not see how this is useful in general (again this approach is partly "sold" on methodological grounds). The approach can obviously not be extended to study traits where a priori gene sets are unavailable or incomplete. Moreover, the "FDR" approach based on the a priori gene set labels GWAS hits that are not within the a priori set "false discoveries", which may or may not be true. Moreover, there is no "natural" stopping point for going below GWAS thresholds. An alternative to this would be to perform a targeted GWAS for a priori genes (+ a LD window around them). Since this alleviates the multiple testing burden, I would be curious to see what this yields both in terms of conditional and unconditional analysis. Candidates that show a signal could be included as covariates and a conditional scan for unknown genes could then be performed.
      • With regards to the CMT2 signals (particularly section "Further evidence for allelic heterogeneity at CMT2") it would have been useful/clearer to break down CHH into CWA and non-CWA.
      • I understand that the authors set out to do this conditional analysis because previously no hits could be found for CHG TE methylation. However, have the authors considered going the other way around and performing a CHH|CHG analysis to find additional QTL affecting CHH methylation, partly independently of CHG?
      • The authors write: "While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these "traits" are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes." Please clarify what they mean by "not behave like standard phenotypes".
      • The authors write: "Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set (Kawakatsu et al., 2016), especially for RdDM-targeted transposons (Fig. 1A; see Methods). Much of this variation ....". What is meant by "this variation"?
      • A few lines below, they write "...huge". Please rephrase.
      • The authors write: "sample data set ("Leaf SALK ambient temperature"; n=846). Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig. 1B) (Stroud et al., 2013). This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons." It is noticeable that many double mutants are off the diagonal. To me this indicates that they affect one context more than the other (i.e. they break covariance). Second, it suggests that they are probably interacting non-additively. It would be great if the authors could comment on this observation; perhaps also later in the ms, where they make a case for additivity.
      • The authors write: " it is difficult to say what fraction of these factors is genetic and what is environmental, but, regardless of this, we hypothesized that the substantial covariance could reduce power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better...". The arguments given thus far are not sufficient to understand why a "substantial covariance" between traits would reduce the power to map individual traits. I think more needs to be done here to motivate this.
      • The FDR analysis using a set of a priori genes should be explained in detail in this ms. It is cumbersome to go to another manuscript to see what was done exactly, especially since this information is also difficult to dig up in the Atwell 2010 study. Although I understand the idea behind this approach, I would be concerned that this type of "FDR" analysis assumes that all methylation genes are known. A novel candidate that was perhaps never identified in mutant screens before would be classified as a false discovery. Similarly, known candidates that carry no functional polymorphisms in nature, perhaps because they are highly constrained, will never become a discovery.
      • The authors write: "However, MSI1 is required to control DNA methylation via repression of MET1, and a loss of FAS2 in CAF-1 induces mCHG hypermethylation (Fig 1B) (Stroud et al., 2013; Jullien et al., 2008)...". Where is the "FAS2 in CAF-1" result visible in Fig. 1B?
      • The results presented in "A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons" could have been showcased better. Only after reading the methods part did I realize that the authors generated CRISPR mutants. It reads as if the authors just picked up some available loss of function mutants and profiled them. But, clearly, much more work was involved here and the authors could have brought that out more. Perhaps more generally, I think all the new functional analysis the authors perform is largely "under-sold" in this manuscript at the expense of unnecessary methodological/conceptual discussion (see point above).
      • In section "The power and complexity of conditional GWAS", the authors write "The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results." This seems like too obvious a statement.

      Significance

      The manuscript is clearly written, and the functional validation of the JMJ26 GWAS signal is commendable and certainly goes beyond the typical GWA study. Beyond this validated association however, the GWAS results are mainly confirmatory. They essentially highlight that methylation genes previously identified by way of mutant screens are variable in natural populations, and (probably) causative of non-CG methylation variation in TEs.

  6. May 2022
    1. Feynman’s approach encouraged him to follow his interests wherever they might lead. He posed questions and constantly scanned for solutions to long-standing problems in his reading, conversations, and everyday life. When he found one, he could make a connection that looked to others like a flash of unparalleled brilliance

      Creating strong and clever connections between disparate areas of knowledge can appear to others to be a flash of genius, in part because they didn't have the prior knowledge, nor did they put in the work of collecting, remembering, or juxtaposing.

      This method may be one of the primary (only) underpinnings supporting the lone genius myth. This is particularly the case when the underlying ideas were not ones fully developed by the originator. As an example, if Einstein had fully developed the ideas of space and time by himself and then put the two together as spacetime, then he would have independently built two separate layers. In reality, he cleverly juxtaposed two broadly pre-existing ideas and combined them in an intriguing new framing to come up with something new. Because he did this a few times over his life, he's viewed as an even bigger genius, but when we think about what he's done and how, is it really genius or simply an underlying method that may have shaken out anyway by means of statistical thermodynamics of people thinking, reading, communicating, and writing?

      Are there other techniques that also masquerade as genius like this, or is this one of the few/only?

      Link this to Feynman's mention that his writing is the actual thinking that appears on the pages of his notes. "It's the actual thinking."

    1. Joint Public Review:

      The present manuscript compares the connectomes of a large range of mammal species using diffusion MRI data. The manuscript reports two main findings: (1) connectomes of more related species are generally more similar, as assessed using Laplacian eigenspectra, than of unrelated species; (2) differences between species' connectomes are generally driven by local regional connectivity profiles, whereas global features are generally preserved.

      The first finding is comforting, but in a way not extremely surprising. It would be extremely surprising if more related species do not show more similarity in their connectome. Indeed, this is the reason many phylogenetic analyses use statistical techniques that take the relatedness of species explicitly into account. I find the statement that connectome organization recapitulates traditional taxonomies a bit over the top, as this suggests that a phylogenetic tree constructed based on connectomes would be similar to a tree based on other measures, such as morphology or genetics. This will probably be the case, but is not what the authors have tested here.

      The second result is in my opinion the key result of the paper. The main novelty of the paper is that it finally, for the field, bridges the approaches taken by researchers searching for differences across species (these are usually researchers interested in anatomy) and researchers searching for conserved principles across species (usually researchers approaching connectivity from a network or graph theory perspective). By showing which aspects of a connectome are generally conserved and which are changed, this paper starts unifying the two views, and this is an important contribution.

      It would, however, have been nice if the authors had explored this notion a bit further. Now, they just state that taking certain features into account means the connectomes look more different, but they do not zoom into the specific brains to see what this means at a biological level. Some of the authors have published, for instance, on the unique connectivity profiles of parts of the human brain and it would have been nice to show that these fall under the local regional connectivity profile aspects of the connectomes. This is a missed opportunity to even further unify the different research traditions.

      The manuscript suggests that white matter connectivity in mammals is more similar between species within one taxonomic group than across different groups, proposing that the brain's connectome reflects phylogenetic relationships. The manuscript further details which features of the network organisation are associated with larger differences across groups and hence may drive speciation; and which features seem to be a common principle across mammals.

      The authors present evidence based on the analysis of diffusion-weighted brain imaging data across 124 species, 111 of which were included in the comparison. The dataset is a great resource to address their research question.

      The paper is clear and the evidence compelling. The manuscript adds valuable insights into the connectome architecture across species, potentially opening a new perspective on the link between genetics and behaviour. I would like to point out the great open science practice of the authors - code is available with a great ReadMe to guide potential users, connectivity matrices are available, and all software packages used in the analyses have been cited.

      The figures are clear and complement the manuscript.

      Technical Comments:

      - Spectral approach / Interpretation. It would be good to have more insight into the meaning of the spectral distance results. My understanding is this: the eigenvalues of the normalised Laplacian obviously have a mean of 1 (because their sum equals the trace of the Laplacian, which is equal to N [number of nodes]). Therefore, the distances between the spectra essentially amount to comparing higher moments, and in particular the variance (as the histograms look quite Gaussian, I am guessing the distances are dominated by differences in the variance). But what does it mean that bats have a higher variance in these eigenvalues than primates? I know that the authors try to give *some* insight, e.g. that when the distribution is peaky around 1, it means there are more stereotypical local patterns of connectivity. I understand that. But what are these patterns?

      - Effect Size / Null Distribution. I like the idea and the ambition of this paper. My main concern is that the differences are very small. Pretty much all the measures (Laplacian eigenspectra and network-theoretic measures) are very similar between animals. This can be interpreted in two ways. (1) it may mean that the brain organisation is preserved, which is the interpretation of the authors. But it could also mean that (2) the metrics are not very informative. How do we know if we are in situation (1) or (2)? There is no comparison to a good null model (except in Fig. 4 but I don't think a random network is a good null). One possible null is two random networks connected to each other with a few random connections (to mimic left-right brains)?

      * The authors use cosine similarity to compare the eigenspectra distributions. I think this does them a disservice. Cosine similarity normalises the distributions quadratically instead of linearly. But the main thing that is changing is the variance. So normalising quadratically diminishes the dissimilarities between distributions. I have looked at their data (thanks for sharing!) and using multidimensional scaling with Euclidean distance looks much better than with cosine distance. I would suggest using Euclidean.

      * The authors use a bootstrapping method to calculate an average distance which they claim is useful because they don't have the same number of animals in each category. I don't think this bootstrapping is useful at all. If anything, it just adds noise. Averaging 10,000 samples with replacement does not change the outcome compared to simply averaging the matrices without the sampling. To test this: vary n and it should converge to the average of the original non-sampled data. (I've tried it!)

      * The authors should clarify whether they are using the weighted or binarised connectivity matrices in the spectral approach (and also what threshold). I suspect that they are using binarised matrices, which probably explains why the spectral results fit better with the graph topology results when the latter uses binarised matrices.

      - Parcellation. One main issue is the way in which the connectomes are divided up into 200 regions each, independent of the brain size. This to me seems a confound. I know it's rather standard practise in the field, but I have yet to see a validation that this does not influence the results. Given the enormity of the dataset here I would ask the authors to run their analyses in a way that the number of regions is a function of the size of the brain; this is a much more realistic assumption, as we know that a shrew size brain has about 20 cortical areas, whereas the human has about 180 according to Glasser et al.
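      Two of the spectral comments above can be checked numerically: that the normalised Laplacian eigenvalues average to 1, and that averaging many bootstrap resamples converges to the plain average. The sketch below is not the authors' pipeline. It assumes an unweighted, symmetric toy graph with no isolated nodes and no self-loops (the trace argument needs both), and the bootstrap is a simplified stand-in that resamples rows of an invented feature matrix.

```python
import numpy as np

def normalized_laplacian_eigenvalues(adjacency):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency
    matrix; assumes every node has degree > 0."""
    degree = adjacency.sum(axis=1)
    inv_sqrt_degree = 1.0 / np.sqrt(degree)
    normalized = inv_sqrt_degree[:, None] * adjacency * inv_sqrt_degree[None, :]
    return np.linalg.eigvalsh(np.eye(len(adjacency)) - normalized)

rng = np.random.default_rng(0)
n_nodes = 50
upper = np.triu(rng.random((n_nodes, n_nodes)) < 0.2, k=1).astype(float)
ring = np.roll(np.eye(n_nodes), 1, axis=1)  # ring edges guarantee degree > 0
adjacency = np.maximum(upper + upper.T, ring + ring.T)

eigenvalues = normalized_laplacian_eigenvalues(adjacency)
print(eigenvalues.mean())  # trace(L) / N, i.e. 1 up to rounding
print(eigenvalues.var())   # the moment that actually differs between graphs

def bootstrap_mean(samples, n_draws, rng):
    """Average of n_draws resamples (with replacement) of the rows."""
    index = rng.integers(0, len(samples), size=(n_draws, len(samples)))
    return samples[index].mean(axis=(0, 1))

samples = rng.normal(size=(12, 4))  # 12 hypothetical animals, 4 features each
plain_mean = samples.mean(axis=0)
boot_mean = bootstrap_mean(samples, 10_000, rng)
print(np.abs(boot_mean - plain_mean).max())  # shrinks as n_draws grows
```

      Because the mean of the spectrum is pinned at 1 for any such graph, only the higher moments can separate species, which is why the choice of distance (cosine versus Euclidean) matters in the comment above.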

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current frameworks, but more work is needed to make these claims convincing.

      1) Obtain stable estimate from multiple types of data:

      It turns out the stable estimates simply apply the same approach repeatedly to different time series (Figure 3A middle). From the wording, I expected a method that combines these time series into a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal if, for example, there were big differences between the results from the death time series and the reported-case time series: which one should I trust?

      We think it is a strength to compute different Rt values based on different data, as this allows researchers, policy makers and the public alike to compare the information from different observation types directly. Any discrepancy between two Re trajectories (e.g. between the Re based on cases, Rcc(t), and that based on hospitalisations Rh(t)) is an indication to investigate which external variables (e.g. testing strategy) have changed. We have found it a great advantage when communicating and sharing our results outside of academia that we could point to these separately obtained Re estimates: if the estimates all agreed, more confidence could be given to them.

      If one wanted to obtain a single estimate, this would require adopting a fundamentally different framework to estimate Re, which exceeds the scope of this work. One could use heuristics (weights representing the trustworthiness of a given source at a given time) to combine the various Re estimates into a single ensemble estimate. Alternatively, one could model the full underlying population dynamics (e.g. with a compartmental model including hospitalization and death) and adopt a fully Bayesian approach to fitting such a model. However, both options require heuristics or priors that will vary substantially through time and per country (as discussed in the Supplementary Discussion), and thus limit how widely the pipeline can be applied.

      We have revised the manuscript to make it more clear (early on) that we estimate multiple Re values from separate types of data (see also the response to reviewer 3, item #5). In addition, we now discuss more explicitly what the advantages and disadvantages are of showing these estimates separately (lines 281-290).

      2) Adequate representation of uncertainty:

      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumes the input infection time series is observed and fixed. It would be expected that the proposed approach would provide wider CIs and hence higher coverage. However, I think simulation studies are required to validate that the wider CI is the correct one. I think the most relevant one would be Figure 1B. The results suggest that the approach works when the Rt is not rapidly changing. However, I have concerns about the simulation methods (details below).

      Indeed, the difference in coverage between our method and EpiEstim is due to observation noise. We agree the CI from EpiEstim should be correct assuming that the infection incidence time series can be observed perfectly. However, in reality quite a bit of variability is introduced between infection and case observation: not only due to the delay from infection to observation, but also due to e.g. reduced testing capacity on weekends or reporting errors. To accurately assess the coverage of our method (and whether the CIs are too narrow or too wide) we need to include realistic amounts of observation noise in the simulations. This is why we add autocorrelated noise to our simulated observations, where this noise mimics observed residuals in Switzerland and other countries (Figs. S3, S4, S15, S17).

      We have now added explicit comparison to the EpiEstim confidence intervals to supplementary Fig. S4. In addition, we extended the corresponding method section to describe more extensively why and how we added observation noise to our simulations (lines 498-518; see also the detailed response to comment 4 below).

      3) Real-time of the Rt

      There is no simulation about the real-time property of the Rt. The most related one is still Figure 1B. However, looks at the right-tail of the figure (the real-time performance), the proportion covered the true value is decreasing and more efforts are needed to support the framework can be accurate in real-time. For example, how is the real-time performance when Rt is increasing, or Rt decrease sharply due to lock down?

      As suggested, we included an additional simulation study to investigate the accuracy and stability of the last possible Re estimate. We present this analysis in a new results paragraph (subsection "Stability of Re estimates in an outbreak monitoring context"; line 121) and Figure S10. Using this analysis, we highlight the trade-off that exists between the timeliness of the Re estimates and their stability.

      4) simulation methods to estimate Rt

      Both 2) and 3) need simulation to support the results, and hence the simulation approach would be critical. The first part based on Poisson distribution to generate an infection time series, which is OK. However, the issue is the secondary part about how the authors obtained the time series for death/hospitalization/reported cases. To me, after generating the infection time series, based on the delay distribution from infection to death/hospitalization/reported, we could obtain those time series. I am not clear and sure if the authors approach is correct by using smoothing and fitting ARIMA to get those time series.

      We believe there may have been some confusion about how our simulation set-up works, and that we provided insufficient detail on the design decisions behind this set-up. We have added more explanation for both points to the paper (lines 503-518; additional supplementary Figs. S15-S17). In brief, our simulation process consists of three parts. We first conduct the two steps the reviewer also mentioned: (i) simulating the infection time series, and (ii) simulating the observed time series by using the delay distribution from infection to death/hospitalisation/case report.

      However, we find that the observations simulated this way are too smooth compared to real data (see Figure S17). Possible reasons for this are that the delay distribution does not account for weekend and holiday effects, random and occasional delays in recording confirmed cases, or irregular components such as confirmed cases that are imported from abroad. We therefore added a noise term in our simulations, resulting in a third step: (iii) adding noise generated from an ARIMA model.
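
      The three simulation steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the serial-interval and delay distributions, and all parameter values are assumptions made for the example, and the seasonal ARIMA noise model is replaced here by a plain ARMA(2,1) process applied on the log scale.

```python
import numpy as np


def simulate_infections(re_series, serial_interval, seed_infections, rng):
    """Step (i): Poisson renewal process for daily infection incidence.

    serial_interval[k - 1] is the assumed probability of a k-day interval.
    """
    n_days = len(re_series)
    infections = np.zeros(n_days)
    infections[0] = seed_infections
    for t in range(1, n_days):
        lags = min(t, len(serial_interval))
        past = infections[t - lags:t][::-1]  # most recent day first
        pressure = np.sum(past * serial_interval[:lags])
        infections[t] = rng.poisson(re_series[t] * pressure)
    return infections


def delay_to_observation(infections, delay_pmf):
    """Step (ii): expected observations, infections convolved with the delay.

    delay_pmf[d] is the assumed probability of a d-day delay.
    """
    return np.convolve(infections, delay_pmf)[:len(infections)]


def arma_noise(n, ar, ma, sigma, rng, burn_in=100):
    """Step (iii): autocorrelated noise from an ARMA process (simplification)."""
    total = n + burn_in
    eps = rng.normal(0.0, sigma, total)
    x = np.zeros(total)
    for t in range(total):
        ar_part = sum(a * x[t - 1 - i] for i, a in enumerate(ar) if t - 1 - i >= 0)
        ma_part = sum(m * eps[t - 1 - j] for j, m in enumerate(ma) if t - 1 - j >= 0)
        x[t] = ar_part + ma_part + eps[t]
    return x[burn_in:]  # drop the burn-in so the start-up transient is removed


# Hypothetical scenario: Re drops from 1.5 to 0.8 after 30 days.
rng = np.random.default_rng(1)
re_series = np.concatenate([np.full(30, 1.5), np.full(30, 0.8)])
serial_interval = np.array([0.2, 0.4, 0.3, 0.1])
delay_pmf = np.array([0.0, 0.1, 0.3, 0.4, 0.2])

infections = simulate_infections(re_series, serial_interval, 20, rng)
expected = delay_to_observation(infections, delay_pmf)
# Noise is multiplicative on the natural scale, i.e. additive on the log scale.
observed = expected * np.exp(arma_noise(len(expected), [0.5, 0.2], [0.3], 0.1, rng))
```

      Comparing `expected` with `observed` shows the effect described above: the convolution alone gives a smooth curve, while the added autocorrelated noise produces the day-to-day irregularity seen in real case data.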

      To obtain a realistic ARIMA model for this third step, we fitted a model based on the confirmed case data for SARS-CoV-2 in Switzerland. Specifically, we first obtained the additive residuals based on the log-transformed confirmed cases. We then fitted ARIMA models of various orders and assessed the resulting ACF and PACF plots of their residuals. Based on this, we chose an ARIMA(2,0,1)(0,1,1) model. We refer to Figure S16 to support this: The first row shows the ACF and PACF plots of the original residuals, showing strong autocorrelation. The second row shows the ACF and PACF plots of the residuals after fitting the ARIMA model. We see that there is little autocorrelation left, indicating that this model is reasonable.

      In Figure S17, we present simulated observations based on all three steps, and one can see that they look more realistic than the simulated observations after step (ii).

      We would also like to point out that the ARIMA model is only used to obtain simulated observations. Our main method to estimate Re and obtain the related confidence intervals does not require fitting an ARIMA model.

      Minor comments:

      1) What does near real-time mean? The estimates of Rt are delayed for a few days like other approaches?

      Indeed, the estimates of Rt are delayed by the time that passes between infection and the observation of a case. We have replaced the term “near real-time” by “timely” throughout the manuscript, and added this explanation of the delay more explicitly to the text (line 86).

      2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.

      We have improved and extended the comparison of our method against others in two ways: (i) we added further comparison of the coverage of our method vs. that of EpiEstim to Fig. S4 (see also the response to major comment 2), and (ii) we added comparison against different commonly used pipelines (see minor comment 3 below). Instead of comparing to other approaches, the analysis in Table 1 was meant to illustrate the use of the Re estimates resulting from our method alone.

      3) I think more discussions are needed for the similarity and differences for current approach. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

      We added a section to the results (paragraph starting line 182; Fig. 3), dedicated to comparing our approach with relevant alternatives. We compared some of our empirical results with the estimates published on epiforecasts.io (based on EpiNow2 package from Abbott et al.), as well as official COVID-19 Re estimates for Austria (by AGES) and Germany (by RKI). We find that estimates published by the RKI and AGES health authorities are likely to be overconfident and to suffer from previously-identified biases (notably in Gostic et al., 2020, PLOS Computational Biology). We provide a detailed comparison of the features and approaches of these methods (EpiNow2, AGES, RKI), with the addition of the epidemia R-package (Supp File S2). This comparison highlights the unique features of the method developed: its ability to account for time-varying delay distributions and to combine symptom onset data with case data.

      4) Figure S11 is about accounting for known imports. While if the local cases are dominant and hence imported cases would not have a big impact on estimates of Rt. The impact of imported cases on estimates of Rt could be complicated, as suggested in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assuming imported cases and 'exported' cases could be canceled, it is also assumed that the imported cases had similar transmissibility to the local cases, which may not be true if there is border control.

      We thank the reviewer for this interesting comment and reference. We added a brief discussion in the result section of the manuscript to address this limitation (lines 174-177).

      Reviewer #2 (Public Review):

      This manuscript describes an algorithm of estimating real time effective reproductive number R_e (t). This algorithm combines several methods in a reasonable way: deconvolution of time series of reported case into time series of infection, a Poisson model for generation of infections, and block-bootstrap of residuals to assess uncertainty. Each component is not necessarily novel, but the performance of this algorithm has been validated using comprehensive simulation studies. The algorithm was applied to COVID-19 surveillance data in selected countries across continents, revealing a great deal of heterogeneity in the association of R_e (t) with nonpharmaceutical interventions. Overall, the conclusions seem reliable.

      I have several moderate critiques and suggestions:

      1) From a statistical point of view, it seems much more natural to integrate the infection generation process and the delay from infection to reporting, possibly with reporting errors, into the same model, with which you will avoid combining the bootstrap and the credible intervals in a somewhat awkward way. I understand you can take advantage of EpiEstim package, but the likelihood is very simple and easy to program up. Nevertheless, I'm not strongly against the current paradigm.

      We agree that such an integrated approach is useful, and makes the uncertainty interval estimation more coherent. However, in such an integrated approach one cannot use the analytical solution for the likelihood, and methods that choose this approach (like EpiNow2 and epidemia) tend to pay for it in computational complexity. It also makes it harder to include time-varying delay distributions in the model, one aspect that sets our pipeline apart from existing alternatives.

      An additional advantage of our method is that estimates for the infection incidence are not influenced by priors on Re. In case of a bad model fit, this makes it easier to identify which part of the model may be misbehaving, and as such serves as a sanity check.

      Lastly, our framework has the advantage of modularity: pieces of the pipeline can be (and were) continuously refined or replaced with better pieces. This continuous improvement process allowed a flexible response to the pressing circumstances (the COVID-19 pandemic), and allowed us to extend it to entirely new types of proxy data (e.g., wastewater viral loads - https://ehp.niehs.nih.gov/doi/10.1289/EHP10050 ).

      2) Is there a strong reason to believe the residuals are autocorrelated? The block sampling with block size 10 seems arbitrary. The authors fitted an ARIMA model to the residuals for some countries, how good was the fitting? If the block size doesn't matter, then probably the stronger but simpler assumption of independent residuals may not compromise the estimation of R_e (t) much.

      Yes, there is reason to believe the residuals are autocorrelated. New supplementary Figure S15 shows the ACF and PACF of the residuals based on the confirmed cases of Switzerland, China, New Zealand, France and the US, and one can see that for most countries, the obtained residuals are clearly autocorrelated. We added this point to the simulations method section in the paper (lines 503-518). Please also see our response to Reviewer 2, major point 4 above.

      Choosing an optimal block size for the block bootstrap method is generally difficult. To capture weekly patterns, we need a block size of at least 7. We tried different sizes and found that 10 tended to work well in a variety of simulation settings (an example is given in Fig. S19).
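
      A block bootstrap of autocorrelated residuals can be sketched as below. This is an illustrative moving-block variant with overlapping blocks; whether the pipeline uses overlapping or non-overlapping blocks is not stated here, and the function name and example series are assumptions.

```python
import numpy as np


def block_bootstrap(residuals, block_size, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks.

    Sampling whole blocks preserves the short-range autocorrelation
    (e.g. weekly patterns) that resampling single points would destroy.
    """
    residuals = np.asarray(residuals)
    n = len(residuals)
    if block_size > n:
        raise ValueError("block_size must not exceed the series length")
    n_blocks = int(np.ceil(n / block_size))
    # Each block may start anywhere that leaves room for a full block.
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [residuals[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]  # trim to the original length


rng = np.random.default_rng(0)
example_residuals = np.arange(50)  # stand-in for residuals on the log scale
resampled = block_bootstrap(example_residuals, block_size=10, rng=rng)
```

      A bootstrap replicate of the observations would then be obtained by adding the resampled residuals back to the smoothed log-transformed series and exponentiating. With a block size of 10, every block spans at least one full week, which is the constraint mentioned above.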

      3) I don't see the necessity of using segmented R_e (t) instead of a smooth curve in the simulation studies. The inferential performance, especially the coverage of the CI's, is much less satisfactory when a segment has a steep slope. The authors may consider constructing splines based on the segments or using basis functions directly.

      We started using a segmented Re(t) trajectory to allow for simple parametric generation of different scenarios (e.g. in new Fig. S10), and to specifically study our ability to estimate sudden transitions in Re (discussed wrt. Table 1, Fig. S2). We agree this approach makes our method look worse than necessary, since it is generally difficult to estimate such abrupt changes in Re. However, we thought this would be the more stringent test of our method, as we will perform better on any smoother trajectory.

      4) The authors smoothed the log-transformed observed incidences to come up with the residuals. For Poisson data, a variance-stabilized transformation is taking the square root, not the logarithm. In addition, as you already have bootstrap estimates, why not using quantiles directly for CIs but instead using a normal approximation (asymptotic)? When incidence is low, the normal approximation may be much less satisfactory. Also, when using normal approximation for CI, it's much safer to calculate standard deviation and construct CI at the log-scale, i.e., log(θ ̂^*(t)), and then exponentiate back.

      Our goal in transforming the original case observations is to stabilize the variance of the residuals. Indeed, the square root transformation is generally recommended if the data to be transformed is Poisson distributed. In our case, however, the original case observations are not quite Poisson. Specifically, the infection incidence at time t given the past incidence is modelled with a Poisson process (see Section 4.4), but the case observations are modelled with an additional convolution step of the infection incidence with a delay distribution, and there is additional variation due to e.g. weekday effects. It is thus not clear a priori which transformation works best for our data, and we therefore investigated various possible transformations (including the square root transformation). We found that no transformation was uniformly the best for data of different countries, but that the log-transformation tended to perform best overall. This is why we chose the log-transformation. Please see the new supplementary Figure S14, where we show the residuals after the square root transformation and the log transformations for various countries.
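
      The reasoning about transformations can be illustrated numerically. The sketch below uses synthetic data only and does not reproduce the authors' analysis: it compares the spread of square-root and log-transformed samples for pure Poisson counts and for counts with multiplicative noise. The means and the noise level of 0.2 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000
means = [50, 500, 5000]

spread = {}
for mean in means:
    poisson = rng.poisson(mean, n_samples)
    # Multiplicative (log-normal) noise, as a stand-in for weekday effects etc.
    multiplicative = mean * np.exp(rng.normal(0.0, 0.2, n_samples))
    spread[mean] = {
        "sd_sqrt_poisson": float(np.std(np.sqrt(poisson))),
        "sd_log_poisson": float(np.std(np.log(poisson))),
        "sd_sqrt_multiplicative": float(np.std(np.sqrt(multiplicative))),
        "sd_log_multiplicative": float(np.std(np.log(multiplicative))),
    }

for mean, values in spread.items():
    print(mean, {name: round(sd, 3) for name, sd in values.items()})
```

      For Poisson counts the standard deviation after the square root stays near 0.5 whatever the mean, whereas after the log it shrinks as the mean grows. For multiplicative noise the situation is reversed: the log gives a constant spread and the square root does not. Real case counts mix both sources of variation, which is consistent with the statement that the best transformation has to be determined empirically.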

      Regarding the bootstrap confidence intervals, we also investigated different options. Again it is not clear a priori which bootstrap confidence interval performs best for our data, so we compared common choices like quantile, reversed quantile and normal-based in a simulation study. Specifically, we assessed their coverage and found that the normal-based confidence intervals performed best overall (see Fig. S4).

      For low incidence settings, none of the bootstrap methods perform very well (as bootstrap consistency does not apply). We now mention this consideration in the paper (line 442).

      Finally, regarding the suggestion to compute exp(SD(log(X))): This quantity is generally different from SD(X), which we need for the confidence intervals. We also refer to the coverage in the various supplementary figures (e.g. S2, S4, S5) to support that our approach works well.
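
      The interval types under discussion can be written down compactly. The sketch below is an assumption-laden illustration, not the pipeline's code: "reversed quantile" is taken to mean the basic bootstrap interval, the log-scale variant follows the reviewer's suggestion, and the bootstrap replicates are synthetic.

```python
from statistics import NormalDist

import numpy as np


def bootstrap_intervals(estimate, boot, alpha=0.05):
    """Return several common bootstrap confidence intervals for one estimate."""
    boot = np.asarray(boot, dtype=float)
    q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sd = np.std(boot, ddof=1)
    sd_log = np.std(np.log(boot), ddof=1)
    return {
        "quantile": (q_lo, q_hi),
        # Basic bootstrap: reflect the quantiles around the point estimate.
        "reversed_quantile": (2 * estimate - q_hi, 2 * estimate - q_lo),
        # Normal approximation on the natural scale, which needs SD(X).
        "normal": (estimate - z * sd, estimate + z * sd),
        # Normal approximation on the log scale, exponentiated back.
        "normal_log_scale": (
            estimate * np.exp(-z * sd_log),
            estimate * np.exp(z * sd_log),
        ),
    }


rng = np.random.default_rng(0)
# Synthetic, right-skewed bootstrap replicates around an estimate of 1.2.
boot = 1.2 * np.exp(rng.normal(0.0, 0.1, 2000))
intervals = bootstrap_intervals(1.2, boot)

sd_natural = float(np.std(boot, ddof=1))
exp_sd_log = float(np.exp(np.std(np.log(boot), ddof=1)))
```

      In this example `sd_natural` is roughly 0.12 while `exp_sd_log` is roughly 1.1, which illustrates that exp(SD(log(X))) is a multiplicative factor and not a substitute for SD(X). The log-scale interval uses that factor multiplicatively around the estimate and can never extend below zero.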

      5) The stringency index is a convenient metric for intervention intensity. However, it doesn't reflect actual compliance as the authors admitted. Another likely more pertinent metric is human movement (could be multiple movement indices). Human movement indices may not be available in all countries, but they are available in some, e.g., the US, and first wave in China. In some states of the US, it was clear that human movement decreased substantially even before initiation of lockdown. Lack of human movement metrics most likely has contributed to the difficulty in the interpretation of Figure 4.

      We have added mobility data (from Apple and Google location data) to our general dashboard, and to the analysis shown in Fig. 5. The mobility traces give more detailed insight into the behavior that may have led to decreases in Re. However, we find patterns of decreases in Re similar to those found with the stringency index. A more extensive analysis that focuses on different phases of the pandemic may allow for more detailed insights, but we believe this is beyond the scope of our manuscript.

    2. Reviewer #1 (Public Review):

      This paper presents an approach to estimate Rt for 170 countries. While it is an impressive amount of work, I think the pipeline is similar to many currently available frameworks. The paper claims the following novelties over current framework, but more efforts are needed to be done to make it convincing.

      1) Obtain stable estimate from multiple types of data:

      It turns out the stable estimates are just repeatedly use the same approaches to different time series (Figure 3A middle). From the wording I think there should be some methods to combine these time series to have a single estimate of Rt. Overall, the Rt and the time series of infection should be unique. It would be suboptimal, for example if there are big difference in the results from death time series and reported cases time series, which one should I trust?

      2) Adequate representation of uncertainty:

      This is the result in Figure 2, suggesting the CI from EpiEstim is too narrow. This would be expected given that EpiEstim assumed the input infection time series is observed and fixed. It would be expected that the proposed approach would provide wider CI and hence the proportion covered would be more. However, I think to validate the wider CI is the correct one, simulation studies are required. I think the most related one would be Figure 1B. The results suggested that the approach works when the Rt is not rapid changing. However, I have concern on the methods for simulation (details below).

      3) Real-time of the Rt

      There is no simulation about the real-time property of the Rt. The most related one is still Figure 1B. However, looks at the right-tail of the figure (the real-time performance), the proportion covered the true value is decreasing and more efforts are needed to support the framework can be accurate in real-time. For example, how is the real-time performance when Rt is increasing, or Rt decrease sharply due to lock down?

      4) simulation methods to estimate Rt

      Both 2) and 3) needs simulation to support the results, and hence the simulation approach would be critical. The first part based on Poisson distribution to generate an infection time series, which is OK. However, the issue is the secondary part about how the authors obtained the time series for death/hospitalization/reported cases. To me, after generating the infection time series, based on the delay distribution from infection to death/hospitalization/reported, we could obtain those time series. I am not clear and sure if the authors approach is correct by using smoothing and fitting ARIMA to get those time series.

      Minor comments:

      1) What is near real-time mean? The estimates of Rt are delay for a few days like other approach?

      2) For the results in Table 1, I think if there are some results suggesting that other approaches (like EpiEstim) perform worse than the proposed approach, it would be better to illustrate the value of the proposed approach.

      3) I think more discussions are needed for the similarity and differences for current approach. For example, Abbott et al (https://wellcomeopenresearch.org/articles/5-112) used a similar pipeline.

      4) Figure S11 is about accounting for known imports. While if the local cases are dominate and hence imported cases would not have a big impact on estimates of Rt. The impact of imported cases on estimates of Rt could be complicated, as suggested that in Tsang et al. (https://pubmed.ncbi.nlm.nih.gov/34086944/). In addition to assume imported cases and 'exported' cases could be canceled, it is also assumed that the imported cases had similar transmissibility to the local cases, which may not be true if there is border control.

    1. Author Response

      Reviewer #1 (Public Review):

      Xiong and colleagues use an elegant combination of theory development, simulations, and empirical population genomics to interrogate a largely unexplored phenomenon in speciation/ hybridization genomics: the consequences and implications of admixture between species with differing substitution rates. The work presented in this well-written manuscript is thorough, thought provoking, and represents an important advancement for the field. However, there are a few instances where I feel the strength of the conclusions drawn is not fully supported.

      Thank you for the positive comments!

      The authors begin by presenting evidence based on whole genome sequencing that the two focal species, P. syfanius and P. maackii, are highly diverged despite ongoing hybridization. Though the discussion of remarkable mitochondrial sequence similarity is underdeveloped. I do not understand how such a pattern is not most likely the result of introgression from one species to the other given the relatively high FST across much of the nuclear genome coupled with the generally higher mitochondrial mutation rate in animals.

      That’s a very good point. We have included this likely explanation of mitochondrial genome similarity in Line 84-86.

      Next, they posit that barrier loci are likely to exist. To support this assertion, the authors use a combination of parental population genetic diversity and divergence comparisons and ancestry pattern analysis in hybrid populations. They show that there is a strong correlation between divergence across pure species and within species diversity across the autosomes. Then using four hybrid individuals they show that low ancestry randomness, as quantified estimates of between group and within group entropy, is associated with genomic region of reduced within group diversity and elevated between group divergence. The use of entropy estimates as a stand-in for admixture proportions and ancestry block analysis when sample size is severely limited is particularly clever. Though I must admit, I do not fully understand the derivations of the two entropy measures, it seems to me that relatedness might have a strong effect on the interpretability of between individual entropy estimates (Sb). With very small population sizes this may be a real issue.

      Yes, genetic relatedness will play a big role in between-individual entropy (Sb). A group of highly correlated individuals will produce highly predictable ancestry (knowing one individual’s local ancestry gives much information on the local ancestries of others), and Sb will be small because entropy is a measure of uncertainty. If inbreeding is very severe, Sb will no longer be a useful measure because it will be too small across the entire genome. In our hybrid samples, although some genomic regions imply the possibility of inbreeding (see local ancestry of Z chromosomes in Figure 3–Figure supplement 1), there is still considerable variation of Sb across the genome which allows us to test for its correlation with DXY and π.

      A brief discussion of potential caveats in using the new method developed here seems warranted given its potential usefulness to the population genomics field more broadly. One plausible but less likely alternative interpretation of these patterns is briefly discussed.

      We have now devoted the first subsection of Discussion to the caveats and various motivation for entropy metrics. The appendix also contains further explanation of our intuition (section “Appendix-The entropy of ancestry”).

      The authors then move on to evidence for divergent substitution rates. Analysis of both D3 and D4 statistics using several different outgroups and a series of progressively stringent FST thresholds shows that site patterns between the two species are highly asymmetrical with P. maackii lineage harboring more substitutions than P. syfanius. The authors offer two possible explanations for this finding and then test both hypotheses. First, they use a comparative tree-based method to show that there is little phylogenetic evidence for lineage biased hybridization from outgroups into either of the focal lineages. Further, the range overlaps of the study species do not correspond with the inferred direction of allele sharing from the Dstat analysis. This is a good argument against contemporary gene flow between the outgroups and P. syfanius, but I am not convinced that ancient gene flow that could have occurred when, say, species distributions may have been different, can be ruled out using this analysis.

      Yes, we also felt that our original wording was overly strong. Now we say that our argument is based on current geographic distributions, but that archaic gene flow cannot be totally ruled out. However, we also point out that archaic gene flow with outgroups should still leave some detectable fractions of paraphyletic local gene trees after phylogenetic reconstruction. (Line 192-194).

      To test whether this asymmetry can be explained by a difference in substitution rate between the two species the authors show that observed D3 increases and D4 decreases with increasingly divergent outgroups as predicted by theory developed here. The authors take this as evidence supporting the divergent substitution rates. Though they claim only that existence such rate divergence is likely. The unfortunately limited samples sizes seem to preclude attaining more certainty than this. Interestingly, as a byproduct of using D4 as an extended measure of site pattern asymmetry the authors highlight one way in which the ABBA-BABA test can give false positives for introgression. This is an important contribution to the field.

      We agree with the reviewer that, for our data type – a handful of unphased genomes, it will be difficult to obtain more direct evidence for substitution rate differences. In line 182-187, we show using maximum-likelihood gene tree reconstruction that P. maackii samples often inherit more derived mutations than P. syfanius. This could be viewed as a separate test utilizing more accurate substitution models in phylogenetic software, while our theoretical calculation provides a coarse but testable signature of D3 and D4.

      To provide more direct evidence, we believe one ought to measure spontaneous mutation rates in both species in their native habitats, and obtain better knowledge of generation times and population sizes. The difficulties of sampling and rearing these rare species are major barriers to incorporating this kind of evidence into this study.

      Finally, the authors observe a monotonic relationship substitution rate ratio and relative genetic divergence across the genome which is in line with their theoretical predictions for differential substitution rates in the face of gene flow. From this they infer an 80% increase in substitution rate from P. syfanius to P. maackii. It is remarkable to be able to extract these substitution rates from genomic regions with the least gene flow. However the veracity of these estimates relies on the assumptions I have highlighted above and should be presented with appropriate caution.

      We have included the limitations of our conclusions in the final subsection of the Discussion. Because high FST regions are relatively rare, estimates of observed rate ratio “r” have larger errors in those regions. This problem is partially resolved by using the entire monotonic relationship between r and FST to estimate the true rate ratio, so we rely not only on regions with the least gene flow but the full dataset.

      However, we do agree with the reviewer that ours is still a coarse theoretical framework since we do not impose a realistic substitution model (e.g., we don’t allow reverse mutations). We have now emphasized this weakness in the Discussion (Line 348-350).

      Reviewer #2 (Public Review):

      In their manuscript ("Admixture of evolutionary rates across a hybrid zone"), Xiong et al. use whole genome resequencing data to assess rates of genome evolution between two species of butterflies and determine whether putative barrier loci between the species are also those that evolve at asymmetric rates between them. This work presents a novel hypothesis and rigorously tests these ideas using a combination of empirical and theoretical work. I think the authors could more formally link loci that are evolving at highly asymmetric rates with those that are most likely to be barrier loci by evaluating the relationship between ancestry entropy and ratios of substitution rates between species. Additionally, clarifying the relationship between barrier loci and asymmetric evolution would be beneficial (i.e. are loci that we typically envision to be barrier loci, such as loci involved in reproductive isolation, evolving at asymmetric rates or do asymmetrically evolving loci represent a new type of barrier loci?).

      Many thanks for these comments! For the second point (clarifying the relationship between barrier loci and asymmetric evolution), we specifically mean that barrier loci (which specifically are of interest to those who study speciation) cause asymmetric rates of evolution to be preserved between hybridizing species. Asymmetric rates themselves are caused by other factors (spontaneous mutation rate differences, generation times, environmental effects) specific to each species, and barrier loci merely prevent the mixing of asymmetric rates. For the first point (evaluating the relationship between entropy and ratios of substitution rates).

    1. Gyuri Lajos 2 minutes ago https://youtu.be/5IfgBX1EW00?t=887 Listen to Frank Herbert for 3 minutes. What he says there is in perfect harmony with what you say. Thank you for saying it. Top quotes from the Frank Herbert interview:

      "remember that there's nothing at all wrong with saying that the Protestant ethic is full of it that it's all right to enjoy your work you don't have to fight your way out of bed every morning you can get up every morning eager to go do whatever it is you do have a love affair with your with your world and remember that you're not going to be able to predict every consequence of what you do"

      Fiduciary roots of science: "question things I have the most fun that I'm writing questioning things that people do not question the assumptions that everybody knows are true I'm going to declare a heresy for you all science if you go back into its ruts saying why do I believe this well I believe this because of these tests and this this proof well why do I believe this why did I set up this test why did I believe that proof all science goes back to something that we believe because we believe it we believe it because we believe it and we have no proof for it it's like a religion so"

      And the message: being comfortable with the unknown, as a finite human being. "when you dig into the roots of science a gray area at the bottom but it's like a balloon and the surfaces word the computer science has given us I love this language the surface of the balloon is their face with what we do not know inside the balloon as we blow into it is what we have proved okay but as we increase what we think we know we increase our exposure to what we do not know this is one of the inevitable laws of our universe"

      No dead end, on and on and on: "but isn't it more interesting to live in a universe where there are unknowns to discover new lands to explore than to live in an absolute box where when you find the edge that's it baby no place to go from there I I like the fact that we cannot predict everything I like the fact that we live in a universe where anything may happen because the alternative to me is a constricting dead end"

      No End is the Ending, never Ending! Thank you Quinn. You've got it. Creating a space where I can share the same learnings. Anybody who got as far as Chapter House, maybe on the second time of reading it all, will be sure to get THIS. I believe that

      Gyuri Lajos 42 minutes ago Thank you for articulating what I felt back then when I read it when it came out. I have learned recently that the message is "being comfortable with the unknown", nay, delight in it with pious awe towards the dignity of being reflected in human being

      never ending is the ending

      being comfortable with the unknown

      Frank Herbert Dune

    1. Author Response

      Reviewer #1 (Public Review):

      Redman and colleagues employed microprisms and two-photon optical imaging to track separately the structure of dorsal CA1 pyramidal neurons or the activity patterns of dorsal Dentate Gyrus, CA3, CA2 and CA1 pyramidal neurons, longitudinally in live mice. First, they carried out a characterization of the optical properties of their system. Second, they performed an example tracking of dendritic spines in the apical aspect of dorsal CA1 pyramidal neurons. Finally, they characterized differences in spatial coding along the tri-synaptic pathway, in the same animals. The main focus of the manuscript is technological and the authors show interesting data to support their technique, which I believe will be of relevance to neuroscientists interested in the hippocampal formation.

      Strengths.

      While using microprisms to achieve a "side" view of neurons in specific brain areas is not new per se [see Chia et al., J. Neurophysiol. (2009), Andermann et al., Neuron (2013), Low et al., PNAS (2014) etc.], the authors were able to visualize activity of a large neuronal circuit such as the hippocampal trisynaptic pathway - for the first time - in the same animal exploring an environment. This is not only a technical feat but it also opens new scientific avenues to study how information is transformed at different stages within the hippocampus; as such, I think this will be of broad interest for people in the field. In addition, the authors demonstrated imaging of dendritic spines in the apical aspect of pyramidal neurons, but limited to dorsal CA1 due to the labelling density of the transgenic mouse line they decided to use. Despite the fact that imaging apical dendritic spines in dorsal CA1 has been shown earlier [see Schmid et al., Neuron (2016) and Ulivi et al., JoVE (2019)], the use of the micro periscope greatly increases the flexibility of this sort of experiment by enabling tracking of large portions (both apically and basally) of the dendritic arbors of dorsal CA1 pyramidal neurons.

      Thank you for the positive comments. We have clarified that apical CA1 dendrites have been imaged in previous work as you point out, just not along the somatodendritic axis (lines 127-130). We have also clarified that we were able to image CA2 and CA3 spines as well (only DG exhibited the increased labeling density in Thy1-GFP-M mice; lines 130-132).

      Weaknesses.

      While the data are sufficient to demonstrate the technique, the conceptual advance of the paper is very narrow. The findings on spatial coding differences in different hippocampal subregions - namely a nonuniform distribution of spatial information in the different hippocampal subregions - do not add new knowledge but largely confirm the literature. The results on the dynamics of apical dendritic spines of pyramidal neurons in dorsal CA1 seem to confirm previous work, but the interpretation of these results differs fundamentally. In fact both papers cited by the authors (Attardo et al., and Pfeiffer et al.,) come to the conclusion that dendritic spines on basal dendrites of CA1 pyramidal neurons are highly unstable, at least by comparison to other neocortical areas. The authors seem to ignore this discrepancy. However, this discrepancy has importance also to the characterization of the technique the authors developed. In fact, the optical resolution of the system strongly affects the ability to resolve neighboring spines - especially at the high density of dorsal CA1 - and thus it has a direct effect on the measures of synaptic stability [Attardo et al., Nature, (2015)]. The authors duly report lateral and axial resolutions for their micro periscopes and both are lower than the ones of Attardo and Pfeiffer, thus the authors should consider the effects of this difference on the interpretation of their data.

      We agree that the advance described in this manuscript is more methodological than conceptual. We do have other studies in progress that will be of greater conceptual interest. However, we believe the technique is of sufficient interest to the field that it is worth publishing the methodological approach and characterization as soon as possible.

      We have also addressed the comparison with Attardo et al. and Pfeiffer et al. mentioned by the reviewer. We actually agree with the previous work that dendritic spines in CA1 show a high degree of instability compared to cortex, finding ~15% spine addition and ~13% spine subtraction between consecutive days (Fig. 3H, I), similar to single-day turnover rates observed in Attardo et al. and other papers. Despite the high turnover rate, the fraction of experimentally observed spines that persist across 8-10 days plateaus around 75-80%, indicating that there is a substantial fraction of apical spines that remain stable in the face of ongoing daily turnover. This was also observed in basal dendrites by Attardo et al. (with similar survival fractions) and Pfeiffer et al. (albeit with lower survival fractions), so we would not necessarily characterize this as a discrepancy. We have clarified these points in the manuscript (lines 157, 162-168, 331-332).
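
      The distinction drawn above, between high day-to-day turnover and a survival fraction that nevertheless plateaus, can be illustrated with a minimal Python sketch. The presence matrix, the function names, and the toy data are illustrative assumptions and are not the analysis code or data from the manuscript; in particular, the survival fraction is taken here to mean the share of day-0 spines that are present on a given day.

```python
def daily_turnover(presence):
    """Return per-day (added, lost) fractions relative to the previous day's count.

    presence[s][d] is truthy when spine s was scored as present on day d.
    """
    n_days = len(presence[0])
    added, lost = [], []
    for d in range(1, n_days):
        prev = sum(1 for s in presence if s[d - 1])
        gained = sum(1 for s in presence if s[d] and not s[d - 1])
        dropped = sum(1 for s in presence if s[d - 1] and not s[d])
        added.append(gained / prev)
        lost.append(dropped / prev)
    return added, lost


def survival_fraction(presence):
    """Fraction of the spines present on day 0 that are also present on each day."""
    day0 = [s for s in presence if s[0]]
    n_days = len(presence[0])
    return [sum(1 for s in day0 if s[d]) / len(day0) for d in range(n_days)]


# Toy data (hypothetical): six spines scored over four imaging days.
presence = [
    [1, 1, 1, 1],  # stable spine
    [1, 1, 1, 1],  # stable spine
    [1, 1, 1, 1],  # stable spine
    [1, 0, 0, 0],  # lost after day 0
    [0, 1, 0, 0],  # transient spine
    [0, 0, 1, 1],  # appears on day 2 and persists
]
added, lost = daily_turnover(presence)
survival = survival_fraction(presence)
print(added, lost, survival)
```

      In this toy example a quarter of the spines turn over on each of the first two days, yet the survival fraction of the original population settles at 0.75, because the turnover is concentrated in a small pool of short-lived spines.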

      The reviewer pointed out that some previous studies used super-resolution microscopy to detect smaller structures and reduce optical merging. This would be an excellent extension of our work, as in principle super-resolution microscopy could be used with the implanted microperiscopes. Although the survival fractions we observed were similar to Attardo et al., they were higher than Pfeiffer et al., possibly due to the predicted effects of optical merging. We have updated the text to note that our results may inflate the degree of stability due to resolution limitations (lines 165-68, 335-340).

      Reviewer #2 (Public Review):

      Strengths

      The Hippocampus is a key brain region for episodic and spatial memory. The major Hippocampal subregions: Dentate Gyrus (DG), CA3, and CA1 have predominantly been investigated independently due to technical limitations that only allow one subregion to be recorded from at a time. In this paper the authors developed a new method that allows DG, CA3, and CA1 to be imaged simultaneously in the same mouse during behavior with a 2-photon microscope. This method will allow investigation of the interactions between Hippocampal subregions during memory processes - a critical yet unexplored area of Hippocampal research. This method therefore provides a new tool that will help provide insight into the complex functions of the Hippocampus during behavior.

      This method also provides high resolution optical access to deep dendritic structures that have been out of reach with existing methods. The authors demonstrate they can measure the structure of single spines on distal apical dendrites of CA1 cells. They track populations of spines and quantify spine changes, spines loss, and spine appearance. Spine turnover is thought to be a key process in how the Hippocampus encodes and consolidates memories, and this method provides a means to quantify spine dynamics over very long time periods (months) and can be used to study spine dynamics in CA3 and DG.

      We appreciate the comments.

      Weaknesses

      This method requires the implantation of a relatively large glass microperiscope that cuts through part of the Septal end of the Hippocampus. This is a necessary step to image transversally and observe all the major subregions simultaneously. This is an unfortunate limitation as it damages the very circuits being investigated. The authors attempt to address this by measuring the functional properties of Hippocampal cells, such as their place field features, and claim they are similar to those measured with other methods that do not damage the Hippocampus. However, it is very likely the implant-induced damage is affecting the imaged cells in some way, so caution should be taken when using this method. The authors are very aware of this and briefly discuss the issue. In addition, the authors observe damage adjacent to the face of the glass microperiscope that extends to ~300 um from the face. This area should therefore be avoided when imaging the Hippocampus through the microperiscope.

      We agree. This will be important for the interpretation of experiments using the microperiscope approach. For many experiments, electrophysiology or traditional CA1 imaging approaches might be preferable to avoid damage to the hippocampal structure. We have tried to be straightforward about these caveats in our discussion. However, we believe the capability of imaging the transverse hippocampal circuit will allow a number of experiments that are currently intractable, and that the benefits will outweigh the caveats in these cases.

      Reviewer #3 (Public Review):

      Redman et al. describe a novel approach for long-term cellular and sub-cellular resolution functional and structural imaging of the transverse hippocampal circuit in mice. The authors discuss their procedure for implanting a glass microperiscope and show data that clearly support their ability to simultaneously record from neurons within the DG, CA3, and CA1 subregions of the hippocampus. They offer optical characterization demonstrating sufficient resolution to image at the cellular and subcellular level, which is further supported by experimental data characterizing changes in morphology of CA1 apical dendritic spines. Finally, neurons are recorded from as mice engage in navigation behavior, allowing authors to characterize spatial properties of hippocampal cells and relate findings to prior work in the field.

      The ability to image from multiple hippocampal subregions simultaneously is a great technical achievement, sure to advance study of the hippocampal circuit. In particular, this approach will likely have tremendous application for addressing the question of how neural representations dynamically change across the hippocampal subfields during initial encoding of novel contexts or later during retrieval of familiar contexts. While the feasibility and utility of this preparation are supported by the data, further characterization of recorded cells will aid the comparison of data collected using this imaging approach to data previously collected with other methodologies.

      Thank you for the comments. We have addressed the specific concerns below.

      1) Further measures could be taken to more thoroughly evaluate the impact of the implant on cell health. While authors evaluate glial markers, it is not obvious how long after implant these measurements were taken. Additionally, authors could characterize cell responses of neurons recorded proximal to and more distal to their implant to further evaluate implant effect on cell health.

      Good points. We have added the date post implantation for the histology samples (Figure 1F caption). To address the second point, we added additional experiments characterizing functional response properties as a function of depth (Figure S7). We did not find systematic changes in place field width or place cell spatial information as a function of imaging depth (lines 220-224; Figure S7A, B). We did, however, find a significant relationship between the decay constant for the fitted transients and depth, with cells close (<130 um) to the surface of the microperiscope face exhibiting slower decay (Figure S7C). This appeared to be due to a small fraction of cells exhibiting longer decay times closer to the microperiscope face. As a result, we advise only imaging neurons >150 um from the microperiscope face (lines 224-226).

      2) More in-depth analysis of place cells will aid the comparison of data collected using this novel approach to previously published data. For instance, trial-by-trial data and clearer descriptions of inclusion criteria will allow readers a more detailed understanding of observed place cells.

      We have included example place cells with individual trial data (Figure 5C) and have added additional discussion and detail on our selection process for identifying place cells (lines 207-209, 663-666, 674-676). In the revised manuscript, we further increased the stringency of our place cell criteria so that none of the cells with time-shuffled responses pass the criteria. It should be noted that our place cells were not as reliable as those recorded in the presence of reward (Go et al, 2021). We chose to forgo reward to help ensure that the neurons were responding to spatial location and not to other task variables, but this likely reduced response reliability (see Krishnan et al, bioRxiv; Pettit et al, 2022). We have added discussion of this issue to the manuscript (lines 307-318).
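
      One common way to test place cell candidates against time-shuffled responses is to compare a cell's spatial information with a null distribution obtained by circularly shifting its activity relative to position. The sketch below assumes that approach; the spatial information measure (the Skaggs formula), the shuffle count, the function names, and the simulated data are all illustrative assumptions and may differ from the criteria actually used in the manuscript.

```python
import math
import random


def spatial_information(position_bins, activity, n_bins):
    """Skaggs spatial information (bits per unit activity) for equally spaced samples."""
    occupancy = [0] * n_bins
    summed = [0.0] * n_bins
    for b, a in zip(position_bins, activity):
        occupancy[b] += 1
        summed[b] += a
    total = len(position_bins)
    mean_rate = sum(summed) / total
    if mean_rate <= 0:
        return 0.0
    info = 0.0
    for occ, s in zip(occupancy, summed):
        if occ == 0 or s <= 0:
            continue  # unvisited or silent bins contribute nothing
        p = occ / total          # occupancy probability of this bin
        ratio = (s / occ) / mean_rate
        info += p * ratio * math.log2(ratio)
    return info


def shuffle_p_value(position_bins, activity, n_bins, n_shuffles=500, seed=0):
    """Compare observed information with circularly time-shifted activity."""
    rng = random.Random(seed)
    observed = spatial_information(position_bins, activity, n_bins)
    n = len(activity)
    exceed = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, n)
        shifted = activity[shift:] + activity[:shift]
        if spatial_information(position_bins, shifted, n_bins) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)


# Simulated data (hypothetical): irregular running on a 10-bin circular track,
# with a cell that is active only in bin 3.
rng = random.Random(1)
n_bins, pos, position_bins = 10, 0, []
for _ in range(600):
    pos = (pos + rng.choice([0, 0, 1])) % n_bins
    position_bins.append(pos)
activity = [1.0 if b == 3 else 0.0 for b in position_bins]
observed, p_value = shuffle_p_value(position_bins, activity, n_bins)
print(observed, p_value)
```

      Raising the required percentile of the shuffled distribution, or requiring the criterion to hold across trials, makes the selection more stringent, at the cost of excluding less reliable cells.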

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of the work was to test for direct and indirect fitness costs associated with specific types of constructs that could be used for gene drive. The authors conclude that there are no direct fitness costs associated with the presence and expression of either Cas9 or the guide RNAs but that the Cas9 is causing off-target cuts that result in loss of fitness. They also conclude that a newer form of Cas9 doesn't cause these off-target cuts. While the goal of this study is important, there are many caveats associated with the work as reported, and these limit interpretation of the results. Many of the caveats are pointed out in the discussion.

      1.a) I am specifically concerned by the fact that, from what I read, a company made the transgenic lines and that there was only one transgenic line per treatment. Unless the fly line used for the insertion was completely homozygous for the chromosome where the insertion was made, the lines could have differed in fitness, due to somewhat deleterious recessives captured in one G1 but not another. This cost could have persisted for a number of generations after the crosses were made, especially in the high frequency "releases". This may not have been a real problem, but without any replication it is difficult to know.

      We apologize that this was unclear in our initial submission. We did in fact generate several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where four lines were used in seven population cages (replicates 1 to 4 were founded with the same line). All of these were also crossed to w1118 flies before we obtained homozygous lines, so the impact of deleterious alleles would have been minimized. We have edited the section “Generation of transgenic lines” in the Methods to clarify this.

      We also examined the possibility of fitness effects being caused by such alleles in our maximum likelihood analysis (assuming they are unlinked from the construct — otherwise they should have appeared as direct fitness effects). This model was not a good match for the data, nor was the model with direct fitness effects. Based on these results, we consider it unlikely that such deleterious alleles had a major impact on the observed frequency trajectories in our cage populations.

      1.b) My concern is reinforced by the fact that the no-Cas9, no-gRNA line goes up in frequency for the first 5 generations and then becomes stable in frequency. The loss of the fitness advantage is consistent with a fitness effect partially linked to the insertion site in that one cross but not others.

      Both of these cages were made with independent lines. We agree with the reviewer that the increase in frequency of the no-Cas9_no-gRNAs construct at the beginning of the experiment seems surprising at first. However, if an initial fitness advantage was truly driving the dynamics of this construct, we would expect that the “initial off-target model” (where fitness costs originated before the experiment) should have yielded the highest model quality in our maximum likelihood analysis, since we also allowed advantageous cut off-target alleles (i.e., fitness estimates > 1) in this model. While the maximum likelihood fitness estimate in the “initial off-target model” indeed exceeded the reference value of 1, its 95% confidence interval still included a fitness value of 1, and a neutral model actually yielded the lowest AICc value (i.e., best model quality, Table 3). We think that one possible explanation for this apparent initial frequency increase is that population cages tend to undergo larger than average fluctuations in the first one or two generations due to the smaller initial population size and potential health differences between founding fly lines (which can persist for a generation or two). We briefly note this in the manuscript methods section.
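
      The model comparison described above relies on AICc, which penalizes each additional parameter more strongly when the number of observations is small. A minimal sketch of the calculation follows; the log-likelihoods, parameter counts, and sample size are hypothetical values chosen only to show how a model with a slightly higher likelihood can still rank below a simpler neutral model, and they are not taken from Table 3.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion (lower values indicate better model quality)."""
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
models = {
    "neutral": (-120.0, 1),
    "initial off-target": (-119.6, 2),
}
n_obs = 30  # hypothetical number of generation transitions

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AICc = {score:.2f}")
print("best model:", best)
```

      Here the extra fitness parameter improves the log-likelihood by only 0.4, which is not enough to offset the penalty, so the neutral model has the lower AICc.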

      1.c) It is important to note that the starting points are cages with separate vials of the control and experimental strain. Even a small difference in development time of the two strains in the first generation could lead to an excess of homozygotes in the next generation.

      We agree. In our maximum likelihood framework, such differences in development time should show up as a viability difference (fraction of offspring that made it to adulthood in the time window of our experiment). We now note in our revised manuscript that fitness differences between genotypes could be due to longer development time rather than an increase in the juvenile death rate in Cas9_gRNAs carriers. In the “Phenotypic fitness assays” section of our revised manuscript, we additionally state that “longer development time of individuals carrying the Cas9_gRNAs construct would also have appeared as a viability cost in our cage study but not in these fitness assays.”

      1.d) I am also concerned by the fact that the main conclusion is that the decline in frequency in the Cas9-gRNA line is due to off-target cuts, but there was no sequencing to back up that conclusion. In the discussion, this problem is mentioned but dismissed. I don't see how it can be dismissed when this is a major conclusion that remains based on very indirect evidence.

      We thank the reviewer for raising this important concern, which touches on the issue of how our approach differs from previous approaches that sought to directly detect off-target cleavage through sequencing. Our approach, by contrast, seeks to provide a “direct” measurement of the fitness of an allele. While this allows us to avoid the challenging task of detecting off-target mutations in vivo through whole-genome, population-level sequencing (and then predicting their potential effects), it comes at the price that inferences about the molecular nature of these fitness effects will rely on indirect evidence. However, we want to point out that our conclusion of these fitness effects being primarily due to off-target cleavage is based on three independent lines of evidence: (i) The maximum likelihood analysis of the frequency trajectory of the Cas9_gRNAs construct, where statistical model comparison ranked the off-target effect model higher than the direct fitness costs model; (ii) The fact that we inferred fitness costs only for the Cas9_gRNAs construct but not the construct in which Cas9 was replaced with the high-fidelity Cas9HF1 endonuclease (which should have similar expression and thus, similar direct fitness costs); and (iii) The heterogeneity we observed in the frequency trajectories of the Cas9_gRNAs construct in our cages, which is consistent with a model where off-target sites accumulate over the course of the experiment yet more difficult to reconcile with a model of direct fitness costs.

      Inspired by the reviewer’s recommendation, we wondered whether we may in fact be able to directly detect cuts at a few computationally predicted off-target sites. To this end, we performed Sanger sequencing at six sites that were computationally predicted for our Cas9_gRNAs construct by CRISPR Optimal Target Finder, which unfortunately revealed only wild-type sequences (this analysis is described in the new section “Evaluation of computationally predicted off-target sites”). However, we believe that this does not rule out off-target cutting as the primary driver of fitness costs for the Cas9_gRNAs construct due to the following arguments we state in the discussion section of our revised manuscript:

      “For example, our sequencing approach would not have allowed us to detect larger insertion/deletion events, which are frequently observed at on-target sites (48, 49). More likely though, we suspect that cleavage events occurred at other sites than the six computationally predicted ones. Indeed, the predictions by CRISPR Optimal Target Finder are based on cleavage specificity in cell lines, where off-target cutting is known to occur more frequently than in animals (47). All but one of the predicted off-target sites carry combinations of single nucleotide mismatches in the PAM-proximal and the distal region, which could make in-vivo cleavage less likely at these sites. Generally, our results are consistent with other studies that found off-target cleavage to frequently occur at sites which would have been difficult to predict computationally (50).”

      In a sense, our inability to detect any mutated alleles at this small set of computationally predicted off-target sites might actually highlight a key benefit of our approach: It can estimate the potential fitness costs of a construct without having to rely on accurate computational predictions of putative off-target sites or requiring the very costly approach of whole-genome, population-scale sequencing.

      Additionally, we would like to point out that while we found off-target effects to explain the empirical data best, we would probably consider our estimation of the overall magnitude of the fitness costs of the Cas9_gRNAs construct as one of the main conclusions of our manuscript, together with the fact that these were avoided when using the high-fidelity Cas9HF1 endonuclease instead. Thus, even if some readers may remain skeptical about the role of off-target cleavage (and we made sure to qualify our claims on this in the Discussion section accordingly), our systematic analysis of the overall fitness effects is more robust and should be of broad interest.

      1.e) When releasing homing gene drives, the initial frequency of the transgenic line is very low, and as in the Garrood et al paper cited, it is possible for the gene drive to outpace the non-target cutting. The modeling does not address what the impact of the presumed fitness costs in this experiment would be for a replacement/suppression drive released at low frequency.

      We thank the reviewer for raising this point. It has led us to add a completely new analysis on the “Effect of off-target fitness costs on gene drive performance”, in which we now show simulation results to illustrate the effect of direct and off-target fitness effects on both modification and suppression homing drives. We have also added more discussion on how these different types of fitness costs may affect other frequency-dependent CRISPR based gene drives.
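
      As a rough illustration of why fitness costs matter for a drive released at low frequency, the following deterministic sketch tracks the drive allele frequency under random mating. It is a deliberately simplified assumption-laden model (infinite population, multiplicative direct viability cost per drive allele, germline conversion in heterozygotes, no resistance alleles and no off-target effects) and is not the simulation framework used in the manuscript; the function names and parameter values are hypothetical.

```python
def next_drive_frequency(q, conversion, fitness_per_allele):
    """One generation of a simplified homing drive with a multiplicative direct cost.

    q: drive allele frequency among gametes
    conversion: fraction of wild-type alleles converted in heterozygote germlines
    fitness_per_allele: relative viability contributed by each drive allele
    """
    f = fitness_per_allele
    dd = q * q * f * f            # drive homozygotes after viability selection
    dw = 2 * q * (1 - q) * f      # heterozygotes after viability selection
    ww = (1 - q) ** 2             # wild-type homozygotes
    mean_fitness = dd + dw + ww
    # Heterozygotes transmit the drive with probability (1 + conversion) / 2.
    return (dd + dw * (1 + conversion) / 2) / mean_fitness


def simulate(q0, conversion, fitness_per_allele, generations):
    """Return the drive allele frequency trajectory, starting at q0."""
    trajectory = [q0]
    for _ in range(generations):
        trajectory.append(
            next_drive_frequency(trajectory[-1], conversion, fitness_per_allele)
        )
    return trajectory


# Hypothetical release at 1% with 80% conversion and two different direct costs.
mild_cost = simulate(0.01, 0.8, 0.9, 30)
severe_cost = simulate(0.01, 0.8, 0.5, 30)
print(round(mild_cost[-1], 4), round(severe_cost[-1], 6))
```

      In this simplified model a rare drive spreads only when the heterozygote's relative fitness multiplied by (1 + conversion) exceeds 1, which is why the mild cost still allows spread while the severe cost leads to loss.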

      Reviewer #2 (Public Review):

      This paper reports a set of Drosophila population cage experiments aimed at quantifying fitness effects associated with the expression of Cas9 gene drive constructs in the absence of homing. The study attempts to deconvolve fitness effects due to the presence of the active nuclease at a genomic location from those that arise from off-target effects elsewhere in the genome: an important issue when considering gene drive strategies in the wild. To distinguish effects due to cleavage at the target site from activity elsewhere in the genome, a construct where Cas9 was replaced with a high fidelity nuclease (Cas9HF1) was employed. The experimental design compares the active nuclease-gRNA constructs targeting a site on another chromosome with no gRNA and reporter only controls, all inserted in the same locus. The Cas9 construct was assayed in 7 replicates with Cas9HF1 and controls assessed as duplicates with cages running for between 8 and 19 generations.

      2.a) There is a lack of clarity in terms of the cage set up design, the description in the supplementary methods could clarify if all the replicates came from a single founder and the difference in set-ups that necessitated ignoring some 1st generations.

      Thank you for pointing this out. We have thoroughly revised and extended our Methods section on “Generation of transgenic lines” to clarify this point. We now explicitly mention that we generated several transgenic lines of each construct and used independently obtained lines for each of our population cages, except for the Cas9_gRNAs construct, where we used four lines in seven population cages (replicates 1 to 4 were founded with the same line).

      For the cage start conditions, we now note that “To avoid potentially confounding maternal fitness effects on the construct frequency dynamics (which could arise based on minor differences in health or age between the initial batches of flies mixed together), we excluded the first generation of five cage populations…” In general, it is quite common for this to happen in insect population cage studies (please see some examples below) and is always a very short-term effect.

      2.b) The main finding reported from this part of the work is that with the control populations the frequency of the construct remained fairly constant across the generations, but the active nuclease tended to decline. I am somewhat confused by some of the claims here. First, the authors report a "bottoming out" effect where construct frequency declines then levels off: I am not entirely convinced that Figure 2 shows this. For example, comparing replicates 4 and 5 (8 and 16 generations respectively), it looks to me that there is a steady decline at the same rate with no evidence for a plateau. Perhaps replicates 2 and 3 show "some" evidence of leveling. In addition, replicates 4, 5, 6 and 7 have similar construct starting frequencies (particularly 5 and 7, which are only a few % different) yet the former show a steady decline whereas the latter maintain the construct at a steady level. This does not appear to be consistent with the author's explanation of higher off-target effects in populations carrying high frequencies of the construct. It would be helpful if the authors could more clearly explain the trajectories presented in Figure 2.

      We agree with the reviewer that our initial description of the raw construct frequency dynamics solely based on visual clues was making too strong claims (e.g., “different frequency dynamics between single replicates”) without providing more quantitative statistical support. This was originally intended as some basic introduction, with our maximum likelihood analysis then providing a more rigorous assessment in the next section. To improve clarity, we have completely restructured this in our revised manuscript. We removed the comparison of Cas9_gRNAs replicates solely based on visual clues, highlighted the general heterogeneity in trajectories among replicates (without making any specific claims), and instead of the vaguely defined “bottoming out” interpretation, we now only mention the average construct frequency change for the Cas9_gRNAs construct. In addition, we now present our more rigorous maximum likelihood analysis of the construct frequency trajectories and statistical model comparison earlier on in the Results section, so that all of our conclusions are now based on this statistical analysis, rather than an initial visual inspection of the curves. Please see also our comments to point 3.a) below, as reviewer 3 made very similar comments and suggestions.

      2.c) Utilising the allele frequencies obtained from the cages, 2 locus ML models were applied with the construct insertion site and an idealised off target site. They argue, correctly in my view, that fitness effects can be attributed to off target activity and not cleavage at the 3L target since the Cas9HF1 construct shows no substantive effect. In the models they assume that the presence of Cas9 in the germline (or maternally contributed) will invariably lead to cleavage at the idealised site. The model indicates that the construct insertion per se has no direct fitness costs but that off-target effects may have fitness consequences of approximately 30%, and seek to support this conclusion with simulations. I found this section difficult to follow but I feel that the conclusions are supported.

      We agree with the reviewer that the “Maximum likelihood analysis” section was too dense and therefore challenging to follow, especially for non-expert readers who may not be very familiar with such methods. We have revised and extended this section. In particular, we now also provide a brief summary of the modeling approach at the beginning of the section and have added subsection titles aiming to better guide the reader through the various steps of the analysis. Furthermore, we added a table with an overview of all tested models and highlighted the best-fitting models in tables 2 and 3. We hope that this has improved the clarity of our revised manuscript.

      2.d) Direct phenotypic assays with the active Cas9 nuclease were performed, looking at viability, mating preference and fecundity. Relegating these data to the supplements is not useful. While significant effects are attributed to the Cas9-gRNA construct, the authors cannot rule out a DsRed effect and it is a shame they did not assay at least one of the control constructs. In addition, in their modelling they assume that Cas9 activity will always cleave but see no evidence for this in the heterozygote viability assay. Whether this is due to the difference in rearing conditions that the authors claim is debatable.

      We thank the reviewer for this valuable feedback. As suggested, we have moved the phenotypic assays (Methods & Results) of the Cas9_gRNAs construct to the main part of the revised manuscript. We decided to conduct phenotypic assays only for the Cas9_gRNAs construct, because it was the only one that displayed some fitness costs in our maximum likelihood analysis (in particular, the DsRed construct did not display any fitness costs in the cages). However, given more time and capacity, we agree that additional phenotypic assays would have been desirable (e.g., a larger sample size per construct and additional constructs). Regarding our choice of model for the maximum likelihood analysis, we used a highly simplified off-target approach, which was necessary given the available information.

      2.e) Finally, since the initial cage experiments suggest that the Cas9HF1 enzyme reduces off-target effects they assay this enzyme in a model homing drive, indicating that this enzyme performs as well as the regular Cas9. Again, relegation of these data to supplementary datasets is unhelpful and it would improve the manuscript if these results could be simply summarised in a figure.

      We added an additional figure at the end of the “Cas9HF1 homing drive” section in the Results showing the gene drive inheritance rate and resistance allele formation rate in early embryos for the Cas9HF1 and Cas9 homing drive respectively. The gene drive inheritance rate is the percentage of offspring with DsRed fluorescence when crossing individual gene drive heterozygotes with “wildtype” homozygotes (i.e., not carrying any gene drive allele) and is used to calculate the gene drive conversion rate (i.e., the rate at which wildtype alleles are converted to drive alleles) mentioned in the main text. We hope that this has improved the clarity of our revised manuscript.
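
      The relationship between the inheritance rate and the conversion rate can be written out explicitly. The sketch below assumes the commonly used convention that a heterozygote transmits the drive to 50% of offspring without conversion, so that the conversion rate is the excess over 50% rescaled to the remaining wild-type alleles; whether the manuscript applies exactly this formula is an assumption, and the offspring counts are hypothetical.

```python
def drive_inheritance_rate(n_dsred_offspring, n_total_offspring):
    """Fraction of offspring from a drive heterozygote that carry the DsRed marker."""
    return n_dsred_offspring / n_total_offspring


def drive_conversion_rate(inheritance_rate):
    """Fraction of wild-type alleles converted to drive alleles in the germline.

    Assumes Mendelian inheritance (0.5) as the no-conversion baseline.
    """
    return (inheritance_rate - 0.5) / 0.5


# Hypothetical cross: 160 of 200 offspring show DsRed fluorescence.
inheritance = drive_inheritance_rate(160, 200)
conversion = drive_conversion_rate(inheritance)
print(f"inheritance = {inheritance:.2f}, conversion = {conversion:.2f}")
```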

      2.f) Taken together, I think this is a useful study but is presented in a way that is at times impenetrable to the non-expert. More clarity in presenting the cage and modelling data, as well as promotion of figures from supplementary material to the main manuscript, would considerably aid the non-expert and provide greater confidence in the interpretations. If these issues could be clarified, I feel the work provides a useful addition to the gene drive field and will help those thinking about developing such strategies; the findings related to the Cas9HF1 enzyme are particularly relevant.

      We thank the reviewer for the valuable feedback. We have significantly revised the Results as well as the Discussion, provided additional information on the modeling approach, and shifted supplementary material to the main text of the manuscript. We hope this has improved the overall clarity of the manuscript.

      Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      3.a) My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the increase in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      We thank the reviewer for this detailed recommendation. We agree that our description of construct frequency dynamics, based solely on visual clues, made claims that were too strong (e.g., regarding “different frequency dynamics”) without providing enough statistical support for these specific statements. We had originally thought that some readers would prefer that we first provide such a qualitative description of the allele frequency trajectories before going into the mathematically more rigorous (but therefore also more complicated) maximum likelihood inference of fitness costs and statistical model comparison of different selection scenarios (“full inference model” vs. “construct model” vs. “off-target model”, etc.).

      In response to the reviewer’s comments, we decided to completely restructure this first part of the Results section. Specifically, we have removed our comparison of Cas9_gRNAs replicates solely based on visual clues, and also any mention of the admittedly vaguely defined “bottoming out” behavior. Instead, we now only mention the average frequency change for the Cas9_gRNAs construct across all replicates, while highlighting the heterogeneity among replicates. The maximum likelihood analysis is now introduced right after this and has also been revised extensively to improve clarity. We believe that this analysis provides a very powerful framework for the systematic inference of fitness costs and for assessing which of the different selection scenarios best explains our empirical data. This is because it combines the data from all replicates while fully accounting for the heterogeneity among them. For example, it could well be that construct frequency trajectories in individual replicates may not be statistically distinguishable from neutral evolution, yet in aggregate, an inferred fitness cost of the construct becomes highly significant. Note that the maximum likelihood framework also provides confidence intervals for its estimates, based on the entirety of the data. So the question of whether a departure from a neutral model is significant comes down to whether the 95% confidence interval surrounding the fitness estimate of the given construct still includes a value of 1 (which it does for the “direct fitness” estimate of the full model, but not for the “off-target fitness” estimate, see Table 2).
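
      The logic of pooling all replicates and then asking whether the 95% confidence interval includes 1 can be sketched in a few lines. This is a deliberately simplified illustration and not the framework used in the manuscript: it assumes a haploid model with a single relative fitness `w`, binomial drift with a known effective population size, a grid search, and a likelihood-ratio interval. All names and numbers are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_likelihood(w: float, transitions) -> float:
    """Sum the log-likelihood over all generation transitions of all replicates.

    Each transition is (p_now, k_next, n_next): the current construct
    frequency, the construct count in the next generation, and the assumed
    effective population size of that generation.
    """
    total = 0.0
    for p_now, k_next, n_next in transitions:
        # Expected frequency after selection with relative fitness w.
        expected = w * p_now / (w * p_now + (1.0 - p_now))
        total += log_binom_pmf(k_next, n_next, expected)
    return total


def fitness_mle_and_ci(transitions, grid=None):
    """Grid-search MLE of w with a 95% likelihood-ratio confidence interval."""
    if grid is None:
        grid = [0.5 + 0.001 * i for i in range(1001)]  # w from 0.5 to 1.5
    scores = [(log_likelihood(w, transitions), w) for w in grid]
    best_ll, best_w = max(scores)
    # 1.92 is half the 95% quantile of a chi-square with one degree of freedom.
    inside = [w for ll, w in scores if ll >= best_ll - 1.92]
    return best_w, min(inside), max(inside)


if __name__ == "__main__":
    # Hypothetical pooled data: ten transitions that each start at 50%.
    data = [(0.5, 474, 1000)] * 10
    w_hat, lower, upper = fitness_mle_and_ci(data)
    print(w_hat, lower, upper, "neutral excluded:", not (lower <= 1.0 <= upper))
```

      In this toy setting, a single transition of this size may be compatible with neutrality, while the pooled transitions can exclude `w = 1`, which mirrors the argument made above about aggregating replicates.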

      Regarding the comment about error bars for the allele frequency trajectories in Figure 2, we want to point out that our construct frequency estimates are actually based on the genotype counts of all adult flies present in the given cage experiment at the specific time point. We therefore did not include uncertainty estimates in Figure 2, nor did we include sampling noise in the maximum likelihood analysis. We have now clarified this in the caption of Figure 2 and in the Methods section (“Maximum Likelihood framework for fitness cost estimation”). We also acknowledge that we still cannot rule out sampling noise completely (for example through escaped flies, phenotyping errors, or loss of frozen flies due to destruction or other issues). However, we expect that the relative contribution of these errors should be negligible compared to drift.

      The reviewer raises an interesting question: Why did the Cas9_gRNAs construct frequency not decrease in the two replicates with the highest construct starting frequency (replicate 6 and 7)? A possible explanation could be that — given a limited set of off-target sites — cut off-target alleles that impose a fitness cost will accumulate and start to independently segregate from the construct alleles very quickly in populations where the construct has a high starting frequency (and thus a higher overall rate of cleavage events). We now state this possible explanation in the section on “Construct frequency dynamics suggest moderate off-target fitness costs” of our revised manuscript.

      3.b) My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site? This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).

      The reviewer raises a very important point: modeling only one off-target site that represents the net fitness effect of Cas9 cleavage outside the target region as well as a cut rate of 100 % (i.e., the off-target site is always cut in the presence of Cas9) is highly idealized.

      (1) We agree with the reviewer that in reality, the experimental populations might have a polygenic off-target landscape, where the fitness of cleavage alleles could differ vastly within as well as between loci. However, given the limited number of data points (e.g., n=87 generation transitions for experimental populations with the Cas9_gRNAs construct), it would be extremely difficult if not impossible to disentangle the numerous parameters that would be necessary to describe such a more complex off-target scenario with our modeling approach. We have now highlighted our model choices, potential caveats, and resulting limitations in both the Discussion section and also the section “Construct frequency dynamics suggest moderate off-target fitness costs” in the Results.

      (2) Similar to the single off-target locus, our cut rate of 100% is an idealized assumption that was chosen to reduce model complexity. As outlined above, it would be extremely hard to disentangle the cut rate from other parameters (such as the number of target sites if fitness effects are multiplicative across loci). Additionally, we would like to point out that the reported conversion efficiencies (~80% in males, ~60% in females) are not the conversion efficiencies of the constructs in the experimental populations shown in Figure 2, but of separate homing drives with a single gRNA. All constructs in the experimental populations are designed so that no homing can occur, and they have four gRNAs if any. We apologize for the confusion. Our revised manuscript now contains a paragraph in the “Cas9HF1 homing drive” section in the Results that highlights the differences between the constructs in the cage populations and the homing drives assessed in this study. Furthermore, we have added a figure that displays the individual results of the homing drive (Figure 5); we hope this improves clarity.

      3.c) My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      We thank the reviewer for pointing this out. We removed the claim that the phenotypic assays “broadly confirmed our previous findings” and now highlight the differences in estimated fitness costs for males and females in the phenotypic assays as well as the discrepancy with our maximum likelihood estimates. Furthermore, we now provide additional explanations for what might be causing this phenomenon (i.e., single crosses vs. large populations, vial vs. cage, interactions between individual genotypes and the environment, delayed development of construct homozygotes being interpreted as reduced viability in the maximum likelihood analysis). We also point out the discrepancies in the Discussion of our revised manuscript and recap potential explanations.

      3.d) My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      The reviewer is correct: the constructs in the population cages are different from the homing gene drives for which we estimated the gene drive conversion rates. However, we were able to confirm at least one mutated gRNA target site in every PCR-genotyped offspring of individuals carrying either the Cas9_gRNAs or the Cas9HF1_gRNAs construct (this is now specified in the manuscript). Thus, we did not expect a systematic difference in on-target mutation rates between the Cas9_gRNAs and Cas9HF1_gRNAs constructs. We acknowledge in the Discussion that construct performance might vary substantially across genomic sites and even organisms.

      3.e) Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

      We apologize for the confusion. We have highlighted the similarities (e.g., nanos promoter, DsRed) as well as the differences (e.g., number of gRNAs) between the homing drives and the constructs in the cage populations at the beginning of the section “Cas9HF1 homing drive” in the Results. We hope this makes it clearer.

    2. Reviewer #3 (Public Review):

      The manuscript by Langmuller, Champer and colleagues reports a set of experiments and models investigating the fitness effects of transgenes in Drosophila melanogaster carrying CRISPR components to determine how useful such transgenes may be for population control. This study benefits from well-designed transgene constructs that allow the investigators to distinguish the effects of on-target and off-target Cas9 endonuclease activity, and a sophisticated maximum likelihood modeling framework that allows estimation of the fitness effects of the transgene constructs. The manuscript's major shortcoming is the absence of statistical analysis of the allele frequency data and some potentially unrealistic assumptions that went into the model.

      My first recommendation is that a statistical analysis of the allele frequency data should be included in the manuscript, rather than inferring patterns solely from visual inspection of the data. Specifically, the manuscript claims that (lines 176-180): "We found Cas9_gRNAs to be the only construct that systematically decreased in frequency across all replicate cages (Figure 2). Interestingly, the allele frequency change was not consistent with fixed direct fitness costs. Instead, the construct frequency "bottomed out" in most replicates, and this occurred more quickly when the starting frequency was higher (Figure 2)." These conclusions regarding allele frequency changes should be supported by statistical analyses. What is the uncertainty surrounding the allele frequency estimates? Some indication of this uncertainty (such as error bars) could be added to Figure 2. Which of the trajectories in Figure 2 show a statistically significant change in allele frequency over the course of the experiment? Is the *increase* in the frequency of the no-Cas9_no-gRNA replicates significant? What support is there for the claim that the allele frequency changes "bottomed out"? Does a non-linear model fit these data significantly better than a linear trend? What is the evidence that allele frequency decreases slowed earlier "when the starting frequency was higher"? What is the evidence that "replicates 3 and 4 ... had very different frequency dynamics"? While they started at different frequencies, the slope of those two trajectories could be statistically indistinguishable. What is the authors' interpretation of the Cas9_gRNAs replicates 6 & 7 whose trajectories did not decrease?

      My second recommendation involves the assumptions that went into the maximum likelihood modeling. In particular, it strikes me as unrealistic to assume that 1) the genome contains only a single off-target site that is entirely responsible for the decrease in fitness due to Cas9 activity; and 2) that the rate of off-target mutation is as high as it is assumed to be ("In individuals that carry a construct, all uncut off-target alleles are assumed to be cut in the germline, which are then passed on to offspring that could suffer negative fitness consequences."). Regarding point 1), isn't a more realistic scenario that there are multiple off-target sites, each with a potentially different fitness consequence resulting from Cas9-induced mutations? If so, doesn't the likelihood that all off-target sites have been cut depend on the number of such sites, as multiple off-target sites should reduce the mutation rate at any single site? This possibility also suggests that there may be multiple loci with potentially deleterious Cas9-induced alleles segregating within the experimental populations. Regarding point 2), even assuming only a few potential off-target sites per genome, it seems like the rate of off-target cutting would have to be unrealistically high to approach mutating all off-target sites in the population. The conversion efficiency of the constructs used here is reported as ~80% and 60% in females and males, respectively; it seems likely that the rate of Cas9 mutation at off-target sites is lower than this efficiency for the target site. These assumptions should be justified or relaxed before claiming that mutational saturation of off-target sites is responsible for a decreasing fitness loss over the course of the experiments (after confirming that there is statistical support for the claim that the allele frequency trajectories bottom out).

      My third suggestion involves the correspondence between the results of the likelihood modeling and the phenotypic assays. The best fit model inferred a viability loss of 26% and no detectable effects on female choice (or male attractiveness) or fecundity. In contrast, the phenotypic assays inferred no detectable effect on viability, but a 50% reduction in male attractiveness and 25% reduction in female fecundity. I think that the authors' conclusion that "[t]hese assays broadly confirmed our previous findings" needs some context or explanation as to how these numerically discrepant findings are broadly confirming, beyond the speculation that the discrepancy in viability may be due to rearing in vials vs. population cages.

      My fourth suggestion involves the comparison between the Cas9_gRNAs and Cas9HF1_gRNAs transgenes. The inference that off-target cuts are the major source of fitness loss for the Cas9_gRNAs construct relies heavily on the observation that there was no decrease in allele frequency for the two Cas9HF1_gRNAs replicates. It therefore seems critical to be confident in this observation, and to rule out alternative explanations as much as possible. For example, did the authors confirm that the Cas9HF1_gRNAs construct has on-target Cas9 activity levels as high as the Cas9_gRNAs construct? Although I am not certain about this (see comments in the next paragraph on this point), I think the transgene constructs used to estimate drive conversion rates are different from the constructs used for the population cage experiments; if this is correct, I think it would be helpful to provide the on-target mutation rates for the actual constructs used in the population cages.

      Relatedly, I was confused about the portion of the manuscript that reports the drive conversion efficiency. The manuscript states, "As a proof-of-principle that Cas9HF1 is indeed a feasible alternative, we designed a homing drive that is identical to a previous drive (45), except that it uses Cas9HF1 instead of standard Cas9. This drive targets an artificial EGFP target locus with a single gRNA (see Methods)." Given that the rate of drive conversion was estimated by the loss of GFP, these homing drive constructs must be different from the constructs used in the population cage experiments, as those constructs targeted a site on chromosome 3L which does not contain GFP. I could not find a description of these homing constructs in the Methods - while a reader might be able to puzzle this out by reading reference #45, I think it would be helpful to explicitly describe these details in this manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      The primary strength of this study is in establishing the N999S heterozygous mouse as a useful model system for debilitating paroxysmal non-kinesigenic dyskinesia (PNKD), with or without epilepsy. This outcome was hard-won following a comprehensive analysis of biophysical, neurophysiological, and behavioral tests. Ultimately the convincing evidence was demonstrated through a clever application of a stress-related behavioral test (quite in alignment with triggers in patients) to elicit the hypo-motility associated with PNKD. Like patients who exhibit variable penetrance, even highly inbred mice exhibit much variability, and uncovering a robust phenotype took a nuanced approach and perseverance.

      To reach this point, several experiments provided mechanistic insights into the mutant channel behavior. First, whole-cell patch clamp experiments revealed shifts in the G-V consistent with gain-of-function behavior previously characterized using the N999S and D434G mutants expressed heterologously. Novel observations of H444Q revealed a loss-of-function (LOF) behavior with the G-V shifted to positive potentials but to a lesser degree. These electrophysiological phenotypes establish the rank of predicted severity as N999S>D434G>H444Q.

      This prediction was tested in brain slices of heterozygous animals where the mutant channels would be normally spliced and associate with WT subunits and other components such as beta subunits. The investigators evaluated BK currents by patch clamp from hippocampal neurons where BK channels are known to play key functional roles. Both N999S and D434G showed the predicted increase in current magnitude, though interestingly the differences between them apparent in heterologous expression were lost in the native setting. Curiously, no differences in BK current magnitude were observed in neurons of heterozygotes carrying the putatively LOF mutation H444Q.

      In terms of seizure susceptibility, D434G mutants differed from WT and were less severe than N999S mutants with respect to time to evoked seizure, although differences in "EEG power" were not statistically significant between D434G and WT. These observations support the conclusion that D434G represents an intermediate disease phenotype.

      The behavioral studies were the most effective in revealing differences among the variants, in defining GOF N999S heterozygotes as a compelling animal model for PNKD, and in providing evidence that the LOF mutation conferred the opposite effect of hyperkinetic mobility. The findings provide the new insight that KCNMA1 is the target of heritable, monogenic disease, a conclusion that was previously not forthcoming because known human mutations have arisen de novo. The dyskinetic phenotypes in response to stress induction are wholly consistent with patient symptoms.

      With respect to rigor and reproducibility, it is commendable that the investigators were blinded to genotype during data collection and analysis. Moreover, the study provides an important confirmation of previous findings from another lab regarding the cellular phenotype of the N999S mutant. WT controls were compared to transgenic littermates within individual transgenic lines. In some cases, the sample sizes were rather low (see below), but otherwise the study seems rigorous.

      The strengths of the manuscript far outweighed the weaknesses. The experiments interpreted to suggest a gene dosage effect with D434G were not compelling to this reviewer and might be better documented in the supplement with the conclusion that further work is required.

      Due to pandemic-related animal and lab issues, we were unable to generate and surgically implant full Kcnma1D434G/D434G homozygous cohorts for the EEG/seizure portion of the study. We focused instead on using the limited mice of this genotype for the novel PNKD3 assays (n=7), leaving the seizure dataset at n=3.

      To address the concern, the Kcnma1D434G/D434G data was removed from Figure 4 to avoid overinterpretation of a gene dosage effect. However, we did retain the individual measurements within the Results text (lines 383 and 385), on the basis of facilitating direct comparisons between our study and other D434G studies. For example, even with only three measurements, the trend toward the shortest seizure latencies in Kcnma1D434G/D434G mice is similar to the result obtained with an independently generated D434G mouse model (Dong et al, 2022). Yet seizure power and the presence of spontaneous seizures do not show a similar trend, suggesting our results differ from theirs in these important aspects. This is now stated more clearly in the revised conclusion for that paragraph, ‘While not conclusive and requiring substantiation in a larger cohort, the Kcnma1D434G/D434G seizure data raise the possibility of a gene dosage effect with D434G that qualitatively differs from an independently-generated D434G mouse model (Dong et al., 2022),’ (lines 388-390).

      In contrast to the seizure part of the study, the increased severity of Kcnma1D434G/D434G PNKD-immobility is fully supported by the data with sufficient statistical power (Figure 5D). However, the idea that the increased severity with homozygous D434G in PNKD-immobility was consistent with gene dosage observations for seizure was removed for consistency (lines 549-550).

      As a side note, we also added additional clinical descriptors (akinesia) and colloquial descriptions for PNKD3 (‘drop attack’) to disambiguate how a PNKD3 episode appears different from other types of motor dysfunction. This was to facilitate comparison with the two other KCNMA1-D434G models (mouse and fly; Dong et al, 2022; Kratschmer et al., 2021), which report aspects of dyskinesia in the setting of baseline locomotor dysfunction. To our knowledge, these models have not been evaluated for the striking ‘drop attack’ immobility presenting in patients (lines 84-85).

      The consequences of the altered BK current levels were assessed on the voltage dependence of firing frequency in the hippocampal neurons, but it was not very clear how increased BK current would enhance neuronal excitability. Also, how might it lead to the PNKD phenotype? A paragraph even speculating on these mechanistic links in the Discussion would be welcome.

      The mechanism by which BK currents increase action potential firing is not fully identified in this study (see also response to reviewer #2). In the Results, a new paragraph was added at the end of the action potential section to summarize the AHP changes in more detail and to speculate on an indirect mechanism of action for the increase in BK current, predicted from a similar ‘GOF’ BK current type, where β4 regulation of BK channels is lost (lines 294-304). Additional details have also been added to the Discussion regarding the factors contributing to lower seizure threshold (lines 675-680).

      Additional re-organization of the Discussion text addresses the basis for PNKD. We added a direct statement that it is not yet clear which neurons/circuits are most critical for PNKD-like symptomology, or which of these express BK channels (lines 680-700). We follow with a succinct summary of phenotypically-relevant PNKD models. While there is a lot to unpack with respect to similarities and differences between different paroxysmal dyskinesia models in the literature, they ultimately shed little light on the question of KCNMA1 PNKD3-related dysfunction. With the addition of the d-amp rescue control, we focus mainly on the amphetamine response predicting a CNS locus (lines 692-693). The d-amp response may even suggest dopaminergic pathways (some of which express BK channels) as a plausible target to investigate in future studies, but due to the complex interplay of d-amp dosage and the novel motor assay, we don’t think speculation on a specific circuit is supported by enough data to add to the Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The 2019 Johnson et al. Science study (referred to as "2019 study" or "prior study" in the rest of the comments) measured mutational robustness in F1 segregants derived from a yeast cross between a laboratory and a wine strain, which differ at >35,000 loci. To realize this, the authors developed a pipeline 1) to create the same set of transposon insertion mutations in each yeast strain via transformation; and 2) to measure the fitness effects of these specific insertion mutations.

      In this manuscript, the authors applied the same pipeline to laboratory evolved yeast strains that differ in only tens or hundreds of loci and thus are much less divergent than those used in the prior study. Both studies aim to characterize how the fitness of the sets of insertion mutations (mostly deleterious) vary depending on the existing mutations (mostly beneficial) in those yeast strains. However, the current manuscript, especially when compared to the prior study, suffers from several major weaknesses.

      First, only 91 genes out of >6,000 genes in the yeast genome are perturbed in the manuscript. The small set of disruption mutations is unlikely to faithfully capture the pattern of epistasis in the selected clones. By comparison, >1,000 insertion mutations were evaluated in the 2019 study. Because the majority of the >1,000 tested mutations were neutral, the authors focused on 91 insertions that had significant fitness effects. The same 91 insertion mutations are used in the current study. However, as evident in both studies, epistasis plays an important role in how insertion mutations interact with different genetic backgrounds. Considering the vastly different genetic backgrounds between clones used in the prior and current studies, the insertion mutations of interest in the current study are unlikely to be the same as those in the prior study. I suggest that the large-scale insertion screen used in the prior study also be conducted in the current study.

      This concern is summarized in Essential Revision 1 above; see our comments there for our detailed response. Briefly, we have added an additional Figure Supplement (Fig. 1 – Supplement 8; see above) demonstrating that the 91 insertion mutants have a similar range of effects in this study as in the previous one (which may be expected since the genetic backgrounds here are as closely related to those in the 2019 study as the backgrounds in the 2019 study are to each other).

      Second, the statistical power in the current manuscript is insufficient to support the conclusions. Fitness errors were not considered when several main conclusions were drawn (fitness errors on the y-axis of Figure 1B are not available; fitness errors on the x-axis of Figure 2 are not available). The current conclusions are invalid without knowing the magnitude of fitness error. Fitness of each clone should be measured in at least two replicates in order to infer errors of fitness measurements. Additionally, the authors isolated two clones from the same timepoint of each population and treated them as biological replicates based on the fitness correlation between the two clones. However, this practice can be problematic because the extent of fitness correlation varies across populations and it is less likely to capture the patterns of epistasis when clones are isolated from more heterogeneous populations. Similarly, the authors could avoid this bias by measuring the fitness of each clone in multiple replicates and treating the two clones from the same timepoint/population separately.

      We agree that details about statistical methods, most of which are taken from Johnson et al. (2019), were not clear in our text. As we also describe in our response to the Essential Revisions above, we have rewritten a large part of the methods text to provide more details about statistical methods and have calculated and reported errors more broadly:

      Errors on fitness effects: We have expanded our methods text describing how the fitness effect of a mutation is determined for a single clone / condition. This text now emphasizes the internal replication provided by redundant barcodes, which allows us to calculate a standard error for the effect of a mutation in a single clone / condition. These errors are shown in Figure 1 – Figure Supplements 1-3. We have also added details on how errors are calculated for a mutation for a population-timepoint, and these errors are now included in Figure 2.

      Errors on the DFE mean: We discuss this below.

      Considering clones separately: As we also describe in the essential revisions above, Johnson et al. (2021) shows that the mutational dynamics in these evolving populations are dominated by successive selective sweeps, so we expect clones isolated from the same population-timepoint to rarely differ by many mutations. However, we agree that there are likely some cases in which the two clones have important genetic differences. To address this concern, we have reanalyzed our data as you suggest, considering each clone separately. The results of this analysis are included for every main text figure in the form of figure supplements (Figure 1 - figure supplement 7, Figure 2 - figure supplement 5, Figure 3 - Figure supplement 5, and Figure 4 - figure supplement 1), which show that our qualitative conclusions are unchanged.
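      The "errors on fitness effects" point above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each redundant barcode yields an independent estimate of the same mutation's fitness effect and that the error is a plain standard error of the mean; the actual procedure is the one described in the revised methods and in Johnson et al. (2019), and the numbers below are invented.

```python
import math
import statistics


def mean_and_sem(barcode_effects):
    """Mean fitness effect and its standard error across redundant barcodes.

    Assumption (not from the manuscript): barcodes are treated as
    independent replicate measurements of one mutation in one clone.
    """
    n = len(barcode_effects)
    if n < 2:
        raise ValueError("need at least two barcodes to estimate an error")
    mean = statistics.mean(barcode_effects)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(barcode_effects) / math.sqrt(n)
    return mean, sem


# Hypothetical per-barcode fitness effects for one insertion in one clone.
effects = [-0.031, -0.027, -0.035, -0.029]
mean_effect, sem_effect = mean_and_sem(effects)
print(f"effect = {mean_effect:.4f} +/- {sem_effect:.4f}")
```

      A mutation tagged by a single barcode has no internal replication under this scheme, which is why the sketch raises an error rather than returning a zero-width interval.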

      Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing costs epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in a rich medium at permissive temperature (YPD 30) and one in a defined medium at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection, and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study, giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs.

      We thank the reviewer for these positive comments and the nice summary of our work.

      As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      Related points were also raised by the other reviewer. To address this, we have added multiple-hypothesis-corrected p-values for these least-squares Wald Tests (using the Benjamini-Hochberg method) to our dataset (Supplementary File 1). As you suggest, for this particular analysis in which we compare the overall number of mutations following each pattern, we are willing to accept the possibility of false positives, so we still use the original p-values to categorize the mutations in Figure 2. We address this point in the main text and provide the numbers of mutations falling in each category after we perform this correction:

      “Because we are primarily focused on comparing the frequency of each pattern across environments, we report these values before multiple-hypothesis-testing correction here and in Figure 2; after a Benjamini Hochberg multiple-hypothesis correction these values fall to 24/77 (~31%), 15/74 (~20%), 9/77 (~12%), and 11/74 (~15%), respectively.”
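      For readers unfamiliar with the correction mentioned in this response, the following is a minimal sketch of the Benjamini-Hochberg step-up adjustment. The p-values are invented for illustration and are not taken from Supplementary File 1; the function name is likewise hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five hypothetical mutations.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(q < 0.05 for q in adj)
print(f"significant before correction: {n_raw}, after: {n_adj}")
```

      As in the response above, the count of tests passing the threshold can only stay the same or fall after correction, which is why the corrected fractions reported in the quoted text are smaller than the uncorrected ones.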

      From here the authors turn to using formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

      The reason this matters is because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection, and, as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. I am left asking how the idiosyncratic model would perform on data that arose, for example, under the fitness-mediated model? Or how it would perform on data originating under the full model? Is it true that when data arise under a different model (say the full model) but are fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients will always represent positive epistasis and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients. Is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

      Thanks for these detailed comments about the modeling approach and analysis, which raise points that were also described in the Essential Revisions and by Reviewer 1. We agree that these details were not presented sufficiently clearly in the original manuscript. In the revised manuscript, we have added a much more in-depth section on the details of the modeling procedures in the Materials and Methods, including formulas for each model and a discussion of how noise could affect our modeling results (see responses to essential revisions and reviewer 1 above for more information). This includes an analysis of shuffled and simulated datasets, which will give readers a better sense of how to interpret these modeling results. We have also included a new paragraph in the results that compares the models for each mutation and for the entire dataset using the Bayesian Information Criterion (BIC):

      “We can also ask which model best explains the data using the BIC, which penalizes models based on the number of parameters. The small squares below the bars in Figure 3A indicate which model has the lowest BIC for each mutation. In YPD 30°C, the full model has the lowest BIC for 40/77 (~52%) mutations and the idiosyncratic model has the lowest BIC for 37/77 (~48%). In SC 37°C, the full model has the lowest BIC for 49/73 (~67%) mutations and the idiosyncratic model has the lowest BIC for 24/73 (~33%). When we assess how well each model fits the entire dataset in each environment, the full model has a lower BIC than the idiosyncratic model in both environments.”
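      The BIC comparison quoted above can be sketched as follows. This is a minimal illustration, assuming least-squares fits with Gaussian errors, for which the BIC reduces (up to an additive constant) to n·ln(RSS/n) + k·ln(n). The residual sums of squares, parameter counts, and sample size are hypothetical and do not come from the manuscript.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC for a least-squares fit with Gaussian errors.

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters. An additive constant shared by
    all models fit to the same data is omitted.
    """
    return n * math.log(rss / n) + k * math.log(n)


n_points = 36  # hypothetical number of measurements for one mutation

# A simpler model with a worse fit vs. a richer model with a better fit.
bic_simple = bic_gaussian(rss=0.90, n=n_points, k=2)
bic_rich = bic_gaussian(rss=0.60, n=n_points, k=8)

preferred = "simple" if bic_simple < bic_rich else "rich"
print(f"BIC simple = {bic_simple:.2f}, BIC rich = {bic_rich:.2f}, preferred: {preferred}")
```

      With these invented numbers the richer model reduces the residual error but not by enough to offset its six extra parameters, so the simpler model has the lower BIC. This is the trade-off the reviewer asked the authors to make explicit.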

      We also appreciate the suggestion to look at how coefficients are spread among mutations. We have made a new supplemental figure (Figure 3 - Figure supplement 3) that clearly shows the coefficients broken down by mutation for each condition. This figure shows that coefficients are often clustered for one mutation. That is, multiple populations often have similar coefficients / patterns of epistasis for a particular mutation. We don’t view this as a source of bias in our data, but as an indication that the mutations fixing in these populations sometimes exhibit similar patterns of epistasis with these insertion mutations. We now reference this supplemental figure in the main text (“see Figure 3 – figure supplement 3 for a breakdown of coefficients by individual mutations”) as a better representation of the coefficients that result from our modeling.

    2. Reviewer #2 (Public Review):

      Johnson and Desai have developed a yeast experimental-evolution system where they can insert barcoded disruptive mutations into the genome and measure their individual effect on fitness. In 2019 they published a study in Science where they did this in a set of yeast variants derived by crossing two highly diverged yeast. They found a pattern that they termed "increasing cost epistasis": insertion mutations tended to have more deleterious effects on higher fitness backgrounds. The term "increasing cost epistasis" was coined to play off the converse pattern commonly observed in experimental evolution of "diminishing returns epistasis" wherein beneficial mutations tend to have smaller effects on more fit backgrounds. Another way to think about fitness effects is in terms of robustness: when mutations tend to have little effect on phenotype, the system is said to be robust. Thus, when increasing costs epistasis is observed, it suggests that higher fitness backgrounds are less robust.

      In this paper, Johnson and Desai use this same barcoded-insertions system in yeast, but here the backgrounds receiving insertions are adapting populations. More specifically, they took 6 replicate populations that evolved for 8-10k generations and inserted a panel of 91 mutations at 6 timepoints along the evolutions. They then did this entire experiment in two different environments: one in a rich medium at permissive temperature (YPD 30) and one in a defined medium at high temperature (SC 37). Importantly, the mutations accumulating in a population over time here are driven by selection, and thus the patterns of epistasis observed here are probably more relevant to "real" evolution than the backgrounds from the 2019 paper. The overarching question, then, is whether similar patterns of epistasis are found in these long-term adaptations and across conditions as were previously observed.

      The first major finding in this work is that at YPD 30 (where the yeast are presumably "happy"), the mean fitness effect does decline in most (but not all) populations as they adapt. Since the population is becoming more fit over time (relative to a constant reference type), this is consistent with the previously observed pattern of increasing cost epistasis. The strength of the effect is, however, weaker than in that previous work. The authors speculate that this may reflect the fact that far fewer mutations are involved here than in the previous study, giving far fewer opportunities for (mostly negative) epistatic effects. I find this explanation likely correct, although speculative.

      The second major, and far more surprising, result is that in the other condition (SC 37), the insertion mutations do not show a consistent trend: mean fitness effect of the insertion mutations does not change as adaptation proceeds. This is despite the fact that fitness increases in these populations over time just as it did in the YPD populations. Toward the end of the paper, the authors speculate as to why this is the case. They argue that in the YPD 30 environment, selection is mainly on pure growth rate. They suggest that the growth rate depends on different components such as DNA synthesis, production of translation machinery, and cell wall synthesis. Critically, these components are non-redundant and can't "fill in" for each other. So, for example, rapid DNA synthesis is of little value if cell-wall synthesis is slow. As adaptation fixes mutations that increase the function of one of these growth components, they shift the "control coefficient" to other components. This, they argue, may be the major explanation behind increasing cost epistasis. I find the logic of their argument compelling and potentially providing great insight into developing a richer view of epistasis. Future experiments will be needed to test how well the hypothesis holds up. They then flip the argument around and suggest that in the SC 37 environment, the targets of selection are fundamentally different from those in growth-centric YPD 30 conditions. Instead, they argue, there is likely more redundancy in the components that mutations are affecting. I again find their arguments compelling.

      After establishing these observed patterns for mean effects, they examine individual mutations and look at the relationship of fitness effects as a function of background fitness. The upshot of this analysis is that there are more negative correlations than positive ones (especially in the YPD 30 conditions), but also that there is a lot of variation: there are many mutations that show no correlation and a small number with a positive correlation. This casts substantial doubt on the simplistic view that for the vast majority of mutations, fitness itself causes mutations to have greater costs. As a minor point of criticism, a lot of statistical tests are being done here and there is no attempt to address the issue of multiple testing. I would like the authors to address this. I say minor because I don't think the overarching patterns are being affected by a few false positive tests.

      From here the authors turn to using formal modeling to understand epistasis better. For each mutation, they fit the fitness data to three models: fitness-mediated model = fitness effects are explained by background fitness, idiosyncratic model = fitness effects can change at any point in an evolution when a new mutation fixes, and full model = fitness effects depend on both fitness and idiosyncratic effects.

      My major criticism of the work lies here: the authors don't explain how the models work carefully and thoroughly, leaving the reader to question downstream conclusions. Typically, when models are nested (as the fitness-mediated and idiosyncratic models appear to be nested within the full model), the full model will, by definition, fit the data better than the nested models. But that is not the case here: for many mutations the idiosyncratic model explains more of the variance than the full model (e.g. Figure 3A). (Note, the fitness-mediated model never fits better than the full model). Further, when dealing with nested models in general, one should ask whether the more complex model fits the data enough better to justify accepting it over simpler model(s). There are clearly details and constraints in the models used here (and likely in the fitting process) that matter, but these are not discussed in any detail. Another frustrating part of the model fitting is that each model is fit to each mutation individually, but there is no attempt to justify this approach over one where each model is expected to explain all mutations. I'm not saying I think the authors have chosen a poor strategy in what they have done, but they have made a set of decisions about how to model the problem that carries consequences for interpretation, and they don't justify or discuss those decisions. I think this needs to be added to the paper. I think this should include both a high level, philosophy-of-our-approach section and, probably elsewhere, the details.

      The reason this matters is because the authors move on to use the fitted models and the estimated coefficients from the models in discussing and interpreting the structure of epistasis. For example, they say "We find that the fitness model often explains a large amount of variance, in agreement with our earlier analysis, but the idiosyncratic model and the full model usually offer more explanatory power." Looking at Figure 3A, this certainly appears to be the case, yet that type of statement is squarely in the domain of model comparison/selection, and, as explained above, this issue is not addressed. Relatedly, the authors go on to argue that "Positive and negative coefficients in the idiosyncratic model represent positive and negative epistasis between mutations that fix during evolution and our insertion mutations." I'm left wondering whether the details of the model fitting process matter. I am left asking how the idiosyncratic model would perform on data that arose, for example, under the fitness-mediated model? Or how it would perform on data originating under the full model? Is it true that when data arise under a different model (say the full model) but are fit under the idiosyncratic model, negative coefficients always represent negative epistasis and positive coefficients will always represent positive epistasis and that model misspecification does not introduce any bias? Another thing I am left wondering about concerns the number of observed coefficients in the idiosyncratic model: if one mutation shows similar effects across backgrounds, it might generate one coefficient during model fitting, while another mutation that has different effects on different backgrounds could give rise to several coefficients. Is there some type of weighting that addresses the fact that individual mutations can generate different numbers of coefficients? One can imagine bias arising here if this isn't treated carefully.

      One of the main conclusions that the authors reach is that the pattern of increasing cost epistasis (observed previously and here in the YPD 30 environment) may not arise from the effect of background fitness itself, but instead arise because epistatic effects tend to be negative, and the more interactions there are (with mutations accumulating over time), the more they tend to have a negative cumulative effect. I find it very likely that the authors have this major conclusion correct. By contrast, they find that at SC 37, the distribution of fitness effects is less negatively skewed, with a considerable number of coefficients estimated to be > 0. They close with a really interesting discussion exploring how these patterns likely arise from underlying biology of the cell, metabolic flux, redundancy, and selection for loss-of-function vs gain-of-function. I find a lot of this interesting and insightful. But because some of their conclusions rest squarely on the modeling, I encourage the authors to be more thorough and convincing in how they execute this aspect of the work.

    1. Reviewer #3 (Public Review):

      Punishment is a key form of learning and behavior change, yet its core behavioural and brain mechanisms remain poorly understood and certainly less well understood than reward learning. This manuscript by Jacobs et al from the Moghaddam laboratory uses dual fibre photometry for calcium transients to make an important advance in understanding how punishment is learned by studying how punishment changes action and punisher coding in the PFC and VTA of rats. This work builds on the elegant single unit work from this group reported previously. The authors use a single action, probabilistic task whereby rats are first trained to nosepoke for sugar pellets on an FR1, with a 5 sec DS signalling reinforcement. Then, in blocks of 30 trials each, the nosepoke is punished on a probabilistic contingency of 0%, 6%, 10%. The authors used dual fibre photometry to concurrently record calcium transients in "dmPFC" and VTA, with a focus on transients related to action emission and punisher as well as reward delivery.

      There are quite a few key findings here: 1) action transients in dmPFC change across punishment from modest inhibitory transients in 0% risk to no change (i.e., possible loss of inhibitory transient in PFC) or modest positive transients (in VTA) as risk increased from 6-10%; 2) comparison with past single-unit data suggested similarity between photometry and single unit measures for the action but not DS; 3) there was no change in punisher transients in these regions; 4) diazepam, which had modest behavioral effects to alleviate punishment, had no effects on PFC transients to the action or punisher but did reveal peri-action ramping-like transients in VTA; 5) diazepam increased correlated activity between VTA and PFC at 0% and 6% risk.

      Overall, I enjoyed reading this manuscript and I learned much from it. The work builds neatly and clearly on the past work of this group in this task, providing new information on how punishment shapes action coding in the prefrontal cortex and VTA, how it shapes correlated activity between these regions, and how benzodiazepines may affect these to achieve their anxiolytic effects. The critical conclusions are that these regions are important for action, but not punisher, encoding, and that peri-action ramping in VTA neurons and VTA-PFC correlated activity contribute to the anxiolytic effects of benzodiazepines in this task.

      Comments

      1. I think it is worth drawing the distinction between punishment (i.e. learning and performance) versus the punisher (footshock). For example, the title (and across the manuscript) refers to "punishment coding" to mean transients to the punisher itself. I would suggest using "punisher" when referring to the outcome used (footshock) or its associated transients and "punishment" when referring to learning. So, learning punishment involves changes in action but not punisher encoding in these regions.

      2. "dmPFC". Different researchers mean different things by this term. Would it be possible to state exactly where the fibres were instead (e.g., Laubach et al., eNeuro, 2018)?

      3. I did struggle to understand the functional significance of the PFC transients. I am convinced they are real and robust because we see precisely the same in our own unpublished work. But, I am still puzzled as to what a loss of an 'inhibitory' transient around the punished action in PFC means? This is not really addressed but it is the main effect of punishment on action coding in the PFC and I think some readers would appreciate the authors' interpretation of this.

      4. Related to 3, it was also not clear why these PFC transients differed only at 6% risk and not also 10% risk. Again, I think this is worth discussing.

      5. Re: analyses. I thought these were generally well done. There are two questions one might be interested in. The first is whether the transients are different from 0%. The second is whether transients differ across sessions. The figures do a good job at answering the second question (which to me is the most important question) by using coloured bars above transients to show when session differences are present as assessed by a robust analysis. However, I do think some readers would also appreciate knowing whether and when transients themselves were significantly < or > 0%. Perhaps these figures could be presented as supplementary data.

      6. The comparison with previously published single-unit data was very interesting. Here I was persuaded that these correlations were meaningful because of the difference between these correlations for cue and action. I am not suggesting the authors do the following, I only offer it for their consideration in future work. Kriegeskorte has developed ways of assessing dissimilarity in different data types from the same behavioural designs that could prove very helpful and persuasive here (e.g., Front. Syst. Neurosci., 24 November 2008; https://doi.org/10.3389/neuro.06.004.2008).

      7. The authors comment on the overgeneralisation of punishment learning. That is, in session 1 there is a broad suppression of behavior by punishment that was not obviously present in the remaining sessions. I am not sure overgeneralisation is the best term because this implies punishment learning generalised. More likely is that Pavlovian fear was present in session 1 to generally suppress nosepoking and this fear was reduced in the remaining sessions as the instrumental punishment contingency was learned. Bolles made this point some years ago and it may be worth citing Bolles et al. Learning and Motivation Volume 11, Issue 1, February 1980, Pages 78-96, on this point.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript from Shi, Ballesta, and Padoa-Schioppa examines the relationship between neural activity in the monkey orbitofrontal cortex (OFC) and various choice patterns that arise in sequential (versus simultaneous) choice. This approach addresses a central question in the study of decision-making: how can one identify value-dependent versus value-independent effects on choice behavior when value is defined from that behavior itself? Here, the authors document three behavioral differences in sequential choice: choosers are noisier, show an order bias, and show a preference bias. Leveraging a conceptual computational framework for OFC activity that the authors have developed over many years, the authors link reduced accuracy to changes in neural valuation in the OFC, order effects to post-valuation decision activity in the OFC, and preference effects to extra-OFC processes. For decision neuroscientists, these findings show specific differences between sequential and simultaneous choice, and suggest the integration of multiple stages (valuation, decision, and post-decision) in the selection process. More broadly, this work shows how an examination of neural activity can shed light on aspects of the decision process that cannot be distinguished by an examination of behavior alone.

      Strengths:

      Overall, this paper presents a novel and thoughtful task design that allows comparison of neural and behavioral value and choice effects. In concert with an established circuit-based framework for parsing different types of OFC response patterns, the authors test and validate a number of hypotheses on the link between neural activity and choice.

      (1) Comparing sequential and simultaneous choice tasks in an interleaved manner is a clever approach to separate valuation and comparison processes in time. While not entirely novel (e.g. see work from the Hayden group), the combination of this approach with the OFC response pattern (offer value, chosen value, chosen juice) framework allows a distinction between valuation and comparison-related effects.

      (2) This paper is the latest in a significant series of related papers on orbitofrontal activity from this group, and cleverly utilizes their expertise in characterizing, analyzing, and conceptualizing different patterns of OFC activity. In addition to the long-established offer value/chosen value/chosen juice categorization, recent papers from this group have established the causal contribution of OFC offer value activity to economic choice and established similar OFC neural contributions to sequential and simultaneous choice tasks.

      (3) Apart from a causal test (e.g. cell type specific stimulation) of the contribution of different neural responses to different choice effects, the next strongest evidence is a demonstration of a consistent relationship across sessions. The authors show such a relationship between offer value coding strength and choice accuracy, between chosen value sequence effects and behavioral order bias, and between chosen juice inhibition and order bias. At the least, these relatively strong effects show a strong correlation between different OFC responses and behavior.

      Thank you for emphasizing these points.

      Weaknesses:

      While the experimental approach and rigor of the analyses are strengths, there are issues of interpretation and generality of analytical approaches that should be clarified.

      (1) The abstract, introduction, and discussion touch on canonical behavioral economic choice effects as a prelude to the behavioral effects documented here, but it's not clear they are so closely related. [A] Many of the effects in the cited literature (framing effects in risky choice, preference reversals, etc.) are robust across different task paradigms, whereas the effects shown here arise specifically from a comparison of choice across different task paradigms (sequential vs. simultaneous). Furthermore, [B] it's not clear that the term "bias" adequately captures the array of effects in the behavioral economic literature (for that matter, [C] one of the main effects in this paper is reduced choice accuracy rather than a bias). [D] The paper would benefit from a clearer conceptual linkage between documented behavioral biases (particularly in humans) and the effects shown here.

      [B] We beg to differ. In our reading of the literature, the term “bias” is very general and it is invoked practically every time choices present some effect that seems idiosyncratic or “irrational”. The list of documented biases is very long – a good reference is the Wikipedia page on cognitive biases (for more scholarly references, see (Gilovich et al., 2002; Kahneman et al., 1982)).

      [A] As for whether biases documented in behavioral economics are robust across task paradigms, that’s really a matter of perspective. For example, we all understand the phenomenon of loss aversion (a.k.a. “status quo bias”) to be very robust and almost intuitive. But before the prospect theory paper of Kahneman and Tversky (1979), that was not at all the case. In the 15 years following that paper, much of what Kahneman and Tversky did was to show how loss aversion affected choices in different domains (Kahneman and Tversky, 2000). Other biases are much less reliable. For example, there is an extensive literature on decoy effects – i.e., violations of the axiom of “independence of irrelevant alternatives”. However, it turns out that the strength and even the direction of decoy effects depend on seemingly minor details (Spektor et al., 2021). In other words, decoy effects are not as robust as one might think. As for the biases discussed here, our hunch is that the order bias is quite ubiquitous. Indeed, it was already documented using different tasks in different species (Krajbich et al., 2010; Rustichini et al., 2021). The preference bias might also be the manifestation of a rather general phenomenon. After all, there is a common intuition that when a decision is difficult we sometimes fail to finalize it, and eventually choose some default option. In conclusion, we think of the two biases discussed here as conceptually very comparable to biases described in behavioral economics.

      [C] We agree that the drop in accuracy is (strictly speaking) not a choice bias, and we carefully chose the title and wrote the whole manuscript to keep that point clear. However, let us note that the drop in accuracy observed under sequential offers could easily be construed as a choice bias – specifically, a bias favoring in any situation the lesser option (lower value). As we conclude the present study, this phenomenon continues to fascinate us. Indeed, while it is clear that the behavioral effect arises at the valuation stage, we still don’t understand why the activity range of offer value cells is reduced under sequential offers. Naively, one might have guessed the opposite – i.e., that when only one offer is on display, the lack of competition translates to stronger offer value signals. We plan to give this issue more thought in the future. One possibility is that the system modulates the activity range of offer value cells depending on the task and/or the behavioral context. If so, differences in choice accuracy measured under sequential versus simultaneous offers would be a manifestation of a more general phenomenon. Of course, this matter remains open for future research.

      [D] The link between the biases discussed here and other biases described in the literature is conceptual. The main point we want to make is this: Over the past 20 years, we have gained some understanding of the neural circuit and mechanisms underlying simple economic choices. While our understanding remains incomplete and the object of ongoing research, notions acquired for simple choices can be used to make sense of a broader class of choices. Thus, in principle at least, it is possible to shed light on a variety of traits and biases by observing the activity of particular cell groups. The last paragraph of the ms conveys this point.

      (2) The analyses rely on a particular quantification of choice behavior (probit regression), which interprets choice effects (e.g. relative valuation of the two juices, sigmoid steepness) via specific parameter combinations and relies on specific assumptions about the construction of choice (e.g. cumulative normal distribution, constant sigmoid slope across order effects). This method of quantifying choice behavior is well-documented in previous studies, allowing a comparison to past work. However, given the importance of this approach to both quantifying choice effects and comparing choice to OFC responses, the paper would benefit from directly addressing two issues: (1) how well does probit regression actually capture stochastic choice behavior (in both Task 1 and Task 2), and (2) do the findings rely on specific choice modeling assumptions? The second issue is most important for the order bias effects, which assume a constant sigmoid across conditions - do the authors reach similar conclusions if this assumption is relaxed?

      Thanks for raising this question. We address it more thoroughly below (under “Recommendations for the authors”, point (2)). In a nutshell, when we designed the behavioral analysis, we chose the probit function and the log value ratio model (as opposed to the value difference model) based on general considerations and for consistency with our previous studies. We have now conducted a series of control analyses using logit instead of probit and value difference instead of log value ratio. We also repeated all the analyses of neuronal activity using measures for relative value, choice accuracy and order bias derived from these behavioral models. The upshot is that all of our results hold true independently of the regression model used to analyze choices. Thus we kept the results as in the original ms, and we included a new section in the Methods to describe our control analyses (p.16-17).
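
      To make the model comparison concrete, the following is a minimal sketch (not the authors' code) of a choice function under a log value ratio model with either a probit or a logit link. The parameter names `a0` and `a1`, their numerical values, and the definition of relative value as the quantity ratio at indifference are assumptions made for illustration; the response itself only names the probit, logit, log value ratio, and value difference variants.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, i.e. the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z):
    """Logistic function, i.e. the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def choice_prob_b(q_a, q_b, a0, a1, link=normal_cdf):
    """P(choose juice B) under an assumed log value ratio model.

    q_a, q_b: offered quantities of juices A and B.
    a0, a1:   hypothetical regression coefficients (offset, steepness).
    """
    return link(a0 + a1 * math.log(q_b / q_a))


def relative_value(a0, a1):
    """Quantity ratio q_b / q_a at which both options are chosen equally often."""
    return math.exp(-a0 / a1)


# Hypothetical coefficients, chosen only to exercise the functions.
a0, a1 = -2.0, 2.5
rho = relative_value(a0, a1)

# At the indifference ratio, both links give a choice probability of 0.5.
p_probit = choice_prob_b(1.0, rho, a0, a1)
p_logit = choice_prob_b(1.0, rho, a0, a1, link=logistic)
```

      Under this parameterization the indifference point is the same for both links, so swapping probit for logit changes the shape of the sigmoid tails rather than where the sigmoid is centered.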

      (3) There are some issues with the strength and interpretation of the preference bias that need to be addressed. Re: strength and significance of the preference bias, the text seems to overemphasize the dependence of the effect on relative value (rotation of the rho-2 vs rho-1 ellipse) at the cost of the simple task difference (shift in the ellipse above the identity line). Conceptually, a preference bias (a shift in relative value towards the favored item) requires only the task difference, not the dependence on relative value. It would be clearer for example if the main text (pg. 6) presented the statistics (t-test, Wilcoxon) supporting the difference in relative values (rhos) between Tasks 1 and 2. Furthermore, the rotation does not seem as robust: the text states that the result is significant in both animals (p<0.04) but the ANCOVA results (Fig 3C and 3F) suggest that the effect is only significant in Monkey J. Is the preference effect significant only in one animal, and if so, is the effect significant across the combined data?

      Let us refer to Fig.3C. There is no question that the separation between the red and blue lines is statistically significant (order bias). In addition, the two lines appear (a) displaced upwards and (b) rotated counterclockwise compared to the identity line. In our understanding, the question raised by R2 is whether the two effects – displacement (a) and rotation (b) – are both present and both necessary to define the preference bias. We actually gave this issue extensive thought early on, and we concluded that displacement and rotation are not easily dissociable, at least in our data set. The reason is simple: to dissociate them, we would have to make some assumption about the center of rotation. For example, if we assume that the center of rotation is [0, 0], then there clearly is a rotation but the displacement is close to zero. Conversely, if the center of rotation is [1, 1] (which, in some ways, is a more logical assumption), the rotation is still there but the displacement is >0. When we considered these elements, we realized that any choice of a center of rotation would be somewhat arbitrary. Further complicating things, once a center of rotation is chosen, rotation and displacement are non-commutative operations. Importantly, this issue only affects the displacement, meaning that the rotation angle (and its statistical significance) does not depend on choosing any particular center of rotation. In this light, we chose to define the preference bias in a way that is more tied to the rotation than to the displacement, while noting that the net effect of the phenomenon was to bias choices in favor of the preferred juice (hence, the phrase “preference bias”). The only problem with this definition is that it doesn’t do full justice to the phenomenon in monkey G (Fig.3F), where the displacement is more clearly evident than the rotation (indeed, the latter only trends towards statistical significance (p=0.07)). Still, we don’t see a better way to design our analyses. Thus we kept the ms unchanged in this respect.
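
      The geometric argument can be checked with a small numerical sketch. The 10-degree angle and the test points below are hypothetical and are not taken from Fig.3; the sketch only illustrates that, for one and the same planar transform, the rotation angle is fixed while the implied displacement depends on the assumed center of rotation, and that rotating and shifting do not commute.

```python
import math


def rotate(p, theta, center=(0.0, 0.0)):
    """Rotate point p counterclockwise by theta (radians) about center."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    c, s = math.cos(theta), math.sin(theta)
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def translate(p, d):
    """Shift point p by displacement d."""
    return (p[0] + d[0], p[1] + d[1])


theta = math.radians(10.0)  # hypothetical rotation angle


def observed(p):
    """Hypothetical 'observed' transform: a pure rotation about the origin."""
    return rotate(p, theta)


def implied_displacement(center):
    """Displacement needed if the same transform is described as a
    rotation by theta about `center` followed by a shift."""
    moved = observed(center)
    return (moved[0] - center[0], moved[1] - center[1])


d_origin = implied_displacement((0.0, 0.0))  # zero displacement
d_unit = implied_displacement((1.0, 1.0))    # non-zero displacement

# Both descriptions reproduce the same transform for an arbitrary point.
p = (2.0, 0.5)
via_unit = translate(rotate(p, theta, center=(1.0, 1.0)), d_unit)

# Non-commutativity: rotating then shifting differs from shifting then rotating.
d = (0.0, 0.3)
rot_then_shift = translate(rotate(p, theta), d)
shift_then_rot = rotate(translate(p, d), theta)
```

      With the origin as center the displacement is zero, while with [1, 1] as center the same transform implies an upward displacement, which matches the two cases described in the response.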

      (4) On a related note, the authors present and view the effects as detrimental for the animals, but I think they have to more explicitly state how they are defining outcomes. For example, the abstract states "By neuronal measures, each phenomenon reduced the value obtained on average in each trial and was thus costly to the monkey". Does this mean that outcomes are less valuable, with value defined by (offer value cell) firing rates? A clarification is particularly important for the preference bias, where animals show a stronger bias for the preferred option compared to simultaneous choice. At the behavioral level, this effect seems to only be a poorer outcome if one assumes that simultaneous choice demonstrates true values - can it not be assumed that sequential choice demonstrates true preference, and the preference bias reduces performance in simultaneous choice? The authors may have an explanation in mind based on OFC value coding, and it would be helpful to be explicit here.

      Thank you for raising this question. The revised ms includes a new section (Discussion; ‘The cost of choice biases’; p.13) that discusses this important issue. In a nutshell, if in two conditions subjective values are the same but choices are different, in one or both conditions the subject fails to choose the higher value. In that sense, the choice bias is detrimental. Our analyses of neuronal activity indicated that subjective offer values were (a) the same in the two tasks and (b) independent of the presentation order in Task 2. Hence, both the preference bias and the order bias were detrimental to the animal.

      (5) Finally, at a broad level, the authors rigorously define and test hypotheses about how the different behavioral effects relate to OFC activity within the context of their neurocomputational framework (offer value, chosen value, chosen juice cells arranged in a competitive inhibition network; Fig. 1). However, it should be acknowledged that the primary conclusions - about how the different behavioral effects arise during valuation, comparison, or post-comparison - relies on the assumption that the different OFC response patterns reflect these specific circuit functions, and that OFC is causally related to choice. It would be more balanced if the authors could acknowledge this point in the discussion, and discuss any relevant potential alternative explanations for their findings.

      This issue is addressed above (Essential revision, point 1). In essence, R2 is correct: all our analyses were designed, and all our results are interpreted, under a series of assumptions. Most of these are backed by empirical evidence (e.g., showing that the encoding of decision variables in OFC is categorical in nature). However, one assumption remains a working hypothesis. Specifically, we assume that the cell groups identified in OFC constitute the building blocks of a decision circuit. If so, the activity of different cell groups may be associated with different computational stages. We edited the Discussion to clarify this point (p.11-12). As for possible alternative explanations, we agree that it is a very reasonable question to ask, but we honestly are at a loss addressing it. Indeed, one would never conduct the analyses presented in this ms if not in the framework of Fig.1. Consequently, it is hard to come up with any interpretation for the results without embracing that computational framework. If R2 can propose some alternative interpretation for the results presented in the ms, we would be more than happy to think about it, and possibly revise our thinking.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      General Statements

      We are grateful for the very kind, thoughtful, and detailed comments of the reviewers, which we have strived to fully integrate into the revised manuscript.

      Of note are the concerns with the data from stages S21 and S22, which we acknowledge do appear to be qualitatively and quantitatively distinct from the other samples. While we are unable to completely disambiguate meaningful biological variation from technical or experimental noise using our data, we hope a few additional analyses and visualization tools we have included can provide greater confidence in the reliability of our findings.

      Additionally, while attempting to evaluate Reviewer #2’s suggestions about examining the distribution of intergenic peaks along the genome, we discovered an error in our code that resulted in the improper assignment of peak categories. The error resulted in the improper assignment of intronic and exonic peaks as intergenic peaks. While the largest group of peaks in our dataset remains distal intergenic peaks (30.2%), and distal intergenic peaks remain a larger proportion of our intergenic peaks than proximal intergenic peaks, many of the peaks originally assigned to the intergenic categories have been reclassified as exonic or intronic peaks. We have updated our code and figures upon reanalysis of our data and have revised our findings and discussion accordingly.

      Description of the planned revisions

      Reviewer #3, Comment #3 of 11

      “In general, I thought that the bioinformatic methods (i.e., the code or the options used for each program) would have been helpful for my understanding in some cases. The authors say that these will be published on an accompanying GitHub repository, which should be fine if this is sufficient for journal policy.”

      We are still at work compiling the code for our analyses into a more reader-friendly form and setting up a GitHub repository to enable easy access to more detailed methods for interested readers. Some of the most important settings have been included in the Methods and Supplementary Methods sections, but we hope to include more thorough detailing of our pipelines in the GitHub repository. The raw data for portions of the RNA-Seq and all of the ATAC-Seq data have been uploaded to the Sequence Read Archive, and we are finalizing additional raw data submission. We are also in the process of determining what data to include in our Gene Expression Omnibus submission, which we hope to include all pertinent final data analysis files as well as any intermediate or accompanying datasets which would facilitate downstream analyses. The large size and number of our final analysis files has resulted in some challenges with data transfer and storage, which has delayed the upload and submission process.

      We are also collating several of the data visualization scripts built for this manuscript into a Jupyter notebook. This tool will enable the visualization of ImpulseDE2 models and peak classifications for arbitrary genes and genome regions of a user’s choice, alongside additional functions which are discussed in this revision plan.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have addressed the following substantive concerns with the manuscript:

      Reviewer #2, Comment #2 of 3:

      “Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.”

      Reviewer #3, Cross-comment #2 of 3:

      “Focus on stages S21/S22: This might indeed be somewhat problematic. The libraries from these two stages (particularly S21) seem to be very different from those from the other stages. In the PCA (Fig. 1C), S21 doesn't cluster well with anything, and the difference between the two replicates is massive compared to other stages. The accessibility pattern (Fig. 1D) also looks odd. The libraries also have the lowest scores for % of mapped reads (Fig. S2B), fragment size distribution (S2E), and Spearman correlation (S2I). All this could be biologically sound and be due to a major developmental transition at this point, but maybe it justifies revisiting the data and testing whether leaving out S21 (and/or S22) makes a big difference for the clustering analyses.”

      1. Reviewers #2 and #3 discussed concerns with the outlying nature of libraries S21 and S22. We had also previously held concerns about these samples and had performed some analyses to examine whether the global properties of our dataset are dramatically changed upon removing those samples. We did not observe dramatic changes to the structure of our data in the absence of the S21/S22 samples.

        • a. Samples S21 and S22 appear to be highly separated from the rest of our data using Principal Components Analysis. We had also previously believed that this suggested that these samples might be problematic. However, a colleague indicated to us that researchers in microbiome ecology had observed similar phenomena, often caused by strong single axes of variation (or “linear gradients”) in the datasets. In “Uncovering the Horseshoe Effect in Microbial Analyses” (mSystems, 2017) by Morton et al., the authors describe how a strong linear gradient can create a “horseshoe effect” or “Guttman effect”, where PCA results in the two ends of a linear gradient appearing to come together in ordinal space. The authors also describe a similar “arch effect” which strongly resembles the general shape of our PCA curve. We suggest that the strong apparent “outlier” appearance of S21 and S22 may be exaggerated or induced by the technical “arch effect” phenomenon, and may be caused by a strong single biological gradient – a developmental timecourse – which our data aimed to capture.
        • b. We also performed PCA on our dataset with the S21 and S22 time points removed prior to performing the analysis (see right panel, bottom). When we did so, we observed that the relative positions of the remaining libraries remain largely similar, with time points closer to the middle of development showing a positive loading in PC2, and time points closer to the beginning and end of development showing a negative loading. This suggests that the second major axis of variation in our dataset would remain a contrast between middle vs. terminal timepoints, even without the S21/S22 data, and that the relative positioning of the remaining data within PC-space is not entirely driven by S21/S22.
        • c. To further assess the degree of the S21/S22 samples’ outlying effects, we also performed ImpulseDE2 analysis to generate model fits without S21/S22 data. Doing so allowed us to determine to what degree the S21/S22 stages are necessary for driving the accessibility trajectory of individual peaks, and of the data more broadly. We performed IDE2 with either all data, or the S21/S22 data removed prior to input into IDE2. This generated two sets of model fits to the “cloud” of accessibility vs. time measurements: one that included the S21/S22 data, and one without. We evaluated, for each peak in our dataset, the time point at which the IDE2 model achieved maximum accessibility (the “IDE2 max fit”), and plotted both the “all” and “noS21S22” data as a histogram (see right panel, top graph). The presence of peaks that achieve predicted maximum accessibility in the S21/S22 stages in the “no S21/S22” data is a result of how we calculate “max fit”, which does not require that there is a known accessibility value at a given timepoint; only that the time point during which the model fit is maximum is closest to the timing of that developmental stage. Overall, we still observed early, middle, and late enrichment of IDE2 max fit even when the S21/S22 data are removed. We do see a rightward shift in the middle timepoint histogram in the direction of later stages, although this may be expected given the absence of concrete accessibility values at S21/S22 in the “no S21/S22” data. This indicates that our data globally retain the general trends of early, middle, and late enrichment of accessibility in the absence of the S21/S22 data. Moreover, this suggests that, even without the S21/S22 data, the remaining data from early and late stages result in a model fit that still predicts maximum accessibility at middle developmental stages for many peaks.
        • d. To further measure the influence of the S21/S22 data in IDE2 model fit, we also evaluated the degree of change in the global behavior of a peak when the S21/S22 stages were removed. This analysis aimed to assess whether removing S21/S22 data resulted in an IDE2 model with the same general trajectory as with all data, as opposed to the more stringent requirement of evaluating whether the exact developmental stage of the peak was changed. To perform this analysis, we grouped developmental stages into five quintiles, each representing three stages of development. We asked, for each peak in our dataset, whether that peak’s IDE2 max fit was “stable” when the S21/S22 data were removed; that is, if the quintile of the IDE2 max fit was altered when the S21/S22 data were removed (i.e. if a peak moved more than 3 developmental stages away from its original position), a peak was considered “unstable”. We observed that over 80% of peaks in each quintile remained “stable” after removing the S21/S22 data, suggesting that the vast majority of peaks show the same general trajectory of accessibility even without the S21/S22 data. Peaks within the middle time points appeared to be more unstable than peaks at the terminal timepoints, which could be expected given that the S21/S22 timepoints constituted the middle-most timepoints in our dataset.
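
      The quintile stability check in point d can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: stages are encoded as hypothetical integer indices 0 to 14 (five bins of three stages), the peak identifiers and max-fit values are invented toy data, and a peak counts as "stable" when the bin of its max-fit stage is unchanged between the full and the reduced fit.

```python
from collections import Counter

STAGES_PER_BIN = 3  # five bins of three stages each, per the description above


def quintile(stage_index):
    """Map a hypothetical stage index (0..14) to its bin (0..4)."""
    return stage_index // STAGES_PER_BIN


def stability_by_quintile(max_fit_all, max_fit_reduced):
    """Fraction of peaks per bin whose max-fit bin is unchanged.

    max_fit_all:     peak id -> max-fit stage index using all samples.
    max_fit_reduced: peak id -> max-fit stage index with samples removed.
    """
    totals, stable = Counter(), Counter()
    for peak, stage_all in max_fit_all.items():
        q = quintile(stage_all)
        totals[q] += 1
        if quintile(max_fit_reduced[peak]) == q:
            stable[q] += 1
    return {q: stable[q] / totals[q] for q in sorted(totals)}


# Invented toy data: five peaks, one of which changes bin.
fit_all = {"peak_a": 0, "peak_b": 1, "peak_c": 7, "peak_d": 8, "peak_e": 13}
fit_reduced = {"peak_a": 2, "peak_b": 1, "peak_c": 9, "peak_d": 8, "peak_e": 14}

fractions = stability_by_quintile(fit_all, fit_reduced)
```

      In the toy data only `peak_c` leaves its bin, so the middle bin has a lower stable fraction than the terminal bins.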

      We acknowledge that the S21/S22 timepoints still appear to be qualitatively different in other ways. Moreover, we acknowledge that some of the peaks in our dataset are “dependent” on the S21/S22 stages, given that their accessibility trajectory changes when these stages are removed. It is difficult to determine whether a change in accessibility trajectory for a given peak caused by the removal of S21/S22 data is indicative of technical differences in sample preparation, such as batch effects; biological variation, such as a potentially unknown mutant or sick embryo; or due to genuine wildtype biological processes that occur at the S21/S22 stages.

      These caveats acknowledged, a comparative analysis of the data in the absence of the S21/S22 stages suggests that much of the global picture of development remains the same. In the interest of providing the data we generated as a resource, we decided to include the S21/S22 data in the final manuscript we have prepared for submission.

      We have included an additional supplementary figure (Supp. Fig. 2.2) highlighting these further analyses, which we hope future readers will consider when performing their own analyses with these timepoints, as well as a summary of the ways we evaluated this potential concern in the Supplementary Methods. To facilitate future users of this dataset, we will include the model parameters calculated from IDE2 using both the full dataset and the data with S21/S22 removed in the GEO accession data, as well as a Jupyter notebook (ParhyaleATACExplorer.ipynb) that allows users to plot the raw accessibility data and IDE2 model fits for individual peaks of interest (C, example on right panel), so that downstream experiments can consider the potential differences with the S21/S22 samples.

      Reviewer #2, Comment #3:

      “The majority of ATAC-seq peaks in the distal intergenic regions is a very surprising result. Authors defend this result by suggesting that this organism has big genome. May author perform a short analysis that shows that these peaks are indeed represent nearby genes or may point towards 3D genome organisation. For example, I see that this genome might have regions in the genomes that are densely organised in gene clusters, in those cases does the pattern remains same i.e. the majority of the genes are very distant from each other and hence use vital regulatory elements?”

      Reviewer #3, Cross-comment #3 of 3:

      Peaks in distal intergenic regions: I agree that this could be elaborated on. It might also be that >10 kb is not actually that distal for Parhyale. I would suggest to split the "distal peaks" further (e.g., in 10 kb or 2-log steps, or whatever makes most sense) and try to understand if >10 kb is mostly <20 kb, or if most of them are hundreds of kb from the nearest gene?

      1. Reviewers #2 and #3 expressed interest in understanding the absolute distribution of distal intergenic peak distances from nearby genes in our dataset. In generating the analyses to address this question, we stumbled upon an error in our code that reveals that the true number of intergenic peaks is much lower than we had originally reported. We discuss the nature of the error below. Moreover, we address the previous question using the new data, which overall still indicates that distal intergenic peaks remain a large portion of the Parhyale genome.
        • a. To address Reviewer #2’s comments with respect to the presence of potential clusters of intergenic regions, we built a Python tool (included in ParhyaleATACExplorer.ipynb) enabling the visualization of different cis-regulatory element categories along a genomic coordinate. Upon plotting our data with this tool, we observed problems with the categorization of the peaks – namely, that intronic and exonic peaks were erroneously classified as intergenic peaks (see right panel, top). We analyzed our script for classifying annotations more carefully and realized that we had erroneously used “bedtools closest” instead of “bedtools intersect” to try to identify all peaks overlapping with gene annotations in our genome. We corrected this error and observed the expected distribution and categories of peaks in our data (right panel, bottom).
        • b. The revised peak categories have been added to the updated manuscript in Fig. 3H and Fig. 5C. The categories of peaks we observed differ substantially from our previous results, in that we observe a much higher representation of exonic and intronic peaks in our dataset, with intronic peaks now representing 28.2% of all peaks (increased from <1%), and distal intergenic peaks representing 30.2% (decreased from 51.2%). While distal intergenic peaks remain the largest category over time, the proportion is relatively equal to the fraction of intronic peaks. Intergenic peaks (distal and proximal combined) now make up only a slightly larger fraction of peaks (37.2%) than gene body peaks (exon, intron; total 34.4%). This updated result is a significant departure from our previous report, and we have updated the text of the manuscript to correct this mistake.
        • c. While intergenic and distal intergenic peaks constitute a much smaller portion of our data, we still wanted to address Reviewer #2 and #3’s questions about the distribution of distances between intergenic peaks and nearby genes. We generated a plot to illustrate the number of intergenic peaks at variable distances to the nearest gene (B, right panel). As illustrated in the plot, there are a very large number of distal intergenic peaks, including many peaks >100kb away from the nearest gene. The average distance of intergenic peaks from the nearest gene was 73,351bp. We neglected to mention in the original manuscript that one of the rationales for choosing a 10kb cutoff as “distal intergenic” was that peaks beyond this distance would be considerably more difficult to isolate as single fragments combined with a proximal promoter using PCR, agnostic of their orientation with respect to the promoter element. Such peaks could not have been easily identified using previous transgenic approaches, and are thus distinguished from “proximal” peaks by their necessary identification using techniques such as ATAC-Seq. We have updated the text to reflect this distinction.
        • d. Given that both intergenic and gene body peaks appeared to comprise large fractions of our revised data, we also examined the relative enrichment of intergenic and gene body peaks with respect to time (after normalizing for the fraction of “unknown” peaks, as suggested by Reviewer #3). We observed that the proportion of peaks belonging to intergenic and promoter regions declined slightly as development progressed, while the proportion of gene body peaks increased (E, below). There appeared to be slightly more intergenic peaks than gene body peaks at all developmental time points, and the ratio of intergenic peaks to gene body peaks declined very slightly over time (F, below). These data indicate that intergenic and gene body peaks have different enrichment trajectories over time. As development progresses, gene body peaks are increasingly enriched, and may have a greater impact on gene regulation. We have added these additional observations to the text and to a new Supplementary Figure 2.3.
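The misclassification described in point (a) comes down to the difference between asking which gene is nearest to a peak and asking whether a peak overlaps a gene feature. The following is a minimal Python sketch of an overlap-first classification with the 10 kb distal cutoff from point (c). The `Interval` type, the function names, the toy coordinates, and the handling of contigs without genes are all illustrative assumptions; the authors' pipeline uses bedtools on the real annotation and includes further categories (such as promoter peaks) that are not modeled here.

```python
from typing import List, NamedTuple

DISTAL_CUTOFF_BP = 10_000  # cutoff for "distal intergenic" stated in the response


class Interval(NamedTuple):
    start: int  # 0-based, inclusive (BED-style)
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def distance(a: Interval, b: Interval) -> int:
    """Gap in bp between two intervals; 0 if they overlap or touch."""
    if overlaps(a, b):
        return 0
    return max(a.start - b.end, b.start - a.end)


def classify_peak(peak: Interval, genes: List[Interval], exons: List[Interval]) -> str:
    # Overlap tests come first: this is the question an intersect answers.
    if any(overlaps(peak, e) for e in exons):
        return "exonic"
    if any(overlaps(peak, g) for g in genes):
        return "intronic"
    if not genes:
        # Assumption: a contig with no annotated genes yields no usable distance.
        return "unknown"
    # Only non-overlapping peaks are classified by distance to the nearest gene.
    nearest = min(distance(peak, g) for g in genes)
    if nearest > DISTAL_CUTOFF_BP:
        return "distal intergenic"
    return "proximal intergenic"


# Toy example: one gene with two exons on a single contig.
genes = [Interval(1_000, 5_000)]
exons = [Interval(1_000, 1_500), Interval(4_000, 5_000)]
for peak in [Interval(1_200, 1_400), Interval(2_000, 2_200),
             Interval(5_500, 5_700), Interval(20_000, 20_200)]:
    print(peak, classify_peak(peak, genes, exons))
```

A nearest-gene lookup alone returns a gene for every peak, so it cannot by itself separate intronic from intergenic peaks; the explicit overlap test does that.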

      We have also addressed the following textual and conceptual concerns with the manuscript:

      Reviewer #3, Comment #1 of 11

      I felt that the first paragraph of the introduction is not necessary.

      1. We believe the introductory paragraph helps frame the paper in the context of the broader scope of advances in technologies for emerging research organisms – currently, it has become straightforward to both generate a genome sequence and to identify and manipulate coding genes of interest across diverse taxa, but the identification of gene regulatory mechanisms remains more difficult. We have edited the introduction to better reflect this perspective and to link the first paragraph to the rest of the paper.

      Reviewer #2, Comment #1 of 3

      “In Introductory paragraph 2, sentence one, authors suggest that gene regulation plays more important role in evolutionary process than genes. Although a significant amount of research has been dedicated to gene regulation based evolution still this field is in nascent form. For example evidence of inheritance of the gene regulation pattern across generation is scarce and requires more evidence. I suggest authors to modulate the claim that still gene based evolution is the main paradigm instead otherwise.”

      Reviewer #3, Cross-comment #1 of 3

      Evolution via gene regulation vs. coding sequence: While (to my understanding) it is largely accepted in the field that changes to the CDS will often have more deleterious effects than changes to the expression of a gene, I agree that this could be elaborated on a bit.

      1. As requested by Reviewers #2 and #3, we have clarified the language surrounding the debate between gene functional and gene regulatory evolution to indicate that both mechanisms appear to be important for evolutionary processes, with the importance of the latter more recently revealed.

      Reviewer #3, Comment #2 of 11

      Use of Genrich: I presume this was run on both duplicates simultaneously? This is not clear from the methods section. It might have implications for downstream analyses (e.g., differential accessibility between time points) because running on both sequencing library replicates simultaneously leads to a single "replicate" of peaks per time point, while running it individually leads to two. However, I have never tested if this actually does make a difference. Maybe the authors have and can comment on this?

      1. In response to Reviewer #3’s inquiry about Genrich, we have added clarifying information to the Methods section: “Genrich analysis was run on both duplicate libraries simultaneously; Genrich performs peak calling on each replicate individually, and then merges the p-values of the replicates using Fisher’s method to generate a q-value, obviating the need to calculate an Irreproducible Discovery Rate (IDR).” We did not test running Genrich on individual libraries, opting for the more conservative approach of using the combined q-value as a filtering score for peak quality. For further information, the reviewer can see the relevant section of the Genrich GitHub repository: https://github.com/jsh58/Genrich#multiple-replicates
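To make the replicate-merging step concrete, here is a small Python sketch of Fisher's method for combining independent p-values. It only illustrates the statistic; it is not Genrich's own code, the function name is hypothetical, and the subsequent conversion of combined p-values into q-values (multiple-testing correction) is not shown.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has a closed form, so no external library is needed.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2

    # P(chi2_{2k} >= X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two replicates that each give modest evidence for the same peak.
combined = fisher_combined_pvalue([0.05, 0.05])
print(f"combined p-value: {combined:.4f}")
```

With a single replicate the function returns the input p-value unchanged, which is a quick sanity check on the formula.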

      Reviewer #3, Comment #4 of 11

      The section on the IDE2 models (the paragraph at the end of page 4/beginning of page 5) was unclear to me but appears sound. (The only instance where I didn't quite understand what the program actually does.) Maybe this can be explained a bit easier?

      1. As requested by Reviewer #3, we have attempted to explain the methods and logic of using ImpulseDE2 a bit more clearly:

      “To identify regions of dynamically accessible chromatin, we used the ImpulseDE2 (IDE2) pipeline (Fischer et al., 2018). IDE2 differs from other software for differential expression analysis in that it allows the investigation of trajectories of dynamic expression over large numbers of timepoints. It does so by modeling a gene expression trajectory as an “impulse” function that is the product of two sigmoid functions (Chechik and Koller, 2009; Yosef and Regev, 2011). This approach enables the modeling of a trajectory of gene expression in three parts: an initial value, a peak value, and a steady state value, thus summarizing an expression trajectory using a fixed number of parameters. With the ability to capture the differences between early, middle, and late expression values for each gene in a dataset, IDE2 also enables the detection of transient changes in gene expression or accessibility during a time course. Identifying differential expression over large numbers of timepoints is difficult for more categorical differential expression software such as edgeR and DESeq2, which generally use pairwise comparisons between timepoints to assess change over time (Love et al., 2014; Robinson et al., 2010).”
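As an illustration of the "product of two sigmoids" idea, the sketch below evaluates an impulse function in the parameterization commonly attributed to Chechik and Koller (2009): an initial level `h0`, a peak level `h1`, a steady-state level `h2`, two transition times `t1` and `t2`, and a shared slope `beta`. That IDE2 uses exactly this form internally is an assumption here, and the parameter values are invented for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def impulse(t: float, h0: float, h1: float, h2: float,
            t1: float, t2: float, beta: float) -> float:
    """Impulse model: product of a rising and a falling sigmoid, scaled by 1/h1.

    Far before t1 the value approaches h0, between t1 and t2 it approaches h1,
    and far after t2 it approaches h2. Requires h1 != 0.
    """
    onset = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    offset = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return onset * offset / h1


# Hypothetical trajectory: starts at 2, peaks near 10, settles at 4.
params = dict(h0=2.0, h1=10.0, h2=4.0, t1=5.0, t2=15.0, beta=2.0)
for t in (0, 5, 10, 15, 20):
    print(t, round(impulse(t, **params), 3))
```

Six parameters summarize the whole trajectory regardless of how many time points were sampled, which is the property the quoted paragraph highlights.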

      Reviewer #2, Comment #2 of 3

      2-2) Authors have repeatedly used S21 and S22 throughout the manuscript to support their claims with clustering etc. May authors shed some light on the differences in replicates for these timepoints. Furthermore, I could not find Fig 3J, perhaps author would like to point out Fig 3H.

      Reviewer #3, Comment #5 of 11

      On page 7, Fig.3J needs changing to 3H. This figure should, in my opinion, also contain the absolute number of peaks for each time point to set the individual proportions into context.

      1. As requested by Reviewer #3, we have added bar charts representing the number of peaks found at each time point (Fig. 3H) and the number of peaks found in each cluster (Fig. 5C) to the peak type proportion plots. We have also fixed references to Fig. 3J to instead refer to Fig. 3H – we apologize for the confusion.

      Reviewer #3, Comment #6 of 11

      Last paragraph of the "Improving the Parhyale genome annotation" section: I think this needs to focus on those regions of the genome for which the location is known - after all, the "unknown" regions" could all be "distal transgenic", which would significantly change the relative proportions.

      1. We have revised our analysis of this topic with our updated peak type proportions, as described in point 2d above under “substantive concerns”.

      Reviewer #3, Comment #7 of 11

      “On page 9, t-SNE is mentioned but doesn't seem to be cited.”

      1. As requested by Reviewer #3, we have added citations for the t-SNE method, as well as scikit-learn, the software we used for t-SNE visualization.

      Reviewer #3, Comment #8 of 11

      “The third paragraph on page 9 ("We evaluated the differences...") should mention the fact that clusters 1 and 2 are the only ones with significant proportions of exonic and intronic peaks. In the accompanying figure (5C), the total number of peaks would again be helpful.”

      1. After identifying the error in our peak category classification pipeline, this observation was no longer true. However, upon examining the new distributions by cluster, we observed that in Clusters 3–7, for which we observed GO enrichment for developmental processes, there appeared to be slightly higher enrichment of intronic regulatory elements than distal intergenic regulatory elements. These results resemble the observation from recent work showing that tissue-specific enhancers are enriched in intronic regions in various human cell types (e.g. Borsari et al. 2021, Genome Research). We have noted this new observation in the text.

      Reviewer #3, Comment #9 of 11

      In figure 5D, I can't quite make out at which stage the dip in the peak of Cluster 8 occurs. This is quite an unusual pattern of accessibility change, and I can't help but wonder if it has something to do with the quality of one of the libraries? Also, the fact that half of the peaks fall into unmapped regions of the genome is unusual, and I feel this deserves more discussion.

      1. In Figure 5D, Reviewer #3 asks about a dip in accessibility for Cluster 8 peaks. The dip in accessibility was actually observed for Cluster 9 peaks and is marked by the asterisk in that panel. We have updated the figure legend to clarify the significance of the asterisk and have referred readers to examine Supp. Fig. 5.1B, where the IDE2 model fits more clearly show a collective dip in accessibility for Cluster 9 peaks. Upon examining the size distribution of the clusters, we have also noticed that Cluster 8 is the smallest cluster. We have noted the small cluster size and high “unknown” peak enrichment for Cluster 8 in the text.

      Reviewer #3, Comment #10 of 11

      “On page 10, the abbreviation PFM appears, but it is only explained in the legend of Fig.4. This should appear in the text.”

      1. Reviewer #3 mentions that on page 10, we use the abbreviation for position frequency matrices (PFMs) without previous reference. We first introduce the abbreviation on page 8, but given the repeated use of “PFM” on page 10, we have added an additional explanation of the abbreviation on page 10, for ease of reading.

      Reviewer #3, Comment #11 of 11

      “The section on "Concordant and discordant expression and accessibility" is the one I disagree most with. The authors seem to suggest that a repressive cis-regulatory module should become less accessible when the gene is activated. However, they leave trans-acting factors completely out of their conceptualisation here. It is in general likely the availability of transcription factors that leads to repression, while the "silencer" can be well accessible in all cells. Moreover, it has become clear in recent years that CRMs are not just repressors or enhancers per se but can act as either depending on the availability of transcription factors. I think these facts could partially explain the weak correlation and should be discussed.”

      1. We appreciate the comments from Reviewer #3, which alerted us to the more recent literature around the bifunctional potential of regulatory elements. We have revised our claims to clarify that concordance and discordance analysis cannot be used to directly assign “enhancer” or “silencer” identity to given regulatory elements. Instead, we suggest that evaluating concordance and discordance can be useful for downstream users of our data, such as those aiming to build reporter constructs for a given gene of interest. To facilitate such tool development, we have built additional functions into a Jupyter notebook to enable the visualization of accessibility, gene expression, fold change of accessibility and gene expression, significance of fold change, and concordance/discordance assignment for arbitrary peak-gene pairs. An example of this visualization is shown on the following page. Panel A shows the region around the Engrailed-1 and Engrailed-2 loci in Parhyale (text labels within the plot region were added manually in Illustrator). Panel B shows visualization of the En1 promoter peak alongside En1 expression. Significant log fold changes (DESeq2 padj < 0.05) are marked by asterisks in the bar plots, and concordance/discordance assignment at each time point is indicated by the color of the comparison text (red = concordant, blue = discordant). Panels C and D show accessibility and expression visualization for a single peak (En1 peak5) compared to two nearby genes (En1 and En2). We hope to include sufficient documentation in our GitHub repository such that using these tools is accessible for most researchers, even with limited programming knowledge.
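The concordance/discordance assignment for a peak-gene pair can be sketched as a comparison of the direction of change in accessibility and in expression between consecutive time points. The rule used below (same sign of log2 fold change means concordant, opposite sign means discordant), the pseudocount, the function names, and the example values are all assumptions made for illustration; the authors' notebook may define these differently and additionally reports significance from DESeq2.

```python
import math
from typing import List, Sequence


def log2_fold_change(before: float, after: float, pseudocount: float = 1.0) -> float:
    """log2 ratio between two time points; the pseudocount avoids division by zero."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def assign_concordance(access_lfc: float, expr_lfc: float) -> str:
    """Label one time-point transition by comparing the signs of the two changes."""
    if access_lfc == 0 or expr_lfc == 0:
        return "undetermined"  # assumption: no direction to compare
    same_direction = (access_lfc > 0) == (expr_lfc > 0)
    return "concordant" if same_direction else "discordant"


def concordance_over_time(accessibility: Sequence[float],
                          expression: Sequence[float]) -> List[str]:
    """Label each consecutive pair of time points for one peak-gene pair."""
    if len(accessibility) != len(expression):
        raise ValueError("both series must cover the same time points")
    labels = []
    for i in range(len(accessibility) - 1):
        a = log2_fold_change(accessibility[i], accessibility[i + 1])
        e = log2_fold_change(expression[i], expression[i + 1])
        labels.append(assign_concordance(a, e))
    return labels


# Hypothetical normalized values for one peak and one nearby gene.
peak_accessibility = [10.0, 20.0, 15.0]
gene_expression = [5.0, 9.0, 12.0]
print(concordance_over_time(peak_accessibility, gene_expression))
```

As the response itself notes, such labels describe co-variation only and cannot by themselves assign "enhancer" or "silencer" identity to an element.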

      Description of analyses that authors prefer not to carry out

      We were unable to easily visualize the distribution of regulatory elements across the whole genome as suggested by Reviewer #2. One challenge of working with the Parhyale genome is the lack of complete chromosomes. The genome is distributed across ~290,000 contigs of variable size. We were unable to find any software that could be easily and quickly set up to visualize our data, although we will provide in a Jupyter notebook the tools for local visualization of peak types that we developed.

    1. publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

      That is the key issue

    1. A lot of us may have felt pressure at times to find our purpose — to find our one true cause, our personal mission, what we personally should be doing and where we fit in.

      I think everyone rushes to find out their purpose in life but I think it's fine not to know. You'll get there eventually in life. My purpose in life has always been to be a good person and I realized that purpose a long time ago while I was in a bad place in life. Your purpose doesn't have to be the same as anyone else's. It's simply yours and you choose what to make of it.

    1. The new lines you mention really are present in the text content of the element. HTML tags are not being replaced by new lines, they just get omitted entirely. If you look at the textContent property of the <p> element you selected in the browser console, you'll see the same new lines. Also, if you select the text and run window.getSelection().getRangeAt(0).toString() in the browser console, you'll see the same new lines. In summary, this is working as it is currently expected to. What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When copying to the clipboard, new lines in the source get replaced with spaces, and <br> tags get converted to new lines. Browser specifications distinguish the original text content of HTML "in the source" as returned by element.textContent from the text content "as rendered" returned by element.innerText. Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text. This behavior causes issues with line breaks as well. It might make sense for us to look at capturing the rendered text (as copied to the clipboard) rather than the source text in future. We'd need to be careful to handle all the places where this distinction comes up, and also make sure that all existing annotations anchor properly. Also we should talk to other parties interested in the Web Annotations specifications to discuss how this impacts interoperability.
      What I think may have been surprising here is that the captured text is not the same as what would be copied to the clipboard. When <mark>copying to the clipboard, <mark style="background-color: #8000314f">new lines in the source</mark> get <mark style="background-color:#00800030">replaced with spaces</mark>, and <br> tags get converted to new lines</mark>. </br> <mark>Browser specifications distinguish <mark style="background-color: #00800036">the original text content of HTML "in the source"</mark> as returned by <mark style="background-color: #00800036"/>element.textContent</mark> from <mark style="background-color: #ffa500a1">the text content "as rendered" returned by element.innerText.</mark></mark> Hypothesis has always captured quotes from and searched for quotes in the "source" text content rather than the "rendered" text.
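The difference between "source" and "rendered" text can be imitated outside the browser. The Python sketch below uses the standard library's `html.parser` to produce two strings from the same markup: one that concatenates text nodes as written (similar in spirit to `textContent`) and one that collapses whitespace runs to a single space and turns `<br>` into a newline (a rough stand-in for what gets copied to the clipboard). Real `innerText` depends on CSS and layout, so this is only an approximation under simplified assumptions, and the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple


class TextCollector(HTMLParser):
    """Collects text nodes in two ways: as written, and whitespace-normalized."""

    def __init__(self) -> None:
        super().__init__()
        self.source_parts: List[str] = []
        self.rendered_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tags contribute nothing to the source text; <br> becomes a line break
        # only in the rendered approximation.
        if tag == "br":
            self.rendered_parts.append("\n")

    def handle_data(self, data):
        self.source_parts.append(data)
        # Newlines and other whitespace runs in the source render as one space.
        self.rendered_parts.append(re.sub(r"\s+", " ", data))


def source_and_rendered(html: str) -> Tuple[str, str]:
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.source_parts), "".join(parser.rendered_parts)


source, rendered = source_and_rendered("<p>one\ntwo<br>three</p>")
print(repr(source))
print(repr(rendered))
```

A quote captured from one representation will not match the other character for character wherever the markup contains source newlines or `<br>` tags, which is the mismatch described in the comment.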
    1. "As We May Think" predicted (to some extent) many kinds of technology invented after its publication, including hypertext, personal computers, the Internet, the World Wide Web, speech recognition, and online encyclopedias such as Wikipedia:

      An advanced device for its time, it broadly predicted how the web works today; even so, that level of thinking has still not been matched, since the Memex proposed a way of imitating complex neural processes of organization and association.

    1. Herald: Nay, ill it were to mar with sorrow's tale The day of blissful news. The gods demand Thanksgiving sundered from solicitude. If one as herald came with rueful face To say, "The curse has fallen, and the host Gone down to death; and one wide wound has reached The city's heart, and out of many homes Many are cast and consecrate to death, Beneath the double scourge, that Ares loves, The bloody pair, the fire and sword of doom"-- If such sore burden weighed upon my tongue, 'Twere fit to speak such words as gladden fiends. But--coming as he comes who bringeth news Of safe return from toil, and issues fair, To men rejoicing in a weal restored-- Dare I to dash good words with ill, and say How the gods' anger smote the Greeks in storm? For fire and sea, that erst held bitter feud, Now swore conspiracy and pledged their faith, Wasting the Argives worn with toil and war. Night and great horror of the rising wave Came o'er us, and the blasts that blow from Thrace Clashed ship with ship, and some with plunging prow Thro' scudding drifts of spray and raving storm Vanished, as strays by some ill shepherd driven. And when at length the sun rose bright, we saw Th' Aegaean sea-field flecked with flowers of death, Corpses of Grecian men and shattered hulls. For us indeed, some god, as well I deem, No human power, laid hand upon our helm, Snatched us or prayed us from the powers of air, And brought our bark thro' all, unharmed in hull: And saving Fortune sat and steered us fair, So that no surge should gulf us deep in brine, Nor grind our keel upon a rocky shore. So 'scaped we death that lurks beneath the sea, But, under day's white light, mistrustful all Of fortune's smile, we sat and brooded deep, Shepherds forlorn of thoughts that wandered wild, O'er this new woe; for smitten was our host, And lost as ashes scattered from the pyre. Of whom if any draw his life-breath yet, Be well assured, he deems of us as dead, As we of him no other fate forebode. 
But heaven save all! If Menelaus live, He will not tarry, but will surely come: Therefore if anywhere the high sun's ray Descries him upon earth, preserved by Zeus, Who wills not yet to wipe his race away, Hope still there is that homeward he may wend. Enough--thou hast the truth unto the end.

      Herald: Menelaus has disappeared; don't make me taint good news with bad.

      There was a storm and boats crashed, but we were spared. They may be alive, but they will think we are dead, just as we think they are dead.

      Wait for Menelaus's return, because Zeus favors him.

    2. Think you--this very morn--the Greeks in Troy, And loud therein the voice of utter wail! Within one cup pour vinegar and oil, And look! unblent, unreconciled, they war. So in the twofold issue of the strife Mingle the victor's shout, the captives' moan. For all the conquered whom the sword has spared Cling weeping--some unto a brother slain, Some childlike to a nursing father's form, And wail the loved and lost, the while their neck Bows down already 'neath the captive's chain. And lo! the victors, now the fight is done, Goaded by restless hunger, far and wide Range all disordered thro' the town, to snatch Such victual and such rest as chance may give Within the captive halls that once were Troy-- Joyful to rid them of the frost and dew, Wherein they couched upon the plain of old-- Joyful to sleep the gracious night all through, Unsummoned of the watching sentinel. Yet let them reverence well the city's gods, The lords of Troy, tho' fallen, and her shrines; So shall the spoilers not in turn be spoiled. Yea, let no craving for forbidden gain Bid conquerors yield before the darts of greed. For we need yet, before the race be won, Homewards, unharmed, to round the course once more. For should the host wax wanton ere it come, Then, tho' the sudden blow of fate be spared, Yet in the sight of gods shall rise once more The great wrong of the slain, to claim revenge. Now, hearing from this woman's mouth of mine, The tale and eke its warning, pray with me, "Luck sway the scale, with no uncertain poise. For my fair hopes are changed to fairer joys."

      we won; Troy's triumphant and subdued are like oil and water: the triumphant revel in it, the subdued weep and toil

      if we don't desecrate Troy's shrines we'll be fine, but if our people do, it'll be bad

      we all have cause to celebrate

    1. Blog Tucker Carlson: Biden Giving WHO Power to 'Deploy Proactive Countermeasures Against Misinformation and Social Media Attacks' By Craig Bannister | May 20, 2022 | 10:39am EDT Tucker Carlson (Screenshot) Pres. Biden has found a new way to censor free speech – by giving the World Health Organization (WHO) control of Americans’ speech – Fox News Host Tucker Carlson warned on Thursday. After dissolving his “Disinformation Governance Board,” due to public outcry, Biden is preparing to sign WHO’s new World Pandemic Treaty, giving a global operational control and power – through ‘proactive countermeasures’ - to combat what it deems “disinformation,” Carlson explained, citing a WHO working group's draft text: “So, what would this ‘operational control’ mean? “Let’s be specific. Right off the bat, the treaty demands ‘National and global coordinated actions to address the misinformation, disinformation, and stigmatization that undermines public health.’ “Oh! Here we go! Right to censorship: ‘People are criticizing us, and for public health reasons, that can't be allowed. If you criticize us, people will die.’  “So, you saw yesterday that the Biden administration, in the face of universal laughter and derision, had to fire the head of its new Ministry of Truth - but they found another way to do it: ‘W.H.O. Secretariat to build capacity to deploy proactive countermeasures against misinformation and social media attacks.’” “So, they are going to get to censor anybody who doesn't agree with what they do, as they control the intimate details of your life,” Carlson explained: “And they will control those details. 
Under this treaty, the World Health Organization will get to establish vaccine passports and regulate travel. World Health Organization will ‘Develop standards for producing a digital version of the international certificate of vaccination and prophylactics.’  “Okay.  “So you may think, ‘Well, it is just about COVID and I went along with mandatory vaccines and vaccine passports at the time, how bad could it be?’ [Laughs] First of all, if you went along with that, you should be repenting right about now. But, it is not just about COVID because the W.H.O. will be in charge of ‘The digitalization of all health forms.’ The World Health Organization will also ‘Share real-time information about travel measures.’  “So you are going to find out exactly when you are allowed to get on a bus or train or airplane, or how about your bicycle, will they regulate that too? Maybe. Now the World Health Organization has sought this authority for years. Of course. Who doesn't want more power?” Carlson then played a foreboding comment by W.H.O. Director-General Tedros Adhanom Ghebreyesus. “Here’s Tedros back in April of 2020: “People in countries with stay-at-home orders are understandably frustrated with being confined to their homes for weeks on end. But the world will not and cannot go back to the way things were. There must be a new normal. 
A world that is healthier, safer, and better prepared.” Americans should question relinquishing control over their lives to an unelected person and global authority they had no say in choosing, Carlson said: “Okay, so there’s a guy with a long and documented history of subverting public health, who is clearly a liar, who is acting as an agent for the Chinese government, and you have to ask yourself, ‘Did I vote for that guy? Is he one of my elected representatives in this democracy? How did he get power over where I can travel and when?’ “Good question.”

      Summary of Tucker's televised evening talk show.

    1. Author Response

      Reviewer #3 (Public Review):

      The import of soluble precursor proteins into the mitochondrial matrix is a complex process that involves two membranes, multiple protein interactions with the translocating substrate, and distinct forms of energetic input. The traditional approaches for in vitro measurement of protein translocation across membranes typically involve radiography or immunodetection-based assays. These end-point approaches, however, often lack optimal resolution to analyze the sequential processes of protein transport. Therefore, the development of techniques to dissect the kinetic steps of this process will be of great interest to the field of protein trafficking.

      This study by Ford et al. employs a novel bioluminescence-based technique to analyze the import of presequence-containing precursors (PCPs) into the mitochondrial matrix in real time. As a follow-up study to previous work from the Collinson group (Pereira et al. 2019), this approach makes use of the split NanoLuc luciferase enzyme strategy, whereby mitochondria are isolated from yeast expressing matrix localized 'LgBiT' (encoded by the mt-S11 gene) and used for import experiments with purified PCPs containing 'SmBiT' (the 11-residue pep86 sequence). The light intensity that results from the high-affinity interaction of pep86 with mt-S11 is convincingly shown in this study to be a reliable reporter of protein import into the matrix space. Therefore, from a technical stance, this appears to be a very promising approach for making high-resolution measurements of the different kinetic steps of protein translocation.

      The authors leverage this technology to seek insights into several features of mitochondrial protein import, with some observations challenging key longstanding paradigms in the field. Using series of PCP constructs differing in length and placement of the pep86 peptide, the authors perform luminescence-based import tests with varying protein concentration, energetic input, and presequence charge distribution. Fits to the time course data suggest two main kinetic steps that govern matrix-directed import: transit of the PCP across the TOM complex into the IMS and association of the PCP with the TIM23 motor complex. The results support some very interesting insights into TIM23-mediated protein import, including: that precursor accumulation is strongly dependent on length; that the kinetically limiting step of IM transport is engagement with the TIM23 complex, not transmembrane transport itself; and that presequence charge distribution differently affects import rate and matrix accumulation. The results of this study appear repeatable among samples and the mathematical fits to time courses are well explained. However, there remain some questions about the nature of the experimental approach and the interpretation of the kinetics data in terms of the underlying biological processes. These questions are as follows:

      Major points

      Overall system characterization and mathematical analysis

      1) The Western-based characterization of the amount of matrix-localized 11S (shown in Figure 1 - figure supplement 1) shows that the concentration of 11S varies significantly (> twofold concentration difference, quantified as a ratio to Tom40) among yeast/mitochondria preps. Is there a particular reason for this large variability? Perhaps more significantly, the import efficiency (judged by luminescence amplitude) shows high batch variability as well (> twofold efficiency difference). While this series of experiments makes the case that the luminescence readout of import is not limited by matrix-localized 11S, it does raise a potential concern of batch-to-batch variation in import competence. Could this have any implications for the reproducibility of results by this assay, particularly regarding the kinetic parameters reported?

      It is very difficult to know what causes this variability as it can be seen even between triplicate preparations carried out on the same day. It could be due to slight differences in the flasks used to grow cells (such as the size of the baffles). However, we have determined that the variability in 11S concentration does not correlate with import competence (Figure 1 – figure supplement 1C), and that the kinetics of import are not affected (Figure 1 – figure supplement 2C).

      2) My understanding from the Pereira 2019 JMB paper is that the yeast expressing the matrix-targeted 11S were engineered so that the 11S construct contained a 35 residue presequence from ATP1. In Figure 1 - figure supplement 1, panel A, it looks like the mitochondria-derived 11S constructs are significantly larger than the purified 11S constructs used to calibrate concentration. If the added residues on the mitochondrial 11S constitute a presequence, then they should be cleaved upon import to yield the mature sized protein. Why are the mitochondrial 11S constructs so much larger than the purified ones? Explicit labeling of MW markers would be useful here.

      We noted that it seemed likely that the presequence was not getting cleaved off. There may also be some kind of SDS-PAGE mobility issue for 11S (common for beta-barrels), such that the purified version has a different mobility to the matrix localised version. Therefore, the possibility remains that the MTS is cleaved off, but the mature product migrates anomalously on gels. For this reason we carried out experiments to show that 11S is matrix localised, which turned out to be the case (Figure 1 – figure supplement 1D). So irrespective of whether the MTS is not cleaved or correctly processed 11S has unexpected gel mobility, the reporter is where it should be – in the matrix. These points are elaborated in the text.

      Labels have been added to molecular weight markers, as requested.

      3) From Figure 1D, given that the amplitude increases linearly with added Acp1-pep86 up to ~45 nM, this suggests that matrix-localized 11S is in stoichiometric excess of imported peptide within this range of added substrate. Given a matrix [11S] of 2.8 µM, a stoichiometrically equivalent amount of Acp1-pep86 would be equivalent to an import of <0.5% of added substrate, and it is suggested that import efficiency is actually much lower than that. How can this very low import efficiency be explained?

      Import is single turnover under our assay conditions and is therefore limited by the number of import sites rather than matrix [11S]. Under standard conditions, we intentionally add substrate in vast excess and only anticipate that a very small proportion will be imported.

      4) Apropos of point #3 above: Given the low efficiency of import observed for the purified PCP substrates in this study, one wonders if this is due to the formation of off-pathway (translocation-incompetent) precursors established during the import reaction, before substrates have a chance to engage OM receptors (e.g., due to aggregation, etc.). In this case, the apparent single-turnover behavior may instead be caused by a vast majority of PCP losing translocation competence, rather than by the requirement for energetic resetting that is suggested. Might this be a possibility?

      We anticipate that some PCP will aggregate and add substrate in excess to allow for that. Our interpretation of the reaction as single turnover was drawn from a comparison of PCP-pep86-DHFR import amplitude in the presence versus absence of MTX, rather than amplitudes from absolute amounts of PCP. We cannot think of a reason why MTX would affect protein solubility.

      5) Import time courses in many cases show a progressive drop in luminescence at later time points after a maximum value has been reached. This reduction in signal cannot be accounted for by the two rate constants in the equation used in the two-step kinetic model. How were such luminescence deviations accounted for when fitting data to obtain these kinetic parameters? What might be the reason for this downward drift in signal once maximum amplitude has been reached?

      We almost always see this gradual drop in luminescence in both the mitochondrial and bacterial systems. Data points acquired after the maximum amplitude is reached are excluded from the fitting. The assay is based on an enzymatic reaction and we think that the downward drift is due to a combination of substrate depletion and accumulation of reaction products.
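
      The truncation described above can be illustrated with a minimal sketch. It assumes the textbook product curve for two consecutive irreversible steps (A -> B -> C) with rate constants k1 and k2, which may differ in detail from the equation used in the manuscript. The function names and the example trace are hypothetical and are not taken from the study's analysis code or data.

```python
import math


def two_step_signal(t, amplitude, k1, k2):
    """Product formed at time t for A -> B -> C (assumed textbook form)."""
    if math.isclose(k1, k2):
        # Limiting form of the general expression when both rates coincide.
        return amplitude * (1.0 - math.exp(-k1 * t) * (1.0 + k1 * t))
    decay = (k2 * math.exp(-k1 * t) - k1 * math.exp(-k2 * t)) / (k2 - k1)
    return amplitude * (1.0 - decay)


def truncate_at_maximum(times, signal):
    """Keep only the points up to and including the maximum signal."""
    peak = max(range(len(signal)), key=signal.__getitem__)
    return times[:peak + 1], signal[:peak + 1]


# Hypothetical trace with a downward drift after the peak at t = 2.
times = [0, 1, 2, 3, 4]
signal = [0.0, 2.0, 5.0, 4.0, 3.0]
fit_times, fit_signal = truncate_at_maximum(times, signal)
```

      Only `fit_times` and `fit_signal` would then be passed to a nonlinear least-squares routine, so the late drift does not bias the estimates of k1 and k2.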

      Import kinetics: dependence on total protein size

      6) In Figure 3 - figure supplement 1, some of the kinetic parameters from the PCP concentration-dependent responses are quite noisy. For instance, responses for the shortest constructs (L and DL) show a lot of variability in the k1 and k2 parameters. Is this (partly) due to difficulty in resolving these two parameters during the nonlinear least-squares fitting protocol for these particular constructs?

      It is difficult to resolve k1 and k2 perfectly, so the numbers are only estimates.

      7) The data in Figure 3, panels E and F (derived from Figure 3 - figure supplement 1) in some cases show non-linear dependence of kinetic parameters on the 'N to pep86 distance' for the length (panel E) and position (panel F) variants. For instance, from the length series, the k1 mean goes from 132 to 385 to 237 nM for the DL, DDL, and DDDL constructs, respectively. The variances suggest that these differences are real. Is there a reason that kinetic parameters would have such non-monotonic dependence on length?

      We don’t know the reason for this variance, but it could be investigated in future studies.

      Import kinetics: dependence on energetic input

      8) The data of Figure 4A show the results of partial dissipation of the membrane potential by 10 nM valinomycin. Most studies designed to cause a gradual dissipation of membrane potential do so by protonophore (e.g., CCCP) titration. Given that matrix-directed import is completely blocked by low micromolar amounts of this potent ionophore, it would be useful to have an independent readout (e.g., TMRM measurements) of the residual membrane potential that exists upon treatment with the lower concentrations of valinomycin used here.

      We have now included data that shows the partial effect of 10 nM valinomycin on membrane potential (TMRM measurements) and protein import (Figure 4 – figure supplement 1A-B).

      9) The step associated with k1, designated as transport across the TOM complex, is suggested to go to completion before starting the step associated with k2, engagement of the TIM23 complex. The k1 step shows a strong dependence on membrane potential (Fig. 4A, middle), particularly for the length series. Why would this be, given that no part of translocation across the OM should be associated with a valinomycin-sensitive electric potential?

      This effect is relatively small and mainly affects shorter PCPs. Our interpretation is that passage of the PCP through TOM is reversible, and committing PCP to import across the IMM (which requires ∆ψ) prevents this reversibility. However, it is also possible that transport through TOM and TIM23 are partially coupled. Both possibilities are addressed in the Discussion.

      Working model

      10) One of the most surprising outcomes of this study is that passive transport of substrates across the TOM complex and energy-coupled transport via the TIM23 complex are kinetically separable and independent events. As the authors note in the Discussion, the current paradigm of the field is that matrix-targeted substrates concurrently traverse the OM and IM via the TOM-TIM23 supercomplex, and this model is supported by quite a bit of experimental evidence. Even in this study, the fact that the PCP-pep86-DHFR construct exposes the pep86 sequence to the matrix in the presence of MTX (Figure 2) is evidence of a two-membrane-spanning intermediate. Key mechanistic questions arise regarding the model proposed in this study. For example, if PCPs traverse the TOM complex as a stand-alone step, what is the driving force (e.g., a simple pathway of protein interactions with increasing affinity)? And would soluble, matrix-directed substrates be expected to accumulate in the very restricted space of the IMS? If so, how would TIM23-directed membrane proteins keep from aggregating in the aqueous IMS? These questions would be worth addressing in the discussion of the model.

      We have included a discussion of the experimental evidence for TOM-TIM23 supercomplexes. The acid chain hypothesis, an interaction between positive charges of the presequence and negatively charged residues within the TOM40 channel, has been proposed as the driving force for PCP transport through TOM. Proteins that are targeted to the IMS are imported through TOM without the participation of TIM23 and we think that matrix-targeted proteins can do the same. This could explain why TOM is in excess over TIM23. We also think that some matrix-targeted PCPs can accumulate in the IMS, although this may not be true of membrane proteins.

      Import kinetics: dependence on MTS charge distribution

      11) The fact that import rates are increased with a more electropositive presequence makes sense in terms of the electrophoretic pull exerted on the PCP (matrix, negative). However, the greater accumulation of precursors containing more electronegative presequences remains puzzling. In the manuscript, this is explained based on the concept that accumulation of positive charges will cause partial collapse of the membrane potential. However, I am still uncertain about this explanation for a few reasons. First, for each PCP, the presequence will constitute just a small fraction of the total length of the precursor, and therefore contribute a small fraction of the total charge density of imported protein. Would such a small change in total PCP charge be expected to have the dramatic effect observed among samples?

      The majority of the total PCP charge is from the mature region. Whilst the positive charges in the presequence undoubtedly deplete ∆ψ, the differences in the extent of ∆ψ depletion that we see between PCPs that vary in charge are due to the difference in charge of the mature regions (as their presequences are identical).

      Second, given the small amount of protein imported under these conditions, would the total charge of imported PCPs be expected to affect transmembrane ion distribution so significantly? For instance, as I recall, it takes up to micromolar amounts of mitochondria-targeted lipophilic cations (e.g., TPP+) to cause a major change in the TMRM-detected membrane potential.

      The effect was indeed unexpected. Although the number of PCPs imported is seemingly small, each PCP carries many charged residues, so the total number of charges imported will be much greater.

      Finally, I would expect isolated mitochondria to be capable of respiratory control. It is well known, for example, that isolated mitochondria can respond to temporary draw-down of the membrane potential (e.g., by ADP/Pi addition) by going into state 3 respiration and restoring membrane gradients. Why would that not be the case here (Figure 5D)?

      The isolated mitochondria that we used for the import assays demonstrate increased O2 consumption in response to ADP addition, as expected (Figure 5 – figure supplement 1A-B). In addition to this new figure, we have now included TMRM data (Figure 6 – figure supplement 2B) that show a depletion of ∆ψ in response to ADP addition that is temporary and dependent on the amount of ADP added. We are therefore confident that our isolated mitochondria are capable of respiratory control as expected. We think that the lack of restoration of ∆ψ following import-induced dissipation is a consequence of the import process in vitro. Perhaps the import process compromises the channel, resulting in concomitant ion/charge dissipation during the active process. Moreover, this is likely to be exacerbated in vitro upon acute exposure to PCP, which causes a sudden saturation of the import sites and thereby compromises the ∆ψ and the mitochondria’s ability to recover rapidly (this possibility has been noted in the MS).

      General

      12) Although the spectral approach in this study is developed as an alternative to the more traditional import assays, it would be useful to have some control import tests (done with Westerns or autoradiography) as complements to the luminescence-based imports. For example, control tests to accompany Figure 1 that show import efficiency or tests that accompany Figure 3 to show import of the different length and position series constructs. Perhaps this could be done with immunodetection of Acp1 or the pep86 epitope, showing protease-protected, processed import substrates that appear in a membrane potential/ATP-dependent manner. Even if the results from the more traditional techniques ran contrary to the results using the NanoLuc system, this would still allow the authors to compare which effects are consistent and which are dissimilar between different approaches.

      We have now included a Western blot import assay for the PCP-pep86-DHFR substrate and show that import is ∆ψ-dependent (Figure 2 ‒ figure supplement 1).

      13) The authors might also consider conducting imports with mitoplasts as a way to test the kinetic model that includes the TIM23-mediated step alone.

      We conducted import assays with mitoplasts and have now included these as a main figure (Figure 5).

      14) It is difficult to follow the logic in the Discussion regarding the number of TIM23 sites limiting the number of 11S imported into mitochondria in live cells (page 15, lines 23-27). Are the authors suggesting that in vivo, one TIM23 complex serves to transport a single protein? This needs to be clarified.

      This has been removed, and this section of the discussion has been clarified.