  1. Feb 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the editors and the four reviewers for their careful consideration of our manuscript. We are very grateful for their positive appreciation of our work and we believe that their suggestions, which have been included in the preliminary revised version of the manuscript whenever possible, have greatly improved the quality of the paper and have helped us deepen our understanding of the results.

      We were happy to note that all the reviewers found value in our work, as stated in their general comments: “This is certainly a useful contribution to our understanding of neuronal V-ATPase functions in vivo (…)” (Reviewer 1) – “Dulac et al report the very interesting discovery of a previously uncharacterized neuronal specific regulator of the V-ATPase. (…) The experiments are very well performed, the data presented very convincing and the paper is well written.” (Reviewer 2) – “The discovery of a neuronal specific regulator of the V-ATPase is very interesting (…) The work is therefore of great interest to researchers working on synaptic function in general and on synaptic vesicle biology in particular.” (Reviewer 3) – “The authors have used well-designed experiments to convince the localization and function of VhaAC45L in synaptic vesicle acidification.” (Reviewer 4).

      In their remarks, the reviewers suggested additional experiments that could be done to improve our understanding of the role of this new V-ATPase regulator, as well as several minor issues. We have addressed all their comments in our answers below, in which the full text of the reviews is included in blue type, and the responses in black. The line numbers refer to the revised version of the manuscript.

      Reviewer #1

      Dulac et al. present a first in vivo characterization of the 'accessory' v-ATPase subunit vhaAC45L in Drosophila. The key findings are localization and association of the protein with v-ATPase complexes at synapses and a functional requirement based on lethality and reduced synaptic function. This is certainly a useful contribution to our understanding of neuronal v-ATPase functions in vivo. The main weakness of the study is a lack of depth. The study focuses on localization, co-IP of associated proteins, an analysis of acidification and reduced synaptic function in fly larvae, thus providing a baseline for mechanistic study. However, the mechanism of vhaAC45L is not addressed in this short report. How is vhaAC45L function different from its homolog vhaAC45? Is it required for v-ATPase assembly? Is it required to localize the full v-ATPase complex (or just V0) to the synapse? Is the defect really due to partial loading of synaptic vesicles or does loss of vhaAC45L also affect endosomal and lysosomal function at synapses? The work as is certainly represents a publishable contribution without answering any of these questions - more as an invite for the community to study the role of vhaAC45L; however, I feel this is a bit of a missed opportunity to put the function of a new potential regulator of specific synaptic v-ATPase functions in the context of the most basic functions obvious in this field.

      My main concerns are:

      1. clearly, vhaAC45L is required for SOME function of v-ATPase in neurons - but it remains entirely unclear which one. It is not even clear what compartments are affected. Reduced quantal size of single vesicle exocytosis events can be a direct or indirect consequence of problems in SV biogenesis and recycling.

      Is exo- /endocytosis unaffected? (FM1-43 uptake!).

      We agree that alterations in the synaptic vesicle release/recycling cycle could indeed contribute to the locomotion defect, in addition to the acidification impairment observed in VhaAC45L knockdown larvae. As suggested by the reviewer, we plan to carry out FM-dye assays to measure endocytosis and exocytosis at the neuromuscular junction of control versus VhaAC45L-KD animals. If successful, a new figure will be added to the final version of the paper.

      What compartments are affected? (markers for synaptic vesicles versus lysosomal compartments!).

      Finding out whether VhaAC45L is specifically involved in the acidification of synaptic vesicles, or if it also plays a similar role in other synaptic organelles, in particular lysosomes, would be very interesting indeed. However, we found that it was technically difficult to address this issue in the Drosophila nervous system. A good way would be to check whether the lysosomal pH is affected by VhaAC45L knockdown, as is the case for synaptic vesicles.

      Unfortunately, because lysosomes are not abundant in neurons, lysosome-specific pH-sensitive probes such as Lysotracker do not yield detectable signals at Drosophila larval synapses. So, whether VhaAC45L is specific for synaptic vesicles or involved in the regulation of V-ATPase activity in all neuronal compartments remains an open question for now.

      2. molecular function: is vhaAC45L required for v-ATPase assembly? (IP/Pull-downs of v-ATPase complexes in the presence or absence of vhaAC45L with other subunits!).

      In accordance with the reviewer, we are also very much eager to learn more about the precise molecular function of VhaAC45L, and in particular whether it is required or not for assembly of the V-ATPase complex. Pull-downs of V-ATPase proteins in controls versus VhaAC45L-KD could be used to address this question, but this would require a large quantity of antibodies directed against subunits of the V0 and V1 domains, respectively. Unfortunately, there are no such antibodies commercially available against Drosophila V-ATPase proteins. We have tried several antibodies that recognize V-ATPase subunits from other species and were predicted to react against Drosophila homologs, but with no success. The only V-ATPase antibodies currently at our disposal were samples generously sent to us by other laboratories in insufficient quantities for carrying out such experiments. To our regret, therefore, we were not able to answer this question until now because of the lack of appropriate tools.

      3. vha100 was proposed in Drosophila to function on synaptic vesicles and the lysosomal pathway, but, if I remember correctly, here quantal size was normal. I am missing a comparison between the two.

      We thank the reviewer for this comment. A comparison with previously published results on subunit Vha100-1 has now been added (lines 458-469) in the discussion related to this topic in the revised manuscript.

      4. The V5 knock-in is used both as a mutant as well as a tool to analyze protein localization. This is likely okay, but a little concern of course has to be that by creating a mutant protein through stop codon deletion its subcellular localization, turnover, etc. are not normal. Similarly, anti-V5 co-IPs will isolate proteins bound to the mutant variant of vhaAC45L. Minimally, IPs or pull-downs using other members of the V0 complex should be done to understand the role of vhaAC45L in direct comparison with vhaAC45 on complex assembly and possibly targeting to the synapse (or ideally targeting to specific compartments).

      It is indeed a legitimate concern to question the physiological relevance of results obtained by studying V5-tagged VhaAC45L. However, the V5 tag is very small (14 amino acids) and we fused it in place of the stop codon to keep the whole sequence of the protein intact. In addition, we found that the V5 knock-in flies are viable and fertile as homozygotes. Given that the null mutants, as well as strong RNAi knockdowns, are lethal at an early developmental stage, this suggests that the V5 knock-in has limited negative effects, if any, on VhaAC45L function. This led us to believe that at least a good portion of the V5-tagged protein is targeted to the right subcellular compartment and associates with its physiological partners.

      Significance:

      There is significance to the reporting of an accessory v-ATPase subunit required for SOME function of the v-ATPase in neurons. There is some lack of significance in the absence of basic mechanistic insight as to what vhaAC45L does to the v-ATPase in neurons.

      We agree that we did not elucidate here the precise molecular mechanisms by which VhaAC45L contributes to synaptic vesicle acidification. It is rather an initial description of a novel neuronal protein that appears to be essential for proper synaptic functioning, and we provide consistent evidence that its function requires specific interaction with the V-ATPase complex, and in particular with three subunits that reproducibly co-immunoprecipitated with VhaAC45L (namely Vha1C39-1, Vha100-1 and ATP6AP2). Please note that it took many years and many papers before the molecular mechanisms of action of comparable accessory subunits, such as ATP6AP1/AC45 or ATP6AP2, were better understood, and they are still a matter of investigation today. It is therefore very demanding to expect that we describe the exact function of the previously uncharacterized VhaAC45L at all levels in a single first paper.

      Reviewer #2

      In this study and using Drosophila melanogaster as a model system, Dulac et al report the very interesting discovery of a previously uncharacterized neuronal specific regulator of the V-ATPase called VhaAC45L. They combine genetics, IHC, Mass spec and ephys to unravel the expression pattern and function of this protein. They find that it is required to acidify synaptic vesicles in glutamatergic neurons of the Drosophila larval neuromuscular junction, for appropriate synaptic transmission and for larval locomotion. The experiments are very well performed, the data presented very convincing and the paper is well written. Nonetheless, a few additional pieces of evidence and some level of expanded analysis would strengthen the conclusions and increase the depth of the work.

      Major comments:

      1. Figure 1F: while the localization to the presynaptic terminal is convincing, where exactly the protein is localized to is not studied. The imaging in these experiments could use increased resolution and concomitantly colocalization studies with more specific synaptic vesicle markers.

      We agree that it would be very good to show this additional result. However, confocal microscopy does not provide sufficient resolution to localize the protein at the membrane of individual synaptic vesicles. Another way would be to see if VhaAC45L immunostaining co-localizes with domains enriched in synaptic vesicle markers, but these organelles are rather ubiquitously distributed in the synaptic boutons at the Drosophila neuromuscular junction. To correctly perform this experiment, we would have to do immuno-electron microscopy, a technique we do not master in our laboratory and that we did not plan to implement for the present work.

      2. Figure 3B-G: these experiments should be complemented by a rescue experiment, ideally of the null mutant using a UAS construct and a pan neuronal driver, or - if such animals are viable to the third larval instar stage - a glutamatergic driver. If possible, it would also be good to study the NMJ phenotype of the null mutant rescued to viability using a neuronal driver that does not express in motor neurons (e.g. Chat-G4).

      Although a rescue experiment could provide further evidence that VhaAC45L deficiency is responsible for the synaptic vesicle acidification defect described in Figure 3, we do not think that it is required here because we obtained similar results by knocking down the gene using two different RNAis. As described in the manuscript, the pan-neuronal expression of VhaAC45L could rescue the embryonic lethality of the null mutant, so it would be theoretically possible to check the acidity level of synaptic vesicles at the neuromuscular junction of the rescued larvae. However, this would involve making rather complex genetic constructions to express VMAT-pHluorin in motor neurons in a rescued mutant background. In addition, the conclusions we could draw from such an experiment would be limited by the lack of comparison. Indeed, in Figure 3 the defect was observed in a knockdown context, and the same experiment could not be performed in knockout larvae due to the early lethality. If we could measure the acidity level of rescued null mutants, we would not have any comparison point besides the knockdown experiments. As knockout and knockdown are not likely to produce identical phenotypes (especially in terms of magnitude of effect), the ideal would be to compare the rescued phenotype to the null mutant expressing VhaAC45L in all neurons except motoneurons, as suggested by the reviewer. However, such a genotype would certainly not be viable, since we observed that expression of VhaAC45L RNAis with a stronger motoneuron driver (D42-Gal4) was sufficient to induce lethality at an early developmental stage.

      3. Figure 5: the authors focus on quantal size which measures the postsynaptic response to spontaneous release from the presynaptic terminal. However, it is unclear how this directly relates to the locomotor deficit beyond signaling potential deficits in vesicle loading or fusion. It would be more convincing to also study evoked release, and expand the analysis of presynaptic properties (number of events, amplitude, frequency).

      We fully agree with this comment shared by Reviewers 2 and 3 related to the electrophysiology experiments. Note that these experiments have been carried out in collaboration with another laboratory located in another city. The Covid-19 situation during the past year has prevented, and is still complicating, movements between labs, which has kept us from going further with the electrophysiology analyses of VhaAC45L KD. If the situation in the near future allows it, we would very much like to add a more extensive electrophysiological analysis, including in particular the study of evoked release. In the revised manuscript, we have nevertheless completed Figure 5 by adding representative distributions of spontaneous mEPSP amplitudes in control and VhaAC45L knockdown larvae, as well as the results of new analyses showing a lack of effect of the KD on the mean EPSP frequency.

      4. General: showing some level of genetic interaction with V-ATPase subunits in at least some of the assays would be welcome.

      We are definitely in accordance with the reviewer on that point, but we think that this would involve a lot of work and be beyond the scope of the present initial description. Here we show by proteomic analyses that at least 12 proteins co-precipitate and so potentially interact with VhaAC45L, three of them being previously identified constitutive or accessory V-ATPase subunits. In our opinion, studying the interactions between VhaAC45L and these proteins through genetic and molecular studies will be the subject of future works. As stated by Reviewer 2 in the Referees cross commenting below: “further biochemical analysis is interesting but probably beyond the scope of this initial description and would take too much time”. We fully agree with this statement.

      Minor comments:

      Some of the images, especially those in Figure 3, should be larger for ease of visualization.

      As requested, the images of Figure 3 have been enlarged.

      Significance

      The discovery of a neuronal specific regulator of the V-ATPase is very interesting. To my knowledge it is the first description of a neuronal specific V-ATPase related protein since the description of Vha100-1 by Hiesinger and colleagues in 2005. The work is therefore of great interest to researchers working on synaptic function in general and on synaptic vesicle biology in particular.

      We are grateful to the reviewer for his very positive assessment of our work.

      I note that I do not have in depth expertise in electrophysiology, although I am sufficiently familiar with basic NMJ physiology experiments to render the opinions stated above.

      Reviewer #3

      In this study, Dulac and colleagues investigated roles of VhaAC45-like gene, which codes one of the V-ATPase accessory proteins in Drosophila, in synaptic transmission. First, they demonstrated that VhaAC45L transcripts are expressed selectively in neurons and that the gene products are addressed to synaptic areas. Second, they showed that VhaAC45L is co-immunoprecipitated with some subunits of V-ATPases, which is consistent with bio-informatics predictions. They further demonstrated that VhaAC45L-knockdown (KD) resulted in defects in synaptic vesicle acidification as well as a reduction in quantal size of glutamate, indicating that VhaAC45L play a key role in regulating neurotransmitter release by modulating the driving force for transmitter uptake. Last, not least, they demonstrated that VhaAC45L-KD in motoneurons attenuated larvae locomotor performance, indicating its physiological relevance. Overall, this study is rigorously executed and nicely presented, and adds one more component of the V-ATPase that is responsible for neurotransmitter uptake into synaptic vesicles. However, since this study simply confirmed an established notion from other species such as yeast and mammals that AC45 is one of the accessory proteins of the V-ATPase complex, a conceptual novelty beyond the previous knowledge is relatively poor in its present form. Thus, this reviewer would suggest several issues as following to improve the comprehensiveness as well as novelty of the current manuscript.

      1. The reason why the authors focused on VhaAC45-'like' is somewhat obscure, and therefore should be explained. How different VhaAC45 and VhaAC45L are in terms of amino acid sequences, tissue distributions, and KO phenotypes. It seems more comprehensive if the authors provide some experimental evidence on VhaAC45; e.g. whether it is also expressed in neurons or not (Fig. 1), and, if VhaAC45 is neuronal, whether it can rescue the phenotypes of VhaAC45L-KD to certain degree (Figs 4 & 5).

      Following the reviewer’s request, we have added a sequence alignment of VhaAC45 and VhaAC45L, as well as a graph showing tissue distributions of both genes in Supplementary Figure 1 of the revised manuscript. To our knowledge, there is no published functional study of VhaAC45 in Drosophila, so we can only make assumptions derived from studies on predicted homologs in evolutionarily distant species. For that reason, it is difficult to compare VhaAC45 to VhaAC45L, as it would first require an entire new study of VhaAC45 function in flies. Since our interest is to study neuronal physiology, we focused on VhaAC45L because compelling evidence indicates that this subunit is specific to the nervous system, as described in our manuscript, rather than on VhaAC45 which seems to be expressed in all tissues. In addition, homologs of VhaAC45L have never been functionally characterized to date in any species, making it very interesting to study this new protein in a genetically tractable organism.

      2. What is the mechanism of Ac45 in regulating V-ATPase activity? In mammals, it has been suggested that Ac45 is essential for proper sorting of the V-ATPase to the destined organelles (e.g. Jansen et al., Mol. Biol. Cell., 2010; Jansen et al., BBA, 2008). In this context, it should be examined whether VhaAC45L-KD would affect the synaptic localization of other V-ATPase subunits.

      We thank the reviewer for pointing out these very interesting references. We have indeed tried to determine the relative abundance of two other V-ATPase subunits at the larval neuromuscular junction in control and VhaAC45L knockdown contexts. However, because the tested subunits are not specific to neurons, and are expressed at relatively low levels in synapses, it was not possible for us to properly separate the synaptic signal from the background immunostaining in surrounding muscles. This unfortunately prevented us from performing an accurate and reliable quantification.

      3. Given that a rodent brain SV contains a few copies of the V-ATPase on average (Takamori et al., 2006, and some newer papers by others), it is interesting that >80% reduction of Ac45 showed moderate effects on quantal size. If SVs under study also contains 1 or 2 V-ATPase per SV, there must be some SVs lacking VhaAC45L upon KD. In this context, it is interesting to see how VhaAC-KD (RNAi1~3) affect the frequencies of minis.

      The reviewer’s valuable comment prompted us to undertake new analyses on our electrophysiological recordings. We have now added in Figure 5E graphs showing the mean EPSP frequency for larvae expressing VhaAC45L RNAi1 and RNAi2, which are the ones that were used in the quantal analysis. Both of these RNAis apparently decreased the frequency compared to controls, but this difference was not statistically significant. As detailed in the Discussion (lines 458-469), this may suggest that VhaAC45L does not influence the abundance of the V-ATPase complex at nerve terminals, but rather its efficiency.

      4. In general, decrease in mini amplitudes is accounted for by changes in postsynaptic sensitivity for neurotransmitters. Although acidification deficits would support that decrease in quantal size is due to the decrease in the driving force for glutamate uptake, it should be examined whether the postsynaptic receptor fields are not affected by VhaAC45L-KD by recording postsynaptic response upon application of non-saturable concentrations of glutamate.

      Testing for potential postsynaptic receptor field alteration by glutamate application would indeed be an interesting experiment but, we believe, not a critical control for the present manuscript. Because we expressed RNAis presynaptically, any modification in the postsynaptic receptor field would have to be an indirect consequence of VhaAC45L downregulation in motoneurons, and so would likely be related to the synaptic vesicle acidification defect. It would therefore not change our conclusion that VhaAC45L deficiency in motoneurons induces a decrease in quantal size. Because electrophysiology experiments were carried out in collaboration with another laboratory located in another city, the current Covid-19 situation has so far prevented us from performing this test (please refer to our answer to comment 3 of Reviewer 2 for more details).

      5. Related to 4, it is also interesting to see if evoked responses are also attenuated as a result of VhaAC45L-KD, which is more physiologically relevant for locomotor activity phenotype than minis.

      We also agree with this comment, shared by Reviewer 2, to which we already responded above in our answer to comment 3 of Reviewer 2.

      Minor points:

      1. Quantal size of glutamate is not affected by reduced expression of DVGLUT (Daniels et al., Neuron, 2006), which highly contrasts with VhaAC45L, expression of which defines quantal size. Distinct regulation of quantal size by the transporter and the V-ATPase subunit should be discussed.

      As suggested by the reviewer, a discussion of this point has been added (lines 458-469), and Daniels et al. 2006 is now cited in the revised manuscript.

      2. For electrophysiological experiments, respective sample traces should be shown in Figure 5.

      Quantal size is not directly visible in sample traces, so we added instead representative distributions of spontaneous mEPSP amplitudes in control and VhaAC45L knockdown larvae in the new Figure 5C.

      3. Only RNAi1 and RNAi2 lines were examined for SV pH estimation and mini analysis. The results from RNAi3 should be presented, or at least mentioned in the text.

      These experiments were performed using two different RNAi constructs to ensure that similar effects were observed and to exclude the possibility of potential off-targets. Knocking down VhaAC45L in neurons with RNAi1 and 2 was lethal at pupal stages, suggesting that they give similar levels of inactivation. RNAi3 systematically induced lighter phenotypes, producing viable adults, which led us to believe it had a lower efficiency. Because the results on synaptic vesicle acidification and electrophysiology were very consistent with RNAi1 and RNAi2, we considered that it was not necessary to repeat the experiment with RNAi3.

      Significance

      As mentioned above, as it stands, the authors merely confirmed the pre-existing bioinformatic knowledge on one of the AC45 homologues in Drosophila. The audience of The EMBO Journal might be interested in how different/similar VhaAC45 and VhaAC45-like are, and their functional relevance. In particular, is VhaAC45 also mandatory for the V-ATPase functioning in neurons? Adding some basic information of VhaAC45, e.g. tissue distribution, KO phenotypes, and ability to rescue the VhaAC45-like-KD phenotypes, will certainly improve the comprehensiveness of this study, and capture audience's attention.

      As mentioned in our response to point 1 of the reviewer above, we have added more data comparing the structure and distribution of VhaAC45 and VhaAC45L in the revised manuscript. VhaAC45 appears to be ubiquitously expressed whereas VhaAC45L is neuron-specific.

      Comparing VhaAC45 to VhaAC45L would require a completely new study of VhaAC45 function, because it has never been done before in Drosophila to our knowledge. This would require repeating all the experiments with this other gene, probably involving two more years of work, and would make for a much longer and very different manuscript. It is understandable that this cannot be envisaged. Because homologs of VhaAC45L have never been functionally characterized to date in any species, we considered that it was worth studying this new protein on its own.

      Reviewer #4

      We have reviewed "A specific regulator of neuronal V-ATPase in Drosophila melanogaster." by Dulac et al. The authors have identified VhaAC45L as a regulator of neuronal V-ATPase in Drosophila melanogaster. The authors have utilized multiple techniques to determine the localization of VhaAC45L in neurons and specifically in the synapse. The use of multiple approaches including determining RNA levels in different regions of the fly, and using CRISPR-Cas9 technique to insert V5 tag, makes a very convincing argument about the synapse-specific expression of VhaAC45L.

      The combined use of co-immunoprecipitation technique and LC/MS to show that VhaAC45L co-precipitated with V-ATPase complex subunits is convincing that VhaAC45L is a subunit of V-ATPase. To determine the role of VhaAC45L in acidification of synaptic vesicles the authors have utilized pHluorins in combination with multiple RNAi lines. The authors have used a well-designed experiment to prove that VhaAC45L regulates acidification of the synaptic vesicles.

      Further, larval locomotion and quantal size determination using VhaAC45LRNAi which is known to be altered due to pH gradient of synaptic vesicles shows the functional role of VhaAC45L in synaptic vesicle acidification.

      Minor comments:

      1. For all graphs, please remove gridlines to make data points more visible.

      We found that gridlines can help readers assess approximate values on the graphs. We have therefore not removed them but changed their colour to a light grey so that they no longer affect visibility. We have also placed the points over the error bars in all the graphs, so that they are more apparent.

      2. Line 120-123: Authors indicate the VhaAC45LRNAi induced lethal phenotype when expressed in glutamatergic and cholinergic drivers but the figure is missing. Please indicate as "data not shown" if not included in Figure.

      This mention has been added in the manuscript (line 125).

      3. A diagram summarizing the role of VhaAC45L in V-ATPase enzymatic complex and specific role is recommended.

      We believe that it is too early in this first report to draw an accurate diagram summarizing the role of this new protein in the V-ATPase complex.

      Significance

      V-ATPase play a crucial role at the synapse by being responsible for acidification of the synaptic vesicles and identification of a synaptic vesicle specific regulator of V-ATPase is important to understand the complex regulation of synapse function. The authors have used well-designed experiments to convince the localization and function of VhaAC45L in synaptic vesicle acidification.

      We thank the reviewer for his very positive appreciation of our work.

      Referees cross commenting

      (Written by Reviewer 2)

      There seems to be overall consensus among the reviewers on 3 issues:

      1. A somewhat more precise understanding of the role of vhaAC45L in the synaptic vesicle cycle through better localization studies and some classic assays (like FM dye uptake).

      – See our answers to comments 1 of Reviewer 1 and Reviewer 2.

      2. A little more characterization of the transmission defects (e.g. studying evoked responses) would be welcome.

      – See our answer to comment 3 of Reviewer 2.

      3. Ascertaining the validity of the alleles with rescue experiments, perhaps in the V5 mutant background to allow localization analysis in a rescued background.

      – See our answer to comment 2 of Reviewer 2.

      I think further biochemical analysis is interesting but probably beyond the scope of this initial description and would take too much time.

      We fully agree with this statement.

      The minor issues are easy to address

      We have addressed all of them in the preliminary revised version of the manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      We would like to thank you for the comments concerning our manuscript. We responded to each question, as described below. All the authors feel that our manuscript has been much improved by your comments.

      Minor comments:

      Q1. Fig 1 any male vs female mice differences in ATF6b expression?

      Response. We performed qPCR using several tissues from male and female WT mice, and confirmed that there were no significant differences in Atf6b mRNA levels between male and female mice. This result is shown in Figure S1C.

      Q2. Fig 2C. Please show molecular weight markers on blots

      Response. We put molecular weight markers in Fig.2C, as you suggested.

      Q3. Fig 2C. what are the doublet bands on calnexin?

      Response. Calnexin is sometimes shown as double bands in tissues such as kidney, liver and heart by western blotting (Zeng et al., PLoS One. 2009 Aug 26;4(8):e6787). Although the mechanism is unknown, it could be due to the post-translational modification such as phosphorylation (Wong et al, J Biol Chem. 1998 Jul 3;273(27):17227-35) or partial degradation although proteinase inhibitors are added in the lysis buffer. To my knowledge, alternative splicing is not likely to be the case.

      Q4. Fig 3. what are the ERSE sequences? several different binding sites are reported in literature.

      Response. We put the ERSE sequence in Materials and Methods and in the legend for Figure 3 as “CCAATN9CCACG (Yoshida et al., 1998)”.
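
      For readers unfamiliar with the notation, the ERSE consensus quoted in this response (CCAAT, then nine arbitrary bases written as N9, then CCACG) can be expressed as a simple pattern search. The sketch below is only an illustration under stated assumptions: the function name `find_erse_sites` and the example fragment are hypothetical, the search covers a single strand only, and it does not describe how the authors analysed their promoter constructs.

```python
import re

# ERSE consensus as written in the response: CCAAT + N9 + CCACG.
# N9 is taken here to mean any nine bases from A, C, G, T (an assumption).
# A lookahead is used so that overlapping sites would also be reported.
ERSE_LOOKAHEAD = re.compile(r"(?=CCAAT[ACGT]{9}CCACG)")


def find_erse_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of ERSE consensus matches on one strand."""
    return [m.start() for m in ERSE_LOOKAHEAD.finditer(sequence.upper())]


# Hypothetical fragment for illustration only, not a sequence from the manuscript.
example = "GGTACCAATCGGCGGCCTCCACGATTG"
print(find_erse_sites(example))  # prints [4]
```

      Scanning the reverse complement as well would be needed to find sites on the opposite strand; that step is left out to keep the sketch short.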

      Q5. p8. What is meant by 5' Atf6b lacks 10 and 11?

      Response. We corrected this to “Atf6b transcript, which lacks exon 10 and 11, in these mice”.

      Discussion: Please clarify if anti-ATF6-beta antibodies were available for these studies.

      Response. We tried different anti-ATF6β antibodies to detect endogenous ATF6β in cultured neurons by western blot. We successfully observed both the full-length form and the N-terminal fragment (the active form) using the antibody from Biolegend (#853202) (Figure 1E in the new version). This replaces the result obtained with the FLAG antibody in HEK293T cells in the old version.

      Discussion: It is puzzling that ATF6a induces calreticulin more potently than ATF6b, but the calreticulin defect is selectively dependent on ATF6b. Could authors speculate on this paradox? It would be interesting to expand on differences between ATF6a and ATF6b function and phenotypes in Discussion in mouse and in people.

      Response. In the Discussion, we added sentences on the somewhat puzzling role of ATF6β in CRT expression in the CNS, as follows. “All the data from RNA-sequence to the promoter analysis suggested that CRT expression was ATF6β-dependent in primary hippocampal neurons. However, overexpression of ATF6α and ATF6β both enhanced CRT promoter activity…”

      We also proposed a new scenario, as follows:

      “These results may raise a scenario that, in the CNS, expression of molecular chaperones in the ER is generally governed by ATF6α as previously described (Yamamoto et al., 2007) and that ATF6β functions as a booster if their levels are too low. However, expression of CRT is somewhat governed by ATF6β, and ATF6α functions as a booster. The underlying mechanism for this scenario is not clear yet, but neurons may require a high level of CRT expression even under normal condition, as described in Table S2, which may lead to the development of a unique biological system to constitutively produce CRT in neurons. Further studies are required to clarify the molecular basis how this unique system is constructed and regulated.”

      Reviewer 2

      We would like to thank you for the comments concerning our manuscript. We responded to each question, as described below. All the authors feel that our manuscript has been much improved by your comments.

      Major comments:

      Q1. The post-translational processing of ATF6beta must be demonstrated in hippocampal neurons and not in HEK293T cells in Figure 1E. The authors conclude on Page 6, line 18 that "these results suggest that ATF6beta functions in neurons" but it is not obvious how expression in HEK293T cells contributes to this conclusion in any way.

      Response. We performed western blots with different anti-ATF6β antibodies to detect endogenous ATF6β in cultured neurons. Using the antibody from Biolegend (#853202), we successfully detected both the full-length protein and the N-terminal fragment (the active form). We therefore replaced the result in HEK293T cells with the one in hippocampal neurons (Figure 1E in the new version).

      Q2. The hippocampal neurons are affected by the loss of ATF6β, even though the mice are not exposed to tunicamycin. Could the authors present evidence that there is physiological ER stress in hippocampal neurons? If not, why is ATF6beta required.

      Response. Evidence suggests that neuronal activities, including excitatory signals, can cause physiological ER stress and induce the UPR at the distal dendrites of hippocampal neurons (Murakami et al., Neuroscience. 2007 Apr 25;146(1):1-8; Saito et al., J Neurochem. 2018 Jan;144(1):35-49). Among the UPR branches, the Ire1-XBP1 pathway has been reported to play an important role in this dendritic UPR and in the expression of BDNF in the cell soma (Saito et al., 2018). Although the present study focuses on the role of ATF6β in the pathological ER stress that causes neuronal death, we believe it will be intriguing to analyze the role of ATF6β in physiological ER stress and in the local UPR machinery in neurons.

      Q3. In Figure 3, is there a specific reason why the authors do not mutate the ERSEs in the mouse CRT reporter, pCC1 and instead opt to analyze the huCRT reporter? Given that all the other observations in the manuscript are in mouse calreticulin, it is important to show that the ERSEs in the mouse calreticulin promoter are also regulated in an ATF6beta-dependent manner. Similar to the huCRT reporter, it is also crucial to examine if ATF6beta can regulate the mouse CRT promoter. This would provide an explanation for why calreticulin expression is not completely abolished in ATF6beta mutants.

      Response. We added data for a deletion mutant of the mouse CRT promoter, pCC3, which is only 415 bp long but still retains both ERSE1 and ERSE2. pCC3 showed promoter activity similar to that of pCC1 (Figure 3 B) and huCRT (wt) (Figure 3 C) in both WT and Atf6b-/- neurons. Because pCC5, which is 260 bp long and lacks the ERSEs, completely lost CRT promoter activity (Waser et al., 1997), it is most likely that the mouse and human CRT promoters are regulated in a similar manner via ERSEs.

      Q4. In Figure 5A and B, the density of Tubulin staining varies from panel to panel, and is much lower in ATF6beta mutants treated with Tg/Tm. Presumably this is because of cell death but this should be clarified in the main text. Additionally, it is unclear if the EthD-1 staining is nuclear localized. It would help if single channel images for Hoechst and EthD-1 were provided to visualize this.

      Response. For Figure 5A and B, we added a statement to the main text about the reduction of Calcein-AM (A) and βIII tubulin (B). We also added single-channel images for Hoechst and EthD-1 in Figure S4 to confirm the nuclear localization of EthD-1.

      Q5. The literature reports that BAPTA-AM treatment itself could cause ER stress (e.g. PMID: 12531184). Here, the authors report the opposite effect. How could the authors reconcile the difference? The effects of BAPTA-AM and 2-APB must individually be examined in Figure 6C and not just in combination with Tm.

      Response. We added data, in Figure S6 B and in the main text, showing that BAPTA-AM and 2-APB alone did not cause neuronal death at the concentrations used in this study.

      Q6. The authors allude to "impairment of Ca2+ homeostasis in ATF6beta mutants" in Page 13 Line 2, but do not show any direct evidence in support of it. While treatment with BAPTA-AM and 2-APB is a start in that direction, it certainly does not demonstrate that under homeostatic conditions in vivo or in vitro there is any change in calcium flux in ATF6beta hippocampal neurons. To make the case that there is indeed perturbation of Ca2+ in ATF6beta mutant hippocampal neurons, the authors need to examine calcium flux and measure calcium indicators and how they are affected when ER stress is induced in these mutant cells.

      Response. We added data in Figure 4C showing that, in Atf6b-/- neurons, the Ca2+ store in the ER was reduced and the cytosolic Ca2+ concentration was increased under both normal and ER stress conditions.

      Q7. The effect of 2-APB and salubrinal alone on hippocampal neurons need to be examined in Figure 9B-D to eliminate the possibility that these drugs are not enhancing cell survival under normal conditions in a parallel manner.

      Response. We added data in Figure S8 C showing that 2-APB and salubrinal alone did not cause neuronal death in the hippocampus in our model.

      Q8. The rationale for the examination of Fos, Fosb and Bdnf is poorly described (page 14, line 13) and the conclusions from this line of experimentation are rather weak. The results from Figure 9 to some extent serve to confirm in vivo the data seen in Figure 6C but by no means provide a mechanism for why ATF6beta mutants have perturbed calcium homeostasis (page 14, line 22).

      Response. We agree with your comment that the examination of Fos, Fosb and Bdnf is relatively weak. We therefore moved these data to the supplementary information (Figure S8 A and B).

      Minor comments:

      Q1. Page 8, line 3: Their rationale for why ATF6beta 5'UTR sequences are seen in their RNA seq data is not clearly explained. This must be rewritten for clarity.

      Response. In Atf6b-/- mice, exons 10 and 11 were deleted by homologous recombination. Therefore, the 5’ part of the Atf6b gene, including exons 1-9, can still be transcribed. We added a statement to the Results, as follows.

      “this may be due to the presence of the 5’ Atf6b transcript with exon 1-9 in these mice, in which exon 10 and 11 were deleted by homologous recombination.”

      Q2. Page 8, line 5, the authors write that besides Atf6β, CRT was the only UPR-regulated gene downregulated in Atf6β mutant mice. The authors need to state how they defined "UPR-regulated genes". There must be a list, which the authors do not cite.

      Response. To avoid possible confusion, we changed the term “UPR-regulated genes” to “ER stress-responsive genes”.

      Q3. Page 9, line 10: A reference is required for ERSEs.

      Response. We added the reference for ERSEs, as you suggested.

      Q4. Page 10, line 6: The authors say "ATF6beta specifically induces CRT promoter activity". This is a confusing statement because "induction" is in response to stress, but the context here is homeostatic regulation since there is ostensibly no stress being induced. This distinction should be made and corrected here and throughout the manuscript.

      Response. To avoid confusion, we changed the sentence to “ATF6β specifically enhances CRT promoter activity”.

      Q5. Page 10, line 16: The use of "latter" here is confusing and it would help to restructure this sentence for clarity.

      Response. To avoid confusion, we changed the phrase to “under control condition and after stimulation with Tg (Figure 4A upper row) or Tm (Figure 4A lower row)”.

      Q6. Figure 9A is missing Y-axis labels.

      Response. We changed Figure 9A (Figure S8 A in the new version) and the Figure Legends to clarify what each axis indicates.

      Reviewer 3

      We would like to thank you for the comments concerning our manuscript. We responded to each question, as described below. All the authors feel that our manuscript has been much improved by your comments.

      Major comments

      Comment #1. The authors show that overexpression of either Atf6a or Atf6b both increase Crt expression in Atf6b knockout cells. While it is clear that deletion of Atf6a does not basally reduce Crt levels, the overexpression experiment does lead to a question as to how Atf6b can specifically be involved in regulating Crt expression. In the discussion, the authors seem to propose that homo- and hetero-dimerization of Atf6a and Atf6b are required for the basal expression of Crt and that Atf6b serves as a 'booster' of ER chaperone expression. They explicitly state that "Atf6a and Atf6b are required to induce CRT expression". However, it remains unclear to me why in this case would Atf6a deletion not impair Crt expression? The authors address this by invoking a mechanism whereby hippocampal neurons are more reliant on Atf6b for Crt expression, but this doesn't really make sense to me. Ultimately, this point underscores the lack of clear mechanistic basis to explain how Atf6b selectively regulates Crt in the hippocampus. This needs to be better resolved through more experimentation. For example, a ChIP experiment monitoring the binding of ATF6b and ATF6a to the Crt promoter in hippocampal and control cells would go a long way towards addressing this issue.

      Response. In the Discussion, we first made clearer the point that CRT expression is ATF6β-dependent, whereas expression of other molecular chaperones in the ER is ATF6α-dependent. We then raised a scenario in which, in the CNS, expression of molecular chaperones in the ER is generally governed by ATF6α, as previously described (Yamamoto et al., 2007), and ATF6β functions as a booster if their levels are too low. However, expression of CRT is somewhat governed by ATF6β, and ATF6α functions as a booster. We also described the limitations of the current study and the need for further study to clarify the molecular basis of the unique system that ensures CRT expression in neurons.

      Comment #2. The importance of ATF6b for protecting against insults needs to be better described. For example, the authors should show that overexpression of ATF6b protects against ER stress induced neuronal toxicity in cell culture and in vivo kainate induced neuronal toxicity. Similarly, the authors should evaluate how overexpression of ATF6a protects against these insults to better define the specific dependence of hippocampal neurons on ATF6b. The authors do show that overexpression of ATF6b can rescue the reduced Crt observed in Atf6b-deleted neurons, but the protection should similarly be demonstrated.

      Response. We performed rescue experiments to test whether overexpression of ATF6β or ATF6α improves the viability of Atf6b-/- neurons under ER stress. Interestingly, ATF6β, but not ATF6α, rescued Atf6b-/- neurons. In the Discussion, we raised a possible reason, as follows.

      “The lack of rescuing effect of ATF6α may be due to the fact that this molecule enhances the expression of different genes including cell death-related molecule CHOP in addition to molecular chaperons in the ER (Yoshida et al., 2000).”

      Comment #3. Similar to #2, the authors should show that the potential for ATF6b (and ATF6a) overexpression to protect against different insults is impaired in Crt+/- neurons. The authors demonstrate that Crt-depletion increases sensitivity to toxic insults. This would go a long way to demonstrate the importance of the proposed ATF6b-CRT signaling axis in regulating neuronal survival in response to pathologic insults.

      Response. Unfortunately, the breeding of Calr+/- mice is currently not going well. Although we are increasing the number of mice used for breeding, we have to wait for pregnancies to obtain embryos from which to isolate hippocampal neurons. Once we have enough mice, we will attempt the rescue experiment in Calr+/- hippocampal neurons with ATF6β and ATF6α. However, we also think that the rescue experiments in Atf6b-/- neurons with ATF6β, ATF6α, and CRT may be sufficient for this paper.

      Comment #4. When reporting the RNAseq data, the authors should use the q-value (i.e., FDR) instead of the p-value. This will likely affect the number of genes reported in Table 1, but it is the appropriate statistical test for this type of data.

      Response. As you suggested, we replaced Table 1 with a new list filtered by q-value. However, because the list filtered by p-value also yielded important and consistent information, we kept it as Table S1 in the supplementary information.
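
      For readers less familiar with the distinction, the sketch below shows one common way to obtain FDR-adjusted values from raw p-values, the Benjamini-Hochberg step-up procedure. The method used by the authors' RNA-seq pipeline is not stated here, so this is an assumption for illustration only. The function name and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

      Because each p-value is scaled by the number of tests divided by its rank, a gene list filtered on adjusted values is typically shorter than one filtered on raw p-values at the same threshold, which is the effect the reviewer anticipates for Table 1.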

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript, the authors define the functional importance of ATF6b in the hippocampus. They show that ATF6b is highly expressed in the hippocampus relative to other tissues. They demonstrate that deletion or depletion of ATF6b in cultured hippocampal neurons enhances ER stress induced death. Similarly, Atf6b-/- mice show increased sensitivity to kainate induced neuronal death. These results reveal an important role for ATF6b in regulating hippocampal survival in response to pathologic insults. To define a molecular basis for this protection, the authors utilized RNAseq to identify the lectin chaperone calreticulin (Crt) as a gene whose expression is basally reduced in cultured hippocampal neurons where Atf6b is deleted. They show the re-overexpression of Atf6b (or Atf6a) both restore Crt levels in these neurons, underscoring the importance of Atf6 in regulating basal Crt levels. They go on to demonstrate that loss of Atf6b impairs ER stress-dependent increases in Crt, while minimally impacting other Atf6 target genes, again highlighting the importance of Atf6b for Crt regulation. Importantly, overexpression of Crt rescues the increased ER stress-induced toxicity observed in Atf6b knockout neurons, indicating that a primary mechanism by which Atf6b regulates neuronal survival in response to ER stress is through increased Crt expression. Consistent with this, mimicking the 50% reduction in Crt observed in Atf6b knockout neurons using Crt+/- mice showed similar sensitivity to kainate induced neuronal death. Collectively, these results describe an Atf6-Crt axis that is important for regulating neuronal survival in response to pathologic insults.

      Overall the experiments are interesting and provide new insights into the importance of Atf6b in neuronal survival. Notably, the evidence showing that loss of Atf6b increases hippocampal neuron sensitivity to ER stress and kainate induced toxicity are compelling. Any results describing the biological function of Atf6b are interesting, considering how little we know about this ER stress sensing protein. That being said, I have some concerns about the work described that require addressing before publication. Notably, I think more work needs to be done to define the molecular basis for the specific dependence of Crt expression on ATF6b in hippocampal neurons. Further, the authors need to do more experiments to demonstrate the specific importance of ATF6b signaling in the context of ER stress and in vivo neuronal death. I outline these various concerns below:

      Comment #1. The authors show that overexpression of either Atf6a or Atf6b both increase Crt expression in Atf6b knockout cells. While it is clear that deletion of Atf6a does not basally reduce Crt levels, the overexpression experiment does lead to a question as to how Atf6b can specifically be involved in regulating Crt expression. In the discussion, the authors seem to propose that homo- and hetero-dimerization of Atf6a and Atf6b are required for the basal expression of Crt and that Atf6b serves as a 'booster' of ER chaperone expression. They explicitly state that "Atf6a and Atf6b are required to induce CRT expression". However, it remains unclear to me why in this case would Atf6a deletion not impair Crt expression? The authors address this by invoking a mechanism whereby hippocampal neurons are more reliant on Atf6b for Crt expression, but this doesn't really make sense to me. Ultimately, this point underscores the lack of clear mechanistic basis to explain how Atf6b selectively regulates Crt in the hippocampus. This needs to be better resolved through more experimentation. For example, a ChIP experiment monitoring the binding of ATF6b and ATF6a to the Crt promoter in hippocampal and control cells would go a long way towards addressing this issue.

      Comment #2. The importance of ATF6b for protecting against insults needs to be better described. For example, the authors should show that overexpression of ATF6b protects against ER stress induced neuronal toxicity in cell culture and in vivo kainate induced neuronal toxicity. Similarly, the authors should evaluate how overexpression of ATF6a protects against these insults to better define the specific dependence of hippocampal neurons on ATF6b. The authors do show that overexpression of ATF6b can rescue the reduced Crt observed in Atf6b-deleted neurons, but the protection should similarly be demonstrated.

      Comment #3. Similar to #2, the authors should show that the potential for ATF6b (and ATF6a) overexpression to protect against different insults is impaired in Crt+/- neurons. The authors demonstrate that Crt-depletion increases sensitivity to toxic insults. This would go a long way to demonstrate the importance of the proposed ATF6b-CRT signaling axis in regulating neuronal survival in response to pathologic insults.

      Comment #4. When reporting the RNAseq data, the authors should use the q-value (i.e., FDR) instead of the p-value. This will likely affect the number of genes reported in Table 1, but it is the appropriate statistical test for this type of data.

      Significance

      This manuscript provides new context for understanding the functional relationship between Atf6a and the less-studied Atf6b in regulating neuronal survival. As with other studies focused on the relationship between these two ATF6 isoforms, this study demonstrates that these transcriptional programs integrate to coordinate a tissue-specific response to ER stress. Intriguingly, these studies indicate that ATF6b has a specific role in regulating the ER lectin chaperone CRT and that this ATF6b-CRT axis uniquely regulates neuronal survival in response to ER stress. While additional experiments are required to support this claim, the work described herein is a nice addition to our evolving understanding of the importance of ATF6b in regulating ER and cellular physiology during pathologic insults.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary

      Unfolded Protein Response (UPR) refers to homeostatic signaling pathways that play protective roles in various cell types. This work by Nguyen et al focuses on the UPR-mediator ATF6. In mammals, there are two isoforms of ATF6, alpha and beta. Nguyen et al show that the expression of the ATF6beta isoform is higher in hippocampal neurons whereas the ATF6alpha isoform is more evenly distributed across various neuronal subtypes. By performing gene expression profiling in mouse brain samples, they identify the ER chaperone calreticulin (CRT) as being significantly downregulated in ATF6beta null mutants. They further validate this observation by comparing hippocampi from ATF6alpha and ATF6beta null mice, where CRT is lowered in the latter but not the former. They identify and mutate putative ER stress response elements (ERSE) in the CRT promoter region to show that expression of CRT can be regulated by both ATF6alpha and beta. They demonstrate that treatment of hippocampal neurons with ER stress inducing chemicals leads to induction of CRT, which is suppressed in ATF6beta mutants. Such treatment also leads to cell death, which is exacerbated in ATF6beta mutants but rescued by ectopic expression of CRT. They also extend these observations to cell death induced by treatment with the glutamate receptor agonist, kainate, which was also exacerbated in ATF6beta mutants, but was rescued by counter treatment with ER stress inhibitors. Together, their data suggest a protective role for ATF6beta in hippocampal neurons in the context of ER stress.

      Major comments

      The primary advantage of this work is that much of it was done in vivo in mice, providing immediate context for the role of ATF6β under physiological conditions. They identify a specific region of the brain that requires ATF6beta. On the other hand, the ATF6-CRT signaling axis reported here had been established previously, and therefore, this study brings limited conceptual advances regarding the signaling mechanism itself (see Significance section below).

      Overall, the authors' data support their primary claim that ATF6β has a neuroprotective role in the context of ER stress. The data presented are clear and convincing, and their methods appear rigorous. The manuscript could be further improved if the authors could provide sufficient rationale for some of their experiments, which are discussed below.

      1. The post-translational processing of ATF6beta must be demonstrated in hippocampal neurons and not in HEK293T cells in Figure 1E. The authors conclude on Page 6, line 18 that "these results suggest that ATF6beta functions in neurons" but it is not obvious how expression in HEK293T cells contributes to this conclusion in any way.
      2. The hippocampal neurons are affected by the loss of ATF6β, even though the mice are not exposed to tunicamycin. Could the authors present evidence that there is physiological ER stress in hippocampal neurons? If not, why is ATF6beta required.
      3. In Figure 3, is there a specific reason why the authors do not mutate the ERSEs in the mouse CRT reporter, pCC1 and instead opt to analyze the huCRT reporter? Given that all the other observations in the manuscript are in mouse calreticulin, it is important to show that the ERSEs in the mouse calreticulin promoter are also regulated in an ATF6beta-dependent manner. Similar to the huCRT reporter, it is also crucial to examine if ATF6beta can regulate the mouse CRT promoter. This would provide an explanation for why calreticulin expression is not completely abolished in ATF6beta mutants.
      4. In Figure 5A and B, the density of Tubulin staining varies from panel to panel, and is much lower in ATF6beta mutants treated with Tg/Tm. Presumably this is because of cell death but this should be clarified in the main text. Additionally, it is unclear if the EthD-1 staining is nuclear localized. It would help if single channel images for Hoechst and EthD-1 were provided to visualize this.
      5. The literature reports that BAPTA-AM treatment itself could cause ER stress (e.g. PMID: 12531184). Here, the authors report the opposite effect. How could the authors reconcile the difference? The effects of BAPTA-AM and 2-APB must individually be examined in Figure 6C and not just in combination with Tm.
      6. The authors allude to "impairment of Ca2+ homeostasis in ATF6beta mutants" in Page 13 Line 2, but do not show any direct evidence in support of it. While treatment with BAPTA-AM and 2-APB is a start in that direction, it certainly does not demonstrate that under homeostatic conditions in vivo or in vitro there is any change in calcium flux in ATF6beta hippocampal neurons. To make the case that there is indeed perturbation of Ca2+ in ATF6beta mutant hippocampal neurons, the authors need to examine calcium flux and measure calcium indicators and how they are affected when ER stress is induced in these mutant cells.
      7. The effect of 2-APB and salubrinal alone on hippocampal neurons need to be examined in Figure 9B-D to eliminate the possibility that these drugs are not enhancing cell survival under normal conditions in a parallel manner.
      8. The rationale for the examination of Fos, Fosb and Bdnf is poorly described (page 14, line 13) and the conclusions from this line of experimentation are rather weak. The results from Figure 9 to some extent serve to confirm in vivo the data seen in Figure 6C but by no means provide a mechanism for why ATF6beta mutants have perturbed calcium homeostasis (page 14, line 22).

      Minor comments

      1. Page 8, line 3: Their rationale for why ATF6beta 5'UTR sequences are seen in their RNA seq data is not clearly explained. This must be rewritten for clarity.
      2. Page 8, line 5, the authors write that besides Atf6β , CRT was the only UPR-regulated gene downregulated in Atf6β mutant mice. The authors need to state how they defined "UPR-regulated genes". There must be a list, which the authors do not cite.
      3. Page 9, line 10: A reference is required for ERSEs.
      4. Page 10, line 6: The authors say "ATF6beta specifically induces CRT promoter activity". This is a confusing statement because "induction" is in response to stress, but the context here is homeostatic regulation since there is ostensibly no stress being induced. This distinction should be made and corrected here and throughout the manuscript.
      5. Page 10, line 16: The use of "latter" here is confusing and it would help to restructure this sentence for clarity.
      6. Figure 9A is missing Y-axis labels.

      Significance

      The authors summarize their major findings of the study (at the beginning of the Discussion) as ATF6β being required for CRT induction in the hippocampus, and that this ATF6β-CRT axis is important for the survival of hippocampal neurons. The idea that ATF6 induces CRT had been previously shown by others (PMID 9837962), and therefore, this is not the major new discovery of this study. In addition, the ATF6-calreticulin axis having a cell protective role had been reported in other biological contexts (e.g. PMID: 32905769), so that concept is also not a novel concept presented in this work. Similarly, the role of UPR in glutamate receptor agonist-induced neuronal cell death had been shown previously (the authors cite Kitao et al., 2001; Sokka et al., 2007; Kezuka et al., 2016), so this link is not the major novel discovery revealed by this study. Instead, this study reports that ATF6β KO mice have specific phenotypes in hippocampal neurons, which had not been reported previously. Furthermore, this manuscript reports detailed information regarding Atf6β's downstream target genes in this tissue. In summary, this study's finding that ATF6 regulates CRT is confirmatory, rather than bringing new conceptual advances. The merit of this study is in the identification of the hippocampus as the organ that specifically requires ATF6beta. While the findings here may not appeal to a broader audience interested in UPR signaling mechanisms, it may draw interest from those who study hippocampal neuron physiology.

      For the editor's reference, this reviewer's field of expertise is in UPR signaling mechanisms in animal models

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Nguyen and colleagues provide evidence that ATF6-beta selectively induces calreticulin expression in mouse hippocampal neurons to protect these neurons from ER stress-inducing toxins. This is a well-written and well-organized report that provides functional information about ATF6-beta, a poorly studied homolog of the ATF6-alpha Unfolded Protein Response regulator. The report suggests that ATF6-beta has a previously unknown and important function in helping brain neurons survive ER stress by regulating calreticulin.

      The study shows that addition of BAPTA, 2-APB, or salubrinal significantly improves neuronal survival in ATF6-/- explants and mice brains in response to ER toxins. However, a prior study (PMID: 15705855) used salubrinal at a much higher concentration (75 µM), with little effect at the 5 µM dose used in the current study. Evidence should be provided that these drugs are specifically inhibiting ER stress, or off-target mechanisms should be discussed in their experimental models.

      Minor comments:

      Fig 1 any male vs female mice differences in ATF6b expression?

      Fig 2C. Please show molecular weight markers on blots

      Fig 2C. what are the doublet bands on calnexin?

      Fig 3. what are the ERSE sequences? several different binding sites are reported in literature.

      p8. What is meant by 5' Atf6b lacks 10 and 11?

      Discussion: Please clarify if anti-ATF6-beta antibodies were available for these studies.

      Discussion: It is puzzling that ATF6a induces calreticulin more potently than ATF6b, but the calreticulin defect is selectively dependent on ATF6b. Could authors speculate on this paradox? It would be interesting to expand on differences between ATF6a and ATF6b function and phenotypes in Discussion in mouse and in people.

      Significance

      ATF6-beta is a homolog of ATF6-alpha and is assumed to function like ATF6-alpha. This report describes a selective function of ATF6-beta in inducing calreticulin in mouse neurons during ER stress. This suggests that ATF6-beta has some functions that differ from those of ATF6-alpha in mouse hippocampal neurons.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      RESPONSE TO REVIEWERS

      We thank Review Commons and its three reviewers. Reviewers 2 and 3 provide detailed comments, which we address individually. Reviewer 1, however, gives a general critique of how we have approached asking how genome architecture affects the extent of evolution and the details of evolutionary trajectories. Our interpretation of their comments is that our approach and the one that they advocate represent two philosophically different, but complementary, views about how to study evolution in the laboratory. We begin by discussing this difference and then proceed to a point by point response to the three reviews.

      Reviewer 1

      Philosophical differences with Reviewer 1

      We interpret Reviewer 1’s comments as endorsing a formal, quantitative study of evolution that aims to explain the factors that control the rate at which fitness increases during experimental evolution. This approach derives from classical population genetics and aims to use a mixture of theory and experiment to uncover general principles that would allow rates of evolution and evolutionary trajectories, expressed as population fitness over time, to be predicted from quantitative parameters, such as population sizes, mutation rates, distributions of the fitness effects of mutations (including their degree of dominance in diploids), and global descriptions of either general (e.g. diminishing returns) or allele-specific epistasis.

      This approach aims to predict how the average fitness trajectory should be affected by variations in these parameters and describe the variation, at the level of fitness, in the outcomes in a set of parallel experiments. This is an important approach and we have previously used it to investigate how the strength of selection influences the advantage of mutators (Thompson, Desai, & Murray, 2006) and to produce and test theory that predicts how mutation rate and population size control the rate of evolution (Desai, Fisher, & Murray, 2007). Like every approach to evolution, this one has limitations: 1) if it doesn’t identify mutations or investigate phenotypes other than fitness, it cannot reveal the biological and biochemical basis of adaptation or report on how variations in population genetic parameters (population size, haploids versus diploids, etc.) influence which genes acquire adaptive mutations, and 2) if the details of experiments (e.g. whether populations are clonal or contain standing variation, or which phenotypes are being selected for) have strong effects on the population genetic parameters, these must be measured before theoretical or empirical relationships could be used to predict the mean and variance of fitness trajectories produced by a given selection. A variety of evidence suggests that the second limitation is real. Examples include the absence of a universal finding that diploid populations evolve more slowly than haploids (discussed on Lines 437-442), even within the same experimental organism, and the finding that diminishing returns epistasis applies well to domesticated yeast evolving in a variety of laboratory environments (e.g. papers from the Desai lab, starting with Kryazhimskiy, Rice, Jerison, & Desai, 2014) but not to the evolutionary repair experiments that we have conducted (Fumasoni & Murray, 2020; Hsieh, Makrantoni, Robertson, Marston, & Murray, 2020; Laan, Koschwanez, & Murray, 2015).

      The second approach to experimental evolution, which we, as molecular geneticists and cell biologists, predominantly take, is to follow the molecular and cell biological details of how organisms adapt to selective pressure. We subject organisms to defined selective forces, identify candidate causative mutations, test them by reconstructing the evolved mutations, individually and in combination, and perform additional experiments to ask how these mutations are increasing fitness. Because these experiments are performed on model organisms and often address phenotypes that have been studied by classical and molecular genetics, we can often say a good deal about the cell biological and biochemical mechanisms that increase fitness and this work can complement and extend what we know from classical and molecular genetics.

      The current manuscript and its predecessor are examples of finding causative mutations and asking how they improve fitness, with the first paper (Fumasoni & Murray, 2020) demonstrating how mutations in three functional modules could overcome most of the fitness cost of removing an important but non-essential protein and the current paper asking how alterations in genome architecture and dynamics (diploidy and eliminating double-strand break-dependent recombination) affect the extent to which populations increase in fitness and which genes and functional modules acquire mutations as they do so.

      By definition, such experiments are anecdotal: they report on how particular genotypes and genome architectures respond to particular selection pressures. Any individual set of experiments can produce conclusions about the effects of variables, such as population size, mutation rate, and genome architecture, on the mutations that increased fitness in response to the specific selection, but they can do no more than lead to speculation and inference about what would happen in other experiments: speculation from the results of a single project and inferences from the combined results of multiple projects. Our interpretation is that the evolutionary repair experiments that we have performed, which have perturbed budding, DNA replication, and the linkage between sister chromatids, do indeed lead to a common set of inferences: most of the selected mutations reduce or eliminate the function of genes, the interactions between the selected mutations are primarily additive, and the mutations cluster in a few functional modules.

      We believe that the population and molecular genetic approaches to experimental evolution are complementary and that a full understanding of evolution will require combining both of them. We think this will be especially true as we try to use the findings from laboratory studies to improve our understanding of evolution outside the lab, which takes place over longer periods, in more temporally and spatially variable environments, and is subject to variation in multiple population genetic and biological parameters.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In their previous work the authors examined adaptation in response to replication stress in haploid yeast, via experimental evolution of batch cultures followed by sequencing. Here they extend this approach to include diploid and recombination-deficient strains to explore the role of genome architecture in evolution under replication stress. On the whole, a common set of functional modules are found to evolve under all genetic architectures. The authors discuss the molecular details of adaptation and use their findings to speculate on the determinants of adaptation rate.

      **SECTION A - Evidence, reproducibility and clarity** Experimental evolution can reveal adaptive pathways, but there are some challenges when applying this approach to compare genetic backgrounds or environments. The key challenge is that adaptation potentially depends on both the rate of mutation and the nature of selection. Distinct adaptation patterns between groups could therefore reflect differential mutation, selection, or both. The authors allude to this dichotomy but have very limited data to address it. The closest effort is engineering putatively-adaptive variants into all genetic backgrounds, including those where they did not arise; the fact that such variants remain beneficial suggests they did not arise in certain backgrounds because of a lower mutation rate, but this is a difficult issue to tackle quantitatively.

      We agree, wholeheartedly, that adaptation depends on the combination of mutation rates and the nature of selection and our goal was to ask how the molecular nature of adaptation depends on genome architecture when three different architectures are subjected to the same selection: constitutive replication stress caused by removing an important component of the replisome. We used a haploid strain as a baseline and compared it to two other strains chosen to influence either the effect of mutations (a diploid, where fully recessive mutations that were beneficial in the haploid would become neutral) or the rate of mutations (a recombination-defective strain that would be unable to use ectopic recombination to amplify segments of the genome). In both cases, we expected to see effects that are closer to qualitative than quantitative: the absence of fully recessive mutations in evolved diploids and the absence of segmental amplification in the recombination-deficient haploid. We see both effects and they then allow us to ask two other questions: 1) does influencing the effect of a class of mutation (diploids) or preventing a class of mutation (recombination defect) have a major effect on the rate of evolution, and 2) do these differences affect which modules adaptive mutations occur in. As far as we can tell, the answer is no to both questions. We use “as far as we can tell” because our experiments do have limitations. First, the recombination-defective strain has a higher point mutation rate, making it impossible to tell how much this elevation, rather than any other factor, accounts for it showing a greater fitness increase than the recombination-proficient haploid. Unfortunately, to our knowledge, it’s impossible to abolish recombination without affecting mutation rates. Second, we only experimentally tested a subset of the inferred causative mutations, meaning that for many genes, our assertion that they are adaptive is a statistical inference and their assignment to a particular functional module is based on prior literature rather than our own experiments. In response to this criticism, we have now rephrased some of our sentences (see below).

      From mutation accumulation experiments, where the influence of selection is minimized, there is evidence that genetic architecture affects the rate and spectrum of spontaneous mutations. In this experiment, the allele used to eliminate recombination, rad52, will also increase the mutation rate generally. The diploid strain is also likely to have a distinct mutational profile--as a null expectation diploids should have twice the mutation rate of haploids. Recent evidence indicates the mutation rate difference between haploid and diploid yeast might be less than two-fold, but that there are additional differences in the mutation spectrum, including rates of structural change. The context for this study is therefore three genetic architectures likely to differ in multiple dimensions of their mutation profiles, but mutation rates are not measured directly.

      The reviewer is correct that we did not explicitly measure mutation rates, although the frequency of synonymous mutations (Figure 3-S1B) is a proxy for the point mutation rate as long as the majority of these mutations are assumed to be neutral. By this measure, the mutation rates for ctf4∆ haploids and ctf4∆/ctf4∆ diploids, expressed per haploid genome, are close to each other (1.94 for haploids and 1.37 for diploids) but different enough to return p = 0.044 by Welch’s test, whereas the mutation rate for the recombination-deficient, ctf4∆ rad52∆ haploid is 4 to 5-fold higher (7.03). In contrast, we can infer that the ctf4∆ rad52∆ strain has much lower rates of segmental aneuploidy produced by recombination: we see only one such event in this strain in contrast to 16 in the ctf4∆ haploid and 44 in the ctf4∆/ctf4∆ diploid (Supplementary table 4), even though the amplification of the cohesin loader gene, SCC2, confers similar benefits in all three strains.
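
      As a quick check on the comparison above, the ratios between the three reported synonymous-mutation values can be computed directly. The sketch below uses only the three numbers quoted in this response (1.94, 1.37 and 7.03); the dictionary keys and the helper name `fold_change` are illustrative labels and not names from the manuscript.

```python
# Synonymous-mutation values quoted in the response, expressed per haploid genome.
# The dictionary keys are illustrative labels (an assumption), not manuscript names.
synonymous_rate = {
    "ctf4_haploid": 1.94,
    "ctf4_diploid": 1.37,
    "ctf4_rad52_haploid": 7.03,
}


def fold_change(numerator: str, denominator: str) -> float:
    """Return the ratio of two of the reported values."""
    return synonymous_rate[numerator] / synonymous_rate[denominator]


vs_haploid = fold_change("ctf4_rad52_haploid", "ctf4_haploid")
vs_diploid = fold_change("ctf4_rad52_haploid", "ctf4_diploid")
print(f"rad52 strain vs haploid: {vs_haploid:.2f}-fold")
print(f"rad52 strain vs diploid: {vs_diploid:.2f}-fold")
```

      The two ratios come out at roughly 3.6 and 5.1, in line with the approximate 4 to 5-fold elevation described for the recombination-deficient strain.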

      The nature of selection on haploids and diploids is expected to differ because of dominance, but ploidy-specific selection is also possible. The authors discuss how recessive beneficial alleles may be less available to diploids, though this can be offset by relatively rapid loss of heterozygosity. However, diploids should also incur more mutations, all else being equal. The rate of beneficial mutation, as opposed to the rate of mutation generally, will depend on the mutational "target size" of fitness, and the authors' findings recapitulate other literature (particularly regarding "compensatory" adaptation) that points to faster adaptation in genotypes with lower starting fitness.

      We agree with the reviewer and tried to make the point that which mutations are fixed is primarily determined by the product of the rate at which they occur and the benefit which they confer (lines 193-196). Evidence in budding yeast suggests that in diploid cells, removing one copy of most genes fails to produce a measurable fitness benefit (Deutschbauer et al., 2005), suggesting that losing one copy of many genes is purely recessive. If this were always the case, it would be very hard for such heterozygous, loss-of-function mutations to contribute to evolution in diploids: a mutation that inactivates one copy of a gene would have to rise to high enough frequency by genetic drift that homozygosis of this mutation by mitotic recombination would have a significant probability. Instead, we find that heterozygous mutations in some genes (inactivation of RAD9, what are likely to be hypomorphic mutations in SLD5) but not others (inactivation of IXR1) confer benefits in diploids that allow their frequency to rise much more rapidly by selection than they would by drift, allowing them to reach frequencies at which mitotic recombination becomes probable.
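
      The "rate times benefit" argument above can be illustrated with a toy calculation. This is a minimal sketch and not the authors' analysis: it assumes the standard small-s approximation that a new beneficial mutation with selective advantage s escapes loss by drift with probability of about 2s, so that the expected number of successful mutations per generation scales as N × rate × 2s. The population size, rates, benefits and class names are all hypothetical, chosen only to show that a rare, strongly beneficial class and a common, weakly beneficial class can contribute equally.

```python
from dataclasses import dataclass


@dataclass
class MutationClass:
    name: str       # hypothetical label
    rate: float     # mutations of this class per genome per generation (hypothetical)
    benefit: float  # selective advantage s (hypothetical)


def establishment_rate(m: MutationClass, population_size: float) -> float:
    """Expected number of new mutations per generation that escape drift.

    Assumes the small-s approximation P(establishment) ~ 2s.
    """
    return population_size * m.rate * 2.0 * m.benefit


N = 1e6  # hypothetical population size
rare_strong = MutationClass("rare_strong", rate=1e-8, benefit=0.10)
common_weak = MutationClass("common_weak", rate=1e-7, benefit=0.01)

for m in (rare_strong, common_weak):
    print(m.name, establishment_rate(m, N))
```

      In this toy case the tenfold difference in rate is exactly offset by the tenfold difference in benefit, which is the sense in which the product, rather than either factor alone, determines which mutations are likely to be seen.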

      There is ample literature on the above topics, particularly discussions of the evolutionary advantages of haploidy versus diploidy. While adaptation to replication stress provides a novel starting point for this investigation, much of the manuscript is devoted to long-standing questions that are not specific to replication stress. Unfortunately, the data the authors collected is not sufficient to shed light on these questions, because mutation and selection cannot be effectively distinguished. The Discussion states that "We find that the genes that acquire adaptive mutations, the frequency at which they are mutated, and the frequency at which these mutations are selected all differ between architectures but that mutations that confer strong benefits always lie in the same three modules" (line 379), but it is not clear that these statements are all supported by the data.

      The reviewer makes two points: we fail to make a significant contribution to long-standing questions about the evolutionary genetics of adaptation and that we make statements that are not supported by our data. On the first we disagree: unlike much of the previous work, which compares the effects of mutation rates and population sizes on the rates of evolution, we sequence genomes, identify putative causative mutations, verify that they increase fitness, and test, by reconstruction, how their contribution to fitness is affected by fully characterized genome architectures. We know of no comparable work and we believe that this is a useful contribution to understanding evolution. In addition, some of the literature, for example the discussion of haploidy versus diploidy, has failed to reach a universal conclusion. On the second point, we realized that the statement that the reviewer quotes is stronger than it should be since we do not show “that mutations that confer strong benefits always lie in the same three modules”. What we do show is that mutations in all three modules are found in all three genome architectures (Figure 5), and that combining one mutation from each module (using mutations in genes that are found in that architecture) can reproduce the observed fitness increase in each architecture (Figure 6 B), but the reviewer is correct that we have not demonstrated that every clone from every population has an adaptive mutation in all three modules. We have therefore modified the quoted sentence as follows (altered wording underlined):

      "We find that the genes that acquire adaptive mutations, the frequency at which they are mutated, and the frequency at which these mutations are selected all differ between architectures but that mutations conferring strong benefits can occur in all three modules in each architecture" (Lines 405-408)

      Focusing on the more novel aspect of their experiment, the presence of replication stress, would arguably be a better approach. On this topic the authors have some interesting observations and speculation, but clear predictions are lacking. The introduction section could be redesigned to explicitly state why genome architecture might affect adaptation in response to replication stress in particular, rather than (or in addition to) adaptation generally. If there were no differences in mutation, does the nature of Ctf4 lead to predictions that the molecular basis of compensatory adaptation should differ among genome architectures? Without such predictions it will be difficult for readers to know whether the observation that different genome architectures follow similar adaptive paths is surprising or not.

      We believe that following this suggestion would diminish the paper. We set out to ask how genome architecture affected adaptation to the strong fitness defect produced by removing an important component of an essential process, DNA replication. We chose replication stress as an example of cell biological damage that cells would have to repair with the hope that the results would give general clues about evolutionary repair, rather than hoping that the experiment would inform us about how replication stress altered the types of mutation (e.g. point mutations versus segmental amplification) that were selected. As we point out at the beginning of our response, we recognize that the result of any one such experiment must be anecdotal and any attempt to generalize must be described as speculation if it refers only to this one experiment, or inference if it refers to this experiment and other published work. In those cases where we discuss the effect of genome architecture on evolutionary trajectories, we can draw conclusions that apply to our own experiments, but can only speculate on adaptation to different selections. In others, where we see commonalities between our experiments and previous work on evolutionary repair (cite Review), we can make inferences about evolution to adapt to removing important proteins and speculate about other forms of selection. We have revised the discussion to make it clear where we conclude, where we speculate, and where we infer. We suspect that our finding that genome architecture has a larger effect on which genes acquire adaptive mutations than it does on which modules these mutations alter will generalize to other evolutionary repair experiments and may be true even more broadly.

      We deliberately did not make predictions about the effect of genome architecture on the rate at which population fitness increased or the mechanism of adaptation to replication stress because we believed that our ignorance and the diverging results of previous experiments were sufficient to make both exercises worthless. After the fact, we interpret our results to suggest that mutations that reduce the activity of components, such as Sld5, that are stably associated with replication forks should be semi-dominant, but we were not nearly smart enough to make such a specific prediction before the experiment began!

      **Minor comments:** Shifts in ploidy from diploid to haploid are less common than the reverse change, so the observation of such a shift (Fig. 1) should be discussed in more detail.

      We now mention that haploids becoming diploids is more common than the reverse transformation and point out that genome sequencing reveals that these strains are true haploids rather than aneuploids.

      “One diploid population (EVO14) gave rise to a population with a haploid genome content, suggesting a possible haploidization event during evolution. Sequencing revealed no aneuploidies as a potential explanation of this phenomenon. While diploidization has been recurrently observed during experimental evolution with budding yeast (Aleeza C. Gerstein & Otto, 2011; Aleeza C Gerstein, Chun, Grant, & Otto, 2006; Harari, Ram, Rappoport, Hadany, & Kupiec, 2018; Venkataram et al., 2016), reports of spontaneous haploidization events have been instead scarce. Given the difficulties introduced by the change of ploidy over the 1000 generations, we have excluded EVO14 from all our analyses.” (Lines 122-128)

      We believe that the most likely mechanism is that the strain sporulated to produce haploids that were fitter than their diploid parent, but because this event occurred in only one out of eight populations and the proposed explanation is pure speculation, we have not included it in the revised manuscript.

      Line 88 typo 'stains'.

      Fixed. Thank you.

      Reviewer #1 (Significance (Required)): **SECTION B - Significance** The novel aspect of this study is the combination of replication stress and genome architecture, but here the significance is limited by a lack of clear predictions on how these factors might interact. On the other hand, much of the manuscript is devoted to why adaptation might vary among genome architectures in general, but this long-standing and important question is not particularly well resolved by this experimental approach, which can't disentangle mutation and selection.

      Our belief is that quantitatively predicting how selection will change fitness is nearly impossible because we lack the detailed knowledge of population genetic parameters that apply to our experiments. Prediction is even harder if the goal is to identify which genes will fix adaptive mutations and understand how these mutations alter cellular phenotypes to increase fitness. Thus our approach is almost entirely empirical: we do experiments that alter interesting variables, collect data, and do our best to interpret them and suggest how the conclusions of individual experiments might generalize.

      The authors highlight the dichotomy when discussing the evolution of ploidy: "We suggest that... genome architecture affects two aspects of the mutations that produce adaptation: the frequency at which they occur and the selective advantage they confer" (line 399), but presenting this as a novel inference does not appropriately acknowledge prior research and discussion of these ideas; several relevant papers are cited by the authors in other contexts. It may be possible to recast these findings as a test of the role of genome architecture in adaptation generally, but the authors should clarify the limitations of experimental evolution and more fully consider the theory and data outlined in previous research. In particular, few studies can claim to directly compare mutation rates between genome architectures, and it is not obvious that the present study is an example of such.

      We have the disadvantage that the reviewer doesn’t identify the literature we fail to cite. To us, the argument the reviewer quotes is self-evident. As we mention above, our goal was not to test either general or detailed predictions, and the level at which we analyzed our experiment, especially demonstrating that mutations were causal and reconstructing them individually and in combination, is missing from previous work. Finally, measuring mutation rates is supremely difficult: you either need good ways of following all possible forms of mutation, quantitatively and without selection, or you resort to selecting mutations with a particular phenotype and molecularly characterizing them, knowing that these assays may well give different ratios of the rates of different types of mutation at different loci. We do make and report one measure of mutation rate, the rate of synonymous mutation in protein coding genes, which we discuss above.

      Reviewer expertise: Evolutionary genetics; experimental evolution; mutation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): **Summary** This manuscript investigates the effect of an organism's genotype (or, as the authors call it, an organism's 'genome architecture') on evolutionary trajectories. For this, the authors use Saccharomyces cerevisiae strains that experience some form of replication stress due to specific gene deletions, and that further differ in ploidy and/or the type of gene(s) deleted. They find the same three functional modules (DNA replication, DNA damage checkpoint, sister chromatid cohesion) are affected across the 3 different genotypes tested, although the specific genes that are mutated vary. **Major comments** This is a solid and exceptionally eloquent paper, comprising a large body of work that is in general well presented. That said, I do have some suggestions and questions. At several points in the manuscript, the authors should perhaps be more careful in their wording and avoid overgeneralizing data without providing additional evidence for these claims.

      We thank the reviewer for their constructive review and address their request for more careful wording below.

      • Some key points of the study are not entirely clear to me; possibly because the study builds upon a previous study that was recently published in eLife. Anyhow, I think it would be useful to clarify the following points a bit more:

        • Why exactly was ctf4∆ chosen as a model for replication stress? What is the evidence that ctf4∆ is a good model for replication stress? Without including some evidence for this, it is unclear how well the findings in this study really can be generalized to replication stress (which is what the authors do now).

      We described the reasons for choosing CTF4 deletion to mimic DNA replication stress in our previous eLife paper, to which we refer at. Nevertheless, the reviewer is right in asking us not to assume that the reader will have read our previous work. Briefly: DNA replication stress is a term that is loosely defined as the combination of the defects in DNA metabolism and the cellular response to these defects in cells whose replication has been substantially perturbed (Macheret & Halazonetis, 2015). Established methods in the field to induce DNA replication stress consist of either pharmacological treatments or genetic perturbation. Pharmacological treatments include hydroxyurea, which targets ribonucleotide reductase and hence stalls forks as a result of dNTP depletion (Crabbé et al., 2010), or aphidicolin, which directly inhibits polymerases α, ε and δ (Vesela, Chroma, Turi, & Mistrik, 2017b; Wilhelm et al., 2019). For genetic perturbation, the conditional depletion of replicative polymerases (Zheng, Zhang, Wu, Mieczkowski, & Petes, 2016) is frequently used. These methods are incompatible with experimental evolution, as cells can mutate the targets of replication inhibitors or alter the expression of genes that have been reduced in expression or activity. Removing an important but non-essential component of the replication machinery avoids these problems. We chose CTF4 deletion as a manipulation that affected the coordination of events at the replication fork: in the absence of Ctf4, the polα-primase complex is no longer physically bound to the replicative helicase, and thus the polymerase’s abundance at the replisome decreases (Tanaka et al., 2009). This manipulation achieves the same effects as polymerase depletion and replisome stalling, producing a constitutive DNA replication stress that can only be overcome by mutations in other genes. Multiple studies have shown that ctf4∆ cells display replication intermediates commonly associated with DNA replication stress, such as the accumulation of ssDNA gaps and reversed forks (Abe et al., 2018; Fumasoni, Zwicky, Vanoli, Lopes, & Branzei, 2015), fork stalling (Fumasoni & Murray, 2020), checkpoint activation (Poli et al., 2012; Tanaka et al., 2009) and altered chromosome metabolism (Kouprina et al., 1992).

      We now justify our choice of deleting CTF4 at line 74:

      “DNA replication stress is often induced with drugs or by reducing the level of DNA polymerases (Crabbé et al., 2010; Vesela, Chroma, Turi, & Mistrik, 2017a; Wilhelm et al., 2019; Zheng et al., 2016). To avoid evolving drug resistance or increased polymerase expression, which would rapidly overcome DNA replication stress, we deleted the CTF4 gene, which encodes a non-essential subunit of the DNA replication machinery (the replisome) (Kouprina NYu, Pashina, Nikolaishwili, Tsouladze, & Larionov, 1988). Ctf4 is a homo-trimer that functions as a structural hub within the replisome (Villa et al., 2016; Yuan et al., 2019) by binding to the replicative DNA helicase, primase (the enzyme that makes the RNA primers that initiate DNA replication), and other accessory factors (Gambus et al., 2009; Samora et al., 2016; Simon et al., 2014; Villa et al., 2016). In the absence of Ctf4, the Polα-primase and other lagging strand processing factors are poorly recruited to the replisome (Samora et al., 2016; Tanaka et al., 2009; Villa et al., 2016), causing several characteristic features of DNA replication stress, such as accumulation of single strand DNA (ssDNA) gaps (Abe et al., 2018; Fumasoni et al., 2015), reversed and stalled forks (Fumasoni & Murray, 2020; Fumasoni et al., 2015), cell cycle checkpoint activation (Poli et al., 2012; Tanaka et al., 2009) and altered chromosome metabolism (Hanna, Kroll, Lundblad, & Spencer, 2001; Kouprina et al., 1992). As a consequence of these defects, ctf4∆ cells have substantially reduced reproductive fitness (Fumasoni & Murray, 2020).”

      Would the authors expect to see similar routes of adaptation if a 'genomic architecture' with a less severe/other replication defect would have been used? I realize the last question is perhaps difficult to address without actually doing the experiment (which I am not suggesting the authors should do); I just want to point out that perhaps some data should not be over-generalized.

      We share the reviewer’s interest in asking whether different forms of DNA replication stress would lead to the same results described, and we plan to rigorously investigate this question in a separate paper. We note that a careful comparison between different forms of DNA replication stress has never been made and that authors studying this phenomenon often rely on a single perturbation to induce DNA replication stress (Crabbé et al., 2010; Wilhelm et al., 2019; Zheng et al., 2016). We agree that such a comparison will be useful, but we believe (as indicated by the reviewer) it will require an amount of work that goes beyond the scope of our study. To avoid over-generalization, we are now using “a form of DNA replication stress” in lines 33, 244, 401, 414 and 461, to make it clear that our conclusions (as opposed to inferences and speculations) are restricted to the response to a single example of replication stress.

      Likewise, why was RAD52 selected as the gene to delete to affect homologous recombination? I understand that it is a key gene, but on the flipside, absence of RAD52 affects multiple cellular pathways and (as the authors also observe in their populations) also results in increased mutation rates which might confound some of the results.

      We aimed to observe the largest deficiency in DNA recombination possible and therefore chose to delete RAD52 because of its many roles in different forms of homologous recombination (Pâques & Haber, 1999). The choice of other genes, such as RAD51, would have inhibited canonical double strand break (DSB) repair, but allowed other mechanisms that can rescue stalled replication forks (Ait Saada, Lambert, & Carr, 2018), such as break induced replication (BIR) or single strand annealing (SSA) (Ira & Haber, 2002).

      Our position regarding the inevitable increase in mutation rates that comes with perturbing genome maintenance processes is elaborated in our response to Reviewer #1 above.

      A sentence describing our choice to delete RAD52 has now been included at line 86:

      “…as well as from haploids impaired in homologous recombination due to the deletion of RAD52 (Figure 1A), which encodes a conserved enzyme required for pairing homologous DNA sequences during recombination (Pâques & Haber, 1999). Because Rad52 is involved in different forms of homologous recombination, its absence produces the most severe recombination defects and thus allows us to achieve the largest recombination defect achievable with a single gene deletion (Symington, 2002).”

      Related to the first comment, it is also unclear to me how well the system chosen by the authors is representative of the replication stress experienced by tumor cells (as briefly touched upon in the final section of the discussion). Are some of the homologs key oncogenes that drive carcinogenesis?

      We should have been clearer. Our goal was to argue that the lesions and responses produced by replication stress in tumor cells, such as stalled replication forks and checkpoint activation, were similar to those seen in yeast cells lacking Ctf4. We did not mean to imply that removing Ctf4 from yeast cells had the same effects on cell proliferation and survival as inactivating tumor suppressors and activating proto-oncogenes have in mammalian cells. Despite the difference between direct (removing Ctf4) and indirect effects on DNA replication (tumor cells), the replication intermediates (ssDNA, stalled and reversed forks), the cell cycle defects (G2/M delay), the genetic instability (increased mutagenesis and chromosome loss) and chromosome dynamics (late replication zones and chromosome bridges) generated by the absence of Ctf4 are similar to those observed in oncogene-induced DNA replication stress in mammalian cells (Kotsantis, Petermann, & Boulton, 2018). We therefore believe our experiments reveal evolutionary responses to a constitutive DNA replication stress that resembles the replication stress seen in cancer cells. Nevertheless, we agree that the comparison with cancer evolution remains speculative and we therefore avoided mentioning cancer in the title of our paper or in our conclusions, and only discuss it in a speculative section of the discussion.

      We have modified this section of the discussion as follows (line 554):

      “While generated through a different mechanism (unrestrained proliferation, rather than replisome perturbation), oncogene induced DNA replication stress produces cellular consequences (Kotsantis et al., 2018) which are remarkably similar to those seen in the absence of Ctf4, such as the accumulation of ssDNA, stalled and reversed forks (Abe et al., 2018; Fumasoni & Murray, 2020; Fumasoni et al., 2015), genetic instability (Fumasoni et al., 2015; Hanna et al., 2001; Kouprina et al., 1992) and DNA damage response activation (Poli et al., 2012; Tanaka et al., 2009). Based on these similarities we speculate that evolutionary adaptation to DNA replication stress could reduce its negative effects on cellular fitness and thus assist tumor evolution.”

      The authors should consider rephrasing some sentences regarding the occurrence of adaptive mutations. Sentences such as 'which genes are mutated depends on the selective advantage' (p1; lines 15-16); 'genome architecture controls the frequency at which mutations occur' (p15), "genome architecture controls which genes are mutated" (p1, line 20) makes it sound like the initial occurrence of mutations is not random, whereas in reality, the mutational landscape is the result of the combined effect of occurrence and fitness effect of the mutations, with the later rather than the former likely being the main driver behind the observed patterns.

      We thank the reviewer for asking for more precision in the above sentences, whose proposed changes we now list:

      “Mutations in individual genes are selected at different frequencies in different architectures, but the benefits these mutations confer are similar in all three architectures, and combinations of these mutations reproduce the fitness gains of evolved populations.” (Lines 13-15)

      “Genome architecture influences the distribution of adaptive mutants” (Line 277)

      "genome architecture influences the frequency at which mutations occur, the fitness benefit they confer, and the extent of overall adaptation." (Lines 462-463)

      Some important methodological information is missing or unclear in the manuscript:

      The authors should provide more details on how they decided which clones to select for sequencing. Did they select the biggest colonies; were colonies picked randomly, ...

      The following sentence is now reported in the materials and methods section (line 603):

      “To capture the within-population genetic variability we selected the clones displaying the largest divergence of phenotypes in terms of resistance to genotoxic agents (methyl-methanesulfonate, hydroxyurea and camptothecin).”

      What is the population size during the evolution experiment?

      We have now added the following sentence at line 599:

      “In this regime, the effective population size is calculated as N0 x g, where N0 is the size of the population bottleneck at transfer and g is the number of generations achieved during a batch growth cycle, and corresponds to approximately 10^7 cells.”
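
      As a minimal sketch of the formula in the sentence above, the function below multiplies the bottleneck size by the number of generations per growth cycle. The bottleneck size in the example (10^6 cells) is a hypothetical value, chosen only because it reproduces the quoted order of magnitude when g = 10; it is not taken from the manuscript.

```python
def effective_population_size(bottleneck_size: float, generations_per_cycle: float) -> float:
    """Approximate effective population size as N0 x g (formula quoted above)."""
    if bottleneck_size <= 0 or generations_per_cycle <= 0:
        raise ValueError("both arguments must be positive")
    return bottleneck_size * generations_per_cycle


# Hypothetical inputs: N0 = 1e6 cells at transfer, g = 10 generations per cycle.
ne = effective_population_size(1e6, 10)
print(f"Effective population size: {ne:.1e} cells")
```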

      Sequencing of populations and clones: coverage should be mentioned

      The following sentence has now been added at line 616:

      “Clones and populations were sequenced at approximately the following depths: 25-30X for haploid clones, 50-60X for diploid clones, 50-60X for haploid populations and 120-130X for diploid populations.”

      Identification of mutations (p19, line 573): Is this really how the authors defined whether a variant is a mutation? Based on the definition given here, DNA mutations that lead to a synonymous mutation in the protein are not considered as mutations?

      We apologize for this typo. We do identify and consider synonymous mutations as evidenced by Figure 3-S1B. Now the sentence at line 626 correctly reports:

      “A variant that occurs between the ancestor and an evolved strain is labeled as a mutation if it either (1) causes a substitution in a coding sequence or (2) occurs in a regulatory region, defined as the 500 bp upstream and downstream of the coding sequence.”
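
      The definition above can be sketched as a simple position check. This is an illustration under simplifying assumptions, not the pipeline used in the study: the `Gene` record and function names are hypothetical, coordinates are 1-based and inclusive, strand and chromosome are ignored, and any variant inside the coding sequence is treated as a substitution in that sequence.

```python
from dataclasses import dataclass

REGULATORY_FLANK_BP = 500  # flank size taken from the definition quoted above


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    cds_start: int
    cds_end: int


def classify_variant(position: int, gene: Gene) -> str:
    """Return 'coding', 'regulatory' or 'none' for a variant position."""
    if gene.cds_start <= position <= gene.cds_end:
        return "coding"
    lower = gene.cds_start - REGULATORY_FLANK_BP
    upper = gene.cds_end + REGULATORY_FLANK_BP
    if lower <= position <= upper:
        return "regulatory"
    return "none"


def is_mutation(position: int, gene: Gene) -> bool:
    """A variant counts as a mutation if it is coding or regulatory."""
    return classify_variant(position, gene) != "none"
```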

      Perhaps the information can be found elsewhere, but the source data excel files for mutations is incomplete and should at the very least contain information on the type of mutation (eg. T->A), as well as the location of this mutation in the respective gene.

      Perhaps the reviewer is referring to Supplementary table 2, where we list the number of times a gene has been mutated in different populations (and which thus summarizes different types of mutations affecting the same gene). The information they request is reported in Supplementary table 1 for all the variants detected in population and clone sequencing.

      **Minor comments** • While the author already cite several significant papers relevant for their manuscript, some other studies could also be included:

      We thank the reviewer for highlighting these references, which are now cited at line 28.

      From the text in the abstract, it is unclear what the three genomic architectures (line 13) exactly are, the authors should consider spelling this out.

      In response to the reviewer’s request for clarity, we now propose the following change in line 13:

      “We asked how these trajectories depend on a population’s genome architecture by comparing the adaptation of haploids to that of diploids and recombination deficient haploids.” (Lines 9-11)

      Can the authors speculate on why a homozygous ctf4∆/ctf4∆ rad52∆/rad52∆ would be lethal, and a haploid not?

      See below

      The authors note that a diploid ctf4∆/ctf4∆ strain is less fit than its haploid counterpart. Why do the authors think this is the case?

      In response to the two previous questions, we now propose the following speculations that we include in the text (Line 97):

      “Diploid cells require twice as many forks as haploids and Ctf4-deficient diploids are thus more likely to have forks that cause severe cell-cycle delays or cell lethality. We speculate that this increased probability explains the more prominent fitness defect displayed by diploid cells. Interestingly, homologs of Ctf4 are absent in prokaryotes, where the primase is physically linked to the replicative helicase (Lu, Ratnakar, Mohanty, & Bastia, 1996) and Ctf4 is essential in the cells of eukaryotes with larger genomes such as chickens (Abe et al., 2018) and humans (Yoshizawa-Sugata & Masai, 2009). Rad52 is likely involved in rescuing stalled replication forks by recombination-dependent mechanisms (Fumasoni et al., 2015; Yeeles, Poli, Marians, & Pasero, 2013). We speculate that the absence of Rad52 increases the duration of these stalls and leads some of them to become double-stranded breaks, resulting in cell lethality and explaining the decreased fitness of ctf4∆ rad52∆ haploid double mutants. In diploid ctf4∆ rad52∆ cells, which have twice as many chromosomes, the number of irreparably stalled forks may be sufficient to kill most of the cells in a population, thus explaining the unviability of the strain.”

      The authors passage their cells for 100 cycles and assume that this corresponds to around 1000 generations for each population. However, the fitness differences between the different starting strains (see also Figure 1B) are likely to cause considerable differences in number of generations between the different strains. Do the authors have more precise measurements of number of generations per population? If not, perhaps it should be noted that some lineages may have undergone more doublings than others, and perhaps also discuss if and how this could influence the results?

      In a batch culture regime, where populations are allowed to reach saturation after each dilution, the number of generations at each passage is dictated by the dilution factor (Van den Bergh, Swings, Fauvart, & Michiels, 2018). A dilution of 1:1000 from a saturated culture will allow for approximately 10 generations before populations reach a new saturated phase. As long as saturation is allowed to occur, this number is independent of the fitness of the cultured strains: slower-dividing strains will simply take more time to reach saturation after each dilution. At the beginning of the experiment, the ctf4∆ rad52∆ strains had to be passaged every 48 hrs instead of 24 hrs. After generation 50, ctf4∆ rad52∆ strains reached saturation within 24 hrs and were then diluted daily. The total count considers the number of passages a culture has undergone, and not the number of days of culture, and thus should guarantee approximately the same number of generations in all three genome architectures.
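
      The arithmetic behind this argument can be sketched as follows, assuming that a culture regrows to the same saturation density after every dilution, so that the number of doublings per passage is log2 of the dilution factor. The function names are illustrative; the 1:1000 dilution and the 100 passages are the figures mentioned in this exchange.

```python
import math


def generations_per_passage(dilution_factor: float) -> float:
    """Doublings needed to return to saturation after a 1:dilution_factor dilution."""
    if dilution_factor <= 1:
        raise ValueError("dilution_factor must be greater than 1")
    return math.log2(dilution_factor)


def total_generations(passages: int, dilution_factor: float) -> float:
    """Total doublings after a number of passages; growth rate does not enter."""
    return passages * generations_per_passage(dilution_factor)


per_passage = generations_per_passage(1000)  # just under 10 doublings
overall = total_generations(100, 1000)       # close to 1000 doublings
print(f"{per_passage:.2f} generations per passage, {overall:.0f} in total")
```

      Because the growth rate does not appear in the calculation, a slower strain changes only the time needed per passage, not the number of generations counted.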

      Panel A of figure 1A is somewhat confusing; as this seems to indicate that the ctf4∆ was introduced after strains were made, for example, haploid recombination deficient (which is not how these strains were constructed). Perhaps a better way of representing would be to have the indication of DNA replication stress pictured inside the yeast cells.

      We have modified Figure 1A to better represent the way the strains were constructed. For space reasons we have not represented a perturbed fork within each cell, but rather above all of them.

      Legend to Figure 1: is fitness expressed relative to haploid or diploid WT cells for the diploid strains?

      We apologize for having missed this detail in the figure legends. Throughout the figures, haploid and diploid cells were competed against reference strains with the same ploidy. We now add this sentence in Figure 1 and in the materials and methods (line 686).

      Figure 3: to improve readability of this figure, the authors could consider placing the legend of the different symbols (#, *,..) in the figure as well and not just in the figure legend.

      We now include the symbols legend in Figure 3.

      Figure 5 shows Indels, but if I am correct, these mutations are not discussed in the text; nor is it mentioned what the authors used as a cut-off to determine indels (the authors use the term 'small indels' without defining it)? For example, the data shown in Figure 3 and Figure 4 only includes SNPs and not indels (correct?) - but the indels should also be taken into account when investigating which modules are hit.

      Gapped alignments of the relatively long 150 paired-end reads in our data set permit the identification of small indels ranging in size from 1–55 bp using the VarScan pileup2indels tool (Koboldt et al., 2012). All small indels (and the respective sequence affected) are listed together with SNPs in Supplementary table 1. Figure 3A, Figure 4 and Figure 5B are representations of ‘gene mutations’, which include both SNPs and small InDels. Large chromosomal insertions and deletions, not detectable by short-read gapped alignment, are instead identified using the VarScan pileup2copynumber tool (Koboldt et al., 2012), and are represented as amplifications or deletions in Figure 3B and 5C.

      The following sentence has been added to the material and methods at line 629:

      “Gapped alignments of the 150 paired-end reads in our data set permit the identification of small indels ranging in size from 1–55 bp using the VarScan pileup2indels tool (Koboldt et al., 2012). All small indels (and the respective sequence affected) are listed together with SNPs in Supplementary table 1.”

      The following definition has been added in Figure legends 3A, 4 and 5A and B.

      “Gene mutations (SNPs and small InDels 1-55bp)”

      Figure 5 mentions: # gene mutations. So these are only the mutations in genes, and not in their up- or downstream regulatory regions?

      We use a broader definition of a gene, not restricted to the open reading frame, and including its regulatory regions. The following definition has been added to figure 5’s legend.

      “Frequency of SNPs and small InDels (1-55bp) affecting genes (Open reading frames and associated regulatory regions).”

      Figure 3-S1: labels of C panels are missing.

      Labels are now included in Figure 3-S1

      Figure 3-S1, panel B: why did the authors focus on synonymous mutations?

      Panel B is discussed at line 186 and contrasted with panel A to argue that the increased number of mutations detected in ctf4∆ rad52∆ strains is due to a higher mutation rate (which is expected to increase synonymous mutations) rather than to a higher number of adaptive mutations (which are less likely to be synonymous) being selected.

      Reviewer #2 (Significance (Required)): This is a solid and clearly written study, comprising a large body of work that is generally well presented and that will be of interest to scientists active in the field of (experimental) evolution and replication. However, many aspects studied in this manuscript have already been studied and reported before; including the recent eLife paper by the same group, as well as studies by other labs that have investigated how genome architecture / genotype affects evolutionary trajectories, the effect of ploidy on evolution, .... Because of this, I do feel that the authors should put their findings more in the context of existing literature context, including a general description of which results are truly novel, which confirm previous findings and which results seem to go against previous reports. This is already so at some points in the text, but I feel this could be done even more.

      We now rephrase the following paragraphs in our discussion to better highlight the main conclusions in contrast to the existing literature:

      “Engineering one mutation in each module into an ancestral strain lacking Ctf4 is enough to produce the evolved fitness increase in all three genomic architectures. Furthermore, engineering mutations in individual genes confers benefits in all three architectures (Fig. 6A), even in those where mutations in these genes were rare, and combining these mutations recapitulated the evolved fitness increase in all three architectures (Fig. 6B). Altogether our results demonstrate the existence of a common pathway for yeast cells to adapt to a form of constitutive DNA replication stress.” (Lines 409-414)

      “Our results thus go against the trend of slower adaptation in diploids as compared to haploids reported by the majority of other studies (A. C. Gerstein, Cleathero, Mandegar, & Otto, 2011; Marad, Buskirk, & Lang, 2018; Zeyl, Vanderford, & Carter, 2003). This effect is not limited to populations experiencing DNA replication stress (Figure 2A) but is also present in control wild-type populations (Figure 2B). Our results support the idea that the details of genotypes, selections, and experimental protocols can determine the effect of ploidy on adaptation.” (Lines 437-442)

      “Our results therefore agree with previous reports observing declining adaptability across strains with different initial fitness but largely fail to observe diminishing return epistasis as a potential justification of this phenomenon. Our experiments and two previous evolutionary repair experiments (Hsieh et al., 2020; Laan et al., 2015) both show interactions that are approximately additive between different selected mutations. The reasons for this difference are currently unknown.” (Lines 450-455)

      Additionally, I think the authors should be more careful not to over-generalize their findings, which come from only a few specific genetic manipulations that might not be representative for general replication stress. For example (p15), can the authors really claim that they have unraveled general principles of adaptation to constitutive DNA replication stress? Perhaps a better motivation of the choice of ctf4 as a model mutation for DNA replication stress could also help (see also my earlier comments). A similar comment applies to the molecular mechanisms affecting adaptation in diploid cells - what evidence do the authors have that their findings are not specific to the one specific type of diploid strain they used in their study? Adding a bit more background information or nuance for some of the claims would help tackle this issue.

      We have now followed the suggestions made previously by the reviewer to better justify our experimental choices and to use language that avoids over-generalization.

      Field of expertise of this reviewer: genetics, evolution, genomics

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** Here the authors carry out an evolution experiment, propagating replicate populations of the budding yeast with the CTF 4 gene deleted in three different genetic backgrounds: haploid , diploid and recombination deficient (RAD52 deletion). The authors find that the rate of evolution depends on the initial fitness of the different genetic backgrounds which is consistent with a repeated finding of evolution experiments: that beneficial mutations tend to have a smaller fitness effect in high fitness genetic backgrounds. Curiously even though the targets of selection tended to be specific to each of the three different genetic backgrounds, genetic reconstruction experiments showed beneficial mutations convert a fitness increase in all genetics backgrounds. The authors go on to provide a plausible explanation for why each of the three genetic backgrounds are predisposed to certain types of beneficial mutations. Overall, these results provide important context and caveats for an emerging consensus that genetic background determines the rate of evolution, a comprehensive molecular breakdown of adaptation to DNA replication stress and a mechanistic explanation for why different beneficial mutations are favoured in diploids, haploids and recombination deficient strains. This is a well-executed study that is beautifully presented and easy to follow. This will be of great interest to those in the experimental evolution community and the data an excellent resource.

      We thank reviewer #3 for emphasizing that reconstructed mutations are beneficial even in architectures where they were not ultimately detected at the end of the experiment. We have now highlighted this point in our conclusions, in response to the requests of reviewers #1 and #2 for more clarity regarding our novel findings.

      “We find that the genes that acquire adaptive mutations, the frequency at which they are mutated, and the frequency at which these mutations are selected all differ between architectures but that mutations that confer strong benefits can occur in all three modules in each architecture. Engineering one mutation in each module into an ancestral strain lacking Ctf4 is enough to produce the evolved fitness increase in all three genomic architectures. Furthermore, reconstruction of a panel of mutations into all three architectures proved they are adaptive even in architectures where the affected genes were not found significantly mutated by the end of the experiment. Altogether our results demonstrate the existence of a common pathway for yeast cells to adapt to a form of constitutive DNA replication stress.” (Lines 405-414)

      **Major comments:**

      • Are the key conclusions convincing? Yes, the convergent evolution analysis, fitness assays, and genetic reconstructions are sufficient to characterise the genetic causes of adaptation in this experiment, and are of the highest standard. The authors do particularly well to fully recover the fitness increases that evolved with their genetic reconstructions, which imparts a completeness to their understanding of what happened in their evolution experiment.
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No, in nearly all cases the authors make reasonable claims. One exception is on L419 in the discussion, where the authors speculate why some mutations do not follow diminishing returns epistasis, but this idea does not really have any basis (no citation or reasons to suggest that DNA repair genes are less connected with other genes in the genome). If the authors cannot support this statement, it should be removed, and instead write that is currently unknown why some individual mutations do not follow the pattern of diminishing returns.

      On reflection, we agree with the reviewer and now state:

      “Our results confirm previous reports observing declining adaptability across strains with different initial fitness but largely fail to observe diminishing return epistasis as a potential justification of this phenomenon. Our experiments and two previous evolutionary repair experiments (Hsieh et al., 2020; Laan et al., 2015) both show interactions that are approximately additive between different selected mutations. The reasons for this difference are currently unknown.

      A hypothesis, which would need experimental validation, could be that the different mutations have different degrees of epistatic interactions with the rest of the genome. Ixr1, whose mutation follows diminishing return epistasis, is a transcription factor that could in principle affect the expression of many other genes implicated in different cellular modules. Sld5, Scc2 and Rad9, whose mutations have the same effect across different genome architectures, instead have more mechanistic roles in genome maintenance and may have strong epistatic interactions only with a restricted number of cellular modules implicated in DNA metabolism.

      • Would additional experiments be essential to support the claims of the paper? No.
        • Are the data and the methods presented in such a way that they can be reproduced? Yes, but some more details are needed for the convergent evolution analysis, see minor comments.
        • Are the experiments adequately replicated and statistical analysis adequate? Yes, but some more statistic reporting in the main text or figure legends would be helpful, for example. L159: Please report the statistical test, test statistic and p value in the text or in the figure legend. Currently significance is indicated, but the methods do not specify the test.

      We apologize for the lack of clarity in the main text. The test used for all fitness analyses was only reported in the materials and methods, as follows:

      “The P-values reported in figures are the result of t-tests assuming unequal variances (Welch’s test)”

      We now include the test and the associated p-value in line 184, and write the above sentence in all the relevant figures.
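
      For readers unfamiliar with the test, the sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. It is an illustration rather than the analysis code used for the figures, and the two samples are made-up numbers, not fitness measurements from the study.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Made-up example values, for illustration only.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      The P-value then follows from the t distribution with the computed degrees of freedom; in SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the P-value directly.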

      This should also be done for the GO analysis shown in figure 3A.

      We thank reviewer #3 for pointing out this omission. We now include the following section:

      “Gene ontology (GO) enrichment analysis:

      The lists of genes with putatively selected mutations (Figure 3A) or homozygous mutations in diploids (Figure 4) were input as ‘multiple proteins’ in the STRING database, which reports on the network of interactions between the input genes (https://string-db.org). The GO term enrichment analyses provided by STRING are reported in Supplementary Table 3 and Supplementary Table 6 respectively. Briefly, the strength of the enrichment is calculated as Log10(O/E), where O is the number of ‘observed’ genes in the provided list (of length N) which belong to the GO-term, and E is the number of ‘expected’ genes we would expect to find matching the GO-term given a list of the same length N made of randomly picked genes. P-values are computed using a Hypergeometric test and corrected for multiple testing using the Benjamini-Hochberg procedure. The resulting P-values are represented as ‘False discovery rate’ in the supplementary tables and describe the significance of the GO terms enrichment (Franceschini et al., 2013).”
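
      The three quantities named in this section can be sketched with the Python standard library alone. This is a generic illustration of Log10(O/E), a one-sided hypergeometric test and the Benjamini-Hochberg adjustment; STRING's own implementation may differ in detail, and the function names and the background size argument (the number of annotated genes in the genome) are assumptions introduced here.

```python
import math


def enrichment_strength(observed, list_size, term_size, background_size):
    """Log10(O/E), with E the expected number of term genes in a random list."""
    if observed <= 0:
        raise ValueError("observed must be positive to take a logarithm")
    expected = list_size * term_size / background_size
    return math.log10(observed / expected)


def hypergeom_pvalue(observed, list_size, term_size, background_size):
    """P(X >= observed) when list_size genes are drawn without replacement."""
    upper = min(list_size, term_size)
    total = math.comb(background_size, list_size)
    tail = sum(
        math.comb(term_size, k) * math.comb(background_size - term_size, list_size - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      In use, one P-value would be computed per GO term and the whole list passed to `benjamini_hochberg`, whose output plays the role of the ‘False discovery rate’ column.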

      **Minor comments:**

      • Specific experimental issues that are easily addressable. Not a new experiment, but extra details are required. The authors carried out both clone and whole population sequencing. For their convergent evolution analysis, what is the criteria for a mutation to be included- ie, does it need to be fixed, have attained a certain frequency? This is important- if the criteria were low (say 5%), it would be important to know whether gene A had fixed in 4 populations, while gene B had attained a frequency of 10% in 5 populations. As it stands both would be described as examples of convergent evolution. This can be handled by providing these details in the methods.

      For the population sequencing we disregarded variants found at less than 25% and 35% of the reads in haploid and diploid populations respectively as we observed they were largely the product of alignment errors. All the variants found at frequencies higher than the thresholds indicated were used for the parallel evolution analysis. The frequency at which each individual variant was detected in each population is reported in Supplementary table 1, while the average frequency at which a gene has been found mutated across different populations is reported in Supplementary table 2. The reason why we didn’t solely focus on fixed mutations for our convergent evolution analysis was that from previous work we knew of the existence of clonal interference which kept the frequency of verified adaptive mutations that coexisted in the same population (e.g. ixr1 and sld5) well below 90% (Fumasoni & Murray, 2020).

      For clarity we now add the following sentence in the material and methods:

      “Variants found in less than 25% and 35% of the reads in haploid and diploid populations respectively were discarded, since many of these corresponded to misalignment of repeated regions. For clone sequencing, only variants found in more than 75% of the reads in haploids and 35% of the reads in diploids (to account for heterozygosity) were considered mutations. The frequency of the reads associated with all the variants detected is reported in Supplementary table 1.”
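
      These thresholds can be written as a small filter. The sketch below is illustrative only: the names are hypothetical, read fractions are expressed between 0 and 1, and the handling of values exactly at a threshold follows a literal reading of the sentence above (population variants are kept unless they fall below the cutoff, clone variants only when they exceed it).

```python
# Minimum read fractions taken from the sentence quoted above.
POPULATION_MIN_FRACTION = {"haploid": 0.25, "diploid": 0.35}
CLONE_MIN_FRACTION = {"haploid": 0.75, "diploid": 0.35}


def passes_frequency_filter(read_fraction: float, ploidy: str, sample_type: str) -> bool:
    """Decide whether a variant is retained, given the fraction of reads supporting it."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if sample_type == "population":
        # Discarded only if found in less than the threshold fraction of reads.
        return read_fraction >= POPULATION_MIN_FRACTION[ploidy]
    if sample_type == "clone":
        # Retained only if found in more than the threshold fraction of reads.
        return read_fraction > CLONE_MIN_FRACTION[ploidy]
    raise ValueError("sample_type must be 'population' or 'clone'")


# A heterozygous variant in a diploid clone is expected near 50% of the reads.
print(passes_frequency_filter(0.5, "diploid", "clone"))
```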

      • Are prior studies referenced appropriately? I note that the authors use the term declining adaptability where as other papers use the term diminishing returns epistasis- I am sure the authors have good reasons for their choice of nomenclature but I think it would be helpful for their readers to connect this work to other work by mentioning that declining adaptability is also referred to as diminishing returns.

      We use both terms (for instance in line 446 and line 448) with different meanings: by ‘declining adaptability’ we refer to the phenomenon where more fit strains display lower adaptation rates than less fit ones. By ‘diminishing returns epistasis’ we refer to a possible explanation of such a phenomenon, where adaptive mutations have different fitness effects due to their ‘global’ epistatic interactions with other alleles. It should be noted that ‘diminishing returns epistasis’ is not the only proposed explanation of the phenomenon of declining adaptability (Couce & Tenaillon, 2015). In our case, we do find evidence of declining adaptability but very limited evidence for diminishing return epistasis (only 1 mutation in 5 has a different fitness effect in different architectures).

      A reference the authors have missed: L419, as well as citing the Desai Lab bioxive paper, they should cite another theory paper that obtained similar conclusions. Lyons, D.M., et al. https://doi.org/10.1038/s41559-020-01286-y.

      We thank the reviewer for the suggested reference, which is now cited at line 450.

      • Are the text and figures clear and accurate? This paper is beautifully written and easy to follow, a lot of thought has gone into the figures which are aesthetically pleasing and easy to navigate.

        • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No.

        **Typos**

        L32 "do" should be "to" L95 analyzed L219 are the authors referring to ref 15 here? I think so, but please specify

      We thank the reviewer for carefully finding the typos, which are now all corrected.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. This paper is an important conceptual result and an immediate advance for basic research. The authors have done an outstanding job of showing the potential for the clinical translation of this research, especially regarding cancer biology.
      • Place the work in the context of the existing literature (provide references, where appropriate). This study follows up on and builds upon an earlier paper by these same authors published in E-life in 2020. Conceptually this work is most closely related to work in Michael Desai's, Sergey Kryazhimskiy's, Tim Coopers and Chris Marx's labs work looking at diminishing returns epistasis in yeast, and studies contrasting evolution of haploids and diploids led by Greg Lang's and Sarah Otto's labs.
      • State what audience might be interested in and influenced by the reported findings. This work will be of great interest to the Experimental evolution and molecular evolution communities and also of interest to those who study cancer genomics and DNA replication and repair.
      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Microbial experimental evolution.

      REFERENCES CITED IN THE REVIEW RESPONSE

      Abe, T., Kawasumi, R., Giannattasio, M., Dusi, S., Yoshimoto, Y., Miyata, K., … Branzei, D. (2018). AND-1 fork protection function prevents fork resection and is essential for proliferation. Nature Communications, 9(1), 3091. https://doi.org/10.1038/s41467-018-05586-7

      Ait Saada, A., Lambert, S. A. E., & Carr, A. M. (2018, November 1). Preserving replication fork integrity and competence via the homologous recombination pathway. DNA Repair. Elsevier B.V. https://doi.org/10.1016/j.dnarep.2018.08.017

      Couce, A., & Tenaillon, O. A. (2015). The rule of declining adaptability in microbial evolution experiments. Frontiers in Genetics, 6(MAR), 99. https://doi.org/10.3389/fgene.2015.00099

      Crabbé, L., Thomas, A., Pantesco, V., De Vos, J., Pasero, P., & Lengronne, A. (2010). Analysis of replication profiles reveals key role of RFC-Ctf18 in yeast replication stress response. Nature Structural & Molecular Biology, 17(11), 1391–1397. https://doi.org/10.1038/nsmb.1932

      Desai, M. M., Fisher, D. S., & Murray, A. W. (2007). The speed of evolution and maintenance of variation in asexual populations. Current Biology : CB, 17(5), 385–394. https://doi.org/10.1016/j.cub.2007.01.072

      Deutschbauer, A. M., Jaramillo, D. F., Proctor, M., Kumm, J., Hillenmeyer, M. E., Davis, R. W., … Giaever, G. (2005). Mechanisms of Haploinsufficiency Revealed by Genome-Wide Profiling in Yeast. Genetics, 169(4), 1915–1925. https://doi.org/10.1534/GENETICS.104.036871

      Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., … Jensen, L. J. (2013). STRING v9.1: Protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research, 41(D1), D808. https://doi.org/10.1093/nar/gks1094

      Fumasoni, M., & Murray, A. W. (2020). The evolutionary plasticity of chromosome metabolism allows adaptation to constitutive DNA replication stress. eLife, 9. https://doi.org/10.7554/eLife.51963

      Fumasoni, M., Zwicky, K., Vanoli, F., Lopes, M., & Branzei, D. (2015). Error-Free DNA Damage Tolerance and Sister Chromatid Proximity during DNA Replication Rely on the Polα/Primase/Ctf4 Complex. Molecular Cell, 57(5), 812–823. https://doi.org/10.1016/j.molcel.2014.12.038

      Gambus, A., van Deursen, F., Polychronopoulos, D., Foltman, M., Jones, R. C., Edmondson, R. D., … Labib, K. (2009). A key role for Ctf4 in coupling the MCM2-7 helicase to DNA polymerase alpha within the eukaryotic replisome. EMBO J., 28(19), 2992–3004. https://doi.org/10.1038/emboj.2009.226

      Gerstein, A. C., Cleathero, L. A., Mandegar, M. A., & Otto, S. P. (2011). Haploids adapt faster than diploids across a range of environments. Journal of Evolutionary Biology, 24(3), 531–540. https://doi.org/10.1111/j.1420-9101.2010.02188.x

      Gerstein, Aleeza C., & Otto, S. P. (2011). Cryptic fitness advantage: Diploids invade haploid populations despite lacking any apparent advantage as measured by standard fitness assays. PLoS ONE, 6(12), 26599. https://doi.org/10.1371/journal.pone.0026599

      Gerstein, Aleeza C, Chun, H.-J. E., Grant, A., & Otto, S. P. (2006). Genomic Convergence toward Diploidy in Saccharomyces cerevisiae. PLoS Genetics, 2(9), e145. https://doi.org/10.1371/journal.pgen.0020145

      Hanna, J. S., Kroll, E. S., Lundblad, V., & Spencer, F. a. (2001). Saccharomyces cerevisiae CTF18 and CTF4 are required for sister chromatid cohesion. Mol Cell Biol., 21(9), 3144–3158. https://doi.org/10.1128/MCB.21.9.3144-3158.2001

      Harari, Y., Ram, Y., Rappoport, N., Hadany, L., & Kupiec, M. (2018). Spontaneous Changes in Ploidy Are Common in Yeast. Current Biology, 28(6), 825-835.e4. https://doi.org/10.1016/j.cub.2018.01.062

      Hsieh, Y. Y. P., Makrantoni, V., Robertson, D., Marston, A. L., & Murray, A. W. (2020). Evolutionary repair: Changes in multiple functional modules allow meiotic cohesin to support mitosis. PLoS Biology, 18(3), e3000635. https://doi.org/10.1371/journal.pbio.3000635

      Ira, G., & Haber, J. E. (2002). Characterization of RAD51-independent break-induced replication that acts preferentially with short homologous sequences. Molecular and Cellular Biology, 22(18), 6384–6392. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12192038

      Koboldt, D. C., Zhang, Q., Larson, D. E., Shen, D., McLellan, M. D., Lin, L., … Wilson, R. K. (2012). VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 22(3), 568–576. https://doi.org/10.1101/gr.129684.111

      Kotsantis, P., Petermann, E., & Boulton, S. J. (2018, May 1). Mechanisms of oncogene-induced replication stress: Jigsaw falling into place. Cancer Discovery. American Association for Cancer Research Inc. https://doi.org/10.1158/2159-8290.CD-17-1461

      Kouprina, N., Kroll, E., Bannikov, V., Bliskovsky, V., Gizatullin, R., Kirillov, A., … et al. (1992). CTF4 (CHL15) mutants exhibit defective DNA metabolism in the yeast Saccharomyces cerevisiae. Mol Cell Biol., 12(12), 5736–5747. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/1341195

      Kouprina NYu, Pashina, O. B., Nikolaishwili, N. T., Tsouladze, A. M., & Larionov, V. L. (1988). Genetic control of chromosome stability in the yeast Saccharomyces cerevisiae. Yeast (Chichester, England), 4(4), 257–269. https://doi.org/10.1002/yea.320040404

      Kryazhimskiy, S., Rice, D. P., Jerison, E. R., & Desai, M. M. (2014). Microbial evolution. Global epistasis makes adaptation predictable despite sequence-level stochasticity. Science (New York, N.Y.), 344(6191), 1519–1522. https://doi.org/10.1126/science.1250939

      Laan, L., Koschwanez, J. H., & Murray, A. W. (2015). Evolutionary adaptation after crippling cell polarization follows reproducible trajectories. eLife, 4, e09638. https://doi.org/10.7554/eLife.09638

      Lu, Y. B., Ratnakar, P. V, Mohanty, B. K., & Bastia, D. (1996). Direct physical interaction between DnaG primase and DnaB helicase of Escherichia coli is necessary for optimal synthesis of primer RNA. Proc. Natl. Acad. Sci. USA, 93(23), 12902–12907. https://doi.org/10.1073/pnas.93.23.12902

      Macheret, M., & Halazonetis, T. D. (2015). DNA replication stress as a hallmark of cancer. Annu. Rev. Pathol. Mech. Dis., 10, 425–448. https://doi.org/10.1146/annurev-pathol-012414-040424

      Marad, D. A., Buskirk, S. W., & Lang, G. I. (2018). Altered access to beneficial mutations slows adaptation and biases fixed mutations in diploids. Nature Ecology & Evolution, 2(5), 882–889. https://doi.org/10.1038/s41559-018-0503-9

      Pâques, F., & Haber, J. E. (1999). Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews : MMBR, 63(2), 349–404. https://doi.org/

      Poli, J., Tsaponina, O., Crabbé, L., Keszthelyi, A., Pantesco, V., Chabes, A., … Rothstein, R. (2012). dNTP pools determine fork progression and origin usage under replication stress. EMBO J., 31(4), 883–894. https://doi.org/10.1038/emboj.2011.470

      Samora, C. P., Saksouk, J., Goswami, P., Wade, B. O., Singleton, M. R., Bates, P. A., … Masai, H. (2016). Ctf4 Links DNA Replication with Sister Chromatid Cohesion Establishment by Recruiting the Chl1 Helicase to the Replisome. Molecular Cell, 63(3), 371–384. https://doi.org/10.1016/j.molcel.2016.05.036

      Simon, A. C., Zhou, J. C., Perera, R. L., van Deursen, F., Evrin, C., Ivanova, M. E., … Pellegrini, L. (2014). A Ctf4 trimer couples the CMG helicase to DNA polymerase alpha in the eukaryotic replisome. Nature, 510(7504), 293–297. https://doi.org/10.1038/nature13234

      Symington, L. S. (2002). Role of RAD52 Epistasis Group Genes in Homologous Recombination and Double-Strand Break Repair. Microbiology and Molecular Biology Reviews, 66(4), 630–670. https://doi.org/10.1128/mmbr.66.4.630-670.2002

      Tanaka, H., Katou, Y., Yagura, M., Saitoh, K., Itoh, T., Araki, H., … Shirahige, K. (2009). Ctf4 coordinates the progression of helicase and DNA polymerase alpha. Genes to Cells, 14(7), 807–820. https://doi.org/10.1111/j.1365-2443.2009.01310.x

      Thompson, D. A., Desai, M. M., & Murray, A. W. (2006). Ploidy controls the success of mutators and nature of mutations during budding yeast evolution. Current Biology : CB, 16(16), 1581–1590. https://doi.org/10.1016/j.cub.2006.06.070

      Van den Bergh, B., Swings, T., Fauvart, M., & Michiels, J. (2018). Experimental Design, Population Dynamics, and Diversity in Microbial Experimental Evolution. Microbiol Mol Biol Rev., 82(3), e00008-18. https://doi.org/10.1128/MMBR.00008-18

      Venkataram, S., Dunn, B., Li, Y., Agarwala, A., Chang, J., Ebel, E. R., … Petrov, D. A. (2016). Development of a Comprehensive Genotype-to-Fitness Map of Adaptation-Driving Mutations in Yeast. Cell, 166(6), 1585-1596.e22. https://doi.org/10.1016/J.CELL.2016.08.002

      Vesela, E., Chroma, K., Turi, Z., & Mistrik, M. (2017a). Common Chemical Inductors of Replication Stress: Focus on Cell‐Based Studies. Biomolecules, 7(1), 19. https://doi.org/10.3390/biom7010019

      Vesela, E., Chroma, K., Turi, Z., & Mistrik, M. (2017b, February 21). Common chemical inductors of replication stress: Focus on cell-based studies. Biomolecules. MDPI AG. https://doi.org/10.3390/biom7010019

      Villa, F., Simon, A. C., Ortiz Bazan, M. A., Kilkenny, M. L., Wirthensohn, D., Wightman, M., … Dutta, A. (2016). Ctf4 Is a Hub in the Eukaryotic Replisome that Links Multiple CIP-Box Proteins to the CMG Helicase. Molecular Cell, 63(3), 385–396. https://doi.org/10.1016/j.molcel.2016.06.009

      Wilhelm, T., Olziersky, A. M., Harry, D., De Sousa, F., Vassal, H., Eskat, A., & Meraldi, P. (2019). Mild replication stress causes chromosome mis-segregation via premature centriole disengagement. Nature Communications, 10(1), 1–14. https://doi.org/10.1038/s41467-019-11584-0

      Yeeles, J. T., Poli, J., Marians, K. J., & Pasero, P. (2013). Rescuing stalled or damaged replication forks. Cold Spring Harbor Perspectives in Biology, 5(5), a012815. https://doi.org/10.1101/cshperspect.a012815

      Yoshizawa-Sugata, N., & Masai, H. (2009). Roles of human AND-1 in chromosome transactions in S phase. J Biol Chem., 284(31), 20718–20728. https://doi.org/10.1074/jbc.M806711200

      Yuan, Z., Georgescu, R., Santos, R. de L. A., Zhang, D., Bai, L., Yao, N. Y., … Li, H. (2019). Ctf4 organizes sister replisomes and Pol α into a replication factory. eLife, 8, e47405. https://doi.org/10.7554/eLife.47405

      Zeyl, C., Vanderford, T., & Carter, M. (2003). An Evolutionary Advantage of Haploidy in Large Yeast Populations. Science, 299(5606), 555–558. https://doi.org/10.1126/SCIENCE.1078417

      Zheng, D.-Q., Zhang, K., Wu, X.-C., Mieczkowski, P. A., & Petes, T. D. (2016). Global analysis of genomic instability caused by DNA replication stress in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A., 113(50), E8114–E8121. https://doi.org/10.1073/pnas.1618129113

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #4

      This reviewer did not leave any comments

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Here the authors carry out an evolution experiment, propagating replicate populations of budding yeast with the CTF4 gene deleted in three different genetic backgrounds: haploid, diploid and recombination-deficient (RAD52 deletion). The authors find that the rate of evolution depends on the initial fitness of the different genetic backgrounds, which is consistent with a repeated finding of evolution experiments: that beneficial mutations tend to have a smaller fitness effect in high-fitness genetic backgrounds. Curiously, even though the targets of selection tended to be specific to each of the three genetic backgrounds, genetic reconstruction experiments showed that the beneficial mutations confer a fitness increase in all genetic backgrounds. The authors go on to provide a plausible explanation for why each of the three genetic backgrounds is predisposed to certain types of beneficial mutations. Overall, these results provide important context and caveats for an emerging consensus that genetic background determines the rate of evolution, a comprehensive molecular breakdown of adaptation to DNA replication stress and a mechanistic explanation for why different beneficial mutations are favoured in diploids, haploids and recombination-deficient strains. This is a well-executed study that is beautifully presented and easy to follow. It will be of great interest to those in the experimental evolution community, and the data are an excellent resource.

      Major comments:

      • Are the key conclusions convincing? Yes, the convergent evolution analysis, fitness assays, and genetic reconstructions are sufficient to characterise the genetic causes of adaptation in this experiment, and are of the highest standard. The authors do particularly well to fully recover the fitness increases that evolved with their genetic reconstructions, which imparts a completeness to their understanding of what happened in their evolution experiment.
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No, in nearly all cases the authors make reasonable claims. One exception is on L419 in the discussion, where the authors speculate why some mutations do not follow diminishing returns epistasis, but this idea does not really have any basis (no citation or reasons to suggest that DNA repair genes are less connected with other genes in the genome). If the authors cannot support this statement, it should be removed, and they should instead write that it is currently unknown why some individual mutations do not follow the pattern of diminishing returns.
      • Would additional experiments be essential to support the claims of the paper? No.
      • Are the data and the methods presented in such a way that they can be reproduced? Yes, but some more details are needed for the convergent evolution analysis, see minor comments.
      • Are the experiments adequately replicated and statistical analysis adequate? Yes, but more statistical reporting in the main text or figure legends would be helpful. For example, at L159, please report the statistical test, test statistic and p value in the text or in the figure legend. Currently significance is indicated, but the methods do not specify the test. This should also be done for the GO analysis shown in Figure 3A.

      Minor comments:

      • Specific experimental issues that are easily addressable. Not a new experiment, but extra details are required. The authors carried out both clone and whole-population sequencing. For their convergent evolution analysis, what are the criteria for a mutation to be included, i.e., does it need to be fixed, or to have attained a certain frequency? This is important: if the threshold were low (say 5%), it would be important to know whether gene A had fixed in 4 populations, while gene B had attained a frequency of 10% in 5 populations. As it stands, both would be described as examples of convergent evolution. This can be handled by providing these details in the methods.
      • Are prior studies referenced appropriately? I note that the authors use the term declining adaptability whereas other papers use the term diminishing returns epistasis. I am sure the authors have good reasons for their choice of nomenclature, but I think it would help their readers connect this work to other work if they mentioned that declining adaptability is also referred to as diminishing returns.
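
      The point about inclusion criteria in the convergent evolution analysis can be illustrated with a minimal sketch. The records below are invented to mirror the reviewer's gene A / gene B example; the function name, the data layout and the two thresholds are assumptions for illustration only and do not describe the authors' actual pipeline.

```python
from collections import defaultdict

# Hypothetical records: (population, gene, allele frequency in that population).
# Gene names and frequencies are invented to mirror the reviewer's example.
observations = [
    ("pop1", "geneA", 1.00), ("pop2", "geneA", 1.00),
    ("pop3", "geneA", 1.00), ("pop4", "geneA", 1.00),
    ("pop1", "geneB", 0.10), ("pop2", "geneB", 0.10),
    ("pop3", "geneB", 0.10), ("pop4", "geneB", 0.10),
    ("pop5", "geneB", 0.10),
]


def populations_hit(records, min_frequency):
    """Count, per gene, the populations with a mutation at or above min_frequency."""
    hits = defaultdict(set)
    for population, gene, frequency in records:
        if frequency >= min_frequency:
            hits[gene].add(population)
    return {gene: len(pops) for gene, pops in hits.items()}


# A lenient 5% cut-off counts both genes as recurrently mutated.
lenient = populations_hit(observations, min_frequency=0.05)
# A near-fixation cut-off keeps only the gene that swept in its populations.
strict = populations_hit(observations, min_frequency=0.95)

print(lenient)  # {'geneA': 4, 'geneB': 5}
print(strict)   # {'geneA': 4}
```

      Under the lenient threshold gene B appears in more populations than gene A, although only gene A reached fixation, which is why the inclusion threshold needs to be reported.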

      A reference the authors have missed: at L419, as well as citing the Desai Lab bioRxiv paper, they should cite another theory paper that obtained similar conclusions: Lyons, D.M., et al. https://doi.org/10.1038/s41559-020-01286-y.

      • Are the text and figures clear and accurate? This paper is beautifully written and easy to follow, a lot of thought has gone into the figures which are aesthetically pleasing and easy to navigate.
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? No.

      Typos

      L32 "do" should be "to"

      L95 analyzed

      L219 are the authors referring to ref 15 here? I think so, but please specify

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. This paper is an important conceptual result and an immediate advance for basic research. The authors have done an outstanding job of showing the potential for the clinical translation of this research, especially regarding cancer biology.
        • Place the work in the context of the existing literature (provide references, where appropriate). This study follows up on and builds upon an earlier paper by these same authors published in eLife in 2020. Conceptually this work is most closely related to work from Michael Desai's, Sergey Kryazhimskiy's, Tim Cooper's and Chris Marx's labs on diminishing returns epistasis in yeast, and to studies contrasting evolution of haploids and diploids led by Greg Lang's and Sarah Otto's labs.
        • State what audience might be interested in and influenced by the reported findings. This work will be of great interest to the experimental evolution and molecular evolution communities and also of interest to those who study cancer genomics and DNA replication and repair.
        • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. Microbial experimental evolution.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary

      This manuscript investigates the effect of an organism's genotype (or, as the authors call it, an organism's 'genome architecture') on evolutionary trajectories. For this, the authors use Saccharomyces cerevisiae strains that experience some form of replication stress due to specific gene deletions, and that further differ in ploidy and/or the type of gene(s) deleted. They find that the same three functional modules (DNA replication, DNA damage checkpoint, sister chromatid cohesion) are affected across the three genotypes tested, although the specific genes that are mutated vary.

      Major comments

      This is a solid and exceptionally eloquent paper, comprising a large body of work that is in general well presented. That said, I do have some suggestions and questions. At several points in the manuscript, the authors should perhaps be more careful in their wording and avoid overgeneralizing from the data without providing additional evidence for these claims.

      • Some key points of the study are not entirely clear to me; possibly because the study builds upon a previous study that was recently published in eLife. Anyhow, I think it would be useful to clarify the following points a bit more:

      • Why exactly was ctf4∆ chosen as a model for replication stress? What is the evidence that ctf4∆ is a good model for replication stress? Without including some evidence for this, it is unclear how well the findings in this study really can be generalized to replication stress (which is what the authors do now). Would the authors expect to see similar routes of adaptation if a 'genomic architecture' with a less severe or different replication defect had been used? I realize the last question is perhaps difficult to address without actually doing the experiment (which I am not suggesting the authors should do); I just want to point out that perhaps some data should not be over-generalized.

      • Likewise, why was RAD52 selected as the gene to delete to affect homologous recombination? I understand that it is a key gene, but on the flipside, absence of RAD52 affects multiple cellular pathways and (as the authors also observe in their populations) also results in increased mutation rates which might confound some of the results.

      • Related to the first comment, it is also unclear to me how well the system chosen by the authors is representative of the replication stress experienced by tumor cells (as briefly touched upon in the final section of the discussion). Are some of the homologs key oncogenes that drive carcinogenesis?

      • The authors should consider rephrasing some sentences regarding the occurrence of adaptive mutations. Sentences such as 'which genes are mutated depends on the selective advantage' (p1; lines 15-16), 'genome architecture controls the frequency at which mutations occur' (p15) and 'genome architecture controls which genes are mutated' (p1, line 20) make it sound like the initial occurrence of mutations is not random, whereas in reality the mutational landscape is the result of the combined effect of the occurrence and the fitness effect of mutations, with the latter rather than the former likely being the main driver behind the observed patterns.
      • Some important methodological information is missing or unclear in the manuscript:

      • The authors should provide more details on how they decided which clones to select for sequencing. Did they select the biggest colonies; were colonies picked randomly, ...

      • What is the population size during the evolution experiment?

      • Sequencing of populations and clones: coverage should be mentioned

      • Identification of mutations (p19, line 573): Is this really how the authors defined whether a variant is a mutation? Based on the definition given here, DNA mutations that lead to a synonymous mutation in the protein are not considered as mutations?

      • Perhaps the information can be found elsewhere, but the source data Excel files for mutations are incomplete and should at the very least contain information on the type of mutation (e.g. T->A), as well as the location of this mutation in the respective gene.

      Minor comments

      • While the authors already cite several significant papers relevant for their manuscript, some other studies could also be included:

      • From the text in the abstract, it is unclear what the three genomic architectures (line 13) exactly are, the authors should consider spelling this out.

      • Can the authors speculate on why a homozygous ctf4/ctf4 rad52/rad52 would be lethal, and a haploid not?

      • The authors note that a diploid ctf4/ctf4 strain is less fit than its haploid counterpart. Why do the authors think this is the case?

      • The authors passage their cells for 100 cycles and assume that this corresponds to around 1000 generations for each population. However, the fitness differences between the different starting strains (see also Figure 1B) are likely to cause considerable differences in number of generations between the different strains. Do the authors have more precise measurements of number of generations per population? If not, perhaps it should be noted that some lineages may have undergone more doublings than others, and perhaps also discuss if and how this could influence the results?

      • Panel A of Figure 1 is somewhat confusing, as it seems to indicate that the ctf4∆ was introduced after strains were made, for example, haploid recombination deficient (which is not how these strains were constructed). Perhaps a better way of representing this would be to have the indication of DNA replication stress pictured inside the yeast cells.

      • Legend to Figure 1: is fitness expressed relative to haploid or diploid WT cells for the diploid strains?

      • Figure 3: to improve readability of this figure, the authors could consider placing the legend of the different symbols (#, *,..) in the figure as well and not just in the figure legend.

      • Figure 5 shows Indels, but if I am correct, these mutations are not discussed in the text; nor is it mentioned what the authors used as a cut-off to determine indels (the authors use the term 'small indels' without defining it)? For example, the data shown in Figure 3 and Figure 4 only includes SNPs and not indels (correct?) - but the indels should also be taken into account when investigating which modules are hit.

      • Figure 5 mentions: # gene mutations. So these are only the mutations in genes, and not in their up- or downstream regulatory regions?

      • Figure 3-S1: labels of C panels are missing.

      • Figure 3-S1, panel B: why did the authors focus on synonymous mutations?
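
      The comment above on converting 100 passages into roughly 1000 generations rests on simple arithmetic, sketched below. The 1:1024 dilution factor is an assumption chosen only because it yields ten doublings per passage; the actual dilution used by the authors is not stated in this review. The sketch also assumes each culture regrows to its pre-dilution density before the next transfer, in which case the number of doublings per passage depends on the dilution factor and not on growth rate. If slower strains do not fully regrow between transfers, this estimate is an upper bound for them.

```python
import math


def doublings_per_passage(dilution_factor):
    """Doublings needed to regrow to the pre-dilution density after a 1:dilution_factor transfer."""
    return math.log2(dilution_factor)


def total_generations(passages, dilution_factor):
    """Cumulative doublings over a serial-transfer experiment, assuming full regrowth each cycle."""
    return passages * doublings_per_passage(dilution_factor)


# Hypothetical values: 100 passages at an assumed 1:1024 dilution.
estimated_generations = total_generations(passages=100, dilution_factor=1024)
print(estimated_generations)
```

      With these assumed values the estimate comes out at about 1000 generations, matching the figure the reviewer attributes to the authors.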

      Significance

      This is a solid and clearly written study, comprising a large body of work that is generally well presented and that will be of interest to scientists active in the field of (experimental) evolution and replication.

      However, many aspects studied in this manuscript have already been studied and reported before, including in the recent eLife paper by the same group, as well as in studies by other labs that have investigated how genome architecture / genotype affects evolutionary trajectories, the effect of ploidy on evolution, .... Because of this, I do feel that the authors should place their findings more firmly in the context of the existing literature, including a general description of which results are truly novel, which confirm previous findings and which seem to go against previous reports. This is already done at some points in the text, but I feel it could be done even more.

      Additionally, I think the authors should be more careful not to over-generalize their findings, which come from only a few specific genetic manipulations that might not be representative of general replication stress. For example (p15), can the authors really claim that they have unraveled general principles of adaptation to constitutive DNA replication stress? Perhaps a better motivation of the choice of ctf4 as a model mutation for DNA replication stress could also help (see also my earlier comments). A similar comment applies to the molecular mechanisms affecting adaptation in diploid cells: what evidence do the authors have that their findings are not specific to the one type of diploid strain they used in their study? Adding a bit more background information or nuance for some of the claims would help tackle this issue.

      Field of expertise of this reviewer: genetics, evolution, genomics

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In their previous work the authors examined adaptation in response to replication stress in haploid yeast, via experimental evolution of batch cultures followed by sequencing. Here they extend this approach to include diploid and recombination-deficient strains to explore the role of genome architecture in evolution under replication stress. On the whole, a common set of functional modules is found to evolve under all genetic architectures. The authors discuss the molecular details of adaptation and use their findings to speculate on the determinants of adaptation rate.

      SECTION A - Evidence, reproducibility and clarity

      Experimental evolution can reveal adaptive pathways, but there are some challenges when applying this approach to compare genetic backgrounds or environments. The key challenge is that adaptation potentially depends on both the rate of mutation and the nature of selection. Distinct adaptation patterns between groups could therefore reflect differential mutation, selection, or both. The authors allude to this dichotomy but have very limited data to address it. The closest effort is engineering putatively adaptive variants into all genetic backgrounds, including those where they did not arise; the fact that such variants remain beneficial suggests they did not arise in certain backgrounds because of a lower mutation rate, but this is a difficult issue to tackle quantitatively.

      From mutation accumulation experiments, where the influence of selection is minimized, there is evidence that genetic architecture affects the rate and spectrum of spontaneous mutations. In this experiment, the allele used to eliminate recombination, rad52, will also increase the mutation rate generally. The diploid strain is also likely to have a distinct mutational profile--as a null expectation diploids should have twice the mutation rate of haploids. Recent evidence indicates the mutation rate difference between haploid and diploid yeast might be less than two-fold, but that there are additional differences in the mutation spectrum, including rates of structural change. The context for this study is therefore three genetic architectures likely to differ in multiple dimensions of their mutation profiles, but mutation rates are not measured directly.

      The nature of selection on haploids and diploids is expected to differ because of dominance, but ploidy-specific selection is also possible. The authors discuss how recessive beneficial alleles may be less available to diploids, though this can be offset by relatively rapid loss of heterozygosity. However, diploids should also incur more mutations, all else being equal. The rate of beneficial mutation, as opposed to the rate of mutation generally, will depend on the mutational "target size" of fitness, and the authors' findings recapitulate other literature (particularly regarding "compensatory" adaptation) that points to faster adaptation in genotypes with lower starting fitness.
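
      The tension described above, between a larger mutation supply in diploids and the masking of partially recessive beneficial alleles, can be sketched with a textbook approximation. This is not an analysis from the manuscript or the review: it assumes rare, independently fixing beneficial mutations, a per-genome beneficial mutation rate that is the same per genome copy in both ploidies, and the classical approximation that a new beneficial mutation fixes with probability of about twice its initial selective advantage. It ignores clonal interference, loss of heterozygosity and ploidy-specific mutation spectra. All parameter values and function names are hypothetical.

```python
def substitution_rate_haploid(population_size, beneficial_rate, s):
    """Beneficial substitutions per generation: supply N*U times fixation probability ~2s."""
    return population_size * beneficial_rate * 2 * s


def substitution_rate_diploid(population_size, beneficial_rate, s, h):
    """Diploids carry 2N genome copies; a new heterozygous mutation (advantage h*s) fixes with probability ~2hs."""
    return 2 * population_size * beneficial_rate * 2 * h * s


def diploid_to_haploid_ratio(h, population_size=1e6, beneficial_rate=1e-8, s=0.05):
    """Ratio of diploid to haploid substitution rates; under these assumptions it reduces to 2h."""
    diploid = substitution_rate_diploid(population_size, beneficial_rate, s, h)
    haploid = substitution_rate_haploid(population_size, beneficial_rate, s)
    return diploid / haploid


# Hypothetical dominance coefficients, from nearly recessive to fully dominant.
for h in (0.1, 0.5, 1.0):
    print(h, diploid_to_haploid_ratio(h))
```

      In this simplified picture the doubled mutation supply of diploids only compensates for masking when the dominance coefficient exceeds one half, which is one way to see why mutation and selection effects are hard to separate from adaptation rates alone.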

      There is ample literature on the above topics, particularly discussions of the evolutionary advantages of haploidy versus diploidy. While adaptation to replication stress provides a novel starting point for this investigation, much of the manuscript is devoted to long-standing questions that are not specific to replication stress. Unfortunately, the data the authors collected is not sufficient to shed light on these questions, because mutation and selection cannot be effectively distinguished. The Discussion states that "We find that the genes that acquire adaptive mutations, the frequency at which they are mutated, and the frequency at which these mutations are selected all differ between architectures but that mutations that confer strong benefits always lie in the same three modules" (line 379), but it is not clear that these statements are all supported by the data.

      Focusing on the more novel aspect of their experiment, the presence of replication stress, would arguably be a better approach. On this topic the authors have some interesting observations and speculation, but clear predictions are lacking. The introduction section could be redesigned to explicitly state why genome architecture might affect adaptation in response to replication stress in particular, rather than (or in addition to) adaptation generally. If there were no differences in mutation, does the nature of Ctf4 lead to predictions that the molecular basis of compensatory adaptation should differ among genome architectures? Without such predictions it will be difficult for readers to know whether the observation that different genome architectures follow similar adaptive paths is surprising or not.

      Minor comments:

      Shifts in ploidy from diploid to haploid are less common than the reverse change, so the observation of such a shift (Fig. 1) should be discussed in more detail.

      Line 88 typo 'stains'.

      Significance

      SECTION B - Significance

      The novel aspect of this study is the combination of replication stress and genome architecture, but here the significance is limited by a lack of clear predictions on how these factors might interact. On the other hand, much of the manuscript is devoted to why adaptation might vary among genome architectures in general, but this long-standing and important question is not particularly well resolved by this experimental approach, which can't disentangle mutation and selection.

      The authors highlight the dichotomy when discussing the evolution of ploidy: "We suggest that... genome architecture affects two aspects of the mutations that produce adaptation: the frequency at which they occur and the selective advantage they confer" (line 399), but presenting this as a novel inference does not appropriately acknowledge prior research and discussion of these ideas; several relevant papers are cited by the authors in other contexts. It may be possible to recast these findings as a test of the role of genome architecture in adaptation generally, but the authors should clarify the limitations of experimental evolution and more fully consider the theory and data outlined in previous research. In particular, few studies can claim to directly compare mutation rates between genome architectures, and it is not obvious that the present study is an example of such.

      Reviewer expertise: Evolutionary genetics; experimental evolution; mutation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewers

      We thank the reviewers for their careful reading of our manuscript and their valuable suggestions and comments. To address the reviewers’ concerns and improve our manuscript, we will complete the additional experiments and further revise the text as described below.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      *Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.*

      The authors present an in vivo analysis of pdzd8 (CG10362) and a synthetic ER-mitochondria tether in the regulation of locomotor activity, lifespan, and mitochondrial turnover in Drosophila melanogaster, using basic bioinformatics, RNAi, SPLICS, imaging and microscopy (e.g. TEM, SIM), fly lines, and a representative AD fly disease model. The research methodologies were described in good order. The model system employed was suitable to address the research topic. The manuscript was written in clear language and the statistical analyses were correctly applied.

      **Major comments:**

      *-Are the key conclusions convincing?*

      Yes. The results/conclusions are logical and provide an overview of Pdzd8 in the regulation of mitochondrial quality control and neuronal homeostasis.

      *-Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.*

      No. Experiments were generally well performed, and all the data support the conclusions.

      *-Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      No suggested experiments needed.

      *-Are the data and the methods presented in such a way that they can be reproduced?*

      Yes. The authors have followed proper experimental design and methods have been described in sufficient detail.

      *-Are the experiments adequately replicated and statistical analysis adequate?*

      Yes, they are.

      **Minor comments:**

      *-Specific experimental issues that are easily addressable.*

      No comment.

      *-Are prior studies referenced appropriately?*

      Yes. The relevant literature has been cited appropriately.

      *-Are the text and figures clear and accurate?*

      1. Please pay attention to the correct spelling of the described protein name (Pdzd8) and gene name (should be in 'italic') throughout the manuscript, e.g. lines 36, 98, and 556.

      As this is the first published characterization of the fly homolog of the mammalian Pdzd8, we have decided to name the fly protein pdzd8, using the lower case “p” to distinguish it from the mammalian protein. We have checked and corrected our use of italics for the gene name, as noted in track changes.

      2.In figure 1C and its figure legend, please state what the numbers "201" and "195" stand for.

      We have added the text “numbers on bars indicate number of mitochondria analysed” to the figure legend.

      3. Please convert the lowercase letter "x" to the math symbol "×" when it represents a times sign, e.g. "5x" in line 523.

      Corrected

      *-Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      No comment.

      Reviewer #1 (Significance (Required)):

      *-Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.*

      Discoveries from this study include 1) characterization of the tethering protein Pdzd8 in Drosophila melanogaster, and 2) insight into a possible way to enhance mitochondrial quality control and help promote healthy aging of neurons by manipulating MERCs.

      *-Place the work in the context of the existing literature (provide references, where appropriate).*

      With this manuscript, the authors present a straightforward but sound piece of scientific research, with the intent to illustrate the consequences of neuronal depletion of pdzd8 in Drosophila melanogaster. Since Pdzd8 performs specific functions in ER-mitochondrial tethering complexes and dysregulation of MERCs is damaging to neurons, this protein represents a good potential target. In this context the characterization of Pdzd8 should represent an interesting starting point. To this purpose, the gene was knocked down and the tether construct was recombinantly produced. The fly lines were then subjected to analysis both at the organismal and at the cellular level.

      *-State what audience might be interested in and influenced by the reported findings.*

      The audience might include those in the fields of neuroscience and pharmaceuticals who would benefit from an awareness of this research.

      *-Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.*

      Key words in my field of expertise: Ageing, neurodegenerative diseases, Alzheimer's disease, mitophagy, NAD+, neuroprotection. My group is investigating the molecular mechanisms of ageing and age-related neurodegeneration (especially AD) using cross-species model systems, ranging from human brain samples, iPSCs, C. elegans, Drosophila melanogaster, and mice, therefore I have sufficient expertise to evaluate this paper.

      **Referees Cross-commenting**

      To this reviewer the key novelty of this paper was the study of the regulation of the mitochondrial-ER contact sites (MERCs) in life and health. The data indicate that MERCs mediated by the tethering protein pdzd8 play a critical role in the regulation of mitochondrial homeostasis, neuronal function, and lifespan. From a translational perspective, this reviewer would ask the authors to check whether this mechanism is conserved in rodents (e.g. to test memory in AD mice and to measure lifespan under mitochondrial toxin conditions). This may be too much, but it will depend on the standard of the journal.

      We thank the reviewer for their input, evaluation and interest. We too are keen to know whether this mechanism is conserved and hope to investigate this in our ongoing work, including characterizing a mouse mutant. However, the current work already represents a substantial investment of resources and is a worthy study in its own right as the first description of the in vivo role of pdzd8, so we feel this is beyond the scope of the current work.

      Reviewer #2

      (Evidence, reproducibility and clarity (Required)):

      Hewitt et al. describe and characterize for the first time the ortholog of pdzd8 in Drosophila melanogaster. In accordance with pdzd8's previously described function as a member of mitochondrial-ER contact sites (MERCs), the authors show reduced MERCs upon RNAi-mediated depletion of pdzd8 via TEM, SIM and a split-GFP based contact site sensor. Pdzd8 depletion results in increased lifespan as well as improved locomotor activity in aging flies, while an increase of MERCs with a synthetic tether accelerates the age-related declines in survival and locomotion. Moreover, pdzd8-depleted flies are more resistant to mitochondrial toxins. The authors correlate these protective effects of pdzd8 knockdown with an increase in mitophagy using a mitophagy sensor and describe a rescue of locomotor defects in an Alzheimer disease fly model by pdzd8 depletion.

      **Major comments:**

      1. The authors quantify the number of MERCs in thin sections of TEM (Fig 1B and C). It would add to the paper if the authors would show a representative reconstruction of the quantified somata, as a 3D reconstruction would visualize ER-Mito contacts more reliably than thin sections.

      We agree that the 3D reconstruction of TEM images would provide a satisfying addition to the current analyses; however, such advanced techniques are not readily available. The current samples used to collect these data cannot be used to generate 3D reconstructions. To counter this, we have used three independent methods to analyse the changes in MERCs, all of which show a decrease in MERCs in the flies with less pdzd8, supporting the conclusion that these observations are reproducible and robust.

      2.The authors quantify MERCs in pdzd8 KD also by SIM (Fig1F, G). However, they quantify the number of MERCs in epidermal cells while they also show SIM images of larval neurons (Fig S1D). For consistency and to support their claim of MERC reduction in neurons, we ask the authors to include the quantification based on larval neurons especially as the authors show that pdzd8 is predominantly expressed in the CNS.

      Unfortunately, the somata of larval neurons have extremely limited cytosol (see fig. S1D), which creates very challenging conditions to discern the spatial separation of ER and mitochondria by light microscopy. While co-localisation of organelle markers in such cells has been reported in the literature, we are extremely concerned that the lack of space within the cytosol renders such analysis unreliable. However, we will attempt to quantify the extent of co-localisation of the ER and mitochondria in these cells. In contrast, epidermal cells are much larger, providing greater spatial separation of ER and mitochondria. Notably, we complement the co-localisation analysis of epidermal cells with two additional approaches, TEM analysis and the SPLICS reporter construct, to demonstrate that pdzd8-RNAi results in decreased MERCs specifically in neurons.

      3.The authors describe a decreased NMJ volume in Fig 4G. It would improve and complete the functional characterization of pdzd8 in flies if the authors can provide further data whether pdzd8 KD causes a general synaptic defect. Can the authors show morphological synaptic defects in the existing TEM data of the adult brain or provide additional ERG recordings, which would elucidate the functional consequences of pdzd8 depletion in the CNS?

      Our TEM data are not suitable for us to properly analyse defects in synaptic morphology as our images centered around the cell bodies where the organelle morphology was easiest to distinguish and there are very few synapses. While it is not surprising that the knockdown of pdzd8 has some detrimental effects, we chose to focus our efforts on trying to determine the cause of the protective effect on locomotor activity in aged flies rather than to exhaustively characterise the myriad phenomena which may be impacted as a knock-on effect of the disrupted cell biology that we have demonstrated. We hope to further explore the detrimental functional consequences of pdzd8 depletion on such phenomena as neurotransmission in future work.

      4. Hewitt et al. suggest a beneficial effect of increased turnover of mitochondria for healthy aging. To convince readers we would like to ask the following:

      a) This claim is based on their observation of increased mitophagy in pdzd8 depleted flies using one reporter (Fig 5). Can the authors support their data with an alternative method as this is one of the key claims of the manuscript?

      The mitoQC tool is well established in the field and we have found it to perform better than, but consistently with, mito-Keima (Lee et al. 2018 JCB doi: 10.1083/jcb.201801044). We would be happy to consider other assays if the reviewer can suggest an unbiased and established alternative.

      b) An increased turnover of Mitochondria would also suggest that there are more "young" mitochondria present in the pdzd8 KD neurons. Can the authors experimentally address that?

      We understand the reviewer’s point here but due to the continual fission and fusion, as well as piecemeal turnover of mitochondria (see Vincow et al. 2019 Autophagy doi: 10.1080/15548627.2019.1586258), the concept of ‘young’ versus ‘old’ mitochondria is misplaced. The mitochondrial network essentially exists as a milieu of components which are produced and degraded at different rates.

      c) Furthermore, we would like to ask the authors to also use the MERC tether as a control in the mitophagy assay. This would allow further conclusions about the role of mitophagy, its protective effect during aging and the role of MERCs in this process.

      We remind the reviewer that this MERC tether is constructed from an RFP with N- and C-terminal tethering peptides. The presence of this RFP prevents the proper analysis of the mitoQC mCherry signal. However, given the dramatic phenotypes we think that it is unlikely that a decrease in mitophagy alone can explain the detrimental effects of increased tethering.

      5. In Fig6 A,B the authors should also include the pdzd8 KD to support their claim that the rescue of climbing defects correlates with a reduction of MERCs.

      We thank the reviewer for this suggestion and we will perform this experiment.

      Moreover, it would be beneficial for their final conclusion if the authors could show increased mitophagy in the background of Ab42-expressing flies.

      We thank the reviewer for this suggestion and we will perform this experiment.

      **Minor comments:**

      1.Can the authors add to the figure legend of Fig 1F how the ER and Mitochondria were labeled?

      We have added the constructs to the figure legend (full genotypes for all figures are given in Table S2).

      2.Error bars should be added in the quantification of MERCs in Fig1C.

      The MERCs are quantified in three brains per genotype but as there were variable numbers of sections suitable for imaging from each brain the total values are combined to give a single percentage.

      3.A reference to Supplementary Fig S1D is missing in the main text.

      This figure is referenced in line 135

      4.Can the authors label the individual genotypes in Fig S3C and 4F?

      Figure labels and legends have been modified to clarify this.

      5.Can the author specify which brain region they imaged in Fig 5C?

      The regions imaged and quantified were chosen for their clear organelle morphology rather than targeting a specific brain region. All images were from the protocerebrum and the methods and figure legends have been updated to note this.

      6.Are the ATP levels normalized to ADP in Fig S3D? Can the authors specify in the figure and figure legend to what ATP was normalized?

      Figure labels and legends have been modified to clarify that the ATP levels are normalised to total protein quantification of the samples.

      7.Please sort the supplementary figures in accordance to their reference order in the text.

      We thank the reviewer for checking this. This figure order will be rechecked in the final version as addressing reviewer comments is likely to lead to further changes.

      Reviewer #2 (Significance (Required)):

      The authors present here novel insights about the functional role of a new member of the MERCs, pdzd8, using RNAi-mediated depletion and Drosophila melanogaster as a model system. As MERCs receive more attention, especially in the context of their potential role in neurological diseases, the authors' manuscript will be of high interest to the scientific community. The in vivo model combined with multiple different technical approaches adds to the significance of the paper. There are some controls and additional experiments that are required to support the authors' main claims and complete the functional characterization of pdzd8 (see major comments).

      Field of expertise: neuroscience, fly genetics, neurodegeneration.

      Reviewer #3

      (Evidence, reproducibility and clarity (Required)):

      This manuscript entitled "Decreasing pdzd8-mediated mitochondrial-ER contacts in neurons improves fitness by increasing mitophagy" by Hewitt and collaborators describes the role of the Drosophila ortholog of PDZD8 in ER-mitochondria contacts in neurons and the physiological consequence of pdzd8 loss. The authors show that ER-mitochondria contacts are reduced in fly neurons expressing a pdzd8-RNAi construct. Decreasing pdzd8 expression in neurons was accompanied by a slowed age-associated decline in locomotor activity, and an increased lifespan. In the presence of mitochondrial toxins, neurons deficient for pdzd8 were protected. Finally, the authors showed that pdzd8 silencing increased mitophagy in aged neurons, and protected against neurodegeneration in a model of Alzheimer's disease.

      **Major points:**

      1)There are important controls that are missing. RNAi expression often affects off-target genes which could unfortunately modify the observed phenotypes. The authors should verify that a) the phenotypes observed by RNAi-mediated pdzd8 silencing can be rescued by the expression of an RNAi-insensitive pdzd8 construct (the authors should verify the rescue of the most crucial phenotypes described in the manuscript); b) the RNAi-LacZ-line that they use as control in the paper does not behave differently from a WT line, which could be induced by an off-target effect of the RNAi-LacZ (again with the most crucial phenotypes).

      While the Drosophila community is fortunate to have a plethora of readily available tools for interrogating the function of nearly all genes in the genome – tools which form the foundation of most work in Drosophila labs worldwide – the availability is not limitless. In this instance, the transgenic RNAi line generated as a resource for the community comprises a 500 bp hairpin, computed to be the most selective target for that gene. Because it is a 500 bp sequence, it is unrealistic to establish an RNAi-resistant variant that still faithfully functions as normal. Nevertheless, although the evidence is imperfect, we show in Figure S3B that pdzd8-RNAi rescues the climbing defect produced by overexpressing pdzd8, providing evidence that the construct acts specifically on this sequence.

      Similarly, the availability of ‘control’ RNAi reagents is generous but still limited. This LacZ-RNAi line is one of a few well-established controls that has provided a cornerstone reference for a wealth of studies. Nevertheless, we will provide experimental data that aged climbing of nSyb>LacZ-RNAi is highly comparable to several other well-established control genotypes.

      2) Did the authors analyze their EM data in a blinded way to minimize subjective bias? This type of analysis is complicated by the manual annotation of ultrastructures, which is by nature subjective. For instance, this reviewer would have annotated the two mitochondria in the middle of Fig 1B, right as "Mitochondria with ER contact", as there is a membrane tube present at the interface of these two organelles.

      The EM data were analysed blinded to the genotypes. This is noted in the methods section.

      3) There is a controversy in the field on the role of PDZD8: some papers show its involvement in ER-mitochondria contacts, others in ER-lysosome contacts. The authors should discuss this point in more detail. Moreover, the authors should localize the protein in Drosophila neurons; is the protein associated with mitochondria or endo/lysosomes?

      We recognize that there is some debate in the field over the localization and role of PDZD8. However, since there is currently no antibody against the Drosophila protein and the sequence is sufficiently divergent such that antibodies against the mammalian protein will not recognize the fly protein, we are not well-positioned to determine the localization of Drosophila pdzd8. Consequently, we will expand our discussion to reflect the differing views.

      We can offer instead to quantify the localization of mouse PDZD8 in our newly generated NIH-3T3 Pdzd8-Halo knock-in line to help resolve the controversy regarding the location(s) and function(s) of mammalian Pdzd8.

      4) The authors should specify in more detail how the different quantifications were performed. For instance Fig 1G: how many samples were quantified (i.e. how many flies, and how many neurons); what is compared? Fields-of-view, neurons, flies...?

      Further details have been added to the figure legends 1G (now H), 4I, 5 and Fig S2.

      **Minor point:**

      1)Could the authors show the SIM images Fig1F together with the binarized images.

      These images have been added to Figure 1 and the legend and text updated accordingly.

      2) It is surprising to see that data otherwise similar are represented with so many different types of graph (for instance Fig 5: bar graph, box-plot, violin plot). Why are individual data points not always present on the graphs?

      The graphs will be redrawn using more consistent representations once the data for the revisions has been gathered.

      3) The way that data are presented is sometimes odd: for instance, line 101, the authors wrote "To establish that MERCs were decreased...". This would imply that they knew the result before performing the experiment. And later, line 103 "Accordingly...".

      These sentences have been rephrased as “To determine whether MERCs were decreased..” and “These results showed the…”

      Reviewer #3 (Significance (Required)):

      This study about the role of pdzd8 is timely. The functional description of inter-organelle contacts is a hot topic in cell biology. There are several recent reports describing the identification of pdzd8's role in inter-organelle contact formation. This manuscript provides data on the role of pdzd8 in a whole organism and expands our understanding of this protein.

      My expertise: inter-organelle contacts (human cells)

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript entitled "Decreasing pdzd8-mediated mitochondrial-ER contacts in neurons improves fitness by increasing mitophagy" by Hewitt and collaborators describes the role of the Drosophila ortholog of PDZD8 in ER-mitochondria contacts in neurons and the physiological consequence of pdzd8 loss. The authors show that ER-mitochondria contacts are reduced in fly neurons expressing a pdzd8-RNAi construct. Decreasing pdzd8 expression in neurons was accompanied by a slowed age-associated decline in locomotor activity, and an increased lifespan. In the presence of mitochondrial toxins, neurons deficient for pdzd8 were protected. Finally, the authors showed that pdzd8 silencing increased mitophagy in aged neurons, and protected against neurodegeneration in a model of Alzheimer's disease.

      Major points:

      1)There are important controls that are missing. RNAi expression often affects off-target genes which could unfortunately modify the observed phenotypes. The authors should verify that a) the phenotypes observed by RNAi-mediated pdzd8 silencing can be rescued by the expression of an RNAi-insensitive pdzd8 construct (the authors should verify the rescue of the most crucial phenotypes described in the manuscript); b) the RNAi-LacZ-line that they use as control in the paper does not behave differently from a WT line, which could be induced by an off-target effect of the RNAi-LacZ (again with the most crucial phenotypes).

      2) Did the authors analyze their EM data in a blinded way to minimize subjective bias? This type of analysis is complicated by the manual annotation of ultrastructures, which is by nature subjective. For instance, this reviewer would have annotated the two mitochondria in the middle of Fig 1B, right as "Mitochondria with ER contact", as there is a membrane tube present at the interface of these two organelles.

      3) There is a controversy in the field on the role of PDZD8: some papers show its involvement in ER-mitochondria contacts, others in ER-lysosome contacts. The authors should discuss this point in more detail. Moreover, the authors should localize the protein in Drosophila neurons; is the protein associated with mitochondria or endo/lysosomes?

      4) The authors should specify in more detail how the different quantifications were performed. For instance Fig 1G: how many samples were quantified (i.e. how many flies, and how many neurons); what is compared? Fields-of-view, neurons, flies...?

      Minor point:

      1)Could the authors show the SIM images Fig1F together with the binarized images.

      2) It is surprising to see that data otherwise similar are represented with so many different types of graph (for instance Fig 5: bar graph, box-plot, violin plot). Why are individual data points not always present on the graphs?

      3)The way that data are presented is sometimes odd: for instance, line 101, the authors wrote "To establish that MERCs were decreased...". This would imply that they knew the result before performing the experiment. And later, line 103 "Accordingly...".

      Significance

      This study about the role of pdzd8 is timely. The functional description of inter-organelle contacts is a hot topic in cell biology. There are several recent reports describing the identification of pdzd8's role in inter-organelle contact formation. This manuscript provides data on the role of pdzd8 in a whole organism and expands our understanding of this protein.

      My expertise: inter-organelle contacts (human cells)

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Hewitt et al. describe and characterize for the first time the ortholog of pdzd8 in Drosophila melanogaster. In accordance with pdzd8's previously described function as a member of mitochondrial-ER contact sites (MERCs), the authors show reduced MERCs upon RNAi-mediated depletion of pdzd8 via TEM, SIM and a split-GFP based contact site sensor. Pdzd8 depletion results in increased lifespan as well as improved locomotor activity in aging flies, while an increase of MERCs with a synthetic tether accelerates the age-related declines in survival and locomotion. Moreover, pdzd8-depleted flies are more resistant to mitochondrial toxins. The authors correlate these protective effects of pdzd8 knockdown with an increase in mitophagy using a mitophagy sensor and describe a rescue of locomotor defects in an Alzheimer disease fly model by pdzd8 depletion.

      Major comments:

      1. The authors quantify the number of MERCs in thin sections of TEM (Fig 1B and C). It would add to the paper if the authors would show a representative reconstruction of the quantified somata, as a 3D reconstruction would visualize ER-Mito contacts more reliably than thin sections.

      2.The authors quantify MERCs in pdzd8 KD also by SIM (Fig1F, G). However, they quantify the number of MERCs in epidermal cells while they also show SIM images of larval neurons (Fig S1D). For consistency and to support their claim of MERC reduction in neurons, we ask the authors to include the quantification based on larval neurons especially as the authors show that pdzd8 is predominantly expressed in the CNS.

      3.The authors describe a decreased NMJ volume in Fig 4G. It would improve and complete the functional characterization of pdzd8 in flies if the authors can provide further data whether pdzd8 KD causes a general synaptic defect. Can the authors show morphological synaptic defects in the existing TEM data of the adult brain or provide additional ERG recordings, which would elucidate the functional consequences of pdzd8 depletion in the CNS?

      4.Hewitt et al. suggest a beneficial effect of increased turnover of mitochondria for healthy aging. To convince readers we would like to ask the following:

      a)This claim is based on their observation of increased mitophagy in pdzd8 depleted flies using one reporter (Fig 5). Can the authors support their data with an alternative method as this is one of the key claims of the manuscript?

      b)An increased turnover of Mitochondria would also suggest that there are more "young" mitochondria present in the pdzd8 KD neurons. Can the authors experimentally address that?

      c) Furthermore, we would like to ask the authors to also use the MERC tether as a control in the mitophagy assay. This would allow further conclusions about the role of mitophagy, its protective effect during aging and the role of MERCs in this process.

      5. In Fig6 A,B the authors should also include the pdzd8 KD to support their claim that the rescue of climbing defects correlates with a reduction of MERCs. Moreover, it would be beneficial for their final conclusion if the authors could show increased mitophagy in the background of Ab42-expressing flies.

      Minor comments:

      1.Can the authors add to the figure legend of Fig 1F how the ER and Mitochondria were labeled?

      2.Error bars should be added in the quantification of MERCs in Fig1C.

      3.A reference to Supplementary Fig S1D is missing in the main text.

      4.Can the authors label the individual genotypes in Fig S3C and 4F?

      5.Can the author specify which brain region they imaged in Fig 5C?

      6.Are the ATP levels normalized to ADP in Fig S3D? Can the authors specify in the figure and figure legend to what ATP was normalized?

      7.Please sort the supplementary figures in accordance to their reference order in the text.

      Significance

      The authors present here novel insights about the functional role of a new member of the MERCs, pdzd8, using RNAi-mediated depletion and Drosophila melanogaster as a model system. As MERCs receive more attention, especially in the context of their potential role in neurological diseases, the authors' manuscript will be of high interest to the scientific community. The in vivo model combined with multiple different technical approaches adds to the significance of the paper. There are some controls and additional experiments that are required to support the authors' main claims and complete the functional characterization of pdzd8 (see major comments).

      Field of expertise: neuroscience, fly genetics, neurodegeneration.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.

      The authors present an in vivo analysis of pdzd8 (CG10362) and a synthetic ER-mitochondria tether in the regulation of locomotor activity, lifespan, and mitochondrial turnover in Drosophila melanogaster, using basic bioinformatics, RNAi, SPLICS, imaging and microscopy (e.g. TEM, SIM), fly lines, and a representative AD fly disease model. The research methodologies were described in good order. The model system employed was suitable to address the research topic. The manuscript was written in clear language and the statistical analyses were correctly applied.

      Major comments:

      -Are the key conclusions convincing?

      Yes. The results/conclusions are logical and provide an overview of Pdzd8 in the regulation of mitochondrial quality control and neuronal homeostasis.

      -Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      No. Experiments were generally well performed, and all the data support the conclusions.

      -Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      No suggested experiments needed.

      -Are the data and the methods presented in such a way that they can be reproduced?

      Yes. The authors have followed proper experimental design and methods have been described in sufficient detail.

      -Are the experiments adequately replicated and statistical analysis adequate?

      Yes, they are.

      Minor comments:

      -Specific experimental issues that are easily addressable.

      No comment.

      -Are prior studies referenced appropriately?

      Yes. The relevant literature has been cited appropriately.

      -Are the text and figures clear and accurate?

      1. Please pay attention to the correct spelling of the protein name (Pdzd8) and the gene name (which should be in italics) throughout the manuscript, e.g. lines 36, 98, and 556.

      2. In Figure 1C and its legend, please state what the numbers "201" and "195" stand for.

      3. Please replace the lowercase letter "x" with the multiplication sign "×" wherever it denotes multiplication, e.g. "5x" on line 523.

      -Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      No comment.

      Significance

      -Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Discoveries from this study include 1) characterization of the tethering protein Pdzd8 in Drosophila melanogaster, and 2) insight into a possible way to enhance mitochondrial quality control and promote healthy aging of neurons by manipulating MERCs.

      -Place the work in the context of the existing literature (provide references, where appropriate).

      With this manuscript, the authors present a straightforward but sound piece of scientific research, with the intent to illustrate the consequences of neuronal depletion of pdzd8 in Drosophila melanogaster. Since Pdzd8 plays specific functions in ER-mitochondrial tethering complexes and dysregulation of MERCs is damaging to neurons, this protein represents a good potential target. In this context, the characterization of Pdzd8 should represent an interesting starting point. For this purpose, the gene was knocked down and the tether construct was recombinantly produced. The fly lines were then subjected to analysis at both the organismal and the cellular level.

      -State what audience might be interested in and influenced by the reported findings.

      The audience might include those in the fields of neuroscience and pharmaceuticals, who would benefit from an awareness of this research.

      -Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Key words in my field of expertise: Ageing, neurodegenerative diseases, Alzheimer's disease, mitophagy, NAD+, neuroprotection. My group is investigating the molecular mechanisms of ageing and age-related neurodegeneration (especially AD) using cross-species model systems, ranging from human brain samples, iPSCs, C. elegans, Drosophila melanogaster, and mice, therefore I have sufficient expertise to evaluate this paper.

      Referees Cross-commenting

      To this reviewer, the key novelty of this paper was the study of the regulation of the mitochondria-ER contact sites (MERCs) in life and health. The data indicate that MERCs mediated by the tethering protein pdzd8 play a critical role in the regulation of mitochondrial homeostasis, neuronal function, and lifespan. From a translational perspective, this reviewer would ask the authors to check whether this mechanism is conserved in rodents (e.g. to assess memory in AD mice and to run lifespan assays under mitochondrial toxin conditions). This may be too much, but it will depend on the standard of the journal.

  2. Jan 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Overall:

      We thank the reviewers for their thoughtful comments and suggestions on how to improve the manuscript. We also thank the reviewers for describing the study as “highly significant,” “rigorous and reliable as described and can be reproduced by others,” and as “relevant to investigators working in the field of rickettsial diseases and to a broader audience studying mechanisms of intracellular parasitism and host responses.”

      In this revised manuscript we have addressed all the minor points raised by the reviewers. In regard to additional experiments, all three reviewers suggested that we perform histology of skin lesions, and in a revised manuscript we propose to thoroughly address this by performing histology at multiple time points in infected wild type and in interferon receptor-deficient mice. We will also attempt to use immunohistochemistry to identify the infected cell types in the skin and in internal organs. We will compare these findings to histology of human eschars. We feel that the reviewer comments support our contention that a manuscript containing these proposed additional experiments will be of strong significance in the field.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      “Rickettsial eschars are hallmarks of less severe spotted fever diseases. The underlying mechanisms involved in the formation of the eschar caused by pathogenic rickettsiae remains unknown. The authors of this manuscript studied this interesting research question by using Ifnar-/-Ifngr-/- mice and Sca2 or OmpB mutant of R. parkeri. R. parkeri probably is the best rickettsial species to study rickettsial eschar due to the clinical features of R. parkeri rickettsioses and the biosafety level required to work with it. The data presented in the manuscript are very promising. The conclusions are supported by the presented results. For the first time, this study recapitulated human eschar-like skin lesion observed in patients with R. parkeri rickettsioses in the mouse models. More interestingly, mice inoculated with Sca2 mutant of R. parkeri i.d. had less disseminated rickettsiae in tissues, which helps us to understand the mechanisms by which pathogenic rickettsiae cause systemic infection after the arthropod bite.”

      **Minor comments:**

      “1) Figure 2D, it looks likely the lethality of mice i.d. infection with R. parkeri is not dose dependent. For example, mice inoculated with 10^4 showed greater lethality compared to 10^7. The authors might want to explain it in the Discussion.”

      The reviewer is correct in observing that the lethality between different doses of R. parkeri in Ifnar-/-Ifngr-/- mice after intradermal infection is not dose dependent with the current number of mice used per group. We do not understand the reason for this, and more broadly we don’t understand the mechanism of lethality. We speculate that there could be a bottleneck; however, answering this question will require future investigations into the mechanisms of lethality that are beyond the scope of this study. To address the reviewer’s point, we now include this statement: “Degrees of lethality between different doses in Ifnar-/-Ifngr-/- mice were not significantly different from one another, and the cause of lethality in this model remains unclear.”

      “2) Line 202, innate immunity in vitro might need to be revised.”

      We agree that the previous description was vague. We changed the description to be more specific and it now reads: “…Sca2 does not significantly enhance the ability of R. parkeri to evade interferon-stimulated genes or inflammasomes in vitro.”

      “3) It is unclear what is the unit of the inoculum in animal experiments, PFU?”

      Yes, it is PFU. We have now indicated this in the figure legends.

      “4) Line 36, in the study of "Reference 16", C3H/HeN mice, not B6 mice, were used.”

      We thank the reviewer for noticing this error and we have changed the text to C3H/HeN.

      “5) The conclusion on eschar will be greatly strengthened if histological analysis is included, particularly whether dermis necrosis is present or not.”

      In the revised manuscript we will perform histology on eschars in wild type and Ifnar-/-Ifngr-/- mice over time. We will also use immunohistochemistry to analyze the infected cell types and will compare this to data on human eschars. We agree that this will greatly strengthen our conclusions regarding the similarities between the mouse and human eschars.

      “6) Line 357, it is not clear what "spinfection" means.”

      We have changed this to “infection” for clarity.

      Reviewer #1 (Significance (Required)):

      “Several approaches employed in the study are new to the field of animal models of the rickettsioses. For example, fluorescent dextran was used to investigate vascular damage in the skin at the inoculation site, and body temperature was monitored in mice infected with R. parkeri. Overall, the study is highly significant since it has answered the important questions in the research area of spotted fever rickettsioses and employed appropriate approaches. No major concerns were noticed.”

      We thank the reviewer for appreciating the significance of this work.

      **Referees cross commenting**

      “I agree with other reviewers' comments. Thanks for the invite.”

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      “The manuscript utilizes a new model of spotted fever rickettsiosis. Using this model, the authors have determined that knockout of the sca2 or ompB gene attenuates Rickettsia parkeri, and vaccination with the attenuated rickettsiae provides protection against virulent challenge. However, the model is far less than ideal as it has eliminated important effectors of immunity.”

      We thank the reviewer for their comments and we hope to thoroughly address their concerns. In regard to the effects of interferons on long-lasting immunity to R. parkeri, we note to the reviewer that we observed that immunized Ifnar-/-Ifngr-/- mice were completely and robustly protected from rechallenge. No lethality and no loss of body weight or temperature was observed after a rechallenge dose of 10x the LD-50. These data reveal that interferons are dispensable for long-lasting immunity to R. parkeri in inbred mice and are not important effectors of adaptive immunity to R. parkeri. This is thus the first model that can be used to investigate the factors required for adaptive immunity to R. parkeri in mice.

      If the reviewer’s comment is not referring to long-lasting adaptive immunity to R. parkeri but is instead referring to the general concept of using immunocompromised mice as models, we note that immunocompromised mice are used as models for a variety of pathogens, including many Rickettsia species (reviewed in Osterloh, Med Microbiol Immunol 2017), and Ifnar-/-Ifngr-/- mice specifically are used as models for Zika and Dengue virus infections. Unlike many other immunocompromised mice, Ifnar-/-Ifngr-/- mice do not require maintenance on antibiotics and they have no noticeable differences to wild type mice in regard to breeding or growth.

      “Manuscript also fails to recognize that there is an Amblyomma maculatum tick-transmitted model of Rickettsia parkeri infection that causes an eschar and disseminated pathology.”

      In the previous version of the manuscript in lines 266-269 we cited and acknowledged the reported tick transmission model in non-human primates (Banajee et al., 2015). As also noted by Reviewer 3, our model with needle inoculation is significantly less time consuming and expensive than a tick transmission model. Moreover, needle inoculation makes it feasible to precisely measure the number of bacteria that are administered, which is not true with ticks. Lastly, the tick model was described in non-human primates, which are significantly more expensive than inbred mice and are not amenable to genetic manipulation. Thus, our model provides many significant advantages over the tick model in non-human primates, including cost, time, availability of genetic mutants, and reproducibility.

      “The model that they have used is inadequately characterized. The cutaneous lesion was not evaluated histologically to determine if it features the actual characteristics of an eschar.”

      We thank the reviewer for the suggestion and as a part of our revision plan, we propose to thoroughly analyze the lesion histologically.

      “Although bacteria were found in the liver and spleen, in which macrophages are a significant target, there was no evaluation of the vital organs including lung and brain nor demonstration of the target cells or pathologic lesions.”

      In previous work from our lab (Engström et al., 2019), we found that lungs of wild type mice contained similar number of infectious R. parkeri as the spleen and liver after intravenous infection. Thus, in order to be able to process more samples quickly, we did not include lungs in the experiments described here. In unreported data, we also found that organs including the brain, kidneys, and heart had no/little recoverable PFUs. As a part of our revision plan, we propose to perform immunohistochemistry in the spleen, liver, lung, and skin to identify the infected cell types. Identifying the infected cell types will reveal if the same cell types are infected in our mouse model as in humans.

      “Unfortunately, the assay of vascular permeability was applied only to the inoculation site and not to the disseminated visceral organs such as lung and brain.”

      We have performed the vascular permeability assay using internal organs alongside the skin; however, little/no fluorescence was observed in any sample. We were unable to distinguish differences between control groups or between control and experimental groups in organs from mice that were treated and untreated with the fluorescent dextran. Thus, we were unfortunately not able to apply the described vascular damage assay to organs other than the skin. We now indicate this in the revised text.

      Reviewer #2 (Significance (Required)):

      “The authors all have misrepresented the eschar as a critically important lesion, whereas the patients usually do not even know of its presence until they begin to develop systemic symptoms and it is detected by a physician examining the patient.”

      We did not intend to suggest that the eschar is either more or less critically important than other features of rickettsial disease. We simply described the eschar as a “hallmark feature” of eschar-associated rickettsiosis. Additionally, as the reviewer notes, patients report systemic symptoms, and our model elicits systemic disease by R. parkeri in mice. Thus, the model we describe recapitulates both an eschar and disseminated disease and is the first mouse model for R. parkeri that exhibits both of the disease manifestations mentioned by the reviewer.

      “On line 30 the authors state that mice are the natural reservoir of Rickettsia parkeri. The references cited describe the failure of acquisition by feeding ticks, meaning that it is not a true reservoir. The reference describing animals with antibodies merely indicates exposure to a spotted fever group Rickettsia not sufficient evidence of a role as a reservoir.”

      We thank the reviewer for making this important distinction and we have altered the text to read: “…small rodents including mice have been found as seropositive for R. parkeri in the wild.”

      “In response to the request for my expertise, I have contributed a large amount of data to understanding mechanisms of immunity to rickettsiae and have developed several useful animal models of Rickettsial diseases. I also have expertise on clinical aspects of spotted fever group rickettsioses, including the eschar.”

      **Referees cross commenting**

      “This is not the first mouse model of rickettsiosis to contain an eschar. There is a model of Rickettsia parkeri transmitted by Amblyomma maculatum ticks in which eschars occur.”

      As noted above by us and also by Reviewer 3, we cited and discussed the tick transmission model in non-human primates (Banajee et al., 2015) in the Discussion. We also note to the reviewer the many advantages of our i.d. infection model, including how it will make these experiments more widely accessible, more reproducible, less expensive, faster, and enable the infection of mice with various genetic modifications.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      “This manuscript reports novel observations pertinent to development in inbred mice of an eschar lesion and generalized lethal infection following intradermal infection with Rickettsia parkeri, the mice are deficient in two types of interferon receptors. This is a new observation for the murine system and expands the existing repertoire of model infections for tick-borne rickettsiae. This study also reports that Sca2-mediated actin-based motility is required for R. parkeri dissemination and provides indirect evidence that OmpB protein is involved in eschar formation, thus corroborating previous knowledge about these major surface exposed antigens of rickettsiae and host cell interactions and host responses to these organisms.”

      “The study is rigorous and reliable as described and can be reproduced by others given availability of adequate funding, access to similar facilities, strains of mice and rickettsial mutants, and technical personnel with similar skills and training. There are no ethical or technical concerns.”

      We thank the reviewer for their thoughtful comments and for appreciating the advantages of this model.

      “The main limitation of the manuscript is due to the fact that histological and immunohistochemical analysis of the eschar was not performed; therefore, it is not clear if pathological processes and features of this lesion formation are the same or related to the human pathology.”

      We thank the reviewer for this suggestion. As a part of the revision plan, we propose to perform histological and immunohistochemical analysis of the eschar and will compare these findings to reported data from humans. We will also identify the cell types infected in the skin and internal organs in wild type and Ifnar-/-Ifngr-/- mice.

      “Similarly, in an attempt to generalize (as the authors try very hard), it is not clear how these observations will be relevant to rickettsial pathogens which are responsible for more severe forms of rickettsioses (such as R. rickettsii and R. prowazekii) but are not known to cause eschar formation as a part of their clinical manifestations.”

      Our findings with Sca2 are in agreement with findings on R. rickettsii Sca2 in guinea pigs (Kleba et al., 2010), which showed that Sca2 was required for eliciting fever and an antibody response. Our work also expands on these findings by showing that sca2 mutants immunize against rechallenge and by finding reduced bacterial burdens in internal organs after intradermal infection with sca2 mutant bacteria. Thus, we believe that studying R. parkeri genes in Ifnar-/-Ifngr-/- mice can serve as a model to better understand conserved virulence genes in diverse rickettsial pathogens.

      Beyond virulence genes, we note that our model also recapitulates systemic disease including dissemination to internal organs. Thus, it provides a platform to study disease manifestations beyond the eschar that may be relevant to other rickettsial pathogens including R. rickettsii and R. prowazekii.

      Some other virulent rickettsial pathogens cause limited/no disease in WT C57Bl/6 mice, including R. akari, R. conorii, R. typhi, and O. tsutsugamushi (reviewed in Osterloh, Med Microbiol Immunol 2017). Thus, Ifnar-/-Ifngr-/- mice may potentially serve as models for these pathogens. We now include this point in the Discussion.

      “The other deficiency is due to a limited description of the Sca2 and OmpB mutants used in this study. It was necessary to locate and review previous publications by this group in order to understand the experiments conducted here and their interpretation. It would be useful to the readers if this information (a better more complete description of the mutants and their properties) is summarized in this manuscript.”

      We have now provided a more complete description of the mutants in the Introduction and Results.

      Reviewer #3 (Significance (Required)):

      “The study is relevant to investigators working in the field of rickettsial diseases and to a broader audience studying mechanisms of intracellular parasitism and host responses.

      The study argues that difference(s) in dermal IFN signaling mechanism(s) distinguish human and murine susceptibility to R. parkeri infection. This is a very useful speculation; however, a better and deeper discussion would be helpful to demonstrate the relevance of these observations and their connection(s) to events occurring during the course of human infections. Regrettably, there are almost no citations of classic or current literature addressing these aspects of rickettsial pathogenesis and the role of IFN-dependent mechanisms beyond self-citations. Overall, the discussion includes four relatively short paragraphs, each addressing different directions of possible research, which indicates ample possible utility of this murine model; however, a more coherent and convincing discussion is desirable.”

      We thank the reviewer for the suggestion, and we have now expanded the Discussion to address the role for IFN-dependent mechanisms in humans and mice during rickettsial infections, including classic and current literature citations.

      **Referees cross commenting**

      “I agree with Reviewer #2 that per se this is not the first murine model reproducing eschar upon A. maculatum transmission; however, this is the first model that allows eschar formation to be monitored using needle inoculation. This model can be widely used, while many labs may be limited by their facility setup and can't afford/conduct tick transmission experiments. The authors acknowledged the existence of the tick transmission model and discuss inclusion of this option in their future experiments.”

      We thank the reviewer for recognizing the many advantages of this model.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      This manuscript reports novel observations pertinent to development in inbred mice of an eschar lesion and generalized lethal infection following intradermal infection with Rickettsia parkeri, the mice are deficient in two types of interferon receptors. This is a new observation for the murine system and expands the existing repertoire of model infections for tick-borne rickettsiae. This study also reports that Sca2-mediated actin-based motility is required for R. parkeri dissemination, and provides indirect evidence that OmpB protein is involved in eschar formation, thus corroborating previous knowledge about these major surface exposed antigens of rickettsiae and host cell interactions and host responses to these organisms.

      The study is rigorous and reliable as described, and can be reproduced by others given availability of adequate funding, access to similar facilities, strains of mice and rickettsial mutants, and technical personnel with similar skills and training. There are no ethical or technical concerns.

      The main limitation of the manuscript is due to the fact that histological and immunohistochemical analysis of the eschar was not performed; therefore, it is not clear if pathological processes and features of this lesion formation are the same or related to the human pathology. Similarly, in an attempt to generalize (as the authors try very hard), it is not clear how these observations will be relevant to rickettsial pathogens which are responsible for more severe forms of rickettsioses (such as R. rickettsii and R. prowazekii) but are not known to cause eschar formation as a part of their clinical manifestations.

      The other deficiency is due to a limited description of the Sca2 and OmpB mutants used in this study. It was necessary to locate and review previous publications by this group in order to understand the experiments conducted here and their interpretation. It would be useful to the readers if this information (a better more complete description of the mutants and their properties) is summarized in this manuscript.

      Significance

      The study is relevant to investigators working in the field of rickettsial diseases, and to a broader audience studying mechanisms of intracellular parasitism and host responses.

      The study argues that difference(s) in dermal IFN signaling mechanism(s) distinguish human and murine susceptibility to R. parkeri infection. This is a very useful speculation; however, a better and deeper discussion would be helpful to demonstrate the relevance of these observations and their connection(s) to events occurring during the course of human infections. Regrettably, there are almost no citations of classic or current literature addressing these aspects of rickettsial pathogenesis and the role of IFN-dependent mechanisms beyond self-citations. Overall the discussion includes four relatively short paragraphs, each addressing different directions of possible research, which indicates ample possible utility of this murine model; however, a more coherent and convincing discussion is desirable.

      Referees cross commenting

      I agree with Reviewer #2 that per se this is not the first murine model reproducing eschar upon A. maculatum transmission; however, this is the first model that allows eschar formation to be monitored using needle inoculation. This model can be widely used, while many labs may be limited by their facility setup and can't afford/conduct tick transmission experiments. The authors acknowledged the existence of the tick transmission model and discuss inclusion of this option in their future experiments.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript utilizes a new model of spotted fever rickettsiosis. Using this model, the authors have determined that knockout of the sca2 or ompB gene attenuates Rickettsia parkeri, and vaccination with the attenuated rickettsiae provides protection against virulent challenge. However, the model is far less than ideal as it has eliminated important effectors of immunity. Manuscript also fails to recognize that there is an Amblyomma maculatum tick-transmitted model of Rickettsia parkeri infection that causes an eschar and disseminated pathology. The model that they have used is inadequately characterized. The cutaneous lesion was not evaluated histologically to determine if it features the actual characteristics of an eschar. Although bacteria were found in the liver and spleen, in which macrophages are a significant target, there was no evaluation of the vital organs including lung and brain nor demonstration of the target cells or pathologic lesions. Unfortunately, the assay of vascular permeability was applied only to the inoculation site and not to the disseminated visceral organs such as lung and brain.

      Significance

      The authors all have misrepresented the eschar as a critically important lesion, whereas the patients usually do not even know of its presence until they begin to develop systemic symptoms and it is detected by a physician examining the patient.

      On line 30 the authors state that mice are the natural reservoir of Rickettsia parkeri. The references cited describe the failure of acquisition by feeding ticks, meaning that it is not a true reservoir. The reference describing animals with antibodies merely indicates exposure to a spotted fever group Rickettsia not sufficient evidence of a role as a reservoir.

      In response to the request for my expertise, I have contributed a large amount of data to understanding mechanisms of immunity to rickettsiae and have developed several useful animal models of Rickettsial diseases. I also have expertise on clinical aspects of spotted fever group rickettsioses, including the eschar.

      Referees cross commenting

      This is not the first mouse model of rickettsiosis to contain an eschar. There is a model of Rickettsia parkeri transmitted by Amblyomma maculatum ticks in which eschars occur.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Rickettsial eschars are hallmarks of less severe spotted fever diseases. The underlying mechanisms involved in the formation of the eschar caused by pathogenic rickettsiae remain unknown. The authors of this manuscript studied this interesting research question by using Ifnar-/-Ifngr-/- mice and Sca2 or OmpB mutant of R. parkeri. R. parkeri probably is the best rickettsial species to study rickettsial eschar due to the clinical features of R. parkeri rickettsioses and the biosafety level required to work with it. The data presented in the manuscript are very promising. The conclusions are supported by the presented results. For the first time, this study recapitulated human eschar-like skin lesion observed in patients with R. parkeri rickettsioses in the mouse models. More interestingly, mice inoculated with Sca2 mutant of R. parkeri i.d. had less disseminated rickettsiae in tissues, which helps us to understand the mechanisms by which pathogenic rickettsiae cause systemic infection after the arthropod bite.

      Minor comments:

      1) Figure 2D, it looks likely the lethality of mice i.d. infection with R. parkeri is not dose-dependent. For example, mice inoculated with 10^4 showed greater lethality compared to 10^7. The authors might want to explain it in the "Discussion".

      2) Line 202, innate immunity in vitro might need to be revised.

      3) It is unclear what is the unit of the inoculum in animal experiments, PFU?

      4) Line 36, in the study of "Reference 16", C3H/HeN mice, not B6 mice, were used.

      5) The conclusion on eschar will be greatly strengthened if histological analysis is included, particularly whether dermis necrosis is present or not.

      6) Line 357, it is not clear what "spinfection" means.

      Significance

      Several approaches employed in the study are new to the field of animal models of the rickettsioses. For example, fluorescent dextran was used to investigate vascular damage in the skin at the inoculation site, and body temperature was monitored in mice infected with R. parkeri. Overall, the study is highly significant since it has answered the important questions in the research area of spotted fever rickettsioses and employed appropriate approaches. No major concerns were noticed.

      Referees cross commenting

      I agree with other reviewers' comments. Thanks for the invite.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We would like to thank the reviewers for taking the time to carefully evaluate our manuscript. The paper will be significantly improved by their suggestions, and we are grateful for their perspectives.

      To address the reviewers’ concerns, we will complete additional control experiments and revise the manuscript as detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the present work Stumpff, Reinholdt and co-workers investigate the mechanism by which micronuclei contribute to tumorigenesis. Micronuclei are classic markers of genomic instability widely used in the diagnosis of cancer, but whether they work as drivers of the process has recently attracted significant attention due to their link with chromothripsis. Here, the Stumpff/Reinholdt labs have explored an interesting model to test some ideas about the role of micronuclei as drivers of tumorigenesis, based on Kif18A/p53 double KO mice. They confirm the formation of micronuclei in these animals, but find no substantial increase in survival and tumor incidence relative to p53 KO animals, despite higher incidence of micronuclei in Kif18A/p53 KO tumors. They conclude that, per se, micronuclei do not have the capacity to form tumors, regardless of p53 status. This was surprising, given the well-established role of p53 in preventing the proliferation of micronucleated cells. To shed light on this apparent paradox, they compared micronuclei from Kif18A KO cells with micronuclei generated by a number of other experimental conditions that promote formation of anaphase lagging chromosomes or generate acentric fragments. They found that micronuclei derived from Kif18A are intrinsically different from micronuclei generated by those other means and essentially showed increased accumulation of lamin B, were more resistant to rupture and preserved the capacity to expand as cells exited mitosis. Of note, they find a correlation between chromosome proximity to the poles/main chromosome mass and the different features that characterize micronuclei from Kif18A KO cells, compared with the other experimental conditions in which late lagging chromosomes are more frequent. Overall, I find this study extremely interesting, well designed and executed in a rigorous way that characterizes the consistent solid work from these laboratories over the years.
      I have just a few minor points that I recommend be addressed prior to publication.

      1-Abstract and main text lines 70 and 100: the authors indicate that Kif18A mutant mice produce micronuclei due to unaligned chromosomes. This is correct, but at the same time misleading. The authors should clarify that although micronuclei derive from compromised congression, I was convinced from previous works (Fonseca et al., JCB, 2019) that it was their asynchronous segregation in anaphase that led to micronuclei formation. As is, a less familiar reader may conceive that misaligned chromosomes directly result in micronuclei, for example by being detached from the main chromosome mass.

      We thank the reviewer for raising this point. We agree that micronuclei form in the absence of KIF18A due to chromosome alignment defects, which reduce interchromosomal compaction and lead to asynchronous arrival of chromosomes at spindle poles during anaphase. As the reviewer suggests, micronuclei form around chromosomes that travel longer distances and arrive late to the poles. We have revised the manuscript to clarify this (Lines 12-13, 72-73, 102).

      2-Page 2, line 59: "cells entering cell division...become fragmented". It is not the cells, but the chromosomes that fragment. Please correct.

      We have revised this wording to indicate that it is the chromosomes within micronuclei that fragment (Lines 60-63).

      3-Page 4, line 149: "reduced survival in the Kif18A null, p53 mice". P53 what? KO, WT? Please clarify.

      We have revised this wording as suggested, to read: “reduced survival in the Kif18agcd2/gcd2, p53-/- mice,” (Line 158).

      4-Page 5, line 212: the authors refer that micronuclei were scored for absence of lamin A/C, but previously they scored it as "continuous/discontinuous". Please clarify.

      Thank you for raising this question. When we scored lamin A/C, we noted cases where lamin A/C signal was incompletely present (not fully co-localizing with the micronuclear area, as indicated by DAPI). In these infrequent cases, micronuclei were identified as having “discontinuous” lamin A/C signal and were binned with those micronuclei lacking lamin A/C, for purposes of creating a binary readout of the micronuclear envelope: either 1) “intact” (having full, completely continuous lamin A/C signatures) or 2) “ruptured” (lacking a complete micronuclear signal of lamin A/C). We will update the text and the methods to more clearly reflect this categorization (Lines 221-225; 603-607).

      5-Page 6, line 243: "Kif18A is not required for micronuclear envelope rupture". Shouldn't it be micronuclear envelope "integrity"?

      We apologize for the confusion here. The experiment performed was designed to distinguish whether micronuclear envelopes are more stable in KIF18A KO cells or if KIF18A itself is somehow required for the rupture of all micronuclear envelopes to occur. Since nocodazole-induced micronuclei were able to rupture in KIF18A KO cells at similar frequencies to those seen in control cells, the data indicate that KIF18A is not required for the process of micronuclear envelope rupture. We modified the text to improve clarity (lines 252-253).

      6-One of the most interesting results of the paper is the correlation between envelope formation in micronuclei with their respective position relative to the poles/midzone. Could the authors try to investigate causality? For instance, the authors refer to works from other labs in which MT bundles and a midzone Aurora B activity gradient might play a role in the different features associated with micronuclei envelope formation, depending on their origin. Could the authors manipulate this gradient and investigate whether it changes the outcome in terms of nuclear envelope assembly properties on micronuclei? Are there any detectable features in midzone MT organization in Kif18A KO cells that would justify the observed differences?

      We agree that this result is very interesting. However, we feel the proposed experiments would repeat previous work and are somewhat outside the purview of the present study. Elegant experiments to address Aurora’s role in preventing micronucleus formation have already been performed using genetic approaches in Drosophila neuroblasts and small molecule inhibitors in mammalian cells and Drosophila S2 cells (PMIDs: 24925910, 25877868, and 29986897). Interpreting the effects of Aurora B inhibition is complicated by the many critical roles Aurora B plays in ensuring proper and faithful chromosome segregation. Thus, experiments to precisely test Aurora’s effect on micronuclear envelope stability require addition of Aurora B inhibitors on a cell-by-cell basis, administered within a narrow window of minutes during anaphase. It would require significant effort to obtain enough cells from different experimental conditions to make a meaningful comparison.

      The suggestion to investigate detectable differences or features in midzone MT organization in KIF18A KO cells is also appreciated. We have not observed gross differences in midzone microtubules in KIF18A KO cells, but we will quantitatively evaluate this and add these results to the revised manuscript.

      Reviewer #1 (Significance (Required)):

      Kif18A plays a key role in chromosome alignment, without apparently affecting kinetochore-microtubule attachments in non-transformed cells. Because they cannot establish a proper metaphase plate, Kif18A KO cells enter anaphase with highly asynchronous segregation due to non-uniform chromosome distribution along the spindle axis. Consequently, some "delayed" chromosomes form micronuclei, in cell culture and in vivo. Interestingly, prior art has failed to detect any increased signs of genomic instability in Kif18A KO cells and mice, and, contrary to what would be expected based on current trends, these mice do not show any signs of increased incidence of tumors; in fact, they even show some protective effect against induced colitis-associated colorectal cancer. Noteworthy, all previous experimental works pointing to a role of micronuclei as key intermediates of genomic instability in cancer relied on models in which the tumor suppressor protein p53 had been inactivated. In the present work, the authors explore the relationship between micronuclei formation and p53 inactivation by investigating tumor formation in Kif18A/p53 double KO animals (1 or 2 alleles of p53 inactivated). The reported results are timely and will attract the interest of a broad readership, while decisively helping to shed light on an ongoing debate. I am therefore all in favor of the publication of this work in any journal affiliated with Review Commons, pending some minor revisions.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Sepaniac and colleagues use in vivo and in vitro approaches to examine why micronuclei generated by lack of KIF18A activity do not promote tumorigenesis. The authors conclude that micronuclei in KIF18A depleted cells form stable micronuclear envelopes, which may result from lagging chromosomes being closer to the spindle pole when the micronuclear envelope forms. The authors further conclude that the stability of the micronuclei arising from lack of KIF18A can explain why Kif18a mutant mice do not develop tumors. These results also suggest that the consequences of micronuclei and their possible contribution to tumorigenesis depend on the context of their genesis. While the mouse model data and characterization of the stability of micronuclei generated by different insults support the conclusions, the lagging chromosome positioning data could be improved. Moreover, a number of other issues should be addressed prior to publication.

      **Major issues:**

      1.Line 153-155. The authors conclude that the slight reduction in overall survival is "due to a reduced ability of Kif18a mutants to cope with rapid tumorigenesis," but it is unclear why this would be the case. There is also an increase in micronucleated cells in thymic lymphomas from Kif18a/p53 homozygous mice (Fig. 2B)-could this not contribute? In Fig. 3C, the authors show that micronuclear rupture is similar in both Kif18a WT and mutant mice, so it seems possible that the increase in the frequency of micronuclei (Fig. 2B) coupled with a similar frequency of micronuclear rupture (Fig. 3C) could lead to the reduced survival. Then, in the discussion, the authors downplay this finding by saying (line 371) "loss of Kif18a had modest or no effect on survival of Trp53 homozygotes and heterozygotes." Why then speculate earlier in the text that loss of Kif18a reduces the ability to cope with tumorigenesis?

      We thank the reviewer for pointing out this issue. Our goal here was to try to explain why the Kif18a/p53 mutant homozygotes display a small but significant reduction in survival compared to p53 mutants, while the Kif18a mutation does not impact survival of p53 heterozygotes, which could be considered a more sensitive model for detecting decreased survival. Kif18a homozygous mutants do display a small reduction in survival shortly after birth compared to heterozygote and wild type littermates (PMID: 25824710). Thus, we can’t exclude the possibility that incompletely penetrant, postnatal lethality might be coincident with reduced fitness in surviving mutants, thus making them more sensitive to p53 loss of function. We have removed this statement from the revised text.

      However, the reviewer’s point that increased micronuclei in Kif18a/p53 homozygous mutants, combined with a rupture rate similar to that seen in p53 mutants, could also underlie or at least contribute to reduced survival is a good one. We have softened our conclusion in the Results section regarding the reduced survival of double homozygous mice (lines 158-164). We also agree that the way in which this point is addressed in the Results and Discussion sounds contradictory. Thus, we have edited the language in the Discussion to improve consistency (lines 393-399).

      2.Related to the point above, the authors show in figure 3 that the micronuclei found in healthy tissues display infrequent membrane rupture (panel B). However, micronuclear membrane rupture in tumor tissues is much more frequent (panel C). How do the authors explain this? Do they hypothesize that the micronuclei in the tumors originate by mechanisms other than the misalignment caused by lack of KIF18A? Does KIF18A depletion cause aneuploidy due to segregation of two sisters to the same pole? If so, one could expect the tumors to be aneuploid (is this the case?) and aneuploidy has been shown by numerous groups to cause genomic instability. Such genomic instability could then explain the difference in membrane rupture.

      We agree that this is an interesting question. We plan to investigate several possible contributors to increased rupture in tumor cells in a separate study. As outlined in the Discussion (lines 443-458), we hypothesize that rupture could increase in tumor tissue due to changes in lamin expression or cytoskeletal forces in these cells. However, as the reviewer notes, differences in aneuploidy could also potentially explain the differences in membrane rupture observed in healthy (non-tumorous) and thymic lymphoma tissues. For example, an increase in chromosome number could lead to lagging chromosomes being positioned closer to the midzone in Kif18a mutant cells or, as the reviewer suggests, the micronuclei could occur in aneuploid tumors due to mitotic defects other than misalignment. This may be difficult to determine unequivocally in primary cell or tissue samples. However, we do have a limited quantity of primary thymic lymphoma-derived cells and we will use these to initially investigate aneuploidy in the two genotypes. The results of these studies will be added to the final revised manuscript. In addition, we will incorporate a discussion of how aneuploidy may increase rupture frequency in tumors into the revised manuscript.

      3.The authors conclude that lagging chromosomes in KIF18A KO cells are found closer to the main chromatin mass. The Stumpff lab showed in a 2019 JCB paper that KIF18A KO cells have a chromosome alignment defect and as a result during anaphase the chromosomes can be scattered rather than forming the tight, uniform mass that is observed in WT cells. The scattering of kinetochores resulting from this phenotype could affect the value of "Avg Chromosomes Distances" in Fig 7B and the normalized distance in the KIF18A KO cells. Therefore, live-cell imaging experiments would be helpful to resolve this and possibly strengthen this conclusion. RPE1 cells with fluorescently tagged CENP-A and centrin could be used to ensure that the lagging chromosomes will not rejoin the main nucleus. Moreover, these cells could be used for correlative live-fixed cell experiments in which fixed cell analysis following micronucleus formation could be used to show that chromosomes that lag farther away from the spindle pole are more likely to have defective micronuclear envelopes.

      The reviewer’s concern that the chromosome alignment defect characteristic of KIF18A KO cells may impact the value of average chromosome distances used to set a threshold for chromosomes meeting our definition of lagging is valid. To address this, we analyzed the standard deviations for chromosome-to-pole distances within half spindles of KIF18A KO and nocodazole-washout treated anaphase cells as a way to compare chromosome scattering in these two conditions. This analysis revealed no significant difference between the standard deviations of chromosome positions in the two groups, suggesting that scattering is similar in nocodazole treated and KIF18A KO cells. We have included these data in the manuscript (Lines 351-356, and additional data added to Figure S2C).
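
      The scattering comparison described above can be sketched roughly as follows. This is a minimal illustration, not the analysis used in the manuscript: the distance values are hypothetical, the function names are invented for the example, and the permutation test is an assumed choice because the response does not state which test was applied.

```python
import random
import statistics

def half_spindle_sd(distances):
    """Sample standard deviation of chromosome-to-pole distances in one half spindle."""
    return statistics.stdev(distances)

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean SD between two conditions."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical distances (micrometres), one list per half spindle
kif18a_ko = [[3.1, 3.6, 4.0, 4.4], [2.9, 3.5, 3.8, 4.6], [3.0, 3.3, 4.1, 4.5]]
nocodazole = [[3.2, 3.5, 4.1, 4.3], [2.8, 3.6, 3.9, 4.7], [3.1, 3.4, 4.0, 4.6]]

ko_sds = [half_spindle_sd(d) for d in kif18a_ko]
noc_sds = [half_spindle_sd(d) for d in nocodazole]
p = permutation_p_value(ko_sds, noc_sds, n_perm=2000)
```

      A large p-value in such a comparison would be consistent with similar scattering in the two conditions, although it would not prove that the conditions are equivalent.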

      In order to further strengthen this conclusion, we are certainly willing to attempt the live cell imaging experiments suggested by the reviewer. We would like to point out that the frequency of micronucleus formation in the KIF18A KO cells is relatively low compared to the frequency seen after other experimental treatments (~7% of divisions result in a micronucleus). Thus, a large number of individual cells would need to be imaged with relatively high temporal resolution to make conclusions about the effects of chromosome position on micronuclear envelope formation (such analyses are not possible with the live data sets we currently have, where cells were imaged every 2 minutes). This difficulty led us to perform these measurements in synchronized and fixed cells to begin with.
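
      As a back-of-the-envelope illustration of the imaging burden implied by a ~7% micronucleus frequency, the sketch below estimates how many divisions would have to be imaged to observe a given number of micronucleation events on average. The target of 50 events and the function name are assumptions chosen only for the example.

```python
import math

def divisions_needed(target_events, event_frequency):
    """Number of divisions to image so that target_events are expected on average."""
    return math.ceil(target_events / event_frequency)

mn_frequency = 0.07       # ~7% of KIF18A KO divisions yield a micronucleus (from the text)
target_micronuclei = 50   # hypothetical number of events wanted per condition
n_divisions = divisions_needed(target_micronuclei, mn_frequency)
```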

      4.Based on the Fonseca et al. 2019 JCB paper (video 2), micronuclei from KIF18A KO do not exclusively arise from lagging chromosomes. Instead, chromosomes can also escape the main chromatin mass after segregation and subsequently be excluded from the main nucleus. It would be important to know what fraction of the micronuclei in KIF18A KO cells arise via lagging chromosomes. Since Aurora B and/or bundled microtubules at the spindle midzone are believed to prevent proper nuclear envelope formation, chromosomes that properly segregate but later become separated from the main nucleus would be more likely to form proper micronuclear envelopes than those arising from lagging chromosomes. The correlative microscopy experiment suggested in the previous point could allow differentiation between these two routes to micronucleus formation.

      The reviewer is correct that we did occasionally see chromosomes escape the main chromatin mass after segregation in the Fonseca et al., 2019 study referenced. We did not quantify the frequency of these events in that study, but they were rare. To address this quantitatively, we have measured the incidence of micronuclear formation around lagging chromosomes and chromosomes that escape the main chromatin mass after segregation in videos of KIF18A KO cells. We find that when micronuclei form in these cells, they form around lagging chromosomes 98% of the time (46 out of 47 events). These data were derived from 4 live cell imaging experiments. This information has been added to the Results section (lines 328-330).
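
      To give a sense of the uncertainty around a proportion based on 47 events, the sketch below computes the observed fraction together with a Wilson score interval. The counts come from the response; the use of a Wilson interval and the function name are assumptions made for illustration and are not part of the reported analysis.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

lagging, events = 46, 47  # counts reported in the response
fraction = lagging / events
low, high = wilson_interval(lagging, events)
print(f"{fraction:.1%} lagging (approx. 95% interval {low:.1%} to {high:.1%})")
```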

      **Minor issues:**

      1.Some parts of the manuscript are excessively wordy and some sentences are unclear or convoluted (e.g., lines 148-153 and 238-239).

      Thank you for this feedback. We have revised the text in these two locations to improve clarity (lines 159-162 and 247-248 in the revised manuscript).

      2.Lines 59-61. This sentence is formulated incorrectly. First of all, the subject of the sentence is "cells" and the verb is "can become fragmented." However, the authors mean that the DNA in the micronucleus can become fragmented (not the cells). Moreover, the way the sentence is currently formulated seems to suggest that the fragmentation occurs during cell division. However, this is not the case. Please revise the text to make it more accurate.

      We appreciate this point and have revised this text to reflect more precise language to describe this model. It is certainly the micronucleated chromatin which may become fragmented, and this fragmentation occurs as a result of replication stress, including replication fork collapse, after an existing micronucleated cell enters a subsequent round of S or G2 phase (PMIDs: 22258507, 26017310).

      3.Lines 114-115. Please, provide references in support of this statement.

      The statement in question: “This arrest was at least partially dependent on p53, consistent with other reports of cell cycle arrest following micronucleation,” shares the same references as the sentence that follows it (Sablina 1998, Thompson and Compton, 2010; Fonseca et al., 2019). We have updated the references to appear after the first statement to make this clear.

      4.Line 153. The authors refer to Fig. 1C, but I think they mean Fig. 1B.

      Thank you, we have updated the text to read Fig 1B.

      5.Line 324. The authors find that RPE1 KIF18A KO cells have lagging chromosomes in ana/telophase 9% of the time, then say that this shows that lagging chromosomes are rare in KIF18A KO cells. However, this is a large increase compared to normal RPE1 cells, which only have a 1-2% frequency of lagging chromosomes. So, they should revise the text here to say that the rates of lagging chromosomes from KIF18A KO are lower compared to the rates induced by nocodazole washout.

      This is an important distinction. We have removed this confusing statement from the revised text (lines 336-338).

      6.Line 383. The references listed here should be moved earlier and specifically after the statement summarizing the results of the studies instead of being listed after the authors' conclusion/interpretation of the data. The same issue was noted in other parts of the manuscript.

      We have corrected this error (Lines 402-408). Before final submission, we will further amend the style of the manuscript throughout to cite relevant papers after the statement summarizing the results of those studies, rather than after our interpretation of the studies.

      7.Figure 1A. In the text, the authors say they cross a Kif18a heterozygous mutant mouse with a p53 heterozygous mutant mouse, but the two mice in this figure are already heterozygous for both. Please, revise the text or depict the previous additional cross necessary to obtain the double heterozygous.

      We thank the reviewer for catching this discrepancy. We have revised the text to describe the crosses necessary to obtain the double heterozygous mice shown in the figure (lines 121-123). The gcd2 mutation in Kif18a was named due to the “germ cell depleted” phenotype it causes. These homozygous mice are therefore infertile (Czechanski et al., 2015). For this reason, heterozygous mice for each gene were crossed to achieve the necessary homozygous progeny.

      8.Figure 3A. Arrows or dotted circles outlining the micronuclei in the insets of the middle and bottom rows would be helpful since the DAPI signal in the micronuclei is low and somewhat difficult to see.

      We have updated these figures as suggested to more clearly indicate the micronuclear area.

      9.Figure 3B. Error bars should be added to the graph. Moreover, the authors noted that the differences are not significant. However, this seems surprising, given that in some cases there is a three- to five-fold difference between certain pairs. Indeed, a chi-square test using the numbers from table S1 indicated p values

      We appreciate this feedback on the statistical tests and comparisons among these data. The main point of these analyses is to demonstrate that tissues other than blood form micronuclei in vivo in the absence of Kif18a function and that the majority of these micronuclear envelopes are completely surrounded by Lamin A/C. The data presented in Figure 3B were obtained by counting several tissue types from a single mouse of each genotype. Thus, we do not believe that error bars are appropriate in this context. To avoid confusion, we have also removed the statistical bars which had indicated no significant differences in rupture frequency among the genotypes in each sampled tissue, as these are also probably inappropriate.

      We understand the reviewer’s point that some pairwise comparisons of the data in Table S1 indicate that they are significantly different. We originally used a Chi-square test to compare the data from all three genotypes for each tissue. Because these data did not rise to the threshold of significance necessary to reject the null hypothesis across all three genotypes within each individual tissue type, we did not think performing pairwise comparisons between only two of those genotypes was appropriate (Whitlock and Schluter, The Analysis of Biological Data, 2009). Specifically, analyses of rupture frequency for spleen, liver, and thymus tissue gave p-values above 0.05 (spleen, p = 0.35; liver, p = 0.056; thymus, p = 0.052). Thus, we did not proceed with pairwise comparisons. In contrast, the analyses of p53 effects on micronucleus levels in peripheral blood in Fig 1D utilized samples from 8 individual mice for each genotype, and are therefore more amenable to statistical comparisons. If the reviewer believes any of the details of this approach are incorrect, we are happy to revise the analyses.
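
      The gatekeeping logic described above (an omnibus chi-square test across all three genotypes, with pairwise comparisons only if it is significant) can be sketched as follows. The counts are hypothetical and are not taken from Table S1, and the function name is invented for the example. The closed-form p-value used here is valid only for 2 degrees of freedom, which is what a 2 x 3 table gives.

```python
import math

def chi_square_2xk(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts for one tissue: rows = intact, ruptured; columns = three genotypes
counts = [[40, 35, 30],
          [5, 8, 12]]
stat, dof = chi_square_2xk(counts)
# With 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2)
p_value = math.exp(-stat / 2) if dof == 2 else None
# Pairwise follow-up comparisons are run only if the omnibus test is significant
run_pairwise = p_value is not None and p_value < 0.05
```

      With expected counts this small in some cells, an exact test might be preferable in practice; the sketch only illustrates the order of the decisions.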

      10.Figure 5G. When referring to this figure (lines 292-294), the authors talk about correlation. However, the points in this graph seem to be scattered a bit randomly.

      To address this concern, we performed a Pearson’s correlation test on the data in Figure 5G. As suspected by the reviewer, this analysis did not indicate a significant correlation, and we have removed this plot from the manuscript.
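
      For readers unfamiliar with the test mentioned above, a minimal sketch of a Pearson correlation calculation follows. The paired values are hypothetical and do not come from Figure 5G, and the function names are invented for the example. The t statistic would be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for the null hypothesis of no correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired measurements that look scattered rather than correlated
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.4, 3.3, 0.9, 2.8, 1.7]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
```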

      11.Figure 6B-D. The Y-axis titles of the three graphs are a bit confusing. Please, consider revising.

      We have updated the Y-axis titles for these graphs to more accurately represent what is displayed on each plot.

      12.In Figure 7 and the text, the authors use the terms "late-lagging" and "lagging" chromosomes interchangeably, which is somewhat confusing in this context because lagging chromosome distance from the main chromosome mass is thought to contribute to defective assembly of micronuclear envelopes. It is not clear whether the authors intend to indicate, with this term, that the lagging chromosome is farther away from the main chromosome mass or that the lagging chromosome is in a "late" anaphase cell. Because this is confusing, I suggest just using the term "lagging chromosome" consistently. It could be useful to include representative images of lagging chromosomes located at different distances from the main chromosome mass. And certainly, the authors should include an example of a lagging chromosome in the KIF18A KO cells.

      We agree with the reviewer’s concern regarding confusion of these terms. We have updated the text to use the term “lagging chromosome” consistently, as the reviewer suggests. We have also updated Figure 7A to include a representative image of a lagging chromosome in a KIF18A KO cell.

      13.Figure S2A. The example in the bottom right image looks more like a chromosome bridge than a lagging chromosome. Kinetochore staining is necessary to unequivocally identify lagging chromosomes.

      We agree with the reviewer that kinetochore staining is necessary to precisely identify lagging chromosomes. We had used these images to quickly and crudely assess, by eye, the presence and frequency of potentially lagging chromosomes in late-anaphase cells. For subsequent experiments in which lagging chromosomes were measured, we repeated these experiments with proper staining of poles and kinetochores to make precise, quantifiable assessments.

      Reviewer #2 (Significance (Required)):

      Based on the previous knowledge on the factors that cause abnormal assembly of the micronuclear membrane, the results presented in this study were somewhat predictable. However, these findings will add to the knowledge of how micronuclei form and the potential factors that lead to micronuclear membrane rupture. Previous studies investigating micronucleus behavior have focused on micronuclei arising via merotelic kinetochore mis-attachments. These mis-attachments lead to formation of micronuclei close to the spindle midzone. In the present study, instead, the micronuclei arising from lack of KIF18A activity form farther away from the spindle midzone. The results presented here suggest that the positioning of these micronuclei farther away from the midzone enables assembly of a more stable micronuclear membrane that will be less likely to rupture during the following cell cycle. A recent study showed that the microtubule bundles in the spindle midzone interfere with micronuclear membrane assembly. Based on this, it is not surprising that micronuclei forming away from the spindle midzone (like those resulting from lack of KIF18A activity) assemble more normal membranes. Although somewhat expected, this study provides the actual data in support of this phenomenon. This study will be of interest to cell biologists interested in cell division and genomic instability. My research has focused on cell division, aneuploidy, and chromosomal instability for nearly thirty years. Therefore, I believe I am fully qualified to evaluate this manuscript.

      **Referees cross-commenting**

      My areas of expertise do not include nuclear membrane structure and function. Therefore, I encourage the authors to consider the comments of reviewer #3 for issues related to reliable quantification of micronuclear membrane rupture.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary** Sepaniac et al demonstrate that loss of KIF18a, a motor protein required for proper chromosome congression and chromatin compaction during mitosis, is insufficient to drive tumor development in mice although it does increase the frequency of micronuclei (MN), nuclear compartments that form around broken or missegregated chromosomes, in both normal and tumor tissue. MN are thought to increase genome instability and metastasis by undergoing DNA damage and activating innate immune signaling after irreparable nuclear membrane rupture. The authors use a non-transformed human cell line, hTERT-RPE-1, with KIF18a knocked out to demonstrate that MN formed as a result of KIF18a loss have more stable nuclear membranes than MN generated by other methods. They go on to correlate this increased stability with increased chromosome proximity to the main chromatin mass during nuclear envelope assembly and increased chromatin decompaction by a combination of fixed and live cell imaging.

      **Major Comments**

      1.This study relies heavily on the use of lamin A loss or discontinuity to identify ruptured micronuclei. Although the authors validate this marker against "leakage" of the soluble nuclear protein mCherry-NLS, there are several lines of evidence suggesting that lamin A loss or disruption is not a reliable reporter. In figure S3C, the top two panels of intact MN in the KIF18A KO appear ruptured based on the gH2AX labeling, yet have significant levels of lamin A and are labeled as intact. In figure 4D, the rate of MN rupture after nocodazole release (60% ruptured in 2 hours) is much faster than that reported in other papers (40-60% in 16-18 hours, Liu et al; 60% in 16 hours; Hatch et al). In addition, images in Hatch et al, 2013 show lamin A localizing to both intact and ruptured MN and anecdotal information in the field suggests that lamin A localization is not a reliable reporter.

      These discrepancies may be due to how the authors define "mCherry-NLS leakage", which needs to be defined in the methods as previous studies have demonstrated that MN frequently have delayed or reduced nuclear import even though the membrane is intact. Regardless, the authors need to provide compelling independent evidence that lamin A loss and disruption faithfully recognize ruptured MN by either validating this marker against additional rupture reporters, such as Lap2, LBR, or emerin accumulation, or by repeating key experiments in cells expressing mCherry-NLS.

      Our decision to use lamin A/C as a reporter was based on its use as a marker for micronuclear envelope presence in prior studies (Hatch, 2013; Liu, 2018). We were unaware of anecdotal information in the field that suggests that lamin A localization may not be a reliable reporter.

      However, we think we understand the reviewer’s point to be that although it is clear from prior studies that gaps in the nuclear lamina are a known predictor of micronuclear rupture, these gaps can persist for some time before rupture has actually occurred. We agree that this is an important distinction and thank the reviewer for raising these questions.

      As the reviewer notes, we performed control experiments to address this issue and validate the use of lamin A/C as a marker of micronuclear envelope rupture. Our approach involved correlating lamin staining with the localization of mCherry-NLS signal to the micronucleus (Figure S1). We found that these signals correlated well. As the reviewer points out, this analysis in fixed cells could be misleading in cases where nuclear import is reduced but the micronuclear envelope is intact. If this were a significant contributor, we would have expected to see more instances of micronuclei that exhibit continuous lamin A/C signal but lack nuclear localization of mCherry-NLS. However, we found this combination was rare among the KIF18A and RPE1 nocodazole washout treated cells (2%, or 1 of 46 micronuclei, had continuous lamin A/C while lacking mCherry-NLS). We admit, though, that this assumption may be oversimplified.

      The reviewer’s point about the timing of nocodazole treatment and washout is something we have definitely considered. We note that prior studies have used differing time points after nocodazole treatment and release. In Hatch et al., 2013, U2OS cells were treated for 6 hours with nocodazole and then subjected to mitotic shake-off; 48% of micronuclei were ruptured after 6 hours and ~60% were ruptured after 16 hours. Similarly, in Liu et al., 2018, 60% of micronuclei were ruptured 16 hours post mitotic shake-off and nocodazole release. While these results suggest that rupture increases with time after mitosis, it isn’t clear how early rupture may occur. In other words, does it take several hours in G2 before nearly half of micronuclei rupture, or do many of these rupture shortly after cell division?

      We note that other explanations could also contribute to the differences in rupture rates reported in our study compared to those in previous publications. For example, we used a short nocodazole treatment (2 hrs) compared to the longer treatments (6 hrs) used in previous studies. We did this originally in order to produce a similar percentage of micronucleated cells as is seen in KIF18A KO cell populations. However, the difference in nocodazole treatment length could influence the types and frequencies of kinetochore microtubule attachments formed. For example, if centrosomes stay closer together in mitotic cells after short nocodazole treatments, this could increase the number of abnormal attachments (e.g. PMID: 22130796). Such an effect would be expected to increase the frequency of lagging chromosomes and/or produce more lagging chromosomes within the anaphase midzone.

      The best way to address this issue would be to repeat our analyses of mCherry-NLS in live cells to track the formation and rupture of micronuclei. We attempted these live imaging experiments previously and found them challenging due to: 1) the low frequency of micronuclear formation in the KIF18A KO cell population; 2) a low transfection/expression efficiency for the mCherry-NLS plasmid in RPE1 cells; and 3) photobleaching of the mCherry-NLS reporter. For these reasons, we transitioned to fixed cell experiments for the mCherry-NLS reporter. However, we propose to troubleshoot this assay and attempt to obtain the data necessary to determine when rupture is occurring. In addition, we will use additional markers to investigate micronuclear envelope stability, as the reviewer has suggested.

      Regardless of the outcome of these experiments, we have measured a clear difference between the lamin deposition within micronuclear envelopes of KIF18A KO cells compared to those formed following other insults. Lamin recruitment is well established as a predictor of nuclear envelope stability. If necessary, we could alter the text to indicate that the presence of lamin A/C and B within micronuclear envelopes of KIF18A KO cells is indicative of nuclear envelope stability, and that this is distinct from the lamin profiles of micronuclei in cells subjected to nocodazole washout.

      2. Micronuclei in tumor sections and other dense tissues can appear very similar to other types of chromatin, including blebs from adjacent nuclei and dead cells. To verify that the quantified structures are bona fide micronuclei, the authors need to include a marker for the cell boundary. This is especially critical in the lamin A-stained tumor sections with heterogeneous lamin A protein expression.

      We appreciate the point this reviewer raises and we carefully considered accurate identification of micronuclei in tissues. Three optical sections were collected from each sample. During analyses, we scrolled through the ~2-micron thick sections to exclude chromatin bodies connected to an out-of-plane nucleus or nuclear bleb. We have a limited number of sectioned and preserved thymic lymphoma tissues remaining. We will use these samples to reassess micronuclear frequency in the presence of a cell boundary marker.

      3. Figure 4 compares MN rupture frequency between cells treated with different inducers of micronuclei - KIF18A KO, nocodazole release, and irradiation. These treatments have different effects on the cell cycle: KIF18A causes minor delays, nocodazole arrests cells in mitosis, and g-IR likely causes delays in S and G2. Since MN rupture frequency increases with the duration of interphase, the authors need to assess rupture frequency at similar time points after mitosis for all three conditions. One way to accomplish this would be to repeat this experiment and analyze cells collected by mitotic shake-off prior to fixation and labeling.

      We appreciate this point regarding differences in mitotic timing. Since micronuclear rupture frequency increases with time in interphase, we would expect the MN in KIF18A KO cells to exhibit the highest level of rupture if cell cycle timing were the primary variable affecting stability in our experiments. KIF18A KO cells are asynchronously dividing, and the micronuclei examined in populations of those cells could have been generated at any time. We do not have the same type of temporal control of these events as we do with drug treatment. In contrast, the vast majority of the MN in nocodazole washout cells would not have been in interphase for more than 1.5 hours in our experiments, yet showed increased lamin A/C defects. RPE1 cells treated with MAD2 siRNA knockdown, which do not experience mitotic delays (PMID: 9606211; 15239953), also showed greater frequencies of micronuclear envelopes which lacked lamin A/C compared to those arising in KIF18A KO cells.

      To further address this question, we could attempt a mitotic shake-off assay; however, we believe that the formation of micronuclei, as a percentage in the population of KIF18A KO cells, will be limiting in these experiments.

      As an alternative, we propose to use live cell imaging to follow micronuclear formation and rupture, as described above in reference to point 1.

      **Minor Comments**

      1. In figure 6A, it is unclear when the videos start and how micronuclei are selected for analysis. Do the micronuclei have to be continuously visible from the time they missegregate? Do the videos all start at the same time point during mitosis or is it contingent on when the MN appears separated from the main nucleus? One concern is that a consistent delay in micronucleus appearance in the nocodazole treated cells could artificially decrease the amount of MN expansion observed.

      We thank the reviewer for these questions. The individual micronuclei did not need to be continuously visible from the time that they missegregated, though the majority were. When a micronucleus was not sufficiently in the plane of focus for an accurate area measurement, the individual measurement at that time point was not collected. In cases where one or more frames were not measurable, a micronucleus was only included in the final data set if it was 1) the only micronucleus present in the daughter cell or 2) easily identifiable as the same micronucleus. Measurements were taken until the micronuclear area reached an equilibrium for several frames. Final fold change in area was established by dividing final area measurements by initial measurements.

      The initial measurement for each micronucleus was taken at the same relative point during mitosis, which is just after chromosome segregation has occurred.
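
      The fold-change calculation described in this response can be sketched in Python as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the function name `fold_change_in_area` is hypothetical, unmeasurable frames are represented as `None`, and the "final" area is taken to be the last measurable frame (the response does not say whether the equilibrium frames were averaged).

      ```python
      from typing import Optional, Sequence


      def fold_change_in_area(areas: Sequence[Optional[float]]) -> float:
          """Return final area / initial area for one micronucleus.

          `areas` is the time series of area measurements, starting just
          after chromosome segregation. Frames where the micronucleus was
          out of focus are given as None and are skipped (assumption).
          """
          measured = [a for a in areas if a is not None]
          if len(measured) < 2:
              raise ValueError("need at least two measurable frames")
          initial, final = measured[0], measured[-1]
          if initial <= 0:
              raise ValueError("initial area must be positive")
          return final / initial


      # Hypothetical series: one unmeasurable frame, then a plateau.
      example = fold_change_in_area([2.0, None, 3.0, 4.0, 4.0])
      ```

      Averaging the plateau frames instead of taking the last one would be an equally plausible reading of "final area measurements".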

      2. In figure 7A, it is difficult to identify the "lagging" chromosome in the top panel. It would be helpful to label the chromosome that becomes the MN, or ideally, to include a video or still images to demonstrate how micronuclei form in the KIF18A KO cells.

      We have updated the images in Figure 7A to include an example of a lagging chromosome in a KIF18A KO cell. We will also include a more explicit reference to our previous study (Fonseca et al., 2019), which described how micronuclei form around lagging chromosomes in KIF18A KO RPE1 cells.

      3. The two image panels in figure 7A are imaged at significantly different times during anaphase (early anaphase on bottom versus late anaphase/telophase on top). A better comparison would be between two cells at the same time point in anaphase.

      We have updated the images in Figure 7A to compare cells at similar stages of anaphase. In our quantification of lagging chromosomes, we also accounted for anaphase-timing differences by normalizing all measurements within each half-spindle.

      Reviewer #3 (Significance (Required)):

      In this study, the authors identify chromatin decondensation in micronuclei as a new predictor of membrane stability. Although these results are correlative, if their micronucleus rupture results can be validated as described in major comment 1, this study would advance our understanding of the micronucleus rupture mechanism by linking mitotic spindle location, chromatin decondensation, and lamin B1 protein recruitment. This would provide needed support to a current model in the field that micronucleus stability is largely determined during nuclear envelope assembly. In addition, if KIF18a loss generates stable micronuclei at high frequency, it will become a critical system for testing MN rupture hypotheses in the field. Thus, this work would be of significant interest to cell biologists working on nuclear envelope structure and function, chromosome organization, and mitosis. I include myself in this group as a cell biologist studying nuclear envelope structure and function with an expertise in membrane dynamics. The authors also find that mice mutant for KIF18a have increased micronucleation in normal tissues but not increased tumor initiation. They hypothesize that this is due to the low rupture frequency of KIF18a-induced MN, however their data cannot reject the null hypothesis that the small increase in MN they see in KIF18a mutant mice would be insufficient to induce tumorigenesis even if rupture frequency was high. Thus the significance of their finding that micronucleation is not sufficient for cancer progression is unclear. However, the thorough analysis of micronucleation and rupture in several healthy tissues as well as a tumor model in KIF18 mutant mice would be of interest to both pathologists and cancer researchers focused on mechanisms of genome instability. These types of experiments are critical to determine how micronuclei contribute to cancer progression and the quantifications presented in this paper are truly impressive.

      We appreciate this reviewer’s enthusiasm for our work and acknowledge that we cannot definitively conclude that micronuclear envelope stability explains why Kif18a mutant mice do not form tumors. However, it is interesting to note that the micronuclear loads measured using a peripheral erythrocyte assay are similar in Kif18agcd2/gcd2 mutant mice (0.6% micronucleated erythrocytes, of total erythrocytes) and ATMtm1 Awb/tm1 Awb mutant mice (0.6% of micronucleated erythrocytes, of total) (Fonseca et al., 2019). Yet, the tumor frequency in these two models is dramatically different: Kif18agcd2/gcd2 mutant mice do not spontaneously form tumors – while the majority of ATMtm1 Awb/tm1 Awb mutant mice do develop thymic lymphoma tumors between 2 and 4 months (Barlow, 1996). It is not clear how much micronuclei contribute to tumorigenesis in the ATM mutant model, but this comparison does suggest that the increase in MN seen in Kif18a mutants may be physiologically relevant. We have added this information to the revised text (lines 125-130).

      **Referees cross-commenting**

      I agree with the concerns raised by the other 2 reviewers, especially their comments about the need to clarify the mechanism of chromosome lagging versus chromosome congression and compaction. I think that all of these suggestions, though, are contingent on them being able to reproduce their micronucleus rupture results with a better marker of nucleus integrity. I strongly believe that additional validation of lamin A as a micronucleus rupture marker will demonstrate that it is unreliable, based both on our own observations in RPE-1 cells and the images they show.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary

      Sepaniac et al demonstrate that loss of KIF18a, a motor protein required for proper chromosome congression and chromatin compaction during mitosis, is insufficient to drive tumor development in mice although it does increase the frequency of micronuclei (MN), nuclear compartments that form around broken or missegregated chromosomes, in both normal and tumor tissue. MN are thought to increase genome instability and metastasis by undergoing DNA damage and activating innate immune signaling after irreparable nuclear membrane rupture. The authors use a non-transformed human cell line, hTERT-RPE-1, with KIF18a knocked out to demonstrate that MN formed as a result of KIF18a loss have more stable nuclear membranes than MN generated by other methods. They go on to correlate this increased stability with increased chromosome proximity to the main chromatin mass during nuclear envelope assembly and increased chromatin decompaction by a combination of fixed and live cell imaging.

      Major Comments

      1. This study relies heavily on the use of lamin A loss or discontinuity to identify ruptured micronuclei. Although the authors validate this marker against "leakage" of the soluble nuclear protein mCherry-NLS, there are several lines of evidence suggesting that lamin A loss or disruption is not a reliable reporter. In figure S3C, the top two panels of intact MN in the KIF18A KO appear ruptured based on the gH2AX labeling, yet have significant levels of lamin A and are labeled as intact. In figure 4D, the rate of MN rupture after nocodazole release (60% ruptured in 2 hours) is much faster than that reported in other papers (40-60% in 16-18 hours, Liu et al.; 60% in 16 hours, Hatch et al.). In addition, images in Hatch et al., 2013 show lamin A localizing to both intact and ruptured MN, and anecdotal information in the field suggests that lamin A localization is not a reliable reporter.

      These discrepancies may be due to how the authors define "mCherry-NLS leakage", which needs to be defined in the methods, as previous studies have demonstrated that MN frequently have delayed or reduced nuclear import even though the membrane is intact. Regardless, the authors need to provide compelling independent evidence that lamin A loss and disruption faithfully recognize ruptured MN by either validating this marker against additional rupture reporters, such as Lap2, LBR, or emerin accumulation, or by repeating key experiments in cells expressing mCherry-NLS.

      2. Micronuclei in tumor sections and other dense tissues can appear very similar to other types of chromatin, including blebs from adjacent nuclei and dead cells. To verify that the quantified structures are bona fide micronuclei, the authors need to include a marker for the cell boundary. This is especially critical in the lamin A-stained tumor sections with heterogeneous lamin A protein expression.

      3. Figure 4 compares MN rupture frequency between cells treated with different inducers of micronuclei - KIF18A KO, nocodazole release, and irradiation. These treatments have different effects on the cell cycle: KIF18A causes minor delays, nocodazole arrests cells in mitosis, and g-IR likely causes delays in S and G2. Since MN rupture frequency increases with the duration of interphase, the authors need to assess rupture frequency at similar time points after mitosis for all three conditions. One way to accomplish this would be to repeat this experiment and analyze cells collected by mitotic shake-off prior to fixation and labeling.

      Minor Comments

      1. In figure 6A, it is unclear when the videos start and how micronuclei are selected for analysis. Do the micronuclei have to be continuously visible from the time they missegregate? Do the videos all start at the same time point during mitosis or is it contingent on when the MN appears separated from the main nucleus? One concern is that a consistent delay in micronucleus appearance in the nocodazole treated cells could artificially decrease the amount of MN expansion observed.

      2. In figure 7A, it is difficult to identify the "lagging" chromosome in the top panel. It would be helpful to label the chromosome that becomes the MN, or ideally, to include a video or still images to demonstrate how micronuclei form in the KIF18A KO cells.

      3. The two image panels in figure 7A are imaged at significantly different times during anaphase (early anaphase on bottom versus late anaphase/telophase on top). A better comparison would be between two cells at the same time point in anaphase.

      Significance

      In this study, the authors identify chromatin decondensation in micronuclei as a new predictor of membrane stability. Although these results are correlative, if their micronucleus rupture results can be validated as described in major comment 1, this study would advance our understanding of the micronucleus rupture mechanism by linking mitotic spindle location, chromatin decondensation, and lamin B1 protein recruitment. This would provide needed support to a current model in the field that micronucleus stability is largely determined during nuclear envelope assembly. In addition, if KIF18a loss generates stable micronuclei at high frequency, it will become a critical system for testing MN rupture hypotheses in the field. Thus, this work would be of significant interest to cell biologists working on nuclear envelope structure and function, chromosome organization, and mitosis. I include myself in this group as a cell biologist studying nuclear envelope structure and function with an expertise in membrane dynamics.

      The authors also find that mice mutant for KIF18a have increased micronucleation in normal tissues but not increased tumor initiation. They hypothesize that this is due to the low rupture frequency of KIF18a-induced MN, however their data cannot reject the null hypothesis that the small increase in MN they see in KIF18a mutant mice would be insufficient to induce tumorigenesis even if rupture frequency was high. Thus the significance of their finding that micronucleation is not sufficient for cancer progression is unclear. However, the thorough analysis of micronucleation and rupture in several healthy tissues as well as a tumor model in KIF18 mutant mice would be of interest to both pathologists and cancer researchers focused on mechanisms of genome instability. These types of experiments are critical to determine how micronuclei contribute to cancer progression and the quantifications presented in this paper are truly impressive.

      Referees cross-commenting

      I agree with the concerns raised by the other 2 reviewers, especially their comments about the need to clarify the mechanism of chromosome lagging versus chromosome congression and compaction.

      I think that all of these suggestions, though, are contingent on them being able to reproduce their micronucleus rupture results with a better marker of nucleus integrity. I strongly believe that additional validation of lamin A as a micronucleus rupture marker will demonstrate that it is unreliable, based both on our own observations in RPE-1 cells and the images they show.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Sepaniac and colleagues use in vivo and in vitro approaches to examine why micronuclei generated by lack of KIF18A activity do not promote tumorigenesis. The authors conclude that micronuclei in KIF18A-depleted cells form stable micronuclear envelopes, which may result from lagging chromosomes being closer to the spindle pole when the micronuclear envelope forms. The authors further conclude that the stability of the micronuclei arising from lack of KIF18A can explain why Kif18a mutant mice do not develop tumors. These results also suggest that the consequences of micronuclei and their possible contribution to tumorigenesis depend on the context of their genesis. While the mouse model data and characterization of the stability of micronuclei generated by different insults support the conclusions, the lagging chromosome positioning data could be improved. Moreover, a number of other issues should be addressed prior to publication.

      Major issues:

      1. Lines 153-155. The authors conclude that the slight reduction in overall survival is "due to a reduced ability of Kif18a mutants to cope with rapid tumorigenesis," but it is unclear why this would be the case. There is also an increase in micronucleated cells in thymic lymphomas from Kif18a/p53 homozygous mice (Fig. 2B); could this not contribute? In Fig. 3C, the authors show that micronuclear rupture is similar in both Kif18a WT and mutant mice, so it seems possible that the increase in the frequency of micronuclei (Fig. 2B) coupled with a similar frequency of micronuclear rupture (Fig. 3C) could lead to the reduced survival. Then, in the discussion, the authors downplay this finding by saying (line 371) "loss of Kif18a had modest or no effect on survival of Trp53 homozygotes and heterozygotes." Why then speculate earlier in the text that loss of Kif18a reduces the ability to cope with tumorigenesis?

      2. Related to the point above, the authors show in figure 3 that the micronuclei found in healthy tissues display infrequent membrane rupture (panel B). However, micronuclear membrane rupture in tumor tissues is much more frequent (panel C). How do the authors explain this? Do they hypothesize that the micronuclei in the tumors originate by mechanisms other than the misalignment caused by lack of KIF18A? Does KIF18A depletion cause aneuploidy due to segregation of two sisters to the same pole? If so, one could expect the tumors to be aneuploid (is this the case?) and aneuploidy has been shown by numerous groups to cause genomic instability. Such genomic instability could then explain the difference in membrane rupture.

      3. The authors conclude that lagging chromosomes in KIF18A KO cells are found closer to the main chromatin mass. The Stumpff lab showed in a 2019 JCB paper that KIF18 KO cells have a chromosome alignment defect and as a result during anaphase the chromosomes can be scattered rather than forming the tight, uniform mass that is observed in WT cells. The scattering of kinetochores resulting from this phenotype could affect the value of "Avg Chromosomes Distances" in Fig 7B and the normalized distance in the KIF18A KO cells. Therefore, live-cell imaging experiments would be helpful to resolve this and possibly strengthen this conclusion. RPE1 cells with fluorescently tagged CENP-A and centrin could be used to ensure that the lagging chromosomes will not rejoin the main nucleus. Moreover, these cells could be used for correlative live-fixed cell experiments in which fixed cell analysis following micronucleus formation could be used to show that chromosomes that lag farther away from the spindle pole are more likely to have defective micronuclear envelopes.

      4. Based on the Fonseca et al. 2019 JCB paper (video 2), micronuclei from KIF18A KO do not exclusively arise from lagging chromosomes. Instead, chromosomes can also escape the main chromatin mass after segregation and subsequently be excluded from the main nucleus. It would be important to know what fraction of the micronuclei in KIF18A KO cells arise via lagging chromosomes. Since Aurora B and/or bundled microtubules at the spindle midzone are believed to prevent proper nuclear envelope formation, chromosomes that properly segregate but later become separated from the main nucleus would be more likely to form proper micronuclear envelopes than those arising from lagging chromosomes. The correlative microscopy experiment suggested in the previous point could allow differentiation between these two routes to micronucleus formation.

      Minor issues:

      1. Some parts of the manuscript are excessively wordy and some sentences are unclear or convoluted (e.g., lines 148-153 and 238-239).

      2. Lines 59-61. This sentence is formulated incorrectly. First of all, the subject of the sentence is "cells" and the verb is "can become fragmented." However, the authors mean that the DNA in the micronucleus can become fragmented (not the cells). Moreover, the way the sentence is currently formulated seems to suggest that the fragmentation occurs during cell division. However, this is not the case. Please revise the text to make it more accurate.

      3. Lines 114-115. Please provide references in support of this statement.

      4. Line 153. The authors refer to Fig. 1C, but I think they mean Fig. 1B.

      5. Line 324. The authors find that RPE1 KIF18A KO cells have lagging chromosomes in ana/telophase 9% of the time, then say that this shows that lagging chromosomes are rare in KIF18A KO cells. However, this is a large increase compared to normal RPE1 cells, which only have 1-2% frequency of lagging chromosomes. So, they should revise the text here to say that the rates of lagging chromosomes from KIF18A KO are lower compared to the rates induced by nocodazole washout.

      6. Line 383. The references listed here should be moved earlier and specifically after the statement summarizing the results of the studies instead of being listed after the authors' conclusion/interpretation of the data. The same issue was noted in other parts of the manuscript.

      7. Figure 1A. In the text, the authors say they cross a Kif18a heterozygous mutant mouse with a p53 heterozygous mutant mouse, but the two mice in this figure are already heterozygous for both. Please revise the text or depict the previous additional cross necessary to obtain the double heterozygotes.

      8. Figure 3A. Arrows or dotted circles outlining the micronuclei in the insets of the middle and bottom rows would be helpful since the DAPI signal in the micronuclei is low and somewhat difficult to see.

      9. Figure 3B. Error bars should be added to the graph. Moreover, the authors noted that the differences are not significant. However, this seems surprising, given that in some cases there is a three- to five-fold difference between certain pairs. Indeed, a chi-square test using the numbers from table S1 indicated p values <0.05 for several pairwise comparisons.

      10. Figure 5G. When referring to this figure (lines 292-294), the authors talk about correlation. However, the points in this graph seem to be scattered a bit randomly.

      11. Figure 6B-D. The Y-axis titles of the three graphs are a bit confusing. Please consider revising.

      12. In Figure 7 and the text, the authors use the terms "late-lagging" and "lagging" chromosomes interchangeably, which is somewhat confusing in this context because lagging chromosome distance from the main chromosome mass is thought to contribute to defective assembly of micronuclear envelopes. It is not clear whether the authors intend to indicate, with this term, that the lagging chromosome is farther away from the main chromosome mass or that the lagging chromosome is in a "late" anaphase cell. Because this is confusing, I suggest just using the term "lagging chromosome" consistently. It could be useful to include representative images of lagging chromosomes located at different distances from the main chromosome mass. And certainly, the authors should include an example of a lagging chromosome in the KIF18A KO cells.

      13. Figure S2A. The example in the bottom right image looks more like a chromosome bridge than a lagging chromosome. Kinetochore staining is necessary to unequivocally identify lagging chromosomes.
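
      For the pairwise comparison mentioned in minor issue 9, a chi-square test on a 2x2 table of ruptured versus intact micronuclei could look like the sketch below. It is an illustration under stated assumptions: the counts are hypothetical (the numbers from table S1 are not reproduced here), the function name `chi_square_2x2` is made up for this example, and the Pearson statistic is computed without continuity correction, which may differ from the variant the reviewer used.

      ```python
      import math


      def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
          """Pearson chi-square test (1 degree of freedom, no continuity
          correction) for the table [[a, b], [c, d]].

          Rows are the two conditions being compared; columns are
          ruptured and intact micronuclei. Returns (chi2, p_value).
          """
          n = a + b + c + d
          row1, row2 = a + b, c + d
          col1, col2 = a + c, b + d
          if 0 in (row1, row2, col1, col2):
              raise ValueError("every row and column needs a non-zero total")
          chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
          # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
          p_value = math.erfc(math.sqrt(chi2 / 2))
          return chi2, p_value


      # Hypothetical counts: 5 of 100 ruptured vs 20 of 100 ruptured.
      chi2, p = chi_square_2x2(5, 95, 20, 80)
      ```

      With small expected counts in any cell, Fisher's exact test would be the more conservative choice.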

      Significance

      Based on the previous knowledge on the factors that cause abnormal assembly of the micronuclear membrane, the results presented in this study were somewhat predictable. However, these findings will add to the knowledge of how micronuclei form and the potential factors that lead to micronuclear membrane rupture. Previous studies investigating micronucleus behavior have focused on micronuclei arising via merotelic kinetochore mis-attachments. These mis-attachments lead to formation of micronuclei close to the spindle midzone. In the present study, instead, the micronuclei arising from lack of KIF18A activity form farther away from the spindle midzone. The results presented here suggest that the positioning of these micronuclei farther away from the midzone enables assembly of a more stable micronuclear membrane that will be less likely to rupture during the following cell cycle. A recent study showed that the microtubule bundles in the spindle midzone interfere with micronuclear membrane assembly. Based on this, it is not surprising that micronuclei forming away from the spindle midzone (like those resulting from lack of KIF18A activity) assemble more normal membranes. Although somewhat expected, this study provides the actual data in support of this phenomenon. This study will be of interest to cell biologists interested in cell division and genomic instability. My research has focused on cell division, aneuploidy, and chromosomal instability for nearly thirty years. Therefore, I believe I am fully qualified to evaluate this manuscript.

      Referees cross-commenting

      My areas of expertise do not include nuclear membrane structure and function. Therefore, I encourage the authors to consider the comments of reviewer #3 for issues related to reliable quantification of micronuclear membrane rupture.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In the present work Stumpff, Reinholdt and co-workers investigate the mechanism by which micronuclei contribute to tumorigenesis. Micronuclei are classic markers of genomic instability widely used in the diagnosis of cancer, but whether they work as drivers of the process has recently attracted significant attention due to their link with chromothripsis. Here, the Stumpff/Reinholdt labs have explored an interesting model to test some ideas about the role of micronuclei as drivers of tumorigenesis, based on Kif18A/p53 double KO mice. They confirm the formation of micronuclei in these animals, but find no substantial increase in survival and tumor incidence relative to p53 KO animals, despite higher incidence of micronuclei in Kif18A/p53 KO tumors. They conclude that, per se, micronuclei do not have the capacity to form tumors, regardless of p53 status. This was surprising, given the well-established role of p53 in preventing the proliferation of micronucleated cells. To shed light on this apparent paradox, they compared micronuclei from Kif18A KO cells with micronuclei generated by a number of other experimental conditions that promote formation of anaphase lagging chromosomes or generate acentric fragments. They found that micronuclei derived from Kif18A are intrinsically different from micronuclei generated by those other means and essentially showed increased accumulation of lamin B, were more resistant to rupture and preserved the capacity to expand as cells exited mitosis. Of note, they find a correlation between chromosome proximity to the poles/main chromosome mass and the different features that characterize micronuclei from Kif18A KO cells, compared with the other experimental conditions in which late lagging chromosomes are more frequent. Overall, I find this study extremely interesting, well designed and executed in a rigorous way that characterizes the consistent solid work from these laboratories over the years. I have just a few minor points that I recommend be addressed prior to publication.

      1-Abstract and main text lines 70 and 100: the authors indicate that Kif18A mutant mice produce micronuclei due to unaligned chromosomes. This is correct, but at the same time misleading. The authors should clarify that although micronuclei derive from compromised congression, I was convinced from previous works (Fonseca et al., JCB, 2019) that it was their asynchronous segregation in anaphase that led to micronuclei formation. As is, a less familiar reader may conceive that misaligned chromosomes directly result in micronuclei, for example by being detached from the main chromosome mass.

      2-Page 2, line 59: "cells entering cell division...become fragmented". It is not the cells, but the chromosomes that fragment. Please correct.

      3-Page 4, line 149: "reduced survival in the Kif18A null, p53 mice". P53 what? KO, WT? Please clarify.

      4-Page 5, line 212: the authors state that micronuclei were scored for absence of lamin A/C, but previously they scored it as "continuous/discontinuous". Please clarify.

      5-Page 6, line 243: "Kif18A is not required for micronuclear envelope rupture". Shouldn't it be micronuclear envelope "integrity"?

      6-One of the most interesting results of the paper is the correlation between envelope formation in micronuclei with their respective position relative to the poles/midzone. Could the authors try to investigate causality? For instance, the authors refer to works from other labs in which MT bundles and a midzone Aurora B activity gradient might play a role in the different features associated with micronuclei envelope formation, depending on their origin. Could the authors manipulate this gradient and investigate whether it changes the outcome in terms of nuclear envelope assembly properties on micronuclei? Are there any detectable features in midzone MT organization in Kif18A KO cells that would justify the observed differences?

      Significance

      Kif18A plays a key role in chromosome alignment, without apparently affecting kinetochore-microtubule attachments in non-transformed cells. Because they cannot establish a proper metaphase plate, Kif18A KO cells enter anaphase with highly asynchronous segregation due to non-uniform chromosome distribution along the spindle axis. Consequently, some "delayed" chromosomes form micronuclei, in cell culture and in vivo. Interestingly, prior art has failed to detect any increased signs of genomic instability in Kif18A KO cells and mice, and, contrary to what would be expected based on current trends, these mice do not show any signs of increased incidence of tumors; in fact, they even show some protective effect against induced colitis-associated colorectal cancer. Noteworthy, all previous experimental works pointing to a role of micronuclei as key intermediates of genomic instability in cancer relied on models in which the tumor suppressor protein p53 had been inactivated. In the present work, the authors explore the relationship between micronuclei formation and p53 inactivation by investigating tumor formation in Kif18A/p53 double KO animals (1 or 2 alleles of p53 inactivated). The reported results are timely and will attract the interest of a broad readership, while decisively contributing to shed light on an ongoing debate. I am therefore all in favor of the publication of this work in any journal affiliated with Review Commons, pending some minor revisions.

  3. Dec 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their constructive suggestions, which have substantially improved this work. We have comprehensively revised the manuscript, and detail individual responses below:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study by Forbes et al describes and characterizes a 2nd generation peptide-based inhibitor of the MYB:CBP interaction, termed CRYBMIM, which they use to study MYB:cofactor interactions in leukemia cells. CRYBMIM has improved properties relative to the MYBMIM peptide, and displays more potency in biochemical and cell-based assays. Using a combination of epigenomics and biochemical screens, the authors define a list of candidate MYB cofactors whose functional significance as AML dependencies is supported by analysis of the DepMap database. Using genomewide profiling of TF and CBP occupancy, the authors provide evidence that CRYBMIM treatment reprograms the interactome of MYB in a manner that disproportionately changes specific cis-elements over others. Stated differently, the overall occupancy pattern of many TFs/cofactors shows gains and losses at specific cis elements, resulting in a complex modulation of MYB function and changes in transcription in leukemia cells. Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited. I list below a few control experiments that would clarify the specificity of CRYBMIM.

      1) Does CRYBMIM bind to other KIX domains, such as that of MED15? It would be important to evaluate whether this peptide binds to other KIX domains.

      Response: We analyzed all known human KIX domain sequences, and found that the most similar one to CBP/P300 is MED15 (38% identity), as shown in revised Supp. Fig. 2D. The sequence similarity of the remaining human KIX domains is substantially lower. To determine the specificity of CRYBMIM in binding CBP/P300 versus MED15, we exposed human AML cell extracts to biotinylated CRYBMIM immobilized on streptavidin beads versus beads alone. Whereas CRYBMIM binds efficiently to CBP/P300, it does not exhibit any measurable binding to MED15 (even though MED15 is highly expressed), as shown in revised Supp. Fig. 2E, and reproduced for convenience below. While this does not exclude the possibility that CRYBMIM binds to other proteins, the biochemical specificity observed here, combined with the genetic requirement of CBP for cellular effects of CRYBMIM as shown by a genome-wide CRISPR screen (Fig. 1B and below), indicates that CRYBMIM is a specific ligand of CBP/P300. The manuscript has been revised on pages 4-5 and 6 accordingly.

      2) Similarly, it would be useful to perform a mass spec analysis of all nuclear factors that associate with streptavidin-immobilized CRYBMIM. This again would help the reader to understand the specificity of this peptide.

      Response: We agree with the reviewer that macromolecular ligands like CRYBMIM may interact with cellular proteins in complex ways. To define specific effects, we utilized four orthogonal strategies, explained below.

      First, we purified the CBP-containing nuclear complex using immunoprecipitation and determined its composition by mass spectrometry proteomics. This analysis revealed 833 proteins that are specifically associated with CBP (revised Table S6). Although technically feasible, the fact that CBP is associated with hundreds of proteins would make the experiment suggested by the reviewer difficult to interpret, because it would be a major challenge to distinguish proteins bound directly by the peptide versus proteins purified indirectly by virtue of the fact that CRYBMIM binds to CBP/P300, which in turn binds to many other proteins. While we recently developed improved methods for cross-linking mass spectrometry proteomics that permit the identification of direct protein-protein interactions (Ser, Cifani, Kentsis 2019, https://doi.org/10.1021/acs.jproteome.9b00085), we believe that these experiments are beyond the scope of the current manuscript, which already includes 40 new figure panels as part of this revision.

      In lieu of this experiment, we purified the CBP-containing nuclear complex after treatment with CRYBMIM or control using immunoprecipitation and determined its composition by targeted Western blotting. This analysis revealed that RUNX1, LYL1 and SATB1 are specifically associated with CBP (revised Fig. 14B), among which RUNX1 is specifically remodeled in the MYB:CBP/P300 complex upon CRYBMIM binding. This transcription factor recruitment and remodeling support the idea of CRYBMIM’s specificity for the MYB:CBP/P300 complex.

      Second, to define the specificity of CRYBMIM, we used glycine mutants of CRYBMIM and its parent MYBMIM, CG3 and TG3, respectively, in which residues that form key salt bridge and hydrophobic interactions with KIX are replaced with glycines, but otherwise retain all other features of the active probes. Both CG3 and TG3 exhibit significantly reduced effects on the viability of AML cell lines, consistent with the specific effects of CRYBMIM (Fig. 3D).

      To confirm that this is due to CBP binding, we purified cellular CBP/P300 by binding to biotinylated CRYBMIM, and observed that it can be efficiently competed by excess of free CRYBMIM, but not TAT (Fig. 2E).

      Finally, to establish definitively that cellular CBP is responsible for CRYBMIM effects, we generated isogenic cell lines that are either deficient or proficient for CBP using CRISPR genome editing. This experiment demonstrated that CBP deficiency confers significant resistance to CRYBMIM, indicating that CBP is required for CRYBMIM-mediated effects (revised Figure 4), and reproduced below. We revised the manuscript on pages 21, 8, 6 and 9 accordingly.

      3) The major limitation of this study which modestly lessens my enthusiasm of this work is that the mechanistic model of MYB-sequestered TFs proposed here is based on a face-value interpretation of IP-MS data coupled with ChIP-seq data. Normally, I would expect such a mechanism to be supported with some additional focused biochemical experiments of specific interactions, to complement all of the omics approaches. For example, can the authors evaluate and/or validate further how MYB physically interacts with LYL1, CEBPA, SPI1, or RUNX1. Are these interactions direct or indirect? Which domains of these proteins are involved? Does CRYBMIM treatment modulate the ability of these proteins to associate with one another in a co-IP? Do these interactions occur in normal hematopoietic cells? A claim is made throughout this study that these are aberrant TF complexes, but I believe more evidence is required to support this claim.

      Response: We appreciate the reviewer’s comment and totally agree with this point. To examine how MYB aberrantly assembles transcription factors in AML, we performed MYB co-immunoprecipitation (co-IP) in a panel of seven genetically diverse AML cell lines with varying susceptibility to CRYBMIM, chosen to represent the common and refractory forms of human AML. Here, we confirmed co-assembly of CBP/P300, LYL1, E2A, LMO2 in all AML cell lines tested, and cell type-specific co-assembly of SATB1 and CEBPA, as shown in revised Fig. 8A, which are in agreement with the IP-MS and ChIP-seq results. We further corroborated these findings by co-IP studies of CBP/P300, as shown in the revised Fig. 8B. We performed similar co-IP experiments in normal hematopoietic progenitor cells, and found most of the co-assembled factors in AML cells were not observed in normal cells except for CBP/P300 and LYL1, as shown in the revised Figure 9E. Combined with the apparently aberrant expression of E2A and SATB1 in AML cells but not normal blood cells, this leads us to conclude that MYB assembles aberrant transcription factor complexes in AML cells. These complexes can be remodeled by peptidomimetic inhibitors, leading to their redistribution on chromatin, suppression of oncogenic gene expression and induction of cellular differentiation. We confirmed this mechanism by direct biochemical experiments in AML cells, demonstrating disassembly and remodeling of CBP/P300 complexes, as shown in the revised Figure 14. At least some of these interactions are direct, given the known direct binding between MYB and CEBPA (Oelgeschläger, Nuchprayoon, Lüscher, Friedman 1996, https://doi.org/10.1128/mcb.16.9.4717). We revised the manuscript text on pages 13, 15 and 21 accordingly.

      Reviewer #1 (Significance (Required)):

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited.

      Response: We appreciate this sentiment and completely agree with the reviewer. The phenomenon reported in this work represents the first of its kind demonstration of the aberrant organization of transcription factor control complexes in cancer, and its pharmacologic modulation. We believe that this concept will serve as a transformative paradigm for understanding oncogenic gene control and the development of effective therapies for its definitive treatment.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript reports the generation of a new and improved peptide mimetic inhibitor of the interaction between MYB and CBP/P300. The original MYBMIM inhibitor of this interaction, reported recently by the same laboratory, was modified by addition and substitution of peptide sequences from CREB, thus improving the affinity of the resulting CRYBMIM peptide to CBP/P300. The improved inhibitor profile results in increased anti-AML efficacy of CRYBMIM over MYBMIM. The authors go on to examine the mechanism underlying the anti-AML activity of CRYBMIM by integrating gene expression analysis, chromatin immunoprecipitation sequencing and mass spectrometric protein complex identification in human AML cells. I have some minor questions the authors may wish to comment on:

      1) The relocation of MYB, along with CBP/P300, to genes controlling myeloid differentiation (clusters 4 and 9) upon CRYBMIM treatment is reminiscent of the increased binding of MYB to myeloid pro-differentiation genes in AML cells following RUVBL2 silencing, recently reported in Armenteros-Monterroso et al. 2019 Leukemia 33:2817. Do the authors know if there is any overlap between genes in either of the clusters and the list reported in the latter study?

      Response: We thank the reviewer for making this suggestion. We also observe both RUVBL2 and RUVBL1 in the protein complex specifically associated with MYB (Fig. 7A and B). We compared the gene expression changes induced by CRYBMIM with those reported by Armenteros-Monterroso et al in 2019 (https://doi.org/10.1038/s41375-019-0495-8), and found that 37% of the genes upregulated by RUVBL2 silencing were shared with genes induced by CRYBMIM treatment. In addition, upregulated genes in clusters 4 and 9 included myeloid differentiation-related genes, such as JUN, FOS and FOSB, which were also induced by RUVBL2 silencing. We revised the manuscript to reflect this association on page 12.

      2) Could the authors comment on a possible mechanism to explain the co-localization of MYB and CBP/P300 to the loci in clusters 4 and 9 following CRYBMIM treatment? Is it possible that CBP/P300 is recruited by other transcription factors to these loci, independently of binding to MYB? Or is the binding of CBP/P300 to MYB at these loci somehow more resistant to disruption by CRYBMIM?

      Response: The reviewer has focused on an interesting point. At least for cluster 9, these genes exhibit gain of CBP/P300 in association with RUNX1 (Figure 12A), which we confirm by direct biochemical studies of MYB and CBP/P300 complexes immunoprecipitated from AML cells (revised Figure 14B-C). These experiments show that CRYBMIM treatment disrupts the MYB:CBP/P300 complexes, leading to the increased assembly of CBP/P300 with RUNX1. These findings are consistent with a dynamic competition mechanism that governs availability of CBP/P300 to transcriptional co-activation, in which distinct transcription factors compete for limiting amounts of CBP/P300. This possible mechanism is discussed in the revised manuscript (page 18-19 and 21).

      3) In the first paragraph of page 9, the text states: "Previously, we found that MYBMIM can suppress MYB:CBP/P300-dependent gene expression, leading to AML cell apoptosis that required MYB-mediated suppression of BCL2 (Ramaswamy et al., 2018)." I think this is a typo, since in this study, MYBMIM treatment results in loss of MYB binding to the BCL2 gene and consequent reduction in BCL2 expression. Do the authors mean 'MYBMIM-mediated suppression of BCL2' or 'loss of MYB-mediated activation of BCL2'?

      Response: We thank the reviewer and have corrected this typographic error in the text.

      4) The authors explain the failure of excess CREBMIM to displace CBP/P300 from immobilised CREBMIM (Figure 1E-F) by the nature of the CREB:CBP/P300 interaction. Does this imply that CREBMIM is unable to disrupt the interaction between CREB and CBP/P300 in living cells and that the CBP/P300 purified from native MV4;11 lysates by immobilised CREBMIM was from a pool not associated with CREB?

      Response: We thank the reviewer for making this point. Indeed, we reproducibly observe that CRYBMIM binding to CBP can be competed with excess free CRYBMIM, but CREBMIM binding cannot be competed by excess CREBMIM. This may be due to the different stabilities of the CBP complexes that are available for binding in cells. Alternatively, it is also possible that CREB binding to CBP, as reflected by CREBMIM, has a relatively slow dissociation rate, as compared to MYB, as reflected by CRYBMIM. We have begun to purify cellular CBP complexes (revised Fig. 8 and response to comment 2 for Reviewer 1), and aim to define their determinants in future studies, as enabled by the introduction of CRYBMIM, CREBMIM and MLLMIM probes in the current work.

      Reviewer #2 (Significance (Required)):

      Based on this integrative analysis, the authors propose a convincing hypothesis, involving the assembly of aberrant transcription factor complexes and sequestration of P300/CBP from genes involved in normal myeloid development, for the oncogenic activity of MYB in AML. As well as the obvious therapeutic potential of the CRYBMIM inhibitor itself, the data reported here reveal multiple avenues for future investigation into novel anti-AML therapeutic strategies. This is an innovative and important study.

      This study will be of interest to scientists and clinicians involved in leukaemia research as well as cancer biology in general.

      My field of expertise: leukaemia biology, leukaemia models, aberrant transcription factor activity in leukaemia

      Response: We appreciate and agree with this assessment.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes an improved MYB-mimetic peptide (cf the group's earlier work published in Nature Communications, 2018) and its effects on AML cell lines. It also describes - and this constitutes the majority of the paper - the dynamics of chromatin occupancy by MYB and other associated transcription factors upon disruption of the MYB-CBP/P300 interaction. The authors suggest this represents a shift from an oncogenic program to a myeloid differentiation program.

      Major comments:

      Regarding the improved affinity, and biological activity, of CRYBMIM:

      1.Improved affinity of CRYBMIM cf MYBMIM: clearly, it is improved, but not by a lot. By MST the increased affinity is about 3x. In terms of effects on AML cell viability: there is no direct comparison, and this should be included. In the group's previous paper there is no direct estimate for MYBMIM but it looks like the IC50 is between 10 and 20 micromolar so the effect is again around 2.5 fold. Also, the effects of the amino acid substitutions in CG3 are also very small (2.4x) given that 3 critical residues are altered. This is quite concerning.

      Response: As pointed out by the reviewer, CRYBMIM exhibits a several-fold increase in binding affinity, as measured using purified proteins in vitro. A similar increase in cellular potency is observed after short-term treatment of AML cells, as shown in revised Figure 3C, and reproduced below. However, increasing the duration of treatment to several days leads to substantial improvement in apparent cellular potency (Figure 3G). For example, while MYBMIM induces an approximately 100-fold reduction in cell viability of MV411 cells, CRYBMIM induces a more than 1,000-fold reduction. Similarly, whereas MYBMIM exhibited relatively modest effects on OCIAML3 and SKM1 cells, CRYBMIM induces a more than 1,000-fold reduction in cell viability. As we show in the revised manuscript, this appears to be due to the combination of increased biochemical affinity and specific proteolysis of MYB, which cooperate to induce extensive remodeling of MYB transcriptional complexes and gene expression (revised Figure 11). In all, this exemplifies how pharmacologic modulators of protein interactions can achieve significantly improved biological potency from relatively modest affinity effects, a concept that recently has been successfully used to develop a variety of PROTACs that leverage this “event-driven” as opposed to occupancy-driven pharmacology. The manuscript has been revised on pages 8 and 18 to clarify this point.

      2.Does CRYBMIM really "spare" normal hematopoietic cells? Not according to Fig 2E, where there is only a 2-fold difference in IC50.

      Response: To better define the relative toxicity of CRYBMIM and MYBMIM, we examined their effects on the growth and survival of normal hematopoietic progenitor cells as compared to AML cells using colony forming assays in methylcellulose under more physiologic conditions in the presence of human hematopoietic cytokines (revised Figure 3E, and reproduced below). While CRYBMIM significantly reduced the clonogenic capacity, growth and survival of MV411 AML cells, there were no significant effects on the total clonogenic activity of normal CD34+ human umbilical cord blood progenitor cells under these conditions. At the highest dose, CRYBMIM induced modest reduction in CFU-MG colony formation, and modest increase in BFU-E colony formation of normal hematopoietic progenitor cells. We revised the manuscript to indicate that CRYBMIM “relatively spares” normal blood progenitor cells on page 8.

      Response: We appreciate the attention to this issue. In the original manuscript, we showed dose-response curves of cord blood progenitor cells cultured in suspension supplemented with fetal bovine serum, a system that is known to induce inappropriate hematopoietic cell differentiation (https://doi.org/10.1016/j.molmed.2017.07.003). In the revised manuscript, we show results of colony formation assays of hematopoietic progenitor cells cultured in serum-free, semi-solid conditions supplemented with human hematopoietic cytokines (revised Figure 3E and 3F). This is a more physiologic system which more faithfully maintains normal hematopoietic cell differentiation, as compared to the cellular differentiation induced by fetal bovine serum-containing media lacking hematopoietic growth factors, as used in the experiments in our original manuscript. To establish a positive control, in addition to treating AML cells under the same condition, we used doxorubicin, which is part of the current treatment of patients with AML, and which, in our experiments, exhibits significant and pronounced reduction in the clonogenic capacity, growth and survival of normal blood progenitor cells (revised Figure S3B). The manuscript has been revised on page 8 accordingly.

      3. Fig 2F doesn't include any lines that express very low or undetectable levels of MYB. Some of these should be included to further examine specificity.

      Response: We have now tested CRYBMIM against a large panel of non-hematopoietic tumor and non-tumor cell lines, with varying degrees of MYB expression. Some of those cells exhibit a high level of MYB gene expression and MYB genetic dependency, which is at least in part correlated with susceptibility to CRYBMIM (revised Figure S4, and reproduced below). The manuscript has been revised on page 8 accordingly.

      Effects on gene expression and MYB binding:

      Data on MYB target gene expression and apoptosis/differentiation, and the conclusions drawn per se are sound, but:

      5.Fig S3 seems to show that MYB protein is lost on treatment with CRYBMIM. This isn't even mentioned in the text but raises a whole range of major questions eg why is this the case? Is this what is responsible for the loss of MYB-p300 interaction and/or biological effects on AML cells? Is this what is responsible for the effects on MYB target gene expression in Fig 3 and MYB binding to chromatin in Fig 4? This must be addressed.

      Response: We have revised the manuscript to include this discussion, and performed additional experiments to define this phenomenon. We confirmed rapid reduction in MYB protein levels upon CRYBMIM treatment on the time-scale of one to four hours in diverse AML cell lines (revised Figure 11), with the rate of MYB protein loss correlating to the cellular susceptibility to CRYBMIM (revised Figure 11, and reproduced below). The manuscript has been revised on page 18 accordingly.

      This is consistent with the specific proteolysis of MYB induced by the peptidomimetic remodeling of the MYB:CBP/P300 complex. We confirmed this by combined treatment with the proteasomal/protease inhibitor MG132 (revised Figure 11C, and reproduced below). This effect was specific because overexpression of BCL2, which blocks MYBMIM-induced apoptosis (Ramaswamy et al, Kentsis, https://doi.org/10.1038/s41467-017-02618-6), was unable to rescue CRYBMIM-induced proteolysis of MYB, arguing that MYB proteolysis is a specific effect of CRYBMIM rather than a non-specific consequence of apoptosis. The manuscript has been revised on page 18 accordingly.

      6.Fig 4 and the accompanying text are a bit hard to follow, but if I understood them correctly, I am surprised that the "gained MYB peaks" don't include the MYB binding motif itself? This at least deserves some comment. Also, there doesn't seem to have been any attempt to integrate the ChIP-Seq data with the expression data of Fig 3. This would provide clearer insights into the identities and types of MYB-regulated genes that are directly affected by suppression of CBP/p300 binding to MYB.

      Response: We thank the reviewer for this suggestion. The revised manuscript now includes a comprehensive and integrated analysis of chromatin and gene expression dynamics (revised Figures 13A and 13B). In contrast to the model in which blockade of MYB:CBP/P300 induces loss of gene expression and loss of transcription factor and CBP/P300 chromatin occupancy, we also observed a large number of genes with increased expression and gain of CBP/P300 occupancy (revised Figure 13A-B, and reproduced below). This includes numerous genes that control hematopoietic differentiation, such as FOS, JUN, and ATF3. As a representative example, in the case of FOS, we observed that CRYBMIM-induced accumulation of CBP/P300 was associated with increased binding of RUNX1, and eviction of CEBPA and LYL1 (revised Figure 13C). Thus, the absence of “gained MYB peaks” is due to the redistribution of CBP/P300 with alternative transcription factors, such as RUNX1. In all, these results support the model in which the core regulatory circuitry of AML cells is organized aberrantly by MYB and its associated co-factors including LYL1, CEBPA, E2A, SATB1 and LMO2, which co-operate in the induction and maintenance of oncogenic gene expression, as co-opted by distinct oncogenes in biologically diverse subtypes of AML (revised Figure 14). This involves apparent sequestration of CBP/P300 from genes controlling myeloid cell differentiation. Thus, oncogenic gene expression is associated with the assembly of aberrantly organized MYB transcriptional co-activator complexes, and their dynamic remodeling by selective blockade of protein interactions can induce AML cell differentiation. The manuscript has been revised on page 20-21 accordingly.

      7.The MS studies on MYB-interacting proteins seem very interesting and novel. I am not an expert on MS, though, so I'd suggest this section be reviewed by someone who is. Moreover, I was unable to see the actual data from this study because the material I was provided with didn't include Table S4 and S5.

      Response: We appreciate this point. For this reason, we have deposited all of our mass spectrometry data to be openly available via PRIDE (accession number PXD019708), and also openly provide all of the analyzed data via Zenodo (https://doi.org/10.5281/zenodo.4321824), as additionally provided in the Supplementary Material for this manuscript.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      8.Claims regarding biological activity, specificity and improvements cf MYBMIM should be moderated given the small size of these effects as mentioned above (points 1 and 3).

      Response: As explained in detail in response to comments 1-3 above (page 12-14 of this response), we have substantially revised the manuscript to incorporate both new experimental results and additional explanations (pages 6-8).

      9.I found the description of the studies related to Figs 5 and 6 somewhat difficult to follow and convoluted. While changes in MYB and CBP/p300 chromatin occupancy clearly occur on M CRYBMIM treatment, it is not clear that the complexes seen on genes prior to treatment represent "aberrant" complexes. These may just be characteristic of undifferentiated (myeloid) cells. The authors appear to argue that because some of the candidate co-factors show "apparently aberrant expression in AML cells" based on comparison of (presumably mRNA) expression data with normal cells, the presence of these factors in the complexes make them "aberrant" (moreover, the "aberrancy score" of Fig 5 C is not defined anywhere, as far as I can see). This inference is drawing a rather long bow, given that the AML-specific factors may not actually be absent from the complexes in normal cells. So this conclusion should be moderated if a more direct MS comparison cannot be provided (for which I understand the technical difficulties).

      Response: We have now measured protein abundance levels of key transcription factors assembled with MYB in AML cells in various normal human hematopoietic cells (revised Figure 9, and reproduced below). We found that most transcription factors that are assembled with MYB in diverse AML cell lines could be detected in one or more normal human blood cells, albeit with variable abundance, with the exception of CEBPA and SATB1 that were measurably expressed exclusively in AML cells (revised Figure 9A). Using unsupervised clustering and principal component analysis, we defined the combinations of transcription factors that are associated with aberrant functions of MYB:CBP/P300, as defined by their susceptibility to peptidomimetic remodeling (revised Figure 9B-D). In addition, we directly examined the physical assembly of MYB with key transcription factors in normal hematopoietic cells using co-immunoprecipitation studies (revised Figure 9E). In agreement with the physical association of MYB seen in AML cell lines, we observed association with CBP/P300 and LYL1 in normal hematopoietic cells. However, we did not observe physical association with E2A and SATB1 in normal cells, which indicates aberrant association of these in AML cell lines. This leads us to propose that these complexes are aberrantly assembled, at least in part due to the inappropriate transcription factor co-expression. The manuscript has been revised on page 15 accordingly.

      *Would additional experiments be essential to support the claims of the paper?*

      Response: As explained in detail in response to comment 5 above (page 16 of this response), we have carried out extensive studies of the specific proteolysis of MYB. We conclude that MYB transcription complexes are regulated both by MYB:CBP/P300 binding and by specific factor proteolysis, and can be induced by its peptidomimetic blockade in AML cells. Such “event-driven” pharmacology is emerging as a powerful tool to modulate protein function in cells, and studies reported in our work should enable its translation into improved therapies for patients, and improved probes for basic science.

      11.Provision of a positive control for the experiment of Fig S2.

      Response: As explained in detail in response to comment 2 above (page 13-14 of this response), we precisely defined the effects of CRYBMIM and MYBMIM on the clonogenic capacity, growth and survival of normal hematopoietic progenitor cells in serum-free, methylcellulose media supplemented with human hematopoietic cytokines. These experiments showed relatively modest effects (9.3 ± 3.8% reduction) of CRYBMIM on normal cells (Figure 3E), as compared to substantial inhibition (54 ± 2.4% reduction) of the growth and survival of AML cells (Figure 3E). For comparison, doxorubicin led to more than 98% reduction in clonogenic capacity (revised Figure S3B).

      12. *Are the data and the methods presented in such a way that they can be reproduced?*

      -Mostly yes

      Response: The revised manuscript includes a complete description of all methods, including a detailed supplement, listing technical details, with all analyzed data available openly via Zenodo (https://doi.org/10.5281/zenodo.4321824).

      13. *Are the experiments adequately replicated and statistical analysis adequate?*

      -Mostly yes

      Response: All experiments were performed in at least three replicates, with all quantitative comparisons performed using appropriate statistical tests, as explained in the manuscript.

      **Minor comments:**

      *Specific experimental issues that are easily addressable.*

      -These are mostly indicated above.

      In addition:

      14.Why is BCL2 expression down-regulated by MYBMIM but not CRYBMIM?

      Response: We made the same observation, and attribute this difference to the fact that BCL2 expression is regulated by several transcription factors, including CEBPA, which is affected by CRYBMIM but not MYBMIM. Similar to MYBMIM treatment, MYB occupancy at the BCL2 enhancer was reduced upon CRYBMIM treatment. However, new binding sites of other factors, such as CBP/P300 and RUNX1, appeared simultaneously, suggesting that redistribution of transcription factors following CRYBMIM treatment can affect transcriptional regulation of BCL2 expression (revised Figure S9 and shown below).

      *Are prior studies referenced appropriately?*

      -Yes

      *Are the text and figures clear and accurate?*

      15.Generally, although some details are missing eg what aberrancy score in Fig 5C means.

      Response: Thank you for pointing this out. We have revised this figure to clarify this score, which is defined as the ratio of gene expression in AML cells relative to normal hematopoietic progenitor cells (revised Figure 7C).
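      As an illustration only, the ratio described in this response can be written as a short calculation. The function name, the example values, and the handling of a non-positive reference value are assumptions made for this sketch; they are not taken from the manuscript.

```python
def aberrancy_score(aml_expression: float, normal_expression: float) -> float:
    """Ratio of expression in AML cells to expression in normal progenitor cells."""
    if normal_expression <= 0:
        # Assumption: a non-positive reference value leaves the ratio undefined.
        raise ValueError("normal_expression must be positive")
    return aml_expression / normal_expression


# Hypothetical expression values, for illustration only.
example = aberrancy_score(aml_expression=8.0, normal_expression=2.0)
print(example)
```

      A score above 1 would then indicate higher expression in AML cells than in normal progenitor cells, and a score below 1 the reverse.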

      16. *Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      -The title of this manuscript could and I think should be changed. The term "therapeutic", is not appropriate because no therapeutic agents are described in the m/s nor is any form of AML, even experimentally, treated. Also "CBP" should be replaced with CBP/P300, especially since most evidence suggests that P300 is the likely more important partner of MYB (eg Zhao et al 2011).

      Response: We agree and have revised the title to clarify the significance of this work: “Convergent organization of aberrant MYB complexes controls oncogenic gene expression in acute myeloid leukemia.” We have revised the manuscript to indicate CBP/P300.

      17.-It would be worth discussing the core observation that disruption of the MYB-CBP/P300 interaction actually results in changes in MYB DNA binding. That this would occur is not at all obvious, because CBP/p300 doesn't interact with MYB's DNA binding domain nor does it have intrinsic DNA binding activity.

      Response: We thank the reviewer for this comment, and agree that remodeling of the MYB complex must affect the binding of MYB and other cofactors to DNA, at least in part mediated by potential acetylation by CBP/P300 (page 24).

      Reviewer #3 (Significance (Required)):

      **The Nature and Significance of the Advance**

      1) The major significance of this work lies in the chromatin occupancy and MYB complex studies. There are a number of very interesting findings including the apparent redistribution of MYB and/or CBP/P300 upon treatment with CRYBMIM. These suggest a series of changes in factors associated with particular gene sets involved in myeloid differentiation, although as mentioned above particular target genes are not specifically identified. However the pathways corresponding to these are listed in Table S6.

      Response: We have revised the manuscript to include the target genes in revised Supplemental Table 4 as well as DESeq2 tables (deposited in Zenodo, https://doi.org/10.5281/zenodo.4321824).

      2) The new peptide design (CRYBMIM) is interesting but its differences in binding and biological effects of MYBMIM are mostly incremental. See above.

      Response: We respectfully disagree and would like to explain how this work is significant both for conceptual and technical reasons. First, while the biochemical affinity of CRYBMIM is quantitatively increased compared with MYBMIM, this quantitatively increased affinity translates into qualitatively improved biological potency, as a result of “event-driven” pharmacology that characterizes pharmacologic protein interaction modulators (please also see response to Reviewer 3, comment 1, page 6 of this response). MYBMIM suppresses the growth and survival mostly of MLL-rearranged leukemias, whereas CRYBMIM does so for the vast majority (10 out of 11) of studied subtypes of AML. This now enables its therapeutic translation, as we are currently pursuing in collaboration with Novartis. Second, its improved biological activity led to the discovery of the previously unknown and unanticipated CBP/P300 sequestration mechanism of oncogenic gene control. We use this discovery to develop a precise model of aberrant gene control in AML that for the first time unifies previously disparate observations into a general mechanism. This is highly significant because it provides shared molecular dependencies for most subtypes of AML, a long-standing conundrum in cancer biology.

      *Place the work in the context of the existing literature (provide references, where appropriate).*

      -This m/s builds on and extends the report from the same group in Nature Communications (2018), which described the earlier peptide MYBMIM, some effects on MYB target genes and on AML cells. It and the previous paper also draw on the findings regarding the role of the MYB-CBP/P300 interaction in myeloid leukemogenesis (Pattabiraman et al 2014) and on previous genome-wide studies of MYB target genes (Zhao et al 2011; Zuber et al 2011).

      *State what audience might be interested in and influenced by the reported findings.*

      -This m/s will likely be of interest to scientists interested in MYB per se, in AML, in cancer genomics and transcriptional regulation.

      *Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.*

      -My expertise: AML, experimental hematology, transcription, MYB, cancer genomics

      3) As mentioned above, I feel that additional expertise is required to review the MS studies.

      Response: We have deposited all raw data in PRIDE (accession number PXD019708) and all processed data in Zenodo (https://doi.org/10.5281/zenodo.4321824), making it available for the community for further analysis.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This manuscript describes an improved MYB-mimetic peptide (cf the group's earlier work published in Nature Communications, 2018) and its effects on AML cell lines. It also describes - and this constitutes the majority of the paper - the dynamics of chromatin occupancy by MYB and other associated transcription factors upon disruption of the MYB-CBP/P300 interaction. The authors suggest this represents a shift from an oncogenic program to a myeloid differentiation program.

      Major comments:

      Regarding the improved affinity, and biological activity, of CRYBMIM:

      1.Improved affinity of CRYBMIM cf MYBMIM: clearly, it is improved, but not by a lot. By MST the increased affinity is about 3x. In terms of effects on AML cell viability: there is no direct comparison, and this should be included. In the group's previous paper there is no direct estimate for MYBMIM but it looks like the IC50 is between 10 and 20 micromolar so the effect is again around 2.5 fold. Also, the effects of the amino acid substitutions in CG3 are also very small (2.4x) given that 3 critical residues are altered. This is quite concerning.

      2.Does CRYBMIM really "spare" normal hematopoietic cells? Not according to Fig 2E, where there is only a 2-fold difference in IC50.

      3.Fig 2E and Supp Fig S2 appear to be contradictory. The latter shows no effect of 20micromolar CRYBMIM on colony formation by normal CD34+ cells, in complete contrast to killing with IC50 of 12.8 micromolar in Fig 2E. There is no +ve control for Fig S2 ie does the peptide work under colony assay conditions? This MUST be addressed.

      4.Fig 2F doesn't include any lines that express very low or undetectable levels of MYB. Some of these should be included to further examine specificity.

      Effects on gene expression and MYB binding:

      Data on MYB target gene expression and apoptosis/differentiation, and the conclusions drawn per se are sound, but:

      5.Fig S3 seems to show that MYB protein is lost on treatment with CRYBMIM. This isn't even mentioned in the text but raises a whole range of major questions eg why is this the case? Is this what is responsible for the loss of MYB-p300 interaction and/or biological effects on AML cells? Is this what is responsible for the effects on MYB target gene expression in Fig 3 and MYB binding to chromatin in Fig 4? This must be addressed.

      6.Fig 4 and the accompanying text are a bit hard to follow, but if I understood them correctly, I am surprised that the "gained MYB peaks" don't include the MYB binding motif itself? This at least deserves some comment. Also, there doesn't seem to have been any attempt to integrate the ChIP-Seq data with the expression data of Fig 3. This would provide clearer insights into the identities and types of MYB-regulated genes that are directly affected by suppression of CBP/p300 binding to MYB.

      7.The MS studies on MYB-interacting proteins seem very interesting and novel. I am not an expert on MS, though, so I'd suggest this section be reviewed by someone who is. Moreover, I was unable to see the actual data from this study because the material I was provided with didn't include Table S4 and S5.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      8.Claims regarding biological activity, specificity and improvements cf MYBMIM should be moderated given the small size of these effects as mentioned above (points 1 and 3).

      9.I found the description of the studies related to Figs 5 and 6 somewhat difficult to follow and convoluted. While changes in MYB and CBP/p300 chromatin occupancy clearly occur on M CRYBMIM treatment, it is not clear that the complexes seen on genes prior to treatment represent "aberrant" complexes. These may just be characteristic of undifferentiated (myeloid) cells. The authors appear to argue that because some of the candidate co-factors show "apparently aberrant expression in AML cells" based on comparison of (presumably mRNA) expression data with normal cells, the presence of these factors in the complexes make them "aberrant" (moreover, the "aberrancy score" of Fig 5 C is not defined anywhere, as far as I can see). This inference is drawing a rather long bow, given that the AML-specific factors may not actually be absent from the complexes in normal cells. So this conclusion should be moderated if a more direct MS comparison cannot be provided (for which I understand the technical difficulties).

      Would additional experiments be essential to support the claims of the paper?

      1. Address the issue of the apparent loss of MYB protein upon CRYBMIM treatment. If this is occurring, the whole premise of the subsequent work is undermined.

      12.Provision of a positive control for the experiment of Fig S2.

      Are the data and the methods presented in such a way that they can be reproduced?

      -Mostly yes

      Are the experiments adequately replicated and statistical analysis adequate?

      -Mostly yes

      Minor comments:

      Specific experimental issues that are easily addressable. -These are mostly indicated above.

      In addition:

      -Why is BCL2 expression down-regulated by MYBMIM but not CRYBMIM?

      Are prior studies referenced appropriately?

      -Yes

      Are the text and figures clear and accurate?

      -Generally, although some details are missing eg what aberrancy score in Fig 5C means.

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -The title of this manuscript could and I think should be changed. The term "therapeutic", is not appropriate because no therapeutic agents are described in the m/s nor is any form of AML, even experimentally, treated. Also "CBP" should be replaced with CBP/P300, especially since most evidence suggests that P300 is the likely more important partner of MYB (eg Zhao et al 2011).

      -It would be worth discussing the core observation that disruption of the MYB-CBP/P300 interaction actually results in changes in MYB DNA binding. That this would occur is not at all obvious, because CBP/p300 doesn't interact with MYB's DNA binding domain nor does it have intrinsic DNA binding activity.

      Significance

      The Nature and Significance of the Advance

      -The major significance of this work lies in the chromatin occupancy and MYB complex studies. There are a number of very interesting findings including the apparent redistribution of MYB and/or CBP/P300 upon treatment with CRYBMIM. These suggest a series of changes in factors associated with particular gene sets involved in myeloid differentiation, although as mentioned above particular target genes are not specifically identified. However the pathways corresponding to these are listed in Table S6.

      -The new peptide design (CRYBMIM) is interesting but its differences in binding and biological effects cf MYBMIM are mostly incremental. See above.

      Place the work in the context of the existing literature (provide references, where appropriate).

      -This m/s builds on and extends the report from the same group in Nature Communications (2018), which described the earlier peptide MYBMIM, some effects on MYB target genes and on AML cells. It and the previous paper also draw on the findings regarding the role of the MYB-CBP/P300 interaction in myeloid leukemogenesis (Pattabiraman et al 2014) and on previous genome-wide studies of MYB target genes (Zhao et al 2011; Zuber et al 2011).

      State what audience might be interested in and influenced by the reported findings.

      -This m/s will likely be of interest to scientists interested in MYB per se, in AML, in cancer genomics and transcriptional regulation.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      -My expertise: AML, experimental hematology, transcription, MYB, cancer genomics

      -As mentioned above, I feel that additional expertise is required to review the MS studies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      This manuscript reports the generation of a new and improved peptide mimetic inhibitor of the interaction between MYB and CBP/P300. The original MYBMIM inhibitor of this interaction, reported recently by the same laboratory, was modified by addition and substitution of peptide sequences from CREB, thus improving the affinity of the resulting CRYBMIM peptide to CBP/P300. The improved inhibitor profile results in increased anti-AML efficacy of CRYBMIM over MYBMIM. The authors go on to examine the mechanism underlying the anti-AML activity of CRYBMIM by integrating gene expression analysis, chromatin immunoprecipitation sequencing and mass spectrometric protein complex identification in human AML cells.

      I have some minor questions the authors may wish to comment on:

      1) The relocation of MYB, along with CBP/P300, to genes controlling myeloid differentiation (clusters 4 and 9) upon CRYBMIM treatment is reminiscent of the increased binding of MYB to myeloid pro-differentiation genes in AML cells following RUVBL2 silencing, recently reported in Armenteros-Monterroso et al. 2019 Leukemia 33:2817. Do the authors know if there is any overlap between genes in either of the clusters and the list reported in the latter study?

      2) Could the authors comment on a possible mechanism to explain the co-localization of MYB and CBP/P300 to the loci in clusters 4 and 9 following CRYBMIM treatment? Is it possible that CBP/P300 is recruited by other transcription factors to these loci, independently of binding to MYB? Or is the binding of CBP/P300 to MYB at these loci somehow more resistant to disruption by CRYBMIM?

      3) In the first paragraph of page 9, the text states: "Previously, we found that MYBMIM can suppress MYB:CBP/P300-dependent gene expression, leading to AML cell apoptosis that required MYB-mediated suppression of BCL2 (Ramaswamy et al., 2018)." I think this is a typo, since in this study, MYBMIM treatment results in loss of MYB binding to the BCL2 gene and consequent reduction in BCL2 expression. Do the authors mean 'MYBMIM-mediated suppression of BCL2' or 'loss of MYB-mediated activation of BCL2'?

      4) The authors explain the failure of excess CREBMIM to displace CBP/P300 from immobilised CREBMIM (Figure 1E-F) by the nature of the CREB:CBP/P300 interaction. Does this imply that CREBMIM is unable to disrupt the interaction between CREB and CBP/P300 in living cells and that the CBP/P300 purified from native MV4;11 lysates by immobilised CREBMIM was from a pool not associated with CREB?

      Significance

      Based on this integrative analysis, the authors propose a convincing hypothesis, involving the assembly of aberrant transcription factor complexes and sequestration of P300/CBP from genes involved in normal myeloid development, for the oncogenic activity of MYB in AML. As well as the obvious therapeutic potential of the CRYBMIM inhibitor itself, the data reported here reveal multiple avenues for future investigation into novel anti-AML therapeutic strategies. This is an innovative and important study.

      This study will be of interest to scientists and clinicians involved in leukaemia research as well as cancer biology in general.

      My field of expertise: leukaemia biology, leukaemia models, aberrant transcription factor activity in leukaemia

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The study by Forbes et al describes and characterizes a 2nd generation peptide-based inhibitor of the MYB:CBP interaction, termed CRYBMIM, which they use to study MYB:cofactor interactions in leukemia cells. CRYBMIM has improved properties relative to the MYBMIM peptide, and displays greater potency in biochemical and cell-based assays. Using a combination of epigenomics and biochemical screens, the authors define a list of candidate MYB cofactors whose functional significance as AML dependencies is supported by analysis of the DepMap database. Using genomewide profiling of TF and CBP occupancy, the authors provide evidence that CRYBMIM treatment reprograms the interactome of MYB in a manner that disproportionately changes specific cis-elements over others. Stated differently, the overall occupancy pattern of many TFs/cofactors shows gains and losses at specific cis elements, resulting in a complex modulation of MYB function and changes in transcription in leukemia cells.

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited. I list below a few control experiments that would clarify the specificity of CRYBMIM.

      1) Does CRYBMIM bind to other KIX domains, such as that of MED15? It would be important to evaluate the specificity of this peptide by testing whether it binds to other KIX domains.

      2) Similarly, it would be useful to perform a mass spec analysis of all nuclear factors that associate with streptavidin-immobilized CRYBMIM. This again would help the reader to understand the specificity of this peptide.

      The major limitation of this study, which modestly lessens my enthusiasm for this work, is that the mechanistic model of MYB-sequestered TFs proposed here is based on a face-value interpretation of IP-MS data coupled with ChIP-seq data. Normally, I would expect such a mechanism to be supported with some additional focused biochemical experiments of specific interactions, to complement all of the omics approaches. For example, can the authors evaluate and/or validate further how MYB physically interacts with LYL1, CEBPA, SPI1, or RUNX1? Are these interactions direct or indirect? Which domains of these proteins are involved? Does CRYBMIM treatment modulate the ability of these proteins to associate with one another in a co-IP? Do these interactions occur in normal hematopoietic cells? A claim is made throughout this study that these are aberrant TF complexes, but I believe more evidence is required to support this claim.

      Significance

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      The authors study allostery with a beautiful genotype-phenotype experiment to study the fitness landscape of an allosteric lac repressor protein. The authors make a mutational library using error prone pcr and measure the impact on antibiotic resistance protein expression at varying concentrations of the ligand, IPTG. After measuring the impact of mutations, the authors fill in the missing data using a neural net model. This type of dose response is not standard in the field, but the richness of their data and the discovery of the "band pass" phenomena prove its worth here splendidly. Using this mixed experimental/predicted data the authors explore how each mutation alters the different parameters of a hill equation fit of a dose response curve. Using higher order mutational space the authors look at how mutations can qualitatively switch phenotypes to inverted or band-stop dose-response curves. To validate and further explore a band-stop novel phenotype, the authors focused on a triple mutant and made all combinations of the 3 mutations. The authors find that only one mutation alone alters the dose-response and only in combination does a band-stop behavior present itself. Overall this paper is a fantastic data heavy dive into the allosteric fitness landscape of a protein. The data presented in this paper is thoroughly collected and analyzed, making the conclusions well-based. We do not think additional experiments nor substantial changes are needed apart from including basic experimental details and more biophysical rationale/speculation as discussed in further detail below.

      The authors do a genotype-phenotype experiment that requires extensive deep sequencing experiments. However, right now quite a bit of basic statistics on the sequencing is missing. Baseline library quality is somewhat shown in supplementary fig 2 but the figure is hard to interpret. It would be good to have a table that states how many of all possible mutations at different mutation depths (single, double, etc) there are. Similarly, sequencing statistics are missing- it would be useful to know how many reads were acquired and how much sequencing depth that corresponds to. This is particularly important for barcode assignment to phenotype in the long-read sequencing. In addition, a synonymous mutation comparison is mentioned but in my reading that data is not presented in the supplemental figures section.

      We thank the reviewer for this succinct summary of the manuscript and the results. We appreciate the reviewer identifying data of interest that were not included in the original manuscript. We agree that this information is necessary to consider the results. Specific changes are summarized in the comments below.

      The paper is very much written from an "old school" allostery perspective with static end point structures that are mutually exclusive - eg. p5l10 "relative ligand-binding affinity between the two conformations" - however, an ensemble of conformations is likely needed to explain their data. This is especially true for the bandpass and inverted phenotypes they observe. The work by Hilser et al is of particular importance in this area. We would invite the authors to speculate more freely about the molecular origins of their findings.

      We agree with the suggestion to adopt a modern allosteric perspective. We have changed the language throughout the manuscript to align with the ensemble model of allostery. We continue to frame results using the Monod-Wyman-Changeux model, which reliably predicts LacI activity from biophysical parameters and is not exclusive of more modern models of allostery.

      **Minor**

      There are a number of small modifications. In general this paper is very technical and could do with some explanation and discussion of relevance to make the manuscript more approachable for a broader audience.

      P1L23: Ligand binding at one site causes a conformational change that affects the activity of another > not necessarily true - and related to using more "modern" statistical mechanical language for describing allostery.

      We agree with the reviewer’s comment. We have addressed this comment by adopting language in line with a more modern view of allostery, for example:

      “With allosteric regulation, ligand binding at one site on a biomolecule changes the activity of another, often distal, site. Switching between active and inactive states provides a sense-and-response function that defines the allosteric phenotype.”

      P2L20: The core experiment of this paper is a selection using a mutational library. In the main body the authors mention the library was created using mutagenic pcr but leave it at that. More details on what sort of mutagenic pcr was used in the main body would be useful. According to the methods error prone pcr was used. Why use er-pcr vs deep point mutational libraries? Presumably to sample higher order phenotype? Rationale should be included. Were there preliminary experiments that helped calibrate the mutation level?

      We agree that justifying the decision to use error-prone PCR for library construction would be helpful. We have added text to the main text to explain this decision and to reflect on its consequences:

      “We used error-prone PCR across the full lacI CDS to investigate the effects of higher-order substitutions spread across the entire LacI sequence and structure.”

      And

      “Novel phenotypes emerged at mutational distances greater than one amino acid substitution, highlighting the value in sampling a broader genotype space with higher-order mutations. Furthermore, the untargeted, random mutagenesis approach used here was critical for finding these novel phenotypes, as the genotypes required for these novel phenotypes were unpredictable.”

      P2L20: Baseline library statistics would be great in a table for coverage, diversity, etc especially as this was done by error prone pcr vs a more saturated library generation method. This is present in sup fig2 but it's a bit complicated.

      To more clearly convey the diversity within the library, we have included a heatmap of amino acid substitution counts found within the library (Supplementary Fig. 4). Additionally, we have added Supplementary Table 1, which lists the distribution of mutational distances of LacI variants found within the library, and the corresponding coverage of all possible mutations for each mutational distance.
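      As a rough guide to how coverage at each mutational distance might be computed, the sketch below counts the amino acid variants possible at a given distance and divides an observed count by that total. It assumes that every position can take any of the 19 alternative amino acids, regardless of whether the change is reachable by a single nucleotide substitution, and it assumes a LacI length of 360 residues. The function names are hypothetical, and Supplementary Table 1 may define coverage differently.

```python
from math import comb


def possible_variants(protein_length: int, distance: int) -> int:
    """Number of distinct sequences exactly `distance` amino acid substitutions from wild type."""
    # Choose which positions change, then one of 19 alternative residues at each.
    return comb(protein_length, distance) * 19 ** distance


def coverage(observed_variants: int, protein_length: int, distance: int) -> float:
    """Fraction of all possible variants at this distance that were observed."""
    return observed_variants / possible_variants(protein_length, distance)


LACI_LENGTH = 360  # assumed protein length, for illustration

for k in (1, 2, 3):
    print(k, possible_variants(LACI_LENGTH, k))
```

      The total grows steeply with distance, which is why a library can cover a large share of single substitutions while sampling only a small fraction of higher-order variants.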

      P2L26: How were FACS gates drawn? This is in support fig17 - should be pointed to here.

      We agree that a better description of the FACS process would be helpful. To address this we have included Supplementary Fig. 2, showing flow cytometry measurements of the library before and after FACS. Additionally, we have extended the description of the FACS process:

      “The initial library had a bimodal distribution of G0, as indicated by flow cytometry results, with a mode at low fluorescence (near G0 of wildtype LacI), and mode at higher gene expression. To generate a library in which most of the LacI variants could function as allosteric repressors, we used fluorescence activated cell sorting (FACS) to select the portion of the library with low fluorescence in the absence of ligand, gating at the bifurcation of the two modes (Sony SH800S Cell Sorter, Supplementary Fig. 2).”


      P3L4: Where is the figure/data for the synonymous SNP mutations? This should be in the supplement.

      We agree this data is necessary to support the claim that LacI function was not impacted by synonymous mutations. We have included a new Supplementary Fig. 9, which shows the distribution of Hill equation parameters for LacI variants that code for the wild-type amino acid sequence, but with non-identical coding DNA sequences. Additionally, we included in the main text the results of a statistical analysis that compared all synonymous sequences in the library:

      “We compared the distributions of the resulting Hill equation parameters between two sets of variants: 39 variants with exactly the wild-type coding DNA sequence for LacI (but with different DNA barcodes) and 310 variants with synonymous nucleotide changes (i.e. the wild-type amino acid sequence, but a non-wild-type DNA coding sequence). Using the Kolmogorov-Smirnov test, we found no significant differences between the two sets (p-values of 0.71, 0.40, 0.28, and 0.17 for G0, G∞, EC50, and n respectively, Supplementary Fig. 9).”
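For readers less familiar with the test, the comparison quoted above rests on the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical cumulative distributions of the two sets. The following is a minimal illustrative sketch, not the analysis code used for the manuscript; the function name and the input lists are hypothetical, and it returns only the statistic. A p-value would normally come from an established routine such as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs are step functions, so the largest gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

In the setting described above, `sample_a` would hold one Hill equation parameter (for example, the fitted EC50 values) for the exact wild-type sequences and `sample_b` the same parameter for the synonymous variants.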

      P3L20: The authors use a ML learning deep neural network to predict variant that were not covered in the screen. However, the library generation method is using error prone pcr meaning there could multiple mutations resulting in the same amino acid change. The models performance was determined by looking at withheld data however error prone pcr could result in multiple nonsynomymous mutations of the same amino acid. For testing were mutations truly withheld or was there overlap? Because several mutations are being represented by different codon combinations. Was the withheld data for the machine learning withholding specific substitutions?

      We thank the reviewer for identifying the need to clarify this critical data analysis. Data was held-out at the amino acid level, and so no overlap between the training and testing datasets occurred. We have clarified the description of the method in the main text:

      “We calculated RMSE using only held-out data not used in the model training, and the split between held-out data and training data was chosen so that all variants with a specific amino acid sequence appear in only one of the two sets.”
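To make the split concrete, the sketch below groups variants by amino acid sequence before assigning them to the training or held-out set, so that two DNA-level variants encoding the same protein can never land on opposite sides. It is an illustration under stated assumptions, not the code used for the manuscript: the function name, the `(amino_acid_sequence, barcode)` tuple layout, the default fraction, and the seed are all hypothetical.

```python
import random


def split_by_amino_acid_sequence(variants, held_out_fraction=0.2, seed=0):
    """Split variants so each amino acid sequence appears in only one set.

    variants: list of (amino_acid_sequence, barcode) tuples (hypothetical layout).
    Returns (training_variants, held_out_variants).
    """
    # Work on unique amino acid sequences, not on individual DNA variants.
    sequences = sorted({aa for aa, _ in variants})
    rng = random.Random(seed)
    rng.shuffle(sequences)
    n_held_out = max(1, round(held_out_fraction * len(sequences)))
    held_out_sequences = set(sequences[:n_held_out])
    training = [v for v in variants if v[0] not in held_out_sequences]
    held_out = [v for v in variants if v[0] in held_out_sequences]
    return training, held_out
```

Splitting on the sequence rather than on the barcode is the design choice that matters here: a barcode-level split would let synonymous copies of the same protein leak between the two sets.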


      In addition, higher order protein interactions are complicated and idiosyncratic. I am surprised how well the neural net performs on higher order substitutions. P4L4: Authors find mutations at the dimer/tetramer interfaces but don't mention whether polymerization is required. is dimerization required for dna binding? Tetramerization?

      We agree with the reviewer that a description of LacI structure and function would help convey the reported results. We have therefore added Supplementary Table 2, which defines the structural features discussed throughout the manuscript. Additionally, we have strived to describe the relevant structural and functional role of the specific amino acids that are discussed in the text. Finally, we have added a paragraph to the main text that summarizes the structure and function of LacI.

      “The LacI protein has 360 amino acids arranged into three structural domains22–24. The first 62 N-terminal amino acids form the DNA-binding domain, comprising a helix-turn-helix DNA-binding motif and a hinge that connects the DNA-binding motif and the core domain. The core domain, comprising amino acid positions 63-324, is divided into two structural subdomains: the N-terminal core and the C-terminal core. The full core domain forms the ligand-binding pocket, core-pivot region, and dimer interface. The tetramerization domain comprises the final 30 amino acids and includes a flexible linker and an 18 amino acid α-helix (Fig. 3, Supplementary Table 2). Naturally, LacI functions as a dimer of dimers: Two LacI monomers form a symmetric dimer that further assembles into a tetramer (a dimer of dimers).”

      P4L8: Substitutions near the dimer interface both impact g0 and ec50, which authors say is consistent with a change in the allosteric constant. Can authors explain their thinking more in the paper to make it easier to follow? Are there any mutations in this area that only impact g0 or ec50 alone? Why may these specific residues modify dimerization?

      We agree that a more in-depth discussion of the possible mechanisms behind these phenotypic changes would improve the manuscript. We have added discussion throughout the subsection “Effects of amino acid substitutions on LacI phenotype.” We believe this added discussion improves the manuscript and clarifies the relationship between the observed allosteric phenotypes and the molecular mechanisms behind them.

      Overall, we have made a number of changes in the manuscript that we hope will address these concerns.

      P4L8: The authors discuss the allosteric constant extensively within the paper but do not explain it. It would be helpful to have an explanation of this to improve readability. This explanation should include the statistical mechanical basis of it and some speculation about the ways it manifests biophysically.

      The allosteric constant is a critical concept, and we agree that it must be defined and discussed clearly throughout the manuscript. We have greatly expanded the discussion of the effects of single amino acid substitutions, and in the process we give examples of biochemical changes in the protein, and how they may affect the allosteric constant. We think this added text improves the manuscript and helps clarify the allosteric constant and the biomolecular processes that affect it.

      P4L1-16: Authors see mutations in the dimerization region that impact either G0 and Gsaturated in combination with Ec50 but not g0 and gsaturated together. Maybe we do not fully understand the hill equation but why are there no mutations that impact both g0 and gsaturated seen in support fig 13c? Why would mutations in the same region potentially impacting dimerization impact either g0 or gsaturated? What might be the mechanism behind divergent responses?

      It is important to recognize that the dimer interface does not just support the formation of dimers. There are many points of contact along the dimer interface that change when LacI switches between the active and inactive states. So, the dimer interface also helps regulate the balance between the active and inactive states. Our results show that different substitutions near the dimer interface can push this balance either toward the active or inactive states to varying degrees. We’ve added text throughout the description of single-substitution effects to give specific examples and added a new paragraph at the end of that section to provide additional discussion and context. With regard to the more specific question of changes to both G0 and G∞, the models indicate that simultaneous changes to those Hill equation parameters require an unusual combination of biophysical changes. To clarify this point, we added a short paragraph to the text:

      “None of the single amino acid substitutions measured in the library simultaneously decrease G∞ and increase G0 (Supplementary Fig. 20c). This is not surprising, since substitutions that shift the biophysics to favor the active state tend to decrease G∞ while those that favor the inactive state tend to increase G0, and the biophysical models2,14,15 indicate that only a combination of parameter changes can cause both modifications to the dose-response. The library did, however, contain several multi-substitution variants that simultaneously decrease G∞ and increase G0. These inverted variants, and their associated substitutions are discussed below.”
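Because the reviewers ask how G0 and G∞ relate to the dose-response curve, a short sketch may help. It assumes the standard four-parameter Hill form, G(L) = G0 + (G∞ − G0)·Lⁿ / (EC50ⁿ + Lⁿ), which this reply does not spell out; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
def hill_response(ligand, g0, g_inf, ec50, n):
    """Gene expression at a given ligand concentration (assumed Hill form).

    g0    : expression with no ligand (basal level)
    g_inf : expression at saturating ligand
    ec50  : ligand concentration giving the midpoint between g0 and g_inf
    n     : Hill coefficient (steepness of the transition)
    """
    if ligand < 0:
        raise ValueError("ligand concentration must be non-negative")
    if ligand == 0:
        return g0
    ratio = (ligand / ec50) ** n
    return g0 + (g_inf - g0) * ratio / (1.0 + ratio)
```

In this form, an inverted variant is simply one with `g_inf` below `g0`, so the curve falls rather than rises with ligand. A band-stop response cannot be written with a single Hill term, which is consistent with the statement above that those variants fall outside the usual models.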


      P4L29: for interpretability it would be good to explain what log-additive effect means in the context of allostery.

      We agree that this information would be useful to the reader and have added additional text to explain log-additivity. We thank the reviewer for pointing out this oversight.

      “Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50. That is, the proportional effects of two individual amino acid substitutions on the EC50 can be multiplied together. For example, if substitution A results in a 3-fold change, and substitution B results in a 2-fold change, the double substitution, AB, behaving log-additively, results in a 6-fold change.”

      P4L34-P5L19: This section is wonderful. Really cool results and interesting structural overlap! P5L34 Helix 9 of the protein is mentioned but it's functional relevance is not. This is common throughout the paper - it would be useful for there to be an overview somewhere to help the reader contextualize the results with known structural role of these elements.

      We agree with the reviewer that this information would help to contextualize the results. We have made a number of changes to address this. First, we have added Supplementary Table 2, which describes the structural features of LacI. Second, we have added a paragraph overviewing the structure and function of LacI. Third, we have expanded the section “the effects of individual amino acid substitutions on the function of LacI” to discuss the structural or biochemical impact of specific substitutions. We thank the reviewer for this suggestion.

      P5L39: The authors identified a triple mutant with the band-stop phenotype then made all combinations of the triple mutant. Of particular interest is R195H/G265D which is nearly the same as the triple mutant. It would be nice to show the positions of each of these mutations and to have some discussion to begin to rationalize this phenotype, even if to point out how far apart they are and that there is no easy structural rationale!

      We appreciate the reviewer highlighting this area of interest. We have added structural information to Fig. 6, which indicates the positions of the amino acid substitutions that result in the band-stop phenotype, as well as a short discussion in the main text:

      “To further investigate the band-stop phenotype, we chose a strong band-stop LacI variant with only three amino acid substitutions (R195H/G265D/A337D). These three positions are distributed distally on the periphery of the C-terminal core domain, and the role that each of these substitutions plays in the emergence of the band-stop phenotype is unclear.”

      P6L9: There should be more discussion of the significance of this work directly compared to what is known. For instance, negative cooperativity is mentioned as an explanation for bi-phasic dose response but this idea is not explained. Why would the relevant free energy changes be more entropic? Another example is the reverse-TetR phenotype observed by Hillen et al.

      We agree that more discussion is necessary to frame the results reported in the manuscript. To address this, we have added additional discussion throughout the manuscript that relates the results to the current understanding of allostery. Also, in the Conclusion, we added specific examples that lead us to link the ideas of bi-phasic dose response, negative cooperativity, and entropy/disorder. We believe these additions have improved the manuscript and we thank the reviewer for this suggestion.

      P6L28: The authors mention that phenotypes exist with genotypes that are discoverable with genotype-phenotype landscapes. This study due to the constraints of error prone pcr were somewhat limited. How big is the phenotypic landscape? Is it worth doing a more systematic study? What is the optimal experimental design: Single mutations, doubles, random - where is there the most information. How far can you drift before your machine learning model breaks down? How robust would it be to indels?

      The reviewer raises some excellent questions here, some of which are appropriate subjects for future work. The optimal experimental design depends on the objective: If the goal is to understand every possible mutation, a systematic site-saturation approach would be more appropriate. However, the landscape of a natural protein is limited by its wild-type DNA coding sequence, and so some substitutions are inaccessible (due to the arrangement of the codon table). The approach we took allowed us to characterize most of the accessible amino acid substitutions, while also allowing us to identify novel functions that would not have been identified with other approaches. We have added a short passage to the main text to discuss this (below). With regard to the DNN model, in the manuscript (SI Fig. 14), we show how the predictive accuracy degrades with mutational distance from the wild-type. It is possible that the type of DNN that we used could handle indels, since it effectively encodes each variant as a set of step-wise changes from the wild-type. But as with all machine-learning methods, it would require training with a dataset that included indels.

      “Novel phenotypes emerged at mutational distances greater than one amino acid substitution, highlighting the value in sampling a broader genotype space with higher-order mutations. Furthermore, the untargeted, random mutagenesis approach used here was critical for finding these novel phenotypes, as the genotypes required for these novel phenotypes were unpredictable.”

      Figures: Sup figs 3-7: The comparison of library-based results and single mutants is a great example of how to validate genotype-phenotype experiments!

      Thank you.

      Supp fig 5.: Missing figure number.

      We appreciate the reviewer catching this error and have attempted to properly label all figures and tables in this revision. Thank you.

      Supp fig7: G0 appears to have very poor fit between library vs single mutant version. Why might this be? R^2 would likely be better to report here as opposed to RMSE as RMSE is sensitive to the magnitude of the data such that you cannot directly compare RMSE of say 'n' to G0.

      We agree that these are important discussion points and have addressed this concern with an expanded discussion in the main text, as well as the addition of coefficient of correlation (R^2) in the caption for Figure 2 (previously supplementary figure 7). We believe these additions contribute meaningfully to the manuscript, and they address the concerns of the reviewer. The additional text reads:

      “We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as: $\exp[\mathrm{RMSE}(\ln(x))]$, where $\mathrm{RMSE}(\ln(x))$ is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results. The accuracy for the gene expression levels (G0 and G∞) was better at higher gene expression levels (typical for G∞) than at low gene expression levels (typical for G0), which is expected based on the non-linearity of the fitness impact of tetracycline (Supplementary Figs. 10-11). Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”

      Sup fig13c: it is somewhat surprising that mutations only appear to affect g0 and not gsaturated. This implies that basal and saturated activity are not coupled. Is this expected? Why or why not?

      This comment is partially addressed with a response above (P4L1-16). Coupled gene expression increases do occur, especially with substitutions at the start codon that result in fewer copies of LacI in the cell. In this instance, both G0 and G∞ are increased. Otherwise, changes to multiple biophysical parameters are required to increase both G0 and G∞.

      Reviewer #1 (Significance (Required)): Allostery is hard to comprehend because it involves many interacting residues propagating information across a protein. The Monod-Wyman-Changeux (MWC) and Koshland, Nemethy, and Filmer (KNF) models have been a long standing framework to explain much of allostery, however recent formulations have focused on the role of the conformational ensemble and a grounding in statistical mechanics. This manuscript focuses on the functional impact of mutations and therefore contribution of the amino acids to regulation. The authors unbiased approach of combining a dose-response curve and mutational library generation let them fit every mutant to a hill equation. This approach let the authors identify the allosteric phenotype of all measured mutations! The authors found inverted phenotypes which happen in homologs of this protein but most interesting is the strange and idiosyncratic 'Band-stop' phenotype. The band-stop phenotype is bi-phasic that will hopefully be followed up with further studies to explain the mechanism. This manuscript is a fascinating exploration of the adaptability of allosteric landscapes with just a handful of mutations. Genotype-phenotype experiments allow sampling immense mutational space to study complex phenotypes such as allostery. However, a challenge with these experiments is that allostery and other complicated phenomena come from immense fitness landscapes altering different parameters of the hill equation. The authors approach of using a simple error prone pcr library combined with many ligand concentrations allowed them to sample a very large space somewhat sparsely. However, they were able to predict this data by training and using a neural net model. I think this is a clever way to fill in the gaps that are inherent to somewhat sparse sampling from error prone pcr. The experimental design of the dose response is especially elegant and a great model for how to do these experiments. 
With some small improvements for readability, this manuscript will surely find broad interest to the genotype-phenotype, protein science, allostery, structural biology, and biophysics fields. We were prompted to do this by Review Commons and are posting our submitted review here: Willow Coyote-Maestas has relevant expertise in high throughput screening, protein engineering, genotype-phenotype experiments, protein allostery, data mining, and machine learning. James Fraser has expertise in structural biology, genotype-phenotype experiments, protein allostery, protein dynamics, protein evolution, etc.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The authors use deep mutational scanning to infer the dose-response curves of ~60,000 variants of the LacI repressor and so provide an unprecedently systematic dataset of how mutations affect an allosteric protein. Overall this is an interesting dataset that highlights the potential of mutational scanning for rapidly identifying diverse variants of proteins with desired or unexpected activities for synthetic biology/bioengineering. The relatively common inverted phenotypes and their sequence diversity is interesting, as is the identification of several hundred genotypes with non-sigmoidal band-stop dose-response curves and their enrichment in specific protein regions. A weakness of the study is that some of the parameter estimates seem to have high uncertainty and this is not clearly presented or the impact on the conclusions analysed. A second shortcoming is that there is little mechanistic insight beyond the enrichments of mutations with different effects in different regions of the protein. But as a first overview of the diversity of mutational effects on the dose-response curve of an allosteric protein, this is an important dataset and analysis.

      **Comments**

      **Data quality and reproducibility**

      "The flow cytometry results confirmed both the qualitative and quantitative accuracy of the new method (Supplementary Figs. 3-7)"

      • There need to be quantitative measures of accuracy in the text here for the different parameters.

      We believe this comment is addressed along with the following two comments.

      • Sup fig 7 panels should be main text panels - they are vital for understanding the data quality In particular, the G0 parameter estimates from the library appear to have a lower bound ie they provide no information below a cytometry Go of ~10^4. This is an important caveat and needs to be highlighted in the main text. The Hill parameter (n) estimate for wt (dark gray) replicate barcodes is extremely variable - why is this?

      • In general there is not a clear enough presentation of the uncertainty and biases in the parameter estimations which seem to be rather different for the 4 parameters. Only the EC50 parameter seems to correlate very well with the independent measurements.

      We thank the reviewer for identifying a need for more information on the accuracy of this method. So, we have moved Supplementary Fig. 7 to the main text (Fig 2 in the revised manuscript) and have added coefficients of correlation to each Hill equation parameter in that figure caption. Furthermore, we have added new data (Supplementary Fig. 11), which shows the uncertainty associated with different gene expression levels. Finally, we have added a discussion on the accuracy of this method for each parameter of the Hill equation to the main text. Estimation of the Hill coefficient (n) from data is often highly uncertain and variable, because that parameter estimate can be highly sensitive to random measurement errors at a single point on the curve. The estimate for the wild type appears to be highly variable because the plot contains 53 replicate measurements. So, the plotted variability represents approximately 2 standard deviations. The spread of wild-type results in the plot is consistent with the stated RMSE for the Hill coefficient. Furthermore, the Hill coefficient is not used in any of the additional quantitative analysis in our manuscript, partially because of its relatively high measurement uncertainty, but also because, based on the biophysical models, it is not as informative of the underlying biophysical changes.

      “We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as: $\exp[\mathrm{RMSE}(\ln(x))]$, where $\mathrm{RMSE}(\ln(x))$ is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results. The accuracy for the gene expression levels (G0 and G∞) was better at higher gene expression levels (typical for G∞) than at low gene expression levels (typical for G0), which is expected based on the non-linearity of the fitness impact of tetracycline (Supplementary Figs. 10-11). Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”
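The fold-accuracy figure quoted above, the exponential of the RMSE of the log-transformed values, can be computed as in the following sketch. It is illustrative only and is not the code used for the manuscript; the function and argument names are hypothetical, and it assumes paired, strictly positive measurements of the same variants by the two methods.

```python
import math


def fold_accuracy(library_values, cytometry_values):
    """Return exp(RMSE(ln x)) for paired, strictly positive measurements.

    A result of 1.0 means perfect agreement; 2.0 means the two methods
    typically differ by about a factor of two.
    """
    if len(library_values) != len(cytometry_values) or not library_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_diffs = [
        (math.log(lib) - math.log(cyt)) ** 2
        for lib, cyt in zip(library_values, cytometry_values)
    ]
    rmse = math.sqrt(sum(squared_log_diffs) / len(squared_log_diffs))
    return math.exp(rmse)
```

Working in log space is what makes the result a multiplicative ("x-fold") error, which is why the same recipe is not applied to the Hill coefficient n, whose accuracy is reported as a plain ± value.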

      • The genotypes in the mutagenesis library contain a mean of 4.4 aa substitutions and the authors use a neural network to estimate 3 of the Hill equation parameters (with uncertainties) for the 1991/2110 of the single aa mutations. It would be useful to have an independent experimental evaluation of the reliability of these inferred single aa mutational effects by performing facs on a panel of single aa mutants (using single aa mutants in sup fig 3-7, if there are any, or newly constructed mutants).

      We agree that the predictive performance of the DNN requires experimental validation. We evaluated the performance by withholding data from 20% of the library, including nearly 200 variants with single amino acid substitutions, and then compared the predicted effect of those substitutions to the measured effect. The results of this test are reported in Supplementary Fig. 14. Additionally, we have adjusted the main text to more clearly explain the evaluation process.

      “To evaluate the accuracy of the model predictions, we used the root-mean-square error (RMSE) for the model predictions compared with the measurement results. We calculated RMSE using only held-out data not used in the model training, and the split between held-out data and training data was chosen so that all variants with a specific amino acid sequence appear in only one of the two sets.”

      • fig3/"Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50." How additive are the other 2 parameters? this analysis should also be presented in fig 3. If they are not as additive is it simply because of lower accuracy of the measurements? If the mutational effects are largely additive, then a simple linear model (rather than the DNN) could be used to estimate the single mutant effects from the multiple mutant genotypes.

      We agree with the reviewer that exploring the log-additivity of the Hill equation parameters is informative, and have included Supplementary Figure 21, which displays this information. Furthermore, we expanded the discussion of log-additivity on all three parameters in the main text:

      “Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50. That is, the proportional effects of two individual amino acid substitutions on the EC50 can be multiplied together. For example, if substitution A results in a 3-fold change, and substitution B results in a 2-fold change, the double substitution, AB, behaving log-additively, results in a 6-fold change. Only 0.57% (12 of 2101) of double amino acid substitutions in the measured data have EC50 values that differ from the log-additive effects of the single substitutions by more than 2.5-fold (Fig. 4). This result, combined with the wide distribution of residues that affect EC50, reinforces the view that allostery is a distributed biophysical phenomenon controlled by a free energy balance with additive contributions from many residues and interactions, a mechanism proposed previously1,39 and supported by other recent studies17, rather than a process driven by the propagation of local, contiguous structural rearrangements along a defined pathway.

      A similar analysis of log-additivity for G0 and G∞ is complicated by the more limited range of measured values for those parameters, the smaller number of substitutions that cause large shifts in G0 or G∞, and the higher relative measurement uncertainty at low G(L). However, the effects of multiple substitutions on G0 and G∞ are also consistent with log-additivity for almost every measured double substitution variant (Supplementary Fig. 21).”
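The log-additivity check described in the quoted passage can be sketched as follows: multiply the fold changes of the two single substitutions to get the expected double-substitution value, then express the gap between measurement and expectation as a fold difference. This is an illustration, not the manuscript's analysis code; the function names and the numerical values are hypothetical, and only the 2.5-fold threshold is taken from the text above.

```python
import math


def log_additive_expectation(wild_type, single_a, single_b):
    """Expected double-substitution value if fold changes multiply."""
    return wild_type * (single_a / wild_type) * (single_b / wild_type)


def deviation_fold(measured, expected):
    """Symmetric fold difference (always >= 1) between two positive values."""
    return math.exp(abs(math.log(measured / expected)))


# Hypothetical EC50 values in arbitrary units: A is a 3-fold change, B is 2-fold.
expected_ab = log_additive_expectation(wild_type=100.0, single_a=300.0, single_b=200.0)
# A measured double-substitution EC50 of 1800 would be 3-fold off the expectation,
# which exceeds the 2.5-fold threshold quoted above.
is_non_additive = deviation_fold(1800.0, expected_ab) > 2.5
```

Multiplying fold changes is the same as adding their logarithms, which is where the term "log-additive" comes from.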

      **Presentation/clarity of text and figures**

      • The main text implies that the DNN is trained to predict 3 parameters of the Hill equation but not the Hill coefficient (n). This should be clarified / justified in the main text.

      We agree that the decision to exclude the parameter ‘n’ requires explanation in the main text. To address this, we have added to the main text:

      “Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”

      and

      “We trained the model to predict the Hill equation parameters G0, G∞, and EC50 (Supplementary Fig. 13), the three Hill equation parameters that were determined with relatively low uncertainty by the library-scale measurement.”

      • The DNN needs to be better explained and justified in the main text for a general audience. How do simpler additive models perform for phenotypic prediction / parameter inference?

      We agree with the reviewer that the DNN needs to be justified in the main text. As part of the revision plan, we propose to compare the predictive performance of the DNN to an additive model.

      • Ref 14. analyses a much smaller set of mutants in the same protein but using an explicit biophysical model. It would be helpful to have a more extensive comparison with the approach and conclusions to this previous study.

      Throughout the manuscript, we frame the results and discussion in terms of the referenced biophysical model. Using the model, we describe the biophysical effects that a substitution may have on LacI, based on observed changes to function associated with that substitution. We also comment briefly on the limitations of this model when applied to the extensive dataset presented here.

      “Most of the non-silent substitutions discussed above are more likely to affect the allosteric constant than either the ligand or operator affinities. Within the biophysical model, those affinities are specific to either the active or inactive state of LacI, i.e. they are defined conditionally, assuming that the protein is in the appropriate state. So, almost by definition, substitutions that affect the ligand-binding or operator-binding affinities (as defined in the models) must be at positions that are close to the ligand-binding site or within the DNA-binding domain. Substitutions that modify the ability of the LacI protein to access either the active state or inactive state, by definition, affect the allosteric constant. This includes, for example, substitutions that disrupt dimer formation (dissociated monomers are in the inactive state), substitutions that lock the dimer rigidly into either the active or inactive state, or substitutions that more subtly affect the balance between the active and inactive states. Thus, because there are many more positions far from the ligand- and DNA- binding regions than close to those regions, there are many more opportunities for substitutions to affect the allosteric constant than the other biophysical parameters. Note that this analysis assumes that substitutions don’t perturb the LacI structure too much, so that the active and inactive states remain somehow similar to the wild-type states. Our results suggest that this is not always the case: consider, for example, the substitutions at positions K84 and M98 discussed above and the substitutions resulting in the inverted and band-stop phenotypes discussed below.”

      • Enrichments need statistical tests to know how unexpected that results are e.g. p5 line 12 "67% of strongly inverted variants have substitutions near the ligand-binding pocket"

      We agree that this information is necessary to interpret the results. We have included p-values (previously reported only in the Methods section) throughout the main text of the manuscript.

      • missing citation: Poelwijk et al 2011 https://www.cell.com/fulltext/S0092-8674(11)00710-0 previously reported an inverted dose-response curve for a lacI mutant.

      The publication by Poelwijk et al. was considered extensively when planning this work, and failing to cite that manuscript would have been tremendously unjust. We have included it, as well as a few additional references that have identified and discussed inverted LacI variants. We sincerely thank the reviewer for identifying this oversight.

      • What mechanisms do the authors envisage that could produce the band-stop dose response curves? There is likely previous theoretical work that could be cited here. In general there is little discussion of the biophysical mechanisms that could underlie the various mutational effects.

      We agree with the reviewer that discussing the biophysical mechanisms that underlie many of the reported mutations is important to understand the results. We have expanded the subsection “Effects of amino acid substitutions on LacI phenotype” to include discussion of several of the key substitutions (or groups of substitutions) and their potential biophysical effects. Additionally, we consider mechanisms that may underlie the band-stop sensor, and propose one model that could explain the band-stop phenotype:

      “In particular, the biphasic dose-response of the band-stop variants suggests negative cooperativity: that is, successive ligand binding steps have reduced ligand binding affinity. Negative cooperativity has been shown to be required for biphasic dose-response curves42,43. The biphasic dose-response and apparent negative cooperativity are also reminiscent of systems where protein disorder and dynamics have been shown to play an important role in allosteric function1, including catabolite activator protein (CAP)44,45 and the Doc/Phd toxin-antitoxin system46. This suggests that entropic changes may also be important for the band-stop phenotype. A potential mechanism is that band-stop LacI variants have two distinct inactive states: an inactive monomeric state and an inactive dimeric state. In the absence of ligand, inactive monomers may dominate the population. Then, at intermediate ligand concentrations, ligand binding stabilizes dimerization of LacI into an active state which can bind to the DNA operator and repress transcription. When a second ligand binds to the dimer, it returns to an inactive dimeric state, similar to wildtype LacI. This mechanism, and other possible mechanisms, do not match the MWC model of allostery or its extensions2,13–15 and require a more comprehensive study and understanding of the ensemble of states in which these band-stop LacI variants exist.”

      • "This result, combined with the wide distribution of residues that affect EC50, suggests that LacI allostery is controlled by a free energy balance with additive contributions from many residues and interactions." 'additive contributions and interactions' covers all possible models of vastly different complexity i.e. this sentence is rather meaningless.

      We have attempted to contextualize this statement by adding additional discussion and references. We hope these additions give more meaning to this section.

      “This result, combined with the wide distribution of residues that affect EC50, reinforces the view that allostery is a distributed biophysical phenomenon controlled by a free energy balance with additive contributions from many residues and interactions, a mechanism proposed previously1,39 and supported by other recent studies17, rather than a process driven by the propagation of local, contiguous structural rearrangements along a defined pathway.”

      • fig 4 c and d compress a lot of information into one figure and I found this figure confusing. It may be clearer to have multiple panels with each panel presenting one aspect. It is also not clear to me what the small circular nodes exactly represent, especially when you have one smaller node connected to two polygonal nodes, and why they don't have the same colour scale as the polygonal nodes.

      We agree with the reviewer that figure 4 (or Figure 5 in the revised manuscript) contains a lot of information. The purpose of this figure is to convey the structural and genetic diversity among the sets of inverted variants and band-stop variants. We designed this figure to convey this point at two levels: a brief overview, where the diversity is apparent by quickly considering the figure, and at a more informative level, with some quantitative data and structurally relevant points highlighted. We have modified the caption slightly, in an effort to improve clarity.

      • line 25 - 'causes a conformational change' -> 'energetic change' (allostery does not always involve conformational change)

      We thank the reviewer for this comment and have adopted more modern language to describe allostery throughout the manuscript.

      • sup fig 5 legend misses '5'

      We thank the reviewer for pointing this out. We have attempted to number all figures and tables more carefully.

      • sup fig 7. pls add correlation coefficients to these plots (and move to main text figures).

      We agree that this information is of interest and have included this data as main text Figure 2. In addition, we have included coefficients of correlation in the caption of this figure.

      • Reference 21 is just a title and pubmed link

      We thank the reviewer for identifying this error. We have corrected this in the references.

      • "fitness per hour" -> growth rate

      To ensure that this connection is clearly established, when we introduce fitness for the first time, we clarify that it relates to growth rate:

      “Consequently, in the presence of tetracycline, the LacI dose-response modulates cellular fitness (i.e. growth rate) based on the concentration of the input ligand isopropyl-β-D-thiogalactoside (IPTG).”

      Also, we define ‘fitness’ in the Methods section:

      “The experimental approach for this work was designed to maintain bacterial cultures in exponential growth phase for the full duration of the measurements. So, in all analysis, the Malthusian definition of fitness was used, i.e. fitness is the exponential growth rate58.”
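      To make the definition concrete, here is a minimal sketch of the Malthusian fitness computed from two cell counts, assuming purely exponential growth between them. The function name and the example numbers are hypothetical.

```python
from math import log

def malthusian_fitness(n_start, n_end, hours):
    """Exponential growth rate (per hour), assuming N(t) = N(0) * exp(mu * t)."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return log(n_end / n_start) / hours

# Hypothetical example: an 8-fold increase (three doublings) in 3 hours.
mu = malthusian_fitness(n_start=1e6, n_end=8e6, hours=3.0)
```

      With this definition, one doubling per hour corresponds to a fitness of ln(2) per hour.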

      • page 6 line 28 - "discoverable only via large-scale landscape measurements" - directed evolution approaches can also discover such genotypes (see e.g. Poelwijk /Tans paper). Please re-phrase.

      We agree with the reviewer and have adjusted the main text accordingly.

      “Overall, our findings suggest that a surprising diversity of useful and potentially novel allosteric phenotypes exist with genotypes that are readily discoverable via large-scale landscape measurements.”

      • pls define jargon the first time it is used e.g. band-stop and band-pass

      We agree that all unconventional terms should be explicitly defined when used, and we have attempted to define the band-pass and band-stop dose-response curves more clearly in the main text:

      “These include examples of LacI variants with band-stop dose-response curves (i.e. variants with high-low-high gene expression; e.g. Fig. 1e, Supplementary Fig. 7), and LacI variants with band-pass dose-response curves (i.e. variants with low-high-low gene expression; e.g. Supplementary Fig. 8).”
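      The four curve shapes named in this exchange can be told apart with a simple rule on sampled expression values. The sketch below is only an illustration of the definitions (high-low-high versus low-high-low); the threshold `tol` and the example values are hypothetical, and this is not the classification procedure used in the manuscript.

```python
def classify_shape(expression, tol=0.1):
    """Label a dose-response curve sampled at increasing ligand concentration.

    'tol' is a hypothetical relative threshold chosen for illustration.
    """
    first, last = expression[0], expression[-1]
    lo, hi = min(expression), max(expression)
    span = hi - lo
    if span <= tol * hi:
        return "flat"
    interior_dip = lo < min(first, last) - tol * span
    interior_peak = hi > max(first, last) + tol * span
    if interior_dip and interior_peak:
        return "other"
    if interior_dip:
        return "band-stop"   # high-low-high gene expression
    if interior_peak:
        return "band-pass"   # low-high-low gene expression
    return "sigmoidal" if last > first else "inverted"

# Hypothetical expression values at five increasing ligand concentrations.
example = classify_shape([100, 20, 5, 30, 110])
```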

      Methods/data availability/ experimental and analysis reproducibility:

      • The way that growth rate is calculated on page 17 equation 1- This section is confusing. Please be explicit about how you accounted for the lag phase, what the lag phase was, and total population growth during this time. In addition, please report the growth curves from the wells of the four plates, the final OD600 of the pooled samples, and exact timings of when the samples were removed from 37 degree incubation in a table. These are critical for calculating growth rate in individual clones downstream.

      We thank the reviewer for identifying the need to clarify this section of text. The ‘lag’ in this section referred to a delay before tetracycline began impacting the growth rate of cells. To address this, we have changed ‘lag’ in this context to ‘delay.’ Furthermore, we have attempted to clarify precisely the cause of this delay, and how we accounted for it in calculating growth rates:

      For samples grown with tetracycline, the tetracycline was only added to the culture media for Growth Plates 2‑4. Because of the mode of action of tetracycline (inhibition of translation), there was a delay in its effect on cell fitness: Immediately after diluting cells into Growth Plate 2 (the first plate with tetracycline), the cells still had a normal level of proteins needed for growth and proliferation and they continued to grow at nearly the same rate as without tetracycline. Over time, as the level of proteins required for cell growth decreased due to tetracycline, the growth rate of the cells decreased. Accordingly, the analysis accounts for the variation in cell fitness (growth rate) as a function of time after the cells were exposed to tetracycline. With the assumption that the fitness is approximately proportional to the number of proteins needed for growth, the fitness as a function of time is taken to approach the new value with an exponential decay:

      (3)

      where $\mu_i^{\mathrm{tet}}$ is the steady-state fitness with tetracycline, and α is a transition rate. The transition rate was kept fixed at α = log(5), determined from a small-scale calibration measurement. Note that at the tetracycline concentration used during the library-scale measurement (20 µg/mL), $\mu_i^{\mathrm{tet}}$ was greater than zero even at the lowest G(L) levels (Supplementary Fig. 10). From Eq. (3), the number of cells in each Growth Plate for samples grown with tetracycline is:
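      One form consistent with the description above is an exponential relaxation from the fitness before tetracycline takes effect to the steady-state value, μ(t) = μ_tet + (μ0 − μ_tet)·exp(−αt). The sketch below implements that form and its time integral, which gives the log fold-change in cell number. The symbol `mu0`, the per-hour units of α, and the example values are assumptions, and this should not be read as the manuscript's exact expressions.

```python
from math import exp, log

ALPHA = log(5)  # transition rate quoted in the text; per-hour units are assumed

def fitness(t, mu0, mu_tet, alpha=ALPHA):
    """Fitness at time t after tetracycline exposure (assumed functional form)."""
    return mu_tet + (mu0 - mu_tet) * exp(-alpha * t)

def log_fold_growth(t, mu0, mu_tet, alpha=ALPHA):
    """ln(N(t) / N(0)): the integral of fitness(s) for s from 0 to t."""
    return mu_tet * t + (mu0 - mu_tet) * (1.0 - exp(-alpha * t)) / alpha

# Hypothetical fitness values (per hour) for one variant.
growth_after_3h = log_fold_growth(3.0, mu0=0.9, mu_tet=0.3)
```

      Under this form, the growth accumulated shortly after exposure is close to the tetracycline-free value, and the difference from the steady state shrinks by a factor of 5 for each unit of time.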

      • What were the upper and lower bounds of the measurements? (LacI deletion vs Tet deletion / autofluoresence phenotype - true 100% and true 0% activity). Knowing and reporting these bounds will also allow easier comparison between datasets in the future.

      We agree that knowing the limitations of the measurement is important for contextualizing the results. To address this point, we have included Supplementary Fig. 11, which shows the uncertainty of the measurement across gene expression levels.

      • Please clarify whether there was only 1 biological replicate (because the plates were pooled before sequencing). Or, if there were replicates, please present an analysis of reproducibility.

      We thank the reviewer for pointing out the ambiguity in the original manuscript. The library-scale measurement reported here was completed once. The 24 growth conditions were spread across 96 wells, so each condition occupied 4 wells. The 4 wells were combined prior to DNA extraction. We have clarified this process in the methods by removing ‘duplicate’:

      “Growth Plate 2 contained the same IPTG gradient as Growth Plate 1 with the addition of tetracycline (20 µg/mL) to alternating rows in the plate, resulting in 24 chemical environments, with each environment spread across 4 wells.”
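      The arithmetic behind "24 environments, 4 wells each" can be checked with a small layout sketch. It assumes a standard 8 x 12 plate with the IPTG gradient running across the 12 columns and tetracycline in every second row; the orientation and which rows receive tetracycline are assumptions, and gradient steps are represented by column index because the concentrations are not given here.

```python
from collections import Counter

ROWS = "ABCDEFGH"      # standard 96-well plate: 8 rows x 12 columns
N_COLUMNS = 12
TET_UG_PER_ML = 20     # tetracycline concentration quoted in the text

def plate_layout():
    """Map each well to (IPTG gradient step, tetracycline in ug/mL)."""
    layout = {}
    for r, row in enumerate(ROWS):
        # Assumption: odd-indexed rows (B, D, F, H) receive tetracycline.
        tet = TET_UG_PER_ML if r % 2 == 1 else 0
        for col in range(1, N_COLUMNS + 1):
            layout[f"{row}{col}"] = (col, tet)
    return layout

wells_per_environment = Counter(plate_layout().values())
```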

      Despite there being only a single library-scale measurement, the accuracy and reliability of the results are supported by many distinct biological replicates within the library (i.e. LacI variants with the same amino acid sequence but with different barcodes, see new Supplementary Fig. 9), as well as over 100 orthogonal dose-response curve measurements completed with flow cytometry (Figure 2). We believe these support the reproducibility of the work and we have included statistical analysis on the accuracy of the library-scale measurement results.

      “To test the accuracy of the new method for library-scale dose-response curve measurements, we independently verified the results for over 100 LacI variants from the library. For each verification measurement, we chemically synthesized the coding DNA sequence for a single variant and inserted it into a plasmid where LacI regulates the expression of a fluorescent protein. We transformed the plasmid into E. coli and measured the resulting dose-response curve with flow cytometry (e.g. Fig. 1e). We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as $\exp[\mathrm{RMSE}(\ln(x))]$, where $\mathrm{RMSE}(\ln(x))$ is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results (Supplementary Fig. 7).”
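      The two accuracy measures described in this passage can be written out directly. The sketch below follows the stated definitions; the function names and example values are hypothetical.

```python
from math import exp, log, sqrt

def fold_accuracy(library_values, cytometry_values):
    """exp[RMSE(ln x)] for positive parameters such as G0, G_inf and EC50."""
    diffs = [log(a) - log(b) for a, b in zip(library_values, cytometry_values)]
    return exp(sqrt(sum(d * d for d in diffs) / len(diffs)))

def rms_difference(library_values, cytometry_values):
    """Root-mean-square difference, as used for the Hill coefficient n."""
    diffs = [a - b for a, b in zip(library_values, cytometry_values)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical EC50 values: one estimate 2-fold high, one 2-fold low.
ec50_fold = fold_accuracy([2.0, 0.5], [1.0, 1.0])
```

      A fold accuracy of 2 means that the library-scale estimates typically differ from the cytometry values by a factor of about 2 in either direction.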

      • Please provide supplementary tables of the data (in addition to the raw sequencing files). Both a table summarising the growth rates, inferred parameter values and uncertainties for genotypes and a second table with the barcode sequence counts across timepoints and associated experimental data.

      We agree that access to this information is critical. Due to the size of the associated data, we have made this data available for download in a public repository. We direct readers to the repository information in the “Data Availability” statement:

      “The raw sequence data for long-read and short-read DNA sequencing have been deposited in the NCBI Sequence Read Archive and are available under the project accession number PRJNA643436. Plasmid sequences have been deposited in the NCBI Genbank under accession codes MT702633 and MT702634 for pTY1 and pVER, respectively.

      The processed data table containing comprehensive data and information for each LacI variant in the library is publicly available via the NIST Science Data Portal, with the identifier ark:/88434/mds2-2259 (https://data.nist.gov/od/id/mds2-2259 or https://doi.org/10.18434/M32259). The data table includes the DNA barcode sequences, the barcode read counts, the time points used for the library-scale measurement, fitness estimates for each barcoded variant across the 24 chemical environments, the results of both Bayesian inference models (including posterior medians, covariances, and 0.05, 0.25, 0.75, and 0.95 posterior quantiles), the LacI CDS and amino acid sequence for each barcoded variant (as determined by long-read sequencing), the number of LacI CDS reads in the long-read sequencing dataset for each barcoded variant, and the number of unintended mutations in other regions of the plasmid (from the long-read sequencing data).

      Code Availability

      All custom data analysis code is available at https://github.com/djross22/nist_lacI_landscape_analysis.”

      Reviewer #2 (Significance (Required)):

      The authors present an unprecedentedly systematic dataset of how mutations affect an allosteric protein. This illustrates the potential of mutational scanning for rapidly identifying diverse variants of allosteric proteins / regulators with desired or unexpected activities for synthetic biology/bioengineering.

      Previous studies have identified inverted dose-response curves for lacI phenotypes https://www.cell.com/fulltext/S0092-8674(11)00710-0 but using directed evolution i.e. they were not comprehensive in nature.

      The audience of this study would be protein engineers, the allostery field, synthetic biologists and the mutation scanning community and evolutionary biologists interested in fitness landscapes.

      My relevant expertise is in deep mutational scanning and genotype-phenotype landscapes, including work on allosteric proteins and computational methods.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this interesting manuscript the authors developed an ingenious high-throughput screening approach which utilizes DNA barcoding to select variants of LacI proteins with different allosteric profiles for IPTG control using E. coli fitness (growth rate) in a range of antibiotic concentrations as a readout thus providing a genotype-phenotype map for this enzyme. The authors used a library of 10^5-10^ variants of LacI expressed from a plasmid and screened for distinct IPTG activation profiles under different conditions including several antibiotic stressors. As a result they identified various patterns of activation including normal (sigmoidal increase), inverted (decrease) and unusual stop-band where the dependence of growth on [IPTG] is non-monotonic. The study is well-conceived, well executed and provides statistically significant results. The key advance provided by this work is that it allows to identify specific mutations in LacI connected with one of three allosteric profiles. The paper is clearly written, all protocols are explained and it can be reproduced in a lab that possesses proper expertise in genetics.

      Reviewer #3 (Significance (Required)):

      The significance of this work is that it discovered libraries of LacI variants which give rise to distinct profiles of allosteric control of activation of specific genes (in this case antibiotic resistance) by the Lac mechanism. The barcoding technology allowed to identify specific mutations which are (presumably) causal of changes in the way how allosteric activation of LacI by IPTG works. As such it provides a rich highly resolved dataset of LacI variants for further exploration and analysis.

      Alongside with these strengths several weaknesses should also be noted:

      1. First and foremost the paper does not provide any molecular-level biophysical insights into the impact of various types of mutations on molecular properties of LacI. Do the mutations change binding affinity to IPTG? Binding site? Communication dynamics? Stability? The diagrams of connectivity for the stop-band mutations (Fig.4) do not provide much help as they do not tell much which molecular properties of LacI are affected by mutations and why certain mutations have specific effect on allostery. A molecular level exploration would make this paper much stronger.

      We address this comment with comment (2), below.

      2. In the same vein a theoretical MD study would be quite illuminating in answering the key unanswered question of this work: Why do mutations have various and pronounced effects of allosteric regulation by LacI? I think publication of this work should not be conditioned on such study but again adding would make the work much stronger.

      We appreciate the reviewer’s comments and agree that investigating the molecular mechanisms driving the phenotypic changes identified in this work is a compelling proposition. Throughout the manuscript, we identify positions and specific amino acid substitutions that affect the measurable function of LacI, and occasionally discuss the biophysical effects that may underlie these changes. We have expanded the discussion to include possible molecular-level effects.

      The dataset reported here identifies many potential candidates for molecular-level study, either computationally or experimentally. However, this manuscript is scoped to report a large-scale method to measure the genotype-phenotype landscape of an allosteric protein, and a limited investigation into the emergence of novel phenotypes that are identified in the landscape.

      3. Lastly a recent study PNAS v.116 pp.11265-74 (2019) explored a library of variants of E. coli Adenylate Kinase and showed the relationship between allosteric effects due to substrate inhibition and stability of the protein. Perhaps a similar relationship can be explored in this case of LacI.

      We thank the reviewer for highlighting this publication. We agree with the reviewer that similar effects may play a role in the activity of LacI. Establishing such a relationship would require additional experimentation and, we think, is outside the scope of the submitted manuscript. However, we hope that follow-up studies using this dataset will investigate this phenomenon and other related mechanisms that may underlie the band-stop phenotype and other observed effects.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this interesting manuscript the authors developed an ingenious high-throughput screening approach which utilizes DNA barcoding to select variants of LacI proteins with different allosteric profiles for IPTG control using E. coli fitness (growth rate) in a range of antibiotic concentrations as a readout thus providing a genotype-phenotype map for this enzyme. The authors used a library of 10^5-10^ variants of LacI expressed from a plasmid and screened for distinct IPTG activation profiles under different conditions including several antibiotic stressors. As a result they identified various patterns of activation including normal (sigmoidal increase), inverted (decrease) and unusual stop-band where the dependence of growth on [IPTG] is non-monotonic. The study is well-conceived, well executed and provides statistically significant results. The key advance provided by this work is that it allows to identify specific mutations in LacI connected with one of three allosteric profiles. The paper is clearly written, all protocols are explained and it can be reproduced in a lab that possesses proper expertise in genetics.

      Significance

      The significance of this work is that it discovered libraries of LacI variants which give rise to distinct profiles of allosteric control of activation of specific genes (in this case antibiotic resistance) by the Lac mechanism. The barcoding technology allowed to identify specific mutations which are (presumably) causal of changes in the way how allosteric activation of LacI by IPTG works. As such it provides a rich highly resolved dataset of LacI variants for further exploration and analysis.

      Alongside with these strengths several weaknesses should also be noted:

      1. First and foremost the paper does not provide any molecular-level biophysical insights into the impact of various types of mutations on molecular properties of LacI. Do the mutations change binding affinity to IPTG? Binding site? Communication dynamics? Stability? The diagrams of connectivity for the stop-band mutations (Fig.4) do not provide much help as they do not tell much which molecular properties of LacI are affected by mutations and why certain mutations have specific effect on allostery. A molecular level exploration would make this paper much stronger.
      2. In the same vein a theoretical MD study would be quite illuminating in answering the key unanswered question of this work: Why do mutations have various and pronounced effects of allosteric regulation by LacI? I think publication of this work should not be conditioned on such study but again adding would make the work much stronger.
      3. Lastly a recent study PNAS v.116 pp.11265-74 (2019) explored a library of variants of E. coli Adenylate Kinase and showed the relationship between allosteric effects due to substrate inhibition and stability of the protein. Perhaps a similar relationship can be explored in this case of LacI.
    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The authors use deep mutational scanning to infer the dose-response curves of ~60,000 variants of the LacI repressor and so provide an unprecedentedly systematic dataset of how mutations affect an allosteric protein. Overall this is an interesting dataset that highlights the potential of mutational scanning for rapidly identifying diverse variants of proteins with desired or unexpected activities for synthetic biology/bioengineering. The relatively common inverted phenotypes and their sequence diversity are interesting, as is the identification of several hundred genotypes with non-sigmoidal band-stop dose-response curves and their enrichment in specific protein regions. A weakness of the study is that some of the parameter estimates seem to have high uncertainty and this is not clearly presented or the impact on the conclusions analysed. A second shortcoming is that there is little mechanistic insight beyond the enrichments of mutations with different effects in different regions of the protein. But as a first overview of the diversity of mutational effects on the dose-response curve of an allosteric protein, this is an important dataset and analysis.

      Comments

      Data quality and reproducibility

      "The flow cytometry results confirmed both the qualitative and quantitative accuracy of the new method (Supplementary Figs. 3-7)"

      • There need to be quantitative measures of accuracy in the text here for the different parameters.
      • Sup fig 7 panels should be main text panels - they are vital for understanding the data quality. In particular, the G0 parameter estimates from the library appear to have a lower bound, i.e. they provide no information below a cytometry G0 of ~10^4. This is an important caveat and needs to be highlighted in the main text. The Hill parameter (n) estimate for wt (dark gray) replicate barcodes is extremely variable - why is this?
      • In general there is not a clear enough presentation of the uncertainty and biases in the parameter estimations which seem to be rather different for the 4 parameters. Only the EC50 parameter seems to correlate very well with the independent measurements.
      • The genotypes in the mutagenesis library contain a mean of 4.4 aa substitutions and the authors use a neural network to estimate 3 of the Hill equation parameters (with uncertainties) for 1991/2110 of the single aa mutations. It would be useful to have an independent experimental evaluation of the reliability of these inferred single aa mutational effects by performing FACS on a panel of single aa mutants (using single aa mutants in sup fig 3-7, if there are any, or newly constructed mutants).
      • fig3/"Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50." How additive are the other 2 parameters? this analysis should also be presented in fig 3. If they are not as additive is it simply because of lower accuracy of the measurements? If the mutational effects are largely additive, then a simple linear model (rather than the DNN) could be used to estimate the single mutant effects from the multiple mutant genotypes.
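      The log-additive picture raised in the last point can be illustrated with a short sketch: under log-additivity, the fold changes in EC50 from single substitutions multiply, and the residual between a measured and a predicted EC50 quantifies any deviation. The substitution labels and numbers below are hypothetical, and this is not the DNN or any model from the manuscript.

```python
from math import exp, log

def predict_ec50(wt_ec50, single_fold_changes, substitutions):
    """Log-additive prediction: ln(EC50) is the wild-type value plus the sum
    of the single-substitution effects on ln(EC50)."""
    log_ec50 = log(wt_ec50) + sum(log(single_fold_changes[s]) for s in substitutions)
    return exp(log_ec50)

def log_additivity_residual(measured_ec50, predicted_ec50):
    """Zero when the measured multi-substitution variant is exactly log-additive."""
    return log(measured_ec50 / predicted_ec50)

# Hypothetical single-substitution fold changes relative to wild type.
effects = {"sub_A": 2.0, "sub_B": 0.25}
predicted = predict_ec50(100.0, effects, ["sub_A", "sub_B"])
```

      The same bookkeeping, applied to the other Hill parameters, would show how additive they are relative to EC50.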

      Presentation/clarity of text and figures

      • The main text implies that the DNN is trained to predict 3 parameters of the Hill equation but not the Hill coefficient (n). This should be clarified / justified in the main text.
      • The DNN needs to be better explained and justified in the main text for a general audience. How do simpler additive models perform for phenotypic prediction / parameter inference?
      • Ref 14. analyses a much smaller set of mutants in the same protein but using an explicit biophysical model. It would be helpful to have a more extensive comparison with the approach and conclusions of this previous study.
      • Enrichments need statistical tests to know how unexpected the results are e.g. p5 line 12 "67% of strongly inverted variants have substitutions near the ligand-binding pocket"
      • missing citation: Poelwijk et al 2011 https://www.cell.com/fulltext/S0092-8674(11)00710-0 previously reported an inverted dose-response curve for a lacI mutant.
      • What mechanisms do the authors envisage that could produce the band-stop dose response curves? There is likely previous theoretical work that could be cited here. In general there is little discussion of the biophysical mechanisms that could underlie the various mutational effects.
      • "This result, combined with the wide distribution of residues that affect EC50, suggests that LacI allostery is controlled by a free energy balance with additive contributions from many residues and interactions." 'additive contributions and interactions' covers all possible models of vastly different complexity i.e. this sentence is rather meaningless.
      • fig 4 c and d compress a lot of information into one figure and I found this figure confusing. It may be clearer to have multiple panels with each panel presenting one aspect. It is also not clear to me what the small circular nodes exactly represent, especially when you have one smaller node connected to two polygonal nodes, and why they don't have the same colour scale as the polygonal nodes.
      • line 25 - 'causes a conformational change' -> 'energetic change' (allostery does not always involve conformational change)
      • sup fig 5 legend misses '5'
      • sup fig 7. pls add correlation coefficients to these plots (and move to main text figures).
      • Reference 21 is just a title and pubmed link
      • "fitness per hour" -> growth rate
      • page 6 line 28 - "discoverable only via large-scale landscape measurements" - directed evolution approaches can also discover such genotypes (see e.g. Poelwijk /Tans paper). Please re-phrase.
      • pls define jargon the first time it is used e.g. band-stop and band-pass

      Methods/data availability/ experimental and analysis reproducibility:

      • The way that growth rate is calculated on page 17 equation 1- This section is confusing. Please be explicit about how you accounted for the lag phase, what the lag phase was, and total population growth during this time. In addition, please report the growth curves from the wells of the four plates, the final OD600 of the pooled samples, and exact timings of when the samples were removed from 37 degree incubation in a table. These are critical for calculating growth rate in individual clones downstream.
      • What were the upper and lower bounds of the measurements? (LacI deletion vs Tet deletion / autofluoresence phenotype - true 100% and true 0% activity). Knowing and reporting these bounds will also allow easier comparison between datasets in the future.
      • Please clarify whether there was only 1 biological replicate (because the plates were pooled before sequencing). Or, if there were replicates, please present an analysis of reproducibility.
      • Please provide supplementary tables of the data (in addition to the raw sequencing files). Both a table summarising the growth rates, inferred parameter values and uncertainties for genotypes and a second table with the barcode sequence counts across timepoints and associated experimental data.

      Significance

      The authors present an unprecedentedly systematic dataset of how mutations affect an allosteric protein. This illustrates the potential of mutational scanning for rapidly identifying diverse variants of allosteric proteins / regulators with desired or unexpected activities for synthetic biology/bioengineering.

      Previous studies have identified inverted dose-response curves for lacI phenotypes https://www.cell.com/fulltext/S0092-8674(11)00710-0 but using directed evolution i.e. they were not comprehensive in nature.

      The audience of this study would be protein engineers, the allostery field, synthetic biologists and the mutation scanning community and evolutionary biologists interested in fitness landscapes.

      My relevant expertise is in deep mutational scanning and genotype-phenotype landscapes, including work on allosteric proteins and computational methods.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors study allostery with a beautiful genotype-phenotype experiment that maps the fitness landscape of an allosteric lac repressor protein. The authors make a mutational library using error-prone PCR and measure the impact on antibiotic resistance protein expression at varying concentrations of the ligand, IPTG. After measuring the impact of mutations, the authors fill in the missing data using a neural net model. This type of dose response is not standard in the field, but the richness of their data and the discovery of the "band pass" phenomena prove its worth here splendidly.

      Using this mixed experimental/predicted data the authors explore how each mutation alters the different parameters of a Hill equation fit of a dose-response curve. Using higher order mutational space the authors look at how mutations can qualitatively switch phenotypes to inverted or band-stop dose-response curves. To validate and further explore a novel band-stop phenotype, the authors focused on a triple mutant and made all combinations of the 3 mutations. The authors find that only one mutation alone alters the dose-response and that a band-stop behavior presents itself only in combination. Overall this paper is a fantastic data-heavy dive into the allosteric fitness landscape of a protein.
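      For readers less familiar with the parameters discussed in the comments below (G0, Gsaturated, EC50 and n), a minimal sketch of a standard Hill dose-response function may help. The exact parameterization used in the manuscript may differ, and the function name and numbers here are illustrative assumptions.

```python
def hill_response(ligand, g0, g_sat, ec50, n):
    """Standard Hill dose-response curve.

    g0    : output with no ligand (basal level)
    g_sat : output at saturating ligand
    ec50  : ligand concentration giving the half-maximal change
    n     : Hill coefficient (steepness), assumed > 0
    """
    occupancy = ligand**n / (ec50**n + ligand**n)
    return g0 + (g_sat - g0) * occupancy

# Hypothetical wild-type-like curve: output rises with ligand
wild_type = [hill_response(x, g0=100, g_sat=10000, ec50=50, n=2) for x in (0, 50, 5000)]

# Hypothetical "inverted" curve: same form, but g_sat < g0, so output falls
inverted = [hill_response(x, g0=10000, g_sat=100, ec50=50, n=2) for x in (0, 50, 5000)]
```

      A single Hill function is monotonic, so it can describe the normal and inverted phenotypes but not a band-stop curve, which needs a biphasic description.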

      Major

      Overall, the data presented in this paper is thoroughly collected and analyzed making the conclusions well-based. We do not think additional experiments nor substantial changes are needed apart from including basic experimental details and more biophysical rationale/speculation as discussed in further detail below.

      The authors do a genotype-phenotype experiment that requires extensive deep sequencing. However, right now quite a few basic statistics on the sequencing are missing. Baseline library quality is somewhat shown in supplementary fig 2 but the figure is hard to interpret. It would be good to have a table that states how many of all possible mutations at different mutation depths (single, double, etc) are present. Similarly, sequencing statistics are missing: it would be useful to know how many reads were acquired and how much sequencing depth that corresponds to. This is particularly important for barcode assignment to phenotype in the long-read sequencing. In addition, a synonymous mutation comparison is mentioned but in my reading that data is not presented in the supplemental figures section.
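      As a sketch of the denominator such a coverage table needs, the number of possible amino acid variants at each mutation depth can be counted combinatorially. The protein length of 360 residues for LacI and the choice of 19 alternatives per site are assumptions here; error-prone PCR mostly produces single-nucleotide changes, so the number of accessible substitutions per codon is smaller than 19.

```python
from math import comb

def n_possible_variants(protein_length, n_substitutions, alternatives_per_site=19):
    """Count amino acid variants carrying exactly n_substitutions substitutions.

    Chooses which positions are mutated, then which of the alternative
    residues is placed at each mutated position.
    """
    return comb(protein_length, n_substitutions) * alternatives_per_site**n_substitutions

LACI_LENGTH = 360  # assumed length in residues

for depth in (1, 2, 3):
    total = n_possible_variants(LACI_LENGTH, depth)
    print(f"{depth} substitution(s): {total:,} possible variants")
```

      Dividing the number of observed variants at each depth by these totals would give the coverage figures requested above.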

      The paper is very much written from an "old school" allostery perspective with static end point structures that are mutually exclusive - eg. p5l10 "relative ligand-binding affinity between the two conformations" - however, an ensemble of conformations is likely needed to explain their data. This is especially true for the bandpass and inverted phenotypes they observe. The work by Hilser et al is of particular importance in this area. We would invite the authors to speculate more freely about the molecular origins of their findings.

      Minor

      There are a number of small modifications. In general this paper is very technical and could do with some explanation and discussion of relevance to make the manuscript more approachable for a broader audience.

      P1L23: Ligand binding at one site causes a conformational change that affects the activity of another > not necessarily true - and related to using more "modern" statistical mechanical language for describing allostery.

      P2L20: The core experiment of this paper is a selection using a mutational library. In the main body the authors mention the library was created using mutagenic PCR but leave it at that. More details in the main body on what sort of mutagenic PCR was used would be useful. According to the methods, error-prone PCR was used. Why use error-prone PCR rather than deep point mutational libraries? Presumably to sample higher order phenotypes? The rationale should be included. Were there preliminary experiments that helped calibrate the mutation level?

      P2L20: Baseline library statistics would be great in a table for coverage, diversity, etc especially as this was done by error prone pcr vs a more saturated library generation method. This is present in sup fig2 but it's a bit complicated.

      P2L26: How were FACS gates drawn? This is in supplementary fig 17 and should be pointed to here.

      P3L4: Where is the figure/data for the synonymous SNP mutations? This should be in the supplement.

      P3L20: The authors use a deep neural network to predict variants that were not covered in the screen. However, the library was generated by error-prone PCR, meaning that multiple nucleotide mutations could result in the same amino acid change. The model's performance was determined using withheld data, but the same amino acid substitution may be represented by several different codon combinations. For testing, were mutations truly withheld, or was there overlap between the training and test sets? Was the withheld data defined by specific amino acid substitutions?
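      To make the concern concrete, one way to avoid this kind of leakage is to split at the level of the amino acid genotype rather than the barcode, so that all barcodes encoding the same protein sequence land on the same side of the split. The sketch below is an assumption about how this could be done, not a description of the authors' procedure; the function name, barcode labels and data layout are hypothetical.

```python
import random
from collections import defaultdict

def split_by_protein_genotype(variants, test_fraction=0.25, seed=0):
    """Split barcodes into train/test sets by amino acid genotype.

    variants: list of (barcode, aa_genotype) pairs, where aa_genotype is a
    tuple of substitution strings, e.g. ("R195H", "G265D"); () is wild type.
    """
    groups = defaultdict(list)
    for barcode, aa_genotype in variants:
        groups[aa_genotype].append(barcode)

    genotypes = sorted(groups)               # deterministic order before shuffling
    random.Random(seed).shuffle(genotypes)
    n_test = max(1, round(test_fraction * len(genotypes)))

    test = [b for g in genotypes[:n_test] for b in groups[g]]
    train = [b for g in genotypes[n_test:] for b in groups[g]]
    return train, test

# Hypothetical barcodes; several barcodes share one protein genotype
variants = [
    ("bc1", ()), ("bc2", ()),
    ("bc3", ("R195H",)), ("bc4", ("R195H",)),
    ("bc5", ("G265D",)),
    ("bc6", ("R195H", "G265D")), ("bc7", ("R195H", "G265D")),
]
train, test = split_by_protein_genotype(variants)
```

      A stricter variant of the same idea would withhold individual substitutions entirely, so that no test genotype contains a substitution seen in training.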

      In addition, higher order protein interactions are complicated and idiosyncratic. I am surprised how well the neural net performs on higher order substitutions.

      P4L4: Authors find mutations at the dimer/tetramer interfaces but don't mention whether polymerization is required. Is dimerization required for DNA binding? Tetramerization?

      P4L8: Substitutions near the dimer interface impact both G0 and EC50, which the authors say is consistent with a change in the allosteric constant. Can the authors explain their thinking more in the paper to make it easier to follow? Are there any mutations in this area that impact G0 or EC50 alone? Why might these specific residues modify dimerization?

      P4L8: The authors discuss the allosteric constant extensively within the paper but do not explain it. It would be helpful to have an explanation of this to improve readability. This explanation should include the statistical mechanical basis of it and some speculation about the ways it manifests biophysically.

      P4L1-16: Authors see mutations in the dimerization region that impact either G0 or Gsaturated in combination with EC50, but not G0 and Gsaturated together. Maybe we do not fully understand the Hill equation, but why are there no mutations that impact both G0 and Gsaturated in supplementary fig 13c? Why would mutations in the same region, potentially impacting dimerization, impact either G0 or Gsaturated? What might be the mechanism behind divergent responses?

      P4L29: for interpretability it would be good to explain what log-additive effect means in the context of allostery.

      P4L34-P5L19: This section is wonderful. Really cool results and interesting structural overlap!

      P5L34: Helix 9 of the protein is mentioned but its functional relevance is not. This is common throughout the paper - it would be useful for there to be an overview somewhere to help the reader contextualize the results with the known structural roles of these elements.

      P5L39: The authors identified a triple mutant with the band-stop phenotype and then made all combinations of its three mutations. Of particular interest is R195H/G265D, which is nearly the same as the triple mutant. It would be nice to show the positions of each of these mutations and to have some discussion that begins to rationalize this phenotype, even if only to point out how far apart they are and that there is no easy structural rationale!

      P6L9: There should be more discussion of the significance of this work directly compared to what is known. For instance negative cooperativity is mentioned as an explanation for bi-phasic dose response but this idea is not explained. Why would the relevant free energy changes be more entropic? Another example is the reverse-TetR phenotype observed by Hillen et al.

      P6L28: The authors mention that phenotypes exist with genotypes that are discoverable with genotype-phenotype landscapes. This study was somewhat limited by the constraints of error-prone PCR. How big is the phenotypic landscape? Is it worth doing a more systematic study? What is the optimal experimental design (single mutations, doubles, random), and where is there the most information? How far can you drift before your machine learning model breaks down? How robust would it be to indels?

      Figures:

      Sup figs 3-7: The comparison of library-based results and single mutants is a great example of how to validate genotype-phenotype experiments!

      Supp fig 5.: Missing figure number.

      Supp fig7: G0 appears to have a very poor fit between the library and single-mutant versions. Why might this be? R^2 would likely be better to report here than RMSE, as RMSE is sensitive to the magnitude of the data, such that you cannot directly compare the RMSE of, say, 'n' to that of G0.
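      The point about RMSE can be shown with a small sketch: rescaling the data rescales RMSE by the same factor, whereas R^2 (here the coefficient of determination, 1 - SS_res/SS_tot) is unchanged. The numbers are made up for illustration.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error; carries the units and scale of the data."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot; scale-free."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up values on a small scale (like a Hill coefficient) ...
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
# ... and the same relative agreement on a scale 1000x larger (like G0)
obs_big = [1000 * x for x in obs]
pred_big = [1000 * x for x in pred]

print(rmse(obs, pred), rmse(obs_big, pred_big))            # differ by a factor of 1000
print(r_squared(obs, pred), r_squared(obs_big, pred_big))  # essentially identical
```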

      Sup fig13c: it is somewhat surprising that mutations only appear to affect G0 and not Gsaturated. This implies that basal and saturated activity are not coupled. Is this expected? Why or why not?

      Significance

      Allostery is hard to comprehend because it involves many interacting residues propagating information across a protein. The Monod-Wyman-Changeux (MWC) and Koshland, Nemethy, and Filmer (KNF) models have been a long-standing framework to explain much of allostery, however recent formulations have focused on the role of the conformational ensemble and a grounding in statistical mechanics. This manuscript focuses on the functional impact of mutations and therefore the contribution of the amino acids to regulation. The authors' unbiased approach of combining a dose-response curve and mutational library generation let them fit every mutant to a Hill equation. This approach let the authors identify the allosteric phenotype of all measured mutations! The authors found inverted phenotypes, which occur in homologs of this protein, but most interesting is the strange and idiosyncratic 'band-stop' phenotype. The band-stop phenotype is bi-phasic and will hopefully be followed up with further studies to explain the mechanism. This manuscript is a fascinating exploration of the adaptability of allosteric landscapes with just a handful of mutations.

      Genotype-phenotype experiments allow sampling immense mutational space to study complex phenotypes such as allostery. However, a challenge with these experiments is that allostery and other complicated phenomena come from immense fitness landscapes altering different parameters of the Hill equation. The authors' approach of using a simple error-prone PCR library combined with many ligand concentrations allowed them to sample a very large space somewhat sparsely. However, they were able to predict the missing data by training and using a neural net model. I think this is a clever way to fill in the gaps that are inherent to somewhat sparse sampling from error-prone PCR. The experimental design of the dose response is especially elegant and a great model for how to do these experiments.

      With some small improvements for readability, this manuscript will surely find broad interest to the genotype-phenotype, protein science, allostery, structural biology, and biophysics fields.

      We were prompted to do this by Review Commons and are posting our submitted review here:

      Willow Coyote-Maestas has relevant expertise in high throughput screening, protein engineering, genotype-phenotype experiments, protein allostery, data mining, and machine learning.

      James Fraser has expertise in structural biology, genotype-phenotype experiments, protein allostery, protein dynamics, protein evolution, etc.

      Referees cross-commenting

      Seems like our biggest issues are: better uncertainty estimates of the parameters and more biophysical/mechanistic explanation/speculation. The uncertainty estimates might be tricky with the deep learning approach. The more biophysical speculation will require some re-writing around an ensemble rather than a static structure perspective.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):**

      This study addresses the role of IL-13 in promoting lung damage following migration of the helminth N. brasiliensis larvae from the circulatory system to the lung. The work clearly shows using IL13-/- mice that Nb elicited IL-13 immunity at days 2-6 post-infection reduces pathology. The authors demonstrate an association with reduced eosinophils but no effect on neutrophil numbers.

      Proteomic analysis identifies a number of molecules known to be involved in protecting against type 2 pathologies such as relm-a and SP-D.

      The authors then identify a clear requirement for IL-13 in driving relm-a expression.

      Finally, the authors present a whole lung RNA transcript profile which largely supports their proteomic observations.

      Taken together the work presents a sound case for IL-13 being an important player in protecting against initial lung pathology.

      **Major requests:**

      The paper is really very interesting and important. To an extent it questions existing dogma of IL-13 being a driver of lung inflammation.

      Addressing the following could hopefully be achieved using archived samples or with an acceptable amount of extra experimental work.

      Figure 1: D2 and D6 Lung IL-13 concentrations (ELISA) in WT mice would set the scene for the paper's story*

      We agree that showing IL-13 concentrations in the lung would nicely set the stage for the role of IL-13 during __Nippostrongylus__ infection. In the current paper, we showed IL-13 mRNA levels in Figure 3 but in a revised version, we will include D2 and D6 mRNA data in Figure 1. We attempted to quantify IL-13 protein levels in the BAL fluid of infected WT mice on D2 and D6 post-infection. However, IL-13 in the BAL was below the levels of detection for our ELISA assay. Therefore, we would need to measure IL-13 protein in total lung homogenates but we do not have material archived at present. If the editor feels this is a critical piece of data we will perform repeat experiments.

      Figure 2: The authors should add evidence that function/activity of neutrophils/eosinophils is changed/not changed: e.g. granzyme, MBP, EPO release in BAL and/or lung.

      As supported by referee 3, we feel that measuring functional readouts of neutrophils and eosinophils, while interesting, is currently outside of the scope of the paper. Further, with respect to eosinophils, we see a major reduction in total eosinophil numbers in IL-13-deficient mice which would likely result in a reduction in the level of functional molecules such as MBP. Thus, these readouts in the BAL may not be a reliable indicator of cellular function, and the results would be difficult to interpret in light of altered cell numbers.

      Additionally, some data showing changes in epithelial stress related cytokines such as IL-23 and IL-33 would be informative (IHC and /or ELISA).

      The reviewer makes a good suggestion that would complement our proteomics/pathway analysis. As described in our comment below regarding Foxa2 pathways, we do have additional data showing epithelial cell defects in the absence of IL-13 and will add this to a revised manuscript. While we do see a trend for a reduction in IL-33 mRNA in infected IL-13-deficient mice, it is difficult to correlate this with functional protein. If requested, we can perform additional analyses to measure IL-23 and/or IL-33 protein levels in archived BAL fluid or by IHC of lung sections.

      *The following will require a new experiment:

      The authors present a strong case for RELMa being associated with/driven by IL-13 responses. The following I feel would prove that IL-13 driven RELMa is important in reducing lung pathology. Can enhanced lung pathology or cell responses associated with pathology be reduced/altered by dosing Nb infected IL13-/- mice with recombinant relma or by restimulating BAL cells (for example) from IL-13-/- mice. This team is well placed to comment on the potential for such an in vivo experiment to be feasible.

      Or could the authors also test the ability of other candidate molecules to reduce lung pathology? Would for example i/n dosing of IL-13-/- mice with AMCase, BRP39 or SP-D protect against pathology? It would be expected to be the case for SP-D.*

      Our previous study has shown that RELM-a plays an important yet highly complex role during lung repair (see Sutherland et al. 2018: https://doi.org/10.1371/journal.ppat.1007423). The suggested experiments would advance our understanding of the function of RELM-a and other effector molecules during type 2 immunity and repair. However, it is unlikely that the impact of IL-13 will be due to a single effector molecule (as supported by Reviewer 3) and thus these types of experiments would shift the focus of the paper from the impact of IL-13 to understanding specific function of type 2 effectors. Since our study deals more broadly with the function of IL-13 rather than the downstream effectors, we hope that this will open up further investigation of these specific molecules to the wider community to take forward.

      *Reviewer #1 (Significance (Required)):

      The manuscript places IL-13 as an important initiator of early protection from acute lung damage. This is important as it is to an extent a non-canonical role for this cytokine. This is also important as IL-13 can be manipulated therapeutically. To maximise potential application of such drugs requires detailed understanding of the various contextual roles of IL-13. This study provides such evidence.

      The authors identify a range of target mediators.

      This is an important body of work that is useful for understanding how acute lung damage can be regulated.

      This work will be of interest to Type 2 immunologists, any researcher with an interest in pulmonary inflammation as well as mucosal immunity.

      I make these suggestions/comments based on my own background in Type 2 immunity, lung inflammation and parasitic helminth infection and immunity.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Chenery et al report that IL-13 plays a critical role in protecting mice from lung damage caused by the infection of a nematode, Nippostrongylus brasiliensis, in WT or IL-13 knockout mice (IL-13 eGFP knock-in mice, Neill et al., Nature 2010). Phenotypically, they demonstrated that IL-13 genetic deficiency resulted in more severe lung injuries and haemorrhaging following the larvae migratory infections. Through the proteomic and transcriptomic profiling, they identified gene-expression changes involved in the cellular stress responses, e.g. up-regulating the expression of epithelial-derived type 2 molecules, controlled by IL-13. They also found that type 2 effector molecules including RELM-alpha and surfactant protein D were compromised in IL-13 knockout mice. Thus, they proposed that IL-13 has tissue-protective functions during lung injury and regulates epithelial cell responses during type 2 immunity in this acute setting. Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge. However, this work could be improved and more impactful by further performing the following suggested experiments.

      Major points:

      1. It may not be accurate to claim that "IL-13 played a critical role in limiting tissue injury ... in the lung following infection" since IL-13 participates in both repelling worms and activating tissue reparative responses. It is very hard to distinguish these two kinds of responses with the current experimental settings because the much higher worm burden led to more direct lung damage in IL-13-/- mice than WT counterparts.*

      The reviewer raises an important point that we will need to clarify in a revised manuscript. Based on several studies, the role of IL-13 in mediating Nippostrongylus expulsion occurs in the small intestine, after the parasites have already cleared the lung tissue. The number of worms in the lung does not differ at the time points we are investigating. We have qRT-PCR data measuring Nippostrongylus-specific actin levels, which we and others have previously shown to accurately reflect worm numbers. We can therefore demonstrate that the differences in lung damage do not reflect a difference in the number of larvae in the lungs of IL-13 KO mice compared to WT mice. These data will be incorporated into the manuscript to better clarify this point.

      1. It would be more informative if the authors could perform the RNA-seq analysis on the IL-13-responsive cell type such as airway epithelial cells (goblet cell) by comparing WT vs IL13-/- in the context of lung damage caused by Nippostrongylus brasiliensis infection.*

      RNA-sequencing of specific cells would indeed be an excellent experiment that would reveal more IL-13-dependent processes in our model. However, this would be a considerable undertaking at this stage (as reviewer 3 has pointed out). Nonetheless, our extended analysis of the Foxa2 pathway as requested below has highlighted a number of genes regulated by IL-13, which are known to be involved in epithelial cell function.

      We agree with the reviewer that showing additional validation data to support the Foxa2 defect in IL-13-deficient mice would strengthen our paper’s overall message. We have additional qRT-PCR data of IL-13-dependent genes regulated by Foxa2 (__Clca1, Muc5ac, Ccl11, and Foxa3__) that clearly support this epithelial cell-specific defect that we can readily incorporate into the revised paper.

      *Reviewer #2 (Significance (Required)):

      Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge.

      **Referees cross-commenting**

      To Reviewer #1's Review: fair and constructive

      To Reviewer #3's Review: agree in general.*

      * Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Allen, Sutherland and colleagues utilize IL-13 deficient mice to investigate the function of IL-13 in the early response to lung tissue damage induced by helminth infection. They demonstrate that IL-13 deficiency has significant effects on the acute tissue response to helminth infection (at day 2 and 6 post-infection). Particularly, IL-13 deficiency results in increased lung hemorrhaging, and more pronounced lung tissue damage evidenced by increased gaps in the alveolar architecture. They perform proteomic and transcriptomic profiling of the lungs to determine IL-13-induced pathways and demonstrate many protein and gene expression changes in the absence of IL-13. These include dysregulated collagens, reduced epithelial-derived proteins RELMalpha and surfactant protein D, downregulated pathways related to cellular stress, and increased genes associated with the Foxa2 pathway.

      Overall, the key conclusions are convincing, and the study design, methods and data analysis are clear, rigorous and thorough.

      **Minor Comments:**

      1. The authors concluded that lung epithelial cells are more sensitive to IL-13 than IL-4, but the intranasal injection of both proteins showed a similar induction of RELMα - investigation into this difference would be useful. Alternatively, providing an explanation for these different findings could be helpful.*

      Our suggestion that epithelial cells are likely to be more sensitive to IL-13 was based both on our data and the existing literature. We would agree that we do not have definitive evidence for this. Indeed, because the type 2 receptor can respond to both IL-4 and IL-13 this issue is difficult to easily resolve experimentally. We will expand on this in a revised manuscript, making our explanations clearer whilst acknowledging the alternative explanations.

      This is a good suggestion and we have additional flow cytometry data looking at hematopoietic cell expression of RELM-a from these experiments that we can incorporate into the revised manuscript. We have found that airway macrophages were another source of RELM-a in the lung and mirrored the airway epithelial cell responses to both intraperitoneal and intranasal delivery of IL-4 and IL-13.

      We agree that a comparison of IL-13Ra1 versus IL-13 deficiency should be included in the discussion of our manuscript. These authors found epithelial-specific defects in IL-13Ra1-deficient mice such as Clca1 (aka Clca3), RELM-a, and chitinase-like proteins even under homeostatic conditions, which is highly consistent with our data. This study also found that IL-13Ra1 deficiency led to increased bleomycin-induced pathology and together with our data, offers further insight into the IL-13/IL-13Ra1 axis during lung injury. We will add these points to our discussion and will attempt to directly compare their gene expression data set with our data to find more overlapping genes between the two mouse strains and disease models.

      This is indeed a very important point we will address in a revised discussion. IL-4Ra-deficient mice did show increased bleeding in the Chen et al. study that was not seen in the IL-13Ra1 KO suggesting IL-4 alone is sufficient to limit bleeding. This is in contrast to our study where we found increased bleeding in IL-13 KO mice independent of IL-4. However, a major difference between the studies is the background strain of mice used, which was BALB/c in the Chen et al. study versus C57BL/6 mice we used in our study. In addition to differences in IL-4 and IL-13 levels between the strains, we have unpublished observations of major differences in vascular integrity with BALB/c much more prone to bleeding, which is an active area of investigation in the lab. Although we have yet to unravel these differences mechanistically, they could explain differential requirements for IL-4 versus IL-13 to limit bleeding between the two strains.

      Our apologies, we will fix the reference duplication.

      *Reviewer #3 (Significance (Required)):

      This study addresses the specific function of IL-13 in acute helminth infection of the lung, which has not previously been studied, as most studies investigate the combined function of IL-4 and IL-13 through IL-4 receptor KO or Stat6 KO mice.

      It is a thorough, well-conducted and well-organized study with high quality data using 'omics' strategies to profile IL-13-induced genes and proteins. Their data identifies intriguing pathways that are dependent on IL-13, opening new avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Therefore this study provides conceptual and technical advances. Additionally, since targeting IL-4 and IL-13 are in clinical trials or employed therapeutically for pulmonary disorders, the findings from these studies are clinically relevant. It would however have been useful to validate some of these pathways and demonstrate epithelial-specific outcomes for IL-13-induced tissue protection.

      Previous studies using IL4RKO have shown that IL-4 and IL-13 are necessary to protect from acute tissue damage in helminth infection (Chen, Nature medicine - referred to by authors). Other studies have investigated IL-13 in fibrosis and granulomatous inflammation (papers referenced by authors, and Ramalingam Nature Immunology 2009). Last, one study shows that IL-13Ra1 signaling is important for protection in bleomycin-induced lung injury, findings using a different transgenic mouse, which are relevant for this study and may be useful to discuss (Karo-Atar, Mucosal Immunology 2016).

      As stated above - the data in this manuscript identify intriguing pathways that are dependent on IL-13, opening new, exciting avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Their data is also unique as it combines proteomics and transcriptomics, and identifies previously unappreciated IL-13 regulated pathways such as cellular stress and Foxa2, which would be interesting to investigate further.

      **Referees cross-commenting**

      To Reviewer 1: The suggested data for Figure 1 (IL-13 concentrations) could be useful, but suggested experiments for Figure 2 could be outside the main focus of the paper.

      For the main suggested experiment, treatment of IL-13-/- with RELMalpha: this could be useful. One caveat is that RELMalpha might not be the only effector molecule downstream of IL-13, so the authors may not get a definitive answer. An alternative (although not as RELMalpha-specific) would be to treat IL13KO mice with FcIL-4 or FcIL-13, the latter of which drives RELMalpha, and look at whether FcIL-13 is more protective than FcIL-4.*

      We agree that rescue experiments could provide insights into the relative protective effects of IL-4 versus IL-13. However, it might be challenging to interpret the results in part because of the difficulty in establishing physiologically relevant doses and timing and the fact that IL-4 will also signal through the type 2 receptor. These difficulties are reflected in the interpretation of our current data as discussed above (pt. 1 reviewer 3). Although we have found IL-4 and IL-13 delivery experiments valuable and have used them in many of our papers, we have always been cautious in our interpretation, as we typically use these at super-physiological doses. However, this is an experiment we would consider if the editors felt it essential to the story.

      To Reviewer 2: I agree with points 1 and 3 - especially with point 3, which would give more in-depth understanding into the functional outcomes of the IL-13 -> FoxA2 pathway identified. For point 2, RNA-seq of epithelial cells would be informative, but may be beyond the scope of the project.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study, Allen, Sutherland and colleagues utilize IL-13 deficient mice to investigate the function of IL-13 in the early response to lung tissue damage induced by helminth infection. They demonstrate that IL-13 deficiency has significant effects on the acute tissue response to helminth infection (at day 2 and 6 post-infection). Particularly, IL-13 deficiency results in increased lung hemorrhaging, and more pronounced lung tissue damage evidenced by increased gaps in the alveolar architecture. They perform proteomic and transcriptomic profiling of the lungs to determine IL-13-induced pathways and demonstrate many protein and gene expression changes in the absence of IL-13. These include dysregulated collagens, reduced epithelial-derived proteins RELMalpha and surfactant protein D, downregulated pathways related to cellular stress, and increased genes associated with the Foxa2 pathway.

      Overall, the key conclusions are convincing, and the study design, methods and data analysis are clear, rigorous and thorough.

      Minor Comments:

      1. The authors concluded that lung epithelial cells are more sensitive to IL-13 than IL-4, but the intranasal injection of both proteins showed a similar induction of RELMα - investigation into this difference would be useful. Alternatively, providing an explanation for these different findings could be helpful.
      2. Providing data by immunofluorescence or flow cytometry of non-epithelial expression of RELMalpha following intranasal versus intraperitoneal injection of IL-4 versus IL-13 would be useful.
      3. Discussion of IL-13Ra1 deficient mice would be useful, in particular the study by Karo-Atar and Munitz in Mucosal Immunology 2016, showing that IL13Ra1 is protective against bleomycin-induced pulmonary injury (PMID: 26153764). Comparing their data with the gene expression datasets from this study would be useful (acknowledging the caveat that IL-4 effects through the type 2 receptor would also be abrogated in these IL13Ra1 mice).
      4. The authors reference Chen et al. Nature Medicine 2012, but do not discuss the finding in this paper that neither IL-4-/- nor IL13Ra1-/- have increased lung hemorrhage. This might be a mouse strain issue and worthwhile discussing.
      5. References 32 and 36 (Sutherland, PLoS Pathogens) are duplicates.

      Significance

      This study addresses the specific function of IL-13 in acute helminth infection of the lung, which has not previously been studied, as most studies investigate the combined function of IL-4 and IL-13 through IL-4 receptor KO or Stat6 KO mice.

      It is a thorough, well-conducted and well-organized study with high quality data using 'omics' strategies to profile IL-13-induced genes and proteins. Their data identifies intriguing pathways that are dependent on IL-13, opening new avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Therefore this study provides conceptual and technical advances. Additionally, since targeting IL-4 and IL-13 are in clinical trials or employed therapeutically for pulmonary disorders, the findings from these studies are clinically relevant. It would however have been useful to validate some of these pathways and demonstrate epithelial-specific outcomes for IL-13-induced tissue protection.

      Previous studies using IL4RKO have shown that IL-4 and IL-13 are necessary to protect from acute tissue damage in helminth infection (Chen, Nature medicine - referred to by authors). Other studies have investigated IL-13 in fibrosis and granulomatous inflammation (papers referenced by authors, and Ramalingam Nature Immunology 2009). Last, one study shows that IL-13Ra1 signaling is important for protection in bleomycin-induced lung injury, findings using a different transgenic mouse, which are relevant for this study and may be useful to discuss (Karo-Atar, Mucosal Immunology 2016).

      As stated above - the data in this manuscript identify intriguing pathways that are dependent on IL-13, opening new, exciting avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Their data is also unique as it combines proteomics and transcriptomics, and identifies previously unappreciated IL-13 regulated pathways such as cellular stress and Foxa2, which would be interesting to investigate further.

      Referees cross-commenting

      To Reviewer 1: The suggested data for Figure 1 (IL-13 concentrations) could be useful, but suggested experiments for Figure 2 could be outside the main focus of the paper.

      For the main suggested experiment, treatment of IL-13-/- mice with RELMalpha, this could be useful. One caveat is that RELMalpha might not be the only effector molecule downstream of IL-13, so the authors may not get a definitive answer. An alternative (although not as RELMalpha-specific) would be to treat IL13KO mice with FcIL-4 or FcIL-13, the latter of which drives RELMalpha, and look at whether FcIL-13 is more protective than FcIL-4.

      To Reviewer 2: I agree with points 1 and 3 - especially with point 3, which would give more in-depth understanding into the functional outcomes of the IL-13 -> FoxA2 pathway identified. For point 2, RNA-seq of epithelial cells would be informative, but may be beyond the scope of the project.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Chenery et al report that IL-13 plays a critical role in protecting mice from lung damage caused by the infection of a nematode, Nippostrongylus brasiliensis, in WT or IL-13 knockout mice (IL-13 eGFP knock-in mice, Neill et al., Nature 2010). Phenotypically, they demonstrated that IL-13 genetic deficiency resulted in more severe lung injuries and haemorrhaging following infection with migrating larvae. Through the proteomic and transcriptomic profiling, they identified gene-expression changes involved in the cellular stress responses, e.g. up-regulating the expression of epithelial-derived type 2 molecules, controlled by IL-13. They also found that type 2 effector molecules including RELM-alpha and surfactant protein D were compromised in IL-13 knockout mice. Thus, they proposed that IL-13 has tissue-protective functions during lung injury and regulates epithelial cell responses during type 2 immunity in this acute setting. Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge. However, this work could be improved and made more impactful by performing the following suggested experiments.

      Major points:

      1. It may not be accurate to claim that "IL-13 played a critical role in limiting tissue injury ... in the lung following infection" since IL-13 participates in both repelling worms and activating tissue reparative responses. It is very hard to distinguish these two kinds of responses with the current experimental settings because the much higher worm burden led to more direct lung damage in IL-13-/- mice than WT counterparts.
      2. It would be more informative if the authors could perform the RNA-seq analysis on the IL-13-responsive cell type such as airway epithelial cells (goblet cell) by comparing WT vs IL13-/- in the context of lung damage caused by Nippostrongylus brasiliensis infection.
      3. Figure 6C, the transcriptional profiling of mouse lungs revealed that the Foxa2 pathway was significantly up-regulated in the IL-13-/- infected mice. This is an important finding because this pathway plays a critical role in the process of alveolarization and inhibiting goblet cell hyperplasia. In order to validate this finding, some components in this pathway could be further examined.

      Significance

      Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge.

      Referees cross-commenting

      To Reviewer #1's Review: fair and constructive

      To Reviewer #3's Review: agree in general.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      This study addresses the role of IL-13 in lung damage following migration of larvae of the helminth N. brasiliensis from the circulatory system to the lung. The work clearly shows, using IL13-/- mice, that Nb-elicited IL-13 immunity at days 2-6 post-infection reduces pathology. The authors demonstrate an association with reduced eosinophils but no effect on neutrophil numbers.

      Proteomic analysis identifies a number of molecules known to be involved in protecting against type 2 pathologies such as relm-a and SP-D.

      The authors then identify a clear requirement for IL-13 in driving relm-a expression.

      Finally, the authors present a whole lung RNA transcript profile which largely supports their proteomic observations.

      Taken together the work presents a sound case for IL-13 being an important player in protecting against initial lung pathology.

      Major requests:

      The paper is really very interesting and important. To an extent it questions existing dogma of IL-13 being a driver of lung inflammation.

      Addressing the following could hopefully be achieved using archived samples or with an acceptable amount of extra experimental work.

      Figure 1: D2 and D6 lung IL-13 concentrations (ELISA) in WT mice would set the scene for the paper's story.

      Figure 2: The authors should add evidence that function/activity of neutrophils/eosinophils is changed/not changed: e.g. granzyme, MBP, EPO release in BAL and/or lung. Additionally, some data showing changes in epithelial stress related cytokines such as IL-23 and IL-33 would be informative (IHC and/or ELISA).

      The following will require a new experiment:

      The authors present a strong case for RELMa being associated with/driven by IL-13 responses. The following, I feel, would prove that IL-13-driven RELMa is important in reducing lung pathology. Can enhanced lung pathology or cell responses associated with pathology be reduced/altered by dosing Nb-infected IL13-/- mice with recombinant RELMa or by restimulating BAL cells (for example) from IL-13-/- mice? This team is well placed to comment on the potential for such an in vivo experiment to be feasible.

      Or could the authors also test the ability of other candidate molecules to reduce lung pathology? Would, for example, i/n dosing of IL-13-/- mice with AMCase, BRP39 or SP-D protect against pathology? It would be expected to be the case for SP-D.

      Significance

      The manuscript places IL-13 as an important initiator of early protection from acute lung damage. This is important as it is to an extent a non-canonical role for this cytokine. This is also important as IL-13 can be manipulated therapeutically. To maximise potential application of such drugs requires detailed understanding of the various contextual roles of IL-13. This study provides such evidence.

      The authors identify a range of target mediators.

      This is an important body of work that is useful for understanding how acute lung damage can be regulated.

      This work will be of interest to Type 2 immunologists, any researcher with an interest in pulmonary inflammation as well as mucosal immunity.

      I make these suggestions/comments based on my own background in Type 2 immunity, lung inflammation and parasitic helminth infection and immunity.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for carefully reading our manuscript. We found their comments to be incredibly thoughtful and constructive and greatly appreciate their feedback. We are confident that addressing the reviewers’ concerns will strengthen our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript entitled 'Combinatorial patterns of graded RhoA activation and uniform F-actin depletion promote tissue curvature' by Denk-Lobnig et al. the authors study the organisation of junctional F-actin during the process of mesoderm invagination during gastrulation in the model Drosophila. Following on from previous work by the same lab that identified and analysed a multicellular myosin II gradient across the mesoderm important for apical constriction and tissue bending, the authors now turn their attention to actin. Using imaging of live and fixed samples, they identify a patterning of F-actin intensity/density at apical junctions that they show is dynamically changing going into mesoderm invagination and is set up by the upstream transcription factors driving this process, Twist and Snail. They go on to show, using genetic perturbations, that both actin and the previously described myosin gradient are downstream of regulation and activation by RhoA, that in turn is controlled by a balance of RhoGEF2 activation and RhoGAP C-GAP inactivation. The authors conclude that the intricate expression patterns of all involved players, that all slightly vary from one another, is what drives the wild-type distinctive cell shape changes in particular rows of cells of the presumptive mesoderm and surrounding epidermis.

      This is a very interesting study analysing complex and large-scale cell and tissue shape changes in the early embryo. Much has been learned over the last decade and more about many of the molecular players and their particular behaviours that drive the process, but how all upstream regulators work together to achieve coordinated tissue-scale behaviours is still not very well understood, and this study adds important insights into this.

      The experiments seem well executed and support the conclusion drawn, but I have a few comments and questions that I feel the authors should address to strengthen their argument.

      We thank the reviewer for their interest in the paper and their helpful comments.

      **General points:**

      1. The authors state early on that they chose to focus on junctional rather than apical medial F-actin, but it is unclear to me really what the rationale behind that is. In much of the authors' earlier work, they study the very dynamic behaviour of the apical-medial actomyosin that drives the apical cell area reduction in mesodermal cells required for folding. They have previously analysed F-actin in the constricting cells, but have only focused on the most constricting central cell rows (Coravos, J. S., & Martin, A. C. (2016). Developmental Cell, 1-14). The role of junctional F-actin compared to the apical-medial network on which the myosin works to drive constriction is much less clear: it could stabilize overall cell shape or modulate physical malleability or compliance of cells, or it could more actively be involved in implementing the 'ratchet' that needs to engage to stabilise a shrunken apical surface. I would appreciate more explanation or guidance on why the authors chose not to investigate apical-medial F-actin across the whole mesoderm and adjacent ectoderm, but rather focused on junctional F-actin, especially explaining better throughout what they think the role of the junctional F-actin they measure is.

      We focused on the junctional/lateral F-actin pool because this is where tissue-wide patterns in intensity variation are observed, especially when looking across the mesoderm-ectoderm boundary. Indeed, when we compare the apical-medial F-actin of marginal mesoderm cells to ectoderm cells in cross sections, we find no apparent difference, whereas there is a striking difference in junctional/lateral F-actin density (Fig. 1B, C; Supplemental Fig. 1A, D). We provide some preliminary en face views of the medial-apical surface in our response to Point 2, and we will obtain higher resolution images from live and fixed embryos to better show the network organization. We agree with the reviewer that this requires added justification. Therefore, we will: 1) Provide higher resolution images of apical-medial F-actin comparing different regions of mesoderm and ectoderm, and 2) revise the text to better justify why we chose junctional/lateral F-actin to focus our tissue-level analysis and to elaborate more on what we think the role of junctional/lateral F-actin in this process may be.

      2. Comparing the F-actin labeling in the above previous paper to the stainings/live images shown here, they look quite different. This is most likely due to the authors here not showing the whole apical area but focusing on junctional, i.e. below the most apical region. It is not completely clear to me from the paper at what level along the apical-basal axis the authors are analysing the junctional F-actin. Supplemental Figure 2 seems to suggest about half-way down the cell, which would be below junctional levels. Could the authors indicate this more clearly, please? Overall, I would appreciate if the authors could supply some more high-resolution images of F-actin from fixed samples (which I assume will give the better resolution) of how F-actin actually looks in the different cells with differing levels. Is there for instance a visible change to F-actin organisation? And could this help explain the observed changes in intensity and their function?

      We apologize for the confusion; we were referring to ‘junctions’ as the lateral contacts between cells, as opposed to the adherens junctions at the apical surface. We have modified the text to use the term ‘lateral’ rather than ‘junctional’ F-actin, so as to avoid this confusion. The difference in cortical F-actin staining is not restricted to a particular apical-basal position, but extends along the length of the lateral domain (Fig. 1B, C). As far as we can tell, the actin is bundled and underlies the entire cell circumference. We will: 1) better define the apical-basal position within the cell that we are showing, and 2) show high-resolution en face images of F-actin at different apical-basal positions, across different cell positions, in live and fixed embryos to better justify our focus on lateral F-actin (similar orientation, but higher resolution/quality than preliminary live data below).

      3. Along the same lines of thought as in point 2): Dehapiot et al. (Dehapiot, B., ... & Lecuit, T. (2020). Assembly of a persistent apical actin network by the formin Frl/Fmnl tunes epithelial cell deformability. Nature Cell Biology, 1-21) have recently shown for the process of germband extension and amnioserosa contraction that two pools of F-actin can be observed, a persistent pool not dependent on Rho[GTP] and a Rho[GTP]-dependent one. Could the authors comment on what they think might occur in the mesoderm, are similar pools present here as well?

      4. As the authors state themselves, Rho does not only affect actin via diaphanous, but of course also myosin via Rock. So it would be good to reflect this more in the interpretation and discussion of data, as the causal timeline could be complex.

      We thank the reviewer for reminding us to address this point and to discuss this excellent recent paper. We have not observed a persistent medial actin network in mesoderm cells or ectoderm cells at this stage (i.e. prior to germband extension). It was previously shown in mesoderm cells that pulsed myosin contractions condense the medio-apical F-actin network, but that this is often followed by F-actin network remodeling and that total F-actin levels decrease during apical constriction (Mason et al., 2013). Furthermore, Rho-kinase inhibition in mesoderm cells significantly disrupts this network, but does not inhibit the rapid assembly and disassembly of apical F-actin cables, which could reflect elevated actin turnover (Mason et al., 2013; Jodoin et al., 2015). To address the reviewer’s points, we 1) now include a paragraph in the Discussion to discuss the Dehapiot et al. paper (Comment 3) and how various pools of F-actin and Rock/myosin may shape the tissue (Comment 4) (lines 404-408), and 2) will image the apical surface of mesoderm and ectoderm at this stage and also germband extension (as a positive control) in order to determine whether there is a persistent network.

      **More specific comments to experiments and figures:**

      1. Reduction of junction function by alpha-catenin-RNAi: how strong is the reduction in catenin? Could they label a-catenin in fixed embryos? The authors conclude the original pre-constriction patterning of F-actin intensity is not dependent on intact junctions, but they show that the increase in F-actin in the mesodermal cells concomitant with apical constriction is in fact impaired in the RNAi. Thus, the authors can also not conclude whether the continued accumulation of myosin and its persistence depend on intact junctions. The initial set-up of the myosin gradient in terms of intensity distribution is unaffected, but clearly dynamics, subcellular pattern, interconnectivity etc. of myosin are affected and thus may well depend on some mechanical feed-back. I find this section of the manuscript slightly overstated and feel the conclusion should be more cautious.

      We thank the reviewer for pointing this out; we completely agree that we should have been more precise with our language. Our main conclusion was that myosin accumulation in a gradient does not require ‘sustained mechanical connectivity’. We felt it was important, given our model of transcriptional patterning, to show that some patterns did not result from mechanics or even apical constriction. Alpha-catenin knock-down provides the cleanest and most severe disruption of adhesion that we can accomplish at this developmental stage. We showed that alpha-catenin-RNAi resulted in: a) almost no intercellular connectivity in myosin structures (Yevick et al., 2019), and b) no apical constriction (this study, Fig. 3B).

      We: 1) revised the text in this section, clarifying that we are only referring to the gradient and that other myosin properties clearly do depend on mechanics, and 2) will include data better showing the extent of the alpha-catenin knockdown and its effects on junctions and actomyosin.

      Figure 1 versus Figure 2: Why do the Utrophin-ABD virtual cross-sections look so fuzzy and bad in comparison to phalloidin labelled F-actin in the virtual cross-section in Fig. 1B and C? The labelling shown in 2B and D does not even look very junctional...

      We apologize that we did not explain the difference in visualization methods more clearly. For live images (Figure 2), we used a projection of cross-sections, which includes 20 µm length along the anterior-posterior (AP) axis. This projection method is less dependent on the specific AP position of the cross-section and the specific cells being shown. We did this because the projection helps to visualize the tissue pattern in live images where fluorescence images are noisier than fixed images, which exhibit cleaner labeling (Fig. 1). To address this point, we plan to: 1) Edit the text to make the method of visualization clearer, and 2) fix snail and twist mutant embryos and also provide thin cross-sections analogous to Fig. 1.

      Figure 5 C and D: the control gradients for myosin shown in C and D are completely different, for C the half-way height cell row is deduced as 5 whereas the (in theory identical) control measure in D has row 3 at halfway height! Why is this? Putting all curves together in the same panel would suggest that that C control curve is very similar to RhoGEF2-OE! This can't be right.

      The reason for the different width of the gradients in these controls is the Sqh::GFP copy number. All of our experiments perturbing Rho were carefully controlled so as to ensure the same copy number of the fluorescent marker that we were visualizing. For technical reasons, we were only able to get 1 copy of the Sqh::GFP into the RhoGEF2-OE background. Having two copies of the Sqh::GFP appears to have a slightly activating effect; in fact, the reviewer might have noticed that ventral furrows with 2 copies Sqh::GFP (and a wider gradient) have lower curvature, consistent with our main conclusion (Fig. 7 C). The effects of fluorescently tagged markers were a concern for us and so we were careful to show that the effects of changing RhoA activity on tissue curvature occur regardless of the fluorescent marker (i.e., Sqh::GFP or Utr::GFP, Fig. 7 and Sup. Fig. 7).
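
As an illustration of how the half-maximal position of a gradient (the 'half-way height cell row' discussed above) can be read off a per-cell-row intensity profile, the following minimal Python sketch may help. It is not the analysis code used in the manuscript: the function name `half_max_row`, the min-max normalization, and the example intensity values are all assumptions made for illustration.

```python
import numpy as np

def half_max_row(intensity_by_row):
    """Return the (fractional) cell row at which a profile that decays away
    from its peak first drops below half of its normalized maximum.

    Rows are counted from 1 at the ventral midline (an assumed convention).
    Returns None for a flat profile or if the half-maximum is never crossed.
    """
    profile = np.asarray(intensity_by_row, dtype=float)
    span = profile.max() - profile.min()
    if span == 0:
        return None  # flat profile: no gradient to measure
    # Min-max normalization: lowest row maps to 0, highest row to 1.
    norm = (profile - profile.min()) / span
    rows = np.arange(1, len(norm) + 1)
    start = int(np.argmax(norm))  # begin the search at the peak
    for i in range(start, len(norm) - 1):
        hi, lo = norm[i], norm[i + 1]
        if hi >= 0.5 > lo:
            # Linear interpolation between the two neighbouring rows.
            return float(rows[i] + (hi - 0.5) / (hi - lo))
    return None

# Hypothetical mean intensities per cell row (arbitrary units).
wide_gradient = [10, 10, 8, 6, 4, 2, 0, 0]
narrow_gradient = [10, 6, 2, 0, 0]
print(half_max_row(wide_gradient))    # close to row 4.5
print(half_max_row(narrow_gradient))  # close to row 2.25
```

With a measure of this kind, two control profiles acquired with different marker copy numbers would yield different half-maximal rows, which is why each perturbation is best compared only with a control carrying the same copy number of the fluorescent marker.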

      Still in Figure 5: Panels C and D again, but for apical area: are the control and C-GAP-RNAi or RhoGEF2-OE curves significantly different? What statistics were used on this?

      We thank the reviewer for this point. We did not include statistical comparisons of the gradient width originally, because we felt that it does not completely capture the difference between the two curves and that presenting the curves instead lets readers examine the intricacies of the data as a whole. However, to address the reviewer’s point, we will add statistical comparisons for apical area as well as myosin and actin patterns.

      Supplemental Figure 1: Panels in D: I appreciate this control, but would really also like to see the same control at a stage when folding has commenced and stretched cells are present at the margin of the mesoderm. How homogenous does the GAP43 label look in those?

      We will add a more apical projection (with quantification) of this embryo, in which folding has already commenced, to the revised manuscript, so its stage is clearer.

      Supplemental Figure 5: Panel 5 B: the authors conclude that the myosin gradient under RhoGEF2 RNAi is not smaller, but looking at the curves it in fact looks wider. They also mention that the overall level of myosin in this condition is lower than the control...

      We will include quantification of absolute levels in Supplemental Figure 5 to compare overall levels. We will also statistically compare RhoGEF2 RNAi and control gradients and update our conclusions accordingly.

      Following on from the above, a comment on Figure 7: - The authors use RhoGEF2 RNAi stating that it affects the actin pattern, but the myosin pattern also seems affected. In line 318 the authors state that they use this condition to look at how junctional actin density affects curvature. I find this phrase misleading as it might lead the readers to think that RhoGEF2 RNAi only affects junctional F-actin, although it also affects myosin patterning.

      We thank the reviewer for catching this, that’s a good point. We have revised the text in lines 317-326 to more accurately describe the effect of RhoGEF2-RNAi on actin and myosin patterning, and to connect this to the effect on curvature.

      • Line 311: confusingly, the authors state that an increase in the actomyosin gradient affects curvature. But it is only the myosin gradient that is increased, while the junctional actin gradient is flatter than the control in both C-GAP RNAi and RhoGEF2 OE (the distinction is even made by authors line 243). Could this be clarified?

      We thank the reviewer for pointing out this imprecision on our part and have revised Line 311 to more precisely describe the individual effects on myosin and F-actin pattern changes upon RhoA perturbation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Mesoderm invagination during Drosophila gastrulation has been a paradigm for how regionally restricted gene expression locally activates Rho signalling and for how subsequently activated acto-myosin drives cell shape changes which in turn lead to a change in tissue morphology. Despite the numerous studies on this subject and a good understanding of the overall process, several important aspects have remained elusive so far. Among these is the dynamics of cortical and junctional F-actin and its contribution to the shape changes of cells and tissue. Previous studies have focused on MyoII, the "active" component of the actin cytoskeleton. The dynamics of the "passive" counterpart, namely actin filaments, has been neglected, although it is clear that Rho signalling controls both branches.

      We thank the reviewer for the tough questions. The reviewer raises important points that, even if not all feasible to address experimentally, can be addressed by being more precise with our language and conclusions.

      1. Although I clearly acknowledge the efforts taken by the authors to define a function of cortical (junctional) F-actin in apical constriction and furrow formation, several central aspects of the study are not sufficiently resolved and conclusive. Rho signalling controls MyoII via Rok and F-actin via formin/dia, among other less defined targets. The role of MyoII and cortical contraction could be conclusively sorted out, since inhibition of Rok affects the MyoII branch but not the other branches. A similar approach, i.e. a specific inhibition/depletion without affecting the other branch, has not been taken yet for the F-actin branch. The authors have not resolved this issue. When analysing the mutants, the authors cannot distinguish the effect of Rho signalling on the MyoII and F-actin branch. For this reason the changes in F-actin distribution in the mutants are linked to changes in Myo activity and thus a function cannot be assigned to F-actin. In order to derive a specific role of F-actin distribution for furrow formation, the authors need to find experimental ways to affect F-actin levels without affecting MyoII, for example by analysing mutants for dia or other formins.

      The reviewer’s assertion that Rok and Diaphanous only affect myosin and actin, respectively, is oversimplified. For example, in mammals, Rok regulates the Lim-Kinase/Cofilin pathway and thus F-actin (Geneste et al., JCB, 2002). The ‘F-actin branch’ of the RhoA pathway has been examined in multiple previous studies of mesoderm invagination (Fox and Peifer, 2007; Homem et al., 2008; Mason et al., 2013). We did not include diaphanous mutants in this tissue-level study because diaphanous mutants and actin drugs: a) affect RhoA signaling (Munjal et al., 2015; Coravos et al., 2016; Michaux et al., 2018), b) disrupt adherens junctions and tissue integrity (Homem et al, 2008; Mason et al., 2013), and c) have a preponderance of cellularization defects (Afshar et al., 2000). However, we agree with the reviewer that this could potentially be interesting, and so we 1) will look at the tissue-level pattern in Diaphanous-depleted embryos, 2) will analyze tissue-level actomyosin patterns in Rok inhibitor-injected embryos, and 3) have added a section to the Discussion (lines 418-432) explaining past work in this area and why we did not provide data on diaphanous mutants. A caveat of the proposed experiments is that actin and myosin ‘branches’ may be too interconnected to be conclusively separated.

      The authors employ a discontinuous spatial axis by the cell number. Although there are good arguments to understand and treat the cells as units, there are also good arguments for using a scale with absolute distance. I have doubts that the graded distributions presented by the authors are a result of this scaling with cell units. When looking at panel B of Fig 1 or Fig. 2A,B, for example, a sharp step-like distribution is visible at the boundary between mesoderm and ectoderm anlage. In contrast, the F-actin intensity distribution is graded after quantification. The graded distribution appears not to be a consequence of averaging because an even sharper step is very obvious in a projection along the embryonic axis as shown in panel B and D of Fig. 2, for example. The difference between a sharp step in the images and a graded distribution after quantification with a spatial axis in cell number is obvious for a-catenin in Fig. 3D and Rho signalling in Fig. 4 B. As the authors base their central conclusion (see headline) on the graded distribution, resolving the issue of spatial scale is a prerequisite of publication.

      We thank the reviewer for their point. It is an excellent idea and we have included representative plots with a continuous spatial scale in addition to our cell-based analysis (see below, each trace is average line intensity for 1 embryo). The spatially resolved analysis shows similar patterns for F-actin, myosin and RhoA pathway components as the cell-based metric and we plan to include this data as Supplemental Fig. 3 and 4 in a revised version of the manuscript.

      The authors put the spatial distribution of Rho signalling and F-actin into the center of their conclusion. They do so by affecting the pattern with mutants in twist/snail and varying upstream factors of Rho signalling. With respect to myo activation, this has been done previously, although possibly with less detail, and it is no new insight that the width of the mesoderm anlage and corresponding Rho signalling domain has a consequence on the shape of the groove and furrow. To maintain the conclusion of the manuscript that spatially graded Rho signalling contributes to tissue curvature, more convincing ways to change the pattern of Rho signalling are needed. Changing the balance of GEF and GAP shows the importance of Rho signalling and possibly signalling levels but not the contribution of its spatial distribution.

      A strength of our study was that we were able to stably ‘tune’ Rho signaling pattern and then follow tissue shape at later stages to determine the connection between the two. We respectfully disagree with the statement that, “with respect to myosin activation this has been done previously”. In past work, we expanded myosin activation by modifying embryonic cell fate, including changes in dorsal cell fates (Heer et al. 2017; Chanet et al., 2017). Here, we directly manipulate RhoA signaling, maintaining the width of the mesoderm anlage (see images below).

      A central conclusion of our study is that RhoA activation level determines the width of myosin activation within a normally sized mesoderm anlage, which has not been shown before. The genetic approach presented in the paper was the best way we found to manipulate the spatial pattern of myosin/actin in a stable manner that lasts through invagination. It is worth noting that this approach allowed us to carefully ‘tune’ the level of RhoA activation so as to avoid elevating RhoA levels to the point of disrupting signaling polarity within the cell (Mason et al., 2016). In our hands, optogenetic manipulation of RhoA, which requires continuous optical input, was less robust because: a) 2D tissue flow precludes delivering a consistent level of activation to given cells over the time course of invagination, and b) tissue folding (i.e. 3D deformation) dramatically alters how much light is delivered to the mesoderm cells.

      To address the reviewer’s point, we: 1) edited the Discussion to explicitly state that we did not alter the pattern of RhoA activation without altering RhoA signaling levels (lines 339-343), 2) plan to include Snail or Twist stainings showing that the width of the mesoderm anlage is not altered by changes in RhoA signaling so there is no confusion about this point, and 3) plan to include a mechanical model that compares how altering signaling levels vs. altering the spatial distribution of signaling affect fold curvature.

      Reviewer #2 (Significance (Required)):

      The question of a contribution of F-actin is addressed in this manuscript. The authors quantify F-actin in fixed and living embryos at two prominent steps in ventral furrow formation, (1) shortly prior to onset of apical constrictions and (2) when the groove has formed. They distinguish junctional and „medial" cortical F-actin. They employ a discontinuous spatial axis, the number of cells away from the ventral midline but not an absolute scale (see my notes below). The measurements are applied to wild type and mutant embryos affecting the transcriptional patterning (twist, snail), adherens junctions, and Rho signalling. The authors claim to reveal by their measurements a graded distribution of F-actin intensities with a peak at the ventral midline and a second peak at the boundary between mesoderm and ectoderm with a low point in the stretching cells of the mesectoderm. The authors further claim to reveal a graded distribution of Rho signalling components within the mesoderm anlage. Based on these data the authors conclude that graded Rho signalling and depletion of F-actin promote tissue curvature.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Previous work has shown that mesoderm invagination at the ventral midline of the Drosophila embryo requires precise spatial regulation of actomyosin levels in order to fold the tissue. In this work, Denk-Lobnig and colleagues further investigate the spatial distribution of myosin and F-actin in the mesoderm and how these patterns are established. The authors identify an F-actin pattern at the apical cell junctions that emerges upon folding, with elevated levels in the cells around the ventral midline, a decrease in junctional F-actin in the marginal mesoderm, and then an increase at the mesoderm-ectoderm border. They identify Snail and Twist as regulating different aspects of establishing this F-actin pattern. Additionally, by modulating RhoA activity (downstream of Twist) the authors are able to alter the width of the actomyosin pattern without affecting the width of the mesoderm tissue, which in turn affects the curvature of the tissue fold and the post-fold lumen size.

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved. I suggest to address the following points.

      We thank the reviewer for their interest in our work and their important suggestions.

      **MAJOR**

      1. Line 127 and Figure 1E: The authors argue that there is an anticorrelation between F-actin distribution and cell areas. However, an R-squared value of 0.1083 rather suggests little-to-no correlation. The authors should evaluate the statistical significance of that correlation.

      To indicate whether the relationship between F-actin distribution and cell areas is significant, we will report the p-value of the F-test for overall significance of our regression analysis, as well as the sample size, in the revised manuscript. The F-statistic for this analysis is F = 89.2, p-value = 4.7e-20.
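
      A low R-squared and a very small p-value are not contradictory when the sample is large. The sketch below shows the standard relationship between R-squared and the overall F-statistic, under the assumption of an ordinary least-squares fit with a single predictor. The sample size is not stated here, so the residual degrees of freedom computed below are only what the two reported numbers would imply under that assumption.

```python
# Sketch: link between R-squared and the overall F-statistic of a regression.
# Assumes ordinary least squares with `n_predictors` predictors plus intercept.

def f_from_r2(r_squared, n_points, n_predictors=1):
    """Overall F-statistic for a fit with the given R-squared."""
    df_resid = n_points - n_predictors - 1
    return (r_squared / n_predictors) / ((1.0 - r_squared) / df_resid)


def implied_residual_df(r_squared, f_stat, n_predictors=1):
    """Residual degrees of freedom consistent with a given R-squared and F."""
    return f_stat * n_predictors * (1.0 - r_squared) / r_squared


# Values quoted in the exchange above: R-squared = 0.1083, F = 89.2.
df_resid = implied_residual_df(0.1083, 89.2)  # about 734 under the assumption
f_check = f_from_r2(0.1083, n_points=736)     # recovers an F close to 89.2
```

      With several hundred data points, even a relationship that explains about 11% of the variance yields a large F-statistic, which is why the significance test and the effect size should be reported together.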

      Figure 5: claims that the width of the actomyosin gradient is affected by the various perturbations should be supported with statistical analysis. For example, the half-maximal gradient position for each individual myosin trace could be calculated (instead of using the mean trace), displayed using a box plot, and tested for significance using the Mann-Whitney U test, as in Figure 7. This is slightly complicated by the fact that the control group in Figure 5C is the same as the control group in Figure 3E, which needs to be carefully considered. Also, similar calculations should be made for the F-actin data in Fig 5E-G since throughout the rest of the paper, the authors refer to the width of the "actomyosin gradient" which implicates both myosin and actin.
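
      The per-trace measurement suggested here can be sketched as follows: for each embryo, find where the intensity profile falls to half of its peak, so that each embryo contributes one number to the box plot and the test. The helper name `half_max_position`, the toy traces, and the choice not to subtract a baseline are assumptions made for illustration only.

```python
# Sketch: half-maximal gradient position for each individual trace.
# No baseline subtraction is applied; that is an assumption of this example.

def half_max_position(positions, intensities):
    """Return the position where the trace first drops below half of its
    peak, searching outward from the peak, using linear interpolation.
    Returns None if the trace never drops below half-maximum."""
    peak = max(intensities)
    half = peak / 2.0
    start = intensities.index(peak)
    for i in range(start, len(intensities) - 1):
        y0, y1 = intensities[i], intensities[i + 1]
        if y0 >= half > y1:
            x0, x1 = positions[i], positions[i + 1]
            return x0 + (y0 - half) * (x1 - x0) / (y0 - y1)
    return None


# Hypothetical traces: intensity by cell row from the ventral midline.
cell_rows = [0, 1, 2, 3, 4]
traces = [
    [10.0, 8.0, 6.0, 4.0, 2.0],
    [10.0, 10.0, 10.0, 4.0, 0.0],
]
widths = [half_max_position(cell_rows, t) for t in traces]
```

      The resulting list holds one width per embryo, which is the form needed for a rank-based comparison between genotypes.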

      We thank the reviewer for this point. We will include statistical comparisons for myosin gradients in the revised manuscript. To allow for multiple comparisons using the same control, we plan to use Kruskal-Wallis testing, which is analogous to one-way ANOVA for non-parametric data, and a post-hoc test to determine which pairs have significantly different distributions.
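
      For readers unfamiliar with the test, the following is a minimal sketch of the Kruskal-Wallis H statistic with the usual tie correction, written in plain Python. The group values are hypothetical, and the post-hoc procedure is not shown because it is not specified above. In practice a library routine such as `scipy.stats.kruskal` would normally be used and would also return the p-value.

```python
# Sketch: Kruskal-Wallis H statistic with tie correction (plain Python).
# Undefined if every pooled value is identical (the correction becomes zero).

def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return (H, degrees of freedom) for two or more groups."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    # Tie correction: shrink the denominator for each set of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1


# Hypothetical half-maximal positions for a control and two perturbations.
h_stat, dof = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

      H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; only if that omnibus test is significant are the pairwise comparisons against the shared control examined.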

      We will update the language in the manuscript to distinguish between actin and myosin patterns. As our main conclusion is that F-actin depletion levels are changed by RhoA in marginal mesoderm cells, we will statistically compare this between groups.

      Line 142 and Figure 2B-C: I was confused by the description of the snail phenotype:

      • a. the claim that in snail mutants actin levels are uniform: based on Figure 2C, I'd say that F-actin levels decrease across the mesoderm moving away from the ventral midline, and that the main issue is with the accumulation of actin in the distal end of the mesoderm. The authors should better justify the claim that F-actin levels are uniform in snail mutants (or remove it). Maybe comparing F-actin levels in the first four or five rows of the mesoderm?

      • b. how about the increase of F-actin in the distal mesoderm, just adjacent to the ectoderm boundary? Why is it gone in snail mutants?

      1. We agree that the intensity in all embryos appears to decrease on the sides of the embryos when imaged in this way, but it is also clear that there is no abrupt increase in F-actin density going into the ectoderm. In our experience, the edge effect is due to the distance of the side of the embryo from the coverslip rather than actual lower F-actin density. This is suggested by: a) the fact that all snail mutant embryos peak at the center of the image even though they are not all oriented with the ventral side perfectly on top, and b) all embryos exhibit an intensity decrease within the ectoderm toward the edges of the image that are further away from the coverslip (Fig. 2 C, E, F). We will: 1) modify the text to include an explanation, and 2) fix and stain snail and twist mutant cross-sections that will not exhibit this effect of imaging depth, for comparison.
      2. We show in Figure S1C that in wild-type, F-actin does not actually increase in cells at the ectoderm boundary, but merely decreases in lateral mesoderm cells. Thus, it is likely that snail mutant embryos are merely lacking patterning in the mesoderm, where snail is active.

      With alpha-catenin-RNAi, F-actin depletion across the mesoderm still occurs, but junctional F-actin levels are not increased around the midline. While some explanations are offered in the text, the reason for this phenotype seems important for the story. The text in lines 204-205 suggests that F-actin that would normally be localized to the apical junctions is instead being drawn into medioapical actomyosin foci. Is this idea supported by evidence that medioapical F-actin in control embryos is lower than in alpha-catenin RNAi?

      We appreciate the reviewer’s suggestion to explain this more thoroughly. We find that in alpha-catenin-RNAi and even arm (β-catenin) mutant embryos, junctional complexes (i.e., E-cadherin) are drawn into the myosin spot through continuous contractile flow (see below and Martin et al., 2010 for arm). To make this clear in the manuscript, we plan to: 1) include data showing the effects of alpha-catenin RNAi on F-actin and E-cadherin localization in fixed embryos, which is now included in Supplemental Figure S3, and 2) include live imaging of UtrGFP-labeled alpha-catenin RNAi embryos.

      Figure 6A: there is a correlation between cell position and the productivity of myosin pulses, which the authors attribute to the RhoA gradient. This should be more definitively demonstrated by:

      • a. Plot and calculate the correlation between RhoA levels (measured with the RhoA probe) and the change in cell area caused by a contraction pulse. Is this a significant correlation?

      • b. How does myosin persistence change when RhoA is manipulated, e.g. in RhoA overexpressing embryos or in RhoA RNAi?

      It has already been shown that there is a correlation between myosin amplitude and apical constriction amplitude (Xie et al., 2015). Apical myosin and Rho-kinase localization depends entirely on RhoA activity (Mason et al., 2016) and Rho-kinase co-localizes precisely with myosin in both space and time (Vasquez et al., 2014). Changing levels of the RhoA regulator C-GAP has been shown to affect myosin persistence and the productivity of apical constriction, with higher C-GAP causing less productive constriction (Mason et al., 2016). We plan to update the text to connect the observation with what has been shown in previous studies and to make statements regarding causality at the tissue level more cautious. However, our observation further shows how cytoskeletal activity is patterned across the mesoderm, so we think it has value and that it should be included in this paper. An in-depth study of the connection between RhoA regulators and myosin persistence/pulsing is beyond the scope of the present study, especially considering possible COVID-19 restrictions. Making these connections will require substantial effort in the future.

      **MINOR**

      1. The authors should indicate if the myosin shown in Figure 1A is junctional or medioapical. If it is junctional, does medioapical myosin better match junctional F-actin and cell areas? Similarly, if they are showing medioapical myosin, how does junctional myosin compare to junctional actin? It seems to me that consistently comparing the patterns of junctional F-actin and medioapical myosin (and RhoGEF2, RhoA, and ROCK in Figure 4) could be somewhat misleading, as the pools compared localize in different subcellular compartments.

      The myosin images shown throughout the paper are medioapical myosin. Junctional myosin in mesoderm cells is lower in intensity and cannot easily be seen by live imaging. We agree that it is important for the reader to see all pools of these proteins. Therefore, we will include high-resolution images of actin and myosin at both apical and subapical positions for midline mesoderm, marginal mesoderm, and ectoderm cells at the time of folding in a supplemental figure. We will also justify why the analyzed pools were chosen.

      Most of the intensity traces for myosin and F-actin are presented as normalized intensity, relative to the highest intensity in the trace. However, there are claims throughout the text about the relative levels of myosin (ex. Line 241) and F-actin (conclusions based on Fig. 2B-D) that should be supported by quantification. It seems that changes in intensity for both F-actin and myosin, in addition to shape of the gradient, would contribute to the understanding of actomyosin regulation in this tissue. However, if intensities cannot be directly compared between groups due to variation in imaging settings or staining protocols, there should be no claims made about changes in overall F-actin or myosin intensity.

      We appreciate the point made by the reviewer here. To address this point, we will provide data for absolute levels in relevant cases and be more precise in our conclusions.
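
      The concern about peak-normalized traces can be made concrete with a small sketch: two traces with the same shape but different absolute levels become indistinguishable once each is divided by its own maximum. The numbers and the helper name `normalize_to_peak` are hypothetical and serve only to illustrate the point.

```python
# Sketch: peak normalization keeps the shape of a trace but discards its level.

def normalize_to_peak(trace):
    """Divide every value by the maximum of the trace."""
    peak = max(trace)
    return [value / peak for value in trace]


# Hypothetical raw intensities by cell row (arbitrary units).
control = [100.0, 80.0, 40.0, 20.0]
perturbed = [50.0, 40.0, 20.0, 10.0]  # same shape, half the absolute level

same_shape = normalize_to_peak(control) == normalize_to_peak(perturbed)
peak_ratio = max(perturbed) / max(control)
```

      Here `same_shape` is true even though the perturbed peak is half the control peak, so any statement about overall levels has to come from the raw intensities acquired under matched imaging settings.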

      The significance of the correlation in Figure 7E should be quantified.

      We will report the p-value for the F-test for overall significance for our regression analysis of this data. The F-statistic for this analysis is F = 15.6, p-value = 0.00103.

      Supplemental Figure 2: does the segmentation image match the second Z reslice immediately above? It does not appear so, or perhaps they are just not aligned. Having the two match would be more convincing of the segmentation technique.

      We will ensure that matching images are used for this figure.

      Reviewer #3 (Significance (Required)):

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved.

      This is a great point. It is important to note that our conclusions required us to ‘tune’ the expression of GEF and the depletion of GAP with GAL4 drivers to get expression levels that do not dramatically affect RhoA polarity within mesoderm cells, but that alter the tissue level pattern within the mesoderm. Furthermore, we were cautious in making sure that our perturbations that elevate RhoA activation level did not lead to elevated myosin in the ectoderm (Fig. 5A and B). It is worth noting that RhoGEF2 is still full-length in all cases and has all of the normal regulatory domains that allow its activity to be restricted to the mesoderm at this stage. To more explicitly show the effect of our perturbations on ectoderm cells, we plan to include higher resolution images comparing myosin and F-actin organization/levels in the ectoderm for our manipulations of RhoA signaling.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Previous work has shown that mesoderm invagination at the ventral midline of the Drosophila embryo requires precise spatial regulation of actomyosin levels in order to fold the tissue. In this work, Denk-Lobnig and colleagues further investigate the spatial distribution of myosin and F-actin in the mesoderm and how these patterns are established. The authors identify an F-actin pattern at the apical cell junctions that emerges upon folding, with elevated levels in the cells around the ventral midline, a decrease in junctional F-actin in the marginal mesoderm, and then an increase at the mesoderm-ectoderm border. They identify Snail and Twist as regulating different aspects of establishing this F-actin pattern. Additionally, by modulating RhoA activity (downstream of Twist) the authors are able to alter the width of the actomyosin pattern without affecting the width of the mesoderm tissue, which in turn affects the curvature of the tissue fold and the post-fold lumen size.

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved. I suggest to address the following points.

      MAJOR

      1. Line 127 and Figure 1E: The authors argue that there is an anticorrelation between F-actin distribution and cell areas. However, an R-squared value of 0.1083 rather suggests little-to-no correlation. The authors should evaluate the statistical significance of that correlation.
      2. Figure 5: claims that the width of the actomyosin gradient is affected by the various perturbations should be supported with statistical analysis. For example, the half-maximal gradient position for each individual myosin trace could be calculated (instead of using the mean trace), displayed using a box plot, and tested for significance using the Mann-Whitney U test, as in Figure 7. This is slightly complicated by the fact that the control group in Figure 5C is the same as the control group in Figure 3E, which needs to be carefully considered. Also, similar calculations should be made for the F-actin data in Fig 5E-G since throughout the rest of the paper, the authors refer to the width of the "actomyosin gradient" which implicates both myosin and actin.
      3. Line 142 and Figure 2B-C: I was confused by the description of the snail phenotype:
        • a. the claim that in snail mutants actin levels are uniform: based on Figure 2C, I'd say that F-actin levels decrease across the mesoderm moving away from the ventral midline, and that the main issue is with the accumulation of actin in the distal end of the mesoderm. The authors should better justify the claim that F-actin levels are uniform in snail mutants (or remove it). Maybe comparing F-actin levels in the first four or five rows of the mesoderm?
        • b. how about the increase of F-actin in the distal mesoderm, just adjacent to the ectoderm boundary? Why is it gone in snail mutants?
      4. With alpha-catenin-RNAi, F-actin depletion across the mesoderm still occurs, but junctional F-actin levels are not increased around the midline. While some explanations are offered in the text, the reason for this phenotype seems important for the story. The text in lines 204-205 suggests that F-actin that would normally be localized to the apical junctions is instead being drawn into medioapical actomyosin foci. Is this idea supported by evidence that medioapical F-actin in control embryos is lower than in alpha-catenin RNAi?
      5. Figure 6A: there is a correlation between cell position and the productivity of myosin pulses, which the authors attribute to the RhoA gradient. This should be more definitively demonstrated by:
        • a. Plot and calculate the correlation between RhoA levels (measured with the RhoA probe) and the change in cell area caused by a contraction pulse. Is this a significant correlation?
        • b. How does myosin persistence change when RhoA is manipulated, e.g. in RhoA overexpressing embryos or in RhoA RNAi?

      MINOR

      1. The authors should indicate if the myosin shown in Figure 1A is junctional or medioapical. If it is junctional, does medioapical myosin better match junctional F-actin and cell areas? Similarly, if they are showing medioapical myosin, how does junctional myosin compare to junctional actin? It seems to me that consistently comparing the patterns of junctional F-actin and medioapical myosin (and RhoGEF2, RhoA, and ROCK in Figure 4) could be somewhat misleading, as the pools compared localize in different subcellular compartments.
      2. Most of the intensity traces for myosin and F-actin are presented as normalized intensity, relative to the highest intensity in the trace. However, there are claims throughout the text about the relative levels of myosin (ex. Line 241) and F-actin (conclusions based on Fig. 2B-D) that should be supported by quantification. It seems that changes in intensity for both F-actin and myosin, in addition to shape of the gradient, would contribute to the understanding of actomyosin regulation in this tissue. However, if intensities cannot be directly compared between groups due to variation in imaging settings or staining protocols, there should be no claims made about changes in overall F-actin or myosin intensity.
      3. The significance of the correlation in Figure 7E should be quantified.
      4. Supplemental Figure 2: does the segmentation image match the second Z reslice immediately above? It does not appear so, or perhaps they are just not aligned. Having the two match would be more convincing of the segmentation technique.

      Significance

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Mesoderm invagination during Drosophila gastrulation has been a paradigm for how regionally restricted gene expression locally activates Rho signalling and for how subsequently activated acto-myosin drives cell shape changes which in turn lead to a change in tissue morphology. Despite the numerous studies on this subject and a good understanding of the overall process, several important aspects have remained elusive so far. Among these is the dynamics of cortical and junctional F-actin and its contribution to the shape changes of cells and tissue. Previous studies have focused on MyoII, the „active" component of the actin cytoskeleton. The dynamics of the „passive" counterpart, namely actin filaments, has been neglected, although it is clear that Rho signalling controls both branches.

      1. Although I clearly acknowledge the efforts taken by the authors to define a function of cortical (junctional) F-actin in apical constriction and furrow formation, several central aspects of the study are not sufficiently resolved and conclusive. Rho signalling controls MyoII via Rok and F-actin via formin/dia, among other less defined targets. The role of MyoII and cortical contraction could be conclusively sorted out, since inhibition of Rok affects the MyoII branch but not the other branches. A similar approach, i.e. a specific inhibition/depletion without affecting the other branch, has not been taken yet for the F-actin branch. The authors have not resolved this issue. When analysing the mutants, the authors cannot distinguish the effect of Rho signalling on the MyoII and F-actin branch. For this reason the changes in F-actin distribution in the mutants are linked to changes in Myo activity and thus a function cannot be assigned to F-actin. In order to derive a specific role of F-actin distribution for furrow formation, the authors need to find experimental ways to affect F-actin levels without affecting MyoII, for example by analysing mutants for dia or other formins.
      2. The authors employ a discontinuous spatial axis by the cell number. Although there are good arguments to understand and treat the cells as units, there are also good arguments for using a scale with absolute distance. I have doubts that the graded distributions presented by the authors are a result of this scaling with cell units. When looking at panel B of Fig. 1 or Fig. 2A,B, for example, a sharp, step-like distribution is visible at the boundary between mesoderm and ectoderm anlage. In contrast, the F-actin intensity distribution is graded after quantification. The graded distribution appears not to be a consequence of averaging because an even sharper step is very obvious in a projection along the embryonic axis as shown in panel B and D of Fig. 2, for example. The difference between a sharp step in the images and a graded distribution after quantification with a spatial axis in cell number is obvious for alpha-catenin in Fig. 3D and Rho signalling in Fig. 4B. As the authors base their central conclusion (see headline) on the graded distribution, resolving the issue of spatial scale is a prerequisite of publication.
      3. The authors put the spatial distribution of Rho signalling and F-actin into the center of their conclusion. They do so by affecting the pattern with mutants in twist/snail and varying upstream factors of Rho signalling. With respect to myo activation, this has been done previously, although possibly with less detail, and it is no new insight that the width of the mesoderm anlage and corresponding Rho signalling domain has a consequence on the shape of the groove and furrow. To maintain the conclusion of the manuscript that spatially graded Rho signalling contributes to tissue curvature, more convincing ways to change the pattern of Rho signalling are needed. Changing the balance of GEF and GAP shows the importance of Rho signalling and possibly signalling levels but not the contribution of its spatial distribution.

      Significance

      The question of a contribution of F-actin is addressed in this manuscript. The authors quantify F-actin in fixed and living embryos at two prominent steps in ventral furrow formation, (1) shortly prior to onset of apical constrictions and (2) when the groove has formed. They distinguish junctional and „medial" cortical F-actin. They employ a discontinuous spatial axis, the number of cells away from the ventral midline but not an absolute scale (see my notes below). The measurements are applied to wild type and mutant embryos affecting the transcriptional patterning (twist, snail), adherens junctions, and Rho signalling. The authors claim to reveal by their measurements a graded distribution of F-actin intensities with a peak at the ventral midline and a second peak at the boundary between mesoderm and ectoderm with a low point in the stretching cells of the mesectoderm. The authors further claim to reveal a graded distribution of Rho signalling components within the mesoderm anlage. Based on these data the authors conclude that graded Rho signalling and depletion of F-actin promote tissue curvature.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript entitled 'Combinatorial patterns of graded RhoA activation and uniform F-actin depletion promote tissue curvature' by Denk-Lobnig et al. the authors study the organisation of junctional F-actin during the process of mesoderm invagination during gastrulation in the model Drosophila. Following on from previous work by the same lab that identified and analysed a multicellular myosin II gradient across the mesoderm important for apical constriction and tissue bending, the authors now turn their attention to actin. Using imaging of live and fixed samples, they identify a patterning of F-actin intensity/density at apical junctions that they show is dynamically changing going into mesoderm invagination and is set up by the upstream transcription factors driving this process, Twist and Snail. They go on to show, using genetic perturbations, that both actin and the previously described myosin gradient are downstream of regulation and activation by RhoA, that in turn is controlled by a balance of RhoGEF2 activation and RhoGAP C-GAP inactivation. The authors conclude that the intricate expression patterns of all involved players, that all slightly vary from one another, is what drives the wild-type distinctive cell shape changes in particular rows of cells of the presumptive mesoderm and surrounding epidermis.

      This is a very interesting study analysing complex and large-scale cell and tissue shape changes in the early embryo. Much has been learned over the last decade and more about many of the molecular players and their particular behaviours that drive the process, but how all upstream regulators work together to achieve coordinated tissue-scale behaviours is still not very well understood, and this study adds important insights into this.

      The experiments seem well executed and support the conclusion drawn, but I have a few comments and questions that I feel the authors should address to strengthen their argument.

      General points:

      1. The authors state early on that they chose to focus on junctional rather than apical medial F-actin, but it is unclear to me really what the rationale behind that is. In much of the authors' earlier work, they study the very dynamic behaviour of the apical-medial actomyosin that drives the apical cell area reduction in mesodermal cells required for folding. They have previously analysed F-actin in the constricting cells, but have only focused on the most constricting central cell rows (Coravos, J. S., & Martin, A. C. (2016). Developmental Cell, 1-14). The role of junctional F-actin compared to the apical-medial network on which the myosin works to drive constriction is much less clear, it could stabilize overall cell shape or modulate physical malleability or compliance of cells, or it could more actively be involved in implementing the 'ratchet' that needs to engage to stabilise a shrunken apical surface. I would appreciate more explanation or guidance on why the authors chose not to investigate apical-medial F-actin across the whole mesoderm and adjacent ectoderm, but rather focused on junctional F-actin, especially explaining better throughout what they think the role of the junctional F-actin they measure is.
      2. Comparing the F-actin labeling in the above previous paper to the stainings/live images shown here, they look quite different. This is most likely due to the authors here not showing the whole apical area but focusing on junctional, i.e. below the most apical region. It is not completely clear to me from the paper at what level along the apical-basal axis the authors are analysing the junctional F-actin. Supplemental Figure 2 seems to suggest about half-way down the cell, which would be below junctional levels. Could the authors indicate this more clearly, please? Overall, I would appreciate if the authors could supply some more high-resolution images of F-actin from fixed samples (which I assume will give the better resolution) of how F-actin actually looks in the different cells with differing levels. Is there for instance a visible change to F-actin organisation? And could this help explain the observed changes in intensity and their function?
      3. Along the same lines of thought as in point 2): Dehapiot et al. (Dehapiot, B., ... & Lecuit, T. (2020). Assembly of a persistent apical actin network by the formin Frl/Fmnl tunes epithelial cell deformability. Nature Cell Biology, 1-21) have recently shown for the process of germband extension and amnioserosa contraction that two pools of F-actin can be observed, a persistent pool not dependent on Rho[GTP] and a Rho-[GTP] dependent one. Could the authors comment on what they think might occur in the mesoderm, are similar pools present here as well?
      4. As the authors state themselves, Rho does not only affect actin via diaphanous, but of course also myosin via Rock. It would be good to reflect this more in the interpretation and discussion of the data, as the causal timeline could be complex.

      More specific comments to experiments and figures:

      1. Reduction of junction function by alpha-catenin-RNAi: how strong is the reduction in catenin? Could they label a-catenin in fixed embryos? The authors conclude the original pre-constriction patterning of F-actin intensity is not dependent on intact junctions, but they show that the increase in F-actin in the mesodermal cells concomitant with apical constriction is in fact impaired in the RNAi. Thus, the authors can also not conclude whether the continued accumulation of myosin and its persistence depend on intact junctions. The initial set-up of the myosin gradient in terms of intensity distribution is unaffected, but clearly dynamics, subcellular pattern, interconnectivity etc. of myosin are affected and thus may well depend on some mechanical feed-back. I find this section of the manuscript slightly overstated and feel the conclusion should be more cautious.
      2. Figure 1 versus Figure 2: Why do the Utrophin-ABD virtual cross-sections look so fuzzy and bad in comparison to phalloidin labelled F-actin in the virtual cross-section in Fig. 1B and C? The labelling shown in 2B and D does not even look very junctional...
      3. Figure 5 C and D: the control gradients for myosin shown in C and D are completely different. For C, the half-way height cell row is deduced as 5, whereas the (in theory identical) control measure in D has row 3 at half-way height! Why is this? Putting all curves together in the same panel would suggest that the control curve in C is very similar to RhoGEF2-OE! This can't be right.
      4. Still in Figure 5: Panels C and D again, but for apical area: are the control and C-GAP-RNAi or RhoGEF2-OE curves significantly different? What statistics were used on this?
      5. Supplemental Figure 1: Panels in D: I appreciate this control, but would really also like to see the same control at a stage when folding has commenced and stretched cells are present at the margin of the mesoderm. How homogenous does the GAP43 label look in those?
      6. Supplemental Figure 5: Panel 5 B: the authors conclude that the myosin gradient under RhoGEF2 RNAi is not smaller, but looking at the curves it in fact looks wilder. They also mention that the overall level of myosin in this condition is lower than the control...
      7. Following on from the above, a comment of Figure 7:
        • The authors use RhoGEF2 RNAi stating that it affects the actin pattern, but the myosin pattern also seems affected. In line 318 the authors state that they use this condition to look at how junctional actin density affects curvature. I find this phrase misleading, as it might lead readers to think that RhoGEF2 RNAi only affects junctional F-actin, although it also affects myosin patterning.
        • Line 311: confusingly, the authors state that an increase in the actomyosin gradient affects curvature. But it is only the myosin gradient that is increased, while the junctional actin gradient is flatter than the control in both C-GAP RNAi and RhoGEF2 OE (the authors even make this distinction in line 243). Could this be clarified?

      Significance

      Morphogenesis of organs, and how these highly coordinated processes are driven by transcriptional events and local control (of, for instance, cytoskeletal behaviour), is a major field in developmental and cell biology. Advances over the last decade have led to a much better understanding of the role of myosin (in the form of actomyosin) in defining cell and therefore tissue shape in morphogenesis. The role and control of actin organisation, which the myosin depends upon for its action, is much less understood. Thus, this study will add an important piece of understanding of the basic control of morphogenesis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their enthusiastic support for our work and their insightful comments and suggestions which we believe strengthen the manuscript. Below we detail how we propose to respond to each of the specific points raised by each reviewer.

      Reviewer #1:

      1). It is convincingly shown that adding insulator elements (cHS4) reduces crosstalk between the two PAX6 CREs tested (Fig. 3). However, it is unclear if this approach will work for other CREs. This point should be discussed, and perhaps the authors could give some troubleshooting advice (e.g. adding more insulators or trying different insulator elements?).

      We will include these suggestions in the discussion and describe some ongoing efforts to characterise another insulator element in our assay.

      2). All CREs used in proof-of-concept experiments in this work have well known activities in zebrafish embryos. A new/uncharacterized CRE has not been tested yet using this system. It is unclear from the workflow (Fig. 1B) what happens if the CRE does not drive detectable levels of EGFP/mCherry. How does one determine whether lack of reporter expression is due to technical problem (with the transgene or phiC31 integration) or that the CRE is not active in zebrafish? Perhaps adding a PCR-based genotyping step could address this potential problem?

      We will include a PCR-based genotyping assay in the description of the assay pipeline and discuss its utility in assessing successful integration events as suggested by the reviewer.

      3). Other limitations of the system should also be discussed. For example, the system appears to be useful for identifying variant CREs that result in a change (either loss or gain) of temporal or spatial activity, but it is not clear how subtle changes in expression level (either slightly increased or decreased) would be identified or quantified. Perhaps other approaches could be used in combination with this system to fully analyze mutant CRE activity. Another limitation is that this approach is only applicable to CREs that are active in the first few days of zebrafish embryonic development.

      We will include these suggestions in the discussion and clearly address the limitations of the system.

      **Minor points:**

      i) Although it is discussed in the previous work published in PLoS Genetics, it is probably worth mentioning here why the gata2 minimal promoter was chosen for the reporter system.

      The choice of the gata2 promoter in our constructs was based on previously published work from our group. We will re-iterate and reference these studies in the workflow description.

      ii) It would be helpful if the cHS4 element is briefly described (e.g. "insulator element") in the Fig. 1 legend.

      We will modify the figure legend according to the suggestion.

      iii) It is not clear from the manuscript whether the new reagents reported here, including dual reporter vectors and transgenic attB landing site zebrafish strains, will be made available to the scientific community, or how these reagents would be distributed.

      We will include a section describing our plans for distribution of the reagents and tools described in the manuscript. All the vectors will be deposited in Addgene for distribution, and all the zebrafish lines will be openly shared with the scientific community.

      Reviewer #2:

      1. The dual reporter system uses EGFP and mCherry to report the activities of two different CREs in the same animal. However, EGFP and mCherry have drastically different fluorescence properties which have not been measured particularly well in vivo and especially not in zebrafish. They have different maturation times (mCherry is much quicker). Both are quite stable in vivo, but mCherry is particularly stable in cell culture and in vivo, even resisting lysosomal degradation (EGFP does not - it is acid and protease sensitive) (Katayama et al., 2008; McWilliams et al., 2016). Often, promoter activity assays in zebrafish employ short lived "destabilized" FPs, such as destabilized GFP and destabilized dsRed. With stable FPs, false positives could be reported due to the fluorescent signal remaining for a long period of time after promoter activity has ceased. Replacing the traditional FPs with destabilized versions could be one way to improve the temporal resolution of this assay. This is probably not necessary to do in the present study but might be a worthy future direction.

      We will discuss these points among the possible limitations of our assay and will also endeavour to incorporate these suggestions in future versions of our assays.

      However, no matter which pair of FPs is chosen, there will be differences in signal intensity/brightness and decay rate. Thus, the FP swap experiments should be employed for any experiment claiming a temporal (Fig. 4) or quantitative (Fig. 5) difference between CRE activation or deactivation. If the EGFP/mCherry swap experiments show the same results, the confidence in the assay will be significantly bolstered.

      We estimate the proposed experiments to take about 4 months to allow for molecular cloning of the FP swapped constructs, injection into the "landing" strain, raising to sexual maturity (2.5 mo), screening for founders, and performing the imaging. These are the only two suggested experiments I would need to feel confident in the results and to recommend publication.

      We appreciate the reviewer’s suggestion but would point out that we included dye-swaps for the PAX6-CREs described in Figure 3 in this manuscript. The dye-swap experiments for SBE2WT/SBE2Mut were described in our previous work published in PLoS Genetics. However, to increase the confidence of the readers in our current system, we will include the other suggested dye swaps in the revised version of our manuscript.

      Reviewer #3:

      **Major comments**

      1. First, given the importance of quality landing lines for the methodology, I would like to see more clarity and emphasis on validation of the Shh-SBE2 landing pad in the main text. Based on supplemental tables 1 and 2, this reviewer is somewhat unclear on whether there is one or three lines with Shh-SBE2 based landing pads (one site is mentioned in table 1, but table 3 mentions three F0 lines, and the text is ambiguous). The authors also state that the Shh-SBE2 landing pad is a single copy integration, but the data supporting this conclusion does not appear to be included (linker mediated PCR does not rule out other integrations).

      We will provide a detailed description of the landing lines addressing all the concerns raised by the reviewer.

      It would also be useful to have clearer numbers indicating the reproducibility of the expression pattern in F1 animals. Do 100% of F1 progeny from multiple crosses that carry the integration show the expression pattern in Figure 2A? If there is variability, how much, and how many fish were examined? This reviewer also wonders whether appropriate expression of Shh-SBE2 in this landing site is enough to call it neutral. For example, perhaps position effects might be observed with a different, weaker CRE in this site? Better documentation will allow for more widespread and appropriate use of the landing pad.

      We will expand the description for the part of the pipeline the reviewer is referring to, providing the details of transgene segregation.

      Similar concerns apply to the integration of test constructs. To evaluate the practicality of the approach, it would be useful to have numbers reporting the frequency of recovering F1 individuals with PhiC mediated integration of the reporter into the desired landing site. It is also important to provide better documentation of the degree of reproducibility in expression patterns between F1 progeny. Numbers of embryos imaged and fraction with the indicated expression pattern are needed for all data in the main text. At minimum, gross expression patterns should be examined in at least 10 F1 larvae. If there is variability between individuals, some image documentation of this in supplementary data would be welcome.

      We will include the suggested information in the results and provide the supplementary data as suggested by the reviewer.

      **Minor comments:**

      i) For figure 1, it may be clearer to present generation of the landing pad lines and screening of CREs using these lines in separate figure panels: (B) for generation of landing pads, and (C) for CRE analysis.

      We will modify figure 1 as suggested.

      ii) Landing pads that were less effective might also be moved out of figure 2 to the supplemental material, to help improve clarity and to allow for focus on the tools with the most utility.

      We will modify figure 2 as suggested.

      iii) Scale bars should be included in all images.

      This will be done for all the images.

      iv) In some cases, image labeling somewhat obscures the relevant features.

      We will rectify this in the revised version.

      v) To help evaluate consistency, in all relevant figures (4, 5, sup fig 3, etc.) the number of embryos examined should be included in the legend.

      We will modify the figure legends to include this information.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Bhatia addresses a longstanding need for rigorous methods to directly compare the effectiveness of cis-regulatory elements (CREs) during vertebrate embryogenesis. The manuscript describes a method for simultaneous quantitative assessment of the spatial and temporal activity of wild-type and mutant CREs using live imaging in zebrafish embryos. The approach takes advantage of a predefined neutral docking site, and dual-CRE reporter cassette that can be integrated into this site using PhiC31. Using this method, the authors demonstrate subtle differences in the spatial and temporal dynamics of two shh CREs that have been previously reported to have similar domains of activity, and they demonstrate changes in CRE activity in embryos harboring a disease specific mutation in the SBE2 CRE.

      Major comments

      Overall, this manuscript describes a valuable tool, and the key conclusions regarding its need and utility are convincing. However, some additional documentation of methods, key reagents, and numbers would be of value.

      First, given the importance of quality landing lines for the methodology, I would like to see more clarity and emphasis on validation of the Shh-SBE2 landing pad in the main text. Based on supplemental tables 1 and 2, this reviewer is somewhat unclear on whether there is one or three lines with Shh-SBE2 based landing pads (one site is mentioned in table 1, but table 3 mentions three F0 lines, and the text is ambiguous). The authors also state that the Shh-SBE2 landing pad is a single copy integration, but the data supporting this conclusion does not appear to be included (linker mediated PCR does not rule out other integrations). It would also be useful to have clearer numbers indicating the reproducibility of the expression pattern in F1 animals. Do 100% of F1 progeny from multiple crosses that carry the integration show the expression pattern in Figure 2A? If there is variability, how much, and how many fish were examined? This reviewer also wonders whether appropriate expression of Shh-SBE2 in this landing site is enough to call it neutral. For example, perhaps position effects might be observed with a different, weaker CRE in this site? Better documentation will allow for more widespread and appropriate use of the landing pad.

      Similar concerns apply to the integration of test constructs. To evaluate the practicality of the approach, it would be useful to have numbers reporting the frequency of recovering F1 individuals with PhiC mediated integration of the reporter into the desired landing site. It is also important to provide better documentation of the degree of reproducibility in expression patterns between F1 progeny. Numbers of embryos imaged and fraction with the indicated expression pattern are needed for all data in the main text. At minimum, gross expression patterns should be examined in at least 10 F1 larvae. If there is variability between individuals, some image documentation of this in supplementary data would be welcome.

      Presumably nearly all of this data has already been collected during validation of the tools and just isn't reported clearly, so these updates would not require significant time or cost.

      Minor comments:

      With respect to clarity, while the authors do an excellent job of explaining the rationale for their system, the details of execution in the manuscript can be difficult to follow at times. Below are minor suggestions to help the reader follow more easily.

      For figure 1, it may be clearer to present generation of the landing pad lines and screening of CREs using these lines in separate figure panels: (B) for generation of landing pads, and (C) for CRE analysis.

      Landing pads that were less effective might also be moved out of figure 2 to the supplemental material, to help improve clarity and to allow for focus on the tools with the most utility.

      Scale bars should be included in all images.

      In some cases, image labeling somewhat obscures the relevant features.

      To help evaluate consistency, in all relevant figures (4, 5, sup fig 3, etc.) the number of embryos examined should be included in the legend.

      Significance

      This manuscript is significant as it provides useful tools for direct comparison of CRE activity in stable transgenic embryos, where two CREs are integrated into a single genomic location. The method offers an advance in efficiency and rigor compared to past approaches. As a zebrafish researcher, it is easy to recognize the value of having a transgenic line with a validated neutral landing site for transgene analysis, and having a well-designed construct for detailed in vivo comparison of CRE activity.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This study presents a dual fluorescent protein (FP) reporter system to determine differential activities of cis-regulatory elements (CREs) on transcription factor behavior in an in vivo setting. The strategy uses the PhiC31 system to ensure single copy insertion into a consistent genomic locus and is an important improvement over the authors' previous work using a similar system with random genomic integration and separated FP constructs. Because some genomic loci are more accessible than others, comparing the activities of randomly inserted CREs cannot be quantitative and requires generation and comparison of multiple lines for each CRE to validate. The bulk of this study is validation of the new specifically inserted, dual FP system, including a demonstration that insulator sequences between the CREs of interest are necessary to prevent crosstalk. The last two figures demonstrate the utility of the system to interrogate spatial and temporal regulation of CRE variants and the quantitative expression levels of a mutant and WT CRE pair. This is an exciting tool with clear potential to uniquely compare CRE activities in vivo, and the results are clearly presented. However, given that the impact of this study is as a technical improvement over previous methods and that it is aimed to demonstrate the robustness and utility of the reporter system, additional controls are necessary to demonstrate that FP choice does not influence the temporal or quantitative readouts.

      The dual reporter system uses EGFP and mCherry to report the activities of two different CREs in the same animal. However, EGFP and mCherry have drastically different fluorescence properties which have not been measured particularly well in vivo and especially not in zebrafish. They have different maturation times (mCherry is much quicker). Both are quite stable in vivo, but mCherry is particularly stable in cell culture and in vivo, even resisting lysosomal degradation (EGFP does not - it is acid and protease sensitive) (Katayama et al., 2008; McWilliams et al., 2016). Often, promoter activity assays in zebrafish employ short lived "destabilized" FPs, such as destabilized GFP and destabilized dsRed. With stable FPs, false positives could be reported due to the fluorescent signal remaining for a long period of time after promoter activity has ceased. Replacing the traditional FPs with destabilized versions could be one way to improve the temporal resolution of this assay. This is probably not necessary to do in the present study but might be a worthy future direction. However, no matter which pair of FPs is chosen, there will be differences in signal intensity/brightness and decay rate. Thus, the FP swap experiments should be employed for any experiment claiming a temporal (Fig. 4) or quantitative (Fig. 5) difference between CRE activation or deactivation. If the EGFP/mCherry swap experiments show the same results, the confidence in the assay will be significantly bolstered.

      We estimate the proposed experiments to take about 4 months to allow for molecular cloning of the FP swapped constructs, injection into the "landing" strain, raising to sexual maturity (2.5 mo), screening for founders, and performing the imaging. These are the only two suggested experiments I would need to feel confident in the results and to recommend publication.

      Significance

      The impact of this study is as a technical improvement over previous methods and is aimed to demonstrate the robustness and utility of the reporter system.

      The manuscript is geared towards zebrafish experts with an interest in the imaging of intracellular and transcriptional processes.

      Our laboratory has expertise in zebrafish developmental genetics and live imaging of reporters.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is a technical manuscript that describes a new transgenic reporter system in zebrafish that is designed to simultaneously test the activity of two cis-regulatory elements (CREs) in the same living embryo. This is an extension of previous work from the authors that established methods to compare two CREs in transgenic zebrafish embryos (published in PLoS Genetics; DOI: 10.1371/journal.pgen.1005193). Here, to address the problem of position effects caused by random transgene integration, the authors have created a dual reporter transgene that can be integrated into a specific neutral site (using phiC31 recombination) in the zebrafish genome. Expression of different fluorescent proteins (EGFP and mCherry) is regulated by two CREs of interest in the zebrafish embryo, which allows visualization of the temporal and spatial activity of the CREs in real time during embryonic development. The authors propose this system could be used to directly compare wild-type and mutant CREs, and then provide several lines of evidence that establish proof-of-concept. Overall, the results are clearly presented, and the conclusions are convincing. The description of methods (including supplemental tables) is extensive, which will facilitate reproducibility. The manuscript is succinct, and describes a useful approach to characterize CREs. However, I have a few points for the authors to consider:

      Major points:

      1) It is convincingly shown that adding insulator elements (cHS4) reduces crosstalk between the two PAX6 CREs tested (Fig. 3). However, it is unclear if this approach will work for other CREs. This point should be discussed, and perhaps the authors could give some troubleshooting advice (e.g. adding more insulators or trying different insulator elements?).

      2) All CREs used in proof-of-concept experiments in this work have well known activities in zebrafish embryos. A new/uncharacterized CRE has not been tested yet using this system. It is unclear from the workflow (Fig. 1B) what happens if the CRE does not drive detectable levels of EGFP/mCherry. How does one determine whether lack of reporter expression is due to technical problem (with the transgene or phiC31 integration) or that the CRE is not active in zebrafish? Perhaps adding a PCR-based genotyping step could address this potential problem?

      3) Other limitations of the system should also be discussed. For example, the system appears to be useful for identifying variant CREs that result in a change (either loss or gain) of temporal or spatial activity, but it is not clear how subtle changes in expression level (either slightly increased or decreased) would be identified or quantified. Perhaps other approaches could be used in combination with this system to fully analyze mutant CRE activity. Another limitation is that this approach is only applicable to CREs that are active in the first few days of zebrafish embryonic development.

      Minor points:

      1) Although it is discussed in the previous work published in PLoS Genetics, it is probably worth mentioning here why the gata2 minimal promoter was chosen for the reporter system.

      2) It would be helpful if the cHS4 element is briefly described (e.g. "insulator element") in the Fig. 1 legend.

      3) It is not clear from the manuscript whether the new reagents reported here, including dual reporter vectors and transgenic attB landing site zebrafish strains, will be made available to the scientific community, or how these reagents would be distributed.

      Significance

      This work introduces a new method to analyze cis-regulatory element (CRE) activity in vivo. By generating transgenic zebrafish with a neutral phiC31 landing site for reporter transgene integration, this work improves on previous methods by overcoming the problem of position effects caused by random transgene integration. This will be a useful approach to characterize CREs during embryonic development, and variant CREs associated with human disease. This paper will be of interest to developmental biologists, and geneticists trying to understand CRE activity. I have expertise in zebrafish genetics, with extensive experience using Tol2 transgenesis, and some experience using phiC31 recombination. The experimental approach described here is straightforward, and will be easy to apply in labs with experience in zebrafish transgenesis, and imaging fluorescent protein expression in embryos.

  4. Nov 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Septins are highly conserved small GTPase cytoskeletal proteins that function as molecular scaffolds for dynamic cell wall and plasma membrane remodeling, as well as diffusion barriers restricting movement of membrane and cell wall-associated molecules. Recent work has started to unravel the functional connections between the septins, cell wall integrity MAPK pathway signaling, and lipid metabolism; however, most studies have focused on a small subset of septin monomers and/or were conducted primarily in yeast-type fungi. Here the authors show in the filamentous fungus A. nidulans that the core hexamer septins are required for proper coordination of the cell wall integrity pathway and that all septins are involved in lipid metabolism. Sphingolipids in particular, but not sterols or phosphoinositides, contribute to the localization and stability of core septins at the plasma membrane. The experiments are simple and clear, and the conclusion is therefore convincing. In the Fig. 8 model, I would like to see the situation in the septin mutants.

      We thank the reviewer for the positive comments. In response to the request from this reviewer and a similar one from reviewer 2 for more on the effect of the loss of individual septins, we added text clarifying the roles of core hexamer, core octamer and noncore septins throughout the manuscript including in the legend to Fig 8 (li 439-444) and the discussion (li 388-402). Please see responses to reviewer 2 comments for more detail.

      Reviewer #1 (Significance (Required)):

      Since the localization of cell wall synthesis proteins, lipid domains and septins is likely to be interdependent, it is sometimes difficult to evaluate whether an effect is direct or indirect. Comprehensive analyses like those performed here are helpful for gaining an overview of the field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The study by Mela and Momany describes the function of core septins of A. nidulans and links it with the requirement for the cell wall integrity pathway and the sphingolipids, which are required for membrane and cell wall stability. The study is of interest for the fungal genetics community, and the authors have conducted a substantial amount of work in a field in which they have substantial experience. However, one of the main weaknesses of the manuscript is the unresolved question of whether the CWI pathway controls septin function or whether the core septins control it.

      We agree that while our data clearly indicate interactions between the septins and the CWI pathway, which component controls the other is not clear. We have modified the text to address this concern in several places as detailed in responses to the reviewer’s specific comments below.

      **Major comments**

      In the abstract, the authors claim that double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway. However, from the experiments shown, it is difficult to be convinced of that. The authors should make efforts to make this clear in the manuscript and the discussion. For example:

      -Line 25-26 (abstract): "Double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway."

      We agree that while the double mutant analysis shows interaction of septins with the CWI pathway, the evidence for them being downstream is not strong. We have revised the abstract as follows:

      Li 29-30: Double mutant analysis with ΔmpkA suggested core septins interact with the cell wall integrity pathway.

      -Line 181-182; 219-220 (results) "Double mutant analyses suggest core septins modulate the cell wall integrity pathway downstream of the kinase cascade." This conclusion is one of the most important of the manuscript. However, this reviewer argues that it cannot be convincingly addressed unless at least the phosphorylation of the MAP kinase MpkA in the septin mutant backgrounds is evaluated under conditions of cell stress and sphingolipid biosynthesis inhibition. The genetic analysis alone may not be enough to infer whether septins control the CWI pathway or the other way around. There may be compensatory effects when the CWI pathway is impaired. For example, most of the septin and mpkA double mutants seem to suppress the defect of the delta mpkA mutant under cell wall stress. The authors should consider this idea.

      Although we discuss the epistasis experiments as one possible interpretation, we agree the genetic analysis is not enough to definitively show that the septins are upstream of the CWI pathway or the other way around. The suppression of cell wall defects by deletion of septins in an mpkA null mutant background under cell wall stress suggests a bypass of the CWI pathway for remediation of the cell wall or some other alternate regulatory node. One possible interpretation of these data is that when normal CWI function is inactivated by deletion of the final kinase, and septins (possibly acting as negative regulators of CWI components) are deleted as well, cell wall remediation can still occur through a parallel node.

      Wording throughout the abstract, results, and discussion has been modified accordingly.

      Li 29-30: Double mutant analysis with ΔmpkA suggested core septins interact with the cell wall integrity pathway.

      Li 208-209: Double mutant analyses suggest the core septin aspB cdc3 modulates the cell wall integrity pathway in the ∆mpkA background under cell wall stress.

      Li 221-225: When challenged with low concentrations of CASP and CFW, the ∆aspBcdc3 ∆mpkAslt2 and ∆aspE ∆mpkA slt2 mutants were more sensitive than ∆aspBcdc3 and ∆aspE single mutants, but suppressed the colony growth defects of ∆mpkA slt2. The novel phenotype of the double mutants shows that septins are involved in cell wall integrity and raises the possibility that they act in a bypass or parallel node for remediation of cell wall defects (Fig 4).

      Li 227-228: Fig 4. Double mutant analyses suggest core septins modulate the cell wall integrity pathway.

      Li 464-468: Double mutant analyses between septins and CWI pathway kinases also support a role for core septins in maintaining cell wall integrity under stress (Fig 4). Suppression of cell wall defects under cell wall stress by deletion of septins in an ∆mpkA slt2 background suggests a parallel node by which septins negatively regulate cell wall integrity pathway sensors or kinases could exist.

      There is no clear evidences on the manuscript that the core septins AspA, AspB, AspC, and ApsD are epithastic in A. nidulans. Therefore, the authors choice of using different Asp deletion mutants as a proxy for all the septins mutants is questionable. For example, there is no mention of why AspB was chosen for Figure 2 (chitin and β-1,3-glucan deposition), and AspA was chosen for Figure 3 (chitin synthase localization) since these experiments are correlated. The same is true for Figure S1 where AspB and AspE were used. One can wonder if some of the core septins would have a major impact in the chitin content.

      We agree with the reviewer that not all four core septins are equivalent. Previously published work from our lab shows that AspACdc11, AspBCdc3, AspCCdc12, and AspDCdc10 form octamers and that AspACdc11, AspBCdc3, and AspCCdc12 form hexamers, that both of these heteropolymers co-exist, and that the noncore septin AspE is not part of either core heteropolymer, though it appears to influence them possibly through brief interactions (Lindsay et al., 2010; Hernandez-Rodriguez et al., 2012; Hernandez-Rodriguez et al., 2014). This previous work also clearly shows that strains in which the hexameric septins have been deleted (ΔaspA, ΔaspB, and ΔaspC) have very similar phenotypes while strains in which the octamer-exclusive septin has been deleted (ΔaspD) have different phenotypes.

      In our attempt to simplify the current manuscript we discussed the four core septins as a group. In retrospect this caused us to miss important distinctions on the roles of hexamer vs octamer septins and we are grateful to the reviewer for pointing this out. We have modified language throughout the revised manuscript to specify whether results and interpretations apply to core hexamer septins, core octamer septins, the noncore septin, or individual septins. This more detailed analysis has given us several new ideas to test in future work.

      While we cannot exclude the possibility that interesting results might be produced by analyzing null alleles of each individual septin gene for all experiments, we agree with the cross-reference by Reviewer #3 that there is a very low likelihood that we would see different results by analyzing all individual septins within each subgroup (hexamer, octamer or noncore).

      To the reviewer’s questions on choice of septins for Fig 2, Fig 3, and Fig S1:

      ΔaspA, ΔaspB, and ΔaspC showed similar sensitivity to cell wall-disturbing agents in the plate-based assays in Fig 1 and are all part of the core hexamer. We have modified text including the figure legends to make it clear which septins were used in the experiments and which group they belong to.

      In a related comment about Figure 3, the reallocation of chitin synthases in the absence of septins is very interesting, but consider that all the core septin genes should be tested. Without a fully functioning cell wall, the formation of septa will be impaired. It makes their results less surprising.

      In the case of Fig 3, we were unable to recover ChsB-GFP in the ΔaspB or ΔaspC backgrounds but were able to recover it in the ΔaspA background. We have clarified as follows:

      Li184-187: To determine the localization of synthases, a chitin synthase B-GFP (chsB-GFP) strain was crossed with strains in which core hexamer septins were deleted. After repeated attempts, the only successful cross was with core hexamer deletion strain ∆aspA cdc11.

      In Figure 3, Panels A and B, chitin was also labeled by Calcofluor White, which clearly shows that the formation of septa was not impaired even in the septin null mutant background (this is in agreement with previous work from our lab, which shows that septa still form in individual septin null mutants). The results showed that, unlike in WT cells, chitin synthase is not only absent from most branch tips in the septin null mutant background, but seems to be limited primarily to longer (presumably actively growing/non-aborted) branches. These findings were surprising to us, considering that other major cell wall synthesis events, such as targeting of cell wall synthases to septa during septation, appeared to be unimpaired (based on the presence of fully developed, chitin-labeled septa).

      The labeling of septa by calcofluor is now noted in the legend to Figure 3 as follows:

      Li 201: Calcofluor White labeling shows the presence of the polymer chitin at septa, main hyphal tips, branches, and …

      Why was chitin synthase B chosen to be analyzed in terms of reallocation? How many chitin synthases are in the A. nidulans genome. This rationale should be explained in the manuscript.

      We have added the following:

      Lines 173-182: A. nidulans contains six genes for chitin synthases: chsA, chsB, chsC, chsD, csmA, and csmB. Chitin synthase B localizes to sites of polarized growth in hyphal tips, as well as developing septa in vegetative hyphae and conidiophores, a pattern very similar to septin localization. Deletion of chitin synthase B shows severe defects in most filamentous fungi analyzed thus far, and repression of the chitin synthase b gene expression in chsA, chsC, and chsD double mutants exacerbated growth defects from a number of developmental states observed in each single mutant, suggesting it plays a major role in chitin synthesis at most growth stages (Fukuda et al., 2009). For these reasons, we chose chitin synthase B as a candidate to observe in septin mutant background for possible defects in localization.

      Figure 3 and Figure 4. The authors should make efforts to quantify the phonotypes they claim. They are overall very subtle, especially for Figure 3. Also, a decrease of fluorescence is a tricky observation that should be better reported by quantification.

      Line scans of aniline blue and CFW label were conducted and added as Fig S1. Quantitation was performed and added as Fig S3. See author’s response to Reviewer #3 below for details.

      Again, in Figures 5, 6, and 7, it is clear that the different septins respond differently when ergosterol or sphingolipids synthesis is impaired. It also raises the question again if there are differences in the role of septin genes. Can the authors use previous information about differences in septin function to improve the model (Figure 8)

      As described above, we have modified the manuscript throughout to clarify which phenotypes are seen for core hexamer, core octamer, and noncore septin deletions. As the reviewer notes, these are especially relevant for the sphingolipid-disrupting agents. Our model includes interaction of septins with sterol rich domains that contain both sphingolipids and ergosterol. Because it is not yet clear how subgroups of septins interact with each other and are organized at SRDs, we show all core septins in our model without distinguishing hexamers and octamers in the drawing, but we have now added text to clarify roles and outstanding questions.

      The changes are summarized in the abstract as follows:

      Li 37-40: Our data suggest that the core hexamer and octamer septins are involved in cell wall integrity signaling with the noncore septin playing a minor role; that all five septins are involved in monitoring ergosterol metabolism; that the hexamer septins are required for sphingolipid metabolism; and that septins require sphingolipids to coordinate the cell wall integrity response.

      The clarifications are reflected in the Figure 8 legend (and associated sections of the discussion) as follows:

      Li 436-441: As described in the text, our data suggest that all five septins are involved in cell wall and membrane integrity coordination. The core septins that participate in hexamers appear to be most important for sphingolipid metabolism while all septins appear to be involved in ergosterol metabolism and cell wall integrity. Because SRDs contain both sphingolipids and ergosterol and because it is not yet clear how subgroups of septins interact with each other at SRDs, we show all core septins in our model without distinguishing hexamers and octamers.

      For the above-discussed reasons, the conclusion on lines 384-388 (discussion) is not completely supported by the experiments shown in the manuscript. The authors need to make a better structured and more straightforward story emphasizing the stronger points and reducing descriptions of more speculative points.

      As discussed above, we have made changes throughout the manuscript to clarify which subgroups of septins are involved in which process and to refine our conclusions accordingly. The beginning of the discussion section has been changed as follows:

      Li 384-399: Our data show that A. nidulans septins play roles in both plasma membrane and cell wall integrity and that distinct subgroups of septins carry out these roles. Previous work has shown that the five septins of A. nidulans septins form hexamers (AspACdc11, AspBCdc3, and AspCCdc12) and octamers (AspACdc11, AspBCdc3, AspCCdc12, and AspDCdc10) and that the noncore septin AspE does not appear to be a stable member of a heteropolymer (20). The current work suggests that though all septins are involved in coordinating cell wall and membrane integrity, the roles of hexamers, octamers, and the noncore septin are somewhat different. Core hexamer septins appear to be most important for sphingolipid metabolism, all five septins appear to be involved in ergosterol metabolism, and core septins are most important for cell wall integrity pathway with the noncore septin possibly playing a minor role. As summarized in Figure 8 and discussed in more detail below, our previous and current data are consistent with a model in which: (A) All five septins assemble at sites of membrane and cell wall remodeling in a sphingolipid-dependent process; (B) All five septins recruit and/or scaffold ergosterol and the core hexamer septins recruit and/or scaffold sphingolipids and associated sensors at these sites, triggering changes in lipid metabolism; and (C) The core septins recruit and/or scaffold cell wall integrity machinery to the proper locations and trigger changes in cell wall synthesis. The noncore septin might play a minor role in this process.

      **Minor comments** Overall the figure caption could be shortened. They are too descriptive and contain details that are easily inferred for the images and from the materials and methods.

      Legends to the following figures have been streamlined by removing portions that belong in the methods: Figure 2, Fig 3, and Fig 6

      The authors made every effort to cove the precedent literature, but the manuscript has 115 references. The authors should evaluate if all the cited literature is extremely relevant. The manuscript would benefit for that conciseness.

      Because this manuscript addresses septins, ergosterol, sphingolipids, cell wall integrity, and multiple different pathways, there is a lot of literature underlying our approaches. Our strong preference is to cite primary literature; however, we can shorten our reference list by relying on reviews if requested by the journal.

      Line 124, 493: Replace 10ˆ7, 10ˆ4 to 10⁷, 10⁴, etc

      “10^7” and all other scientific notation were altered to replace carets (“^7”) with superscripts (“⁷”) throughout.

      The use of fludioxonil as a probe to detect cell wall impairment is perhaps out of context. This drug responds primarily to the HOG pathway and also respond to oxidative damage. So, these results could be suppressed.

      Previous work by Kojima et al., 2006 showed that in addition to the HOG pathway, cell wall integrity is required for resistance to fludioxonil treatment. C. neoformans cell wall integrity mutants bck1, mkk1, and mpk1 (Aspergillus nidulans bckA, mkkA, and mpkA homologues) all exhibit hypersensitivity to fludioxonil, and this was shown to be remediated by the addition of osmotic stabilizers, suggesting cell wall impairment was involved in the growth defect produced by this treatment. Although this drug seems to act primarily through the HOG pathway, the CWI and HOG pathways have been shown to antagonize/negatively regulate one another through a parallel pathway (SVG pathway in yeast) (Lee and Elion, 1999). It has been hypothesized that internal accumulation of glycerol by constitutive activation of the HOG pathway causes decreased cell wall integrity. Due to the apparent cross-pathway control between the HOG and CWI pathways, as well as the high level of conservation of these pathway components in filamentous fungi, we thought this treatment was rightfully dual-purposed to investigate both cell wall impairment in the septin mutants and any possible involvement of the HOG pathway. This seems to be a reasonable drug treatment to look at cell wall impairment that is not likely to be redundant with the modes of action observed in the other Figure 1 treatments (e.g. CFW, Congo Red, and Caspofungin).

      The text clarifies this point as follows: li 110-112: Fludioxonil (FLU), a phenylpyrrol fungicide that antagonizes the group III histidine kinase in the osmosensing pathway and consequently affects cell wall integrity pathway signaling (Fig 1)(58-67).

      Line 140: "exposure" would be more appropriate than architecture. Please also consider that the difference in the cell wall reported in Figure S1 are very subtle. Are they relevant?

      The differences in the cell wall content reported in Figure S1 (Figure S2 in the revised manuscript) showed that the peak for 4-Glc was almost identical in WT and the aspB null mutant; however, the overall ratio of peaks switched, with 4-GlcNAc content exceeding 4-Glc content in the mutant compared to WT. By comparison, this was not the case with the septin aspE null mutant. Although this could be considered a ‘subtle’ change in chitin content, we believe this was an important unbiased analysis of the cell wall polysaccharide content and addressed some of the cell wall sensitivity phenotypes we observed, not only between WT and the septin mutants, but also between the septin null mutants which showed sensitivity to cell wall disturbing agents (i.e. aspA, aspB, and aspC) vs. those that did not show significant sensitivity (e.g. aspE). For these reasons we believe these data warranted at the very least a supplemental figure.

      Though our idea of cell wall architecture includes changes in polymer exposure, as pointed out by the reviewer, others might use the phrase to mean only content changes. To avoid this misunderstanding, we have replaced the word “architecture” with “organization” in Li 147-148: These data show that cell wall organization is altered in ∆aspB cdc3 and raise the possibility that it might be altered in other core hexamer septin null mutants as well.

      Line 144: explain briefly what it is about and why it was chosen instead of the total detection of chitin sugar monomers. Line 538: Cell wall extraction section. Is this a new method? There is no supporting literature.

      We chose this method because it provides an analysis of all cell wall polysaccharide components and associated linkages. Detection of chitin sugar monomers would have also been a reasonable analysis if this were the only component of the cell wall we were investigating initially. The results showed differences in cell wall chitin content, so these were the data we presented.

      This was addressed on lines 574-576: “Cell walls were isolated from a protocol based on (Bull, 1970); cell wall extraction and lyophilization were conducted as previously described in (Guest and Momany, 2000) with slight modifications listed in full procedure below.”

      The results described on lines 232-257 are marginal to the study and are not exploited by the authors to address the central question of the manuscript, which is the role of the CWI pathway, septins, and sphingolipids. This section could be suppressed or very briefly mentioned in the preceding section.

      We agree that these data did not show any additional involvement of septins in the Calcineurin and cAMP-PKA pathways, and the relevance of the TOR signaling pathway connection is still quite unclear. For this reason, these data were added as a supplemental figure. On the other hand, there are a number of important signaling pathways which have been shown to affect the Cell Wall Integrity pathway directly and indirectly (these three pathways in particular), which is part of the central question of the manuscript. Considering such extensive ‘cross-talk’ between pathways (references produced on Line 65) in filamentous fungi, we felt it necessary to inspect possible involvement of these pathways in septin function via plate-based assays and feel that this is most clearly communicated as its own brief section in the text.

      Reviewer #2 (Significance (Required)): The topic of the manuscript is highly relevant to the fungal biology field and employs a very important genetic model. The cooperation of signaling pathways in mains aspects of fungal physiology is the main significant contribution of this manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** In this work the authors use genetic analysis in Aspergillus nidulans to identify phenotypes of septin mutants that point to roles for septins in coordinating the cell wall integrity pathway with lipid metabolism in a manner involving sphingolipids. Most of the major conclusions derive from monitoring the effects of combined genetic or chemical manipulations that target specific components of the pathways of interest. Additionally, the authors monitor the subcellular localization of septins, cell-wall modifying enzymes, and components of the cell wall itself.

      **Major comments:** The key conclusions are convincing, with the unavoidable caveat that null mutations of this sort and chemical inhibitors of these kinds could have unanticipated effects, such as upregulation of unexpected pathways or other compensatory alterations. The authors qualify their conclusions appropriately in this regard. The methods are explained very clearly and the data are presented appropriately. In some cases results are shown as representative images illustrating altered localization of a protein or a cell wall component. The changes observed in the experimental conditions are fairly obvious, but some quantification would not be difficult and would likely make the results even more obvious. For example, the Calcofluor White staining patterns might be nicely quantified by linescans along the hyphal length, and the same is true for AspB-GFP localization upon addition of drugs.

      We thank the reviewer for the positive comments and have made the suggested changes as follows:

      Line scans of aniline blue and CFW label were conducted and added as Fig S1. Text has been modified accordingly (Li 140-147).

      Quantification of Chitin synthase-GFP localization and CFW staining and statistical analysis have now been added as Figure S3 and main text (Li 187-191) has been modified accordingly.

      I could imagine one simple experiment that might generate interesting and relevant results, but by no means would this be a critical experiment for this study. In yeast, exposure to Calcofluor triggers increased chitin deposition in the wall. It would be interesting to know how Calcofluor staining looks in WT or septin-mutant cells that have been growing the presence of Calcofluor for some time, particularly with regard to the localization of chitin deposition in these cells. Such experiments could help connect the idea of septins as sensors of membrane lipid status and also effectors of CWI signaling.

      This is a cool idea that we will pursue in future work. Thanks!

      **Minor comments:** • Body text refers to Figure 1A and 1B but the figure itself does not have panels labeled A or B.

      Figure 1 was revised to show panels A and B labeled clearly.

      • Line 885: "S3" is missing from the beginning of the title of the figure.

      “S” was added to the figure title.

      Reviewer Identity: This is Michael McMurray, PhD, Associate Professor of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus

      Reviewer #3 (Significance (Required)): This is an important conceptual advance in our understanding of septin function because previous work in fungal septins mostly points toward them being important in directing or restricting the localization of other proteins that modify the cell wall or plasma membrane. This new work suggests that septins can play a sensing role, as well. As a fungal (budding yeast) septin researcher myself, I think that other fungal septin researchers would be very interested in these results, and I also think the broader septin community would appreciate it. Additionally, those studying fungal cell wall and plasma membrane biogenesis and coordination, including the Cell Wall Integrity Pathway, will be interested.

      REFEREES CROSS COMMENTING After reading Reviewer #1's comments, I agree that it would be appropriate to modify the wording of the authors' conclusions about where the septins lie in the CWI pathway (upstream or downstream). While they do mention that there may be other ways to interpret their results, a reader would have to search for the mention of these caveats and if the reader did not, then the strong conclusion statements might be taken as fact.

      The abstract, main text, and discussion have been modified to show that while there is evidence that the septins interact with the CWI pathway, it is not clear which component is upstream vs downstream. See response to reviewer 2 above for details.

      On the other hand, I don't think additional experiments looking at deletions of the other core septins will be worthwhile. I think that there is sufficient evidence to suspect that any single core septin deletion mutant will behave similar to another, and therefore that any one can be taken as representative. While it's possible that the authors might find something informative by looking at other mutants, I personally find the likelihood too low to justify additional experimentation along those lines.

      Based on results from previous work from our lab, there are two subgroups of core septins in A. nidulans (hexamer and octamer) and septins within subgroups appear to behave similarly. The results from the current work support this idea with the same groups of mutants behaving in very similar ways. So, the core hexamer septins, AspACdc11, AspBCdc3, and AspCCdc12 can be used to make predictions about each other, but not about the octamer-exclusive septin AspDCdc10 or the noncore septin AspE. We agree with reviewer 3 that repeating analysis on multiple septins within a subgroup is not likely to give new insight. However, we were not careful in the original version of the manuscript to distinguish between core hexamer and octamer septins. As detailed in the response to reviewer 2 above, we have modified the manuscript throughout to make clear which subgroup of septins were being examined and to put conclusions into this context.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this work the authors use genetic analysis in Aspergillus nidulans to identify phenotypes of septin mutants that point to roles for septins in coordinating the cell wall integrity pathway with lipid metabolism in a manner involving sphingolipids. Most of the major conclusions derive from monitoring the effects of combined genetic or chemical manipulations that target specific components of the pathways of interest. Additionally, the authors monitor the subcellular localization of septins, cell-wall modifying enzymes, and components of the cell wall itself.

      Major comments:

      The key conclusions are convincing, with the unavoidable caveat that null mutations of this sort and chemical inhibitors of these kinds could have unanticipated effects, such as upregulation of unexpected pathways or other compensatory alterations. The authors qualify their conclusions appropriately in this regard.

      The methods are explained very clearly and the data are presented appropriately. In some cases results are shown as representative images illustrating altered localization of a protein or a cell wall component. The changes observed in the experimental conditions are fairly obvious, but some quantification would not be difficult and would likely make the results even more obvious. For example, the Calcofluor White staining patterns might be nicely quantified by linescans along the hyphal length, and the same is true for AspB-GFP localization upon addition of drugs.

      I could imagine one simple experiment that might generate interesting and relevant results, but by no means would this be a critical experiment for this study. In yeast, exposure to Calcofluor triggers increased chitin deposition in the wall. It would be interesting to know how Calcofluor staining looks in WT or septin-mutant cells that have been growing the presence of Calcofluor for some time, particularly with regard to the localization of chitin deposition in these cells. Such experiments could help connect the idea of septins as sensors of membrane lipid status and also effectors of CWI signaling.

      Minor comments:

      • Body text refers to Figure 1A and 1B but the figure itself does not have panels labeled A or B.

      • Line 885: "S3" is missing from the beginning of the title of the figure.

      Reviewer Identity: This is Michael McMurray, PhD, Associate Professor of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus

      Significance

      This is an important conceptual advance in our understanding of septin function because previous work in fungal septins mostly points toward them being important in directing or restricting the localization of other proteins that modify the cell wall or plasma membrane. This new work suggests that septins can play a sensing role, as well. As a fungal (budding yeast) septin researcher myself, I think that other fungal septin researchers would be very interested in these results, and I also think the broader septin community would appreciate it. Additionally, those studying fungal cell wall and plasma membrane biogenesis and coordination, including the Cell Wall Integrity Pathway, will be interested.

      REFEREES CROSS COMMENTING

      After reading Reviewer #1's comments, I agree that it would be appropriate to modify the wording of the authors' conclusions about where the septins lie in the CWI pathway (upstream or downstream). While they do mention that there may be other ways to interpret their results, a reader would have to search for the mention of these caveats and if the reader did not, then the strong conclusion statements might be taken as fact. On the other hand, I don't think additional experiments looking at deletions of the other core septins will be worthwhile. I think that there is sufficient evidence to suspect that any single core septin deletion mutant will behave similar to another, and therefore that any one can be taken as representative. While it's possible that the authors might find something informative by looking at other mutants, I personally find the likelihood too low to justify additional experimentation along those lines.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The study by Mela and Momany describes the function of core septins of A. nidulans and links with the requirement of the cell wall integrity pathway and the sphingolipids which, are required for membrane and cell wall stability. The study is of interest for the fungal genetics community, and the authors have conducted a substantial amount of work in a field they have substantial experience. However, one of the main weaknesses of the manuscript is the assumption whether the CWI pathway controls de septin function of if the core septins control it.

      Major comments

      In the abstract, the authors claim that double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway. However, from the experiments showed, it is difficult to be convinced about that. The authors should make efforts do make it clear in the manuscript and the discussion.

      For example:

      -Line 25-26 (abstract): "Double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway."

      -Line 181-182; 219-220 (results) "Double mutant analyses suggest core septins modulate the cell wall integrity pathway downstream of the kinase cascade."

      This conclusion is one of the most important of the manuscript. However, this reviewer argues that it cannot be convincingly addressed if at least the phosphorylation ok the MAP kinase MpkA in the septins background is not evaluated under conditions of cell stress and sphingolipid biosynthesis inhibition. The genetic analysis alone maybe not enough to infer if septins control the CWI or the other way around. There may have compensatory effects when the CWI pathway is impaired. For example, most of the septins and mpkA double mutants seems to suppress the defect of the delta mpkA under cell wall stress. The authors should consider this idea.

      There is no clear evidence in the manuscript that the core septins AspA, AspB, AspC, and AspD are epistatic in A. nidulans. Therefore, the authors' choice of using different Asp deletion mutants as a proxy for all the septin mutants is questionable. For example, there is no mention of why AspB was chosen for Figure 2 (chitin and β-1,3-glucan deposition) and AspA was chosen for Figure 3 (chitin synthase localization), since these experiments are correlated. The same is true for Figure S1, where AspB and AspE were used. One can wonder whether some of the core septins would have a major impact on the chitin content.

      In a related comment about Figure 3, the reallocation of chitin synthases in the absence of septins is very interesting, but consider that all the core septin genes should be tested. Without a fully functioning cell wall, the formation of septa will be impaired. It makes their results less surprising.

      Why was chitin synthase B chosen to be analyzed in terms of reallocation? How many chitin synthases are in the A. nidulans genome? This rationale should be explained in the manuscript.

      Figure 3 and Figure 4. The authors should make an effort to quantify the phenotypes they claim. They are overall very subtle, especially for Figure 3. Also, a decrease in fluorescence is a tricky observation that would be better reported by quantification.

      Again, in Figures 5, 6, and 7, it is clear that the different septins respond differently when ergosterol or sphingolipid synthesis is impaired. This again raises the question of whether the septin genes differ in their roles. Can the authors use previous information about differences in septin function to improve the model (Figure 8)?

      For the above-discussed reasons, the conclusion on lines 384-388 (discussion) is not completely supported by the experiments shown in the manuscript. The authors need to make a better structured and more straightforward story, emphasizing the stronger points and reducing descriptions of more speculative points.

      Minor comments

      Overall, the figure captions could be shortened. They are too descriptive and contain details that are easily inferred from the images and from the materials and methods.

      The authors made every effort to cover the preceding literature, but the manuscript has 115 references. The authors should evaluate whether all the cited literature is truly relevant. The manuscript would benefit from that conciseness.

      Line 124, 493: Replace 10ˆ7, 10ˆ4 with 10⁷, 10⁴, etc.

      The use of fludioxonil as a probe to detect cell wall impairment is perhaps out of context. This drug acts primarily through the HOG pathway and also responds to oxidative damage. So, these results could be suppressed.

      Line 140: "exposure" would be more appropriate than "architecture". Please also consider that the differences in the cell wall reported in Figure S1 are very subtle. Are they relevant?

      Line 144: explain briefly what it is about and why it was chosen instead of the total detection of chitin sugar monomers.

      Line 538: Cell wall extraction section. Is this a new method? There is no supporting literature.

      The results described on lines 232-257 are marginal to the study and are not exploited by the authors to address the central question of the manuscript, which is the role of the CWI pathway, septins, and sphingolipids. This section could be suppressed or very briefly mentioned in the preceding section.

      Significance

      The topic of the manuscript is highly relevant to the fungal biology field and employs a very important genetic model. The cooperation of signaling pathways in the main aspects of fungal physiology is the most significant contribution of this manuscript.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Septins are highly conserved small GTPase cytoskeletal proteins that function as molecular scaffolds for dynamic cell wall and plasma membrane-remodeling, as well as diffusion barriers restricting movement of membrane and cell wall-associated molecules. Recent work has started to unravel the functional connections between the septins, cell wall integrity MAPK pathway signaling, and lipid metabolism, however most studies have focused on a small sub-set of septin monomers and/or were conducted in primarily yeast-type fungi.

      Here the authors show in the filamentous fungus A. nidulans that the core hexamer septins are required for proper coordination of the cell wall integrity pathway and that all septins are involved in lipid metabolism. In particular, sphingolipids, but not sterols or phosphoinositides, contribute to the localization and stability of core septins at the plasma membrane.

      The experiments are simple and clear, and the conclusion is therefore convincing. In the Fig. 8 model, I would like to see the situation in a septin mutant.

      Significance

      Since the localization of cell wall synthesis proteins, lipid domains and septins are likely to depend on each other, it is sometimes difficult to evaluate whether an effect is direct or indirect. Comprehensive analyses like those performed here are helpful for gaining an overview of the field.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors use focused-ion beam (FIB) milling coupled with cryo-electron tomography and subtomogram averaging to uncover the structure of the elusive proximal and distal centrioles, as well as different regions of the axoneme in the sperm of 3 mammalian species: pig, horse, and mouse. The in-situ tomograms of the sperm neck region beautifully illustrate the morphology of both the proximal centriole, confirming the partial degeneration of mouse sperm, and intriguingly, asymmetry in the microtubule wall of pig sperm. In distal centrioles, the authors show that in all mammalian species, microtubule doublets of the centriole wall are organized around a pair of singlet microtubules. The presented segmentation of the connecting piece is beautiful and nicely shows the connecting piece forming a nine-fold, asymmetric chamber around the centrioles. The authors further use subtomogram averaging to provide the first maps of the mammalian central pair and identify sperm-specific radial spoke-bridging barrel structures. Lastly, the authors perform further subtomogram averaging to show the connecting site of the outer dense fibers to the microtubule doublet of the proximal principal piece and confirm the presence of the TAILS microtubule inner protein complex (Zabeo et al, 2018) in the singlet microtubules occupying the tip of sperm tails.

      The manuscript provides the clearest insight into flagellar base morphology to date, giving insight into the morphological difference between different mammalian cilia and centriole types. The manuscript is suitable for publication, once the following questions are addressed.

      We are ecstatic that the reviewer shares our enthusiasm for this work. We are particularly grateful that the reviewer appreciates the significance of the unique, and hitherto under-explored biology of the sperm centrioles and the flagellar base.

      **Major Points:**

      How many centrioles and axonemes were used in generating the averages presented in the paper? If too few samples were used, especially in centrioles undergoing dramatic remodeling or degeneration, the apparent presence of MIPs and MAPs might not reflect reality. For instance, in figure 1d, the authors present a cryoET map of the centriole microtubule triplet. However, centrioles are divided into several regions with different accessory elements. Here, the authors could show the presence of only part of the A-C linker. The A-C linker covers only 40% of the centriole, so does it mean that this centriole is made only of the accessories that characterize the proximal side of the centriole? In the same line, what were the boundaries governing subtomogram extraction? For example, in the distal centriole, were microtubules extracted from just before the start of the transition zone, to the end of the microtubule vaulting, more pronounced at the end of the proximal region? There are known heterogeneities in centriole, as well as flagella, ultrastructure along the proximal-distal axis. If no pre-classification was performed for subtomogram longitudinal position along the centriole and axoneme, structural features may be averaged out, and/or appear in the average without reflecting their real longitudinal localization. Classification should be applied here if it has not been already.

      These are all valid points. Because there is no easy way to target the PC/DC when cryo-FIB milling, and because there is only one of each structure in every cell, the chances of catching them in ~150-nm-thin lamellae are slim (not to mention the number of things that can and do go wrong when doing cryo-ET on lamellae). As such, the averages of the PC were generated from 3 tomograms (3 cells) and those of the DC from 2 tomograms (2 cells). We do have more tomograms with the PC/DC, but these were used for segmentation/visual inspection since we only used the best tomograms for averaging. These numbers are not entirely atypical for cryo-FIB datasets; the only other in situ centriole structures are from 5-6 centrioles (from Chlamydomonas, from Le Guennec et al 2020 doi: 10.1126/sciadv.aaz4137 and Klena et al 2020 doi: 10.15252/embj.2020106246).

      To allow readers to adjust their interpretations according to the small number of cells analysed, we explicitly stated the number of animals/cells/tomograms used to generate averages in Table S1. Furthermore, we amended the text to clarify which regions of the centrioles our averages represent. These changes are detailed below:

      (1) proximal centriole

      The lamellae used for averaging PC triplets caught mostly the proximal end of the centriole, and essentially all of the particles come from the most proximal ~ 400 nm. In a sense, this was a form of pre-classification. We now state explicitly that our structure represents only the proximal region and that proximal/distal differences may be identified in the future (see section on distal centriole below). Despite the limited particle number, we are confident in the presence of the MIPs as these are also visible in the raw data (the striations in Fig. 1a, now Fig. 1d, for instance). Page 7, Line 165 was edited accordingly as well as the legend to Fig. 1.

      (2) distal centriole

      The subtomograms used for the DC average were extracted from the region of the distal centriole closest to the base of the axoneme (i.e., the region marked “distal centriole” in Fig. 2h-i). Because the DC doublet average in Fig. 2j was generated from very few particles, we tried to be very conservative when interpreting it. Page 9, Line 216 was edited accordingly, as was the legend to Fig. 2.

      (3) axoneme

      We did attempt to average the axoneme from different regions of flagella (midpiece, proximal principal piece, distal principal piece). This is shown in Fig. 6d-l. The major difference we found was at the doublet-ODF connection. We did not find any striking differences in MIP densities, or in radial spoke densities along the proximodistal axis. As such, the averages in Fig. 5 are from the entire principal piece (but not the midpiece), which we state in the figure legend.

      Because mammalian sperm flagella are very long, it is possible that we missed more subtle differences. We now state this in the Discussion (page 20, line 491):

      **Minor Points:**

      • In line 3, motile cilia are not only used to swim, they can move liquid or mucus for instance.

      Done. Page 3, line 64

      • In line 175, the authors stated "a prominent MIP associated with protofilament A9, was also reported in centrioles isolated from CHO cells (Greenan et al. 2018) and in basal bodies from bovine respiratory epithelia (Greenan et al 2020)". Actually, this MIP has been seen in many other centrioles from other species, such as Trichonympha (https://doi.org/10.1016/j.cub.2013.06.061 ), Chlamydomonas, and Paramecium ( DOI: 10.1126/sciadv.aaz4137 ). Citing these studies will reinforce the evolutionary conservation of this MIP and therefore its potential crucial role in the A microtubule.

      We thank the reviewer for pointing out these very important papers, we added them to the manuscript (page 7, lines 175-176).

      • In Line 178, the authors stated: "Protofilaments A9 and A10 are proposed to be the location of the seam (Ichikawa et al 2017)". High-resolution cryoEM maps confirmed it: https://doi.org/10.1016/j.cell.2019.09.030 . This publication should be cited. Moreover, the authors should also refer to this paper when discussing MIPs in the microtubule doublet.

      Done (page 7, lines 178-179 and page 13, line 329).

      We also now cite Ma et al (along with Ichikawa et al 2019 doi: 10.1073/pnas.1911119116 and Khalifa et al 2020 doi: 10.7554/eLife.52760) in the Discussion when alluding to high-resolution structures as a possible means of identifying MIPs (page 19, line 479).

      • In Line 187-189 the authors stated, "We resolved density of the A-C linker (gold) which is associated with protofilaments C9 and C10." The A-C linker interconnects the triplets of the proximal centriole (Guichard et. al. 2013, Li et. al. 2019, Klena et. al. 2020) with distinct regions binding the C-tubule, as shown by the authors in gold, as well as an A-link, making contact with the A-tubule through various protofilaments in a species-specific manner, but always on protofilament A9. The authors may have identified the A-link, labeled in green, on the outside of protofilament A8/A9 in Figure 1d.

      We thank the reviewer for pointing this out. The position of the olive green density associated with A8/A9 is indeed consistent with the A-link, and this is also now illustrated more clearly in the new version of Fig. 1e (now Fig. 1h, see below). We accordingly edited page 8, lines 187-188.

      • In figure 1e, the authors provide a 9-fold representation of the centriole based on their map. How relevant is this model? The distance between triplets is inconsistent here, which has not been observed before. Do they use true 3D coordinates to generate this model? The A-C linker, which is only partially reconstructed, does not contact the A microtubule. Is it really the case? Did the authors see that the A-link density of the A-C linker has disappeared? If these points are not clearly specified, this representation might be misleading.

      In order to avoid misleading readers, we replaced this panel with a model generated directly by plotting back the averages into their original positions and orientations in the tomogram (new Fig. 1h). This model now shows that the olive green density on A8/A9 is in the right position to form part of the A-C linker (as Reviewer 1 correctly pointed out in their previous point). We have amended the figure legend accordingly. We also described how the plotback was generated in the Materials and Methods section (page 26, line 648).

      As the reviewer points out, the distance between triplets does indeed seem inconsistent in the plotback. This is an interesting observation, but we feel it is a bit too preliminary to discuss in detail here. This can be explored in a follow-up study more focused on sperm centriole geometry.

      • The nomenclature regarding MIPs is sometimes confusing in this manuscript. For example, in lines 228-229 "We then determined the structure of DC doublets, revealing the presence of MIPs distinct from those in the PC." Does this include the gold and turquoise labeled structures in Figure 2j? These densities appear to correspond to the inner scaffold stem in the gold density presented in Figure 2j, and armA, presented in the turquoise density (Li et. al. 2011, Le Guennec et. al. 2020). The presence of this Stem here is important as it correlates with the presence of the molecular player making the inner scaffold (POC5, POC1B, CENTRIN): https://doi.org/10.1038/s41467-018-04678-8

      While we were initially very conservative with interpreting the DC doublet average (as stated above it comes from very few particles), we agree with the reviewer’s assessment that the gold and turquoise densities in Fig. 2j are consistent with the Stem and armA respectively of the inner scaffold. Because the inner scaffold contributes to centriole rigidity, it will be interesting to determine if and how it changes during remodelling of the atypical DC in mammalian sperm. Intriguingly, at least some inner scaffold components (including POC5, POC1B) reorganise into two rods in the mammalian sperm DC (Fishman et al 2018 doi: 10.1038/s41467-018-04678-8). We expanded the section on the DC average (page 9, lines 218-220):

      • The segmentation showing that the connecting piece is composed of column vaults emanating from the striated columns is compelling and beautiful. However, it is important to note how many pig sperm proximal centrioles had immediate-short triplet side contact with the Y-shaped segmented column 9, as well as how many mouse centrioles have the two electron-dense structures flanking the striated columns.

      Done. Material and Methods Page 25, lines 615-619.

      The resolution of the mammalian central pair is an important development brought by this work. The structural similarity between the central pair of pig and horse is convincing. However, with only 281 subtomograms being averaged for the murine central pair, corresponding to an estimated resolution of 49Å, the absence of the helical MIP of C1 with 8 nm periodicity suggests that there is simply not enough signal to capture it in the average. The same could be said for the smaller MIP displayed in Figure 4 c, panel ii. This point should be clearly stated.

      We agree with the reviewer that the quality of the mouse CPA structure is not on par with the pig and horse CPA structures. We now explicitly state this caveat in the text (page 11, lines 276-277):

      Another piece of compelling data presented in this study is the attachment of the outer dense fibers to the axoneme of the midpiece and proximal and distal principal pieces. From the classification data presented along the flagellar length, it is clear that the only ODF contact made with the axoneme is at the proximal principal piece. However, this is far from obvious in the native top view images presented. Is it possible to include a zoomed inset of the connection between the A-tubule and the ODF?

      We are very happy that the reviewer finds this data exciting. As Fig. 6 is quite cluttered as is, we instead tried to better annotate the cross-section views of the axoneme by tracing one doublet-ODF pair in each image (or only a doublet in the case of the distal principal piece). This shows that there is a gap between the doublet and the ODF in the midpiece, and that there is no such gap in the principal piece. We also hope that annotating one doublet-ODF pair helps the reader see that the same pattern holds true for the other doublets/ODFs. The legend to Fig. 6 was changed accordingly.

      Reviewer #1 (Significance (Required)):

      This work is of good quality and provides crucial information on the structure of centriole and axoneme in 3 different species. This work complements well the previous works.

      The audience for this type of study is large as it is of interest to researchers working on centrioles, cilium, and sperm cell architecture.

      We are pleased that the reviewer appreciates the quality of our work and sees its interest for a broad audience.

      My expertise is cryo-tomography and centriole biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Leung et al. used state-of-the-art EM imaging techniques, including FIB cryo-milling, Volta Phase plate, cryo-electron tomography and subtomogram averaging, to study the structure of sperm flagella from three mammalian species, pig, horse and mouse. First, they described two unique centrioles in the sperm, the PC and the DC. They found the PCs are composed of a mixture of triplet and doublet MTs. In contrast, the DCs are composed mainly of doublet and singlet MTs. By using subtomogram averaging, they identified a number of accessory proteins, including many MIPs bound to the MT wall. Many are unique to the mammalian sperm. They further described the connecting piece region of the sperm enclosing the centrioles and found an asymmetric arrangement. Furthermore, the authors presented the structure of sperm axonemes from all three species. These include the DMT and the CPA. Finally, they described the tail region of the sperm and described how the DMTs transitioned to the singlet MTs.

      This is a beautiful piece of work! It is by far the most comprehensive structural study of mammalian sperm cells. These findings will serve as a valuable resource for structure and function analysis of the mammalian flagella in the future. Now the stage is set for identifying the molecular nature of the structures and densities described in this study.

      We thank the reviewer for their positive evaluation! We are very happy that they share our excitement for the work, and that they also see it as “setting the stage” for future studies at the molecular level.

      The manuscript is clearly written. The data analysis is thorough. The conclusions are solid and not overstated. I don't have any major issues for its publication. A number of minor suggestions are listed below. Most are related to the figures and figure legends.

      Figure 1d, the figure legend should mention this is the subtomogram average of PC triplet MTs from pig sperm, though this is mentioned in the text. Also, for convenience, the color codes for the MIPs should be mentioned in the figure legend.

      Done.

      Figure 2J, similarly, the figure legend should mention this is the subtomogram average of DC doublets. It also needs a description of the color codes of the identified MIPs. For the DMT, please indicate the A- and B-tubule, which are colored in light or dark blue.

      Done, except that we would prefer not to enumerate the MIPs: we did not name them or discuss them extensively in the main text, and we do not want to over-interpret them at this point because the average is from a relatively small number of particles. However, we did specify that the gold and turquoise densities on the luminal surface are consistent with the inner scaffold. The figure legend was edited accordingly.

      Line 228, "We then determined the structure of DC doublet by subtomogram averaging"

      Done.

      For both Fig 2 and Fig 3. the DC doublets are colored in dark and light blue, please specify which is the A- or B-tubule in the figure legends.

      Done.

      Line 273, need space between "goldenrod"

      We would prefer to keep “goldenrod” spelled as is since this is how the color is referred to in Chimera and ChimeraX.

      Figure 4. Need to expand the figure legend. Panels i, ii, iii, iv are cut-through views of the lumen of CPA microtubules C1 and C2.

      Done.

      Line 338, Interestingly, the RS1 barrel is radially distributed asymmetrically around the axoneme

      Done.

      Figure 5, need color codes for the arrowheads (light pink, pink, magenta) in panels i-n.

      Done.

      Figure 7, (a-c) please use arrowheads to indicate the location of caps in the singlet MT.

      Done.

      Reviewer #2 (Significance (Required)):

      This is a beautiful and significant work - by far the most comprehensive analysis of mammalian sperm structure

      We are thrilled that the reviewer appreciates the novelty of our work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is a very interesting study that explores the structural diversity of mammalian sperm flagella, in pig, mouse and horse, at high resolution using cryo-FIB milling and cryo-tomography. The study provides the first in situ cryo-EM structure of a mammalian centriole and describes a number of microtubule-associated structures, such as MIPs and plugs at the plus-end of microtubules, that have not been reported so far. Additionally, the authors identify several asymmetries in the overall structure of the flagellum in the three species, which have implications for the understanding of the flagellar beat and waveform geometry in sperm, which are discussed by the authors. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

      With the exception of a couple of "eclectic word choices" in the Introduction (see detailed feedback in Minor Comments), the manuscript is also well written. Image acquisition and analysis are sound.

      We thank the reviewer for positively evaluating our work. We are glad that they feel our study will “serve as a reference” to inform future studies.

      However, I have some suggestions that should help the authors to strengthen their claims and present their results. The study is in principle suitable to be published, once the following points are addressed:

      **Major comments:**

      • A major concern is that it is not clear how many animals, sperm cells and lamellae the authors used to acquire the data presented in the manuscript. This information needs to be provided, because it is not uncommon to encounter aberrant flagella, even in a wildtype animal. The authors should state how many animals, and how many flagella per animal, were analyzed, in order to allow the reader to form an opinion on the reliability of their observations.

      • The figures are esthetically pleasing; however, the figure legends should be carefully revised to include necessary information about color codes and image annotations.

      We thank the reviewer for raising these points. We completely agree that the numbers of animals and cells are important pieces of information. As such, we now explicitly state the number of animals/cells/tomograms used for each average in Table S1. For more qualitative observations (such as the relationship between the asymmetry of the pig sperm PC and the Y-shaped segmented columns), we now state the number of cells and animals in which we see each feature (see detailed response to Reviewer 1).

      **Minor comments:**

      • Line 26. I do not think that the word "menagerie" is properly used in this context.

      • Line 29. The same is true for the word "Bewildering" in this sentence.

      We apologise for our somewhat eclectic word choice. We see the reviewer’s point that unconventional word choice may distract readers, so we replaced these two words with ‘diverse’ and ‘an extensive’, respectively.

      • Line 286 "Our structures of the CPA are the first from any mammalian system, and our structures of the doublets are the first from any mammalian sperm, thus filling crucial gaps in the gallery of axoneme structures." Sentences like this one would fit much better in the Conclusions or at least in the Discussion.

      We thank the reviewer for this suggestion, but we would prefer to keep this sentence where it is, if possible. We think it is useful to tell the audience upfront why these structures are significant, especially since readers who aren’t deep in the field may be bogged down by all the details.

      • Line 377 "Large B-tubule MIPs have so far only been seen in human respiratory cilia (Fig. 5j) and in Trypanosoma (the ponticulus, Fig. 5n), but the morphometry of these MIPs differs from the helical MIPs in mammalian sperm." Please insert the citations for the studies about respiratory cilia and Trypanosoma flagella.

      Done.

      • In Figure 1. What do the stars shown in panel a and a' indicate?

      We indeed failed to specify what the asterisks/stars indicate. They are meant to emphasise that the electron-dense material in the lumen of the PC is continuous with the CP. We have now specified this in the text (page 10, line 245).

      Given the complexity of the structures that compose the flagellar system of sperm, it would be helpful to add an illustration of the sperm with careful annotation of the centriole structures and the various segments of the flagellum.

      This is an excellent suggestion. To help orient readers, we added three panels to Fig. 1 (Fig. 1a-c) showing low-magnification images of whole sperm cells. We annotated different parts of the flagellum (neck, midpiece, principal piece, endpiece) so that readers can refer back to these panels in case they want to know which part of the cell the averages are from.

      • Figure 2. Explanation of the used color codes is missing. Additionally, the authors should include an explanation for the black and white arrows and for the 2 insets in i.

      Done. For the color code, please see response to Reviewer 2. For the black and white arrows, we edited the figure legend.

      • In "(j) In situ structure of the pig sperm DC with the tubulin backbone in grey and microtubule inner protein densities colored individually" ...it should be written "...sperm DC microtubule doublet..."

      Done.

      • In this figure, but also in every other figure that shows centriole, axoneme, or even microtubule averages it is important to indicate the microtubule polarity. Please add the symbol + and - to indicate microtubule polarity in the figures.

      Done. In order to avoid overcrowding, we only labelled the pig structures as the horse and the mouse structures are always shown in the same orientations as the pig.

      • Figure 3. In addition to the images in a, b, and c, the original tomographic slices (without segmentation) should be shown here, to allow the reader to visualize the structure.

      We now include three additional supplementary movies slicing through the respective tomograms.

      • Figure 7. Scale bars are missing in d-f.

      Done.

      • Scale bars are missing in most Supplementary figures.

      Done.

      • Table S1. The information about horse and mouse centriole data is missing.

      The reviewer is correct, but this information is missing because we did not average from the horse and the mouse. For the mouse, the triplets were in various stages of degeneration, resulting in heterogeneity that precluded us from averaging. For the horse, we simply did not catch enough centrioles to generate a meaningful structure.

      Reviewer #3 (Significance (Required)):

      This study provides several novel structural insights into the sperm flagellum structure that have implications for the understanding of the flagellar beat and waveform geometry in sperm. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

      Great to see the reviewer appreciate the novelty of our work.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      This is a very interesting study that explores the structural diversity of mammalian sperm flagella, in pig, mouse and horse, at high resolution using cryo-FIB milling and cryo-tomography. The study provides the first in situ cryo-EM structure of a mammalian centriole and describes a number of microtubule-associated structures, such as MIPs and plugs at the plus-end of microtubules, that have not been reported so far. Additionally, the authors identify several asymmetries in the overall structure of the flagellum in the three species, which have implications for the understanding of the flagellar beat and waveform geometry in sperm, which are discussed by the authors. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat. With the exception of a couple of "eclectic word choices" in the Introduction (see detailed feedback in Minor Comments), the manuscript is also well written. Image acquisition and analysis are sound.

      However, I have some suggestions that should help the authors to strengthen their claims and present their results. The study is in principle suitable to be published, once the following points are addressed:

      Major comments:

      • A major concern is that it is not clear how many animals, sperm cells and lamellae the authors used to acquire the data presented in the manuscript. This information needs to be provided, because it is not uncommon to encounter aberrant flagella, even in a wildtype animal. The authors should state how many animals, and how many flagella per animal, were analyzed, in order to allow the reader to form an opinion on the reliability of their observations.
      • The figures are esthetically pleasing; however, the figure legends should be carefully revised to include necessary information about color codes and image annotations.

      Minor comments:

      • Line 26. I do not think that the word "menagerie" is properly used in this context.
      • Line 29. The same is true for the word "Bewildering" in this sentence.
      • Line 286 "Our structures of the CPA are the first from any mammalian system, and our structures of the doublets are the first from any mammalian sperm, thus filling crucial gaps in the gallery of axoneme structures." Sentences like this one would fit much better in the Conclusions or at least in the Discussion.
      • Line 377 "Large B-tubule MIPs have so far only been seen in human respiratory cilia (Fig. 5j) and in Trypanosoma (the ponticulus, Fig. 5n), but the morphometry of these MIPs differs from the helical MIPs in mammalian sperm." Please insert the citations for the studies about respiratory cilia and Trypanosoma flagella.
      • In Figure 1, what do the stars shown in panels a and a' indicate? Given the complexity of the structures that compose the flagellar system of sperm, it would be helpful to add an illustration of the sperm with careful annotation of the centriole structures and the various segments of the flagellum.
      • Figure 2. Explanation of the used color codes is missing. Additionally, the authors should include an explanation for the black and white arrows and for the 2 insets in i.
      • In "(j) In situ structure of the pig sperm DC with the tubulin backbone in grey and microtubule inner protein densities colored individually" ...it should be written "...sperm DC microtubule doublet..."
      • In this figure, but also in every other figure that shows centriole, axoneme, or even microtubule averages it is important to indicate the microtubule polarity. Please add the symbol + and - to indicate microtubule polarity in the figures.
      • Figure 3. In addition to the images in a, b, and c, the original tomographic slices (without segmentation) should be shown here, to allow the reader to visualize the structure.
      • Figure 7. Scale bars are missing in d-f.
      • Scale bars are missing in most Supplementary figures.
      • Table S1. The information about horse and mouse centriole data is missing.

      Significance

      This study provides several novel structural insights into the sperm flagellum structure that have implications for the understanding of the flagellar beat and waveform geometry in sperm. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this study, Leung et al. used state-of-the-art EM imaging techniques, including FIB cryo-milling, Volta Phase plate, cryo-electron tomography and subtomogram averaging, to study the structure of sperm flagella from three mammalian species, pig, horse and mouse. First, they described two unique centrioles in the sperm, the PC and the DC. They found the PCs are composed of a mixture of triplet and doublet MTs. In contrast, the DCs are composed mainly of doublet and singlet MTs. By using subtomogram averaging, they identified a number of accessory proteins, including many MIPs bound to the MT wall. Many are unique to the mammalian sperm. They further described the connecting piece region of the sperm enclosing the centrioles and found an asymmetric arrangement. Furthermore, the authors presented the structure of sperm axonemes from all three species. These include the DMT and the CPA. Finally, they described the tail region of the sperm and described how the DMTs transitioned to the singlet MTs.

      This is a beautiful piece of work! It is by far the most comprehensive structural study of mammalian sperm cells. These findings will serve as a valuable resource for structure and function analysis of the mammalian flagella in the future. Now the stage is set for identifying the molecular nature of the structures and densities described in this study.

      The manuscript is clearly written. The data analysis is thorough. The conclusions are solid and not overstated. I don't have any major issues for its publication. A number of minor suggestions are listed below. Most are related to the figures and figure legends.

      Figure 1d, the figure legend should mention this is the subtomogram average of PC triplet MTs from pig sperm, though this is mentioned in the text. Also, for convenience, the color codes for the MIPs should be mentioned in the figure legend.

      Figure 2J, similarly, the figure legend should mention this is the subtomogram average of DC doublets. It also needs a description of the color codes of the identified MIPs. For the DMT, please indicate the A- and B-tubule, which are colored in light or dark blue.

      Line 228, "We then determined the structure of DC doublet by subtomogram averaging"

      For both Fig 2 and Fig 3. the DC doublets are colored in dark and light blue, please specify which is the A- or B-tubule in the figure legends.

      Line 273, need space between "goldenrod"

      Figure 4: the figure legend needs to be expanded. Panels i, ii, iii, iv are cut-through views of the lumen of CPA microtubules C1 and C2.

      Line 338, Interestingly, the RS1 barrel is radially distributed asymmetrically around the axoneme

      Figure 5: color codes are needed for the arrowheads (light pink, pink, magenta) in panels i-n.

      Figure 7, (a-c) please use arrowheads to indicate the location of caps in the singlet MT.

      Significance

      This is a beautiful and significant work - by far the most comprehensive analysis of mammalian sperm structure.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors use focused-ion beam (FIB) milling coupled with cryo-electron tomography and subtomogram averaging to uncover the structure of the elusive proximal and distal centrioles, as well as different regions of the axoneme in the sperm of 3 mammalian species: pig, horse, and mouse. The in-situ tomograms of the sperm neck region beautifully illustrate the morphology of the proximal centriole, confirming its partial degeneration in mouse sperm and, intriguingly, asymmetry in the microtubule wall of pig sperm. In distal centrioles, the authors show that in all mammalian species, microtubule doublets of the centriole wall are organized around a pair of singlet microtubules. The presented segmentation of the connecting piece is beautiful and nicely shows the connecting piece forming a nine-fold, asymmetric chamber around the centrioles. The authors further use subtomogram averaging to provide the first maps of the mammalian central pair and identify sperm-specific radial spoke-bridging barrel structures. Lastly, the authors perform further subtomogram averaging to show the connecting site of the outer dense fibers to the microtubule doublet of the proximal principal piece and confirm the presence of the TAILS microtubule inner protein complex (Zabeo et al, 2018) in the singlet microtubules occupying the tip of sperm tails. The manuscript provides the clearest insight into flagellar base morphology to date, giving insight into the morphological difference between different mammalian cilia and centriole types. The manuscript is suitable for publication, once the following questions are addressed.

      Major Points: How many centrioles and axonemes were used in generating the averages presented in the paper? If too few samples were used, especially in centrioles undergoing dramatic remodeling or degeneration, the reality of MIPs and MAPs being present might be completely affected. For instance, in Figure 1d, the authors present a cryoET map of the centriole microtubule triplet. However, centrioles are divided into several regions with different accessory elements. Here, the authors could show the presence of only part of the A-C linker. The A-C linker covers only 40% of the centriole, so does it mean that this centriole is made only of the accessories that characterize the proximal side of the centriole? In the same line, what were the boundaries governing subtomogram extraction? For example, in the distal centriole, were microtubules extracted from just before the start of the transition zone, to the end of the microtubule vaulting, more pronounced at the end of the proximal region? There are known heterogeneities in centriole, as well as flagella, ultrastructure along the proximal-distal axis. If no pre-classification was performed for subtomogram longitudinal position along the centriole and axoneme, structural features may be averaged out, and/or be present without reflecting their real longitudinal localization. Classification should be applied here if this was not done.

      Minor Points:

      • In line 3, motile cilia are not only used to swim, they can move liquid or mucus for instance.
      • In line 175, the authors stated "a prominent MIP associated with protofilament A9, was also reported in centrioles isolated from CHO cells (Greenan et al. 2018) and in basal bodies from bovine respiratory epithelia (Greenan et al 2020)." Actually, this MIP has been seen in many other centrioles from other species, such as Trichonympha (https://doi.org/10.1016/j.cub.2013.06.061 ), Chlamydomonas, and Paramecium ( DOI: 10.1126/sciadv.aaz4137 ). Citing these studies will reinforce the evolutionary conservation of this MIP and therefore its potential crucial role in the A microtubule.
      • In line 178, the authors stated: "Protofilaments A9 and A10 are proposed to be the location of the seam (Ichikawa et al. 2017)". High-resolution cryoEM maps confirmed it: https://doi.org/10.1016/j.cell.2019.09.030 . This publication should be cited. Moreover, the authors should also refer to this paper when discussing MIPs in the microtubule doublet.
      • In Line 187-189 the authors stated, "We resolved density of the A-C linker (gold) which is associated with protofilaments C9 and C10." The A-C linker interconnects the triplets of the proximal centriole (Guichard et. al. 2013, Li et. al. 2019, Klena et. al. 2020) with distinct regions binding the C-tubule, as shown by the authors in gold, as well as an A-link, making contact with the A-tubule through various protofilaments in a species-specific manner, but always on protofilament A9. The authors may have identified the A-link, labeled in green, on the outside of protofilament A8/A9 in Figure 1d.
      • In Figure 1e, the authors provide a 9-fold representation of the centriole based on their map. How relevant is this model? The distance between triplets is inconsistent here, which has not been observed before. Did they use true 3D coordinates to generate this model? The A-C linker, which is only partially reconstructed, does not contact the A microtubule. Is this really the case? Did the authors see that the A-link density of the A-C linker has disappeared? If these points are not clearly specified, this representation might be misleading.
      • The nomenclature regarding MIPs is sometimes confusing in this manuscript. For example, in lines 228-229 "We then determined the structure of DC doublets, revealing the presence of MIPs distinct from those in the PC." Does this include the gold and turquoise labeled structures in Figure 2j? These densities appear to correspond to the inner scaffold stem in the gold density presented in Figure 2j, and armA, presented in the turquoise density (Li et. al. 2011, Le Guennec et. al. 2020). The presence of this Stem here is important as it correlates with the presence of the molecular player making the inner scaffold (POC5, POC1B, CENTRIN): https://doi.org/10.1038/s41467-018-04678-8
      • The segmentation data showing that the connecting piece is composed of column vaults emanating from the striated columns are compelling and beautiful. However, it is important to note how many pig sperm proximal centrioles had immediate-short triplet side contact with the Y-shaped segmented column 9, as well as how many mouse centrioles have the two electron-dense structures flanking the striated columns.

      The resolution of the mammalian central pair is an important development brought by this work. The structural similarity between the central pair of pig and horse is convincing. However, with only 281 subtomograms being averaged for the murine central pair, corresponding to an estimated resolution of 49Å, the absence of the helical MIP of C1 with 8 nm periodicity suggests that there is simply not enough signal to capture it in the average. The same could be said for the smaller MIP displayed in Figure 4 c, panel ii. This point should be clearly stated.

      Another piece of compelling data presented in this study is the attachment of the outer dense fibers to the axoneme of the midpiece and proximal and distal principal pieces. From the classification data presented along the flagellar length, it is clear that the only ODF contact made with the axoneme is at the proximal principal piece. However, this is far from obvious in the native top view images presented. Is it possible to include a zoomed inset of the connection between the A-tubule and the ODF?

      Significance

      This work is of good quality and provides crucial information on the structure of centriole and axoneme in 3 different species. This work complements well the previous works. The audience for this type of study is large as it is of interest to researchers working on centrioles, cilium, and sperm cell architecture.

      My expertise is cryo-tomography and centriole biology.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      This manuscript follows on from previous work from the Rhind lab to investigate whether the load of MCMs at origins is a factor in when the origin activates (as a population average) during S phase. The authors use budding yeast and an auxin degron system to modulate the levels of an MCM subunit. This allows them to titrate down the concentration of the MCM hexamer and observe the effect. Crucially, they assay both the reduction in MCM load at origins and the subsequent replication dynamics in the same experiment. This is the power of their approach and allows them to rigorously test their hypothesis.

      **Major comments**

      1. I found the introductory paragraph discussing the Rhind lab hypothesis about the possibility of multiple MCM being loaded at origins somewhat misleading. The first paragraph of the discussion was much clearer. However, I feel that the introductory paragraph should deal with the difference between the two proposals: 0-1 MCM-DH per origin (de Moura et al), vs 0-50+ MCM-DH (Yang et al). It is also important to note that Foss et al find that "In budding yeast, [MCM] complexes were present in sharp peaks comprised largely of single double-hexamers" - i.e. consistent with 0-1 MCM-DH per origin.

      To improve the balance of the introduction, I think the authors should briefly introduce the concepts behind the 0-1 MCM-DH per origin; this was defined as origin competence by Stillman and clearly described by McCune et al (2008; see figure 8) prior to the work from de Moura et al.

      Furthermore, in the discussion the authors should be more even-handed. To date there is no data to conclusively rule one way or the other in distinguishing between single vs multiple MCMs. The authors cite Lynch et al and state "overexpression of origin-activating factors in S phase causes most all origins to fire early in S phase, consistent with most origins having at least one MCM loaded". However, Lynch et al report equivalent (roughly equal) origin efficiencies, but the assay doesn't distinguish between all going up to high efficiency or all going to a lower intermediary efficiency. Given that fork factors (polymerases, etc) are likely to become limiting at some point (or checkpoints could be activated due to limited dNTP supplies) it would seem plausible that uniform origin efficiency could be a consequence of less than maximal origin firing. As part of this discussion it would be useful for the authors to include what conclusions have been reached on MCM load from in vitro systems (with chromatin substrates).

      Because the main focus of the paper is not dependent on whether MCM stoichiometry varies from 0 to 1 or 0 to many, we had relegated our discussion of absolute stoichiometry to the Discussion. However, it is clear from multiple reviewers' comments that it is something very much on readers' minds. Therefore, we have now included a brief introduction to the 0-to-1 and 0-to-many scenarios in the Introduction and moved the bulk of the discussion of the data supporting the two scenarios to the Discussion.

      2. The authors are not the first to look at the consequence of reduced MCM concentrations on origin function. This was essentially the basis for the MCM screen undertaken by Bik Tye's lab that first identified the MCM genes. In addition to temperature-sensitive mutants, the Tye group also examined heterozygotes (Lei et al., 1996) to show differential effect on the ability of two origins to support plasmid replication. The authors' findings are entirely consistent with these early studies, particularly since ARS416 (formerly ARS1) was found to be highly sensitive to reduced MCM levels and ARS1021 (formerly ARS121) was found to be insensitive to MCM levels. The authors find a significant reduction in MCM load at ARS416, but the MCM load at ARS1021 is unaltered by reduced MCM concentration. It would be worth the authors noting this consistency. The authors do cite the Lei study, but not in this context. The original MCM screen was published here:

      Maine, G., Sinha, P., Tye, B. (1984). Mutants of S. cerevisiae defective in the maintenance of minichromosomes Genetics 106(3), 365 - 385.

      Furthermore, at the end of the discussion the authors state that "it will be interesting to dissect the specific cis- and trans-acting factors that make origins sensitive or resistant to changes in MCM levels". The equivalent effect reported by the Tye lab has already been dissected by the Donaldson lab (Nieduszynski et al., 2006) and perhaps it would be worth briefly mentioning their findings.

      We have included both of these literature precedents in the Discussion.

      3. The authors should show the flow cytometry data for each of their cell cycle experiments, if only in supplementary figures. This is important to allow a reader (and reviewer) to judge the level of synchrony achieved when interpreting the results.

      This data is now included as Figure S1.

      4. I think the authors should show the ChIP signal at some example origins, including ones sensitive and insensitive to the reduction in MCM concentration. Currently all the high resolution ChIP data (i.e. over 1400 bp, e.g. Fig 3a) is presented as meta-analyses of many origins.

      We will include this analysis in a subsequent revision.

      5. When describing the results in Fig 4a the authors focus on changes (highlighted in black boxes) that fit their expectation. However, there are other sites that should at least be mentioned that don't seem to fit the authors' model, e.g. ARS517, ARS518. It would be worth discussing what fraction of the timing data can be explained by the reduced MCM load.

      We now explicitly point out that Figures 4c and 4d address this issue of the robustness of the correlation. Although there is significant variation, as the reviewer points out, the trend is seen genome wide. As it happens, both ARS517 and ARS518 do fit the model reasonably well. They have intermediate loss of MCM signal and intermediate delay in timing.

      **Minor comments**

      -These data, rather than this data (throughout).

      I suspect that the journal style and/or copy editors will make the final call. However, I will point out that although 'data' is most certainly plural in Latin, its predominant modern English usage is as a mass noun, such as water or sand or information. In general, users do not think of, or use, 'data' as a collection of discrete elements, each one being a 'datum', a contention supported by the very infrequent use of the word datum. For instance, in a ChIP-seq experiment, what is a datum? Each individual read? Each individual nucleotide in each read? The quality score for each individual nucleotide in each read? Each pixel in each image from the sequencer? When one wants to refer to an individual piece of data, common usage is to refer to a data point, just as one would refer to a grain of sand. Moreover, if 'data' were plural, it would be incorrect to use it in phrases such as "there is very little data available". Would the reviewer really suggest using "there are very few data available"?

      -the authors should clearly state in figure legends what window size has been used in analysing genomic data.

      All analyses were done using 1kb windows, as now stated in the figure legends.

      -in figure 2a the authors show pairwise comparisons between conditions, it would be nice to see the 3rd pairwise comparisons perhaps as a supplementary figure

      We have included the third comparison in Figure 2a.

      -in figure 2c it would be clearer to use the same colour for the lines and the points

      The regression lines are in the same colors as the data points they fit. x=y is shown in blue for comparison, as now noted in the figure legend.

      -the authors should avoid the use of red/green colour combinations in their figures (see: https://thenode.biologists.com/data-visualization-with-flying-colors/research/)

      All figures will be redrawn in colorblind-accessible colors in a subsequent revision.

      -in the text the authors state "ORC binding to the ACS and subsequent MCM loading is a directional process dependent on a ACS- site and a similar but inverted nearby sequence (Xu et al., 2006)". I think it would be more appropriate to cite the following study here:

      Coster, G., Diffley, J. (2017). Bidirectional eukaryotic DNA replication is established by quasi-symmetrical helicase loading Science (New York, NY) 357(6348), 314 - 318. https://dx.doi.org/10.1126/science.aan0063

      The Coster reference has been included.

      -the list of factors that influence replication timing should include Rif1, whereas it is less clear that Rpd3 acts within the unique genome (as opposed to indirectly via repetitive DNA, e.g. rDNA)

      Rif1 has been added to the list.

      -figure 4 - it might help to mark the centromere on panel a. Also, why do the ChIP peaks and annotated origins appear to line up so poorly?

      The shift between the peaks and the ACS positions was introduced during the construction of the figure. Thanks for catching it. The alignment has been corrected and the centromere annotation has been added.

      -figure 4d - would it not be better to use fraction of lost MCM signal on the x-axis as in previous figures?

      If T_rep was a linear function of MCM stoichiometry, fraction lost would work as well as amount lost. However, we find that there is a lower correlation between fraction of MCM signal lost and T_rep delay than between absolute MCM signal lost and T_rep delay, suggesting a more complicated relationship.
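      The comparison described in this response can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function names (`pearson`, `absolute_loss`, `fraction_lost`) and the numbers in the demo are hypothetical placeholders, not values from the study. It computes the Pearson correlation between T_rep delay and either the absolute or the fractional MCM signal lost per origin, so the two can be compared side by side.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def absolute_loss(before, after):
    """Per-origin MCM signal lost, in the same units as the input signal."""
    return [b - a for b, a in zip(before, after)]


def fraction_lost(before, after):
    """Per-origin MCM signal lost, as a fraction of the starting signal."""
    return [(b - a) / b for b, a in zip(before, after)]


if __name__ == "__main__":
    # Hypothetical placeholder values for four origins (NOT data from the study).
    mcm_before = [10.0, 20.0, 40.0, 80.0]   # MCM ChIP signal, control
    mcm_after = [8.0, 14.0, 28.0, 50.0]     # MCM ChIP signal, reduced MCM
    trep_delay = [1.0, 3.0, 6.0, 15.0]      # delay in T_rep, arbitrary units

    r_abs = pearson(absolute_loss(mcm_before, mcm_after), trep_delay)
    r_frac = pearson(fraction_lost(mcm_before, mcm_after), trep_delay)
    print(f"r (absolute loss vs delay):   {r_abs:.3f}")
    print(f"r (fractional loss vs delay): {r_frac:.3f}")
```

      With real per-origin values in 1kb windows substituted for the placeholders, a higher coefficient for the absolute loss than for the fractional loss would correspond to the observation reported above.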

      -"with galactose or raffinose, to induce or repress Mcm2-7 overexpression, respectively." This is incorrect, raffinose does not repress this promoter (that requires glucose).

      Fixed.

      -the S. pombe spike in is a great addition to the over expression experiments. It's a shame that it wasn't included in the auxin experiments.

      Yes, we agree.

      -why does the data in fig 5d appear to be at much lower resolution than the previous ChIP data?

      The resolution was inadvertently reduced during the rendering of the figure. The resolution has been restored.

      -in the sequencing analysis pipeline for MCM ChIP the authors use a 650 bp upper size limit; why have such a large threshold compared to the size of a nucleosome? Are the analyses and findings sensitive to this size threshold?

      Although the MNase digestion was optimized to produce mostly mononucleosomal-sized digestion, some di- and very little tri- nucleosomal fragments still remain. In order to capture as many of the MCM-protected immunoprecipitated fragments as possible, the upper limit was set at 650 bp (up to 4 nucleosomes-worth of DNA). However, there is a very minimal contribution from fragments larger than mononucleosomes, qualitatively as well as quantitatively in 1kb windows around origins. Figure 3a provides a qualitative depiction of the contribution of dinucleosomes (input, ~300bp).

      -the repliscope package was published here:

      Batrakou, D., Müller, C., Wilson, R., Nieduszynski, C. (2020). DNA copy-number measurement of genome replication dynamics by high-throughput sequencing: the sort-seq, sync-seq and MFA-seq family. Nature Protocols 15(3), 1255 - 1284. https://dx.doi.org/10.1038/s41596-019-0287-7

      The reference has been corrected.

      Reviewer #1 (Significance):

      This work builds upon a body of work from the Rhind group (and others) to determine the contribution of MCM load to replication origin activation dynamics. To my mind this is the most convincing dataset and analysis to date and goes a long way to supporting the model that the efficiency of MCM loading is a major factor in determining the mean replication time of an origin. As the authors state, they are still not able to distinguish between two different models of MCM load (single vs multiple). It would be interesting for the authors to discuss how these two models could be distinguished in the future (perhaps with single cell/molecule experiments).

      This study will be of interest to those in the fields of DNA replication and genome stability.

      My field of expertise is DNA replication and replication origin function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      **Summary:**

      This is a nice study that characterizes the consequences of limiting or increasing Mcm expression on the replication program. Prior ChIP experiments in yeast have observed that not all origins exhibit the same level of Mcm enrichment and that increased mcm enrichment was correlated with origin activity. These observations led to two different models -- a) that multiple Mcm2-7 double hexamer complexes are loaded at some origins and b) a probabilistic model where the differential enrichment of Mcm2-7 reflected the fraction of cells in a population that had loaded the Mcm2-7 complex at a specific origin. While the titration experiments presented here don't provide any conclusive support for either model, they do provide some novel and relevant insights for the replication field, in part, due to the increased resolution and quantification afforded by the MNase ChIP-seq approach (and S. pombe spike in). The authors very nicely demonstrate that origins are differentially sensitive to Mcm2-7 depletion and that loss of Mcm2-7 loading results in an altered replication timing profile. The origins most impacted by loss of Mcm2-7 are 'weak' origins as described by the Fox group. Intriguingly, the authors find that the 5X overexpression of Mcm2-7 does not perturb the relative Mcm2-7 loading at individual origins, but rather instead globally represses Mcm2-7 association at all origins. They also find that overexpression of both Cdt1 and Mcm2-7 is detrimental to the cell (although no obvious replication phenotype was observed). Finally, the authors present a reasonable interpretation of their data in the context of models for replication timing which was very well articulated.

      **Major Comments:**

      From the methods it appears that different analyses were performed with different replicates?

      "Replicate #1 was used for all analyses except for V plots, for which the higher resolution Replicate #2 was used."

      Ideally all of the conclusions should be supported by all the replicates independently, or if the replicates are concordant -- they should be merged (at a similar sequencing depth) prior to doing the analyses. Even the v-plots with merged replicates will be informative due to the greater sequencing depth.

      Though we agree that greater sequencing depth would be informative for aggregation analysis, we think that one of the main strengths of our study is the analysis of MCM quantitation and replication timing in the same population of cells. Although the experiments were performed in exactly the same way, there are always slight biological or temporal differences between the replicates, due to the complicated nature of the experimental design. This variation increases the noise between the MCM ChIP and the replication timing analyses. Therefore, we analyzed the replicates separately. However, we did do all of the analyses on both replicates and got similar results. We have now explicitly stated as much.

      The authors should provide a separate analysis for the larger nucleosomal sized fragments and smaller putative MCM double hexamer fragments with regards to the Mcm loading and relationship to ACS and orientation. They may represent an interesting intermediate with mechanistic consequences for the interpretation.

      We will include the suggested analysis in a subsequent revision.

      The authors should present the v-plots and an analysis of which side the Mcm's load for the overexpression studies. I was surprised that there was no further in-depth analysis for these two extremes. Perhaps similar conclusions will be reached, but it should at least be mentioned/presented as a supplementary figure.

      We will include the suggested analysis in a subsequent revision.

      **Minor Comments:**

      This is largely semantic, but the majority of MNase ChIP-seq signal recovered is associated with the nucleosomes and not in the NDR and as the signal in the NDR is differentially sensitive to digestion, I would suggest rephrasing the following sentence:

      "In contrast to previous genome-wide reports (Belsky et al., 2015), but in agreement with recent in-vitro cryo-EM structures (Miller et al., 2019), we also observe MCM signal in the nucleosome-depleted region (NDR) of origins. "

      to :

      "In agreement with a previous genome-wide report (belsky 2015), we found that the bulk of the MCM signal was associated with nucleosomal sized fragments; however the increased resolution afforded by our approach allowed us to also detect protected fragments in the NDR as predicted by recent in vitro cryo em structures..."

      We have modified the sentence as suggested.

      As a sanity check, please double check V-plots and presence of small fragments with the digestion conditions. In the Henikoff manuscript the bulk of sub-nucleosomal fragments were lost with the longer digestion time. Specifically, the TF footprints were more pronounced with minimal digestion. While it might be argued that the longer digestion more tightly resolved the binding site, in many cases they were completely lost with the 20 minute digestion. This is just a simple check -- I don't doubt the results as reported given the experimental conditions are very different. For example, the Henikoff manuscript did not use cross linking or an antibody enrichment step.

      We double checked and confirmed that more small fragments are found in the more digested library. The reason that we see more small fragments when we digest more, in contrast to the observation in the Henikoff paper, is presumably that MCM has a larger footprint than a transcription factor and protects that footprint more effectively.

      Last paragraph of the "MCM associates with nucleosomes section" which reports that the Mcm2-7 complex is loaded up or downstream from the ACS independent of orientation should cite Belsky 2015 (Figure 5 and discussion) for the initial observation.

      Done.

      The authors argue that the global reduction in MCM loading associated with overexpression may be a technical artifact given that all origins exhibit a proportional reduction in mcm2-7 loading. However, this is exactly what the S. pombe spike in control is intended for. The relative difference between individual origins resulting from Mcm2-7 depletion would still be evident without the spike in. The authors do discuss different possibilities, but I would not be so keen to discard this as technical artifact.

      We, too, are reluctant to dismiss this result as a technical artifact. However, we are at a loss to offer any other explanation. We raise a handful of biological possibilities in the Discussion, but dismiss each one as failing to account for our results. We would be happy to entertain other suggestions.

      Reviewer #2 (Significance):

      This work has several advances that will be appreciated by the replication field -- including a high resolution view of Mcm2-7 loading in the context of chromatin; the impact of titrating (low and high) MCM expression on MCM loading and replication timing program; and a well reasoned discussion of how different models of MCM loading would impact origin activation and replication timing program. The work builds on prior studies in the field (eg. Belsky 2015), while some of the conclusions regarding the localization of the Mcm2-7 complex relative to the ACS and surrounding nucleosomes are confirmatory, the increased resolution provides new insight (like the enrichment of small fragments in the NDR) that could be further strengthened by additional analysis (see above).

      My expertise is DNA replication and chromatin.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors use Auxin-mediated degradation of Mcm4 to reduce the concentration of the MCM helicase complex in yeast, and determine the effects of this reduction on both MCM-origin association (interpreted as MCM loading) by MNase-MCM-ChIPSeq and on replication origin function by Sync-Seq replication timing experiments (deep sequencing of a yeast population as it progresses through a synchronized S-phase). Complementary experiments testing the effect of induced MCM complex over-expression on MCM-origin association are also performed.

      The authors find that reducing Mcm4 levels (and thus loading-competent MCM complexes) causes yeast cells to be more sensitive to DNA replication stress. In addition, not all origins are equally susceptible to reductions in MCM levels; the origins that do lose MCM binding at reduced MCM levels show a reduction in activity and an associated delay in their replication time under those conditions. Finally, over-expression of the MCM complex has no effect on MCM-origin association or origin function, suggesting that MCM levels are not limiting for origin licensing in yeast under normal lab conditions. The strengths of the study are the well-executed experiments and very nice data that are presented. However, there are several weaknesses. The authors make conclusions that are not supported by their data; and several of the outcomes are not at all unexpected based on extensive published studies in yeast and mammalian cells, raising issues about whether this study advances and/or clarifies the current gaps in the field. While some of the relevant past studies were referenced, the authors did not place their own study in the context of published work and current models in the field, which reduced the scholarly value of their study. Because the work was not placed in context of the field, some of the rationale and conclusions were misleading.

      **Some specific major comments:**

      1,The title is misleading. The authors have clearly shown that when MCM levels are made limiting in an engineered system, some origins are substantially less active, which means that these origin loci are replicated "passively" (i.e. by a Replication Fork (RF) emanating from a distal origin) rather than actively (i.e. by "firing" and initiating replication). Their own replication data show that. But this competition is only revealed when MCM levels are artificially/experimentally lowered. What is the evidence that competition for MCM complexes among individual origins establishes replication timing patterns in yeast? If anything, the over-expression experiment suggests the opposite--that MCM levels are not limiting and therefore do not play a substantial role in establishing the replication timing patterns that are observed in yeast. Instead those patterns appear to result primarily from the fact that MCM complex activation factors are present in limiting concentrations relative to origins.

      We agree with the reviewer's analysis and have revised the title to "The Capacity of Origins to Load MCM Establishes Replication Timing Patterns".

      2,The abstract states that "the number of MCMs loaded onto origins has been proposed to be a key determinant of when those origins initiate DNA replication during S-phase". While it is true that this lab has proposed this model in budding yeast, the current study performs no experiments that directly address this model--i.e. that i. individual origins possess a different number of MCM complexes and/or ii. that these differences underlie timing differences. They acknowledge this point in their Discussion--a ChIPSeq experiment is an ensemble experiment--there is no way to know that differences in MCM signals correspond to a different number of MCM complexes per origin versus a difference in the fraction of cells that contain an MCM complex at all at a given origin. But this statement in the abstract, combined with their conclusion in the same section of the paper: "Our results support a model in which the loading activity of origins, controlled by their ability to recruit ORC and compete for MCM, determines the number of helicases loaded, which in turn affects replication timing" implies that they have tested a model that they have not tested. Given how quickly readers "skim" the literature these days, a misleading abstract can do a lot of damage to a field. The results presented in this study neither support nor refute the model for the number of helicases loaded per origin, and the fact that reducing origin licensing efficiency by making the major substrate limiting reduces the number of licensed origins in a cell population is fully expected based on the current state of the field.

      Four questions are addressed in this comment. The first is whether there is variable MCM stoichiometry at origins. The second is whether that variation ranges from 0 to 1 or 0 to many. The third is whether the variation in stoichiometry affects replication timing. The fourth is how this variation in stoichiometry comes about.

      Our work is based on the conclusion, supported by a substantial body of literature, that MCM loading stoichiometry varies among origins. Our data in this paper further supports this conclusion.

      As the reviewer notes, and as we had tried to make clear, the data in this paper does not address the range of the variation. Moreover, as we also tried to make clear, our hypotheses, results and conclusions are not affected by whether the range is 0 to 1 or 0 to many.

      This paper focuses on Questions 3 and 4. We have reworked the introduction to make these distinctions more clear.

      We have also corrected the abstract to refer to "the stoichiometry", instead of "the number", of MCMs.

      3,The rationale for the study as stated in the Introduction: "Although the molecular biochemistry of initiation at individual origins continues to be elucidated in great detail (Bleichert, 2019), the mechanism governing the time at which different regions of the genome replicate has remained largely elusive (Boos and Ferreira, 2019)." is also misleading. In fact, in budding yeast (and other organisms) there have been several advances in this area particularly with respect to DNA replication origin activation. The S-phase origin activation factors are limiting for origin function, and factors such as Ctf19 at centromeres and Fkh1/2 at non-centromeric early-acting origins help to directly recruit the limiting S-phase factor, Dbf4, to origins. It is misleading to ignore this substantial progress and not make an effort to place this current study, which is important and one of the first to look directly at MCM loading control in yeast, into a relevant context with respect to what is known. What's interesting is that this S-phase model assumes/requires that most origins are, in fact, licensed and thus that differences in licensing efficiency are not a major driver of replication timing patterns in yeast. But we do not know why there are only subtle differences in MCM loading---this study may help explain that.

      We have broadened the scope of our Introduction and Discussion to address these points. However, it is not the case that "there are only subtle differences in MCM loading". MCM ChIP-seq (, and this paper) and MCM ChEC-seq both show well over ten-fold variation in MCM stoichiometry at origins. We have now explicitly made this point in the Introduction.

      4,The authors link the differential ability of MCM loading deficiencies when MCM is made limiting to differences in ORC binding categories. The "weak" origins, that presumably bind ORC weakly, were most affected by reductions in MCM. Are these origins less efficient than the other categories, DNA and chromatin-dependent (using the origin efficiency metric data from the Whitehouse lab) where MCM binding is not reduced as much? In normal cells are these early or late origins? Is the idea that the role of excess MCM is to achieve a sufficient number or "back up" origins per cell to deal with potential stress, as proposed by the Blow and Schwob labs in tissue culture cells many years ago? It seems likely that the data reported here are in fact confirmations of those early studies in mammalian cells---which is useful to know even if not unexpected.

      We will include the suggested analyses in a subsequent revision.

      Excess MCM do, as has been long appreciated and as we discuss, contribute to replication-stress tolerance. However, that is not a major point of our paper.

      5,Aren't the results that losing MCM signal corresponds to loss of origin activity peaks entirely expected? The same result would be obtained if you made a point mutation in that origin's ACS. Of course preventing an origin from being licensed will delay that region's replication time in S-phase because it now must be replicated passively. Licensing affects replication timing patterns because the MCM complex is the substrate for limiting S-phase factors, but that is far different from concluding that the number of MCMs at an origin is what controls the time in S-phase when an origin is activated.

      Yes, "the results that losing MCM signal corresponds to loss of origin activity peaks [are] entirely expected". However, this is not the important result. The key result is that the distribution of MCM at origins is not uniformly affected, which leads to our conclusions that, in wild-type cells, origin capacity dominates MCM stoichiometry and that, when MCM become limiting, origin activity (probably determined by ORC affinity) becomes critical—neither of which were expected results. In any case, the expected correlation between MCM loading and origin activity was observed as a consequence of measuring MCM stoichiometry and replication timing and is an obvious analysis to include, so we did so.

      6,The authors stated that they measured MCM abundance for the 43% of origins that are not known to be controlled by the multiple mechanisms that have been shown to control origin replication time. Is this because they think that MCM loading contributes to the timing control of only these origins? Was MCM loading not affected at any of these other origins when MCM levels were reduced? Are those 43% of origins in the "weak" binding category in terms of ORC? The rationale for eliminating so many origins from these analyses was not clear.

      We propose that the probability of origin activation is the product of the stoichiometry of MCM at the origin and the rate of MCM activation, which may be affected by trans-acting factors. For the 43% of origins for which there is no known trans-acting regulation, the correlation with stoichiometry is stronger. However, the correlation holds when looking at all origins, too. The suggestion to look at only the 57% of origins with known trans-acting regulation is a good one. We will include this analysis and the other suggested analyses in a subsequent revision.
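
      As a rough illustration of this proposal (a minimal sketch, not the analysis used in the manuscript), the firing probability of an origin can be modeled with a hazard equal to MCM stoichiometry times a per-MCM activation rate. The exponential waiting-time form, the function name and every numerical value below are assumptions chosen only for illustration.

```python
import math


def firing_probability(mcm_stoichiometry, activation_rate, elapsed_min):
    """Probability that an origin has fired by `elapsed_min`.

    Assumption: each loaded MCM activates independently at a constant
    rate, so the origin's hazard is stoichiometry * rate.
    """
    hazard = mcm_stoichiometry * activation_rate  # per minute
    return 1.0 - math.exp(-hazard * elapsed_min)


# Hypothetical values: same activation rate, different stoichiometry.
low = firing_probability(1.0, 0.02, 30.0)
high = firing_probability(4.0, 0.02, 30.0)
# A trans-acting factor that raises the rate has the same effect as
# a proportional increase in stoichiometry.
boosted = firing_probability(1.0, 0.08, 30.0)
```

      Under these assumptions, an origin with four times the stoichiometry and an origin with four times the activation rate have the same firing probability, which is why stoichiometry alone would not be expected to give a perfect correlation.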

      7,Doesn't the data in Figure 4c at 0 mM auxin support the conclusion that differences in MCM ChIP signals have negligible effects on origin activation time, in contrast to the publication by Das, 2015 from this lab? Or is the point that these origins are sensitive to reductions in MCM levels and the more sensitive they are the more delayed their replication time (but again, doesn't that have to be true? If they are losing MCM signals they cannot function as origins, so they are replicated passively and, by definition, will show delayed replication timing. An origin is defined as such by a loaded MCM complex.)

      No. The reason the correlation in 4c is not as good as in our previous work is that in Das 2015 we compared origin-activation efficiency (calculated from our stochastic model in Yang 2010), instead of T_rep, which we used here. T_rep is a convolution of origin-activation time and passive-replication time, reducing the correlation. The important observation is that the correlation gets better as MCM levels are reduced.

      The correlation between MCM stoichiometry and activation efficiency may seem trivial, but just because a model is simple does not mean it is not correct. If stoichiometry were the only factor regulating origin activation, we would expect a stronger correlation. So, we conclude that there are other factors at play, quite possibly the trans-acting factors that the reviewer mentions in their second point. However, if stoichiometry played no role, we would expect no correlation. So, we propose that MCM stoichiometry is "an important determinant of replication timing".

      8,I do not understand the conclusions from Figure 4d. There is an extremely small positive correlation between how much of an MCM signal is lost and delay in replication time of an origin, but this correlation is not surprising as an unlicensed origin cannot be an origin and will be replicated passively. What seems most surprising about these data is that the effect is so weak, not that it exists. There is quite a lot of scatter in this plot at 500 uM auxin, with some origins losing a given amount of signal (x) and being only slightly delayed in replication time, and others losing the same amount of signal (x) and being substantially delayed. What underlies this outcome?--Are the ones that are not substantially delayed closer to origins that have not been affected at all by MCM reductions? Why is the correlation so weak? The other regulators of origin activation time have stronger and more precise effects--for example the centromere-control can be precisely eliminated so that only the replication time of the centromere-proximal origins are delayed.

      We believe that much of the noise in Figure 4d is due, as the reviewer suggests, to passive replication of origins which lose most of their MCM signal and become inactive but happen to reside next to origins which don't lose any MCM signal and fire early. An excellent example is ARS510 (see Figure 4a). ARS510 loses most of its MCM signal and clearly loses its initiation peak in the T_rep plot. However, because it is next to ARS511, which does not lose much MCM signal and which remains an efficient origin, ARS510 is still replicated early. We will include this example in a subsequent revision.
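
      The passive-replication effect described above can be sketched numerically. This is a minimal illustration, assuming a constant fork speed and treating T_rep at a locus as the earliest arrival time of any fork; the coordinates, firing times and fork speed are hypothetical and are not taken from the ARS510/ARS511 data.

```python
import math


def replication_time(position_kb, origins, fork_speed_kb_per_min):
    """Earliest time at which `position_kb` is replicated.

    `origins` is a list of (position_kb, firing_time_min) pairs; an
    inactive origin is given a firing time of math.inf.
    """
    return min(
        fire + abs(position_kb - pos) / fork_speed_kb_per_min
        for pos, fire in origins
    )


FORK_SPEED = 2.0  # kb/min, hypothetical

# Two neighbouring origins 10 kb apart (hypothetical coordinates/times).
wild_type = [(100.0, 20.0), (110.0, 22.0)]
# The first origin loses its MCM and no longer fires.
depleted = [(100.0, math.inf), (110.0, 22.0)]

t_wt = replication_time(100.0, wild_type, FORK_SPEED)
t_depleted = replication_time(100.0, depleted, FORK_SPEED)
delay = t_depleted - t_wt
```

      With these made-up numbers the inactivated origin is replicated at 27 min instead of 20 min, a delay of only 7 min, because a fork from its active neighbour reaches it quickly. An isolated origin losing the same amount of MCM signal would show a much larger delay, which adds scatter to a plot of signal loss against T_rep delay.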

      9,Multiple studies in yeast and mammalian cells indicate that MCM subunits are in excess relative to other licensing and S-phase initiation factors, so it is not unexpected that over-expressing MCM did not lead to enhanced levels of licensing. It seems much more plausible that Cdc6 or Cdt1 or both factors are present in limiting amounts for MCM loading, so I did not understand the point of over-producing MCM subunits. If the "weak" origins are the ones that are most dramatically affected by reducing MCM to "limiting" levels, isn't the question whether you can increase licensing at these origins when you over-produce a factor that is likely limiting for licensing, such as Cdt1 or Cdc6 (or both) while leaving MCM at its normal levels. The fact that MCM levels are not limiting for licensing is not surprising and, if anything, argues against these levels having a regulatory role in origin activation timing---which seems to be the opposite of what the authors want to conclude.

      Orc1-6, Cdc6 and Cdt1 are all substoichiometric to MCM. However, they all act catalytically to load MCM. So, although they may be kinetically limiting, they do not prevent most or all MCMs being loaded in wild-type cells. The fact that overexpressing MCMs (with or without Cdt1) does not allow for more MCM loading suggests that under normal conditions origins are saturated with MCMs and have little or no capacity to load more MCM, even when given plenty of time to do so. From this result, we conclude that origin capacity is a major determinant of MCM loading in wild-type cells. From our MCM-reduction experiments, we also conclude that, when MCM is limiting, origin competition affects which origins load MCMs faster. However, we agree with the reviewer's first point, that our title gave the incorrect impression that we concluded that origin competition is the primary determinant of MCM loading in wild-type cells. Thus, as suggested, we have changed the title. We have also reworked the Introduction and Discussion to more clearly explain that competition is only a determining factor when MCMs are limited.

      In summary, I think the technical aspects of the experiments were quite strong, but I do not think that the experiments answered the question that was posed by the authors.

      **Minor points:**

      Many places where "This data" should be changed to "These data". Data are plural.

      See comments on this point in the response to Reviewer #2.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study, the authors use Auxin-mediated degradation of Mcm4 to reduce the concentration of the MCM helicase complex in yeast, and determine the effects of this reduction on both MCM-origin association (interpreted as MCM loading) by MNase-MCM-ChIPSeq and on replication origin function by Sync-Seq replication timing experiments (deep sequencing of a yeast population as it progresses through a synchronized S-phase). Complementary experiments testing the effect of induced MCM complex over-expression on MCM-origin association are also performed.

      The authors find that reducing Mcm4 levels (and thus loading-competent MCM complexes) causes yeast cells to be more sensitive to DNA replication stress. In addition, not all origins are equally susceptible to reductions in MCM levels; the origins that do lose MCM binding at reduced MCM levels show a reduction in activity and an associated delay in their replication time under those conditions. Finally, over-expression of the MCM complex has no effect on MCM-origin association or origin function, suggesting that MCM levels are not limiting for origin licensing in yeast under normal lab conditions. The strengths of the study are the well-executed experiments and very nice data that are presented. However, there are several weaknesses. The authors make conclusions that are not supported by their data; and several of the outcomes are not at all unexpected based on extensive published studies in yeast and mammalian cells, raising issues about whether this study advances and/or clarifies the current gaps in the field. While some of the relevant past studies were referenced, the authors did not place their own study in the context of published work and current models in the field, which reduced the scholarly value of their study. Because the work was not placed in context of the field, some of the rationale and conclusions were misleading.

      Some specific major comments:

      1,The title is misleading. The authors have clearly shown that when MCM levels are made limiting in an engineered system, some origins are substantially less active, which means that these origin loci are replicated "passively" (i.e. by a Replication Fork (RF) emanating from a distal origin) rather than actively (i.e. by "firing" and initiating replication). Their own replication data show that. But this competition is only revealed when MCM levels are artificially/experimentally lowered. What is the evidence that competition for MCM complexes among individual origins establishes replication timing patterns in yeast? If anything, the over-expression experiment suggests the opposite--that MCM levels are not limiting and therefore do not play a substantial role in establishing the replication timing patterns that are observed in yeast. Instead those patterns appear to result primarily from the fact that MCM complex activation factors are present in limiting concentrations relative to origins.

      2,The abstract states that "the number of MCMs loaded onto origins has been proposed to be a key determinant of when those origins initiate DNA replication during S-phase". While it is true that this lab has proposed this model in budding yeast, the current study performs no experiments that directly address this model--i.e. that i. individual origins possess a different number of MCM complexes and/or ii. that these differences underlie timing differences. They acknowledge this point in their Discussion--a ChIPSeq experiment is an ensemble experiment--there is no way to know that differences in MCM signals correspond to a different number of MCM complexes per origin versus a difference in the fraction of cells that contain an MCM complex at all at a given origin. But this statement in the abstract, combined with their conclusion in the same section of the paper: "Our results support a model in which the loading activity of origins, controlled by their ability to recruit ORC and compete for MCM, determines the number of helicases loaded, which in turn affects replication timing" implies that they have tested a model that they have not tested. Given how quickly readers "skim" the literature these days, a misleading abstract can do a lot of damage to a field. The results presented in this study neither support nor refute the model for the number of helicases loaded per origin, and the fact that reducing origin licensing efficiency by making the major substrate limiting reduces the number of licensed origins in a cell population is fully expected based on the current state of the field.

      3,The rationale for the study as stated in the Introduction: "Although the molecular biochemistry of initiation at individual origins continues to be elucidated in great detail (Bleichert, 2019), the mechanism governing the time at which different regions of the genome replicate has remained largely elusive (Boos and Ferreira, 2019)." is also misleading. In fact, in budding yeast (and other organisms) there have been several advances in this area particularly with respect to DNA replication origin activation. The S-phase origin activation factors are limiting for origin function, and factors such as Ctf19 at centromeres and Fkh1/2 at non-centromeric early-acting origins help to directly recruit the limiting S-phase factor, Dbf4, to origins. It is misleading to ignore this substantial progress and not make an effort to place this current study, which is important and one of the first to look directly at MCM loading control in yeast, into a relevant context with respect to what is known. What's interesting is that this S-phase model assumes/requires that most origins are, in fact, licensed and thus that differences in licensing efficiency are not a major driver of replication timing patterns in yeast. But we do not know why there are only subtle differences in MCM loading---this study may help explain that.

      4,The authors link the differential ability of MCM loading deficiencies when MCM is made limiting to differences in ORC binding categories. The "weak" origins, that presumably bind ORC weakly, were most affected by reductions in MCM. Are these origins less efficient than the other categories, DNA and chromatin-dependent (using the origin efficiency metric data from the Whitehouse lab) where MCM binding is not reduced as much? In normal cells are these early or late origins? Is the idea that the role of excess MCM is to achieve a sufficient number or "back up" origins per cell to deal with potential stress, as proposed by the Blow and Schwob labs in tissue culture cells many years ago? It seems likely that the data reported here are in fact confirmations of those early studies in mammalian cells---which is useful to know even if not unexpected.

      5,Aren't the results that losing MCM signal corresponds to loss of origin activity peaks entirely expected? The same result would be obtained if you made a point mutation in that origin's ACS. Of course preventing an origin from being licensed will delay that region's replication time in S-phase because it now must be replicated passively. Licensing affects replication timing patterns because the MCM complex is the substrate for limiting S-phase factors, but that is far different from concluding that the number of MCMs at an origin is what controls the time in S-phase when an origin is activated.

      6,The authors stated that they measured MCM abundance for the 43% of origins that are not known to be controlled by the multiple mechanisms that have been shown to control origin replication time. Is this because they think that MCM loading contributes to the timing control of only these origins? Was MCM loading not affected at any of these other origins when MCM levels were reduced? Are those 43% of origins in the "weak" binding category in terms of ORC? The rationale for eliminating so many origins from these analyses was not clear.

      7,Doesn't the data in Figure 4c at 0 mM auxin support the conclusion that differences in MCM ChIP signals have negligible effects on origin activation time, in contrast to the publication by Das, 2015 from this lab? Or is the point that these origins are sensitive to reductions in MCM levels and the more sensitive they are the more delayed their replication time (but again, doesn't that have to be true? If they are losing MCM signals they cannot function as origins, so they are replicated passively and, by definition, will show delayed replication timing. An origin is defined as such by a loaded MCM complex.)

      8,I do not understand the conclusions from Figure 4d. There is an extremely small positive correlation between how much of an MCM signal is lost and delay in replication time of an origin, but this correlation is not surprising as an unlicensed origin cannot be an origin and will be replicated passively. What seems most surprising about these data is that the effect is so weak, not that it exists. There is quite a lot of scatter in this plot at 500 uM auxin, with some origins losing a given amount of signal (x) and being only slightly delayed in replication time, and others losing the same amount of signal (x) and being substantially delayed. What underlies this outcome?--Are the ones that are not substantially delayed closer to origins that have not been affected at all by MCM reductions? Why is the correlation so weak? The other regulators of origin activation time have stronger and more precise effects--for example the centromere-control can be precisely eliminated so that only the replication time of the centromere-proximal origins are delayed.

      9,Multiple studies in yeast and mammalian cells indicate that MCM subunits are in excess relative to other licensing and S-phase initiation factors, so it is not unexpected that over-expressing MCM did not lead to enhanced levels of licensing. It seems much more plausible that Cdc6 or Cdt1 or both factors are present in limiting amounts for MCM loading, so I did not understand the point of over-producing MCM subunits. If the "weak" origins are the ones that are most dramatically affected by reducing MCM to "limiting" levels, isn't the question whether you can increase licensing at these origins when you over-produce a factor that is likely limiting for licensing, such as Cdt1 or Cdc6 (or both) while leaving MCM at its normal levels. The fact that MCM levels are not limiting for licensing is not surprising and, if anything, argues against these levels having a regulatory role in origin activation timing---which seems to be the opposite of what the authors want to conclude.

      In summary, I think the technical aspects of the experiments were quite strong, but I do not think that the experiments answered the question that was posed by the authors.

      Minor points:

      Many places where "This data" should be changed to "These data". Data are plural.

      Significance

      Significance: see above

      Referees Cross Commenting

      Reviewer 3. My overall conclusions about this study are that the data are extremely nice and useful to the field, but that their potential to advance the field or clarify it for 'outsiders' is limited by 1, a biased, model-centric presentation that fails to put the work in context of a lot of strong previous work. Some of the conclusions cannot even be tested by the experimental design. 2, some of the data analyses, for example the parsing of origins for analyses of MCM effects versus effects on replication time, seem arbitrary and were not clearly justified. 3, The correlation between reductions in MCM loading and Trep delay seemed weak, even after parsing for origins expected to experience the largest effects, which is actually kind of interesting, but was ignored in favor of the pre-determined interpretation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This is a nice study that characterizes the consequences of limiting or increasing Mcm expression on the replication program. Prior ChIP experiments in yeast have observed that not all origins exhibit the same level of Mcm enrichment and that increased Mcm enrichment was correlated with origin activity. These observations led to two different models -- a) that multiple Mcm2-7 double hexamer complexes are loaded at some origins and b) a probabilistic model where the differential enrichment of Mcm2-7 reflected the fraction of cells in a population that had loaded the Mcm2-7 complex at a specific origin. While the titration experiments presented here don't provide any conclusive support for either model, they do provide some novel and relevant insights for the replication field, in part due to the increased resolution and quantification afforded by the MNase ChIP-seq approach (and S. pombe spike-in). The authors very nicely demonstrate that origins are differentially sensitive to Mcm2-7 depletion and that loss of Mcm2-7 loading results in an altered replication timing profile. The origins most impacted by loss of Mcm2-7 are 'weak' origins as described by the Fox group. Intriguingly, the authors find that the 5X overexpression of Mcm2-7 does not perturb the relative Mcm2-7 loading at individual origins, but instead globally represses Mcm2-7 association at all origins. They also find that overexpression of both Cdt1 and Mcm2-7 is detrimental to the cell (although no obvious replication phenotype was observed). Finally, the authors present a reasonable interpretation of their data in the context of models for replication timing, which was very well articulated.

      Major Comments:

      From the methods it appears that different analyses were performed with different replicates?

      "Replicate #1 was used for all analyses except for V plots, for which the higher resolution Replicate #2 was used."

      Ideally all of the conclusions should be supported by all the replicates independently, or if the replicates are concordant -- they should be merged (at a similar sequencing depth) prior to doing the analyses. Even the v-plots with merged replicates will be informative due to the greater sequencing depth.
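
      A minimal sketch of what merging "at a similar sequencing depth" could look like: each replicate is randomly downsampled to the depth of the shallower one before pooling. The function names and the fragment representation (chromosome, start, end tuples) are illustrative assumptions, not the authors' pipeline.

```python
import random

def downsample(fragments, target, seed=0):
    """Return a random subset of `target` fragments (all of them if fewer exist)."""
    fragments = list(fragments)
    if len(fragments) <= target:
        return fragments
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(fragments, target)

def merge_at_equal_depth(rep1, rep2, seed=0):
    """Downsample both replicates to the shallower depth, then pool them."""
    depth = min(len(rep1), len(rep2))
    return downsample(rep1, depth, seed) + downsample(rep2, depth, seed + 1)

# Hypothetical fragments as (chromosome, start, end) tuples.
rep1 = [("chrI", i, i + 150) for i in range(100)]
rep2 = [("chrI", i, i + 60) for i in range(40)]
merged = merge_at_equal_depth(rep1, rep2)
```

      Equalizing depth first keeps the deeper replicate from dominating the pooled signal; whether the replicates are concordant enough to pool would still need to be checked separately.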

      The authors should provide a separate analysis for the larger nucleosomal sized fragments and smaller putative MCM double hexamer fragments with regards to the Mcm loading and relationship to ACS and orientation. They may represent an interesting intermediate with mechanistic consequences for the interpretation.

      The authors should present the v-plots and an analysis of which side the Mcm's load for the overexpression studies. I was surprised that there was no further in-depth analysis for these two extremes. Perhaps similar conclusions will be reached, but it should at least be mentioned/presented as a supplementary figure.

      Minor Comments:

      This is largely semantic, but the majority of MNase ChIP-seq signal recovered is associated with the nucleosomes and not in the NDR and as the signal in the NDR is differentially sensitive to digestion, I would suggest rephrasing the following sentence:

      "In contrast to previous genome-wide reports (Belsky et al., 2015), but in agreement with recent in-vitro cryo-EM structures (Miller et al., 2019), we also observe MCM signal in the nucleosome-depleted region (NDR) of origins. "

      to:

      "In agreement with a previous genome-wide report (Belsky et al., 2015), we found that the bulk of the MCM signal was associated with nucleosomal sized fragments; however the increased resolution afforded by our approach allowed us to also detect protected fragments in the NDR as predicted by recent in vitro cryo-EM structures..."

      As a sanity check, please double check V-plots and presence of small fragments with the digestion conditions. In the Henikoff manuscript the bulk of sub-nucleosomal fragments were lost with the longer digestion time. Specifically, the TF footprints were more pronounced with minimal digestion. While it might be argued that the longer digestion more tightly resolved the binding site, in many cases they were completely lost with the 20 minute digestion. This is just a simple check -- I don't doubt the results as reported given the experimental conditions are very different. For example, the Henikoff manuscript did not use cross linking or an antibody enrichment step.

      Last paragraph of the "MCM associates with nucleosomes section" which reports that the Mcm2-7 complex is loaded up or downstream from the ACS independent of orientation should cite Belsky 2015 (Figure 5 and discussion) for the initial observation.

      The authors argue that the global reduction in MCM loading associated with overexpression may be a technical artifact given that all origins exhibit a proportional reduction in Mcm2-7 loading. However, this is exactly what the S. pombe spike-in control is intended for. The relative difference between individual origins resulting from Mcm2-7 depletion would still be evident without the spike-in. The authors do discuss different possibilities, but I would not be so keen to discard this as a technical artifact.

      Significance

      This work has several advances that will be appreciated by the replication field -- including a high resolution view of Mcm2-7 loading in the context of chromatin; the impact of titrating (low and high) MCM expression on MCM loading and replication timing program; and a well reasoned discussion of how different models of MCM loading would impact origin activation and replication timing program. The work builds on prior studies in the field (e.g. Belsky 2015). While some of the conclusions regarding the localization of the Mcm2-7 complex relative to the ACS and surrounding nucleosomes are confirmatory, the increased resolution provides new insight (like the enrichment of small fragments in the NDR) that could be further strengthened by additional analysis (see above).

      My expertise is DNA replication and chromatin.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript follows on from previous work from the Rhind lab to investigate whether the load of MCMs at origins is a factor in when the origins activate (as a population average) during S phase. The authors use budding yeast and an auxin degron system to modulate the levels of an MCM subunit. This allows them to titrate down the concentration of the MCM hexamer and observe the effect. Crucially, they assay both the reduction in MCM load at origins and the subsequent replication dynamics in the same experiment. This is the power of their approach and allows them to rigorously test their hypothesis.

      Major comments

      1. I found the introductory paragraph discussing the Rhind lab hypothesis about the possibility of multiple MCM being loaded at origins somewhat misleading. The first paragraph of the discussion was much clearer. However, I feel that the introductory paragraph should deal with the difference between the two proposals: 0-1 MCM-DH per origin (de Moura et al), vs 0-50+ MCM-DH (Yang et al). It is also important to note that Foss et al find that "In budding yeast, [MCM] complexes were present in sharp peaks comprised largely of single double-hexamers" - i.e. consistent with 0-1 MCM-DH per origin.

      To improve the balance of the introduction, I think the authors should briefly introduce the concepts behind the 0-1 MCM-DH per origin; this was defined as origin competence by Stillman and clearly described by McCune et al (2008; see figure 8) prior to the work from de Moura et al. Furthermore, in the discussion the authors should be more even-handed. To date there is no data to conclusively rule one way or the other in distinguishing between single vs multiple MCMs. The authors cite Lynch et al and state "overexpression of origin-activating factors in S phase causes most all origins to fire early in S phase, consistent with most origins having at least one MCM loaded". However, Lynch et al report equivalent (roughly equal) origin efficiencies, but the assay doesn't distinguish between all going up to high efficiency or all going to a lower intermediary efficiency. Given that fork factors (polymerases, etc) are likely to become limiting at some point (or checkpoints could be activated due to limited dNTP supplies) it would seem plausible that uniform origin efficiency could be a consequence of less than maximal origin firing. As part of this discussion it would be useful for the authors to include what conclusions have been reached on MCM load from in vitro systems (with chromatin substrates).

      2. The authors are not the first to look at the consequence of reduced MCM concentrations on origin function. This was essentially the basis for the MCM screen undertaken by Bik Tye's lab that first identified the MCM genes. In addition to temperature sensitive mutants, the Tye group also examined heterozygotes (Lei et al., 1996) to show differential effect on the ability of two origins to support plasmid replication. The authors' findings are entirely consistent with these early studies, particularly since ARS416 (formerly ARS1) was found to be highly sensitive to reduced MCM levels and ARS1021 (formerly ARS121) was found to be insensitive to MCM levels. The authors find a significant reduction in MCM load at ARS416, but the MCM load at ARS1021 is unaltered by reduced MCM concentration. It would be worth the authors noting this consistency. The authors do cite the Lei study, but not in this context. The original MCM screen was published here: Maine, G., Sinha, P., Tye, B. (1984). Mutants of S. cerevisiae defective in the maintenance of minichromosomes. Genetics 106(3), 365 - 385. Furthermore, at the end of the discussion the authors state that "it will be interesting to dissect the specific cis- and trans-acting factors that make origins sensitive or resistant to changes in MCM levels". The equivalent effect reported by the Tye lab has already been dissected by the Donaldson lab (Nieduszynski et al., 2006) and perhaps it would be worth briefly mentioning their findings.

      3. The authors should show the flow cytometry data for each of their cell cycle experiments, if only in supplementary figures. This is important to allow a reader (and reviewer) to judge the level of synchrony achieved when interpreting the results.

      4. I think the authors should show the ChIP signal at some example origins, including ones sensitive and insensitive to the reduction in MCM concentration. Currently all the high resolution ChIP data (i.e. over 1400 bp, e.g. Fig 3a) is presented as meta-analyses of many origins.

      5. When describing the results in Fig 4a the authors focus on changes (highlighted in black boxes) that fit their expectation. However, there are other sites that should at least be mentioned that don't seem to fit the authors' model, e.g. ARS517, ARS518. It would be worth discussing what fraction of the timing data can be explained by the reduced MCM load.

      Minor comments

      -These data, rather than this data (throughout).

      -the authors should clearly state in figure legends what window size has been used in analysing genomic data.

      -in figure 2a the authors show pairwise comparisons between conditions, it would be nice to see the 3rd pairwise comparisons perhaps as a supplementary figure

      -in figure 2c it would be clearer to use the same colour for the lines and the points

      -the authors should avoid the use of red/green colour combinations in their figures (see: https://thenode.biologists.com/data-visualization-with-flying-colors/research/)

      -in the text the authors state "ORC binding to the ACS and subsequent MCM loading is a directional process dependent on a ACS- site and a similar but inverted nearby sequence (Xu et al., 2006)". I think it would be more appropriate to cite the following study here: Coster, G., Diffley, J. (2017). Bidirectional eukaryotic DNA replication is established by quasi-symmetrical helicase loading Science (New York, NY) 357(6348), 314 - 318. https://dx.doi.org/10.1126/science.aan0063

      -the list of factors that influence replication timing should include Rif1, whereas it is less clear that Rpd3 acts within the unique genome (as opposed to indirectly via repetitive DNA, e.g. rDNA)

      -figure 4 - it might help to mark the centromere on panel a. Also, why do the ChIP peaks and annotated origins appear to line up so poorly?

      -figure 4d - would it not be better to use fraction of lost MCM signal on the x-axis as in previous figures?

      -"with galactose or raffinose, to induce or repress Mcm2-7 overexpression, respectively." This is incorrect, raffinose does not repress this promoter (that requires glucose).

      -the S. pombe spike in is a great addition to the over expression experiments. It's a shame that it wasn't included in the auxin experiments.

      -why does the data in fig 5d appear to be at much lower resolution than the previous ChIP data?

      -in the sequencing analysis pipeline for MCM ChIP the authors use a 650 bp upper size limit; why have such a large threshold compared to the size of a nucleosome? Are the analyses and findings sensitive to this size threshold?

      -the repliscope package was published here:

      Batrakou, D., Müller, C., Wilson, R., Nieduszynski, C. (2020). DNA copy-number measurement of genome replication dynamics by high-throughput sequencing: the sort-seq, sync-seq and MFA-seq family. Nature Protocols 15(3), 1255 - 1284. https://dx.doi.org/10.1038/s41596-019-0287-7

      Significance

      This work builds upon a body of work from the Rhind group (and others) to determine the contribution of MCM load to replication origin activation dynamics. To my mind this is the most convincing dataset and analysis to date and goes a long way to supporting the model that the efficiency of MCM loading is a major factor in determining the mean replication time of an origin. As the authors state, they are still not able to distinguish between two different models of MCM load (single vs multiple). It would be interesting for the authors to discuss how these two models could be distinguished in the future (perhaps with single cell/molecule experiments).

      This study will be of interest to those in the fields of DNA replication and genome stability.

      My field of expertise is DNA replication and replication origin function.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Abrams and Nance describes how the polarity proteins PAR-6 and PKC-3/aPKC promote lumen extension of the unicellular excretory canal in C. elegans. Using tissue-specific depletion methods they find that CDC-42 and the RhoGEF EXC-5/FGD are required for luminal localization of PAR-6, which recruits the exocyst complex required for lumen extension. Interestingly, they show that the ortholog of the mammalian exocyst receptor, PAR-3, is dispensable for luminal membrane extension. Overall, this is a well-written and interesting manuscript.*

      1. Because depletion of PAR-3 in the canal causes milder defects than PAR-6 or CDC-42 the authors suggest that they cannot rule out the possibility that an alternative isoform of PAR-3 is expressed and buffering the defect. They should perform canal-specific RNAi-mediated depletion of the entire PAR-3 gene to determine if this is true.

      We agree with Reviewer 1 that it would be useful to provide additional evidence that an alternative isoform of PAR-3 lacking the ZF1 degron is not expressed. While tissue-specific RNAi could be used, we have not been successful obtaining complete knockdown in previous tissue-specific RNAi experiments. Moreover, this approach does not target any maternal PAR-3 protein that may be inherited by the excretory cell. As an alternative approach to address this point, we will analyze par-3::zf1::yfp/par-3(null) worms following excretory-cell-specific expression of zif-1, and compare to par-3::zf1::yfp/par-3::zf1::yfp siblings. We would expect the excretory cell phenotype to become more severe if additional, ‘phenotype-buffering’ forms of PAR-3 were present, or if there was incomplete degradation of PAR-3::ZF1::YFP in our previous experiments.

      2. The authors suggest that GTP-loaded (activated) CDC-42 recruits PAR-6 to the luminal membrane. It would be nice if they could use a biosensor, such as the GBD-WSP-1 reagent from Buechner's lab to confirm that EXC-5 depletion also reduces activated CDC-42, as would be expected. This should be achievable since there is strong CDC-42 signal, even in the cytoplasm.

      This is an excellent suggestion. We will utilize a CDC-42 biosensor – an integrated cdc42p::gfp::wsp-1(gbd) strain created in our lab and previously validated and characterized (Zilberman et al. 2017). We have confirmed that the biosensor is detected in the excretory canal and appears enriched at or near the lumenal membrane. We will cross the biosensor into the exc-5::zf1::mScarlet background. This will allow us to assess lumenal enrichment, and using heat shock inducible ZIF-1, determine if there is a reduction in biosensor lumenal enrichment when EXC-5::ZF1::mScarlet is acutely degraded.

      If the biosensor is difficult to measure at the canal lumen, an alternative approach would be to use an available exc-5 null allele to examine genetically if cdc-42 and exc-5 are acting in the same pathway. We could cross CDC-42exc(-) larvae into exc-5(rh232) and quantify excretory canal phenotypes. If CDC-42 and EXC-5 are indeed functioning in the same pathway we would expect no enhancement of the CDC-42exc(-) phenotype.

      3. Related to point 2, (i) does mutation of the CRIB domain of PAR-6 impair its recruitment to the luminal membrane, and (ii) does this mutant exacerbate canal defects when PAR-3 is depleted?

      (i) Our lab has previously generated and characterized a transgenic par6P::par-6(**CRIB)::gfp strain (Zilberman et al., 2017). We will examine this strain to determine if expression is detectable in the excretory canal, and if so, we will compare lumenal enrichment of PAR-6(CRIB)::GFP to control worms expressing wild-type PAR-6::GFP.

      (ii) This is a very interesting experiment, as it would help address if the mild phenotype observed in PAR-3 depleted animals is due to the remaining PAR-6 that is recruited by CDC-42. Our lab has previously shown that par6P::par-6(**CRIB)::gfp cannot rescue the embryonic lethality of a par-6 mutant, in contrast to par-6::gfp (Zilberman et al. 2017). This indicates that the CRIB domain is needed for PAR-6 function during embryogenesis and suggests that CRIB domain mutations introduced by CRISPR would almost certainly be lethal, precluding analysis of the excretory cell.

      As an alternative experiment, we would determine if PAR-3 localizes to the lumenal membrane independently of CDC-42; such a finding would imply that PAR-3 and CDC-42 likely have independent contributions to PAR-6 localization (rather than CDC-42 promoting PAR-6 localization by localizing PAR-3). To do this, we will degrade ZF1::YFP::CDC-42 in the excretory cell and examine the localization of PAR-3::mCherry compared to controls. We have all of the strains needed for this experiment.

      4. The authors hypothesize that partial recruitment of PAR-6 by CDC-42 is sufficient for luminal membrane extension to explain the mild defects caused by PAR-3 depletion. Since depletion of PAR-6 and CDC-42 alone causes milder canal truncations the authors should co-deplete these proteins (as well as PAR-3 and CDC-42) to determine if there is an additive effect.

      This is an excellent suggestion in principle. However, it is not possible to know in any given degradation experiment whether the targeted protein is completely degraded; we can only say it is no longer detectable by fluorescence. Thus, any degron allele (in the presence of ZIF-1) could behave like a strong hypomorph rather than a null. It would not be possible to interpret double degradation experiments in such a case, as a more severe phenotype in the double could simply be a result of combining two hypomorphic alleles, further reducing pathway activity even if the genes function together in the same pathway. To interpret this experiment properly, a null allele of at least one of the genes would have to be used. This is not possible since par and cdc-42 null mutants are lethal and there is also maternal contribution. As an alternative to these double depletion experiments, we will deplete PAR-6::ZF1::YFP or PAR-3::ZF1::YFP in exc-5 null mutant larvae, as unlike cdc-42, exc-5 is not an essential gene.

      5. In figure 2, the authors show that depletion of PKC-3 causes more severe canal truncations than PAR-6. Since these proteins function in the same complex what do they think is the reason for this difference? This point could be discussed more in the manuscript.

      As described in the previous point, incomplete degradation could produce modestly different phenotypes even for genes that act in the same pathway. Therefore, it is not possible to determine whether PAR-6 and PKC-3 have different roles using this approach. We will add text to the discussion bringing up this point.

      6. Related to point 5, more experiments with PKC-3 should be done to determine if, for example, localization of SEC-10 is similarly affected as ablation of PAR-3, PAR-6 and CDC-42.

      We agree, and will address this point by acutely degrading ZF1::GFP::PKC-3 and examining transgenic SEC-10::mCherry, as we have done for other par genes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The manuscript by Abrams & Nance describes a precise investigation of the role of PAR proteins in the recruitment of the exocyst during and after the extension of the C. elegans excretory canal. State-of-the-art genetic techniques are used to acutely deplete proteins only in the targeted cell, and examine the localization of endogenously expressed markers. Experiments are well described and carefully quantified, with systematic statistical analysis. The manuscript is easy to follow and the bibliography is very good. Most conclusions are well supported.

      1) I am not entirely convinced by the presence of CDC-42 at the lumenal membrane (Fig3G); it seems to be more sub-lumenal than really lumenal. It peaks well before PAR-6 (Fig3H), which itself seems slightly less apical than PAR-3 (Fig3F). Could you use super-resolution microscopy (compatible with endogenous expression levels) to more precisely localize CDC-42? Similar point for PAR-3 and PAR-6 which do not seem to colocalize completely - a longitudinal line scan along the lumenal membrane might provide the answer even without super-resolution; this could help explain why these two proteins do not have the same function. These suggestions are easy to do provided the authors can have access to super-resolution (Airyscan to name it; although other methods will be perfectly acceptable I believe it is the simplest one).

      We agree that the CDC-42 localization peak does not precisely match the PAR-6 peak. As the reviewer notes, resolving the subcellular localization of these two proteins will not be feasible using standard confocal microscopy. We will image the ZF1::YFP::CDC-42; PAR-6:mKate strain using a Zeiss LSM 880 with Airyscan to determine if their subcellular localization patterns are distinct.

      To examine PAR-3 and PAR-6 colocalization at the lumen, we will acquire additional confocal images of the PAR-6-ZF1-YFP; PAR-3-mCherry strain and examine colocalization of the clusters along the lumenal membrane. As a positive control for two proteins that should co-localize, we will image ZF1::GFP::PKC-3; PAR-6-mKate; these two proteins bind directly and co-localize in nearly all cells in which they have been examined.

      2) The same group has described a CDC-42 biosensor to detect its active form. It could be used here to precisely pinpoint where active CDC-42 is required: in the cytoplasm? At the lumenal membrane? Colocalizing with what other protein? This will require the expression of a transgene under an excretory cell specific promoter and a simple injection strategy while helping to strengthen the description of the CDC-42 role.

      See Reviewer 1 point #2.

      3) As the authors certainly know, there is a PAR-6 mutation which prevents its binding to CDC-42. They could express this construct in the excretory canal (a simple extrachromosomal array should be sufficient) to validate the direct interaction between these proteins in this cell.

      See Reviewer 1 point #3.

      4) What is the lethality of ZIF-1-mediated depletion of the various factors under the exc promoter? Can homozygous strains be maintained? Authors just have to add a sentence in the Mat&Met section.

      All of the strains with excretory cell-specific degradation we have examined are viable when grown on NGM plates. We will add this point to the materials and methods.

      Provided that the authors have access to an Airyscan, all the questions asked here can be answered in two months (one month for constructs, one month for injection and data analysis) at a very minor cost.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Strengths of this manuscript include the use of endogenously tagged proteins (rather than over-expressed transgenes) for high resolution imaging and a cell-type specific acute depletion strategy that avoids complicating pleiotropies and allows tests of molecular epistasis. While some results were fairly expected based on prior studies of Cdc42, PAR proteins, and the exocyst in other tissues or systems, differences in the requirements for par-6 and pkc-3 vs. par-3 strongly suggest that the former genes play more important roles in exocyst recruitment. I was also excited to see a connection made between EXC-5 and PKC-3 localization.

      1. Lumen formation vs. lumen extension. The abstract and introduction use these two terms almost interchangeably, but they are not the same and more care should be taken to avoid the former term. The data here do not demonstrate any roles for par or other genes in lumen formation, but do demonstrate roles in lumen extension and organization/shaping.

      We agree and will correct wording to indicate that lumen extension is affected.

      2. Related to the above, mutant phenotypes here are surprisingly mild and variable. The authors discuss possible reasons for the particularly mild phenotype of par-3 mutants, but don't specifically address the mild phenotypes of the others. Clearly quite a bit of polarization and apical membrane addition occurs in ALL of the mutants. Is this because those early steps use other/redundant molecular players, or is depletion too late or incomplete to reveal an early role?

      We agree with Reviewer 3 and will bring up these points in the discussion. Degradation of proteins strongly predicted to function together (RAL-1 and SEC-5; PAR-6 and PKC-3) produces similar although not identical phenotypes; as discussed above we consider it likely that these differences reflect minor differences in degradation efficiency below our ability to detect by fluorescence. As Reviewer 3 points out, the excretory-specific driver we use to express ZIF-1 may not be active at the very earliest stages of lumen formation, and degradation could take 45 minutes or more after the promoter becomes active (Armenti et al, 2014). Thus, we agree that phenotypes could be more severe if it were possible to completely deplete each tagged protein prior to the onset of lumen formation. However, this caveat does not change the interpretations of our experiments since all proteins are degraded with the same driver. We have avoided stating that the phenotypes we observe reflect the ‘null’ phenotype for these reasons. We will emphasize these points in the discussion.

      The authors introduce a new reagent, "excP" (the promoter for T28H11.8), which they use to drive canal cell expression of ZIF-1 for their degron experiments. Please provide more information about when in embryogenesis this promoter becomes active, how that compares to when the par genes, sec-5, ral-1 and cdc-42 are first expressed, and what canal length is at that time. It would also be helpful to show the timeframe for degron-based depletion using this reagent (Figure 1C shows only depletion at L4, days later).

      Publicly available single cell RNA seq data (https://pubmed.ncbi.nlm.nih.gov/31488706/ and https://cello.shinyapps.io/celegans_explorer/) suggest that canal expression of the endogenous T28H11.8 gene doesn't really ramp up until the 580-650 minute timepoint, which is several hours after par gene canal expression (270-390 minutes) and the initiation of canal lumen formation (bean stage, 400-450 minutes). These data suggest that excP might come on too late to test requirements in lumen formation and early stages of extension. This caveat should be at least mentioned.

      See point #2 above. We agree that providing more information on expression from the T28H11.8 promoter would be important for interpreting the severity of phenotypes. We will raise this point in the discussion, and include existing published expression data and a more detailed analysis of the excP::mCherry transgene.

      3. There are two major aspects to the mutant phenotypes observed here: short lumens and cystic lumens. A short lumen makes sense intuitively, but the cysts could use a little more explanation. (What are cysts? What is thought to be the basis of their formation?). It is intriguing that cysts in sec-5 vs. ral-1 mutants (Figure 1) and par-6 vs. pkc-3 mutants (Figure 4) seem to have a very different size and overall appearance. Are these consistent differences, and if so, what could be the explanation for them?

      This is an interesting point. Since it is not practical to perform time-lapse imaging to watch canal cysts form, we analyzed only L1 and L4 larvae. We believe from our imaging that these are discontinuous regions of the lumen. One explanation for the expansion and dilation of the cystic lumens by L4 stage could be that the canal lumen has been expanded by fluid buildup resulting from a defect in canal function in osmoregulation, but we have not tested this directly. The reviewer also raises an interesting point regarding different appearances of cysts in SEC-5 and RAL-1 depleted larvae compared to PAR-6 and PKC-3. It is possible that these differences arise because SEC-5 and RAL-1 might direct whether vesicles will fuse at all, whereas PAR proteins direct where they will fuse in the cell (i.e. there could be fusion at basal surfaces, or just reduced apical fusion). We will bring up these points in the discussion.

      4. The authors did not test if PKC-3, like PAR-6, is required to recruit exocyst to the canal cell apical membrane, but their prior studies in the embryo suggested that it is (Armenti et al 2014). They also did not test if EXC-5 is required to recruit PAR-6 and the exocyst (along with PKC-3), or if CDC-42 is required to recruit PKC-3 (along with PAR-6). There seems to be an assumption that PAR-6 and PKC-3 are regulated and function in a common manner (as is often the case), but that has not been demonstrated here specifically. The basis for this assumption and alternatives to the linear model should be acknowledged.

      As discussed above (Reviewer 1 point #6), we will directly test whether PKC-3 is required to recruit SEC-10::mCherry to the lumenal membrane. We agree with Reviewer 3 that we have not shown that PAR-6 and PKC-3 always function similarly, although this is expected based on their similar phenotypes and co-dependent functions in other cells. We will mention this caveat in the discussion.

      5.EXC-5 is presumed to act upstream of CDC-42 based on shared phenotypes and the known Rho GEF activity of its mammalian homologs. However, direct evidence for this is currently lacking. In future, the authors might test if depleting EXC-5 affects CDC-42 activation/GTP-loading by using CDC-42 biosensors that have been reported in the literature (e.g. Lazetic et al 2018).

      See Reviewer 1 point #2.

      **Minor comments:**

      Figure 1, Figure 4, Figure S3, Figure S4 Blue color/CFP indicates the apical/luminal membrane or the apical region of the canal cytoplasm, not the actual lumen as the labels suggest. The lumen is a hollow cavity on the opposite side of the plasma membrane from these markers, and it is shown as white in the Figure 1A upper right cartoon.

      Thank you for pointing this out. We will correct the figure labelling.

      Figure 2, Figure S2 I'm not confident in the statistical analysis used here (Fisher's Exact test on two bins, <50% and >50% canal length), given that four length bins (not two) were defined. I recommend consulting a statistician.

      We used two bins for the statistical analysis because nearly all control larvae have a similar canal length (L1 stage: 99% of larvae have a canal length that is 51-75% of body length; L4 stage: 98% of larvae have a canal length that is 76-100% of body length), making it straightforward to ask if mutants are shorter. We chose not to make more granular phenotypic comparisons, as we cannot rule out that subtle differences in degradation efficiency, rather than differences in biological function, underlie any differences in canal length of the degron mutants. We will consult with a statistician to determine if this is an acceptable way to statistically compare controls and mutants.
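To make the two-bin comparison concrete, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (short vs. long canals, control vs. mutant), written with only the Python standard library. The counts are invented for illustration and are not data from the manuscript, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. control, mutant); columns are the two bins
    (e.g. short canal, long canal). Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for float rounding).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, NOT from the manuscript:
# control: 1 short, 99 long; mutant: 60 short, 40 long.
p_example = fisher_exact_two_sided(1, 99, 60, 40)
print(f"two-sided p = {p_example:.3g}")
```

Collapsing four length bins into two keeps the table 2x2, which is what this test requires. Comparing all four bins at once would call for a different approach, such as an exact test on a 2x4 table.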

      p.3 "Born during late embryogenesis..." Actually, the canal cell is born at ~270 minutes after first cleavage, which is in the first half of embryogenesis, not what I would call "late".

      We agree and will correct the wording.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The C. elegans excretory canal cell is a classic model for studying single cell tubulogenesis, where a cell establishes an intracellular apical domain that extends to form a lumen. Prior studies in this system have identified a set of gene products that localize to the growing apical domain and/or are important for its organization and size, but the molecular pathways through which these various gene products act remain poorly understood. Here, Abrams and Nance are able to connect the dots among several of these to flesh out a pathway for apical membrane addition. Specifically, they demonstrate that CDC-42 is needed to recruit PAR-6, and that PAR-6 is needed to recruit the exocyst to the apical membrane and to promote proper apical membrane growth and organization. EXC-5, a candidate GEF for CDC-42, also appears to act in this pathway.

      Strengths of this manuscript include the use of endogenously tagged proteins (rather than over-expressed transgenes) for high resolution imaging and a cell-type specific acute depletion strategy that avoids complicating pleiotropies and allows tests of molecular epistasis. While some results were fairly expected based on prior studies of Cdc42, PAR proteins, and the exocyst in other tissues or systems, differences in the requirements for par-6 and pkc-3 vs. par-3 strongly suggest that the former genes play more important roles in exocyst recruitment. I was also excited to see a connection made between EXC-5 and PKC-3 localization.

      Major comments:

      1.Lumen formation vs. lumen extension. The abstract and introduction use these two terms almost interchangeably, but they are not the same and more care should be taken to avoid the former term. The data here do not demonstrate any roles for par or other genes in lumen formation, but do demonstrate roles in lumen extension and organization/shaping.

      2.Related to the above, mutant phenotypes here are surprisingly mild and variable. The authors discuss possible reasons for the particularly mild phenotype of par-3 mutants, but don't specifically address the mild phenotypes of the others. Clearly quite a bit of polarization and apical membrane addition occurs in ALL of the mutants. Is this because those early steps use other/redundant molecular players, or is depletion too late or incomplete to reveal an early role?

      The authors introduce a new reagent, "excP" (the promoter for T28H11.8), which they use to drive canal cell expression of ZIF-1 for their degron experiments. Please provide more information about when in embryogenesis this promoter becomes active, how that compares to when the par genes, sec-5, ral-1 and cdc-42 are first expressed, and what canal length is at that time. It would also be helpful to show the timeframe for degron-based depletion using this reagent (Figure 1C shows only depletion at L4, days later).

      Publicly available single cell RNA seq data (https://pubmed.ncbi.nlm.nih.gov/31488706/ and https://cello.shinyapps.io/celegans_explorer/) suggest that canal expression of the endogenous T28H11.8 gene doesn't really ramp up until the 580-650 minute timepoint, which is several hours after par gene canal expression (270-390 minutes) and the initiation of canal lumen formation (bean stage, 400-450 minutes). These data suggest that excP might come on too late to test requirements in lumen formation and early stages of extension. This caveat should be at least mentioned.

      3.There are two major aspects to the mutant phenotypes observed here: short lumens and cystic lumens. A short lumen makes sense intuitively, but the cysts could use a little more explanation. (What are cysts? What is thought to be the basis of their formation?). It is intriguing that cysts in sec-5 vs. ral-1 mutants (Figure 1) and par-6 vs. pkc-3 mutants (Figure 4) seem to have a very different size and overall appearance. Are these consistent differences, and if so, what could be the explanation for them?

      4.The authors did not test if PKC-3, like PAR-6, is required to recruit exocyst to the canal cell apical membrane, but their prior studies in the embryo suggested that it is (Armenti et al 2014). They also did not test if EXC-5 is required to recruit PAR-6 and the exocyst (along with PKC-3), or if CDC-42 is required to recruit PKC-3 (along with PAR-6). There seems to be an assumption that PAR-6 and PKC-3 are regulated and function in a common manner (as is often the case), but that has not been demonstrated here specifically. The basis for this assumption and alternatives to the linear model should be acknowledged.

      5.EXC-5 is presumed to act upstream of CDC-42 based on shared phenotypes and the known Rho GEF activity of its mammalian homologs. However, direct evidence for this is currently lacking. In future, the authors might test if depleting EXC-5 affects CDC-42 activation/GTP-loading by using CDC-42 biosensors that have been reported in the literature (e.g. Lazetic et al 2018).

      Minor comments:

      Figure 1, Figure 4, Figure S3, Figure S4 Blue color/CFP indicates the apical/luminal membrane or the apical region of the canal cytoplasm, not the actual lumen as the labels suggest. The lumen is a hollow cavity on the opposite side of the plasma membrane from these markers, and it is shown as white in the Figure 1A upper right cartoon.

      Figure 2, Figure S2 I'm not confident in the statistical analysis used here (Fisher's Exact test on two bins, <50% and >50% canal length), given that four length bins (not two) were defined. I recommend consulting a statistician.

      p.3 "Born during late embryogenesis..." Actually, the canal cell is born at ~270 minutes after first cleavage, which is in the first half of embryogenesis, not what I would call "late".

      Significance

      Polarized plasma membrane addition is critical for the development of epithelial tissues, so understanding the mechanisms that control this is of broad interest to many cell and developmental biologists. This study will be of particularly high interest to researchers working on PAR proteins, the exocyst, or single cell tube development.

      The results here add to the existing body of evidence for PAR-dependent recruitment of exocyst to expanding apical/luminal surfaces (e.g. Bryant et al 2010; Jones et al 2011, 2014; Armenti et al 2014) and to evidence for key functional distinctions between PAR-6 & PKC-3 vs. PAR-3 (e.g. Achilleos et al 2010; Jones et al 2014). The results here are more robust than in those prior studies and more clearly illustrate directionality due to the authors' acute depletion strategy, which avoids major tissue disruptions that could secondarily affect protein localization.

      expertise keywords: C. elegans, epithelia, tubulogenesis

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Abrams & Nance describes a precise investigation of the role of PAR proteins in the recruitment of the exocyst during and after the extension of the C. elegans excretory canal. State-of-the-art genetic techniques are used to acutely deplete proteins only in the targeted cell, and examine the localization of endogenously expressed markers. Experiments are well described and carefully quantified, with systematic statistical analysis. The manuscript is easy to follow and the bibliography is very good. Most conclusions are well supported.

      I only have a few minor questions or remarks:

      1) I am not entirely convinced by the presence of CDC-42 at the lumenal membrane (Fig3G); it seems to be more sub-lumenal than really lumenal. It peaks well before PAR-6 (Fig3H) which itself seems slightly less apical than PAR-3 (Fig3F). Could you use super-resolution microscopy (compatible with endogenous expression levels) to more precisely localize CDC-42? Similar point for PAR-3 and PAR-6 which do not seem to colocalize completely - a longitudinal line scan along the lumenal membrane might provide the answer even without super-resolution; this could help explain why these two proteins do not have the same function. These suggestions are easy to do provided the authors can have access to super-resolution (Airyscan to name it; although other methods will be perfectly acceptable I believe it is the simplest one).

      2) The same group has described a CDC-42 biosensor to detect its active form. It could be used here to precisely pinpoint where active CDC-42 is required: in the cytoplasm? At the lumenal membrane? Colocalizing with what other protein? This will require the expression of a transgene under an excretory cell specific promoter and a simple injection strategy while helping to strengthen the description of the CDC-42 role.

      3) As the authors certainly know, there is a PAR-6 mutation which prevents its binding to CDC-42. They could express this construct in the excretory canal (a simple extrachromosomal array should be sufficient) to validate the direct interaction between these proteins in this cell.

      4) What is the lethality of ZIF-1-mediated depletion of the various factors under the exc promoter? Can homozygous strains be maintained? Authors just have to add a sentence in the Mat&Met section.

      Provided that the authors have access to an Airyscan, all the questions asked here can be answered in two months (one month for constructs, one month for injection and data analysis) at a very minor cost.

      Significance

      The reviewer has an expertise in cell polarity and membrane trafficking, using C. elegans as a model.

      The manuscript by Abrams & Nance describes a precise investigation of the role of PAR proteins in the recruitment of the exocyst during and after the extension of the C. elegans excretory canal. The interactions between these factors have already been examined in a number of models and contexts. In particular it follows a previous study from the same group (Armenti et al, Dev Biol, 2014) which established that the exocyst and RAL-1 control polarized secretion in this model, and that PAR proteins are required for the polarized localization of the exocyst, but using the early embryo. This new manuscript is entirely focused on the excretory canal and 1) confirms the previous results, and 2) significantly extends them by precisely dissecting the role of CDC-42 and the apical PAR proteins. It will be of interest to researchers investigating the links between polarity and membrane trafficking with the description of a molecular cascade required for membrane trafficking in the context of a single-cell tube.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Abrams and Nance describes how the polarity proteins PAR-6 and PKC-3/aPKC promote lumen extension of the unicellular excretory canal in C. elegans. Using tissue-specific depletion methods they find that CDC-42 and the RhoGEF EXC-5/FGD are required for luminal localization of PAR-6, which recruits the exocyst complex required for lumen extension. Interestingly, they show that the ortholog of the mammalian exocyst receptor, PAR-3, is dispensable for luminal membrane extension. Overall, this is a well-written and interesting manuscript.

      Major comments

      1.Because depletion of PAR-3 in the canal causes milder defects than PAR-6 or CDC-42 the authors suggest that they cannot rule out the possibility that an alternative isoform of PAR-3 is expressed and buffering the defect. They should perform canal-specific RNAi-mediated depletion of the entire PAR-3 gene to determine if this is true.

      2.The authors suggest that GTP-loaded (activated) CDC-42 recruits PAR-6 to the luminal membrane. It would be nice if they could use a biosensor, such as the GBD-WSP-1 reagent from Buechner's lab to confirm that EXC-5 depletion also reduces activated CDC-42, as would be expected. This should be achievable since there is strong CDC-42 signal, even in the cytoplasm.

      3.Related to point 2, (i) does mutation of the CRIB domain of PAR-6 impair its recruitment to the luminal membrane, and (ii) does this mutant exacerbate canal defects when PAR-3 is depleted?

      4.The authors hypothesize that partial recruitment of PAR-6 by CDC-42 is sufficient for luminal membrane extension to explain the mild defects caused by PAR-3 depletion. Since depletion of PAR-6 and CDC-42 alone causes milder canal truncations the authors should co-deplete these proteins (as well as PAR-3 and CDC-42) to determine if there is an additive effect.

      5.In figure 2, the authors show that depletion of PKC-3 causes more severe canal truncations than PAR-6. Since these proteins function in the same complex what do they think is the reason for this difference? This point could be discussed more in the manuscript.

      6.Related to point 5, more experiments with PKC-3 should be done to determine if, for example, localization of SEC-10 is affected in the same way as after ablation of PAR-3, PAR-6 and CDC-42.

      Significance

      This manuscript builds off their previous work on the role of the exocyst in excretory canal extension and in our view represents an important advance that is relevant to biological tube development across phyla. Therefore, this work should be of interest to biologists studying tubulogenesis in many different model systems.

      My areas of expertise include model organism genetics, biological tube development, and biochemistry.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We are grateful to Review Commons for the opportunity to get valuable comments on our manuscript “Trim39 regulates neuronal apoptosis by acting as a SUMO-targeted E3 ubiquitin-ligase for the transcription factor NFATc3”. We would like to acknowledge the very nice and constructive reviews that our manuscript received. We found all of the reviewer comments well founded and we are taking them into careful consideration in preparing the revised version. We are currently performing additional experiments to address the questions raised by the reviewers. We are not yet able to provide a revised version of the manuscript, but you will find below our response to the reviewers and our plan of revision. It is difficult to anticipate exactly how much time we will need to get the requested results and to prepare a complete revised version, as it will depend on whether we can work normally and whether we encounter technical problems. However, it should be possible within a few months.

      Reviewer #1

      **Summary:**

      Desagher and co-workers investigate the regulation of the NFAT family member NFATc3, a transcription factor in neurons with a pro-apoptotic role. They identify TRIM39 as a ubiquitin E3 ligase regulating NFATc3. They demonstrate that TRIM39 can bind and ubiquitinate NFATc3 in vitro and in cells. They identify a critical SUMO interaction motif in TRIM39, that is required for its interaction with NFATc3 and for its ability to ubiquitinate NFATc3. Moreover, mutating sumoylation sites in NFATc3 reduces the interaction with TRIM39 and reduces its ubiquitination. Silencing TRIM39 increases the protein levels of NFATc3 and its transcriptional activity, leading to apoptosis of neurons. TRIM17 modulates the TRIM39-NFATc3 axis. Combined, TRIM39 appears to be a SUMO-targeted ubiquitin ligase (STUbL) for NFATc3 in neurons.

      **Major points:**

      1.This manuscript contains two stories: the rather exciting story that TRIM39 is a STUbL for NFATc3 (as mentioned in the title) and the second, less exciting story that TRIM17 modulates the regulation of NFATc3 by TRIM39. These stories are now mixed in a confusing manner, disrupting the flow of the first story. It would be better to focus the current manuscript on the first story and strengthen it further and develop the second story in a second manuscript.

      We understand that the reviewer is more interested in the part of our manuscript related to the characterization of Trim39 as a STUbL due to his/her field of expertise. However, the two other reviewers are also interested in the other parts of our work. Notably the third reviewer would like us to highlight the physiological importance of our findings. Indeed, the main goal of this article is to describe the mechanisms regulating the stability of the transcription factor NFATc3. Trim17 plays a role in this regulation by inhibiting Trim39. It is particularly important for understanding the impact of these mechanisms on neuronal apoptosis as Trim17 is induced in these conditions. As we want to reach a wide audience, we prefer not to focus our manuscript on the identification of a new STUbL. However, we agree with the reviewer that it would be very interesting to strengthen this part of our work and we are grateful for his/her suggestions.

      2.Whereas the cellular experiments to indicate that TRIM39 acts as a STUbL are properly carried out, the observed effects are not necessarily direct. Direct evidence that TRIM39 is indeed a STUbL for sumoylated NFATc3 needs to be obtained in vitro, using purified recombinant proteins. Does TRIM39 indeed preferentially ubiquitinate sumoylated NFATc3? Is ubiquitination reduced for non-sumoylated NFATc3? Is ubiquitination of sumoylated NFATc3 dependent on SIM3 of TRIM39? Do other SIMs in TRIM39 contribute?

      We agree with the reviewer that additional in vitro experiments using purified recombinant proteins would strengthen the characterization of Trim39 as a STUbL. In order to answer the specific questions of the reviewer, we propose to perform in vitro ubiquitination using different forms of GST-Trim39 (WT/mSIM3/mSIM1&2) following in vitro SUMOylation (or not) of NFATc3 produced by TnT (wheat germ) and purified by immunoprecipitation. Preliminary results using WT Trim39 show that indeed the in vitro ubiquitination of NFATc3 is improved by prior in vitro SUMOylation. We have to confirm these results and to test the SIM mutants of Trim39 in the same conditions.

      3.Rule out potential roles for other STUbLs by including control knockdowns of RNF4 and RNF111 and verify the sumoylation of NFATc3 and ubiquitination of wildtype and sumoylation-mutant NFATc3.

      Our data show that silencing of Trim39 deeply decreases the ubiquitination level of NFATc3 in Neuro2A cells, indicating that Trim39 plays a major role in this process. We agree that this does not exclude the possible involvement of other STUbLs in NFATc3 ubiquitination in this model but their potential contribution would be limited. This point will be better addressed in the discussion.

      4.Figure 6B: use SUMO inhibitor ML-792 to demonstrate that ubiquitination of wildtype NFATc3 by TRIM39 is dependent on sumoylation.

      We thank the reviewer for suggesting this experiment, which can easily improve the strength of our demonstration. Our preliminary results indeed indicate that in vivo ubiquitination of NFATc3 by Trim39 is strongly decreased following treatment with the SUMO inhibitor ML-792. We have to confirm these results.

      **Minor points:**

      5.Figure 1A and B: demonstrate by immunoprecipitation and Western that the endogenous counterparts indeed interact.

      We are currently setting the conditions to immunoprecipitate endogenous NFATc3 and Trim39 in order to demonstrate that they indeed interact.

      6.Figure 1C and 1E: Quantify the PLA results properly and perform statistics.

      We will perform this quantification and statistical analysis as requested.

      7.Figure 2B: Correct unequal loading of samples.

      We agree with the reviewer (as with reviewer #2) that the blots showing the total lysates of this experiment are confusing. As mentioned in the legend, some material was lost during the TCA precipitation, resulting in unequal loading. However, these experiments were performed a very long time ago and we do not have the protein extracts anymore. We are currently trying to produce efficient shRNA-expressing lentiviruses to reproduce this experiment and provide a better figure.

      8.Figure 6B: proper statistics are needed here from at least three independent experiments.

      The reviewer is right. Statistics are needed to reinforce the significance of these results. We have quantified three independent experiments and made graphs and statistics that will be presented in the revised version of the manuscript. They better support our conclusion.

      Reviewer #1 (Significance (Required)):

      Humans have over 600 different ubiquitin E3s. Currently, RNF4 and RNF111 are the only known human SUMO-Targeted Ubiquitin Ligases (STUbLs). Here, the authors present evidence that the ubiquitin E3 ligase TRIM39 is a STUbL for sumoylated NFATc3. Identification of a new STUbL is an exciting finding for the ubiquitin and SUMO field and for the field of ubiquitin-like signal transduction in general, but needs to be strengthened as outlined above. My field of expertise is SUMO and ubiquitin signal transduction.

      Reviewer #2

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work. Here are some comments.

      **Major comments**

      -In Fig. 2B, the levels of material loaded are uneven, which complicates the interpretation.

      We agree with the reviewer (as with reviewer #1) that the blots showing the total lysates of this experiment are confusing. As mentioned in the legend, some material has been lost during the TCA precipitation, resulting in unequal loading. In the other experiments, we have the same problem or the background is too high. We are currently trying to produce efficient shRNA-expressing lentiviruses to reproduce this experiment and provide a better figure.

      However, it seems that the control shRNA also has an effect on NFATc3 ubiquitination, which should not be the case.

      It is true that, in the present figure, the ubiquitination signal is decreased in cells transduced with the control shRNA. However, this is likely due to reduced expression of transfected NFATc3 following lentiviral infection, as it can be seen on the western blot of total lysates.

      Also, by reducing ubiquitination by TRIM39, shouldn't you expect an increase in the levels of NFATc3, if this ubiquitination was driving degradation? The authors do not specify whether those cells were treated or not with proteasomal inhibitor.

      We agree that an increase in the protein level of NFATc3 is expected following silencing of Trim39. However, in the assay presented in Figure 2B, NFATc3 is transfected and the part of overexpressed NFATc3 that is ubiquitinated by endogenous Trim39 is certainly low. Therefore, silencing of Trim39 cannot have a visible impact on the total protein level of NFATc3.

      Indeed, cells were treated with proteasome inhibitor. It is mentioned in the legend of Figure 2A. To avoid repeating it in the legend of Figure 2B, we just wrote that, after 24h transfection, cells were treated as in A, which includes MG-132 treatment for 6h.

      The same applies in Figure 4B, where no reduction in NFATc3 is seen after including TRIM39 in the reaction (beyond the fact that it looks reduced because of the presence of ubiquitinated forms).

      In Figure 4B, the reaction of ubiquitination is performed in an acellular medium with purified recombinant proteins. Although NFATc3 is produced by in vitro transcription/translation in wheat germ extract, it is purified by immunoprecipitation before in vitro ubiquitination. Therefore, the reaction does not contain any proteasome and NFATc3 should not be degraded following its ubiquitination by TRIM39.

      -After the experiments in vitro shown in Fig. 2C, the authors conclude that the NFATc3 is a direct substrate of TRIM39. I think the authors used the right approach by using bacterially produced GST-TRIM39 for the ubiquitination reaction. However NFATc3 is produced by an in vitro transcription-translation system, which could in principle provide other contaminant proteins to the reaction. Did the authors try to use bacterially produced NFATc3? This might be difficult in the case of big proteins, in which case the authors could add some caution note in the text. Same applies in Figure 4B.

      The reviewer is right. It would have been preferable to use NFATc3 produced in bacteria. Indeed, we started with this approach. However, it was very difficult to get NFATc3 expressed in bacteria, and when we succeeded, most of the protein was degraded. We tried different protease inhibitor cocktails and we used a strain of bacteria (BL21-CodonPlus(DE3)-RP) that is mutated on the genes coding for the proteases Lon and OmpT and is further engineered to express tRNAs that are often limiting when expressing mammalian proteins. Unfortunately, this did not improve our production enough.

      We agree that, in principle, in vitro transcription-translation (TnT) systems can include contaminant proteins. However, we used wheat germ extract to produce NFATc3 by TnT. Moreover, we immunopurified NFATc3 from the TnT reaction prior to the ubiquitination reaction. The probability that proteins modifying NFATc3 are expressed in plants and are co-purified with NFATc3 is low. Nevertheless, we will discuss this point in the result section of the revised version of the manuscript, when describing results of Figure 2B and 4B.

      -In Fig. 6B, higher levels of ubiquitination in the different SUMOylation mutants are shown. Is this effect consistent? How can this be explained?

      We are grateful to the reviewer for pointing out this inconsistency in our manuscript. It will be corrected. Indeed, the values indicated in red in Figure 6B are confusing and are certainly not consistent. We calculated them by normalizing the intensity of the ubiquitination signal to the intensity of NFATc3 in total lysates, which seems to have introduced a bias. Variations in NFATc3 levels are probably responsible for the artificially higher levels of ubiquitination for different SUMOylation mutants after normalization. When quantifying three independent experiments, as requested by reviewer #1, we realized that results are much more consistent without normalization. Therefore, in the revised version of the manuscript, we will add a graph showing the average and standard deviation of three independent experiments quantified without normalization. We will also replace the experiment currently presented in Figure 6B by another one in which the levels of NFATc3 show lower variations in the total lysates.
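The effect described above, where dividing by a variable total-lysate signal makes replicate values less consistent, can be illustrated with a small numerical sketch. All intensities below are invented, arbitrary-unit values chosen only to show the arithmetic; they are not measurements from the manuscript, and the helper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical band intensities (arbitrary units) from three experiments.
# These numbers are illustrative only, NOT data from the manuscript.
ubiquitination_signal = [100.0, 110.0, 95.0]
# Total-lysate NFATc3 signal, assumed here to vary strongly between
# experiments (for example because of uneven TCA precipitation).
lysate_nfatc3 = [1.0, 0.5, 1.6]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Normalized signal: ubiquitination divided by total-lysate NFATc3.
normalized = [u / l for u, l in zip(ubiquitination_signal, lysate_nfatc3)]

raw_cv = coefficient_of_variation(ubiquitination_signal)
normalized_cv = coefficient_of_variation(normalized)

print(f"raw:        mean={mean(ubiquitination_signal):.1f}, CV={raw_cv:.2f}")
print(f"normalized: mean={mean(normalized):.1f}, CV={normalized_cv:.2f}")
```

With these made-up numbers the raw signal is nearly constant across experiments, while the normalized values spread widely because the denominator is noisy. The sketch does not establish which quantification is appropriate for the real blots.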

      In addition, variations in the levels of NFATc3 are shown in the total lysate, despite the use of proteasomal inhibitors. How do the authors explain this effect?

      These variations in NFATc3 levels in the total lysates may be due to differential protein precipitation by TCA. That is why, in more recent experiments, we collected a portion of the homogenous cell suspension before lysis in the guanidinium buffer, to assess the expression level of transfected proteins (as presented in Figures 4A and 7E).

      It is true that treatment with proteasome inhibitor should attenuate differences in protein level due to different ubiquitination levels. However, cells are transfected for 24h and then treated with MG-132 for 6h before lysis. Proteasome inhibition cannot compensate for what occurred in the cells during the 24h transfection. It is added essentially to accumulate poly-ubiquitinated forms of NFATc3.

      Somehow, this is contradictory with the general message of SUMOylation-dependent ubiquitination.

      The reduced levels of SUMOylation mutants in total lysates may appear to be contradictory with SUMOylation-dependent ubiquitination. However, as mentioned above, this could be due to differential protein precipitation by TCA or to different transfection efficiencies. In contrast, the half-life measurement of the WT and EallA mutant, which does not rely on initial expression levels, clearly shows a stabilization of the SUMOylation mutant. Moreover, the average of the three ubiquitination experiments supports this conclusion. Therefore, we believe that the data that will be presented in the revised manuscript will strongly support our hypothesis.

      -In Fig. 7E, it is not clear to me what the big bands above 130 kDa are after the nickel beads. Do they correspond to monoUb NFATc3 or to the unmodified protein that is sticky to the beads? Do the authors have side-by-side gels of the initial lysate next to the nickel beads eluates to show the increase in molecular weight?

      The big bands above 130 kDa among nickel bead-purified proteins in Figure 7E are unlikely to be unmodified NFATc3 sticking to the beads. Indeed, in the control condition, in which NFATc3 is overexpressed in the absence of His-ubiquitin, these bands are not visible. Therefore, they might be mono-ubiquitinated forms of NFATc3, or degradation products of poly-ubiquitinated NFATc3. We will correct the figure to clarify this point. Unfortunately, we do not have a gel with nickel bead eluates and total lysates side by side for this experiment.

      -Quantifications in some pictures (i.e. Figures 5A, 5B, 6B, 7) are shown in red above or below the bands. It is not clear whether the quantifications shown correspond to that single experiment or are the average of several experiments. In the first case, the numbers would not be very valuable. Authors could add quantification graphs with standard deviations or error bars to the experiments if they want to make the point of changes (significant or not) in the levels. Alternatively, indicate in the Figure legends whether the numbers correspond to the average of several experiments.

      These quantifications correspond to the representative experiments shown in the different figures. We will clarify this point in the Figure legends of the revised manuscript. We added these quantifications to normalize the amount of co-precipitated proteins by the amount of the precipitated partner (Fig 5A, 5B, 7B, 7C, 7D) which is not always precipitated with the same efficiency in the different conditions. We think that it should help the reader to assess the degree of interaction. We also added quantifications to Figure 7E to normalize the ubiquitination signal by the amount of NFATc3 expressed in the total lysate. However, we did not want to overload the figures by adding too many graphs.

      For Figure 6B, where TCA precipitation of total lysates created an inconsistency, we will provide a graph with the average and standard deviation of three independent experiments, as requested by reviewer #1.

      -In Fig. 8, the quantification of apoptotic nuclei has been done just based on the morphology after DAPI staining. Could you use an apoptosis marker (i.e. cleaved caspase Abs) to label the apoptotic cells?

      We have been using primary cultures of cerebellar granule neurons (CGN) as an in vitro model of neuronal apoptosis for many years. Nuclear condensation, visualized after DAPI staining, is very characteristic in these neurons and allows a reliable assessment of neuronal apoptosis. In a previous study (Desagher et al. JBC 2005), we have shown that the kinetics of apoptosis in CGN is the same whether we measure cytochrome c release, active caspase 3 or nuclear condensation (Fig 1b). We therefore believe that the counting of apoptotic nuclei is sufficient to support our conclusions, notably for transfection experiments in Figure 8A which would require a lot of work to be repeated with active caspase 3 staining. However, if we can produce efficient shRNA-expressing lentiviruses, we will reproduce the experiment presented in Figure 8B and we will perform a western blot using anti-active caspase 3 to confirm our conclusion.

      **Minor comments**

      -In Figs. 1 and 5, the red channel should be put in black and white, as it is much easier to see the signal. Not relevant to have DAPI alone in B&W (it does not hurt either), as it is well visible in the merge picture. Also, quantification of the PLA positive dots should be shown in Fig. 1.

      We thank the reviewer for these suggestions. We will modify the figures and we will quantify the PLA dots in Figure 1 as requested.

      -In Fig. 3C, is the difference in TRIM17 expression between empty plasmid and NFATc3 plasmid significant? If so, indicate it in the graph. The same in panels D, E, indicate all significant differences. Same in other Figures.

      No, the difference in Trim17 expression is not statistically significant between NFATc3 and empty plasmid although it clearly increases. However, we agree with the reviewer that more significant differences could be shown in the figures, particularly in Figure 3. Nonetheless, we will try not to overload the figures and will restrict ourselves to comparisons that make sense.

      -It would be nice to show a scheme of the location of SIMs in TRIM39 in relation to the other features of the protein.

      We are grateful to the reviewer for this suggestion. We will be happy to add a scheme of Trim39 showing its different domains and the location of its SIMs in the revised Figure 7.

      -In Fig. 2 legend, "Note that in the presence of ubiquitin the unmodified form of WT GST-Trim39 is lower due to high Trim39 ubiquitination." Please change to "...in the presence of ubiquitin the levels of the unmodified form..."

      -In Fig. 7 legend, the phrases "The intensity of the bands ... " are not clear. Please rephrase.

      -In Fig. 8 legend, "** * P<0.001". Change to "*** P<0.001".

      We thank the reviewer for pointing out typographical errors and awkward sentences in our manuscript. Changes will be made as requested.

      Reviewer #2 (Significance (Required)):

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work.

      Audience: researchers interested in proteostasis in general and in nervous system regulation

      My expertise: post-translational modifications

      Reviewer #3

      **Summary:**

      In this study, Shrivastava et al. elucidated the previously unknown function of TRIM39 in regulating protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, where it plays a pro-apoptotic role. NFATs have been shown to be regulated by multiple mechanisms, including at the level of protein stability. In this study, the authors identify TRIM39 as the E3 ligase for NFATc3. Interestingly, TRIM39 recognizes the SUMOylated form of NFATc3 and the interaction facilitates its ubiquitylation and subsequent proteasomal degradation. They further showed that binding of TRIM39 to NFATc3 can also be regulated by TRIM17. Like TRIM39, TRIM17 is a ring-finger containing protein previously shown by this group to bind NFATc3, but the interaction resulted in an up- rather than down-regulation of NFATc3. In this study, they offer insight into this paradox: the effect of overexpressed TRIM17 binding to TRIM39 is to inhibit TRIM39-mediated ubiquitylation of NFATc3. Furthermore, they showed that activation of NFATc3 transcriptionally activates TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. Hence, a TRIM17-TRIM39-NFATc3 signaling axis for modulating the protein stability for promoting the activity of NFATc3 in regulating apoptosis in the cerebellar granule neurons induced by KCl deprivation is proposed.

      The key conclusions are convincing. The data in general are of good quality and with many of the key interactions vigorously documented by conducting reciprocal interaction analysis. For knockdown experiments, two independent shRNA sequences were used. However, some issues remain to be addressed:

      **Major comments:**

      1.Figure 1D - the authors should demonstrate the depletion of TRIM39 expression by shRNA in Neuro2A by Western blotting

      We agree with the reviewer that it would be better to provide this control. Unfortunately, we have never been able to observe a convincing decrease in the protein level of Trim39, following knockdown, by Western blotting in Neuro2A cells. This is surprising because the decrease is clearly visible by immunofluorescence in Neuro2A cells, and by western blotting in neurons (see Figure 8C). It is possible that Neuro2A cells, but not neurons, express a protein that is non-specifically recognized by our best anti-Trim39 antibody in western blots and that migrates at the same size as Trim39, thus preventing the investigator from detecting the depletion of Trim39. We will test additional anti-Trim39 antibodies to address this question.

      2.Figure 3 - the author should show overexpression of TRIM39 resulted in reduction of basal level of endogenous NFATc3 due to its effect on protein stability by using CHX or other pulse chase method.

      This is an important point and we have performed many experiments using cycloheximide to measure the half-life of NFATc3 in the presence or the absence of overexpressed Trim39. The results were neither consistent nor reproducible. This is certainly due to the fact that the half-life of endogenous NFATc3 is longer than that of overexpressed Trim39 and that cycloheximide inhibits the expression of both proteins. Therefore, we will perform pulse-chase experiments after metabolic labelling of cells with [35S]-Met. We are currently setting up the conditions to immunoprecipitate endogenous NFATc3 to be able to perform these experiments.

      3.Figure 3 - Does overexpression or knockdown of TRIM39 have an effect on the levels of NFATc3 mRNAs?

      The reviewer is right. It is important to control that overexpression and knockdown of Trim39 do not modify the mRNA level of NFATc3. Therefore, we are currently measuring NFATc3 mRNA levels in all the experiments used to make the graphs of Figure 3. These results will be added to the revised version of the manuscript as supplemental data. First results show no significant change of NFATc3 mRNA levels in these experiments.

      4.Figure 6A - the authors should confirm that the multiple slower-migrating bands are SUMOylated forms of NFATc3 by demonstrating the presence of SUMO in these forms of NFATc3, or alternatively, perform His-SUMO pull-down and probe for NFATc3.

      The reactions shown in Figure 6A have been performed in vitro, with purified recombinant proteins and with NFATc3 produced by in vitro transcription/translation. The wheat germ extract used to produce NFATc3 is unlikely to provide the material needed for post-translational modification of a mammalian protein. However, we agree that it would be better to confirm that slower migrating bands are indeed SUMOylated forms of NFATc3. We may hybridize the membranes with an anti-SUMO antibody but it would give a smear as the enzymes added to the reaction mix are themselves SUMOylated. Therefore, we will show an experiment in which the reaction mix has been incubated with and without SUMO. The results show no slower migrating bands in the absence of SUMO although all conditions were otherwise identical. This will be added to the revised Figure 6.

      5.Figure 7C - the quantification for mSIM1 does not seem to agree with the band intensity.

      Yes, we agree with the reviewer that the quantification (122%) does not seem to reflect the amount of SUMO-chains bound to GST-Trim39 mSIM1. This is due to the normalization of the SUMO signals by the intensity of GST-Trim39 bands. Indeed, it is difficult to control exactly how much recombinant protein is used. GST-Trim39 mSIM1 was slightly less abundant than the other GST-Trim39 proteins in this experiment, explaining why less SUMO-chains were eluted in this condition. The normalization is mentioned in the legend of Figure 7C.
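
      The arithmetic behind this kind of normalization can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalized_percent` and the band intensities are hypothetical, chosen to show the effect, and are not measurements from Figure 7C. It shows how a band that is weaker in absolute terms can still be reported above 100% once it is divided by a lower amount of the precipitated partner.

```python
def normalized_percent(signal, bait, ref_signal, ref_bait):
    """Express signal/bait relative to the reference lane, in percent.

    signal, bait: band intensities in the lane of interest (arbitrary units).
    ref_signal, ref_bait: band intensities in the reference (e.g. WT) lane.
    All names and values here are hypothetical, for illustration only.
    """
    return 100.0 * (signal / bait) / (ref_signal / ref_bait)


# Hypothetical intensities: the mutant lane has less eluted signal (600 vs 1000)
# but also less bait protein (500 vs 1000), so the normalized value exceeds 100%.
mutant_lane = normalized_percent(signal=600, bait=500, ref_signal=1000, ref_bait=1000)
print(f"normalized signal: {mutant_lane:.0f}% of reference")
```

      With such a ratio, the reported percentage reflects binding per unit of bait rather than the raw band intensity, which is why the number and the visual impression of the blot can diverge.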

      6.TRIM17 reduces TRIM39/NFATc3 interaction and inhibits TRIM39 E3 activity, which results in stabilization of NFATc3. NFATc3 in turn transcriptionally induces TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. It would be good if the authors could discuss the possibility of the existence of this feedback mechanism in a physiological context. Is the protein level of NFATc3, which should be of low abundance at the resting state, elevated by KCl deprivation? If so, can the authors discuss the possible signalling event(s) that lead to activation of NFATc3 upon KCl deprivation? For instance, does KCl deprivation cause de-SUMOylation of NFATc3?

      We thank the reviewer for these suggestions. Our preliminary results suggest that the protein level of NFATc3 is increased in neurons following KCl deprivation. We are currently performing additional experiments to confirm this result. If proved, this increase may be due to the transcriptional induction of Trim17 that should result in the stabilization of NFATc3 through the inhibition of Trim39. It may also be due to a possible deSUMOylation of NFATc3 following apoptosis induction, as suggested by the reviewer. To address the latter point, we are currently setting up PLA using anti-NFATc3 and anti-SUMO antibodies to assess the SUMOylation level of endogenous NFATc3 in neurons. If they are of good quality, we will add these data to Figure 8 and we will discuss the possible existence of feedback loops in neuronal apoptosis, as suggested by the reviewer.

      **Minor comments:**

      1.Line 294 - it should be "SUMOylation" instead of "SUMO".

      We thank the reviewer for pointing out this typographical error that will be corrected.

      2.Figure 8 - to include TRIM39/NFATc3 double knockdown to show the effect on increased neuronal apoptosis in the cells with TRIM39 knocked down was due to elevation of NFATc3 rather than other target(s) of TRIM39.

      We agree that it would be interesting to test whether the increase in neuronal apoptosis following Trim39 silencing is mainly due to its effect on NFATc3. We will therefore perform double silencing of Trim39 and NFATc3 in neurons in order to address this point.

      3.The discussion may be shortened and revised to highlight the physiological importance of the findings linked to cerebellar granule neurons survival.

      As suggested by the reviewer, we will modify the discussion to better highlight the physiological implications of our data, particularly by discussing the results of the additional experiments we will conduct in neurons.

      Reviewer #3 (Significance (Required)):

      Prior to this study, the mechanism by which protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, is regulated remained poorly understood. Shrivastava et al. have unravelled the interplay between ubiquitylation and SUMOylation involving TRIM39 and TRIM17, which has an important role in regulating protein stability of NFATc3. The work is interesting and bears significance towards understanding how apoptosis could be finely controlled in cerebellar granule neurons. Furthermore, the study has also expanded the understanding of the role and regulation of the TRIM family of proteins. The senior author is an expert in this field and over the years, her group has contributed many key discoveries on the function of the TRIM family of E3 ubiquitin ligases and their critical ubiquitylation substrates in neuronal survival and its relevance to neuronal biology and diseases.

      The referee's field of expertise is mitochondrial apoptosis signalling. The referee is extensively involved in studying how the protein stability of regulators in apoptosis signalling is regulated by the ubiquitin-proteasome system (UPS) and how this regulation plays a role in physiology and diseases.

      Key words: apoptosis, ubiquitylation, cell signaling, liver diseases

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this study, Shrivastava et al. elucidated the previously unknown function of TRIM39 in regulating protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, where it plays a pro-apoptotic role. NFATs have been shown to be regulated by multiple mechanisms, including at the level of protein stability. In this study, the authors identify TRIM39 as the E3 ligase for NFATc3. Interestingly, TRIM39 recognizes the SUMOylated form of NFATc3 and the interaction facilitates its ubiquitylation and subsequent proteasomal degradation. They further showed that binding of TRIM39 to NFATc3 can also be regulated by TRIM17. Like TRIM39, TRIM17 is a ring-finger containing protein previously shown by this group to bind NFATc3, but the interaction resulted in an up- rather than down-regulation of NFATc3. In this study, they offer insight into this paradox: the effect of overexpressed TRIM17 binding to TRIM39 is to inhibit TRIM39-mediated ubiquitylation of NFATc3. Furthermore, they showed that activation of NFATc3 transcriptionally activates TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. Hence, a TRIM17-TRIM39-NFATc3 signaling axis for modulating the protein stability for promoting the activity of NFATc3 in regulating apoptosis in the cerebellar granule neurons induced by KCl deprivation is proposed.

      The key conclusions are convincing. The data in general are of good quality and with many of the key interactions vigorously documented by conducting reciprocal interaction analysis. For knockdown experiments, two independent shRNA sequences were used. However, some issues remain to be addressed:

      Major comments:

      1.Figure 1D - the authors should demonstrate the depletion of TRIM39 expression by shRNA in Neuro2A by Western blotting

      2.Figure 3 - the author should show overexpression of TRIM39 resulted in reduction of basal level of endogenous NFATc3 due to its effect on protein stability by using CHX or other pulse chase method.

      3.Figure 3 - Does overexpression or knockdown of TRIM39 have an effect on the levels of NFATc3 mRNAs?

      4.Figure 6A - the authors should confirm that the multiple slower-migrating bands are SUMOylated forms of NFATc3 by demonstrating the presence of SUMO in these forms of NFATc3, or alternatively, perform His-SUMO pull-down and probe for NFATc3.

      5.Figure 7C - the quantification for mSIM1 does not seem to agree with the band intensity.

      6.TRIM17 reduces TRIM39/NFATc3 interaction and inhibits TRIM39 E3 activity, which results in stabilization of NFATc3. NFATc3 in turn transcriptionally induces TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. It would be good if the authors could discuss the possibility of the existence of this feedback mechanism in a physiological context. Is the protein level of NFATc3, which should be of low abundance at the resting state, elevated by KCl deprivation? If so, can the authors discuss the possible signalling event(s) that lead to activation of NFATc3 upon KCl deprivation? For instance, does KCl deprivation cause de-SUMOylation of NFATc3?

      Minor comments:

      1.Line 294 - it should be "SUMOylation" instead of "SUMO".

      2.Figure 8 - to include TRIM39/NFATc3 double knockdown to show the effect on increased neuronal apoptosis in the cells with TRIM39 knocked down was due to elevation of NFATc3 rather than other target(s) of TRIM39.

      3.The discussion may be shortened and revised to highlight the physiological importance of the findings linked to cerebellar granule neurons survival.

      Significance

      Prior to this study, the mechanism by which protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, is regulated remained poorly understood. Shrivastava et al. have unravelled the interplay between ubiquitylation and SUMOylation involving TRIM39 and TRIM17, which has an important role in regulating protein stability of NFATc3. The work is interesting and bears significance towards understanding how apoptosis could be finely controlled in cerebellar granule neurons. Furthermore, the study has also expanded the understanding of the role and regulation of the TRIM family of proteins. The senior author is an expert in this field and over the years, her group has contributed many key discoveries on the function of the TRIM family of E3 ubiquitin ligases and their critical ubiquitylation substrates in neuronal survival and its relevance to neuronal biology and diseases.

      The referee's field of expertise is mitochondrial apoptosis signalling. The referee is extensively involved in studying how the protein stability of regulators in apoptosis signalling is regulated by the ubiquitin-proteasome system (UPS) and how this regulation plays a role in physiology and diseases.

      Key words: apoptosis, ubiquitylation, cell signaling, liver diseases

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system. The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work. Here are some comments.

      Major comments

      -In Fig. 2B, the levels of material loaded are uneven, which makes the interpretation difficult. However, it seems that the control shRNA also has an effect on NFATc3 ubiquitination, which should not be the case. Also, by reducing ubiquitination by TRIM39, shouldn't you expect an increase in the levels of NFATc3, if this ubiquitination was driving degradation? The authors do not specify whether those cells were treated or not with proteasomal inhibitor. The same applies to Figure 4B, where no reduction in NFATc3 is seen after including TRIM39 in the reaction (beyond the fact that it looks reduced because of the presence of ubiquitinated forms).

      -After the in vitro experiments shown in Fig. 2C, the authors conclude that NFATc3 is a direct substrate of TRIM39. I think the authors used the right approach by using bacterially produced GST-TRIM39 for the ubiquitination reaction. However, NFATc3 is produced by an in vitro transcription-translation system, which could in principle provide other contaminant proteins to the reaction. Did the authors try to use bacterially produced NFATc3? This might be difficult in the case of big proteins, in which case the authors could add a cautionary note in the text. The same applies to Figure 4B.

      -In Fig. 6B, higher levels of ubiquitination in the different SUMOylation mutants are shown. Is this effect consistent? How can this be explained? In addition, variations in the levels of NFATc3 are shown in the total lysate, despite the use of proteasomal inhibitors. How do the authors explain this effect? Somehow, this is contradictory with the general message of SUMOylation-dependent ubiquitination.

      -In Fig. 7E, it is not clear to me what the big bands above 130 kDa are after the nickel beads. Do they correspond to monoUb NFATc3 or to the unmodified protein that is sticky to the beads? Do the authors have side-by-side gels of the initial lysate next to the nickel beads eluates to show the increase in molecular weight?

      -Quantifications in some pictures (i.e. Figures 5A, 5B, 6B, 7) are shown in red above or below the bands. It is not clear whether the quantifications shown correspond to that single experiment or are the average of several experiments. In the first case, the numbers would not be very valuable. Authors could add quantification graphs with standard deviations or error bars to the experiments if they want to make the point of changes (significant or not) in the levels. Alternatively, indicate in the Figure legends whether the numbers correspond to the average of several experiments.

      -In Fig. 8, the quantification of apoptotic nuclei has been done just based on the morphology after DAPI staining. Could you use an apoptosis marker (i.e. cleaved caspase Abs) to label the apoptotic cells?

      Minor comments

      -In Figs. 1 and 5, the red channel should be put in black and white, as it is much easier to see the signal. Not relevant to have DAPI alone in B&W (it does not hurt either), as it is well visible in the merge picture. Also, quantification of the PLA positive dots should be shown in Fig. 1.

      -In Fig. 3C, is the difference in TRIM17 expression between empty plasmid and NFATc3 plasmid significant? If so, indicate it in the graph. The same in panels D, E, indicate all significant differences. Same in other Figures.

      -It would be nice to show a scheme of the location of SIMs in TRIM39 in relation to the other features of the protein.

      -In Fig. 2 legend, "Note that in the presence of ubiquitin the unmodified form of WT GST-Trim39 is lower due to high Trim39 ubiquitination." Please change to "...in the presence of ubiquitin the levels of the unmodified form..."

      -In Fig. 7 legend, the phrases "The intensity of the bands ... " are not clear. Please rephrase.

      -In Fig. 8 legend, "** * P<0.001". Change to "*** P<0.001".

      Significance

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work.

      Audience: researchers interested in proteostasis in general and in nervous system regulation

      My expertise: post-translational modifications

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Desagher and co-workers investigate the regulation of the NFAT family member NFATc3, a transcription factor in neurons with a pro-apoptotic role. They identify TRIM39 as a ubiquitin E3 ligase regulating NFATc3. They demonstrate that TRIM39 can bind and ubiquitinate NFATc3 in vitro and in cells. They identify a critical SUMO interaction motif in TRIM39, that is required for its interaction with NFATc3 and for its ability to ubiquitinate NFATc3. Moreover, mutating sumoylation sites in NFATc3 reduces the interaction with TRIM39 and reduces its ubiquitination. Silencing TRIM39 increases the protein levels of NFATc3 and its transcriptional activity, leading to apoptosis of neurons. TRIM17 modulates the TRIM39-NFATc3 axis. Combined, TRIM39 appears to be a SUMO-targeted ubiquitin ligase (STUbL) for NFATc3 in neurons.

      Major points:

      1.This manuscript contains two stories: the rather exciting story that TRIM39 is a STUbL for NFATc3 (as mentioned in the title) and the second, less exciting story that TRIM17 modulates the regulation of NFATc3 by TRIM39. These stories are now mixed in a confusing manner, disrupting the flow of the first story. It would be better to focus the current manuscript on the first story and strengthen it further, and to develop the second story in a second manuscript.

      2.Whereas the cellular experiments to indicate that TRIM39 acts as a STUbL are properly carried out, the observed effects are not necessarily direct. Direct evidence that TRIM39 is indeed a STUbL for sumoylated NFATc3 needs to be obtained in vitro, using purified recombinant proteins. Does TRIM39 indeed preferentially ubiquitinate sumoylated NFATc3? Is ubiquitination reduced for non-sumoylated NFATc3? Is ubiquitination of sumoylated NFATc3 dependent on SIM3 of TRIM39? Do other SIMs in TRIM39 contribute?

      3.Rule out potential roles for other STUbLs by including control knockdowns of RNF4 and RNF111 and verify the sumoylation of NFATc3 and ubiquitination of wildtype and sumoylation-mutant NFATc3.

      4.Figure 6B: use SUMO inhibitor ML-792 to demonstrate that ubiquitination of wildtype NFATc3 by TRIM39 is dependent on sumoylation.

      Minor points:

      5.Figure 1A and B: demonstrate by immunoprecipitation and Western that the endogenous counterparts indeed interact.

      6.Figure 1C and 1E: Quantify the PLA results properly and perform statistics.

      7.Figure 2B: Correct unequal loading of samples.

      8.Figure 6B: proper statistics are needed here from at least three independent experiments.

      Significance

      Humans have over 600 different ubiquitin E3s. Currently, RNF4 and RNF111 are the only known human SUMO-Targeted Ubiquitin Ligases (STUbLs). Here, the authors present evidence that the ubiquitin E3 ligase TRIM39 is a STUbL for sumoylated NFATc3. Identification of a new STUbL is an exciting finding for the ubiquitin and SUMO field and for the field of ubiquitin-like signal transduction in general, but needs to be strengthened as outlined above. My field of expertise is SUMO and ubiquitin signal transduction.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      **A. Summary:**

      In this modeling study, the authors devised a multicellular model to investigate how circadian clocks in different parts (organs) of plants coordinate their timing. The model uses a plausible mechanism to explain how having a different sensitivity to light leads to different phases and periods of the circadian clock, which is observed in different plant organs. The model allows for entrainment in Light-Dark (LD) cycles and then a release into always-light (LL) environments. The model disentangles numerous factors that have confounded previous experiments. In one instance, the authors assigned different light sensitivities to the different organs (e.g., root tip, hypocotyl, etc.), which unambiguously shows that this one element alone - spatially differing sensitivity to light - is sufficient for recapitulating experimentally observed differences in periods and phases between plant organs. The model also recapitulates the spatial waves of gene expression within and between organs that experimentalists reported. At the sub-tissue level, the model-produced waves have similar patterns to the experimentally observed waves. This confirmation further validates the model. Having the cells share clock mRNA from any of the clock component genes produced the same experimentally observed spatial dynamics. The main conclusion of the study is that regional differences (e.g., between different organs) in light sensitivities, when combined with cell-to-cell sharing of clock-gene mRNAs, enable robust yet flexible circadian timing under noisy environmental cycles.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      **B. Specific points:**

      1. Lines 125-127: "To simulate the variability observed in single cell clock rhythms, we multiplied the level of each mRNA and protein by a time scaling parameter that was randomly selected from a normal distribution." - Why not add a white (Gaussian) noise term to these equations? How is multiplying by a random variable (for rescaling time) different from my proposal? Some explanation should be given in the text here.

      We opted for a time scaling approach as this generates between-cell period differences but avoids within-cell period differences. This is consistent with single cell experiments (S1 Fig; Gould et al., 2018, eLife). We will provide an explanation of this in the text.
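
      The distinction can be illustrated with a minimal sketch. It is not the model code: it assumes that scaling time by a per-cell factor `tau` divides the free-running period by `tau`, and the names `cell_periods`, `peak_to_peak`, `base_period` and `sd` are hypothetical, with illustrative values.

```python
import math
import random

def cell_periods(n_cells, base_period=24.0, sd=0.05, seed=0):
    """Draw one time-scaling factor per cell and return each cell's period.

    Assumption: scaling time by tau divides the free-running period by tau.
    base_period and sd are illustrative values, not fitted parameters.
    """
    rng = random.Random(seed)
    taus = [rng.gauss(1.0, sd) for _ in range(n_cells)]
    return [base_period / tau for tau in taus]

def peak_to_peak(period, n_peaks=5):
    """Peak-to-peak intervals of a deterministic oscillator with a fixed period."""
    peaks = [k * period for k in range(n_peaks)]
    return [b - a for a, b in zip(peaks, peaks[1:])]

periods = cell_periods(5)             # differs between cells
intervals = peak_to_peak(periods[0])  # constant within one cell
```

      In this toy setting the periods differ from cell to cell, while every peak-to-peak interval within a cell is the same. An additive white-noise term would instead make the intervals within a single cell vary from cycle to cycle.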

      2. Does the spatial network model simplify calculations by assuming separations of timescales (e.g., for equilibration in concentrations of mRNAs that diffuse between cells)? If so, it would be good to spell these out in the beginning of the Results section (where the model is described).

      We agree that a more detailed discussion of the model assumptions would be beneficial and we will provide this in the text.

      3. Lines 161-162: "....in a phase only model by local...." should be "....in a phase model only by local...."

      Thank you for your correction.

      4. Lines 188-190: The authors observed that qualitatively similar/indistinguishable behaviors arose regardless of which elements are varied (e.g., global versus local cell-cell coupling, setting light input to be equal in all regions of the seedling, etc.). Then they claim here that "...these results show that the assumptions of local cell-to-cell coupling and differential light sensitivity between regions are the key aspects of our model that allow a match to experimental data." - I don't see how this follows from the observation that almost any of the variations lead to the same behaviors in this section (spatial waves). Show the reasoning in the text here.

      We observed spatial waves with different local coupling regimes (4 v. 8 nearest neighbours). However, we did not observe spatial waves with global coupling (S10 Fig). This led us to conclude that local coupling is a key aspect. In addition, we do not observe waves when setting the light input to be equal in all regions of the seedling (S11 Fig). This confirms that local differences in light sensitivity are also required in our simulations to generate spatial waves. We will clarify these points with revisions to the text.
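
      The difference between the coupling regimes can be sketched on a toy grid. This is an illustration only: the coupling function of the model is not reproduced here, and `neighbours` and `coupling_input` are hypothetical helpers that simply average the shared level over either the nearest neighbours or all cells.

```python
def neighbours(r, c, n_rows, n_cols, k=4):
    """Grid neighbours of cell (r, c): k=4 (edge-sharing) or k=8 (edges and corners)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if k == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]

def coupling_input(grid, r, c, mode="local", k=4):
    """Mean shared level seen by cell (r, c) under local or global coupling."""
    if mode == "global":
        cells = [v for row in grid for v in row]
    else:
        cells = [grid[i][j] for i, j in neighbours(r, c, len(grid), len(grid[0]), k)]
    return sum(cells) / len(cells)

# A 1 x 5 strip of cells with a spatial gradient in the shared level.
grid = [[0.0, 1.0, 2.0, 3.0, 4.0]]
local = [coupling_input(grid, 0, c, "local") for c in range(5)]
global_ = [coupling_input(grid, 0, c, "global") for c in range(5)]
```

      With local coupling each cell still sees a position-dependent input, so a gradient can persist and propagate. With global coupling every cell sees the same average, which removes the spatial information that a wave requires.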

      5. Pgs. 9-10: Section on "Cell-to-cell coupling maintains global coordination under noisy light-dark cycles": The simulation results rigorously support the authors' main conclusion here, which is that local cell-to-cell coupling allows for global coordination under noisy LD cycles. But I'm missing an intuitive explanation (or just any explanation) for why this is. At the end of this section, the authors should provide some intuition or qualitative explanation for the observations that they produced using their model in this section.

      We will revise the text to provide an intuitive explanation of these results. The coupling decreases the within-region phase differences. Despite the between-regions phase differences persisting, this effect is sufficient to improve the overall global synchrony.
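
      A toy calculation makes this intuition concrete. It is a sketch under stated assumptions: the Kuramoto order parameter is used as a stand-in for the synchronisation index (the definition used in the manuscript is not given here), the phases are invented, and `couple_within` is a hypothetical one-step coupling rule.

```python
import cmath
import math

def circular_mean(phases):
    """Mean phase (radians) of a set of oscillators."""
    return cmath.phase(sum(cmath.exp(1j * p) for p in phases))

def sync_index(phases):
    """Kuramoto order parameter in [0, 1]; 1 means all phases are identical."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

def couple_within(phases, strength=0.5):
    """Pull each phase toward the mean phase of its own region."""
    mean = circular_mean(phases)
    return [p + strength * math.sin(mean - p) for p in phases]

# Two regions whose mean phases differ by 1 radian (illustrative values).
region_a = [-0.4, 0.0, 0.4]
region_b = [0.6, 1.0, 1.4]

before = sync_index(region_a + region_b)
a2, b2 = couple_within(region_a), couple_within(region_b)
after = sync_index(a2 + b2)
gap = circular_mean(b2) - circular_mean(a2)  # between-region difference
```

      Here the within-region spread shrinks and the global index rises, even though the 1 radian difference between the two regional means is unchanged.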

      6. Lines 261-262: Replace the present tenses with past tenses.

      Thank you for your correction.

      7. Is the main idea that cell-to-cell coupling allows for averaging of fluctuations, between organs or cells within the same organ, while allowing for coordination of the average quantities? Is this responsible for both the flexibility and robustness observed under noisy environmental cycles?

      The cell-to-cell-coupling allows for the averaging of fluctuations between cells and the regional flexibility arises from the different light sensitivities in each region. What was interesting to us was that under light-dark cycles the regional flexibility was not lost due to either the noise in the light or the averaging effect of the cell-to-cell coupling. We will revise the text to emphasize these points. Thank you for your prompts.

      8. Line 304: Is it really true that the mammalian circadian rhythm is centralized? Don't some parts of our bodies have a different circadian clock (e.g., slight differences in phase) from other parts of our bodies?

      There are indeed some small phase differences between parts of our bodies because the mammalian system, like the plant system, is imperfectly coupled. However, the mammalian system is considered more centralized because the suprachiasmatic nucleus in the brain receives the key entraining signal of light and then coordinates rhythms across the body (Bell-Pedersen et al., 2005, Nat Rev Gen; Brown & Azzi, 2013, Circadian Clocks). We will expand on these interesting points by adding a paragraph to the discussion.

      Reviewer #1 (Significance):

      **Overall assessment:**

      I enthusiastically recommend this work for publication after the authors address my comments below (please see "Specific points").

      The model's main strength is that the authors could vary each ingredient separately - light sensitivity of each cell/organ, which gene's mRNA diffuses between cells, cellular noise, local versus global cell-cell coupling, etc. Afterwards, the authors could determine which of these variations produces which experimentally observed behaviors. Another strength of the model is that it can reproduce not just one, but numerous, experimentally observed behaviors that are important for understanding circadian clocks in plants. Thus, the model is grounded in experimental truth and produces experimentally observed results. Crucially, since the authors could vary every single element in the model independently of the other elements, the authors are able to provide plausible explanations for why the experiments produced the results that they did (experimentally, a number of confounding factors prevented one from pinpointing which element produced which observation).

      Another strength is that the model is extendable by other researchers to investigate other plant physiologies in the future (e.g., the circadian clock's influence on cell division). The authors highlight these future uses in the discussion section. Therefore, I believe that this work will be valuable to plant biologists, non-plant biologists who are interested in circadian clocks, and systems biologists in general.

      The manuscript is also well written and relatively easy to follow, even for non-plant biologists like myself.

      Thank you for the positive feedback - we are pleased that you find the manuscript of broad interest to a range of readers.

      Comment on Reviewer #2:

      I agree with his/her major criticism #3 (ELF4 long-distance movement). I find this to be a reasonable request. Fulfilling it would increase the paper's impact.

      Please see our response to reviewer #2.

      Comment on Reviewer #3:

      The reviewer's point (1) is a reasonable request.

      Regarding his/her point (2): This is also reasonable. I'd recommend his/her suggestion (a). In the end, I'd be interested to see how the authors respond to this (what function they choose to let adjacent cells be subjected to some correlated light-input intensity). I'd be happy with something simple such as a deterministic term plus noise, where the deterministic term, for example, decreases exponentially as one moves away from some central cell. Basically, I'd let the authors decide how to implement this and accept their current implementation (no correlation in light intensity between adjacent cells) as an extreme scenario, as this reviewer points out.

      Please see our response to reviewer #3.

      Reviewer #2 (Evidence, reproducibility and clarity):

      **Summary:**

      The manuscript presents an improved model of the circadian clock network that accounts for tissue-specific clock behavior, spatial differences in light sensitivity, and local coupling achieved through intercellular sharing of mRNA. In contrast to whole-plant or "phase-only" models, the authors' approach enables them to address the mechanism behind coupling and how the clock maintains regional synchrony in a noisy environment. Using 34 parameters to describe clock activity and applying the properties mentioned above, the authors demonstrate that their model can recapitulate the spatial waves in circadian gene expression observed and can simulate how the plant maintains local synchrony with regional differences in rhythms under noisy LD cycles. Spatial models that incorporate cell-type-specific sensitivities to environmental inputs and local coupling mechanisms will be most accurate for simulating clock activity under natural environments.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      *We have the following **major criticisms***

      1) When assigning light sensitivities in different regions of the plant, the authors assign a higher sensitivity value to the root tip (L=1.03) than they do to the other part of the root (L=0.90). We are curious why the root tip would have higher light sensitivity than the rest of the root. Is this based on experimental data (if so, please cite in this section or methods)? It seems that these L values were assigned simply to make sure they recapitulated the period differences observed in Fig. 2A. Are these values based on PhyB expression in those organs? Or perhaps based on cell density in those locations?

      We assign the light sensitivity to match observed experimental period differences across the plant (Fig 2A,B). This is based on previous experiments demonstrating that experimental period differences are dependent on light input through the light sensing gene PHYB (Greenwood et al., 2019, PLoS Bio; Nimmo et al., 2020, Physiologia Plantarum). For example, in WT seedlings, the root tip oscillates faster than the root, but this difference is lost in the phyb-9 mutant (Greenwood et al., 2019). Thus, we assume the root tip to be more sensitive to light than the roots.

      Further supporting this assumption, there is evidence that expression of phytochromes and cryptochromes is increased in the root tip relative to the root (e.g., Somers & Quail, 1995, Plant J; Bognar et al., 1999, PNAS; Toth et al., 2001, Plant Physiol), as the reviewer proposes. However, further experiments would be needed to verify that these differences in expression are what lead to the differences in clock timing. We will add a discussion of these experiments to the text.

      2) In the discussion of the test where they set the "light inputs to be equal" in all regions to simulate the phyb-9 mutant, could the authors please clarify whether that means they set the L light sensitivity value equal in all regions?

      This is indeed what we mean; we will rephrase the text for clarity.

      a. If they are referring to setting the L value equal to all regions, we suggest that this discussion be moved to the section about different light sensitivities instead of the local sharing of mRNA section.

      Thank you for your suggestion, we agree and will move this discussion.

      b. Additionally, is it possible to set the light sensitivity to zero for all parts of the plant? We think this would be more suitable to simulate the phyb-9 mutant phenotype.

      We thank the reviewer for this suggestion. We will include a simulation with light sensitivity set to zero in the revised manuscript, in addition to the existing simulations with light sensitivity set to 1.

      3) Based on the recent Chen et al. (2020) paper showing ELF4 long-distance movement, we think it would be of great interest for the authors to model ELF4 protein synthesis/translation as the coupling factor, in addition to the modeling using CCA1/LHY mRNA sharing. We understand you may be saving this analysis for a future modeling paper, but this addition to the paper could increase the impact of this paper.

      Thank you for the suggestion to improve our manuscript. We agree it will be of interest to model ELF4 protein as the local coupling factor. In the revision, we will simulate each clock protein (including ELF4) as the local coupling factor and compare.

      In addition, we will modify the coupling mechanism to simulate the long-distance transport of ELF4 proposed by Chen et al., 2020. Our preliminary simulations show that we can couple shoot rhythms to those in the root tip, but that this long-range coupling cannot on its own generate the spatial structure observed in experiments. We agree with the reviewers that this analysis and an associated discussion will further increase the impact of the paper.

      4) This model is able to simulate circadian rhythms under 12:12 LD cycles, which represents two days of the year (the equinoxes). We are curious if the model can simulate rhythms under short days and long days as well. We understand this analysis may be outside the scope of this paper and may require changing the values of the 34 parameters used, but think it could be a useful addition here or in future work.

      We agree it would be interesting to observe the behavior of the model under different day lengths. We will include simulations under short and long days in the revision.

      *And **minor criticisms** as follows*

      1) In the first paragraph of the results section, it would be helpful for the authors to reference Table S1 when they mention the 34 parameters used to model oscillator function

      We agree and we will implement this helpful suggestion.

      2) In the first paragraph of the section titled "Local flexibility persists under idealized and noisy LD cycles", it would be helpful for the authors to reference S12 Fig after the last sentence that starts "However, ELF4/LUX appeared more synchronized..."

      We agree and we will implement this helpful suggestion.

      3) In the first paragraph of the section titled "Cell-to-cell coupling maintains global communication under noisy light-dark cycles", the authors refer to a "Table 1" but I think they mean to refer to "Table S1".

      Thank you, we will implement this helpful suggestion.

      4) In Fig. 1, panel C is described as demonstrating the cell-to-cell coupling through the "level of CCA1/LHY". This phrasing is vague and we think could be improved to the "mRNA level of CCA1/LHY".

      We agree and will implement this helpful suggestion.

      Reviewer #2 (Significance (Required)):

      This work would be broadly interesting to other researchers studying cell-to-cell signaling and coupling of circadian rhythms in plants and other species where spatial waves of gene expression have been observed (i.e., mice and humans). Additionally, the computational modeling aspect of this work was easily interpretable for someone outside this expertise. Our expertise lies in plant circadian biology.

      We thank the reviewer for recognising the broad appeal of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      **Summary:**

      The authors start by taking a previously published model of the plant circadian clock and implement five changes: 1) updating the network topology to reflect some recent experimental findings, 2) making a spatial model loosely based on a seedling template, 3) introducing coupling between cells based on shared levels of CCA1/LHY, 4) randomly rescaling time in each cell to induce inter-cell differences in period, 5) including a light sensitivity that depends on the region considered.

      For a certain configuration of light sensitivities/intensities, the different periods of oscillations in each seedling region roughly match that of experiments. With a sufficiently high coupling between cells, the system can also generate spatial waves, which are also observed in the experimental system.

      With pulsed light inputs the spatial pattern is still produced. The authors then investigate the robustness to environmental noise by generating stochastic light signals and show that the global synchrony, as measured with a synchronisation index, increases with cell-to-cell coupling strength. The paper is overall well-written, and the background and details of the analysis are well presented.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      **Major comments:**

      For the first part of paper, the output of the model is certainly the focus. There is virtually no discussion of the inferred parameters and how much confidence the authors have in their values.

      Thank you for this point. We will add discussion of the inferred parameters to the initial part of the results.

      My main issue with the paper is about the section with noisy light signals, which is included in the title and is ultimately one of the main themes of the article.

      Specifically, on line 224:

      "This decrease in cell-to-cell variation revealed an underlying spatial structure (Fig 4D, middle and right, and S13 Fig), comparable to that observed under idealized LD cycles (Fig 4B, middle and right, and S12 Fig)."

      Firstly, I don't feel these conclusions match with the data presented. Comparing figure 4D middle and right with figure 4B middle and right shows a clear and pronounced loss in spatial structure. In its current form, this statement has to change, but I believe there are at least two other major issues with this figure:

      We agree there are some differences in the spatial structure between idealized (Fig 4B) and noisy (Fig 4D) LD cycles. Preliminary simulations suggest that this is due to the way the noisy LD cycles are programmed.

      In the current implementation of noisy LD cycles, the maximum intensity of L, L_max, differs between each region, such that relative differences in light sensitivity between regions are maintained. This means that some phase differences between regions are maintained. However, as the reviewer correctly points out in point 1 below, due to the noise fluctuations, the average level of light is lower than under idealized LD cycles, and with considerable day-to-day variation. We believe this is why the spatial structure differs.

      Preliminary simulations suggest that if we normalize the mean light intensity such that the mean is equal between the two conditions (as the reviewer suggests in point 1 below), the spatial structure appears similar. We will present this analysis in the revision.

      1) The figure is clearly designed to invite a comparison between the noise-free light cycles on the left with the noisy cycles on the right. However, due to how the noisy light is simulated, the variance of light signal increases AND the average intensity of light decreases by 50%. When comparing the left and the right, we therefore don't know whether the changes are due to differences in the average signal or differences from the stochasticity. I think the authors should simulate a noisy light signal with the same mean intensity level as the deterministic signal.

      As discussed above, we agree that the average intensity of the light decreases due to the noise, and this complicates interpretation. We will simulate idealized and noisy light cycles with the same mean light level upon revision.
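
      The mean-matching step can be sketched as follows. The noise model is an assumption made for illustration (daytime light multiplied by a uniform random factor, which roughly halves the mean), not a description of the manuscript's implementation. The functions `ideal_ld`, `noisy_ld` and `match_mean` are hypothetical.

```python
import random

def ideal_ld(hours, l_max=1.0, day_length=12):
    """Idealized square-wave light-dark cycle, sampled hourly."""
    return [l_max if (h % 24) < day_length else 0.0 for h in range(hours)]

def noisy_ld(hours, l_max=1.0, day_length=12, seed=0):
    """Assumed noise model: daytime light scaled by a uniform(0, 1) draw."""
    rng = random.Random(seed)
    return [l * rng.random() for l in ideal_ld(hours, l_max, day_length)]

def match_mean(signal, target_mean):
    """Rescale a signal so that its time average equals target_mean."""
    current = sum(signal) / len(signal)
    return [s * target_mean / current for s in signal]

ideal = ideal_ld(240)                      # ten 12:12 LD cycles
noisy = noisy_ld(240)                      # lower mean, higher variance
ideal_mean = sum(ideal) / len(ideal)
matched = match_mean(noisy, ideal_mean)    # same mean as the idealized cycle
```

      After rescaling, any remaining difference between the two conditions can be attributed to the fluctuations rather than to the mean light level. Note that a rescaled signal can exceed `l_max` at individual time points, so a different normalization may be preferable if peak intensity matters.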

      2) The noise model for the light doesn't seem realistic. On line 484 it says:

      "We made the simplifying assumption that each cell is exposed to an independent noisy LD cycle due to their unique positions in the environment. LD cycles were input to the molecular model through the parameter L".

      In fact, this could be considered as an incredibly complex signal, because for 800 cells it means drawing 800 random light signals. The implication is that two adjacent cells receive statistically independent light signals. Depending on chance, one cell might receive tropical levels of light while its neighbour experiences a cloudy day. This affects the interpretation and conclusions from figures 4 and 5. I propose two different ways of improving the simulation of the noisy light signal:

      a) In one extreme case, all cells receive the same noisy light signal, and in the other extreme, they all receive independent signals. You could consider a mixture model of light signals, where each cell receives \lambda L_global(t) + (1-\lambda) L_individual(t), where L_global(t) is a global light signal that is shared by all cells and L_individual(t) is a light signal unique to an individual cell. The mixing parameter \lambda controls how similar the light signal is between cells.

      b) Clearly the light signal will differ depending on the region, but there will be some spatial correlation. You could also consider methods of simulating light such that neighbouring cells receive correlated signals, although this might be difficult.

      Thank you for your proposals. We agree that our current implementation of noisy LD cycles represents an extreme scenario. Given that there is no environmental data at sufficient resolution to reliably evaluate which implementation is most realistic, we will explore different approaches based on your suggestions and present them in our revision.
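
      Suggestion (a) can be written down directly. The sketch below implements the mixture \lambda L_global(t) + (1-\lambda) L_individual(t) with uniform random draws standing in for the light signals. The function name `mixed_light` and the choice of uniform noise are assumptions made for illustration.

```python
import random

def mixed_light(n_cells, n_steps, lam, seed=0):
    """Per-cell light signals: lam * L_global(t) + (1 - lam) * L_individual(t).

    lam = 1 gives every cell the same signal; lam = 0 gives independent signals.
    """
    rng = random.Random(seed)
    l_global = [rng.random() for _ in range(n_steps)]
    signals = []
    for _ in range(n_cells):
        l_individual = [rng.random() for _ in range(n_steps)]
        signals.append([lam * g + (1 - lam) * x
                        for g, x in zip(l_global, l_individual)])
    return signals

shared = mixed_light(3, 10, lam=1.0)       # all cells identical
independent = mixed_light(3, 10, lam=0.0)  # current extreme scenario
partial = mixed_light(3, 10, lam=0.5)      # intermediate correlation
```

      Sweeping `lam` between 0 and 1 would show how strongly the conclusions depend on the assumption of statistically independent light inputs.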

      Assuming that the problem with the mean signal is corrected, do you expect the average spatial pattern to be the same between figure 4 B and D with no coupling (J=0) (although an increase in the variance between cells)? Perhaps not (owing to nonlinearities in the system), but it would be interesting to comment.

      We agree that the decreased light intensity complicates interpretation of the spatial structure. Although in the current implementation relative light differences between regions are maintained, the spatial structure is altered because the mean intensities are lower. Preliminary simulations with the mean intensity fixed do result in spatial patterns more similar to that seen in Fig 4B, but with increased variance. Comprehensive simulations will be included in the revised manuscript.

      The different periods in the different regions of the seedling are caused by differences in light sensitivity, which the authors claim is justified from refs 12-15. An alternative hypothesis is that biochemical parameters such as degradation rates are different between regions. This is briefly alluded to in the introduction, but I think it would be interesting to discuss further. What would be the pros and cons of the two different mechanisms?

      We agree that an alternative hypothesis is that biochemical parameters such as degradation rates may differ between regions. Experimental evidence, however, more strongly supports the light sensitivity hypothesis. This is because, for example, mutations in light signalling remove the spatial differences between regions. We agree, though, that this is an important point, and will add a paragraph to the discussion on the pros and cons of the two different mechanisms.

      I understand that the authors used a pre-existing model, but I must say that I find the way that light is incorporated into the model a bit confusing.

      On line 345 it says:

      "L(t) represents the input light signal (L = 0, lights off; L > 0, lights on) and D(t) denotes a corresponding darkness input signal (D = 1, lights off; D = 0, lights on)."

      Surely the only thing that matters biophysically is the number of photons hitting the plant? Could you explain why the model needs to have a separate "darkness signal" compared to just a single light signal?

      A darkness signal has been introduced in many circadian clock models because degradation rates of the clock genes can depend upon the light or dark condition. We agree with the reviewer that we should explain this more clearly in the text.
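
      The role of the two signals can be sketched as follows, using the definitions quoted above (D = 1 when L = 0, D = 0 when L > 0). The rate constants and the function `degradation_rate` are hypothetical and only illustrate how a rate can switch with the light condition without scaling with light intensity.

```python
def darkness(light):
    """D = 1 when lights are off (L = 0); D = 0 when lights are on (L > 0)."""
    return 1.0 if light == 0 else 0.0

def degradation_rate(light, k_light=0.3, k_dark=0.5):
    """Hypothetical degradation rate that depends on the light/dark condition."""
    d = darkness(light)
    return k_light * (1.0 - d) + k_dark * d

rates = [degradation_rate(l) for l in (0.0, 0.9, 1.03)]
```

      In this sketch the rate takes one value in darkness and another in light, regardless of how bright the light is, which is behaviour that a single intensity term L(t) alone would not express.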

      In the model, the light intensity changes depending on the region. It might make more sense for interpretability if instead there is an additional light-sensitivity coefficient that depends on the region, because at the moment I'm not sure what units L(t) is supposed to take.

      Thank you for your suggestion. We will try to implement this approach.
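
      One possible form of this separation is sketched below. It assumes a dimensionless, region-specific sensitivity coefficient multiplying a light intensity common to all regions. The two values are the ones quoted by Reviewer #2 for the root tip and root, and the names `SENSITIVITY` and `effective_light` are hypothetical.

```python
# Dimensionless sensitivities; 1.03 and 0.90 are the values quoted by Reviewer #2.
SENSITIVITY = {"root_tip": 1.03, "root": 0.90}

def effective_light(region, intensity):
    """Effective light input = region sensitivity x common light intensity."""
    return SENSITIVITY[region] * intensity

tip = effective_light("root_tip", 1.0)
root = effective_light("root", 1.0)
```

      Written this way, the intensity carries the physical units and is shared by all regions, while regional differences enter only through the sensitivity coefficient.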

      **Minor comments**

      Could you more explicitly describe a possible molecular mechanism through which the coupling acts?

      Thank you for your suggestion. We will more explicitly discuss likely transport mechanisms in the text.

      In Figure 1C it looks like different genes are coupling to different genes, so you may need to rearrange it.

      In our model, the level of CCA1/LHY is shared. Thus, CCA1/LHY from one cell can be considered to repress the expression of other interacting genes in the neighbour cell.

      Line 103: "We found that regional differences persist even under LD cycles, but cell to-cell minimized differences between neighbor cells." Missing word.

      Thank you for your correction.

      Line 124: "The coupling strength was set to 2 (Methods)." This is meaningless in isolation, so it would be better to briefly explain what the coupling parameter is before mentioning its value.

      Thank you for your suggestion, we will describe the coupling function in more detail.

      Throughout the text, I think De Caluwe should be corrected to De Caluwé

      Thank you for your correction.

      Typo line 493

      Thank you for your correction.

      Code and data are not made available.

      Model code will be made available from our project GitLab page: https://gitlab.com/slcu/teamJL/greenwood_tokuda_etal_2020

      Output of analysis of experimental data and simulations will also be made available on the GitLab page.

      Reviewer #3 (Significance (Required)):

      The authors motivate the paper by highlighting that their proposed model improves on phase-based models in that it describes underlying molecular mechanisms.

      From an experimental side, it's interesting that a model is developed and directly compared with measured spatio-temporal waves of gene expression. From a theoretical side, the authors address questions relating to oscillations, multi-scale modelling and noise robustness that also generalise to other systems. I therefore expect that both experimental and theoretical audiences will be interested in the results.

      There are many possible additions and modifications that could be made to the model, and so the model and analysis could provide a platform for future research. However, I can't comment on whether there are similar pre-existing models of the plant circadian clock that contain both a molecular description of the circadian clock as well as a spatial scale.

      We appreciate the reviewer’s view that the work is interesting to both experimental and theoretical audiences.

      Comments on Review #1:

      The time is rescaled in each cell, meaning that each cell has a unique period, but the dynamics remain deterministic and hence the peak-to-peak times will be exactly the same for each cell. I imagine this isn't completely consistent with single-cell data (if available), where peak-to-peak times are very likely to be variable due to noisy gene expression. In a future paper it would be interesting to analyse the system using stochastic differential equations.

      Please see our response to reviewer #1.

      Comments on Review #2:

      I agree on the following two points:

      1) It would add value to discuss whether the different ranking of light sensitivities by organ matches any available experimental data.

      Please see our response to reviewer #2.

      2) As the Reviewers point out, there are many possibilities for testing the robustness of the system to light cues, including varying the length of the day. Although outside of the scope of this paper, I wonder if it's possible to find data from a light sensor measuring light intensity across an entire year? Plugging such data into the model and measuring how the amplitude and period change would be really interesting, in my opinion.

      Thank you for your suggestion. We also see this as an interesting future direction.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors start by taking a previously published model of the plant circadian clock and implement five changes: 1) updating the network topology to reflect some recent experimental findings, 2) making a spatial model loosely based on a seedling template, 3) introducing coupling between cells based on shared levels of CCA1/LHY, 4) randomly rescaling time in each cell to induce inter-cell differences in period, 5) including a light sensitivity that depends on the region considered.

      For a certain configuration of light sensitivities/intensities, the different periods of oscillations in each seedling region roughly match that of experiments. With a sufficiently high coupling between cells, the system can also generate spatial waves, which are also observed in the experimental system.

      With pulsed light inputs the spatial pattern is still produced. The authors then investigate the robustness to environmental noise by generating stochastic light signals and show that the global synchrony, as measured with a synchronisation index, increases with cell-to-cell coupling strength. The paper is overall well-written, and the background and details of the analysis are well presented.

      Major comments:

      For the first part of paper, the output of the model is certainly the focus. There is virtually no discussion of the inferred parameters and how much confidence the authors have in their values.

      My main issue with the paper is about the section with noisy light signals, which is included in the title and is ultimately one of the main themes of the article.

      Specifically, on line 224:

      "This decrease in cell-to-cell variation revealed an underlying spatial structure (Fig 4D, middle and right, and S13 Fig), comparable to that observed under idealized LD cycles (Fig 4B, middle and right, and S12 Fig)."

      Firstly, I don't feel these conclusions match with the data presented. Comparing figure 4D middle and right with figure 4B middle and right shows a clear and pronounced loss in spatial structure. In its current form, this statement has to change, but I believe there are at least two other major issues with this figure:

      1) The figure is clearly designed to invite a comparison between the noise-free light cycles on the left with the noisy cycles on the right. However, due to how the noisy light is simulated, the variance of light signal increases AND the average intensity of light decreases by 50%. When comparing the left and the right, we therefore don't know whether the changes are due to differences in the average signal or differences from the stochasticity. I think the authors should simulate a noisy light signal with the same mean intensity level as the deterministic signal.

      2) The noise model for the light doesn't seem realistic. On line 484 it says:

      "We made the simplifying assumption that each cell is exposed to an independent noisy LD cycle due to their unique positions in the environment. LD cycles were input to the molecular model through the parameter L".

      In fact, this could be considered as an incredibly complex signal, because for 800 cells it means drawing 800 random light signals. The implication is that two adjacent cells receive statistically independent light signals. Depending on chance, one cell might receive tropical levels of light while its neighbour experiences a cloudy day. This affects the interpretation and conclusions from figures 4 and 5. I propose two different ways of improving the simulation of the noisy light signal:

      a) In one extreme case, all cells receive the same noisy light signal, and the other extreme, they all receive independent signals. You could consider a mixture model of light signals, where each cell receives \lambda L_global(t) + (1-\lambda) L_individual(t), where L_global(t) is a global light signal that is shared by all cells and L_individual(t) is a light signal unique to an individual cell. The mixing parameter \lambda controls how similar the light signal is between cells

      b) Clearly the light signal will differ depending on the region, but there will be some spatial correlation. You could also consider methods of simulating light such that neighbouring cells receive correlated signals, although this might be difficult.
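
      Suggestion (a) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function name, the Gaussian noise, the hourly time step, the 12:12 gating and all default values are assumptions chosen for the example. The noise is centred on the same mean as the noise-free signal, which also addresses point 1).

```python
import numpy as np

def mixed_light_signals(n_cells, n_steps, lam, mean_light=1.0, noise_sd=0.3, seed=0):
    """Return an (n_cells, n_steps) array of noisy light inputs.

    Each cell receives lam * L_global(t) + (1 - lam) * L_individual(t).
    Hypothetical sketch: Gaussian noise around a fixed mean, hourly steps,
    lights on for the first 12 hours of every 24-hour day.
    """
    rng = np.random.default_rng(seed)
    # One noisy signal shared by every cell.
    l_global = mean_light + noise_sd * rng.standard_normal(n_steps)
    # One independent noisy signal per cell.
    l_individual = mean_light + noise_sd * rng.standard_normal((n_cells, n_steps))
    light = lam * l_global[None, :] + (1.0 - lam) * l_individual
    # Light intensity cannot be negative.
    light = np.clip(light, 0.0, None)
    # Gate by a 12:12 light-dark cycle (time step = 1 hour).
    is_day = (np.arange(n_steps) % 24) < 12
    return light * is_day

# lam = 1: all cells share one signal; lam = 0: fully independent signals.
shared = mixed_light_signals(n_cells=800, n_steps=72, lam=1.0)
independent = mixed_light_signals(n_cells=800, n_steps=72, lam=0.0)
```

      Sweeping lam between 0 and 1 would show how much of the reported robustness depends on the assumption of fully independent light inputs.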

      Assuming that the problem with the mean signal is corrected, do you expect the average spatial pattern to be the same between figure 4 B and D with no coupling (J=0) (although an increase in the variance between cells)? Perhaps not (owing to nonlinearities in the system), but it would be interesting to comment.

      The different periods in the different regions of the seedling are caused by differences in light sensitivity, which the authors claim is justified from refs 12-15. An alternative hypothesis is that biochemical parameters such as degradation rates are different between regions. This is briefly alluded to in the introduction, but I think it would be interesting to discuss further. What would be the pros and cons of the two different mechanisms?

      I understand that the authors used a pre-existing model, but I must say that I find the way that light is incorporated into the model a bit confusing.

      On line 345 it says: "L(t) represents the input light signal (L = 0, lights off; L > 0, lights on) and D(t) denotes a corresponding darkness input signal (D = 1, lights off; D = 0, lights on)."

      Surely the only thing that matters biophysically is the number of photons hitting the plant? Could you explain why the model needs to have a separate "darkness signal" compared to just a single light signal?

      In the model, the light intensity changes depending on the region. It might make more sense for interpretability if instead there is an additional light-sensitivity coefficient that depends on the region, because at the moment I'm not sure what units L(t) is supposed to take.

      Minor comments

      Could you more explicitly describe a possible molecular mechanism through which the coupling acts?

      In Figure 1C it looks like different genes are coupling to different genes, so you may need to rearrange it.

      Line 103: "We found that regional differences persist even under LD cycles, but cell to-cell minimized differences between neighbor cells." Missing word.

      Line 124: "The coupling strength was set to 2 (Methods)." This is meaningless in isolation, so it would be better to briefly explain what the coupling parameter is before mentioning its value.

      Throughout the text, I think De Caluwe should be corrected to De Caluwé.

      Typo line 493

      Code and data are not made available.

      Significance

      The authors motivate the paper by highlighting that their proposed model improves on phase-based models in that it describes underlying molecular mechanisms.

      From an experimental side, it's interesting that a model is developed and directly compared with measured spatio-temporal waves of gene expression. From a theoretical side, the authors address questions relating to oscillations, multi-scale modelling and noise robustness that also generalise to other systems. I therefore expect that both experimental and theoretical audiences will be interested in the results.

      There are many possible additions and modifications that could be made to the model, and so the model and analysis could provide a platform for future research. However, I can't comment on whether there are similar pre-existing models of the plant circadian clock that contain both a molecular description of the circadian clock as well as a spatial scale.

      REFEREE'S CROSS-COMMENTING

      Comments on Review #1:

      The time is rescaled in each cell, meaning that each cell has a unique period, but the dynamics remain deterministic and hence the peak-to-peak times will be exactly the same for each cell. I imagine this isn't completely consistent with single-cell data (if available), where peak-to-peak times are very likely to be variable due to noisy gene expression. In a future paper it would be interesting to analyse the system using stochastic differential equations.

      Comments on Review #2:

      I agree on the following two points:

      1) It would add value to discuss whether the different ranking of light sensitivities by organ matches any available experimental data.

      2) As the Reviewers point out, there are many possibilities for testing the robustness of the system to light cues, including varying the length of the day. Although outside of the scope of this paper, I wonder if it's possible to find data from a light sensor measuring light intensity across an entire year? Plugging such data into the model and measuring how the amplitude and period change would be really interesting, in my opinion.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The manuscript presents an improved model of the circadian clock network that accounts for tissue-specific clock behavior, spatial differences in light sensitivity, and local coupling achieved through intercellular sharing of mRNA. In contrast to whole-plant or "phase-only" models, the authors' approach enables them to address the mechanism behind coupling and how the clock maintains regional synchrony in a noisy environment. Using 34 parameters to describe clock activity and applying the properties mentioned above, the authors demonstrate that their model can recapitulate the spatial waves in circadian gene expression observed and can simulate how the plant maintains local synchrony with regional differences in rhythms under noisy LD cycles. Spatial models that incorporate cell-type-specific sensitivities to environmental inputs and local coupling mechanisms will be most accurate for simulating clock activity under natural environments.

      We have the following major criticisms:

      1) When assigning light sensitivities in different regions of the plant, the authors assign a higher sensitivity value to the root tip (L=1.03) than they do to the other part of the root (L=0.90). We are curious why the root tip would have higher light sensitivity than the rest of the root. Is this based on experimental data (if so, please cite in this section or methods)? It seems that these L values were assigned simply to make sure they recapitulated the period differences observed in Fig. 2A. Are these values based on PhyB expression in those organs? Or perhaps based on cell density in those locations?

      2) In the discussion of the test where they set the "light inputs to be equal" in all regions to simulate the phyb-9 mutant, could the authors please clarify whether that means they set the L light sensitivity value equal in all regions? a. If they are referring to setting the L value equal to all regions, we suggest that this discussion be moved to the section about different light sensitivities instead of the local sharing of mRNA section. b. Additionally, is it possible to set the light sensitivity to zero for all parts of the plant? We think this would be more suitable to simulate the phyb-9 mutant phenotype.

      3) Based on the recent Chen et al. (2020) paper showing ELF4 long-distance movement, we think it would be of great interest for the authors to model ELF4 protein synthesis/translation as the coupling factor, in addition to the modeling using CCA1/LHY mRNA sharing. We understand you may be saving this analysis for a future modeling paper, but this addition to the paper could increase the impact of this paper.

      4) This model is able to simulate circadian rhythms under 12:12 LD cycles, which represents two days of the year-the equinoxes. We are curious if the model can simulate rhythms under short days and long days as well. We understand this analysis may be outside the scope of this paper and may require changing the values of the 34 parameters used but think it could be a useful addition here or in future work.

      And the following minor criticisms:

      1) In the first paragraph of the results section, it would be helpful for the authors to reference Table S1 when they mention the 34 parameters used to model oscillator function

      2) In the first paragraph of the section titled "Local flexibility persists under idealized and noisy LD cycles", it would be helpful for the authors to reference S12 Fig after the last sentence that starts "However, ELF4/LUX appeared more synchronized..."

      3) In the first paragraph of the section titled "Cell-to-cell coupling maintains global communication under noisy light-dark cycles", the authors refer to a "Table 1" but I think they mean to refer to "Table S1".

      4) In Fig. 1, panel C is described as demonstrating the cell-to-cell coupling through the "level of CCA1/LHY". This phrasing is vague and we think could be improved to the "mRNA level of CCA1/LHY".

      Significance

      This work would be broadly interesting to other researchers studying cell-to-cell signaling and coupling of circadian rhythms in plants and other species where spatial waves of gene expression have been observed (i.e., mice and humans). Additionally, the computational modeling aspect of this work was easily interpretable for someone outside this expertise. Our expertise lies in plant circadian biology.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      A. Summary:

      In this modeling study, the authors devised a multicellular model to investigate how circadian clocks in different parts (organs) of plants coordinate their timing. The model uses a plausible mechanism to explain how having a different sensitivity to light leads to different phase and period of circadian clock, which is observed in different plant organs. The model allows for entrainment in Light-Dark (LD) cycles and then a release in always-light (LL) environments. The model disentangles numerous factors that have confounded previous experiments. In one instance, the authors assigned different light sensitivities to the different organs (e.g., root tip, hypocotyl, etc.) which unambiguously show that this one element alone - spatially differing sensitivity to light - is sufficient for recapitulating experimentally observed differences in periods and phases between plant organs. The model also recapitulates the spatial waves of gene expression within and between organs that experimentalists reported. At the sub-tissue level, the model-produced waves have similar patterns as the experimentally observed waves. This confirmation further validates the model. Having the cells share clock mRNA from any of the clock component genes produced the same experimentally observed spatial dynamics. The main conclusion of the study is that regional differences (e.g., between different organs) in light sensitivities, when combined with cell-to-cell sharing of clock-gene mRNAs, enable a robust, yet flexible, circadian timing under noisy environmental cycles.

      B. Specific points:

      1.Lines 125-127: "To simulate the variability observed in single cell clock rhythms, we multiplied the level of each mRNA and protein by a time scaling parameter that was randomly selected from a normal distribution." - Why not add a white (Gaussian) noise term to these equations? How does multiplying by a random variable (for rescaling time) differ from my proposal? Some explanation should be given in the text here.

      2.Does the spatial network model simplify calculations by assuming separations of timescales (e.g., for equilibration in concentrations of mRNAs that diffuse between cells)? If so, it would be good to spell these out in the beginning of the Results section (where the model is described).

      3.Lines 161-162: "....in a phase only model by local...." should be "....in a phase model only by local...."

      4.Lines 188-190: The authors observed that qualitatively similar/indistinguishable behaviors arose regardless of which elements are varied (e.g., global versus local cell-cell coupling, setting light input to be equal in all regions of the seedling, etc.). Then they claim here that "...these results show that the assumptions of local cell-to-cell coupling and differential light sensitivity between regions are the key aspects of our model that allow a match to experimental data." - I don't see how this follows from the observation that almost any of the variations lead to the same behaviors in this section (spatial waves). Show the reasoning in the text here.

      5.Pgs. 9 -10: Section on "Cell-to-cell coupling maintains global coordination under noisy light-dark cycles": The simulation results rigorously support the authors' main conclusion here, which is that local cell-to-cell coupling allows for global coordination under noisy LD cycles. But I'm missing an intuitive explanation (or just any explanation) for why this is. At the end of this section, the authors should provide some intuition or qualitative explanation for the observations that they produced using their model in this section.

      6.Lines 261-262: Replace the present tenses with past tenses.

      7.Is the main idea that cell-to-cell coupling allows for averaging of fluctuations, between organs or cells within the same organ, while allowing for coordination of the average quantities? Is this responsible for both the flexibility and robustness observed under noisy environmental cycles?

      8.Line 304: Is it really true that the mammalian circadian rhythm is centralized? Don't some parts of our bodies have different circadian clock (e.g., slight differences in phase) than some other parts of our bodies?

      Significance

      Overall assessment:

      I enthusiastically recommend this work for publication after the authors address my comments below (please see "Specific points").

      The model's main strength is that the authors could vary each ingredient separately - light sensitivity of each cell/organ, which gene's mRNA diffuses between cells, cellular noise, local versus global cell-cell coupling, etc. Afterwards, the authors could determine which of these variations produces which experimentally observed behaviors. Another strength of the model is that it can reproduce not just one, but numerous, experimentally observed behaviors that are important for understanding circadian clocks in plants. Thus, the model is grounded in experimental truth and produces experimentally observed results. Crucially, since the authors could vary every single element in the model independently of the other elements, the authors are able to provide plausible explanations for why the experiments produced the results that they did (experimentally, a number of confounding factors prevented one from pinpointing to which element produced which observation).

      Another strength is that the model is extendable by other researchers to investigate other plant physiologies in the future (e.g., circadian clock's influence on cell division). The authors highlight these future uses in the discussion section. Therefore, I believe that this work will be valuable to plant biologists, non-plant biologists who are interested in circadian clocks, and systems biologists in general.

      The manuscript is also well written and relatively easy to follow, even for non-plant biologists like myself.

      REFEREE'S CROSS-COMMENTING

      Comment on Reviewer #2:

      I agree with his/her major criticism #3 (ELF4 long-distance movement). I find this to be a reasonable request. Fulfilling it would increase the paper's impact.

      Comment on Reviewer #3:

      The reviewer's point (1) is a reasonable request. Regarding his/her point (2): This is also reasonable. I'd recommend his/her suggestion (a). In the end, I'd be interested to see how the authors respond to this (what function they choose to let adjacent cells be subjected to some correlated light-input intensity). I'd be happy with something simple such as <intensity> + noise, where <intensity> is a deterministic term that, for example, decreases exponentially as one moves away from some central cell. Basically, I'd let the authors decide how to implement this and accept their current implementation - no correlation in light-intensity between adjacent cells - as an extreme scenario, as this reviewer points out.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcription burst) dominates the variance. Therefore, looking at transcript counts may not be feasible to estimate single-cell circadian phase. However, the study is quite descriptive and ends up being a bit dissatisfying, so if the authors could improve this aspect by perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it would help quite a bit in this regard. The model selection/fitting itself was not really sufficient to compensate for this, as it stands.

      We thank the reviewer for appreciating the new smFISH data, the analyses performed, and the consequences regarding phase inference from single cell snapshots.

      The reviewer suggests “perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho)”, and we have thus added a new Results paragraph (lines 281-316) and two new Supp Figures 13 and 14 to directly address this point.

      Specifically, we have added a dynamic, stochastic model of the circadian clock in order to add mechanistic insight into the parameters of the preferred model M4. Concerning \rho, in the initial manuscript we suggested that the correlations of cell-specific burst sizes (described by the parameter \rho) in the preferred model M4 could result from the underlying network topology. To substantiate this claim, we have now added an analysis of a stochastic model of the clock that includes gene-gene interaction amongst the core-clock genes. The core-clock network involves variables (such as protein levels), parameters (such as mRNA/protein half-lives) and additional genes (such as Clock) that are not directly measurable in our experiments; offering a detailed mechanistic mathematical model for our data is therefore not realistic. We therefore developed a simplified mathematical model for the three measured genes to explore the underlying mechanisms that could control the parameter \rho, as the referee suggests. As a starting point, we used the circadian clock gene network topology for Nr1d1, Cry1 and Bmal1 as modelled in Relógio et al. (Relógio et al., 2011) (see new Supplementary Material). To keep the model close to the inference framework, we used oscillatory functions for the burst frequency while the transcription rate (and hence the burst size) for each gene is affected by the protein levels of the other genes in the network. Using stochastic simulations we show that, for particular configurations of feedback where the repression of Nr1d1 by CRY1 is high, the network can generate positive mRNA correlation between Bmal1/Cry1 mRNA and negative correlation between Nr1d1/Cry1 mRNA, as observed in our data (Figure 2C). Furthermore, using the same inference framework as for our data on the simulated mRNA distributions, the obtained \rho is positive for Bmal1/Cry1 and negative for Nr1d1/Cry1, which was also found for our data (Figure 3C). Even though the model is clearly a simplified representation of the clock, these simulations give credence to the scenario that the \rho parameter obtained from the data is a signature of the underlying network topology.

      While the emphasis of the paper is certainly on parameter inference of the single-cell RNA FISH data, we believe the addition of this dynamic model provides more mechanistic insight into the results of the model fitting and hence significantly more depth to the article.

      Specific comments:

      1.It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). It is probably technically challenging as the mRNAs are of low abundance. I think it may help if they adjust the contrast for the cytoplasm stain or just delineate the cell boundaries.

      Thank you for pointing this out. We agree that our rendering of the FISH images was not optimal and have now significantly improved it (see new Figure 1A and 2B). Considering the other reviewers’ comments related to the images, we have now 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      2.In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      We used negative binomial distributions to directly model the number of mRNA in the cells, which is a natural choice given that the raw smFISH data are integer counts. The model incorporates cell size dependencies in a unified framework, which predicts the joint distribution of raw counts, which is why we showed raw counts in the main figure. That being said, as the referee suggests, it can be useful for exploratory purposes to see the relationship between the measured genes while regressing out the contribution of cell area, and we have now added this analysis as Supp Figure 9. On lines 156-161 we write:

      “To also estimate the correlation between genes while accounting for cell area, we regressed out the area for each gene and recalculated the correlation coefficients [37,38]. Since all genes are positively correlated with area (Fig. 2A), this processing shifted the correlations for both pairs of genes. Specifically, the correlation coefficients for the area-filtered mRNA counts decreased but remained positive for Bmal1/Cry1 and became more negative for Nr1d1/Cry1 (Supp Figure 9).”
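
      The idea of regressing out area before correlating can be illustrated as follows. This is a hedged sketch under the assumption of a simple linear dependence of counts on area; the function name and the synthetic data are hypothetical, and the procedure in refs [37,38] may differ in detail.

```python
import numpy as np

def area_filtered_correlation(counts_a, counts_b, area):
    """Pearson correlation between two genes after removing a linear area trend.

    Hypothetical helper: each gene's counts are regressed on cell area with
    ordinary least squares, and the residuals are then correlated.
    """
    area = np.asarray(area, dtype=float)

    def residuals(counts):
        counts = np.asarray(counts, dtype=float)
        slope, intercept = np.polyfit(area, counts, 1)
        return counts - (slope * area + intercept)

    return np.corrcoef(residuals(counts_a), residuals(counts_b))[0, 1]

# Synthetic example: both genes scale with area but are otherwise independent.
rng = np.random.default_rng(1)
area = rng.uniform(1.0, 3.0, size=2000)
gene_a = 10.0 * area + rng.normal(0.0, 1.0, size=2000)
gene_b = 10.0 * area + rng.normal(0.0, 1.0, size=2000)

raw_corr = np.corrcoef(gene_a, gene_b)[0, 1]
filtered_corr = area_filtered_correlation(gene_a, gene_b, area)
```

      In this synthetic case the raw correlation is driven entirely by the shared dependence on area, so the filtered correlation falls close to zero.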

      3.For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility for partial synchrony especially considering that the authors didn't show how effective the Dex treatment was to synchronize the circadian phase?

      Thank you for this comment. In fact, we didn't mean to fully eliminate the possibility of imperfect synchronization, but have tried our best to address it both experimentally and with modeling.

      Experimentally, in addition to the Dex treatment, we also compared with a condition in which we entrained the cells using temperature cycles, which is a standard in the field to achieve the best synchronization. We obtained a fold change of 2.1, which was in the range of previous studies (Saini, et al, 2012) and was slightly higher than with Dex synchronisation (1.6). Given that the improvement was not high and that it was important for us to study the system under free-running conditions and not in an entrained state (i.e. phase locking, which distorts the free dynamics and noise characteristics of the oscillator), we used the Dex protocol.

      Model M3 was used as a computational approach to correct for the individual phases. In addition to the difficult optimisation landscape, the challenge with model M3 also resides in the difficulty of estimating an individual phase for each cell, as the two mRNA counts measured in each cell do not contain sufficient phase information. This could potentially be resolved by measuring more genes simultaneously, which is, however, beyond the scope of the present manuscript. We have added discussion on this to the text on lines 244-248:

      “Thus, it was apparently difficult to use model M3 to correct the individual phase for each cell, likely due to the fact that the two mRNA counts measured in each cell do not contain sufficient phase information, and that the global optimisation problem contains many local minima. This could potentially be improved by measuring more genes simultaneously.”

      We have also added a new Results section (lines 305-316) and Supp Figure 14 to show that imperfect synchrony alone cannot explain the correlation structure observed in our data. Indeed, if two genes have a similarly phased oscillation, the expression of the two genes will be positively correlated (as shown in the new Supp Figure 14). Similarly, when the oscillations are in anti-phase, negative correlations will be found. Given that Nr1d1 and Cry1 are closer in phase than Bmal1 and Cry1, one would expect that the correlation between Nr1d1 and Cry1 (once accounting for area) would be more positive than for Bmal1 and Cry1, which was not found in the data (area-corrected correlations shown in Supp Figure 9). It therefore seems unlikely that the observed correlations could be caused by imperfect synchrony alone. Together with our simulations of the gene network (described above), we therefore argue that gene-gene interactions are a more plausible mechanistic explanation of the correlations observed in our measured bivariate mRNA distributions.

      4.Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically adding degrees of freedom should improve fitting and make it easier for a model to fit, why not in this case? Can the authors provide some explanations/mechanisms?

      We believe there has been a misunderstanding regarding model M4. By adding parameters, model M4 is indeed easier to fit. There is even a problem of overfitting whereby the burst frequency becomes unrealistically high and the model effectively fits a Poisson distribution to each individual cell. To avoid this, we lock the burst frequency values to the posterior mean values from model M2. After describing model M4, we write (lines 260-265):

      “When all parameters are free, we noticed that the burst frequency can become unrealistically high due to a tendency to overfit to individual cells, and we therefore locked the burst frequency to the posterior mean values from model M2. The PSIS-LOO scores overall favoured model M4 (Fig. 3B), and the predicted joint probability density shows good similarity to the observed data (Fig. 3D) (all time points shown in Supp figure 11).”

      Regarding the above comment in the reviewer’s summary on contributing factors of model M4 we added a simple dynamical model that attempts to explain at least one possible mechanism of generating correlations in cell-specific bursting parameters (see above).

      5.The authors should include the number (range) of cells analyzed in the figure legends.

      We have now added the number of cells used at each time point to the legend of Figure 1D. To respond to Reviewer #2 we have also added details on the number of smFISH replicates used at each time point. The number of cells for each replicate is shown in Supp Figures 2-5.

      Reviewer #1 (Significance (Required)):

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanisms on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho) it should help improve the manuscript.

      We thank the reviewer for the suggestion to expand on the mechanistic interpretation, which we have followed. In addition, we would like to emphasise that a similar smFISH analysis of the core circadian oscillator has never been done, and we believe our data represents a significant contribution to the field. Moreover, our quite generic probabilistic inference framework for smFISH using mixture models to describe intrinsic (transcriptional bursting) and extrinsic fluctuations is also novel and the code provided (written using the Stan probabilistic programming language) might find a wide applicability.

      Concerning the mechanistic description, as described above, we added a stochastic, dynamic model of gene expression and propose that gene-gene interactions within the core-clock network topology represent a plausible mechanism for generating correlated burst parameters between genes, which are a feature of the preferred model M4 found during inference. We additionally added an explanatory figure to argue that, given the phase relationship between genes, imperfect synchronisation alone cannot explain the correlations that we observe between the pairs of genes. Together, this analysis provides more mechanistic insight into the underlying factors controlling the gene-gene relationships in our measured bivariate mRNA distributions.

      Referees cross-commenting

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However, caution should be exercised regarding ref 26 as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are probably not buffered in general.

      Following the suggestion of Reviewer #3, we have expanded the Discussion to include the references cited (Shah & Tyagi, Raj and others).

      Previous work from our lab is also nuancing the conclusions from references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen those with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text in order to explicitly state that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors study experimentally and computationally the dynamic transcription of circadian clock genes over time in individual cells with single molecule RNA-FISH with the aim to understand how different noise sources contribute to single cell transcription variability and basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand biology.

      **Major comments:**

      This study has some major limitations that need to be addressed to test the model usefulness, to understand noise sources and to gain biological insights into circadian clocks.

      We thank this reviewer for the constructive feedback which enabled us to significantly strengthen the revised manuscript.

      The limitations are on the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replica experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are and if indeed differences between timepoints can be resolved if multiple biological replica experiments have been analyzed. To address this point at least three biological experiments needs to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading because several hundred cells have been measured which automatically makes the error small. The SEM just describes how well we can determine the mean from a distribution. Instead a mean and std from the biological replicas need to be plotted to show how experimental variability in experiments is resulting in the described expression pattern. This is similar to RNA-seq data or RT-PCR from multiple replica.

      We certainly agree that demonstrating reproducibility is important. Note that our smFISH data is from three independent cell culture dishes and microscopy slides, which included independent cell synchronization. This was described in the Methods but we agree that the data presentation was not showing the individual replicas, which we have now added. In Figure 1B, we now show the mean of each replicate for each time point. While the reviewer suggested displaying the mean and standard deviation across replicates, we show all data points at each time point to make it even more transparent. The mRNA distribution of each replicate is also shown in Supp Figures 2-5, together with individual quantification of mean, coefficient of variation and number of cells.

      In addition, to further demonstrate the reproducibility of the temporal patterns we have performed an additional independent experiment on four time points. This experiment shows that the oscillatory patterns for Nr1d1 and Cry1 are clearly significant and reproducible (new Supp Figure 7). The combination of the replicates shown for the main experiment (Supp Figures 2-5) and the new replicate experiment (Supp Figure 7) shows that the oscillatory temporal patterns for the mean mRNA levels are robust and reproducible, and in fact similar to those found in bulk analyses (Ukai-Tadenuma et al., 2011; Hughes et al., 2009), which is expected.
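
      The distinction raised by the reviewer, between the SEM over pooled cells and the spread across biological replicates, can be illustrated with a minimal sketch. The counts are simulated and all names and values (`simulate_replicate`, 500 cells, replicate means of 28, 30 and 33) are hypothetical; this is not the analysis code of the manuscript.

```python
import random
import statistics

random.seed(0)

def simulate_replicate(n_cells, mean_count):
    """Hypothetical per-cell mRNA counts (rounded Gaussian, floored at 0); not real data."""
    return [max(0, round(random.gauss(mean_count, 10))) for _ in range(n_cells)]

# Three hypothetical replicates whose underlying means differ slightly
replicates = [simulate_replicate(500, m) for m in (28, 30, 33)]

# SEM computed over all pooled cells: shrinks as the number of cells grows
pooled = [c for rep in replicates for c in rep]
sem_pooled = statistics.stdev(pooled) / len(pooled) ** 0.5

# Spread of the replicate means: reflects replicate-to-replicate variability
replicate_means = [statistics.mean(rep) for rep in replicates]
sd_between = statistics.stdev(replicate_means)

print(f"pooled SEM: {sem_pooled:.2f}")
print(f"replicate means: {[round(m, 1) for m in replicate_means]}")
print(f"SD across replicate means: {sd_between:.2f}")
```

      With several hundred cells per replicate the pooled SEM is small regardless of how much the replicates differ, which is why showing each replicate mean is more informative.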

      It is also not clear how well the cell segmentation works and how cell segmentation influences the analysis. In figure 1A show the segmentation of the cell boundary together with the membrane stain.

      Thanks to this and other reviewers’ comments, we have now significantly improved the presentation of the FISH images. We have 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      We have also added Supp Figure 1 to show that the cell segmentation we used is reliable. In fact, as we had described, we used the sum Z-stack projections of the red channel (Wu et al., 2018), which we found provides the most accurate cell segmentation. We now show in Supp Figure 1 that the obtained segmentation shows convincing agreement with the cell autofluorescence.

      The authors use the RNA mean and RNA-FISH distributions and combine this data to build and compare different models. How do you know that the given data fulfils the central limit so that a model describing the mean is an adequate approach? To test this point, the authors should show through subsampling from the data and the model that indeed their data sets have enough cells to fulfil the central limit theorem.

      This comment reflects a misunderstanding of our approach, which we now try to better explain. In our inference framework we use a negative binomial (NB) distribution (and mixtures of NBs) to model the full distribution of mRNA counts, and our approach is therefore not based exclusively on the mean of the distribution. The estimation of model parameters and comparison of models is performed using the PSIS-LOO optimisation procedure (see below). The mixture model of NBs makes a few assumptions which we had clearly stated. In fact it captures both bursty transcription (in the limit of short bursts as is biologically plausible, which yields the NB distribution), and cell-to-cell variability (extrinsic noise) captured by the mixture. The suitability of the NB to model bursty transcription is established (Raj et al., 2006), and it is parameterized by a mean and a dispersion coefficient, such that the CV of the distribution is the inverse of the burst frequency (Zoller et al., 2015). Therefore, the mean is indeed an important parameter of the model, but we do not see the relationship with the CLT. The probabilistic inference used (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al. 2017, see below) is established and state-of-the-art for selecting models of the appropriate complexity, and we are not aware of a similar previous quantitative model for smFISH analysis.

      We have now added significantly more explanations both on the general approach as well as the methodological details in a fully-revised Methods section to avoid further misunderstanding.
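
      To make the idea of modelling the full count distribution concrete, the following minimal sketch evaluates the log-likelihood of mRNA counts under a finite mixture of NBs. It assumes a mean/dispersion parameterisation with variance = mean + mean²/r; the function names, mixture weights and counts are hypothetical, and the sketch is not the Stan implementation used in the manuscript.

```python
import math

def nb_logpmf(k, mean, r):
    """Log-probability of count k under a negative binomial with the given mean
    and dispersion r (variance = mean + mean**2 / r)."""
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def mixture_loglik(counts, weights, means, r):
    """Log-likelihood of counts under a mixture of NBs sharing one dispersion r."""
    total = 0.0
    for k in counts:
        terms = [math.log(w) + nb_logpmf(k, m, r) for w, m in zip(weights, means)]
        top = max(terms)
        # log-sum-exp over mixture components for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# Hypothetical counts and mixture components, for illustration only
counts = [3, 12, 18, 25, 41, 60]
print(mixture_loglik(counts, weights=[0.6, 0.4], means=[15.0, 45.0], r=3.0))
```

      Every cell contributes a likelihood term, so the whole distribution (not only its mean) constrains the parameters.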

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in terms of how fitting and model selection is performed. It is not clear how good the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot their fit errors as a function of model complexity.

      We fully agree that comparing different models using a model selection approach is a powerful methodology; in fact, it is arguably the most systematic way to approach modeling problems in quantitative biology. Model selection is an active research area and there have been significant developments recently. Here, we used a state-of-the-art and established Bayesian approach (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al. 2017), which is certainly rigorous and more objective than visual comparison. The PSIS-LOO is conceptually similar to other approaches of model performance such as AIC or WAIC, and the entire field of model selection aims at establishing rigorous methods to assess the tradeoff between fit errors and model complexity. In PSIS-LOO, this is done by using Pareto-smoothed importance sampling to estimate the expected log pointwise predictive density for a new dataset using leave-one-out cross-validation. The PSIS-LOO is the currently recommended metric for measuring model performance in Bayesian analysis (Vehtari et al., 2017) and is considered superior to other approaches such as computations of Bayes factors since it is less sensitive to model priors (Gelman et al. 2013). The performance of the models as measured with PSIS-LOO is shown in Figure 3B. As already mentioned, we have added further details as to how the fitting and model selection are performed in a revised Methods section. We agree that visual comparison is useful to gain intuition and this is why we showed the bivariate distributions in Figure 3D and in Supp Figure 11.
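
      For readers less familiar with the method, the sketch below shows the plain importance-sampling estimate of the leave-one-out predictive density that PSIS-LOO builds on. It deliberately omits the Pareto-smoothing step of Vehtari et al. (2017), so it is a simplified stand-in rather than PSIS-LOO itself; the names `is_loo_elpd` and `log_lik` and the example numbers are hypothetical.

```python
import math

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def is_loo_elpd(log_lik):
    """log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    Returns the plain importance-sampling estimate of elpd_loo (no Pareto smoothing)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    elpd = 0.0
    for i in range(n_obs):
        neg = [-log_lik[s][i] for s in range(n_draws)]
        # p(y_i | y_-i) is approximated by the harmonic mean of p(y_i | theta_s)
        elpd += -(logsumexp(neg) - math.log(n_draws))
    return elpd

# Hypothetical pointwise log-likelihoods: 3 posterior draws x 2 observations
example = [[-1.0, -2.0], [-1.2, -2.5], [-0.9, -1.8]]
print(is_loo_elpd(example))
```

      The raw importance weights can have heavy tails, which is the motivation for the Pareto-smoothing step in the full method.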

      Regarding the comment on “fit error”, note also that we probabilistically model the full mRNA distribution for each gene. In each cell, there is a likelihood score that measures the likelihood of observing the measured mRNA count given the modelled probability distribution. As our approach is based on this likelihood, the notion of “fitting error” needs to be replaced by the log likelihood (‘fitting error’ is mathematically equivalent to a log-likelihood when the noise model is Gaussian, which is not the case here).
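
      The equivalence mentioned in parentheses can be written out explicitly. In the sketch below, the symbols are assumptions introduced for illustration: y_i are the observations, f_i(θ) the model predictions and σ² a fixed Gaussian noise variance.

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log \mathcal{N}\!\left(y_i \mid f_i(\theta), \sigma^2\right)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - f_i(\theta)\right)^2
    - \frac{n}{2} \log\!\left(2\pi\sigma^2\right)
```

      Under Gaussian noise, maximising the log-likelihood is therefore the same as minimising the sum of squared errors; for count distributions such as the NB this identity does not hold, and the log-likelihood itself is the appropriate score.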

      Another limitation is that the models have not been validated for example by using them to make predictions. One type of prediction could be to fit the model to one biological replica and then predict the other replica (cross validation). Another prediction would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      Thank you for this comment. As explained above, we used the state-of-the-art PSIS-LOO to measure the predictive performance of the models, which approximates the result of leave-one-out cross-validation using the full data set. To further assess the predictive capabilities of the model, we have now also added a “leave-replicate-out” cross-validation, as the reviewer suggests (new Supp Figure 12). The aim of our “leave-replicate-out” cross-validation was to test how well the predictions of each model generalise to independent cells that are not in the training set. To do this, we trained each model while omitting the data from one gene on a test slide. We then calculated the likelihood score of the test slide using the parameters from the training set, and repeated this for all slides. Similarly to the PSIS-LOO, the results of the leave-replicate-out cross-validation convincingly show that model M4 has the highest predictive performance. This is now described in the updated text on lines 265-271.
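
      The logic of a leave-replicate-out cross-validation can be sketched as follows. This is a simplified illustration and not the procedure used in the manuscript: the slides are simulated, the parameters are fitted by the method of moments instead of Bayesian inference, and only a Poisson and a single NB model are compared. All function names and values are hypothetical.

```python
import math
import random
import statistics

random.seed(1)

def sample_poisson(lam):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold, k, prod = math.exp(-lam), 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k

def sample_nb(mean, r):
    """Negative binomial draw as a gamma-Poisson mixture."""
    return sample_poisson(random.gammavariate(r, mean / r))

def poisson_logpmf(k, mean):
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def nb_logpmf(k, mean, r):
    p = r / (r + mean)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def fit_moments(counts):
    """Method-of-moments estimates, using variance = mean + mean**2 / r."""
    m = statistics.mean(counts)
    v = statistics.variance(counts)
    return m, m * m / max(v - m, 1e-9)

def leave_replicate_out(slides):
    """Sum of held-out log-likelihoods, holding out one slide at a time."""
    scores = {"poisson": 0.0, "nb": 0.0}
    for held_out in range(len(slides)):
        train = [c for i, s in enumerate(slides) if i != held_out for c in s]
        m, r = fit_moments(train)
        for k in slides[held_out]:
            scores["poisson"] += poisson_logpmf(k, m)
            scores["nb"] += nb_logpmf(k, m, r)
    return scores

# Three simulated slides of overdispersed counts (hypothetical mean and dispersion)
slides = [[sample_nb(20.0, 2.0) for _ in range(200)] for _ in range(3)]
scores = leave_replicate_out(slides)
print(scores)
```

      A higher held-out log-likelihood indicates better generalisation to cells that were not used for fitting.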

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      As already mentioned, we used state-of-the-art algorithms to analyze prediction vs. complexity. With the above addition, we now have two methods of calculating the predictive performance of each model: the approximate leave-one-out score as measured with PSIS-LOO and the leave-replicate-out cross-validation. For each model, the PSIS-LOO score is plotted in Figure 3B and the leave-replicate-out cross-validation score is shown in Supp Figure 12.

      In the method section on models, a biological motivation must be presented to justify the different model assumption.

      Thank you for pointing out that the biological justification of the models needed to be expanded. In addition to the improved justifications already provided in the Results section, we have now updated the Methods section such that a biological motivation is included for each model.

      How do the models that fit the distributions describe the mean?

      As explained above, the inference is performed on the entire distributions, using a family of distributions (mixtures of NBs) which are parameterized in a biologically relevant manner (transcriptional bursting + extrinsic noise). The mean and variance of the distribution are now described on lines 585-586 in addition to Figure 3A.

      It is necessary to list model parameters for each of the models, their description, their parameter values, their parameter uncertainty and units of each parameter.

      Thank you, this has now been added as Supplementary Tables 2-5.

      It is not clear to me how the joint probability in figures 2,4, S2 and S4 have been used to fit the model.

      Again, the joint distributions are modeled using mixtures of NBs and the inference is performed on the entire dataset at once using a log-likelihood approach. This uses all the data at once, and it is embedded in a Bayesian model selection method. The way that the joint probability is used is now clarified in the revised Methods section and in the Results section (lines 208-214):

      “For both models M1 and M2, the likelihood of observing the data given the parameters of the model is evaluated using the model-specific NB distribution and the mRNA counts for both genes in each cell. This is performed for both Bmal1/Cry1 and Nr1d1/Cry1 pairs across all time points, and this likelihood is combined with model priors to define the posterior parameter distribution for each model (Methods). We applied Hamiltonian Monte Carlo sampling within the STAN probabilistic programming language to sample the posterior distribution and infer model parameters [40].”

      How do the models make sense in the context of the fact that human genes exist as a diploids?

      This is a good point, although note that the 3T3 cells are from mice and not humans. 3T3 cells are tetraploid, and it turns out that under the justified assumption that the bursts are short (Zoller et al., 2015; Suter et al., 2011), the number of alleles rescales the burst frequency, i.e. the effective (observed) burst frequency equals the number of alleles times the burst frequency per allele, but it does not change the shape of the distributions. On lines 580-582 we have now written: “Since 3T3 cells are tetraploid, and, again assuming that the bursts are short, the inferred burst frequency for tetraploid cells will be approximately four times that of a single allele.”
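
      The allele rescaling can be checked numerically. The sketch below assumes that each allele independently produces NB-distributed counts and that the NB shape parameter r plays the role of the burst frequency (in units of the mRNA degradation rate); the per-allele values are hypothetical. It convolves four per-allele distributions and compares the result with a single NB whose mean and r are both multiplied by four.

```python
import math

def nb_pmf(k, mean, r):
    """Negative binomial probability of count k (variance = mean + mean**2 / r)."""
    p = r / (r + mean)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def convolve(a, b):
    """Distribution of the sum of two independent counts, truncated to len(a) terms."""
    out = [0.0] * len(a)
    for i, ai in enumerate(a):
        for j in range(len(a) - i):
            out[i + j] += ai * b[j]
    return out

K = 80                                     # truncation of the count axis
per_allele_mean, per_allele_r = 5.0, 0.5   # hypothetical per-allele values
one = [nb_pmf(k, per_allele_mean, per_allele_r) for k in range(K)]

# Total count from four independent alleles
four = one
for _ in range(3):
    four = convolve(four, one)

# Single NB with four times the mean and four times the shape parameter
rescaled = [nb_pmf(k, 4 * per_allele_mean, 4 * per_allele_r) for k in range(K)]
max_diff = max(abs(x - y) for x, y in zip(four, rescaled))
print(max_diff)
```

      The two distributions agree to numerical precision, so a fit to the total counts recovers a shape parameter four times that of a single allele while remaining within the NB family.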

      The variance decomposition is shortly described but no results are presented to show how this is done. This should be better explained.

      The variance decomposition we used is not a new result; in fact, we used the analytical results of Bowsher, C. G. & Swain, P. S. “Identifying sources of variation and the flow of information in biochemical networks” (PNAS, 2012). The mathematical proofs of the formula we use are contained within that reference; however, we have re-written this section to make it clearer to the reader (lines 688-718).
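
      In its simplest two-term form, such a decomposition is an application of the law of total variance. The sketch below uses symbols introduced here for illustration: X is the mRNA count of a gene and Z an extrinsic variable (for example cell area). The general multi-variable result is given in Bowsher & Swain (2012).

```latex
\operatorname{Var}(X)
  = \underbrace{\mathbb{E}\!\left[\operatorname{Var}(X \mid Z)\right]}_{\text{not explained by } Z}
  + \underbrace{\operatorname{Var}\!\left(\mathbb{E}[X \mid Z]\right)}_{\text{explained by } Z}
```

      Assuming an NB with conditional mean μ(Z) and dispersion r, the conditional variance is μ(Z) + μ(Z)²/r, so the first term becomes E[μ(Z)] + E[μ(Z)²]/r and the second term is Var(μ(Z)).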

      **Minor comments:**

      In figure 3A, it is not clear to me what these different plots relate to the models. It is also not clear what are equations that describe each model.

      The Methods section has now been improved to show the full data-generating mechanism for each model, and each model has its own section title to make it easier to find. We have also improved the legend for Figure 3 to make the relationship to each model clearer.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Thank you for pointing this out; we have now rewritten the legend for Figure 3 to make the figure clearer.

      Reviewer #2 (Significance (Required)):

      This is an interesting and important topic with the potential to have general implication of how to model periodic single cell gene expression data and eventually better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I personally have done similar analysis and experiments in another system and biological context which has demonstrated the power of this approach if implemented rigorously. I am not an expert in circadian clocks in human cells.

      We thank the reviewer for appreciating the implications for the circadian and single cell gene expression community. Note that, to our knowledge, modeling smFISH counts using mixtures of negative binomials combined with Bayesian model selection has not been done. It is highly relevant biologically (it combines intrinsic and extrinsic fluctuations in a rigorous way) and general, and its applicability extends far beyond the circadian oscillator. Therefore, this approach for quantitative smFISH data analysis also fills an important methodological gap.

      **Referees Cross commenting**

      Reviewer #1:

      I agree with the assessment that model fitting and model selection was not sufficient. But I disagreed that the data is enough. Although many cells and time points are analyzed, there is no evidence of how reproducible each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Regarding the point on reproducibility, we have made the following four changes:

      1. We have added an independent 4 time-point experiment to show that the oscillatory patterns of the distributions are reproducible (Supp Figure 7).
      2. In Figure 1 we now also show the mean of each replicate for the main experiment (Figure 1B).
      3. We also show the mRNA distributions of each replicate in Supp Figures 2-5.
      4. We have added the “leave-replicate-out” cross-validation to show that the model performance of the preferred model generalises to independent slides that were not included in training (Supp Figure 12).

      In responding to Reviewer #1 regarding the modeling, we have now also added a simplified dynamical model of circadian clock expression to add mechanistic insight into our proposed models. Overall, we have significantly expanded the description of the model selection approaches to help readers who are less familiar with Bayesian model selection methods.

      Reviewer #3:

      Regarding the red background, my understanding is that this comes from the probe hybridization. This is maybe because the probe concentration has not been optimized or the number of probes per gene is low and the signal to noise is not so good. Or it could be autofluorescent background. In this case a different fluorophore needs to be used to avoid this problem.

      Thank you for those comments, and we agree with all reviewers that the presentation of the images needed to be improved. It turned out that in Figure 1, we had shown the cell mask in red so it is clearly not related to probe concentration or autofluorescence. We have now removed the cell mask channel from the main images which allows highlighting better the smFISH signals. All smFISH images for Figures 1 and 2 have been much improved, and we’ve added a new Supp Figure 1 to show the performance of our cell segmentation.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this paper Nicholas et al image mRNAs encoding the key controllers of circadian rhythms, Rev-erba, Cry and Bmal1 in single cells over time. It was shown earlier that single cells exhibit circadian rhythms using reporter genes. A large number of studies have shown that transcription is an inherently stochastic process, which raises a question as to how single cells are able to achieve their rhythms on the face of this noise. Their results show that the number of mRNAs for the three genes exhibit the expected periodicity, but this periodicity is associated with significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription vs other sources of variation that are extrinsic to the genes. The results are interesting and experimental and modeling results are important (however this reviewer is not able to judge the veracity of mathematics that underlay the models).

      We thank this reviewer for appreciating the importance of our work.

      **Some of the concerns that arose are listed below:**

      1. The images show an annoying red background. If the red is HCS cell mask, it should be removed, and RNA presented on grey scale. This will make a better presentation. The red hue also appears in fig 2 b but here it is one of the RNA. I suggest in Fig 2 one RNA can be presented in green and the other in red, while the nuclei in blue.

      Thank you for this comment. We had indeed shown the cell mask in the red channel and have now removed it. Together with the other suggestions and comments from the reviewers, we implemented the following changes: 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals. The presentation of the images is now much improved.

      2. This paper and a few others talk about the cell size contributing to the cell-to-cell variability in mRNA numbers. Where does it come from physically? One can imagine based on the cell cycle stage there could be more than two copies of the gene in a cell, which will yield more RNAs, but they say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      The referee is right that several studies observed empirically that larger cells show more mRNA molecules in smFISH experiments (Padovan et al., 2015; Kempe et al., 2015). In Padovan et al. (2015), the authors found that transcriptional burst size changes with cell volume and burst frequency with cell cycle. The main theory for transcription scaling with cell volume is to maintain transcript concentration. Using cell fusion experiments, they showed that cellular size can directly and globally affect gene expression by modulating transcription. Furthermore, they proposed that the mechanism underlying the global regulation integrates both DNA content and cellular volume to produce the appropriate amount of RNA for a cell of a given size, which is consistent with a model whereby a factor limiting for transcription is sequestered to the DNA. We used these results to propose a model whereby burst size scales with area, and we found an increase in predictive performance (compare M2 with M1 in Figure 3B). While our model selection supported the inclusion of cell area, the variance decomposition showed that the fraction of variance due to cell area ranged from 4.2% for Nr1d1 to 17.6% for Bmal1. We have now expanded the introduction to discuss this in more depth (lines 73-80) as requested.

      3. References 26 and 27 are cited for 10-80% of variance due to gene extrinsic sources. These references actually deny that there is a significant transcriptional noise in most genes. Again, stronger discussion is called for.

      As mentioned in the reply to Reviewer 1, previous work from our lab also nuances the conclusions from references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen those with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text to state explicitly that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”

      4. The results raise a very important question, whether and to what extent the transcriptional noise propagates to the next step of gene regulation and are there buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at protein level and the level of protein-protein complexes. Furthermore, they show that to the extent those vary, the chromatin intrinsically buffers against the fluctuations in numbers of transcription factors. Mention of these and other studies will enrich the paper.

      We have modified the Discussion section and now discuss these papers (and a few more). We thank the reviewer for the suggestions, which will help the reader to have a broader overview of noise buffering in gene expression and indeed enrich the paper.

      Reviewer #3 (Significance (Required)):

      Significance is high. Quality is high.

      **Referees Cross-Commenting**

      I agree with the comments made by other reviewers particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors are qualifying their results in the light of references 26 and 27. Perhaps now there is less of a need to do so.

      As mentioned above, we have added the following sentence citing the Hansen paper to make it clear to the reader that key conclusions of references 26 and 27 are disputed (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38].”

      References

      Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian Data Analysis, 3rd edn. CRC Press, London.

      Hughes ME, DiTacchio L, Hayes KR, Vollmers C, Pulivarthy S, Baggs JE, Panda S, Hogenesch JB. 2009. Harmonics of circadian gene transcription in mammals. PLoS Genet 5. doi:10.1371/journal.pgen.1000442

      Kempe H, Schwabe A, Cremazy F, Verschure PJ, Bruggeman FJ. 2015. The volumes and transcript counts of single cells reveal concentration homeostasis and capture biological noise. Mol Biol Cell 26:797–804. doi:10.1091/mbc.E14-08-1296

      Padovan-Merhar O, Nair GP, Biaesch AG, Mayer A, Scarfone S, Foley SW, Wu AR, Churchman LS, Singh A, Raj A. 2015. Single Mammalian Cells Compensate for Differences in Cellular Volume and DNA Copy Number through Independent Global Transcriptional Mechanisms. Mol Cell 58:339–352. doi:10.1016/j.molcel.2015.03.005

      Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. 2006. Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4:e309. doi:10.1371/journal.pbio.0040309

      Relógio A, Westermark PO, Wallach T, Schellenberg K, Kramer A, Herzel H. 2011. Tuning the mammalian circadian clock: Robust synergy of two loops. PLoS Comput Biol 7:1–18. doi:10.1371/journal.pcbi.1002309

      Saini C, Morf J, Stratmann M, Gos P, Schibler U. 2012. Simulated body temperature rhythms reveal the phase-shifting behavior and plasticity of mammalian circadian oscillators. Genes Dev 26:567–580. doi:10.1101/gad.183251.111

      Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. 2011. Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics. Science 332:472–474. doi:10.1126/science.1198817

      Ukai-Tadenuma M, Yamada RG, Xu H, Ripperger JA, Liu AC, Ueda HR. 2011. Delay in feedback repression by cryptochrome 1 Is required for circadian clock function. Cell 144:268–281. doi:10.1016/j.cell.2010.12.019

      Vehtari A, Gelman A, Gabry J. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27:1413–1432. doi:10.1007/s11222-016-9696-4

      Wu C, Simonetti M, Rossell C, Mignardi M, Mirzazadeh R, Annaratone L, Marchiò C, Sapino A, Bienko M, Crosetto N, Nilsson M. 2018. RollFISH achieves robust quantification of single-molecule RNA biomarkers in paraffin-embedded tumor tissue samples. Commun Biol 1:1–8. doi:10.1038/s42003-018-0218-0

      Zoller B, Nicolas D, Molina N, Naef F. 2015. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol Syst Biol 11:823. doi:10.15252/msb.20156257

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this paper Nicholas et al image mRNAs encoding the key controllers of circadian rhythms, Rev-erba, Cry and Bmal1 in single cells over time. It was shown earlier that single cells exhibit circadian rhythms using reporter genes. A large number of studies have shown that transcription is an inherently stochastic process, which raises a question as to how single cells are able to achieve their rhythms on the face of this noise. Their results show that the number of mRNAs for the three genes exhibit the expected periodicity, but this periodicity is associated with significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription vs other sources of variation that are extrinsic to the genes. The results are interesting and experimental and modeling results are important (however this reviewer is not able to judge the veracity of mathematics that underlay the models).

      Some of the concerns that arose are listed below:

      1. The images show an annoying red background. If the red is HCS cell mask, it should be removed, and RNA presented on grey scale. This will make a better presentation. The red hue also appears in fig 2 b but here it is one of the RNA. I suggest in Fig 2 one RNA can be presented in green and the other in red, while the nuclei in blue.

      2. This paper and a few others talk about the cell size contributing to the cell-to-cell variability in mRNA numbers. Where does it come from physically? One can imagine based on the cell cycle stage there could be more than two copies of the gene in a cell, which will yield more RNAs, but they say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      3. References 26 and 27 are cited for 10-80% of variance due to gene extrinsic sources. These references actually deny that there is a significant transcriptional noise in most genes. Again, stronger discussion is called for.

      4. The results raise a very important question, whether and to what extent the transcriptional noise propagates to the next step of gene regulation and are there buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at protein level and the level of protein-protein complexes. Furthermore, they show that to the extent those vary, the chromatin intrinsically buffers against the fluctuations in numbers of transcription factors. Mention of these and other studies will enrich the paper.

      Significance

      Significance is high. Quality is high.

      Referees Cross-Commenting

      I agree with the comments made by other reviewers particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors are qualifying their results in the light of references 26 and 27. Perhaps now there is less of a need to do so.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary: The authors study experimentally and computationally the dynamic transcription of circadian clock genes over time in individual cells with single molecule RNA-FISH with the aim to understand how different noise sources contribute to single cell transcription variability and basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand biology.

      Major comments:

      This study has some major limitations that need to be addressed to test the model usefulness, to understand noise sources and to gain biological insights into circadian clocks.

      The limitations are on the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replicate experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are, or whether differences between time points could still be resolved if multiple biological replicates were analyzed. To address this point, at least three biological replicates need to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading because several hundred cells have been measured, which automatically makes the error small. The SEM just describes how well we can determine the mean from a distribution. Instead, the mean and standard deviation across the biological replicates need to be plotted to show how variability between experiments affects the described expression pattern. This is similar to RNA-seq or RT-PCR data from multiple replicates.
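
      The distinction between the SEM within one experiment and the spread between biological replicates can be illustrated with a minimal sketch. The numbers below are synthetic and hypothetical (three replicates of 500 cells, Gaussian counts with shifted means), not data from the manuscript; they only show that a small SEM says nothing about replicate-to-replicate variability.

```python
import math
import random
import statistics


def sem(values):
    """Standard error of the mean of one sample (shrinks as 1/sqrt(n))."""
    return statistics.stdev(values) / math.sqrt(len(values))


def replicate_summary(replicates):
    """Mean and standard deviation of the per-replicate means."""
    means = [statistics.mean(r) for r in replicates]
    return statistics.mean(means), statistics.stdev(means)


# Hypothetical data: three replicates whose true means differ between experiments.
rng = random.Random(0)
true_means = (20.0, 25.0, 30.0)
replicates = [[rng.gauss(mu, 10.0) for _ in range(500)] for mu in true_means]

within_sem = sem(replicates[0])
mean_of_means, sd_between = replicate_summary(replicates)
print(f"SEM within replicate 1: {within_sem:.2f}")
print(f"mean of replicate means: {mean_of_means:.2f}, SD between replicates: {sd_between:.2f}")
```

      With several hundred cells, the within-replicate SEM is well below one transcript, while the standard deviation between replicate means reflects the experiment-to-experiment shift.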

      It is also not clear how well the cell segmentation works and how it influences the analysis. In figure 1A, show the segmentation of the cell boundary together with the membrane stain.

      The authors use the RNA mean and RNA-FISH distributions and combine these data to build and compare different models. How do you know that the given data fulfil the central limit theorem, so that a model describing the mean is an adequate approach? To test this point, the authors should show, through subsampling from the data and the model, that their data sets indeed have enough cells to fulfil the central limit theorem.
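
      One way such a subsampling check could look is sketched below. This is an illustration under stated assumptions, not the authors' analysis: the counts are synthetic (a skewed, exponential-like distribution), and the helper name `subsample_mean_spread` is hypothetical. The idea is to draw repeated subsamples of increasing size and see whether the spread of the subsample mean has become small at the cell numbers actually measured.

```python
import random
import statistics


def subsample_mean_spread(counts, size, n_draws=200, seed=0):
    """Standard deviation of the mean across repeated subsamples of a given size."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(counts, size)) for _ in range(n_draws)]
    return statistics.stdev(means)


# Hypothetical skewed mRNA counts for 1000 cells at one time point.
rng = random.Random(1)
counts = [int(rng.expovariate(1 / 20.0)) for _ in range(1000)]

spread_by_size = {size: subsample_mean_spread(counts, size) for size in (25, 100, 400)}
for size, spread in spread_by_size.items():
    print(f"subsample size {size}: SD of the mean = {spread:.2f}")
```

      If the spread stops decreasing appreciably well before the full sample size is reached, the mean is being estimated stably from the available cells.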

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in terms of how fitting and model selection are performed. It is not clear how well the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot their fit errors as a function of model complexity.

      Another limitation is that the models have not been validated for example by using them to make predictions. One type of prediction could be to fit the model to one biological replica and then predict the other replica (cross validation). Another prediction would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      In the methods section on models, a biological motivation must be presented to justify the different model assumptions.

      How do the models that fit the distributions describe the mean?

      It is necessary to list model parameters for each of the models, their description, their parameter values, their parameter uncertainty and units of each parameter.

      It is not clear to me how the joint probability in figures 2,4, S2 and S4 have been used to fit the model.

      How do the models make sense given that human cells are diploid and therefore carry two copies of each gene?

      The variance decomposition is only briefly described, and no results are presented to show how it is done. This should be better explained.
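
      For orientation, a generic form of such a decomposition is the law of total variance, written here as a sketch. The symbols are assumptions introduced for illustration ($n$ for the mRNA count of a gene, $Z$ for the extrinsic variables such as cell size or circadian phase); the decomposition actually used in the manuscript may be defined differently.

```latex
\operatorname{Var}(n)
  = \underbrace{\mathbb{E}\big[\operatorname{Var}(n \mid Z)\big]}_{\text{intrinsic (e.g. bursting)}}
  + \underbrace{\operatorname{Var}\big(\mathbb{E}[n \mid Z]\big)}_{\text{extrinsic (explained by } Z\text{)}}
```

      The first term averages the variance remaining once $Z$ is fixed, and the second term is the variance of the conditional mean across cells with different $Z$.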

      Minor comments:

      In figure 3A, it is not clear to me how these different plots relate to the models. It is also not clear which equations describe each model.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Significance

      This is an interesting and important topic with the potential to have general implications for how to model periodic single-cell gene expression data and eventually better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I personally have done similar analysis and experiments in another system and biological context, which has demonstrated the power of this approach if implemented rigorously. I am not an expert in circadian clocks in human cells.

      Referees Cross commenting

      Reviewer #1: I agree with the assessment that model fitting and model selection were not sufficient. But I disagree that the data are enough. Although many cells and time points are analyzed, there is no evidence of how reproducibly each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Reviewer #3: Regarding the red background, my understanding is that this comes from the probe hybridization. This is maybe because the probe concentration has not been optimized or the number of probes per gene is low and the signal to noise is not so good. Or it could be auto fluorescent background. In this case a different fluorophore needs to be used to avoid this problem.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcription burst) dominates the variance. Therefore, looking at transcript counts may not be feasible to estimate single-cell circadian phase. However, the study is quite descriptive and ends up being a bit dissatisfying, so if the authors could improve this aspect by perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it would help quite a bit in this regard. The model selection/fitting itself was not really sufficient to compensate for this, as it stands.

      Specific comments:

      1.It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). It is probably technically challenging as the mRNAs are of low abundance. I think it may help if they adjust the contrast for the cytoplasm stain or just delineate the cell boundaries.

      2.In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      3.For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility for partial synchrony especially considering that the authors didn't show how effective the Dex treatment was to synchronize the circadian phase?

      4.Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically, adding degrees of freedom should improve fitting and make it easier for a model to fit; why not in this case? Can the authors provide some explanations/mechanisms?

      5.The authors should include the number (range) of cells analyzed in the figure legends.

      Significance

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanisms on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho) it should help improve the manuscript.

      Referees cross-commenting

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However, caution should be exercised regarding ref 26, as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are probably not buffered in general.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      **Major comments:**

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line so it does not control for specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field. My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The study by Dong et al., investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      **Major comments:**

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      **Minor comments:**

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      -Top of p.14 should be FigS6A-C.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology. My expertise is Drosophila neurobiology.

      Dear editor

      Below is our response to the reviewers' comments and our experimental plan for addressing these concerns.

      Reviewer #1

      Major comments:

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      Our experimental data suggest that, upon glial niche disruption and downregulation of glia-derived signals, NBs are stalled in M phase (we detected an increase in the percentage of pH3+ NBs). As a consequence, fewer NBs are in G1 and S phase. Therefore, when we conducted a 15-min EdU incorporation, we observed a reduction in EdU incorporation. This NB phenotype (increase in pH3 index and decrease in EdU index) was also observed by Speder and Brand, 2018, when they induced glial niche impairment by inhibiting the PI3K signaling pathway (discussed in P7 of this ms).

      To address whether glial-Hh knockdown reduces the ability of NBs to produce progeny, we plan to carry out two experiments:

      • We will assess the total number of neurons in the CB by assessing Elav+ neurons.

      • We will conduct two EdU pulse-chase experiments. First, we will assess the total number of EdU+ neurons produced within a 4-hr time window (neurons marked with Elav); second, we will mark the NB lineage (with either nerfin-1-GFP or pros-GFP) and quantify the number of EdU+ neurons produced per lineage during a 4-hr time window.

      Together, these experiments should allow us to assess the consequence of glial-Hh knockdown on NB proliferation.

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      The number of NBs in the larval CNS is specified at the beginning of post-embryonic neurogenesis, when quiescent NBs re-enter the cell cycle (reviewed by Homem and Knoblich, 2012). Once NBs re-enter the cell cycle, the number of NBs remains constant. NBs undergo asymmetric division to produce one daughter NB and a GMC, which divides once to generate two neurons. With each round of NB division, the number of NBs therefore stays the same. Consequently, changes in NB cell cycle speed do not alter the overall NB number, only the number of neurons produced.

      To clarify this, we will add a schematic depicting NB asymmetric division to Figure 1.
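
      The counting argument above can also be written as a minimal sketch. It assumes the idealised scheme stated in the response (every NB division yields one NB and one GMC, and every GMC divides once into two neurons); the function name and the cell numbers are hypothetical and chosen only for illustration.

```python
def simulate_lineages(n_neuroblasts, n_divisions):
    """Count NBs and neurons after a number of asymmetric NB divisions.

    Idealised scheme: each NB division gives one NB (self-renewal) and one GMC;
    each GMC divides once to give two neurons.
    """
    neuroblasts = n_neuroblasts
    neurons = 0
    for _ in range(n_divisions):
        gmcs = neuroblasts      # one GMC per dividing NB
        neurons += 2 * gmcs     # each GMC yields two neurons
        # NB number is unchanged: one daughter of each division remains an NB
    return neuroblasts, neurons


# A slower cell cycle means fewer divisions within the same time window.
fast = simulate_lineages(100, 5)
slow = simulate_lineages(100, 3)
print(fast, slow)  # (100, 1000) (100, 600)
```

      In this scheme a slower cycle lowers neuron output while leaving the NB count untouched, which is the point made in the response.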

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control?

      The EdU index is calculated as the number of EdU+ NBs normalised to the number of EdU+ NBs in the control. It reflects the NBs that progress through S phase within a 15-min window, relative to the control. A similar method was used in Kanai et al., 2018. This method would be invalid only if NB number varied between control and experimental data sets; however, NB number is not significantly altered relative to control in any of our genetic manipulations. We present the quantification of some key manipulations in Reviewer_Figure 1A, B.

      As to why we normalise to the control in each of these experiments: in vitro EdU incorporation relies on Click-iT chemistry, which is inherently variable due to incubation conditions. To overcome this, we always incubate control and experimental brains in the same tube and image them with the same confocal settings, and each experiment is normalised to its control done in parallel. We have now included Table 1, which contains all the raw data from these experiments.

      In the revised manuscript, we will clarify our methodology in greater detail in the Methods section, and we are happy to include Table 1 in the supplementary data.
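
      As an illustration of the normalisation described above, a minimal sketch follows. The counts are invented for the example and the function name `edu_index` is hypothetical; the only assumption taken from the response is that each experimental value is divided by the mean of the control processed in parallel.

```python
from statistics import mean


def edu_index(experimental_counts, control_counts):
    """Normalise EdU+ NB counts to the mean of the control done in parallel."""
    control_mean = mean(control_counts)
    if control_mean == 0:
        raise ValueError("control has no EdU+ NBs; the index is undefined")
    return [count / control_mean for count in experimental_counts]


# Hypothetical EdU+ NB counts per brain lobe from one experiment (same tube, same settings).
control = [40, 44, 36]
knockdown = [20, 30, 25]

control_index = edu_index(control, control)
knockdown_index = edu_index(knockdown, control)
print(control_index, knockdown_index)
```

      By construction the control index averages 1, so indices from experiments stained on different days can be compared even if absolute staining efficiency differs.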

      In many cases, the comparison is to the same w[1118] line so it does not control for specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets.

      We have used three different controls in our experiments, namely GAL4 or lexA >w1118, or UAS-mcherryRNAi, or UAS-luc. We detect no significant difference in terms of raw EdU+ NB numbers between the controls used in our experiments, as demonstrated below (Reviewer_Figure 1C). In our revised manuscript, we will include a sentence “As UAS-mcherryRNAi or UAS-luc are indistinguishable from the > w1118 control, we have used GAL4 driver > w1118 as control in place of UAS-luc in our results”.

      Reviewer_Figure 1. Total NB number and EdU+ NB number quantification

      A) Hh knockdown or overexpression in glia does not significantly alter NB number compared to control.
      B) htlACT overexpression in glia does not significantly alter NB number compared to control.
      C) EdU+ NB number is not significantly different within the controls GAL4 or lexA > w1118, or UAS-mcherryRNAi, or UAS-luc.
      P-values were obtained using Student's t-test in A and B and one-way ANOVA in C.

      Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      Glial number was quantified using the Fiji 3D object counter and a plug-in called “DeadEasy Larval Glia” (Forero et al., 2012), in which the detection threshold depends on the brightness of Repo staining in each experiment. Because control and experimental samples stained in the same tube are compared to each other, these data are presented as fold-change, which also allows easy comparison between experiments. The raw data are presented in Table 2. NB number is counted manually and is therefore presented as raw counts.

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      We will correct this.

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Major comments:**

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      With repo-GAL4>hhRNAi, the cortex glial niche enwrapping NBs is dramatically disrupted, which indirectly alters NB cell cycle progression, as indicated by an increase in pH3 index and a decrease in EdU index. From these two pieces of data, it is likely that NBs are stuck in M phase, resulting in fewer NBs in G1 and S phase that are capable of incorporating EdU within a 15-min incubation window. We will firm up these data with the experiments proposed to address the concerns of Reviewer 1, Point 1.

      That both RNAi and overexpression of Hh with repo-GAL4 cause a reduction in the NB EdU index is seemingly contradictory. However, the knockdown result is consistent with a previous report from Speder and Brand, 2018, where it was shown that glial niche impairment induced by PI3K pathway inhibition also causes a similar NB phenotype (an increase in pH3 index and a decrease in EdU incorporation). Furthermore, with repo-GAL4>htlDN, which caused a similar glial niche impairment (data not shown), we observed a similar phenotype (an increase in pH3 index and a slight decrease in EdU incorporation). Therefore, we concluded that the NB cell cycle progression defect is due to a general cortex glial niche disruption rather than a direct effect of Hh inhibition on NBs. We are happy to include the repo-GAL4>htlDN data in the supplementary data if required.

      With regard to the physiological role of Hh, we can only conclude from the data at hand that Hh is required for the development of the cortex glial niche, which in turn is required to maintain NB activities. In terms of how glial niche impairment impedes NB cell cycle progression, we observed that without a proper niche chamber, NBs cluster together instead of residing in separate niches (Figure 2F-G). Therefore, it is possible that the localization of other cell types (i.e. GMCs and neurons) is also altered as a result of NB clustering, which can potentially affect the NB cell cycle. While these questions will be interesting to explore in the future, they are beyond the scope of the current study.

      In contrast, we robustly showed Hh signals, when overexpressed in glial niche, were capable of making contact with NBs (Figure 7C-C’) and triggering a slow-down of NB S-phase progression. Therefore, it is fair to conclude that “high levels of glial Hh expression restricts NB cell cycle progression”.

      In the revised manuscript, we will discuss these findings in greater detail.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      We thank the reviewer for picking this up. Our intention was to quantify the number of cortex glial cells in glial-specific htlACT, InRwt and EgfrACT manipulations. However, two reported cortex glial antibodies (PntP2 from Avet-Rochex et al., 2012 and SoxN described in Read, 2018) showed nonspecific labelling of other cell types (Reviewer_Figure 2, arrows, neurons and NBs). As an alternative, we quantified the total glial cell number (Repo+) when htlACT, InRwt or EgfrACT was overexpressed using a cortex glial driver (NP2222-GAL4). We expect that the alterations in glial cell number would be primarily attributable to the cortex glia-specific gene manipulation. We agree that we should say that “overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of glial cells”.

      In the revised manuscript, we will clarify this in the results section.

      Reviewer_Figure 2: PntP2 staining in the larval CNS.

      A-B) Representative images showing that PntP2 antibody stains cortex glial cells (marked by NP2222-GAL4>mGFP, yellow arrows), NBs (white arrows) and neurons (blue arrows). B) is the zoomed-in image of A). Scale bar = 50 μm.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      We will correct this.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      We agree that FGF activation causes a dramatic increase in glial cell number, and thus will cause a relative increase in the levels of hh, fasn1 and lsd2. However, for RT-qPCR, the same amount of total RNA (1 μg) was extracted from control and repo-GAL4>htlACT samples and reverse transcribed into cDNA for qPCR. Therefore, the mRNA levels described in Figure 6F are already normalized to the total amount of genetic material.

      In the literature, hh, fasn1 and lsd2 have not been reported to be direct transcriptional targets of FGF signalling. However, lipid metabolism rewiring is well known as a hallmark of glioblastoma. For example, high levels of FASN have been linked with high grade glioblastoma (Grube et al., 2014). Furthermore, FGF signalling has also been shown to modulate lipid metabolism and alter the transcription of the Lsd-2 homologue called Plin2 in a mouse model (Ye et al., 2016).

      To determine whether hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling, we would first have to find out which TFs are altered in glia upon altered FGF signalling via cortex glia-specific RNA-seq, and then conduct DamID to identify their target genes. This would be interesting to follow up but is beyond the scope of the current study.

      We will add a section on this in the discussion section of the revised ms.

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia?

      In response to glial htlDN overexpression, we observed a significant reduction in total glial number and overall Hh expression. However, RT-qPCR showed that mRNA levels of hh, fasn1 and lsd-2 were not altered upon htlDN overexpression (Reviewer_Figure 3).

      This data will be included in the supplementary data in the revised ms.

      Reviewer_Figure 3. Glial htlDN overexpression doesn’t alter the expression of hh, fasn1 and lsd2. The mRNA levels of hh, fasn1 and lsd2 are normalized to the reference gene rpl32.
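
      For readers unfamiliar with reference-gene normalisation, one common calculation is the 2^-ΔΔCt method, sketched below. The response does not state which quantification method was used, so this is an assumption for illustration only; the Ct values are invented and the function name is hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method (assumed, not confirmed by the source).

    Each target Ct is first normalised to the reference gene (e.g. rpl32) in the
    same sample, then the sample is compared to the control.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier in the sample.
fold_up = relative_expression(24.0, 18.0, 26.0, 18.0)
unchanged = relative_expression(25.0, 19.0, 24.0, 18.0)
print(fold_up, unchanged)  # 4.0 1.0
```

      A shift in the reference gene that matches the shift in the target, as in the second call, leaves the relative expression at 1.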

      Continued: If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      To address this question, we plan to assess the expression levels of hh, fasn1 and lsd-2 using glia specific expression of an inhibitor of the PI3K (delta p60), which has been shown by Speder and Brand, 2018 to cause a reduction in cortex glial number. We will also ascertain whether Decapo overexpression causes cortex glial niche impairment. If so, we will also assess the expression levels of hh, fasn1 and lsd-2 in this setting.

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to the increased number of cortex glia cells?

      We favour the model in which excess Hh in the glial compartment “spills over” and reduces the proliferation of NBs, which are autonomously regulated by Hh and therefore sensitive to this ligand. We can add this to the discussion.

      **Minor comments:**

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      These large lipid droplets are caused by lipid droplet fusion due to the use of detergent in this experiment. When we perform antibody staining together with lipid droplet staining, the detergent in PBST is required for the antibody staining to work. However, it also causes lipid droplets to fuse, which creates the artefact of large lipid droplets. This has previously been reported by Bailey et al., 2015, and we have explained this on p. 19 of the Methods section.

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      We will include this in the revised ms.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      We will include this in the revised ms.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      We will include this in the revised ms.

      -Top of p.14 should be FigS6A-C.

      We will correct this.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      We will include this in the revised ms.

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.








      Table 1. EdU+ NB numbers for each genotype described in each Figure

      | Figure | Genotype | EdU incubation time | Average EdU+ NB number | SEM | Number of samples |
      |---|---|---|---|---|---|
      | Figure 2J | repo-GAL4>w1118 | 15 min | 66.63 | 1.79 | 16 |
      | Figure 2J | repo-GAL4>UAS-hh | 15 min | 57.35 | 1.35 | 20 |
      | Figure 2K | NP2222-GAL4>w1118 | 15 min | 67.91 | 1.44 | 11 |
      | Figure 2K | NP2222-GAL4>UAS-hh | 15 min | 60.79 | 0.79 | 14 |
      | Figure 2P | dnab-GAL4>w1118 | 15 min | 70.5 | 1.44 | 12 |
      | Figure 2P | dnab-GAL4>ciACT | 15 min | 60.1 | 1.48 | 10 |
      | Figure S3C | repo-GAL4>dcr2; mcherryRi | 10 min | 57.42 | 0.63 | 12 |
      | Figure S3C | repo-GAL4>dcr2; hhRi43255 | 10 min | 48.56 | 2.65 | 9 |
      | Figure 3K | NP2222-GAL4>w1118 | The same dataset as Figure 2K | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh | The same dataset as Figure 2K | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 57.44 | 1.41 | 16 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi34617 | 15 min | 63.36 | 1.34 | 14 |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 58.83 | 2.61 | 6 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi32846 | 15 min | 64.5 | 1.2 | 14 |
      | Figure 5E | repo-GAL4>w1118 | 15 min | 71.6 | 1.28 | 15 |
      | Figure 5E | repo-GAL4>UAS-htlACT | 15 min | 56 | 1.59 | 14 |
      | Figure 5E | NP2222-GAL4>w1118 | 15 min | 70.2 | 1.58 | 10 |
      | Figure 5E | NP2222-GAL4>UAS-htlACT | 15 min | 54.75 | 1.24 | 16 |
      | Figure 6G | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 60 | 1.24 | 7 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi43255 | 15 min | 67.17 | 1.13 | 12 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.29 | 1.79 | 14 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi25794 | 15 min | 68.55 | 1.68 | 11 |
      | Figure 6H | dnab-GAL4>mcherryRi | 10 min | 49.13 | 1.6 | 8 |
      | Figure 6H | dnab-GAL4>ciRi2125-R2 | 10 min | 56.54 | 1.27 | 13 |
      | Figure 6H | repo-lexA>w1118 | 15 min | 68.5 | 1.1 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT | 15 min | 55.7 | 2.15 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 52 | 1.58 | 30 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRiHMJ23860 | 15 min | 62.4 | 1.79 | 15 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 56.33 | 1.49 | 12 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRi2125-R2 | 15 min | 62.86 | 1.81 | 7 |
      | Figure 6J | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.64 | 0.99 | 14 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R2 | 15 min | 65 | 2.41 | 9 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | The same dataset as Figure 6G control of NP2222-GAL4>UAS-htlACT;hhRi25794 | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;lsd2Rikk102269 | 15 min | 68.13 | 1.08 | 8 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.71 | 10 |
      | Figure S5H | NP2222-GAL4>fasn1Ri3523R6 | 15 min | 65.5 | 1.38 | 10 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.13 | 15 |
      | Figure S5H | NP2222-GAL4>lsd2Rikk102269 | 15 min | 64.2 | 0.94 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-luc | 15 min | 65 | 1.07 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-lsd2 | 15 min | 64.9 | 1.51 | 10 |
      | Figure S5I | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 57.93 | 0.9 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R6 | 15 min | 63.79 | 1.25 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 50.25 | 2.52 | 8 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;lsd2Ri32846 | 15 min | 59.3 | 1.2 | 10 |
      | Figure 7B | NP2222-GAL4>mcherryRi | 15 min | 65 | 0.93 | 10 |
      | Figure 7B | NP2222-GAL4>raspRi11495R2 | 15 min | 65.13 | 1.29 | 15 |
      | Figure 7B | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT | The same dataset as Figure 5E | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.33 | 1.06 | 18 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R1 | 15 min | 63.95 | 1.05 | 21 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.04 | 1.019 | 26 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R2 | 15 min | 63.07 | 0.92 | 29 |
      | Figure 7D | NP2222-GAL4>w1118 | 15 min | 69.46 | 1.02 | 13 |
      | Figure 7D | NP2222-GAL4>UAS-hh.N.EGFP | 15 min | 52.25 | 1.9 | 12 |
      | Figure 7F | repo-GAL4>UAS-hh.N.EGFP;mcherryRi | 15 min | 54.4 | 1.18 | 15 |
      | Figure 7D | repo-GAL4>UAS-hh.N.EGFP;fasn1Ri3523R2 | 15 min | 65.69 | 1.43 | 13 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 59.17 | 1.18 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Cat.A | 15 min | 64 | 1.31 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 53.6 | 2.32 | 10 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Sod.1 | 15 min | 62.7 | 1.76 | 10 |
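
      As a rough illustration of how the raw values in Table 1 relate to a control-normalized index, the sketch below divides an experimental mean by its control mean and computes a Welch-style t statistic from the reported means and SEMs. The function names are hypothetical and this is not the authors' analysis pipeline; only the two Figure 2J rows from Table 1 are used as input.

```python
from math import sqrt

def fold_change(mean_exp, mean_ctrl):
    """Experimental mean expressed relative to the control mean."""
    return mean_exp / mean_ctrl

def welch_t(mean_a, sem_a, mean_b, sem_b):
    """Welch t statistic from summary statistics.

    SEM squared equals variance / n, so the standard error of the
    difference between two means is sqrt(sem_a**2 + sem_b**2).
    """
    return (mean_a - mean_b) / sqrt(sem_a ** 2 + sem_b ** 2)

# Figure 2J rows of Table 1 (mean EdU+ NB number, SEM, number of samples).
control = {"mean": 66.63, "sem": 1.79, "n": 16}   # repo-GAL4>w1118
hh_oe = {"mean": 57.35, "sem": 1.35, "n": 20}     # repo-GAL4>UAS-hh

edu_index = fold_change(hh_oe["mean"], control["mean"])
t_stat = welch_t(hh_oe["mean"], hh_oe["sem"], control["mean"], control["sem"])

print(f"normalized EdU index: {edu_index:.3f}")
print(f"Welch t statistic:    {t_stat:.2f}")
```

      Reporting the raw means alongside the normalized value, as Table 1 does, lets a reader check whether variation between control datasets is being hidden by the normalization.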

      Table 2. Raw data on glial number

      | Figure | Genotype | Average Repo+ glial number | SEM | Number of samples |
      |---|---|---|---|---|
      | Figure 2D | repo-GAL4>dcr2; mcherryRi | 843 | 44.29 | 7 |
      | Figure 2D | repo-GAL4>dcr2; hhRi43255 | 666.5 | 46.77 | 8 |
      | Figure 4K | NP2222-GAL4>w1118 | 1165 | 20.55 | 10 |
      | Figure 4K | NP2222-GAL4>htlACT | 2325 | 107.5 | 10 |
      | Figure 4K | NP2222-GAL4>InRwt | 1189 | 85.92 | 10 |
      | Figure 4K | wrapper-GAL4>w1118 | 1305 | 51.78 | 7 |
      | Figure 4K | wrapper-GAL4>EgfrACT | 1192 | 38.16 | 12 |

      References:

      Avet-Rochex, A., Kaul, A.K., Gatt, A.P., McNeill, H., and Bateman, J.M. (2012). Concerted control of gliogenesis by InR/TOR and FGF signalling in the Drosophila post-embryonic brain. Development 139, 2763-2772.

      Bailey, A.P., Koster, G., Guillermier, C., Hirst, E.M., MacRae, J.I., Lechene, C.P., Postle, A.D., and Gould, A.P. (2015). Antioxidant Role for Lipid Droplets in a Stem Cell Niche of Drosophila. Cell 163, 340-353.

      Forero, M.G., Kato, K., and Hidalgo, A. (2012). Automatic cell counting in vivo in the larval nervous system of Drosophila. J Microsc 246, 202-212.

      Grube, S., Dunisch, P., Freitag, D., Klausnitzer, M., Sakr, Y., Walter, J., Kalff, R., and Ewald, C. (2014). Overexpression of fatty acid synthase in human gliomas correlates with the WHO tumor grade and inhibition with Orlistat reduces cell viability and triggers apoptosis. J Neurooncol 118, 277-287.

      Homem, C.C., and Knoblich, J.A. (2012). Drosophila neuroblasts: a model for stem cell biology. Development 139, 4297-4310.

      Kanai, M.I., Kim, M.J., Akiyama, T., Takemura, M., Wharton, K., O'Connor, M.B., and Nakato, H. (2018). Regulation of neuroblast proliferation by surface glia in the Drosophila larval brain. Sci Rep 8, 3730.

      Read, R.D. (2018). Pvr receptor tyrosine kinase signaling promotes post-embryonic morphogenesis, and survival of glia and neural progenitor cells in Drosophila. Development 145.

      Speder, P., and Brand, A.H. (2018). Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife 7.

      Ye, M., Lu, W., Wang, X., Wang, C., Abbruzzese, J.L., Liang, G., Li, X., and Luo, Y. (2016). FGF21-FGFR1 Coordinates Phospholipid Homeostasis, Lipid Droplet Function, and ER Stress in Obesity. Endocrinology 157, 4754-4769.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The study by Dong et al. investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      Major comments:

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about the role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with the data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion on p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as a proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to the increased number of cortex glia cells?

      Minor comments:

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      -Top of p.14 should be FigS6A-C.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Significance

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      Major comments:

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line so it does not control for a specific genetic background and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      Minor comments:

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Significance

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the three reviewers for providing valuable feedback on our original manuscript. A point-by-point response to all of these comments is provided below. [Note that figures are not added in-line because of text-only limitations.]

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/stains (as two separate panels) to microscopically screen cell viability. For automatic classification a training/test set of 119 CRISPR (approximately 2 sgRNAs per gene) perturbations on 3 cancer cell lines was generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei a set of morphological cell measurements was extracted from each perturbation (total 952 features). The nature of these features, spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as web-based application/visualization and the supplementary method is well described.

      We thank the reviewer for their constructive comments and helpful feedback.

      There is one subtle point that is worth raising given this description: The images we use to measure the cell cycle and viability phenotypes (two different staining panels in the Cell Health assays) are not the same images we use to extract morphology measurements (Cell Painting assay). This lack of connection, which is based on a light wavelength limitation present in all microscopes that limits the number of stains in a single assay, prevents us from developing a method that analyzes the same cells across the three assays. This distinction will become important later in the review, and we have made specific changes in the manuscript to increase clarity.

      **Major concerns:**

      (1)The only fundamental argument of this manuscript not to apply state-of-the-art deep learning (DL) machine-learning (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, or manual gating, is the 'interpretability' of the predictions. However, DL should clearly outperform 'manual' regression models in performance, precision, and scalability (with modern GPUs). All recent machine vision benchmarks in microscopy confirm this and also clearly show 'real world' translational applications, e.g.

      https://www.nature.com/articles/s43018-020-0085-8,

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf,

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (We’ve copied a similar critique from the Significance section from Reviewer #1 in order to reduce redundancy) The author/co-authors have been instrumental/pioneered with their past work on cell-based image processing (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

      We agree that deep learning approaches are exciting; much of our laboratory’s work focuses on their application (see https://doi.org/10.1073/pnas.2001227117, https://doi.org/10.1038/s41592-019-0612-7; https://doi.org/10.1002/cyto.a.23863, https://doi.org/10.1109/CVPR.2018.00970), and we agree that they are likely to outperform simpler regression models trained using so-called hand-engineered features. We thank the reviewer for highlighting our failure to accurately and fully describe our rationale.

      We intentionally did not use deep learning for this problem given (a) data limitations and (b) the primary goal of the manuscript, which is to demonstrate feasibility.

      Data limitations. There is no mechanism to link the cells of the assays (Cell Health and Cell Painting) together, which greatly reduces the available sample size. In the two referenced manuscripts, which each propose an exciting approach, the dataset is much larger (~17,000 and ~1,000 images respectively). Our dataset is only 357 perturbations that can only be linked between assays at the perturbation level rather than a single-cell level. Therefore, a deep learning approach is likely to produce models that don’t generalize to other datasets. Furthermore, reviewer 3 commented in favor of the approach we presented: “Using elastic net regression models is well-suited to the problem due to the low number of observations.”

      Primary goal of the manuscript is to demonstrate feasibility. In addition, the primary goal of the manuscript is to add cell health annotations as functional readouts to perturbations. Our aim was to demonstrate feasibility of predicting cell health states, not to optimize performance. Optimizing performance would require collecting much more data, or developing new deep learning or data collection methods to account for the lack of matched single cell readouts.

      To make this rationale more clear and concise, we have made the following changes in the manuscript:

      In the first paragraph of page 3, we make some minor contextual updates (“To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations…”) and replaced “We used simple machine learning methods, which are relatively easy to interpret compared to deep learning” with:

      We used simple machine learning methods instead of a deep learning approach because of our limited sample size of 119 perturbations and the inability to increase the sample size by linking single cell measurements across assays.

      We have also amended the Conclusions section to emphasize our primary goal and note possible deep learning extensions as future directions. The Conclusions now reads:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (2)One aforementioned point of the methodology is cryptically/not described: Why it should be less expensive compared with other (which?) approaches (see introduction)?

      We thank the reviewer for bringing up this point. We believe that part of this confusion stems from a slight misunderstanding about how images from the three assays (two Cell Health and one Cell Painting) are collected. The Cell Health assays are two distinct panels of targeted reagents that are separately prepared as two physically distinct assays. The Cell Painting assay is already an established assay used by many labs and companies around the world to mark cell morphology in an unbiased and relatively cheap way. We are comparing the expenses between the two Cell Health assays vs. the Cell Painting assay.

      We believe that this misunderstanding likely results from our somewhat cryptic and inconsistent language when describing the Cell Health assays in the abstract and introduction. We’ve updated the third sentence of the abstract from “We developed two customized microscopy assays that use seven reagents to measure 70 specific cell health phenotypes...” to now read:

      We developed two customized microscopy assays, one using four targeted reagents and the other three targeted reagents, to collectively measure 70 specific cell health phenotypes including proliferation, apoptosis, reactive oxygen species (ROS), DNA damage, and cell cycle stage.

      For consistency, we have also updated the penultimate paragraph in the introduction to now read:

      To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assays “Cell Health”.

      With these clarifications in mind, we believe that the question of comparing monetary costs is more clear. We are comparing the costs of the targeted reagents in the two Cell Health assays to the unbiased reagents in the single Cell Painting assay. We’ve also modified the last two sentences in the first paragraph of the introduction to strengthen the connection between Cell Health assays, targeted reagents, and high cost:

      Cell health is normally assessed by eye or measured by specifically targeted reagents, which are either focused on a single Cell Health parameter (ATP assays) or multiple, in combination, via FACS-based or image-based analyses, which involves a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      (3)Generalizability and/or training data size is essential for any model-based classification, but not evaluated or validated in the current manuscript. The independent validation on data from the A549 cell line only might not be sufficient/convincing.

      We separately address the two distinct points raised by the reviewer of 1) generalizability and 2) training data size:

      1. Generalizability

      We agree that any model-based classification must demonstrate generalizability. For this reason, we have taken careful consideration to assess the generalizability of all 70 models in two contexts. First, we assessed model performance in a single held out test set (15% of all data). All results we report in the main text (e.g. Figure 2) report performance on this test set. We see high performance in many (but not all) models, and we observe much better model performance compared to a negative control baseline (New Supplementary Figure S5). High performance in the test set indicates that, for some cell health indicators, the models generalize well.

      Second, we also demonstrate that these models generalize to data from an entirely different experiment using a fundamentally different perturbation (CRISPR vs. drug compounds). We demonstrate generalizability to this external validation data in four different ways: 1) Validating a relatively simple model (“Number of Live Cells”) with an orthogonal viability readout from the PRISM assay (barcoding-based cell viability; updated Figure 4); 2) Demonstrating that proteasome inhibitors, which are known to produce reactive oxygen species, are predicted to do so; 3) Demonstrating that PLK inhibitors, which are known to reduce entry to G1, show a robust dose response in the "G1 Cell Count" model; and 4) Demonstrating that aurora kinase and tubulin inhibitors are predicted to induce high DNA damage (gH2AX) in G1 cells. These two drug classes are known to cause “mitotic slippage” and double stranded DNA breaks. The fourth example was added in response to a comment by reviewer 3.

      We’ve also added a series of enrichment tests, as described in the following new text:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p -15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10-8) (Figure 4C). The PLK inhibitor HM-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors were enriched for high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p -15) (Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      The updated methods section describing how we assess generalizability and perform the enrichment tests now states:

      Assessing generalizability of cell health models applied to Drug Repurposing Hub data

      We used our cell health webapp (https://broad.io/cell-health-app) to identify compounds with high predictions for three models with high or intermediate performance: ROS, Number of G1 cells, and Number of gH2AX spots in G1 cells. For each model, we identified classes of compounds with consistently high scores, then tested for statistical enrichment: for proteasome inhibitors in the ROS model, PLK inhibitors in the Number of G1 cells model, and aurora kinase and tubulin inhibitors in the Number of gH2AX spots in G1 cells model. We used one-sided Fisher’s exact tests to quantify differences in expected proportions between high and low model predictions. For each case, we determined high and low predictions based on the 50% quantile threshold for each model independently.
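The enrichment test described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the cell-health repository: the function names, the table layout, and the toy scores are assumptions made for the example. It splits predictions at the median (the 50% quantile), builds a 2x2 table of compound-class membership against high/low prediction, and computes a one-sided Fisher's exact p value from the hypergeometric distribution.

```python
from math import comb
from statistics import median


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (alternative: odds ratio > 1).

    Assumed 2x2 table layout (hypothetical, for illustration):
                        high prediction   low prediction
        in class              a                 b
        not in class          c                 d
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(total, col1)


def enrichment(scores, in_class):
    """Split scores at the median, then test class enrichment among high scores."""
    threshold = median(scores)
    a = b = c = d = 0
    for score, member in zip(scores, in_class):
        high = score > threshold
        if member and high:
            a += 1
        elif member:
            b += 1
        elif high:
            c += 1
        else:
            d += 1
    # Sample odds ratio; infinite when an off-diagonal cell is empty.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return (a, b, c, d), odds_ratio, fisher_exact_greater(a, b, c, d)


# Toy data: three class members with high scores among eight compounds.
toy_scores = [0.9, 0.8, 0.85, 0.1, 0.2, 0.3, 0.4, 0.15]
toy_members = [True, True, True, False, False, False, False, False]
table, odds_ratio, p_value = enrichment(toy_scores, toy_members)
```

For a class expected to score low (as with PLK inhibitors in the G1 cell count model), the same sketch applies with the "high" and "low" columns swapped.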

      We acknowledge that prospectively making predictions and measuring Cell Health readouts directly in a new experiment would be more convincing, but we note that our existing assessment of generalizability in an external experiment is already unusual in machine learning publications. Additionally and unfortunately, collecting a second validation dataset for this manuscript is not currently feasible given experiments backlogged from COVID.

      1. Training data size

      We also agree that a more comprehensive analysis of training data size would be an important indicator of model limitations. Therefore, we performed a sample titration analysis in which we randomly dropped samples from the training procedure and tracked performance on the held-out test set. We add the following figure, figure legend, and results text to describe and interpret the results.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We updated the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed an increasing number of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).

      Finally, the updated methods section describing our sample titration analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing number of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.
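The titration procedure above can be sketched as a loop over drop sizes and random seeds. This is an illustrative outline only: the one-feature least-squares model stands in for the actual cell health models, R^2 is an assumed performance metric, and all names and the synthetic data are hypothetical.

```python
import random
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(model, xs, ys):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    slope, intercept = model
    my = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def titrate(train, test, n_drop_values, seeds):
    """Median test-set R^2 for each number of dropped training samples."""
    test_x, test_y = zip(*test)
    results = {}
    for n_drop in n_drop_values:
        scores = []
        for seed in seeds:
            rng = random.Random(seed)
            # Randomly keep all but n_drop training samples; the test set is fixed.
            kept = rng.sample(train, len(train) - n_drop)
            xs, ys = zip(*kept)
            scores.append(r_squared(fit_line(xs, ys), test_x, test_y))
        results[n_drop] = median(scores)
    return results


def make_toy_data(n, seed):
    """Synthetic (x, y) pairs with a noisy linear relationship."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 3)) for x in xs]


toy_train = make_toy_data(60, seed=1)
toy_test = make_toy_data(30, seed=2)
titration = titrate(toy_train, toy_test, n_drop_values=[0, 20, 40], seeds=range(10))
```

Holding the test set fixed while only the training subset changes keeps the performance values comparable across drop sizes, and taking the median across seeds dampens the effect of any single unlucky subset.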

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      **Minor concerns:**

      (1) Highest test performance comprises that precision is mainly driven by cell cycle/count and live status and could be probably derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right) and would argue for rigid feature selection across channels and features.

      We believe that clarifying the confusion between the two Cell Health assays we developed and the well-established Cell Painting assay addresses part of this concern. The DRAQ7 dye marks dead cells, and is measured in Cell Health. In other words, readouts from this reagent are what we aim to predict, not what we use for training. Indeed, DRAQ7-based phenotypes are among the top predicted models, which is a result we present in Supplementary Figure S7 - this figure uncovers which Cell Health phenotypes are more easily predicted by Cell Painting.

      The DNA granularity morphology measurements are collected from the Cell Painting assay and thus are available for training, and, as noted by the reviewer, encode a high proportion of signal in predicting the various cell health phenotypes. In our most common processing workflows for other projects, we do apply a rigid feature selection pipeline to all Cell Painting profiles before analysis, but we do not do this in this analysis since we were using a model with a sparsity-inducing penalty (elastic net).

      To directly answer the question of how channels and feature groups influence model performance, we’ve performed a systematic experiment removing different channel, compartment, and feature groups and retraining all models with the specific group dropped. We now include the following supplementary figure:

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      Additionally, we updated the results section to introduce and discuss this result:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And we add the following to the methods section:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.
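The leave-one-class-out design above can be sketched by filtering feature names before retraining. The sketch assumes CellProfiler-style names of the form `Compartment_FeatureGroup_..._Channel`; the example feature names, the helper functions, and the spelling of group tokens without spaces (`AreaShape`, `RadialDistribution`) are assumptions for illustration, and the model retraining step itself is omitted.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = (
    "Texture", "RadialDistribution", "Neighbors", "Intensity",
    "Granularity", "Correlation", "AreaShape",
)


def belongs_to(feature, feature_class):
    """Return True if a feature name falls in the given class."""
    tokens = feature.split("_")
    if feature_class in COMPARTMENTS:
        return tokens[0] == feature_class
    if feature_class in FEATURE_GROUPS:
        return len(tokens) > 1 and tokens[1] == feature_class
    if feature_class in CHANNELS:
        # Match whole tokens so "ER" does not match inside other words.
        return feature_class in tokens[2:]
    raise ValueError(f"Unknown feature class: {feature_class}")


def drop_feature_class(features, feature_class):
    """Remove every feature that belongs to one class."""
    return [f for f in features if not belongs_to(f, feature_class)]


def ablation_plan(features):
    """Yield (dropped class, remaining features): one retraining run per class."""
    for feature_class in COMPARTMENTS + CHANNELS + FEATURE_GROUPS:
        yield feature_class, drop_feature_class(features, feature_class)


# Hypothetical feature names following the assumed naming convention.
example_features = [
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cells_AreaShape_Area",
    "Cytoplasm_Texture_Entropy_ER_5_0",
    "Cells_RadialDistribution_FracAtD_Mito_1of4",
    "Cells_Correlation_Correlation_DNA_RNA",
    "Nuclei_Granularity_2_AGP",
]
plan = list(ablation_plan(example_features))
```

Each entry in `plan` corresponds to one retraining run with a single compartment, channel, or feature group removed, giving 15 runs for the classes listed in the text.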

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      (2) Any H2AX and 'polynuclear' would probably fail in any cell line with this size of training data.

      Indeed we would expect certain cell health phenotype models to fail if they had few hits and a relatively low variance of output values. This hit rate is directly associated with the phenotypes that the CRISPR perturbations induce, which is why we intentionally selected them to span multiple gene pathways in an attempt to maximize morphology diversity (see Supplementary Table S1).

      We did indeed observe that the polynuclear model had few hits in the training data and relatively poor performance. We did not expect this result, given that DNA stains are captured in the Cell Health and Cell Painting assays. We suspect the poor performance in this model is likely because so few cells were classified as polynuclear in our gating strategy, making it perhaps an inconsistently measured readout.

      By contrast, some gH2AX models did have relatively good performance. In the conclusion, we note that increased training data size using more perturbations is likely to improve model performance:

      The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (3) To what refers the 'weights' of the model in Fig. 1c?

      We thank the reviewer for pointing out that we never defined this term in the Figure 1 legend. We use “weights” to refer to the coefficients from the regression model. To make this more clear, we have updated the legend to now read: “Model coefficient weights” and the text in Figure 1C to now read “model weights”.

      Reviewer #1 (Significance (Required)):

      This manuscript is not advanced in the context of latest improvements/developments of cell-based microscopic classification. Rationale in the introduction and the conclusion are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted to this end?

      Since responding to these reviews, we believe that our primary motivation - to demonstrate proof-of-concept of predicting cell health phenotypes directly from Cell Painting data - is now much clearer, holistically. We provide below an updated introduction, which improves rationale.

      Perturbing cells with specific genetic and chemical reagents in different environmental contexts impacts cells in various ways (Kitano, 2002). For example, certain perturbations impact cell health by stalling cells in specific cell cycle stages, increasing or decreasing proliferation rate, or inducing cell death via specific pathways (Markowetz, 2010; Szalai et al., 2019). Cell health is normally assessed by eye or measured by specifically targeted reagents, which are either focused on a single Cell Health parameter (ATP assays) or multiple, in combination, via FACS-based or image-based analyses, which involve a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      Image-based profiling assays are increasingly being used to quantitatively study the morphological impact of chemical and genetic perturbations in various cell contexts (Caicedo et al., 2016; Scheeder et al., 2018). One unbiased assay, called Cell Painting, stains for various cellular compartments and organelles using non-specific and inexpensive reagents (Gustafsdottir et al., 2013). Cell Painting has been used to identify small-molecule mechanisms of action (MOA), study the impact of overexpressing cancer mutations, and discover new bioactive mechanisms, among many other applications (Caicedo et al., 2018; Christoforow et al., 2019; Hughes et al., 2020; Pahl and Sievers, 2019; Rohban et al., 2017; Simm et al., 2018; Wawer et al., 2014). Additionally, Cell Painting can predict mammalian toxicity levels for environmental chemicals (Nyffeler et al., 2020) and some of its derived morphology measurements are readily interpreted by cell biologists and relate to cell health (Bray et al., 2016). However, no single assay enables discovery of fine-grained cell health readouts.

      We hypothesized that we could predict many cell health readouts directly from the Cell Painting data, which is already available for hundreds of thousands of perturbations. This would enable the rapid and interpretable annotation of small molecules or genetic perturbations. To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assay panels “Cell Health”.

      To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations in three different cell lines using Cell Painting and Cell Health. We used the Cell Painting morphology readouts to train 70 different regression models to predict each Cell Health indicator independently. We used simple machine learning methods instead of a deep learning approach because of our limited sample size and the inability to increase it by linking single cell measurements from both assays. We predicted certain readouts, such as the number of S phase cells, with high performance, while performance on other readouts, such as DNA damage in G2 phase cells, was low. We applied and validated these models on a separate set of existing Cell Painting images acquired from 1,571 compound perturbations measured across six different doses from the Drug Repurposing Hub project (Corsello et al., 2017). We provide all predictions in an intuitive web-based application at http://broad.io/cell-health-app, so that others can extend our work and explore cell health impacts of specific compounds.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      We thank the reviewer for their succinct summary of our goals and rationale for this manuscript, and for the constructive and valuable comments herein.

      **Major Issues:**

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set.

      We agree that the two assays we’re collectively calling “Cell Health” are indeed fairly complex - we use two different panels of multiplexed stains and a series of gating strategies to measure phenotypes in various cell subpopulations. However, the fundamental message in the manuscript is that we may no longer need to perform these complex assays if we get this information from the simpler Cell Painting assay.

      We agree that our machine learning approach to predict the various cell health phenotypes uses signals beyond nucleus-based stains. However, even if we are predicting just DAPI signals, this reinforces our argument that the specific stains in the Cell Health assays (which are commonly used in targeted experiments) are not necessary to measure specifically. Instead, in certain circumstances, a scientist should just use unbiased stains to capture their biology of interest, since the stains are cheaper at scale and one has access to much more information.

      It is also worth noting that the DNA damage phenotypes in specific cell subpopulations (e.g. DNA Damage in G1 cells) would not be possible to measure with high precision without EdU co-staining.

      However these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      (Reviewer 2 introduced a similar critique below, which we now move here) A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      We definitely agree that sample size is a limitation in this manuscript. Our primary goal with this paper was to demonstrate feasibility of the approach to predict the targeted Cell Health readouts using a simpler (and more affordable/scalable) assay in Cell Painting. The promising results we observed, especially given this sample size limitation, motivate collecting a larger dataset using more perturbations.

      Potentially confounded signal by low efficiency CRISPR knockouts is also an interesting topic. We do provide Supplementary Figure S8 to describe a subtle relationship that we observed regarding CRISPR infection efficiency. We also discuss this in the results as: “We observed overall better predictivity in ES2 cells, which had the highest CRISPR infection efficiency (Supplementary Figure 8), suggesting that stronger perturbations provide better information for training and that training on additional data should provide further benefit.”

      Additionally, we made a substantial effort to maximize CRISPR efficiency by independently optimizing lentivirus volumes for each sgRNA. In general, we observed that some cell lines are easier to CRISPR, probably based on more factors beyond Cas9 expression. However, we note that CRISPR is being used simply as a perturbation to elicit a variable morphology response. In other words, the type, efficacy, and even accuracy of perturbation does not matter as long as it satisfies two constraints: 1) induces a morphology response for a sufficient number of perturbations, and 2) is consistent between the two assays (Cell Health and Cell Painting). Our setup satisfies both constraints.

      However, this experiment (and data from the experiment) can be used in other contexts in which the CRISPR efficiency is extremely important. Therefore, we added three columns to Supplementary Table 1 providing the efficiency readouts for the three cell lines. (This information was already present in GitHub, but we moved it to a more obvious location in Supplementary Table 1). Code describing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/142

      With regard to the first sentence of this concern: “However these appear to give outputs that won’t be that useful” - indeed, we fully expected that many cell health readouts would be difficult to predict. In the original submission, we included the following explanation for potential sources of low performing models: “Performance differences might result from random technical variation, small sample sizes for training models, different number of cells in certain Cell Health subpopulations (e.g. mitosis or polynuclear cells), fewer cells collected in the viability panel (see methods), or the inability of Cell Painting reagents to capture certain phenotypes.”

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates.

      We thank the reviewer for pointing out this source of confusion and for providing an opportunity to improve the clarity of this section. Our major revisions here are as follows: 1) Introduce the Drug Repurposing Hub as an external dataset for validation; 2) Validate a high performing and simple model (number of live cells) by comparing model readout predictions from the Drug Repurposing Hub Cell Painting profiles against orthogonal PRISM viability readouts (in compounds with slightly different doses); 3) Validate three additional models: enrichment of proteasome inhibitors in the ROS model, enrichment of PLK inhibitors in the G1 cell count model, and enrichment of aurora kinase and tubulin inhibitors in the Number of gH2AX spots in G1 cells model; 4) Display a global structure of Cell Health predictions in UMAP space for select models. Note that for the fourth point, we are using the UMAP gradients to observe patterns, and not to validate models.

      In order to encapsulate the updated flow, we’ve pasted below the entire Drug Repurposing Hub results/discussion section, which introduces two additional analyses and new text in response to various other reviewer comments. We feel that the updated section improves clarity and purpose.

      The updated section now reads:

      “Predictive models of cell health would be most useful if they could be trained once and successfully applied to data sets collected separately from the experiment used for training. Otherwise one could not annotate existing datasets that lack parallel Cell Health results, and Cell Health assays would have to be run alongside each new dataset. We therefore applied our trained models to a large, publicly available Cell Painting dataset collected as part of the Drug Repurposing Hub project (Corsello et al., 2017). The data derive from A549 lung cancer cells treated with 1,571 compound perturbations measured in six doses.

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p < 10^-3; Figure 4B).

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p < 10^-15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 × 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p < 10^-15) (Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      We applied uniform manifold approximation and projection (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Lastly, we observed moderate technical artifacts in the Drug Repurposing Hub profiles, indicated by high DMSO profile dispersion in the Cell Painting UMAP space (Supplementary Figure S14C). This represents an opportunity to improve model predictions with new batch effect correction tools. Additionally, it is important to note that the expected performance of each Cell Health model can only be as good as the performance observed in the original test set (see Figure 2), and that all predictions require further experimental validation.”

      Updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      Updated Supplementary Figure:

      Supplementary Figure S14: Applying Uniform Manifold Approximation and Projection (UMAP) to Drug Repurposing Hub consensus profiles of 1,571 compounds across six doses. The models were not trained using the Drug Repurposing Hub data. (a) The point color represents the output of the Cell Health model trained to predict the number of cells in G1 phase (G1 cell count). (b) The same UMAP dimensions, but colored by the output of the Cell Health model trained to predict reactive oxygen species (ROS). (c) In the UMAP space, we highlight DMSO as a negative control, and Bortezomib and MG-132 as two positive controls (proteasome inhibitors) in the Drug Repurposing Hub set. We observe moderate batch effects in the negative control DMSO profiles, based on their spread in this visualization. The color represents the predicted number of live cells. The positive controls were acquired with a very high dose and are expected to result in a very low number of predicted live cells.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations?

      This was not surprising to us, and we thank the reviewer for asking the question. We have now added a new Supplementary Figure to present a UMAP with ground truth cell health measurements in the CRISPR dataset (pasted below). By adding the figure, we show how Cell Health predictions are expected to show gradients in UMAP space. In fact, for any lower-dimensional embedding that is able to preserve local neighborhoods of the high-dimensional space, we should expect all linear transformations of the input data (in the high-dimensional space) to vary smoothly across the lower-dimensional embedding. However, it is still informative to observe where the specific Cell Health phenotype predictions manifest in relation to global morphology structure. We add the following sentence in the Drug Repurposing Hub paragraph juxtaposed to the other UMAP gradient observations:

      We applied uniform manifold approximation and projection (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Supplementary Figure S15: Applying Uniform Manifold Approximation and Projection (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockouts of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      We have moved the Spearman correlation figure (previously Supplementary Figure S10B) into a main figure, along with a complete restructuring of the results and discussion in the Drug Repurposing Hub section.

      We appreciate the careful observations and interpretations, and confirm the statistical test performed here is sound and the p value is correct (there is no need to account for multiple testing since there is only one test being applied, a test of correlation between two variables).

      We add this rationale to the “Comparing viability predictions to an orthogonal readout” methods section:

      We performed the non-parametric Spearman correlation test because 1) the doses were not aligned between the datasets we compared, and 2) it is possible that a strong nonlinear correlation exists between readouts from two fundamentally different ways to measure viability.
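To illustrate why a rank-based statistic suits this comparison, the sketch below implements Spearman's correlation as the Pearson correlation of average ranks, using only the Python standard library. It is a generic illustration with made-up numbers, not the analysis code or data from the manuscript, and it omits the p value calculation.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship between two hypothetical readouts.
readout_a = [1, 2, 3, 4, 5]
readout_b = [x ** 3 for x in readout_a]
rho = spearman(readout_a, readout_b)
r = pearson(readout_a, readout_b)
```

Because only the ordering of values matters, `rho` reaches its maximum for any strictly increasing relationship, while the linear coefficient `r` is pulled below that by the curvature.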

      It is certainly valid to critique the scatter plot, which shows that the mean squared error is quite high (i.e., if two datasets had viability measurements using the two approaches, it would be wrong to assume that lower measurements in one assay could automatically be compared to lower measurements in the other assay). This level of variability would be lost if we reported only the test statistic, which is why we included the scatter plot as a figure.

      It may also be important to mention that the authors of the PRISM paper also noted high variation in their estimates (from Corsello et al https://doi.org/10.1038/s43018-019-0018-6): "At the level of individual compound dose–responses, we note that the PRISM Repurposing dataset tends to be somewhat noisier, with a higher standard error estimated from vehicle control measurements (Extended Data Fig. 5c and Extended Data Fig. 6a–c)."

      Nevertheless, we agree that the current way we report this p value is distracting and potentially misleading, depending on how the p value is interpreted. Therefore, we have updated the reporting of all p values to say that they are less than a predefined cutoff. The figure now states that p

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      We describe the manual gating strategies in detail in the methods section “Cell Health assay: Image analysis”. However, we agree that a description of measurement variability and experimental approach requires more detail, and that the manuscript would benefit from a visual example of these gates. These improvements required us to rearrange Figure 1.

      With a goal of increasing reproducibility in the cell health assay, we’ve (1) moved example images of the Cell Health assay to Figure 1A; (2) moved the existing gating strategies drawing to Supplementary Figure 1; and (3) added real data examples of the manual gating strategy as a new Supplementary Figure 2. We show all updates below:

      Updated Figure 1:

      Figure 1. Data processing and modeling approach. (a) Example images and workflow from the Cell Health assays. We apply a series of manual gating strategies (see Methods) to isolate cell subpopulations and to generate cell health readouts for each perturbation. (top) In the “Cell Cycle” panel, in each nucleus we measure Hoechst, EdU, PH3, and gH2AX. (bottom) In the “Cell Viability” panel, we capture digital phase contrast images, measure Caspase 3/7, DRAQ7, and CellROX. (b) Example Cell Painting image across five channels, plus a merged representation across channels. The image is cropped from a larger image and shows ES2 cells. Below are the steps applied in an image-based profiling pipeline, after features have been extracted from each cell’s image. (c) Modeling approach where we fit 70 different regression models using CellProfiler features derived from Cell Painting images to predict Cell Health readouts.

      Updated Supplementary Figure S1:

      Supplementary Figure S1: Illustration of the gating strategy in the Cell Health assays. We extract 70 different readouts from the Cell Health imaging assay. The assay consists of two customized reagent panels, which use measurements from seven different targeted reagents and one channel based on digital phase contrast (DPC) imaging; shown are five toy examples to demonstrate that individual cells are isolated into subpopulations by various gating strategies to define the Cell Health readouts.

      Updated Supplementary Figure S2 (Example gating strategies):

      Supplementary Figure S2: Real data of manual gating in the Cell Health assays.

      For each cell line, we apply a series of manual gating strategies defined by various stain measurements in single cells to define cell subpopulations. (a) In the cell cycle panel, we first select cells that are useful for cell cycle analysis based on nucleus roundness and Hoechst intensity measurements. We also identify polyploid and “large not round” (polynuclear) cells. (b) We then subdivide the cells used for cell cycle to G1, G2, and S cells based on total Hoechst intensity (DNA content) and EdU incorporation signal intensity. (c) We use Hoechst and PH3 nucleus intensity to define mitotic cells. The points are colored by EdU intensity in the nucleus in both (b) and (c). (d) Example gating in the viability panel. We use DRAQ7 and CellEvent (Caspase 3/7) to distinguish alive and dead cells, and categorize early or late apoptosis. See Methods for more details about how the Cell Health measurements are made.

      We’ve also added the following to the methods section:

      Additionally, we set these gates for each cell subpopulation using a set of random wells from each cell line and experiment independently. We observed that the intensity measurements used to form the gates were consistent across wells and plates, and generally formed distinct cell subpopulation clusters. After using the random wells to set the gates, we used the Harmony microscope software to apply the gates to the remaining wells and plates.
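      To make the idea of a gate concrete, here is a minimal sketch of a two-threshold cell cycle gate. It is an assumption-laden illustration, not the Harmony workflow: the rectangular gate shape, the function name, the cutoff values, and the toy measurements are all hypothetical, and real gates are set manually per cell line and experiment.

```python
from collections import Counter


def gate_cell_cycle(dna_content, edu_intensity, dna_cutoff, edu_cutoff):
    """Assign one cell to G1, S, or G2 using two rectangular gates.

    EdU-positive cells are called S phase; EdU-negative cells are split into
    G1 (lower DNA content) and G2 (higher DNA content).
    """
    if edu_intensity >= edu_cutoff:
        return "S"
    return "G1" if dna_content < dna_cutoff else "G2"


# Hypothetical single-cell measurements: (total Hoechst intensity, EdU intensity).
cells = [(1.0, 0.1), (1.1, 0.2), (1.5, 2.5), (1.6, 3.0), (2.0, 0.1), (0.9, 0.3)]

# Hypothetical cutoffs; in practice they would be chosen from a set of random wells.
phase_counts = Counter(
    gate_cell_cycle(dna, edu, dna_cutoff=1.5, edu_cutoff=1.0) for dna, edu in cells
)
```

      Small shifts in either cutoff reassign cells near the boundary, which is the source of operator-to-operator variability in readouts such as G1 cell count.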

      In general, however, the need to clearly define this process further emphasizes a strength of our approach: there is great potential for inconsistencies when different humans draw gates. We aim to reduce these inconsistencies by predicting these readouts from Cell Painting images directly.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not in larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell line specific datasets independently and compare relative changes in predictive performance. Otherwise, is it possible that subtle or highly complex phenotypes simply cannot be detected by this method and more data will be unlikely to improve predictability in modest perturbations.

      We thank the reviewers for raising this possibility. To explore this, we performed a cell-line holdout analysis in which we retrained (and individually reoptimized) all 70 cell health models on every combination of two cell lines and predicted readouts from the held out third cell line.

      Despite there being fewer samples in the training set in the cell line holdout test compared to the original test set (66% vs. 85%) and the fact that each model had never seen the held out cell line before, many cell health phenotypes could still be predicted. We add the following results in a new Supplementary Figure:

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.
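      The holdout scheme described in this caption can be sketched as follows. This is a minimal illustration on synthetic data, not the repository code: the helper names are hypothetical, and a closed-form ridge regression is swapped in for the elastic net models so that the example needs only NumPy.

```python
import numpy as np


def cell_line_holdout_splits(cell_lines):
    """Yield (held_out, train_idx, test_idx) for each unique cell line.

    Every profile from the held-out line goes to the test set; profiles from
    all remaining lines form the training set.
    """
    cell_lines = np.asarray(cell_lines)
    for held_out in sorted(set(cell_lines.tolist())):
        test_mask = cell_lines == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def r_squared(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge with an unpenalized intercept (stand-in for elastic net)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


# Synthetic stand-in data: 3 cell lines x 40 profiles, 5 morphology features.
rng = np.random.default_rng(0)
lines = np.repeat(["A549", "ES2", "HCC44"], 40)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -0.5, 0.0, 0.25, 0.0]) + rng.normal(scale=0.1, size=120)

scores = {}
for held_out, train_idx, test_idx in cell_line_holdout_splits(lines):
    coef, intercept = fit_ridge(X[train_idx], y[train_idx])
    scores[held_out] = r_squared(y[test_idx], X[test_idx] @ coef + intercept)
```

      In this synthetic example the three cell lines share one generating process, so the held-out scores stay high; with real profiles, cell-line intrinsic differences are what lower them.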

      We’ve also performed a sample size titration analysis, which suggests that more data would indeed improve model performance. More data would also enable a deep learning approach, which is also likely to improve performance.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We also update the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed a decreasing amount of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).

      And an updated methods section describing this analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing amount of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.
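      A minimal sketch of this titration procedure is shown below, under stated assumptions: the data are synthetic, the helper name and drop sizes are hypothetical, and ordinary least squares stands in for the elastic net models. The structure (drop a fixed number of training samples, refit, score on a fixed test set, repeat over ten seeds, take the median) follows the description above.

```python
import numpy as np


def titration_subsets(n_train, n_drop_values, seeds):
    """Yield (n_drop, seed, kept_idx) after randomly dropping n_drop training rows."""
    for n_drop in n_drop_values:
        for seed in seeds:
            rng = np.random.default_rng(seed)
            kept = rng.choice(n_train, size=n_train - n_drop, replace=False)
            yield n_drop, seed, np.sort(kept)


# Synthetic stand-in data with a fixed holdout test set.
rng = np.random.default_rng(42)
X_train, X_test = rng.normal(size=(100, 8)), rng.normal(size=(30, 8))
true_coef = rng.normal(size=8)
y_train = X_train @ true_coef + rng.normal(size=100)
y_test = X_test @ true_coef + rng.normal(size=30)

n_drop_values = [0, 25, 50, 75]
seeds = range(10)
per_drop = {n: [] for n in n_drop_values}
for n_drop, seed, kept in titration_subsets(len(y_train), n_drop_values, seeds):
    # Refit on the reduced training set (least squares as a simple stand-in model).
    coef, *_ = np.linalg.lstsq(X_train[kept], y_train[kept], rcond=None)
    resid = y_test - X_test @ coef
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_test.mean()) ** 2)
    per_drop[n_drop].append(r2)

# Median test-set performance across seeds for each number of dropped samples.
median_r2 = {n: float(np.median(v)) for n, v in per_drop.items()}
```

      Taking the median across seeds keeps a single unlucky subset from dominating the estimate for a given training set size.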

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      We agree that an additional computational analysis including a systematic feature removal would be interesting and valuable. We’ve included this analysis as part of a new results subsection in which we assess where classification improvements are likely to come from by testing robustness of the ML models.

      Specifically, we’ve systematically removed individual features that belong to specific feature groups, channels, and compartments to determine how much their absence negatively affects model performance. The added supplementary figure is pasted below.

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      We conclude that the majority of cell health models are robust to missing feature groups. Some models actually improve with a reduction in the feature space. Combined with the feature heatmap presented in Figure 3, these results tell us that a lot of the morphology signal is redundant across Cell Painting features.

      We add the following text to the results:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And the following to the methods:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.
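      The feature-dropping step can be sketched as a filter on feature names. This is an illustration only: it assumes CellProfiler-style names of the form Compartment_FeatureGroup_..., the example feature names are hypothetical, and the helper functions are not taken from the repository.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = ("Texture", "RadialDistribution", "Neighbors", "Intensity",
                  "Granularity", "Correlation", "AreaShape")


def feature_classes(name):
    """Return the set of classes (compartment, group, channels) a feature belongs to."""
    tokens = name.split("_")
    classes = {tokens[0]} if tokens[0] in COMPARTMENTS else set()
    if len(tokens) > 1 and tokens[1] in FEATURE_GROUPS:
        classes.add(tokens[1])
    # A feature may involve more than one channel (e.g. correlation features).
    classes.update(t for t in tokens[2:] if t in CHANNELS)
    return classes


def drop_feature_class(feature_names, dropped_class):
    """Keep only the features that do not belong to the dropped class."""
    return [f for f in feature_names if dropped_class not in feature_classes(f)]


# Hypothetical feature names, for illustration only.
features = [
    "Cells_AreaShape_Area",
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cytoplasm_Texture_Contrast_ER_5_0",
    "Cells_Correlation_Correlation_DNA_Mito",
    "Nuclei_Granularity_2_RNA",
]

# One reduced feature list per omitted class; each would be used to retrain all models.
reduced = {
    c: drop_feature_class(features, c)
    for c in COMPARTMENTS + CHANNELS + FEATURE_GROUPS
}
```

      Matching on whole underscore-delimited tokens avoids accidentally dropping features whose names merely contain a channel abbreviation as a substring.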

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      We thank the reviewers for their encouraging remarks. We believe that with the added robustness analyses and with increased clarity about the motivation behind the paper, we’ve successfully demonstrated a proof of concept for the approach to predict cell health phenotypes from Cell Painting images. We believe that we’ve provided sufficient evidence for a reader to see the benefits of the prediction approach. As well, given the additional details describing the Cell Health assay reproducibility, we believe that the paper also successfully introduces a new assay paradigm.

      Furthermore, while many of the cell health measurements are definitely weak (and unreliable), it is not fair to generalize all predictions as weak (especially given the sample size limitations).

      It is also worth noting that, under the current circumstances, separating the one dataset we have into a train/test set and validating the model in an external set is the best we could do; we do not have additional budget to run further wet lab experiments (which would also face a COVID backlog in our chemical screening group). We agree that additional datasets would benefit the field; our current data is now public, all of our future data will be public (to the extent possible), and we hope that others building on our work will make their data public too to address these questions.

      Lastly, in response to the “currently undefined reasons” comment, as well as other comments throughout, we’ve now included a new subsection in the Results/Discussion section to more directly address some of the reasons why many models may have underperformed. Specifically, and as mentioned previously in this response, we perform three distinct robustness analyses: 1) cell line holdout; 2) feature holdout; 3) sample size titration.

      Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited)

      Scale bars need to be included in all images, some are missing in S1

      We thank the reviewers for this suggestion. We have since substantially updated figure 1 and supplementary figure S1. We have also added a new supplementary figure S2 as an example of the manual gating strategies, and we have updated all scale bars appropriately. We’ve attached the specific figure updates in an earlier response.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      We have added these necessary details in their respective methods sections:

      We acquired all cell images using an Opera Phenix High Content Imaging Instrument (PerkinElmer) with a 20X water objective (a numerical aperture (NA) of 1.0), in confocal mode (a pinhole size of 50µm). The effective pixel size was 0.65µm/pixel. We acquired images in four channels using default excitation / emission combinations: for the blue channel (Hoechst) 405/435-480; for the green channel (Alexa 488 and CellEvent) 488/500-550; for the orange channel (Alexa 568 and CellRox Orange) 561/570-630 and for the far-red channel (Alexa 647 and DRAQ7) 640/650-760. We applied the Cell Health reagents for cell viability and for cell cycle in two separate plates.

      The legends for the different parts of Fig S10 are transposed which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      Wow! We thank the reviewers for pointing out this oversight, and for their careful attention to detail. This figure is now completely different after the restructuring of the Drug Repurposing Hub results/discussion section. The legends for all figures are now correct.

      EdU is defined after it is abbreviated in methods

      We thank the reviewers for noting this. We’ve now fixed where these acronyms are defined in the methods section and removed their definitions in later sections where redundant.

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      We thank the reviewers for pointing out these very important omissions. We have since rectified them in the methods section:

      We built a CellProfiler image analysis and illumination correction pipeline (version 2.2.0) to extract these image-based features (McQuin et al., 2018). We include the CellProfiler pipelines in our github repository.

      We developed and ran two distinct image analysis pipelines in Harmony software (version 4.1; PerkinElmer) for each of the Cell Health plates.

      We also add the CellProfiler pipelines to our GitHub repository. A pull request introducing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/149

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      These definitions are subjective by nature. Gating decisions will be different depending on the scientist performing the image analysis. We feel that the sentence: “We excluded outlier nuclei with unusually high intensity of Hoechst max” conveys this subjectivity well. One of the strengths of the proposed approach to predict cell health phenotypes directly from the Cell Painting images is the removal of gating subjectivity.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      We used the nucleus roundness measurements as calculated in PE Harmony to define the “cells selected for cell cycle” subpopulation in the first panel of the Cell Health assay. That is, this measurement was integral to the Cell Health assay itself. We believe that the addition of example gates (in supplementary figure 2) clears up this confusion.

      Reviewers:

      Jason Swedlow

      Melpi Platani

      Erin Diel

      Emil Rozbicki

      Reviewer #2 (Significance (Required)):

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interests to scientists involved with drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

      We would like to again express thanks to these reviewers for their careful read, very helpful comments, and encouraging remarks.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      We thank the reviewer for their helpful comments and encouragement!

      **Major comments:**

      Some of the health readouts are based on general morphology (cell and nucleus) which can be obtained based on cell painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well especially for HCC4 (R-square reaching zero). This is surprising as these data can be extracted from cell painting assays. Can the author elaborate on why this is the case?

      We agree that the performance of the live cell roundness and nucleus roundness models was unexpectedly low. One would expect that these shape features, as measured by PerkinElmer Harmony software, would be easily predicted from CellProfiler readouts from the Cell Painting assay.

      The roundness property was used in Harmony versions,

      2*sqrt(π)*sqrt(Area-BorderArea/2.0)/BorderArea-0.1)

      where Area is object area in pixels and BorderArea is border area in pixels (we thank Joe Trask, Olavi Ollikainen, Hartwig Preckel, and Kaupo Palo at PerkinElmer for this information.)

      No single feature in the CellProfiler readouts measures roundness directly; instead, CellProfiler will measure a combination of shape features that together could synthesize the idea of “roundness”. However, given that the elastic net approach is well-suited for this type of synthesis, it remains unclear why roundness is not predicted well.

      One possible explanation is that shape features differ the most across cell lines and are measured precisely in both assays. Precise measurements, coupled with our training strategy of using all three cell lines together, might lead to poor performance in predicting certain cell-line intrinsic features.

      We tested this shape result directly (and, more generally, the other cell health features) in a “cell line holdout” analysis, which we describe in more detail in response to the next comment. In this analysis, we trained on every combination of two cell lines and applied the trained models to the third, testing how well models generalized to cell lines not encountered in the training process. We observed that cell line intrinsic features, like shape, are predicted poorly if a model was not trained using the cell line.

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models were trained for every cell line? Leave one out approach can be used to accommodate the scarcity of samples.

      We thank the reviewer for this important question. We also appreciate how different certain models behaved with certain cell lines. We would like to stress that the results presented here represent a small pilot study that is not meant to optimize model performance. Instead, the motivation of the manuscript is to demonstrate proof-of-concept of the approach to predict specific cell health phenotypes directly from Cell Painting images. We believe that the current results demonstrate positive proof, which warrants an expansion of data collection and an improvement of the classification methodology.

      Nevertheless, with our current data, we can answer an important question about the feasibility of signal transfer between cell lines. Therefore, we performed an additional “cell line holdout” analysis. We believe that the cell line holdout analysis tells us that signals can be transferred across contexts, but that any leading observations must be followed up with experiments performed directly in the cell line of interest. This signal transfer is diluted compared to the original test set performance, but it is also worth noting that the models presented in Supplementary Figure 11 (pasted below) were trained on only 66% of the data in the holdout cell line analysis and 85% of the data in the original analysis.

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.

      And we add the following text to the results section:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines.

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      It is worth noting that we do also show generalizability in the Drug Repurposing Hub for two other models: ROS and G1 cell count. We show that proteasome inhibitors significantly induce high ROS and PLK inhibitors restrict entry to G1. We have also added enrichment tests demonstrating high statistical significance for these compound mechanisms.

      While we recognize that these two examples provide anecdotal evidence, they suggest the ability and power of the approach to assign phenotypes to Cell Painting images.

      Nevertheless, we thank the reviewer for bringing up this critical point and certainly appreciate the benefit of validating a gH2AX model. Therefore, we’ve added a similar analysis in which we demonstrate generalizability of the top performing gH2AX model: Number of gH2AX spots in G1 cells. We discuss these changes in an updated section:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p -15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).
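      For readers unfamiliar with odds ratios in this setting, the following is a minimal sketch of an enrichment test on a 2x2 table (compound class versus high or low prediction). It assumes a one-sided Fisher's exact test, which may differ from the test used in the manuscript, and the counts are hypothetical rather than taken from the Drug Repurposing Hub results.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]] (b and c must be nonzero)."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(X >= a) under the hypergeometric null.

    Rows: compound class of interest vs. all other compounds.
    Columns: high prediction vs. not high.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(total, row1)


# Hypothetical counts: 8 of 10 profiles in the class have a high prediction,
# compared with 10 of 90 profiles from all other compounds.
enrichment_or = odds_ratio(8, 2, 10, 80)
enrichment_p = fisher_exact_greater(8, 2, 10, 80)
```

      An odds ratio above 1 indicates enrichment of high predictions in the class; an odds ratio below 1, as reported for PLK inhibitors and G1 cell count, indicates depletion and would be tested in the opposite tail.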

      We also modify the abstract summarizing this result:

      For Cell Painting images from a set of 1,500+ compound perturbations across multiple doses, we validated predictions by orthogonal assay readouts, and by confirming mitotic arrest, ROS, and DNA damage phenotypes via PLK, proteasome, and aurora kinase/tubulin inhibition, respectively.

      And we add this analysis to an updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      It is also worth noting that collecting more data for this manuscript is not currently feasible given the number of projects backlogged from COVID. Given that the motivation of the project is to demonstrate feasibility of the approach, we feel that our current training/testing machine learning framework, together with the application to Drug Repurposing Hub data, is sufficient.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      We thank the reviewer for bringing up this concern, and we agree that the advantages and limitations of the approach deserve more discussion. Indeed, we’ve added a full new results/discussion subsection directly testing many of the assumptions for why some models performed well and others didn’t. The new section introduces many model limitations:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines. We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that many models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between phenotypes and individual Cell Painting features. Lastly, we performed a sample size titration analysis in which we randomly removed a decreasing amount of samples from training. For the high and mid performing models we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).
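
      The cell line holdout scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the manuscript: the function names, the single-feature toy data, and the one-variable least-squares fit (a stand-in for the elastic net regression models) are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Hypothetical stand-in for an elastic net model, kept simple on purpose.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(ys, preds):
    """Coefficient of determination; negative when predictions are worse than the mean."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def cell_line_holdout(profiles):
    """Train on all cell lines but one, then score on the held-out cell line.

    profiles: list of (cell_line, feature_value, readout) tuples.
    Returns {held_out_cell_line: R^2 on that cell line}.
    """
    scores = {}
    for held_out in sorted({p[0] for p in profiles}):
        train = [p for p in profiles if p[0] != held_out]
        test = [p for p in profiles if p[0] == held_out]
        intercept, slope = fit_line([p[1] for p in train], [p[2] for p in train])
        preds = [intercept + slope * p[1] for p in test]
        scores[held_out] = r_squared([p[2] for p in test], preds)
    return scores


# Hypothetical toy data: A549 and ES2 share one feature/readout relationship,
# HCC44 follows a different one.
toy_profiles = [
    ("A549", 0.0, 1.0), ("A549", 1.0, 3.0), ("A549", 2.0, 5.0),
    ("ES2", 0.5, 2.0), ("ES2", 1.5, 4.0), ("ES2", 2.5, 6.0),
    ("HCC44", 0.0, 5.0), ("HCC44", 1.0, 3.0), ("HCC44", 2.0, 1.0),
]
scores = cell_line_holdout(toy_profiles)
```

      In this toy data the held-out HCC44 score is negative because its feature/readout relationship differs from that of the two training cell lines, which mirrors in miniature why cell-line-specific phenotypes may not transfer.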

      **Minor comments**

      Page 8: The authors visualize the predicted G1 cell count and ROS when overlaid on a UMAP based on cell painting data from Drug Repurposing Hub. How would these visualisations look if applied to the original CRISPR training data?

      We address this comment by adding a supplementary figure showing ground truth G1 cell count and ROS readouts.

      We applied uniform manifold approximation and projection (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Where Supplementary Figure 15 is pasted below:

      Supplementary Figure S15: Applying Uniform Manifold Approximation and Projection (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockdowns of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      We thank the reviewer for noting this. We agree that the clarity of this section could be improved. We have now completely reworked the final section of applying the cell health models to the Drug Repurposing Hub data.

      In particular, we’ve moved the PRISM data section to the start, as the simplest model to validate, and moved these results to Figure 4. We then describe validation for three other models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We end with the UMAP discussion, which was originally the second part of the last paragraph on page 8.

      The PRISM section now reads:

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p < 10^-3; Figure 4B).
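
      The rank correlation used in this comparison can be sketched in a few lines. This is a minimal illustration, assuming hypothetical function names and made-up toy values rather than the PRISM or Drug Repurposing Hub data; it computes Spearman's rho as the Pearson correlation of average ranks, which handles ties.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy values: model predictions vs. an orthogonal assay readout.
predicted_live_cells = [0.1, 0.4, 0.35, 0.8, 0.6]
assay_viability = [1.0, 3.0, 2.0, 4.0, 5.0]
rho = spearman_rho(predicted_live_cells, assay_viability)
```

      Because only ranks enter the calculation, the two readouts do not need to share units or scale, which suits a comparison between two fundamentally different ways of counting live cells.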

      Reviewer #3 (Significance (Required)):

      This approach can be of wide interest as it is easy to implement, cost-effective and leads to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

      We thank the reviewer for their enthusiasm and for this concluding idea. Indeed, we also feel that including a separate marker to indicate infected cells could lead to improved accuracy. We add this thought to the concluding section as a future direction. The full updated conclusion reads as follows:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      Major comments:

      Some of the health readouts are based on general morphology (cell and nucleus) which can be obtained based on cell painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well especially for HCC44 (R-square reaching zero). This is surprising as these data can be extracted from cell painting assays. Can the authors elaborate on why this is the case?

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models are trained for every cell line? A leave-one-out approach can be used to accommodate the scarcity of samples.

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      Minor comments

      Page 8: The authors visualize the predicted G1 cell count and ROS when overlaid on a UMAP based on cell painting data from Drug Repurposing Hub. How would these visualisations look if applied to the original CRISPR training data?

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      Significance

      This approach can be of wide interest as it is easy to implement, cost-effective and leads to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      Major Issues:

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g., DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set. However, these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates. Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations? The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not from larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell line specific datasets independently and compare relative changes in predictive performance. Otherwise, it is possible that subtle or highly complex phenotypes simply cannot be detected by this method and that more data will be unlikely to improve predictability in modest perturbations.

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      Minor issues: Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited). Scale bars need to be included in all images; some are missing in S1.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      The legends for the different parts of Fig S10 are transposed, which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      EdU is defined after it is abbreviated in methods

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      Reviewers: Jason Swedlow, Melpi Platani, Erin Diel, Emil Rozbicki

      Significance

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interest to scientists involved with drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/stains (as two separate panels) to microscopically screen cell viability. For automatic classification a training/test set of 119 CRISPR (approximately 2 sgRNAs per gene) perturbations on 3 cancer cell lines was generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei a set of morphological cell measurements was extracted from each perturbation (total 952 features). The nature of these features, spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as web-based application/visualization and the supplementary method is well described.

      Major concerns:

      (1) The only fundamental argument of this manuscript for not applying state-of-the-art deep learning (DL) machine learning (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, or manual gating, is the 'interpretability' of the predictions. However, in performance, precision, and scalability (with modern GPUs), DL should clearly outperform 'manual' regression models. All recent machine vision benchmarks in microscopy confirm this and also clearly show 'real world' translational applications, e.g.

      https://www.nature.com/articles/s43018-020-0085-8,

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf,

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (2) One aforementioned point of the methodology is described cryptically or not at all: why should it be less expensive than other (which?) approaches (see introduction)?

      (3) Generalizability and/or training data size is essential for any model-based classification, but is not evaluated or validated in the current manuscript. The independent validation on data from the A549 cell line only might not be sufficient/convincing.

      Minor concerns:

      (1) The highest test performance suggests that precision is mainly driven by cell cycle/count and live status, which could probably be derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right); this would argue for rigid feature selection across channels and features.

      (2) Any H2AX and 'polynuclear' would probably fail in any cell line with this size of training data.

      (3) What do the 'weights' of the model in Fig. 1c refer to?

      Significance

      This manuscript is not advanced in the context of latest improvements/developments of cell-based microscopic classification. Rationale in the introduction and the conclusion are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted to this end?

      The author/co-authors have been instrumental/pioneered with their past work on cell-based image processing (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are grateful to the referees for investing valuable time in reviewing our work, and for recognising its importance and utility. We thank them for their insightful and constructive comments that have helped us significantly improve the manuscript.

      Below, we provide a point-by-point response to all specific questions raised.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In order to improve SARS-CoV-2 diagnostics, Reijns et al. developed a multiplexed RT-qPCR protocol that allows simultaneous detection of two viral genes, one housekeeping gene as well as an external gene as an extraction control. Compared to running parallel assays to detect genes individually, the turnaround time is much shorter and reagents are saved. Furthermore, the presented data suggest that the assay is more sensitive than commercial kits. The authors also propose the detection of the human housekeeping gene as a measure of sample quality control. In principle, this work has potential but the manuscript itself needs a better structure.

      **Major concerns:**

      The authors have used the Takara RT-qPCR kit for their study. Did the authors try other commercial kits?

      We have not assessed other commercial kits as the Takara reagent performed well, and has been easy to source. We expect that other one-step kits could be used if the need arose.

      When we initiated this work in March 2020, we selected the Takara One Step PrimeScript™ III RT-PCR Kit based on 1) the practical advantages of a one-step reaction mix, 2) published evidence of its successful use in SARS-CoV-2 detection (see below), 3) availability in sufficient quantities for testing at scale, and 4) affordability.

      (Published evidence: One of the first descriptions of an assay to detect SARS-CoV-2 [1] employed the Takara One Step PrimeScript™ III RT-PCR Kit, and this kit was later shown by others to perform as well as or better than Qiagen Quantifast Multiplex RT-PCR +R mastermix, ThermoFisher TaqPath 1-Step RT-qPCR MasterMix and ThermoFisher Taqman Fast Virus 1-step mastermix, when used to detect SARS-CoV-2 RNA from nose and throat swabs with N1, N2 or N gene assays [2].)

      Can the authors elaborate on the supply chain of the Takara kit?

      We have not had problems securing the Takara kit in sufficient quantities and in a timely fashion, and did so through the company’s Scotland and NE England representative. The managing director of Takara Bio Europe provided the following statement, as a clarification of the supply chain:

      “Takara Bio Inc. has worked on significantly increasing the production of one-step RT-qPCR reagents to cover worldwide needs for SARS-CoV-2 detection. The production of this kit is based in China under ISO13485 certification and the European stock is based in, and distributed from, Paris. Throughout this pandemic, Takara Bio Europe has supplied millions of reactions around Europe to COVID-19 testing labs, without encountering any shortages or significant shipping delays.”

      Could it cover population testing in case of shortages of other commercial kits?

      Yes, it could. The Takara kit is available in 4,000 and 20,000 reaction pack sizes and therefore could well be a useful option in case of shortages of other commercial kits. Indeed, one motivation for developing the multiplex assay was to ensure diagnostic testing resilience in the face of reagent shortages.

      For better comparison, is it possible to give information on which primers the commercial kits are based on?

      We contacted both ThermoFisher and Abbott to ask for more information on the primers and probes included in the TaqPath COVID‐19 Combo Kit (detects N, ORF1ab and S gene) and Abbott RealTime SARS‐CoV‐2 assay (detects RdRp and N gene). Unfortunately, we were informed that this information is proprietary. For clarity, we have included the following in the Materials and Methods section:

      “Primers and probes included in the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific, Cat. No. A47814) detect SARS-CoV-2 ORF1ab, N and S gene; those in the Abbott RealTime SARS-CoV-2 assay (Cat. No. 09N77-090) detect RdRp and N gene. Further details are not available, as this information is proprietary.”

      Also, explain better the primers used in this study. For example, the N1 and N2 primers are directed against different regions of the SARS-CoV-2 N gene.

      We thank the reviewer for encouraging us to better explain the primers we use for our own assays, and now provide more detailed information in a new Fig 1.

      The result section needs a better structure as the first two pages do not refer to any of the main figures. For example, in which figure or table can the reader find the data that are discussed in lines 83 to 87?

      We have now substantially re-structured the entire Results section, and include the data that was discussed in lines 83 to 87 of the original manuscript, in Fig 1D of the revised manuscript.

      Table S1, instead of current Table 1, could be moved to main figures as it contains the important finding that the multiplexed assay may be more sensitive than the commercial one.

      As suggested, we have moved Table S1 to the main display items (now Table 1), and moved the original Table 1 to the supplementary items (now Table S3).

      The authors identified some samples that scored negative in commercial assays but positive in their new assay. This is important, however, the possibility of detecting false positives should be strengthened in a "Discussion" section.

      We thank the reviewer for highlighting this, and now discuss the issue of detecting false positives in more detail in the Discussion section of the revised manuscript:

      “RT-qPCR tests are molecular tests with high intrinsic accuracy, however false positive and false negative results can occur. The use of multiplex assays that detect multiple SARS-CoV-2 targets, such as those reported here, reduces the chance of both. Off-target reactivity is one possible cause of false positives, and although some have reported high false positive rates for the E gene assay [20, 22], this does not match our experience. In two patients, our N1E-RP and N2E-RP assays detected virus, albeit weakly, whereas commercial assays did not. As multiple SARS-CoV-2 targets were positive, these are likely true positive results and not due to off-target reactivity. False positives can also occur due to lab issues such as sample mislabelling, data entry errors, reagent contamination with target nucleic acids or contamination of primary specimens. However high standards of quality control at all stages of testing, and effective mitigation strategies should quickly identify problems. Additionally, sample re-test with an independent assay and/or patient re-sampling should also be effective measures to counter false positives, particularly in low pre-test probability situations such as mass screening.”

      Figures 1 to 3 have different panels which seem to be redundant. For example, Fig 1 A and B, Fig 2 B and C, Fig 3 C and D.

      These panels did contain the same data, plotted to convey slightly different information. However, we agree that this introduces a level of redundancy. For enhanced clarity, in the revised manuscript, we have removed most of these panels altogether, or moved them to supplementary figures.

      Figure 1: Give a rationale for comparing before and after extraction. This heavily depends on the extraction method and not on the detection itself. In addition, IVT RNA does not reflect the complexity of a clinical specimen. This is rather confusing and deviates from the important findings.

      As part of the validation procedure it was important for us to show that the entire workflow, including the extraction procedure, was robust for use in clinical diagnostics. In this context, comparing pre- and post-extraction RT-qPCR results for both IVT RNA and viral samples provided us with an opportunity to test extraction efficiency. However, we agree that for the purpose of this manuscript, the inclusion of these data in (the former) Fig 1 detracted from the main message. In the revised manuscript we have therefore moved the data comparing Cq values before and after extraction to a new Fig S1, and briefly state the rationale behind this in the main text and figure legend.

      It was not our intention to imply that IVT RNA in any way reflects the complexity of a clinical specimen. We include these data as part of the step-by-step validation of our assays. Firstly, we show high sensitivity using IVT RNA; secondly, we show that a similar sensitivity is achieved on viral positive controls; and thirdly, we show that our assays perform equally well to widely used commercial assays on clinical samples.

      Figure 3: Did any of the negative samples/patients with an undetectable housekeeping gene re-test positive?

      None of our patient samples had undetectable levels of RPP30. We note that all NTS samples were collected by healthcare professionals and in this context such findings will likely be rare. However this may not be the case when dealing with samples obtained by self-swabbing as the reviewer highlights in a comment below.

      Did adding this housekeeping gene as a control actually improve the detection of any patient samples? If the authors want to convince the readership of this quality control, experimental evidence should be provided.

      Fig 3C and D seem to contain this information somewhat, as here, the values were normalized and the CT values for the E and N gene decreased. Nevertheless there is no real explanation of this figure provided in the Result section at all. While this figure has potential, the authors have to keep in mind that the number of cells in a swab can be affected by many biological factors, including age, sample timing, inflammation of the respiratory tract, etc. In addition, viral genomes can exist intra- as well as extracellular, in the form of free virus. So even in the absence of human cells/detectable housekeeping genomes, viral RNA can be or should be present in a sample in case of infection. This explains (probably) why a correlation between detectable housekeeping gene and viral RNA is absent (Fig 3A and B?). This entire Fig 3 just needs a better explanation. The provided text does not describe any results and should go into a "Discussion" section.

      We thank the reviewer for highlighting the need to explain Fig 3 more clearly and that a key question is whether there is a correlation between the levels of the housekeeping control and viral RNA. Prompted by this question, we reanalysed our data and now show that there is a strong and statistically significant positive correlation between Cq values for RPP30 and SARS-CoV-2 targets (see below, and new Fig 4C). This shows that there is a lower probability of detecting SARS-CoV-2 RNA in samples that contain fewer human cells. This likely implies that for samples with high RPP30 Cq values, a proportion of virus positive samples will be missed, contributing to the high false negative rates that have been reported [3-5].

      Providing additional experimentation would require systematic re-contacting and re-testing of cases, and this is beyond our current research framework. While outside the scope of the current study, we hope that our manuscript will encourage others to perform the necessary large-scale experiments. Nonetheless, with this correlation alone, we believe that RPP30 provides useful information of benefit to clinical diagnostics (also see our response to Reviewer 2), and in the revised manuscript we outline how it might be best utilised (Discussion, Table S6).

      To provide a better explanation of Figure 3 (now Fig 4), we have included the following in the Results section:

      “A statistically significant linear correlation between Cq values for each of the viral probes (E, N1, and N2) and the Cq values for the RPP30 sample quality probe (p 40; Fig 4D and Fig S4A). Theoretically, using this approach, even a strong positive sample (SARS-CoV-2 Cq value of 28.2) of good quality (RPP30 Cq value of 20.3) may have given a false negative test result (SARS-CoV-2 Cq value of 40) if it had contained the same low amount of human material as the reference sample (RPP30 Cq value of 32.1; viral Cq: 32.1-20.3+28.2=40). Conversely, normalising samples to an optimal quality sample (RPP30 Cq 20.1/20.3) gives an indication of what viral Cq values may have been if all samples had contained a similar (more optimal) amount of material (Fig 4E, Fig S4B). This highlights the possibility that a proportion of apparent SARS-CoV-2 negative samples are in fact false negatives as a result of insufficient material in the swab fluid.”
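The arithmetic in the quoted passage can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function name `normalise_viral_cq` and its parameter names are hypothetical, and it assumes the simple additive Cq shift implied by the worked example (the viral Cq is adjusted by the difference between the reference and sample RPP30 Cq values).

```python
def normalise_viral_cq(viral_cq, sample_rpp30_cq, reference_rpp30_cq):
    """Shift a viral Cq by the RPP30 Cq difference between a reference and the sample.

    Hypothetical helper: assumes viral and RPP30 signals scale together with
    the amount of material in the swab, as in the worked example in the text.
    """
    return viral_cq + (reference_rpp30_cq - sample_rpp30_cq)


# Worked example from the quoted passage: a good-quality positive sample
# (viral Cq 28.2, RPP30 Cq 20.3) rescaled to a poor-quality reference (RPP30 Cq 32.1).
adjusted = normalise_viral_cq(28.2, sample_rpp30_cq=20.3, reference_rpp30_cq=32.1)
print(round(adjusted, 1))  # 40.0
```

Swapping the reference for an optimal-quality sample (a low RPP30 Cq) shifts viral Cq values downwards in the same way, which corresponds to the second normalisation described in the passage.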

      Self-swabbing is surely a potential source of variability and false-negatives, but many publications have shown the suitability of saliva testing. This should also be discussed and would probably negate the need for such a quality control.

      We agree with the reviewer that self-swabbing will be more prone to variability. Therefore, the RPP30 control will have particular value here, lowering the associated risk of false negatives. While NTS sampling remains a major modality for testing for the foreseeable future, saliva is certainly a potential alternative strategy, one that may benefit from lower sample variability.

      We now include the reviewer’s point on this in the Discussion:

      “Testing saliva, as an alternative to NTS sampling, could also be beneficial as a modality that may have less sample-to-sample variability [7]”

      Which assay works better, the N1E-RP or the N2E-RP assay? A final conclusion is missing here.

      Although we could not detect substantial differences between these two assays during our validation process, others have reported a marginally higher sensitivity of the N1 over the N2 assay [6]. We would therefore recommend the use of the N1E-RP assay for first line testing, with the N2E-RP assay available as a second line test of equivalent sensitivity in case of inconclusive initial test results. We comment on this in the revised manuscript:

      “Although we did not detect substantial differences between our two assays, others have reported higher sensitivity of the N1 over the N2 assay [19]. We therefore recommend the use of the N1E-RP assay for primary testing, and the N2E-RP assay could be employed if initial results are inconclusive.”

      Reviewer #1 (Significance (Required)): Naturally, in this pandemic, this topic is important as sensitive and affordable methods to detect SARS-CoV-2 infections are in need. This Reviewer agrees that multiplexing could be an elegant approach to fill this need.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this manuscript, Reijns and colleagues describe an approach to detect the causative agent of COVID-19, the beta coronavirus SARS-CoV-2, using an inexpensive in-house multiplex RT-qPCR. Concomitantly, viral E, N and RdRP (probe P2) as well as human RPP30 and a herpesvirus nucleic acid are also detected in order to monitor both the sample quality and the sample preparation. Reijns et al. performed testing on a large number of samples and used the data to describe the strengths and limitations of the assay. The data are sound and give a very good impression of the 4-plex PCR capabilities. I read the manuscript fluently and consider it linguistically very good. However, I still have a few comments and remarks that would strengthen the manuscript:

      **Major issues:** In the first section of the results section, many primer / probe conditions are given that make the reading flow difficult. Instead of using (data not shown) it would be helpful to use a table or a graphic to illustrate the various approaches.

      We thank the reviewer for suggesting the use of graphics to explain our different approaches. To aid the reader, we now include a diagram in the new Fig 1 that shows the positions of primers and probes used in our work (A), and illustrate the various 4-plex assays (B, C).

      In general, I suggest to replace Ct by Cq, since the IVT standards are a quantification method.

      As suggested by the reviewer we now use Cq instead of Ct throughout our manuscript, following MIQE guidelines [7].

      There has already been a change away from the initial E and RdRP gene based assay because of the published sensitivity issues and the use of degenerate bases, as well as the detection of unspecific nucleic acids (for the E gene). In particular, it has been shown that the Sarbeco-E assay yields false positive results (Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396), Konrad et al. 2020 (https://doi.org/10.2807/1560-7917.ES.2020.25.9.2000173)), so that many laboratories do not consider E-gene-based results for borderline samples anymore. In this manuscript, the authors should comment on why they still use the results from the E gene / RdRP and describe their experience.

      We thank the reviewer for highlighting issues with the RdRp and E gene primer/probe pairs. In the process of our work we had also become aware that the RdRp-P2 assay suffers from low sensitivity, as has now been widely reported. However, although the E gene assay also detects SARS-CoV, we were not aware of potential problems with high rates of false positives as described by Toptan et al (2020) and Konrad et al (2020). It did come to our attention that early on in the pandemic some oligonucleotide producers reported problems with contamination of primers with SARS-CoV-2 template RNA synthesised in the same facilities, and we were careful to avoid these providers. In our work, we did not experience any problems with apparent false positive detection of the E gene: it was never detected in any of our negative controls, and out of 84 patients that tested negative with the commercial TaqPath assay we did not find any that were positive for E gene only when using our N1E-RP and N2E-RP assays. In this context, it is also important to emphasise that a positive diagnosis is given only when both viral targets are detected (Table S6), which is one of the strengths of our multiplex assays.
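The two-target rule described above can be illustrated with a minimal Python sketch. It is not a reproduction of Table S6, which is not shown here: the function names, the returned labels and the Cq limit of 40 are assumptions made for illustration, and undetermined wells are represented as `None`.

```python
CQ_LIMIT = 40.0  # assumed detection limit; "undetermined" wells are passed as None


def is_detected(cq, cq_limit=CQ_LIMIT):
    """A target counts as detected when a Cq was reported at or below the limit."""
    return cq is not None and cq <= cq_limit


def call_sample(e_cq, n_cq, cq_limit=CQ_LIMIT):
    """Hypothetical call: positive only when both viral targets are detected."""
    hits = sum(is_detected(cq, cq_limit) for cq in (e_cq, n_cq))
    if hits == 2:
        return "positive"
    if hits == 1:
        return "single target only: interpret with caution"
    return "not detected"


print(call_sample(27.4, 28.1))   # both targets detected
print(call_sample(37.9, None))   # E gene only
print(call_sample(None, None))   # neither target detected
```

Requiring both targets means that off-target reactivity of a single primer/probe set cannot by itself produce a positive call.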

      As suggested by Reviewer 1, we now discuss the issue of false positives in more detail in the revised manuscript. We also comment on high false positive rates observed by others for the E gene assay, citing the two studies, but state that this does not match our own experience:

      “Off-target reactivity is one possible cause of false positives, and although some have reported high false positive rates for the E gene assay [20, 23], this does not match our experience.”

      In this manuscript, it should be indicated that the SARS-CoV-2 specific Probe P2 (according to Corman et al. 2020) was used. The reason for lower sensitivity due to nucleotide ambiguity and mismatch has to be explained in more detail. In addition to Corman et al. 2020 (see reference 2), Toptan et al 2020 (https://doi.org/10.3390/ijms21124396) might serve as helpful literature.

      In tables describing primers and probes, and the new Fig 1, we indicate that we used RdRp probe P2. In addition, we now also specifically state in the legend of Fig 1 that this probe only detects SARS-CoV-2 and that the primers used in the RdRp-P2 assay (as originally designed by Corman et al) contain nucleotide ambiguities and a mismatch. Finally, in the main text we explain:

      “Overall, we find RdRp detection to be at least 20-fold less sensitive than for E gene, N1 and N2 under our assay conditions; consistent with reports by others [19]. This may be due to a mismatch in the reverse primer employed in the RdRp (P2) assay, as originally designed [14].”

      With regard to the marginally positive samples that were not consistent in all assays, were the PCR products analyzed using high-resolution PAA gels and, if possible, sequenced? The sequencing approach (Sanger or NGS) offers the final characterization of the PCR products (especially for pan-genotypic primers such as E-Sarbeco). The samples declared as "inconclusive" could be further characterized in this way.

      Unfortunately, it has not been possible for us to carry out additional analyses for such (now historical) samples. Given the high prevalence of SARS-CoV-2 and the low sequence variability at primer/probe binding sites (new Table 2 and S5), inconclusive or marginally positive samples most likely reflect low viral load and/or low sample quality. Nevertheless, we now highlight the utility of further characterising such samples in the revised manuscript:

      “However, differentiating between samples with low viral loads and false positives is challenging. Analysis of such samples by Sanger sequencing of PCR products, or nanopore sequencing of RNA present could provide useful information. Further clinical evaluation and repeat sampling of the patient involved may also be a beneficial route to a secure clinical diagnosis.”

      The normalization in figure 3 should be also explained in the main text. Especially, why this approach was used for normalization.

      In the Results section we now describe the normalisation as follows:

      “A statistically significant linear correlation between Cq values for each of the viral probes (E, N1, and N2) and the Cq values for the RPP30 sample quality probe (p 40; Fig 4D and Fig S4A). Theoretically, using this approach, even a strong positive sample (SARS-CoV-2 Cq value of 28.2) of good quality (RPP30 Cq value of 20.3) may have given a false negative test result (SARS-CoV-2 Cq value of 40) if it had contained the same low amount of human material as the reference sample (RPP30 Cq value of 32.1; viral Cq: 32.1-20.3+28.2=40). Conversely, normalising samples to an optimal quality sample (RPP30 Cq 20.1/20.3) gives an indication of what viral Cq values may have been if all samples had contained a similar (more optimal) amount of material (Fig 4E, Fig S4B). This highlights the possibility that a proportion of apparent SARS-CoV-2 negative samples are in fact false negatives as a result of insufficient material in the swab fluid.”

      Nonetheless, it looks like the normalized values will cluster much more strongly than those corresponding to the actual values. The authors should comment on this phenomenon. It appears that the higher Cq values (less virus) are subject to a strong correction factor more often than high values. Are there any statistically relevant tendencies towards this phenomenon? For everyday clinical practice, does this mean that low sample Cqs (mostly) only reflect the quality of the sample, but not the viral load?

      We thank the reviewer for highlighting the stronger clustering of Cq values after normalisation, and for encouraging us to explore this further. We now show that there is a statistically significant linear correlation between RPP30 and SARS-CoV-2 Cq values (Fig 4C). This would indeed imply that a substantial proportion of the variability in SARS-CoV-2 Cq values seen in clinical practice is due to sample quality rather than different viral loads. However, outliers from the linear correlation when comparing samples from many different patients are to be expected (as seen in Fig 4C), because viral load is known to vary, with time of sampling relative to onset of symptoms one important contributing factor. In a research context, expressing viral load relative to a human control may be beneficial to differentiate between sample quality and absolute quantities of (intra/extracellular) viral RNA.

      In the revised manuscript we state:

      “Notably, the SARS-CoV-2 Cq values clustered more strongly after normalisation (Fig 4D, E; Fig S4). This reduced variability not only shows that the amount of human material present in NTS samples impacts on assay sensitivity, but also suggests that variability in viral load is not as great as implied by RT-qPCR data without normalisation.”
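As a hedged illustration of the kind of correlation analysis referred to here, the sketch below computes a Pearson correlation coefficient between RPP30 and viral Cq values. The Cq values are invented for illustration and are not data from the study, and the helper `pearson_r` is a hypothetical name; the authors' actual statistical procedure is not described in this rebuttal.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical Cq values for illustration only (not data from the study).
rpp30_cq = [20.3, 23.1, 25.4, 27.8, 30.2, 32.1]
viral_cq = [28.2, 30.5, 34.0, 35.1, 38.3, 39.6]

r = pearson_r(rpp30_cq, viral_cq)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 1 for real paired data would indicate that much of the spread in viral Cq tracks the amount of human material in the sample rather than differences in viral load.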

      Finally, it remains somewhat unclear to what extent the Cq values of the RPP30 should have an influence on the routine diagnostics. The authors discuss that a fixed cutoff value would be a possibility to sort out poor swab samples, but if a Cq value is available it would also make sense to generate a kind of quality score that can display the significance of a test. It would be helpful if the authors could comment on this or other possibilities.

      We agree that it would be beneficial for routine diagnostics to derive such a measure. However, at this stage we do not have sufficient data to generate a robust quality score based on the RPP30 Cq values. Nonetheless, we believe RPP30 Cq values have immediate utility for routine diagnostics, and could help improve validity of test results going forward:

      1. Samples with undetectable RPP30 should trigger repeat sample collection, and not be given a false negative test result;
      2. Samples with high viral Cq values and/or for which only one of two viral targets are detected can be better interpreted in the context of the amount of human material as measured by RPP30 Cq;
      3. Ongoing monitoring of swab quality allows rapid identification of potential technical issues with swabbing;
      4. Normalisation of viral Cq values using RPP30 Cq values might be helpful in a research context to derive a more meaningful measure of viral loads, by removing one source of variability;
      5. Collection of such data on an ongoing basis would ultimately allow this to be translated into a quality score that could be used as part of diagnostics algorithms.

      In the revised manuscript we now discuss this as follows:

      “Absence of RPP30 signal (undetected or Cq >40) clearly indicates that absence of viral detection cannot be interpreted as a negative test result and that a repeat test is required (Table S6). However, utilising RPP30 Cq values when interpreting an apparent SARS-CoV-2 negative sample requires further consideration: what should the RPP30 Cq limit be for which to order a repeat test? One option would be to simply set an arbitrary cut-off, e.g. one could decide to re-test any samples with RPP30 Cq >30, or with Cq values above the 95th centile (Cq ~ 31 for our 108 samples). To determine robust cut-off limits, collection of RPP30 data for a much larger number of patient samples would be desirable. This would allow development of diagnostic algorithms that could incorporate a sample quality score based on the level of RPP30 detected. Nonetheless, RPP30 data, even as it stands, are useful for the interpretation of cases for which only one of the SARS-CoV-2 targets is (weakly) positive, with samples with high RPP30 Cq values interpreted with particular caution. In such cases, repeat testing of the same sample (with an independent assay of equal or better sensitivity) would be advisable, and repeat patient specimen collection and testing might also be considered (see Table S6 for guidance).”
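The two cut-off options mentioned in the quoted passage (a fixed RPP30 Cq limit, or a limit set at the 95th centile of observed values) could be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the nearest-rank method is only one of several percentile definitions, and the Cq values are invented rather than taken from the study.

```python
def centile_nearest_rank(values, centile):
    """Nearest-rank percentile of a non-empty list (centile as an integer 1-100)."""
    ordered = sorted(values)
    rank = -(-centile * len(ordered) // 100)  # ceiling division on integers
    return ordered[rank - 1]


def flag_for_retest(rpp30_cqs, cutoff):
    """Indices of samples whose RPP30 is undetected (None) or above the cut-off."""
    return [i for i, cq in enumerate(rpp30_cqs) if cq is None or cq > cutoff]


# Hypothetical RPP30 Cq values; None marks an undetermined (no signal) well.
rpp30_cqs = [20.3, 24.8, 31.5, None, 27.0, 33.2]

# Option 1: fixed, arbitrary cut-off of Cq 30.
print(flag_for_retest(rpp30_cqs, cutoff=30.0))

# Option 2: cut-off at the 95th centile of the detected values.
detected = [cq for cq in rpp30_cqs if cq is not None]
print(flag_for_retest(rpp30_cqs, cutoff=centile_nearest_rank(detected, 95)))
```

With only a handful of samples a centile-based cut-off is unstable, which is consistent with the authors' point that a much larger data set would be needed to set robust limits.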

      Over the past few months, more and more virus subtypes have formed through the manifestation of point mutations (and amino acid substitutions). The authors should therefore definitely comment on the current strains as to whether all primers / probes are able to detect the virus variants circulating worldwide without loss of sensitivity.

      We thank the reviewer for this suggestion and now include a table providing information on mismatches in primer and probe binding sites (see Table 2 and Table S5 of the revised manuscript). This shows that only a small proportion of 97,782 strains for which high quality genome sequencing is available have changes in primer/probe binding sites. In addition, the use of two different primer/probe sets in our multiplex assays provides a further safeguard against failure to detect strains with such changes.

      Along this line, which virus strains were used for the cultivation as described in line 131? Is sequence data available? If so, it would provide helpful information to characterize the viral strain.

      We have added strain information, accession codes for genome sequences and information on primer/probe binding for both control strains (hCoV-19/England/02/2020 and BetaCoV/Munich/ChVir984/2020) we used in our work (see Materials and Methods of the revised manuscript).

      Line 206ff: In my opinion, this section belongs more to the discussion part than to material and methods that describe the technical implementation.

      We agree and have now moved this section to the Discussion. Furthermore, we’ve made additional changes to better highlight the potential for further improvements to our assays, and SARS-CoV-2 RT-qPCR assays in general.

      Is there a loss of sensitivity compared to the single PCRs? This data is very important and useful for other users. They should therefore be included explicitly in the manuscript (supplements).

      We set out to develop multiplex PCR assays to allow more efficient and cost-effective testing. In the early stages of this process we performed small pilot experiments with positive control IVT RNA and individual primer/probe pairs that are widely used and well-established to sensitively detect SARS-CoV-2 RNA. With the exception of the RdRp primers/probe, we found all to perform well, with the ability to detect 10 copies of RNA. However, we did not perform a side-by-side comparison of uni- and multiplex PCRs, and to improve the structure and flow of the Results section, as requested by Reviewer 1, we have now removed all mention of the single PCR assays.

      Altogether, the key message of our work is that the N1E-RP and N2E-RP assays are able to detect between 1 and 3 copies of SARS-CoV-2 RNA and show equivalent performance to commercially available multiplex assays.

      **Minor issues:** Line 15 ff.: Source is missing, is this WHO-data?

      The estimated number of infections and fatalities at the time of writing of the original manuscript was based on data from the online interactive dashboard hosted by Johns Hopkins University. At the suggestion of Reviewer 3, we have removed precise numbers from the revised manuscript to make the introduction less time-dependent. Nonetheless, we now include a reference to the JHU online resource, as well as the weekly epidemiological updates from the WHO (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports) for readers interested in the latest figures.

      Fig S3: How was the digital droplet PCR carried out? A brief description should be included in the legend text.

      We purchased these samples from QCMD, an independent International External Quality Assessment & Proficiency Testing (EQA/PT) organisation. QCMD performed the digital droplet PCR, before distribution under the QCMD 2020 Coronavirus Outbreak Preparedness EQA Pilot Scheme, and provided us with details, which have now been added to the Materials and Methods section:

      “Quantification of control samples was carried out by QCMD prior to distribution within the EQA scheme, using droplet digital PCR (ddPCR) with E-gene primers and probe [13, 14] on the Biorad droplet digital PCR platform. A serial dilution of inactivated SARS-CoV-2 (strain BetaCoV/Munich/ChVir984/2020; GenBank Accession MT270112, [32]) was prepared and each dilution replicate tested 4 times using both RT-qPCR and ddPCR assays. Regression analysis was used to assess the linearity across the dilution series, and the analytical measurement range established for both assays, comparing results of each by Bland-Altman difference plot.”

      In addition, we provide more details with the relevant table (new Table S3) and in the legend of the associated figure (new Fig 2) we state: “See Materials and Methods for details”.

      Figure 1a: PCR efficiencies are missing.

      We have now added PCR efficiencies to all relevant graphs.
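For readers unfamiliar with how a PCR efficiency is usually derived from a dilution series, the sketch below uses the standard relationship E = 10^(-1/slope) - 1, where the slope comes from a linear fit of Cq against log10 of template copies. It is a generic illustration, not the authors' calculation: the function names are hypothetical and the dilution series is idealised rather than measured.

```python
import math


def standard_curve_slope(copies, cqs):
    """Least-squares slope of Cq against log10(template copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def pcr_efficiency(slope):
    """Amplification efficiency as a fraction (1.0 = perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Idealised dilution series: Cq falls by one cycle for every doubling of template.
copies = [10, 100, 1000, 10000]
cqs = [38.0 - math.log2(c) for c in copies]

slope = standard_curve_slope(copies, cqs)
print(f"slope = {slope:.2f}, efficiency = {pcr_efficiency(slope):.0%}")
```

For this idealised series the fit should give a slope near -3.32 and an efficiency near 100%; real dilution series typically deviate somewhat from these values.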

      Line 145: MS2 appears, but without explaining the context. This should be improved here with additional information (this does not appear until line 154).

      At first mention of MS2 in the main text, we now state:

      “Internal controls were included to provide confirmation of successful nucleic acid extraction and absence of PCR inhibitors, with lysis buffer spiked with both MS2 (an RNA bacteriophage that infects Escherichia coli) and PhHV (a DNA virus that infects seals), detected by the TaqPath and N1E-RP/N2E-RP assays, respectively.”

      Page 15, H₂O instead of H2O, reaction mix instead of Reaction mix.

      In the supplementary protocol, we have changed “H2O” to “H₂O” and “Reaction mix” to “reaction mix”.

      Reviewer #2 (Significance (Required)): The novel coronavirus SARS-CoV-2 is the causative agent of the acute respiratory disease COVID-19, which has become a global concern due to its rapid spread and high death rate. While some patients have no symptoms at all, but are still able to spread the virus, others have severe symptoms, often with fatal outcome. The gold standard in SARS-CoV-2 detection is the RT-qPCR approach; however, the high-cost commercial kits are available in limited amounts only. The scarcity of resources is still a highly important issue, especially in terms of the incredibly rapidly increasing number of cases worldwide. Thus, the manuscript is of significance for the field and timely. In particular, diagnostic laboratories in low-income countries that are involved in managing the pandemic, but also researchers, will benefit from this manuscript and save resources.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): In this study Reijns et al developed a multiplex RT-PCR assay (alternative primer-probe sets and enzyme mixes) to detect SARS‐CoV‐2 and internal controls. The authors conclude that their assay performed equally well as established commercial kits. The authors also demonstrated that nose‐and‐throat swab samples have considerable variability in patient material content (>1,000‐fold variability). High variability is expected, but it is still important to substantiate this notion with numbers. Overall, I like the study and find it methodologically sound. Sample numbers in the tests are in most cases good. I have very few objections and hope to see the manuscript published soon.

      **SPECIFIC COMMENTS:** 1. "The COVID‐19 pandemic originated in Wuhan (China) in December 2019 and at the time of writing has infected more than 13.1 million people worldwide, resulting in well over 0.57 million COVID‐19‐related deaths..." I suggest a more timeless starting of the introduction, not pointing out exact number of infections and deaths since these numbers quickly become obsolete. The reader will know the severity of the pandemic and the importance of methodological development without statement of exact numbers. This comment reflects my personal opinion and it is completely up to the authors to choose how to phrase this section.

      We agree with the reviewer that a more timeless start to the introduction makes more sense. Therefore, as suggested, we have changed this section of the manuscript, which now reads as follows:

      “The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2 [1], originated in Wuhan (China) in December 2019 and rapidly spread across the globe, resulting in substantial mortality [2, 3] and widespread economic damage. Until a vaccine becomes available, public health strategies centred on reducing the rate of transmission are crucial to mitigating the epidemic, for which effective and affordable testing strategies to enable widespread population surveillance are essential.”

      2. Tables listing primers and probes should include the amplicon (PCR product) length for each primer-probe pair. Product length is an important consideration for fragmented RNA samples, such as for example heat-inactivated or longer-term stored samples. It should not be put on the reader to find out the amplicon lengths.

      To provide the reader with this information, the revised manuscript now includes the following:

      1. As suggested, the amplicon length for each primer pair is added to all tables that list primers and probes (PCR products: RdRp – 100 bp; E- 113 bp; N1 – 72 bp; N2 – 67 bp);
      2. A diagram in a new Fig 1 indicates the positions of all primers and probes on the SARS-CoV-2 genome along with amplicon length.
      3. A supplementary SnapGene file with primers and probes on the SARS-CoV-2 (Wuhan-Hu-1) genome to allow readers to look at further details in the context of the viral genome.

      3. Line 131: "To confirm sensitivity using total viral RNA, nucleic acids isolated from cultured SARS‐CoV‐2 were also used to make a dilution series (10^‐1 to 10^‐6)." I lack a methodological description how viral nucleic acid was quantified. It is not entirely trivial to separate viral RNA from RNA contributed from the cells used for the in vitro expansion of the virus.

      We apologise for the lack of clarity on this in our original manuscript. The purpose of this experiment was not to measure a defined number of RNA molecules, but to ensure that there was no inhibition of viral target amplification in a more complex sample by demonstrating linearity over a range of dilutions. The cultured SARS-CoV-2 positive control was provided by Prof Rory Gunson (Clinical Lead West of Scotland Specialist Virology Centre, NHS Greater Glasgow and Clyde) as inactivated supernatant from virus (strain hCoV-19/England/02/2020) propagated in cell culture. We then isolated RNA from a dilution series of this supernatant, using the methods described in our manuscript, but did not determine the precise concentration. The RT-qPCR data for this series shows a good fit and amplification efficiency, similar to what was found for the IVT RNA, and QCMD virus calibration curves (new Fig 2). The known copy number of the QCMD virus (as determined by ddPCR) allowed us to calculate that the concentration of the virus in the supernatant provided to us was between 0.7 and 2.2 x 10^5 copies/ml, with viral RNA detected down to between 0.7 and 3 copies with our N1E-RP and N2E-RP assays. We have substantially restructured the results section, and hope to have made the way we used the different viral controls clearer in the revised version of the manuscript.

      4. Line 150: "All positive and negative controls gave the expected results (Table S4)" I don't like the exact formulation since it is not clear for the reader what are the "expected results", including the "expected" quantitative results (Ct).

      We agree that the use of “expected results” does not provide the reader with sufficient information. We have therefore changed this to:

      “Results for controls were as anticipated (Table S4), with signal absent (undetermined) for SARS-CoV-2 and RPP30 targets for the negative controls, and Cq values for the SARS-CoV-2 RNA positive control (50 copies) similar to those obtained previously (Fig 2A).”

      In addition, we now provide more information on the precise nature of the negative and positive controls with Table S4:

      “-ve (extr), negative control with viral transport medium after RNA isolation (does not contain SARS-CoV-2 or human material; does contain PhHV);

      -ve, negative control containing water only (should not contain any RNA)

      +ve, positive control with in vitro transcribed RNA (50 copies; contains SARS-CoV-2 target RNA, does not contain human or PhHV nucleic acids)”

      Reviewer #3 (Significance (Required)): This study provides an alternative multiplex RT-PCR assay to detect SARS-CoV-2 infection. I find the results important and useful for the research and medical community.

      Rebuttal references

      1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med. 2020;382(8):727-33. Epub 2020/01/25. doi: 10.1056/NEJMoa2001017. PubMed PMID: 31978945; PubMed Central PMCID: PMC7092803.
      2. Brown JR, O’Sullivan D, Pereira RP, Whale AS, Busby E, Huggett J, et al. Comparison of SARS-CoV2 N gene real-time RT-PCR targets and commercially available mastermixes. bioRxiv. 2020:2020.04.17.047118. doi: 10.1101/2020.04.17.047118.
      3. Arevalo-Rodriguez I, Buitrago-Garcia D, Simancas-Racines D, Zambrano-Achig P, del Campo R, Ciapponi A, et al. False-negative results of initial RT-PCR assays for COVID-19: A systematic review. medRxiv. 2020:2020.04.16.20066787. doi: 10.1101/2020.04.16.20066787.
      4. Watson J, Whiting PF, Brush JE. Interpreting a covid-19 test result. BMJ. 2020;369:m1808. Epub 2020/05/14. doi: 10.1136/bmj.m1808. PubMed PMID: 32398230.
      5. Woloshin S, Patel N, Kesselheim AS. False Negative Tests for SARS-CoV-2 Infection - Challenges and Implications. N Engl J Med. 2020;383(6):e38. Epub 2020/06/06. doi: 10.1056/NEJMp2015897. PubMed PMID: 32502334.
      6. Vogels CBF, Brito AF, Wyllie AL, Fauver JR, Ott IM, Kalinich CC, et al. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT-qPCR primer-probe sets. Nat Microbiol. 2020. Epub 2020/07/12. doi: 10.1038/s41564-020-0761-6. PubMed PMID: 32651556.
      7. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem. 2009;55(4):611-22. Epub 2009/02/28. doi: 10.1373/clinchem.2008.112797. PubMed PMID: 19246619.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study Reijns et al developed a multiplex RT-PCR assay (alternative primer-probe sets and enzyme mixes) to detect SARS‐CoV‐2 and internal controls. The authors conclude that their assay performed equally well as established commercial kits. The authors also demonstrated that nose‐and‐throat swab samples have considerable variability in patient material content (>1,000‐fold variability). High variability is expected, but it is still important to substantiate this notion with numbers. Overall, I like the study and find it methodologically sound. Sample numbers in the tests are in most cases good. I have very few objections and hope to see the manuscript published soon.

      SPECIFIC COMMENTS:

      1."The COVID‐19 pandemic originated in Wuhan (China) in December 2019 and at the time of writing has infected more than 13.1 million people worldwide, resulting in well over 0.57 million COVID‐19‐related deaths..." I suggest a more timeless starting of the introduction, not pointing out exact number of infections and deaths since these numbers quickly become obsolete. The reader will know the severity of the pandemic and the importance of methodological development without statement of exact numbers. This comment reflects my personal opinion and it is completely up to the authors to choose how to phrase this section.

      2.Tables listing primers and probes should include the amplicon (PCR product) length for each primer-probe pair. Product length is an important consideration for fragmented RNA samples, such as for example heat-inactivated or longer-term stored samples. It should not be put on the reader to find out the amplicon lengths.

      3. Line 131: "To confirm sensitivity using total viral RNA, nucleic acids isolated from cultured SARS‐CoV‐2 were also used to make a dilution series (10^‐1 to 10^‐6)." I could not find a methodological description of how viral nucleic acid was quantified. It is not entirely trivial to separate viral RNA from RNA contributed by the cells used for the in vitro expansion of the virus.

      4. Line 150: "All positive and negative controls gave the expected results (Table S4)" I don't like this formulation, since it is not clear to the reader what the "expected results" are, including the "expected" quantitative results (Ct).

      Significance

      This study provides an alternative multiplex RT-PCR assay to detect SARS-CoV-2 infection. I find the results important and useful for the research and medical community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Reijns and colleagues describe an approach to detect the causative agent of COVID-19, the beta coronavirus SARS-CoV-2, using an inexpensive in-house multiplex RT-qPCR. Concomitantly, viral E, N and RdRP (probe P2) as well as human RPP30 and a herpesvirus nucleic acid are also detected in order to monitor both the sample quality and the sample preparation. Reijns et al. performed testing on a large number of samples and used the data to describe the strengths and limitations of the assay. The data are sound and give a very good impression of the 4-plex PCR capabilities. I read the manuscript fluently and consider it linguistically very good. However, I still have a few comments and remarks that would strengthen the manuscript:

      Major issues:

      In the first part of the Results section, many primer/probe conditions are given, which interrupts the reading flow. Instead of using (data not shown), it would be helpful to use a table or a graphic to illustrate the various approaches. In general, I suggest replacing Ct with Cq, since the IVT standards are a quantification method.

      There has already been a change away from the initial E and RdRP gene based assay because of the published sensitivity issues and the use of degenerate bases, as well as the detection of unspecific nucleic acids (for the E gene). In particular, it has been shown that the Sarbeco-E assay yields false positive results (Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396), Konrad et al. 2020 (https://doi.org/10.2807/1560-7917.ES.2020.25.9.2000173)), so that many laboratories no longer consider E-gene-based results for borderline samples. In this manuscript, the authors should comment on why they still use the results from the E gene / RdRP and describe their experience.

      In this manuscript, it should be indicated that the SARS-CoV-2 specific Probe P2 (according to Corman et al. 2020) was used. The reason for lower sensitivity due to nucleotide ambiguity and mismatch has to be explained in more detail. In addition to Corman et al. 2020 (see reference 2), Toptan et al. 2020 (https://doi.org/10.3390/ijms21124396) might serve as helpful literature. With regard to the marginally positive samples that were not consistent in all assays, were the PCR products analyzed using high-resolution PAA gels and, if possible, sequenced? The sequencing approach (Sanger or NGS) offers the final characterization of the PCR products (especially for pan-genotypic primers such as E-Sarbeco). The samples declared as "inconclusive" could be further characterized in this way.

      The normalization in Figure 3 should also be explained in the main text, in particular why this approach was used for normalization. Nonetheless, it looks like the normalized values will cluster much more strongly than the actual values. The authors should comment on this phenomenon. It appears that the higher Cq values (less virus) are subject to a strong correction factor more often than high values. Are there any statistically relevant tendencies towards this phenomenon? For everyday clinical practice, does this mean that low sample Cqs (mostly) only reflect the quality of the sample, but not the viral load? Finally, it remains somewhat unclear to what extent the Cq values of RPP30 should influence routine diagnostics. The authors discuss that a fixed cutoff value would be a possibility to sort out poor swab samples, but if a Cq value is available it would also make sense to generate a kind of quality score that can display the significance of a test. It would be helpful if the authors could comment on this or other possibilities.
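
      To make the normalization question concrete, the following is a minimal sketch of one common way to adjust a viral Cq for sample content: a delta-Cq correction against a reference RPP30 Cq. This is an assumption for illustration and not necessarily the procedure used in the manuscript; the function names, the choice of the cohort median as reference, and all sample values are hypothetical.

```python
from statistics import median


def normalize_cq(viral_cq, rpp30_cq, reference_rpp30_cq):
    """Shift a viral Cq by the sample's RPP30 deviation from a reference Cq.

    Assumes both targets amplify with similar (near 100%) efficiency, so one
    cycle of RPP30 difference corresponds to one cycle of viral Cq.
    """
    return viral_cq - (rpp30_cq - reference_rpp30_cq)


def fold_difference(delta_cq, efficiency=1.0):
    """Convert a Cq difference into a fold difference in template amount."""
    return (1.0 + efficiency) ** delta_cq


# Hypothetical swab results (not data from the manuscript).
samples = [
    {"id": "A", "N": 30.0, "RPP30": 24.0},
    {"id": "B", "N": 33.0, "RPP30": 28.0},
    {"id": "C", "N": 20.0, "RPP30": 26.0},
]

# Assumption: use the cohort median RPP30 Cq as the reference sample content.
reference = median(s["RPP30"] for s in samples)

for s in samples:
    s["N_norm"] = normalize_cq(s["N"], s["RPP30"], reference)
    print(s["id"], s["N"], "->", s["N_norm"])
```

      In this toy example the ranking of samples A and B flips after correction, which is the kind of effect a reader would want explained. At 100% efficiency, a 10-cycle spread in RPP30 Cq corresponds to a roughly 1,000-fold difference in human material (2^10 = 1024).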

      Over the past few months, more and more virus subtypes have formed through the manifestation of point mutations (and amino acid substitutions). The authors should therefore definitely comment on the current strains as to whether all primers / probes are able to detect the virus variants circulating worldwide without loss of sensitivity. Along this line, which virus strains were used for the cultivation as described in line 131? Is sequence data available? If so, it would provide helpful information to characterize the viral strain.

      Line 206ff: In my opinion, this section belongs more to the discussion part than to material and methods that describe the technical implementation.

      Is there a loss of sensitivity compared to the single PCRs? These data are very important and useful for other users. They should therefore be included explicitly in the manuscript (supplements).

      Minor issues:

      Line 15 ff.: Source is missing, is this WHO-data?

      Fig S3: How was the digital droplet PCR carried out? A brief description should be included in the legend text.

      Figure 1a: PCR efficiencies are missing.

      Line 145: MS2 appears, but without explaining the context. This should be improved here with additional information (this does not appear until line 154).

      Page 15: H2O instead of H20, reaction mix instead of Reaction mix.
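
      Regarding the missing PCR efficiencies for Figure 1a: efficiency is conventionally derived from the slope of a standard curve (Cq against log10 input copies) as E = 10^(-1/slope) - 1, where a slope near -3.32 corresponds to about 100%. A minimal sketch follows, using a made-up dilution series rather than the manuscript's data; the function names are hypothetical.

```python
def standard_curve_slope(log10_copies, cq_values):
    """Least-squares slope of Cq versus log10(input copies)."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    return sxy / sxx


def pcr_efficiency(slope):
    """Amplification efficiency as a fraction (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series (not data from the manuscript).
log10_copies = [6, 5, 4, 3, 2]
cq_values = [15.0, 18.32, 21.64, 24.96, 28.28]

slope = standard_curve_slope(log10_copies, cq_values)
efficiency = pcr_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

      Reporting the slope and the derived efficiency per target alongside the standard curves would let readers judge whether Cq differences between targets are comparable.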

      Significance

      The novel coronavirus SARS-CoV-2 is the causative agent of the acute respiratory disease COVID-19, which has become a global concern due to its rapid spread and high death rate. While some patients have no symptoms at all but are still able to spread the virus, others have severe symptoms, often with fatal outcome. The gold standard in SARS-CoV-2 detection is the RT-qPCR approach; however, the high-cost commercial kits are available in limited amounts only. The scarcity of resources is still a highly important issue, especially given the incredibly rapidly increasing number of cases worldwide. Thus, the manuscript is of significance for the field and timely. In particular, diagnostic laboratories in low-income countries that are involved in managing the pandemic, but also researchers, will benefit from this manuscript and save resources.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In order to improve SARS-CoV-2 diagnostics, Reijns et al. developed a multiplexed RT-qPCR protocol that allows simultaneous detection of two viral genes, one housekeeping gene as well as an external gene as an extraction control. Compared to running parallel assays to detect genes individually, the turnaround time is much shorter and reagents are saved. Furthermore, the presented data suggest that the assay is more sensitive than commercial kits. The authors also propose the detection of the human housekeeping gene as a measure of sample quality control. In principle, this work has potential but the manuscript itself needs a better structure.

      Major concerns:

      The authors have used the Takara RT-qPCR kit for their study. Did the authors try other commercial kits? Can the authors elaborate on the supply chain of the Takara kit? Could it cover population testing in case of shortages of other commercial kits?

      For better comparison, is it possible to give information on which primers the commercial kits are based on? Also, explain better the primers used in this study. For example, the N1 and N2 primers are directed against different regions of the SARS-CoV-2 N gene.

      The result section needs a better structure as the first two pages do not refer to any of the main figures. For example, in which figure or table can the reader find the data that are discussed in lines 83 to 87?

      Table S1, instead of current Table 1, could be moved to main figures as it contains the important finding that the multiplexed assay may be more sensitive than the commercial one. The authors identified some samples that scored negative in commercial assays but positive in their new assay. This is important, however, the possibility of detecting false positives should be strengthened in a "Discussion" section.

      Figures 1 to 3 have different panels which seem to be redundant. For example, Fig 1 A and B, Fig 2 B and C, Fig 3 C and D.

      Figure 1: Give a rationale for comparing before and after extraction. This heavily depends on the extraction method and not on the detection itself. In addition, IVT RNA does not reflect the complexity of a clinical specimen. This is rather confusing and detracts from the important findings.

      Figure 3: Did any of the negative samples/patients with an undetectable housekeeping gene re-test positive? Did adding this housekeeping gene as a control actually improve the detection of any patient samples? If the authors want to convince the readership of this quality control, experimental evidence should be provided.

      Fig 3C and D seem to contain this information to some extent, as here the values were normalized and the Ct values for the E and N gene decreased. Nevertheless, no real explanation of this figure is provided in the Results section at all. While this figure has potential, the authors have to keep in mind that the number of cells in a swab can be affected by many biological factors, including age, sample timing, inflammation of the respiratory tract, etc. In addition, viral genomes can exist intra- as well as extracellularly, in the form of free virus. So even in the absence of human cells/detectable housekeeping genomes, viral RNA can be or should be present in a sample in case of infection. This (probably) explains why a correlation between detectable housekeeping gene and viral RNA is absent (Fig 3A and B?). This entire Fig 3 just needs a better explanation. The provided text does not describe any results and should go into a "Discussion" section.

      Self-swabbing is surely a potential source of variability and false-negatives, but many publications have shown the suitability of saliva testing. This should also be discussed and would probably negate the need for such a quality control.

      Which assay works better, the N1E-RP or the N2E-RP assay? A final conclusion is missing here.

      Significance

      Naturally, in this pandemic, this topic is important as sensitive and affordable methods to detect SARS-CoV-2 infections are in need. This Reviewer agrees that multiplexing could be an elegant approach to fill this need.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their feedback and constructive comments on our work. We provide here a point-by-point response to the comments of Reviewers #1, #2 and #3 (text in grey and italic).

      Responses written in plain text correspond to Reviewer comments that have been addressed in the revised version of the manuscript provided at this stage of the review process (referred-to as “revised version I” below).

      Responses written in bold text correspond to comments that need further experiments. The list of experiments we intend to perform to address these comments is provided in a separate document (Revision plan). The results of these additional experiments will be included in a later revised version of the manuscript referred-to as “revised version II” below.

      Reviewer #1

      The manuscript addresses an important topic, the posttranscriptional maturation of ribosomes. This topic is inherently interesting because we normally think of ribosome biogenesis as a sequential series of steps that automatically proceeds and cannot be "accelerated" in physiological conditions, but only "delayed" in the presence of genetic mutations. In short, the manuscript proposes that RIOK2 phosphorylation by the action of RSK, below the Ras/MAPK pathway promotes the synthesis of the human small ribosomal subunit.

      I honestly admit that I have some difficulties in reviewing this manuscript. The quality of the presented data is, in general, good. However, overall I find the whole manuscript preliminary and I am not much convinced of the conclusions. Several aspects are superficially analyzed. In short, I think that most of the conclusions are not fully supported by the data because shortcuts are present. All the aspects that I found wrong are listed below.

      Biological issue

      1. The authors claim that the effects of the inhibition of the maturation of ribosomes by acting on a pathway upstream of RIOK2 are limited to the 40S subunit. This is far from being a trivial point, for the following reason. RIOK2 is known to affect the maturation of 40S ribosomes. Hence, the fact that using an upstream inhibitor of the MAPK pathway such as PD does not inhibit 60S processing in reality would argue against a biologically relevant control in ribosome maturation (of the MAPK pathway). Have the authors considered this? In a way, also, given the fact that the mutants confirm a role in 18S final maturation, it is a bit complex to put all the data in a clear biological context.

      We agree that we put more emphasis on the effects on the pre-40S pathway than on the pre-60S pathway in the original manuscript but we did not claim that the effects of PD or LJH inhibitors of the MAPK pathway are restricted to the 40S subunit. We described that the effect of PD or LJH on the 32S was less severe than on the 30S, and we did mention variations of the 12S intermediate. These changes are in the same range of amplitude as the changes in the 21S and 18S-E intermediates in the small subunit pathway. The Northern blot data concerning the pre-60S pathway were placed in the supplementary material of the original manuscript, which may have left the reader with an impression of lesser emphasis. We rephrased this part in the present revised version I of the manuscript (Page 6, Line 26) and we now show the pre-40S and pre-60S intermediates on the same figures (Figures 1A and 1C).

      In addition, we will probe more exhaustively the intermediates of the pre-60S pathway in the revised version II of the manuscript as described in the revision plan. These data will be complemented with metabolic labeling experiments to provide a more dynamic analysis of the pre-rRNA processing defects resulting from inactivation of the MAPK pathway. Furthermore, as requested by Reviewer #2 (see below), we will quantify more accurately these data.

      A number of specific issues will be concisely described.

      Manuscript very well written. Data do not always support the strong conclusions. Low magnitude of the observed effects.

      In the introduction the authors make a general claim that ribosome biogenesis is one of the most energetically demanding cellular activities. This statement has lingered in the literature for 15 years but in reality it has never been formally proved for mammalian cells, and certainly not for HEK293 cells. The original statement, to my knowledge, can be traced to some obscure statement referring to the yeast case and then repeated as a truth. In conclusion, besides being a very banal observation, it should be referenced.

      We agree with this comment of Reviewer #1. The original statement was proposed by Jonathan R. Warner (Warner, 1999, TiBS and references therein) and data from the Bähler group also supported this statement (Marguerat et al., 2012, Cell). However, these data were indeed referring to yeast (S. cerevisiae and S. pombe). In the present revised version I of the manuscript, we introduced the reference of a review providing quantitative data of ribosome biogenesis in human cells (Lewis & Tollervey, 2000, Science) and we modified the problematic sentence as follows: “Growing human cells produce around 7500 ribosomal subunits per minute (Lewis and Tollervey 2000), which represents a significant expenditure of energy.” (Page 4, Line 1).

      Growth factors, energy status are not cues but are proteins or metabolites (introduction).

      We agree with this comment of Reviewer #1. We changed the text accordingly in the revised version I of the manuscript (Page 4, Line 8).

      Authors write about mTOR without making statements on mTORC1/2. This is very obsolete. Also I am not sure that the choice of Geyer et al., 1982, and subsequent papers makes much sense. At the very minimum TOP mRNA concepts and mTORC1 must be defined.

      We provide more details on the mTOR pathway in the revised version I of the manuscript according to Reviewer #1’s suggestions (Page 4, Line 13 and Page 5, Line 3).

      The authors claim that their work fills a major gap between known functions of MAPK and cytoplasmic translation. I would not be so sure about it.

      Our original sentence stated that “our work fills a major gap between currently known functions of MAPK signaling in Pol I transcription and cytoplasmic translation”. Indeed, although MAPK signaling was known to regulate Pol I transcription and cytoplasmic translation, the impact of the pathway on the post-transcriptional steps of ribosome synthesis, namely pre-ribosome assembly and maturation, has been very little investigated and remains poorly understood. Our data provides the first example of a detailed mechanism of regulation of the maturation of pre-ribosomal particles by the MAPK pathway. Reviewers #2 and #3 seem to agree with this point:

      Reviewer #2: “However, there is a lacking mechanistic connection of signaling pathways to pre-rRNA processing and maturation steps of ribosome biogenesis. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), offering for the first time providing mechanistic insight into MAPK regulation of pre-rRNA maturation.”

      Reviewer #3: “With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized a significant topic.”

      Results. Authors start with a major mistake, i.e. that PMA selectively stimulates the MAPK pathway. Perhaps it stimulates, certainly it does not do it selectively.

      We agree with this comment of Reviewer #1. We removed the term “selectively” in the problematic sentence (Page 6, Line 8).

      RIOK2 phosphosites are first found by bioinformatics analysis. It should be noted that the predicted phosphosite (S483) is found only in a limited set of datasets from MS databases. The actual importance of this site would not emerge from unbiased studies. Also, there are many other phosphosites that were not analyzed in this study.

      We agree with Reviewer #1 that phosphorylation of S483 of RIOK2 has been detected in a limited number of mass spectrometry datasets, but these datasets have been reported in high impact journals (Nature Methods, Mol Cell Proteomics, Science), attesting to the quality of these studies.

      As mentioned by Reviewer #1, there are several other phosphosites within RIOK2 that were not analyzed in our study. We provided the list of these phosphosites in Supplementary Table S1 of the original manuscript. Besides T481 and S483, none of the other sites belong to consensus motifs recognized by ERK or RSK at medium and high stringency. They are therefore less relevant to our study. We only analyzed phosphorylation at S483 because: (i) our mass spectrometry analysis revealed that S483 is the only phosphosite in RIOK2 whose level increases upon MAPK activation but not in the presence of the MAPK inhibitor PD184352 (Figure 2B); (ii) our in vitro kinase assay showed that the phosphorylation level of RIOK2 by RSK is residual when S483 is replaced by a non-phosphorylatable alanine (Figure 3D); (iii) our data presented in Figure 2C further show that mutation of T481 to an alanine does not prevent RIOK2 phosphorylation on RxRxxS/T motifs upon stimulation of the MAPK pathway.

      We clarified this point in the relevant part of the result section of the revised version I of the manuscript (Page 7, Lines 16 and 24, Page 8, Line 17 and Page 9, Line 5).

      Throughout the paper the authors use the word strongly, significantly, but the actual effects seem in general quite marginal.

      We agree with Reviewer #1 that some of the phenotypes described in the manuscript are modest, in particular the phenotypes resulting from the S483A mutation of RIOK2, which is not unusual for a point mutation. We rephrased several sentences throughout the manuscript to soften the formulation in the description and interpretation of the data and in the conclusions.

      Discussion. The authors claim that they provide solid evidence on MAPK signalling to ribosome maturation. At the very best this is circumstantial evidence for the 40S maturation.

      We rephrased the sentence accordingly (Page 16, Line 5): “Our study provides evidence that MAPK signaling applies another level of coordination during ribosome biogenesis, by directly regulating pre-40S particle assembly and maturation.”

      Figure 1.

      Unclear why LJH should increase P-ERK.

      A negative feedback loop has been described in the MAPK pathway whereby RSK activation partially inhibits ERK phosphorylation (Saha et al., 2012, Horm Metab Res; Dufresne et al., 2001, MCB; Schneider et al., 2011, Neurochem; Re Nett et al., 2018, EMBO Rep). Inactivation of RSK with LJH alleviates this inhibition, which results in increased phosphorylation levels of ERK.

      We added this information in the revised version of the manuscript along with the corresponding references (Page 6, Line 17).

      General lack of quantitation (sd, replicates, bars). Experiment done only on a single cell line in a single experimental setup.

      As also requested by Reviewer #2 (Major comment 1.), we applied in the revised version I of the manuscript RAMP quantifications to all Northern blot data. We included error bars corresponding to biological replicates.

      Furthermore, in order to validate the impact of the MAPK pathway on pre-ribosome assembly and maturation, we plan to perform the same experiments using PD inhibitors in different cell lines and we will provide a figure with accurate RAMP quantifications, error bars and statistical significance, in the revised version II of the manuscript (see revision plan).

      Very different effects on 21S by LJH, PMA and siRNA for RIOK2. Overall the message given by the authors is to me mysterious.

      We assume that the reviewer wanted to point out the difference between PMA, PMA+LJH and shRNA for RSK since we did not perform RNAi targeting RIOK2. We agree with this comment. We believe that this difference is likely due to experimental setups that are different between both experiments. In the experiment using inhibitors, we assessed short-term effects of RSK inhibition after acute stimulation of the MAPK pathway (starved cells stimulated with PMA), while in the experiment using shRSK, we monitored long term effects of RSK depletion in serum-growing cells in which other signaling pathways are also active. Prolonged RSK depletion is likely to induce pleiotropic cellular effects, which would interfere with ribosome biogenesis both directly and indirectly. These differences probably explain the variable effects on the 21S intermediate. However, in both experiments we do observe an accumulation of the early 30S intermediate, consistent with the phenotype observed when ERK is inactivated (PD inhibitor), therefore indicating that RSK regulates some post-transcriptional stages of ribosome biogenesis.

      To make our results clearer we have withdrawn the experiments using shRSK to avoid the risk of showing indirect effects due to the prolonged absence of RSK. Instead, we included RAMP analyses with error bars from 2 biological replicates using PD and LJH inhibitors (Figure 1B).

      Figure 2.

      Several red flags. For instance, in 2C the loaded levels of RIOK2-HA are clearly lower than those of the other genotypes, hence the conclusion on P-RIOK2 is not convincing.

      Our aim in this experiment was to compare the impact of PMA treatment on the phosphorylation levels of different RIOK2 mutants (T481A, S483A, double mutant). For a given mutant, the levels of RIOK2 loaded in the two conditions (i.e. not stimulated and PMA stimulated) are very similar and we therefore assume that our conclusions are valid.

      We nevertheless plan to repeat these experiments and quantify the data for the revised version II of the manuscript.

      Staining with anti-P RIOK2 lacks controls; how can we be sure that the signal is due to the phosphate? Phosphatase treatment?

      We fully agree with Reviewer #1 and we did perform an experiment showing that the phosphorylation signal disappears following treatment of the protein extracts with λ-phosphatase. We did not show these data in the original version of the manuscript because of space limitations. We added these data in the supplementary material of the revised version I of the manuscript (Supplementary Figure S2B) and amended the text accordingly (Page 7, Line 24).

      Why FBS does not lead to ERK staining in HEK293? There are plenty of growth factors in FBS that should lead to ERK phosphorylation. I do not understand this experiment.

      We agree with this comment. Addition of serum to starved cells does lead to ERK and RSK phosphorylation, but much less efficiently than stimulation by EGF and PMA. ERK phosphorylation is barely visible on the exposure shown in Figure 2D but RSK phosphorylation is clearly observed, although the signal is much weaker compared to EGF and PMA treatments. It is common to observe a stronger response with purified PMA and EGF (see Carrière et al., 2011, JBC; Ray et al., 2013, Oncogene). There are indeed several growth factors in the serum, but the most abundant (Insulin, IGF1, TGF) are present at ng/ml concentration, while EGF is used at 25 µg/ml in Figure 2D. Moreover, they are not very strong activators of the Ras/MAPK pathway, and it is also possible that after 20 min of FBS treatment the phosphorylation is in the decreasing phase.

      In the present revised version I of the manuscript, we included a set of western blots from another experiment showing the same results but of better quality to make the effects more visible (Fig. 2D). We also provided quantifications of phosphorylation of RIOK2 and associated statistical analyses (Fig. 2E).

      Figure 3. In vitro phosphorylation, if I understood, it relies on a truncated version of RIOK2. Why? Is the folding of the full length protein not permissive to in vitro phosphorylation?

      We did not test phosphorylation of the full length RIOK2 protein in vitro because RIOK2 has been reported to auto-phosphorylate (Zemp I. et al., 2009, JCB) and we were concerned that this auto-phosphorylation activity of RIOK2 in addition to RSK phosphorylation may render this experiment inconclusive.

      HA-RSK3 is less?

      It was reported that RSK3 is insoluble when over-expressed (Zhao et al., 1996, JBC), which explains the lower levels of protein recovered in our soluble extract. The information was present in the legend of Figure but we transferred it to the main text of the result section in the present revised version I of the manuscript (Page 10, Line 3).

      Figure 4. Immunofluorescence is low mag, difficult to understand.

      We agree with Reviewer #1. We modified the FISH experiment figure to show cells with a higher magnification and we provided more details in the text (Page 12, Lines 20-25) to facilitate the understanding of the data.

      I really like the experiments with RIOK2 mutants, however I wonder what about protein levels after the knock-in? Given the 18S phenotype overlap between the phenotype of the RIOK2 loss of function with the S483A, testing protein level becomes of the utmost importance.

      We checked RIOK2 protein levels and observed that the mutations do not decrease the level of RIOK2. On the contrary, the mutations slightly increase RIOK2 levels. Therefore, we are pretty confident that the phenotypes resulting from expression of RIOK2 mutants do not result from defects in the global accumulation of the protein. These data have been added to Figure 4C of the revised version I of the manuscript and we amended the text accordingly (Page 12, Line 5).

      Figure 5. Low quality IFL.

      Our aim in preparing this figure was to show many cells in the different images to show that the effect of our mutation was homogeneous at the level of cell populations. The drawback is that cells are small and look blurred. We improved the quality of the figure in this revised version I of the manuscript with new images from the same experiment, showing fewer cells at a higher magnification.

      Hard to think that histogram quantitation of nuclear versus cytoplasmic staining are reliable in the absence of fractionation, better quantitation, experiment done in other cell lines and so on.

      We provide in this revised version I of the manuscript a supplementary figure explaining the procedure we used to quantify the fluorescence data (Supplementary Fig. S7).

      Furthermore, to confirm this result using other experimental conditions and cell lines, we will first transfect HEK293 and HeLa cells with plasmids expressing GFP-tagged RIOK2 WT or the S483A mutant and compare the kinetics of nuclear import of both proteins upon inhibition of pre-40S particle export by leptomycin B, using fluorescence microscopy and GFP quantifications. Second, we will transfect HeLa cells with plasmids expressing HA-tagged RIOK2 WT or S483A and perform fractionation assays to monitor their presence in both cytoplasmic and nuclear compartments. We will include these data in the revised version II of the manuscript.

      However, very beautiful Fig. 5E perhaps the best of the paper shows also mobility shift driven by S483, thus supporting posttranslational modifications.

      We thank Reviewer #1 for this comment. We added the note on the evidence of RIOK2 post-translational modification in the result section (Page 14, Line 9).

      Fig. 6. IFL studies are really impossible to interpret.

      We improved the quality of the figure with new images from the same experiment, showing fewer cells at a higher magnification. NOB1 IF data and quantifications have been transferred to the supplemental material (Supplemental Fig. S4A and S4B) to clarify the figure. In addition, we provided more explanations on the principle of this experiment and expected results in the text (Page 15, Line 9).

      The effects on RIOK2 release (this figure) and 18S maturation (Fig. 5) are very clear and of great quality.

      We thank Reviewer #1 for this comment.

      Overall conclusions. The manuscript tends to overinflate the meaning of several experiments. What to me is very clear and interesting is that the authors provide clear evidence that S483A mutants have a defect in 40S maturation. The evidence that this is due to MAPK signalling is only circumstantial. I would suggest building on the strong findings and eliminating ambiguous data.

      We do not fully agree with this comment of Reviewer #1. If mutation S483A were simply a partial loss of function mutation, this would not be of strong interest for the subject of this manuscript. It would just indicate that S483 is important for RIOK2 function independently of its phosphorylation status. Our data show that the impact of S483 mutation on pre-rRNA processing and other phenotypes is different depending on whether the serine is converted to an alanine (phosphorylation mutant) or to an aspartic acid (phospho-mimetic mutation). These data are a strong indication that what matters is not simply the serine residue by itself but its phosphorylation status.

      Reviewer #1 (Significance (Required)):

      The paper deals with an important topic, namely whether a regulation of ribosome maturation exists, and how it is mechanistically regulated. In this context, the analysis of the ERK pathway is highly needed, considering that most work deals with effects of the PI3K-mTOR pathway, and the parallel, yet important, RAS-ERK pathway is less understood.

      As a final note, we should consider that S6K downstream of mTOR, and ribosomal S6K, downstream of ERK have been considered to share some substrates.

      We introduced this information in the revised version of the manuscript (Page 19, Line 20). A related comment has been raised by Reviewer #3 (see below, Caveat #2).

      The manuscript is interesting, but several statements given by the authors are rather superficial. An example, listed in the previous section, relates to the linguistic usage of mTOR kinase, instead of detailing whether we are dealing with mTORc1 or mTORc2.

      We agree with this comment of Reviewer #1. Given that the main focus of this manuscript is the regulation by the MAPK pathway, we had chosen to put less emphasis on mTOR in the introduction. However, we added more precise information on mTOR in the present revised version I of the manuscript to address this comment (Page 4, Line 13 and Page 5, Line 3).

      A second gross mistake is the definition of PMA as a stimulator of the ERK pathway. If this is certainly true, this is historically not correct as seminal papers by the group of Parker define this drug as a stimulator of conventional PKC kinases. In short, this paper is a step back in knowledge from the perspective of the literature context.

      We are a bit confused by this comment because seminal papers from the Parker group clearly state that PMA activates the MAPK pathway via PKC (Adams and Parker, 1991, FEBS Lett.; Ways et al., 1992, JBC; Whelan et al., 1999, Cell Growth Differ.). We agree, as mentioned earlier by Reviewer #1, that PMA is not specific to MAPK, a comment that has been addressed above.

      Everyone interested in the crosstalk between ribosome maturation and signaling pathways will certainly read this manuscript.

      My expertise is within the ribosome biology and signalling field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      There have been mechanistic connections of various signaling pathways to the regulation of ribosome biogenesis steps including rDNA transcription by RNA polymerase I and III, ribosomal protein transcription, and differential mRNA translation efficiency. However, a mechanistic connection of signaling pathways to the pre-rRNA processing and maturation steps of ribosome biogenesis is lacking. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK, that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), providing for the first time mechanistic insight into MAPK regulation of pre-rRNA maturation.

      The authors observe slight pre-rRNA processing defects upon the use of RSK inhibitors and RSK depletion. They identified several candidate ribosome assembly and modification factors containing the canonical RSK substrate motif, including the RIOK2 kinase. Phosphorylation at this motif was verified to be specifically phosphorylated by RSK1 and 2 isoforms in cells and in an in-vitro kinase assay. The authors produced RIOK2 knock-in eHAP1 cell lines expressing non-phosphorylatable or phosphomimetic versions of RIOK2, observing slowed cellular proliferation, decreases in global translation, slight pre-rRNA processing abnormalities, but not changes in overall mature 18S rRNA levels. More specifically, the authors defined the inability of RIOK2 to be phosphorylated leads to defects in RIOK2 dissociation from the pre-40S ribosomal subunit in an in-vitro assay, and inability for it to be recycled for reuse in pre-ribosome export from the nucleus to the cytoplasm by immunofluorescence.

      Overall, the authors provide an interesting mechanism of MAPK regulation of a ribosome assembly factor RIOK2. However, they fail to provide the necessary reproducibility, controls, quantification, and consistent results between experiments to support their hypotheses.

      Major Comments:

      1. The northern blots reported throughout the manuscript are lacking proper reproducibility and quantification. First, the northern blots are lacking a loading control, which is necessary to report fold changes that are being measured across treatments. Please include a proper loading control (i.e. 7SL or U6 RNAs). Additionally, more rigorous analysis of the pre-rRNA precursor levels through ratio analysis of multiple precursors (RAMP) (Wang et al 2014) can be completed to provide a clearer depiction on which precursor(s) are accumulating. It is unclear for the Figure 1 northern blots if there were replicates completed and what the error bars represent in Figure 1B. Please report replicates, so that statistical analysis can be completed on the differences in precursor relative abundance. This need is emphasized by the small changes observed in pre-rRNA levels (less than 2 fold) between conditions.

      As mentioned above (Reviewer #1), we applied in the revised version I of the manuscript RAMP quantifications to all Northern blot data. These quantifications are shown as separate panels in the figures of the revised manuscript.

      Furthermore, we are planning to repeat the Northern blot experiments of Figure 1 to obtain biological replicates in other cell lines. We will probe the membranes to detect the 7SL RNA as a loading control in all these experiments. We will perform RAMP analyses on all these Northern blot experiments to provide more accurate quantifications of the pre-rRNA levels in the different conditions. These data will be included in the revised version II of the manuscript.
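
      For readers unfamiliar with RAMP, the idea can be sketched as follows: within each lane, every precursor is expressed as a ratio to every other precursor, and each ratio is then compared with the same ratio in the control lane on a log2 scale, so that no external loading control enters the calculation. The sketch below is a minimal illustration under that assumption and is not the authors' analysis pipeline; the function name and the band intensities are invented.

```python
import math
from itertools import combinations

# Minimal RAMP-style sketch (assumed formulation): log2 of each
# precursor-to-precursor ratio in a sample, relative to the control lane.

def ramp(sample, control):
    """Map 'A/B' to log2((A/B in sample) / (A/B in control)) for all pairs."""
    profile = {}
    for a, b in combinations(sorted(sample), 2):
        ratio_sample = sample[a] / sample[b]
        ratio_control = control[a] / control[b]
        profile[f"{a}/{b}"] = math.log2(ratio_sample / ratio_control)
    return profile

# Invented band intensities (arbitrary units) for one control and one treated lane.
control = {"45S": 100.0, "30S": 50.0, "21S": 40.0, "18S-E": 20.0}
treated = {"45S": 100.0, "30S": 100.0, "21S": 40.0, "18S-E": 20.0}

profile = ramp(treated, control)
```

      In this invented example only the ratios involving 30S deviate from zero, which is the signature expected when a single precursor accumulates.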

      2. The western blots reported throughout the manuscript are lacking proper reproducibility and quantification. For example, the western blots validating RSK1 and RSK2 depletion in Figure 1C lack a proper loading control. Additionally, it is unclear if there are replicates completed and there is lack of statistical analysis to determine if the changes are significant. Please include loading controls, replicates, and quantification of the western blots throughout the manuscript.

      We have included actin levels as loading controls in several figures (Figures 2D, 3A, 3C, 3E, 4C) of the revised version I of the manuscript. We also added phosphorylated Rps6 at Ser235/36 to monitor RSK activity in Figures 1A, 2D, 3A.

      We provided quantifications and associated statistical analyses of phosphorylation of RIOK2 presented in Figures 3A and 3C of the revised version I of the manuscript. We also included quantifications of the in vitro phosphorylation assays presented in Figures 3F and 3G.

      We are nevertheless planning to repeat and quantify more accurately the western blot experiments presented in Figures 2A, 2C and 3E of the revised version I of the manuscript. These data will be included in the revised version II of the manuscript.

      3. Please report the full bioinformatic analysis of the RSK substrate motif search among human AMFs including other AMFs found in this search. A sorted list format would be valuable for the reader to understand other potential RSK substrates involved in ribosome biogenesis.

      We understand the request of Reviewer #2. Providing the full list of AMFs identified in our bioinformatic screen would be valuable for the reader, mostly because it would make clearer that RSK seems to regulate multiple stages of the pre-ribosome maturation pathway, and therefore that RSK inhibition induces pleiotropic defects in ribosome synthesis. However, we are currently working on a more global study of the impact of MAPK regulation on the post-transcriptional steps of ribosome synthesis that we would like to publish in the near future.

      4. The authors report that RSK inhibition/depletion leads to accumulation of the 30S pre-rRNA, yet mutation of its target site on RIOK2 or RIOK2 depletion leads to an accumulation of the 18S-E pre-rRNA. Additionally, the phosphomimic mutation of RIOK2 leads to an accumulation of 30S, the opposite of the expected result. Please elaborate on this discrepancy in processing defects observed across experiments.

      In contrast to RIOK2, which is specifically involved in the late, cytoplasmic stages of the maturation of the pre-40S particles, RSK regulates ribosome biogenesis at multiple levels. Upon activation of the MAPK pathway, RSK activates Pol I transcription in the nucleoli and promotes translation of mRNAs encoding ribosomal proteins and AMFs. In addition, our bioinformatic screen identified several AMFs at different stages of the maturation pathway of both ribosomal subunits as potential targets of RSK. These considerations imply that RSK inhibition is expected to impact ribosome biogenesis at multiple levels (Pol I transcription, availability of RPs and AMFs, export of the pre-ribosomal particles, probably several maturation steps) whereas RIOK2 inactivation more specifically delays 18S-E processing in the cytoplasm. In terms of processing, RSK inhibition induces a significant accumulation of the 30S intermediate. This is further evidence that RSK regulates pre-rRNA processing at several stages. This phenotype might result, as recently described in yeast (Yerlikaya et al., 2016, MCB), from an inhibition of RPS6 phosphorylation which affects its early incorporation into pre-ribosomes, although this has not been demonstrated in human cells. This 30S precursor accumulation affects production of the downstream intermediates and we strongly believe that this precludes accumulation of 18S-E even if the activity of RIOK2 is affected. Given the broad implication of RSK at different stages of ribosome biogenesis, it is biologically relevant to observe that inactivation of RSK does not result in the same processing defects as inactivation of RIOK2.

      We nevertheless tried to make this point clearer in the present revised version I of the manuscript. We added in the supplementary material a diagram (Supplementary Fig. S1C) showing all the known and hypothetical targets of ERK and RSK in ribosome synthesis to provide the readers with a global view of the function of RSK in this process, and we refer to this figure in the introduction and results. In the introduction, we also place more emphasis on the multiple aspects of the regulation of ribosome synthesis by ERK and RSK (Page 4, Line 18).

      Concerning the phospho-mimetic mutant, it does slightly accumulate the 45S and 30S intermediates, unlike the non-phosphorylatable mutant, but this is not totally unexpected. RIOK2 is incorporated into pre-ribosomes in the nucleus, at a stage that remains unclear, and constitutive RIOK2 phosphorylation may interfere with this recruitment and affect processing at an earlier stage. This point has been addressed in the discussion of the revised version I of the manuscript (Page 18, Line 7).

      Are there similar results for RSK depletion/inhibition and RIOK2 release from the pre-40S and inability to import into the nucleus? If so, this could provide phenotypic consistency between these two proteins in the proposed pathway to further support the hypothesis.

      We performed the same experiments as reported in Figure 6C to try to demonstrate a cytoplasmic retention of RIOK2 after leptomycin B treatment upon ERK inhibition (PD treatment). We also performed IF and cell fractionation experiments upon PD treatment. In all cases, we failed to observe the expected result. We strongly believe that we are facing here the same problem as described above for the previous comment of Reviewer #2. ERK and thus RSK inhibition leads to accumulation of the early, nucleolar 30S intermediate, indicating that the processing pathway is significantly blocked at an early stage preceding formation of the pre-40S particles in which RIOK2 is recruited. This early blockage most likely explains why we do not see the same phenotypes. We discussed this comment in the discussion section of the revised version I of the manuscript (Page 18, Line 19).

      5. Mature levels of 18S rRNA are not altered in the RIOK2 mutant cell lines. This could be due to compensation in these mutant cell lines since RIOK2 is essential.

      We agree with Reviewer #2 that compensation mechanisms may operate to restore mature 18S rRNA levels despite RIOK2 mutation. On the other hand, although RIOK2 is indeed essential, we may expect that the point mutation of S483 only partially affects RIOK2 function and delays the maturation of pre-40S particles but not to a sufficient extent to impact the mature 18S rRNA levels. This has been observed by others (Montellese et al., 2017, NAR; Srivastava et al., 2010, MCB).

      We added this point in the discussion section of the revised version I of the manuscript (Page 19, Line 9).

      Please report the mature 18S rRNA levels upon shRNA depletion and RSK inhibitors to provide insight into if this pathway significantly alters mature 18S rRNAs as a mechanism for the altered translation and proliferation observed.

      We will probe the levels of the mature 18S and 28S rRNAs in these experiments and the results will be included in Figure 1 of the revised version II of the manuscript.

      Minor Comments:

      1. Figure 1A lower: The authors use an RSK inhibitor LJH685, that does not inhibit RSK phosphorylation S380. Therefore, another verification of RSK inhibition must be used besides RSK-pS380 abundance as for PD184352 inhibition. Please validate the usage of this RSK inhibitor in the experiments by inclusion of quantification of a direct downstream substrate of RSK, such as YB1-pS102 quantification.

      We agree with Reviewer #2. We have probed the membrane with anti-RPS6 and anti-phospho-RPS6 antibodies to show the effect of LJH treatment on RPS6 phosphorylation. These data have been added to Figure 1A in the revised version I of the manuscript and the text has been updated (Page 6, Line 16).

      2. Page 7, Lines 8-12: The authors state that RSK knockdown led to increases in the 45S, while the LJH685 treatment led to no changes in 45S levels due to differences in growth conditions. Please elaborate more on how growth conditions would alter 45S pre-rRNA levels. It would be expected that stimulation of the MAPK pathway would increase pre-rRNA transcription compared to steady state growth conditions. However, pre-rRNA processing northern blots are only measuring steady state levels of the precursors. Thus, an rDNA transcription assay would need to be completed to evaluate these differences.

      We do observe that PMA treatment of starved cells induces an increase in 45S precursor levels, consistent with an increase in transcription but we agree that northern blot experiments measure the steady-state levels of the intermediates.

      To address this comment, we propose to perform short pulse labelings with ortho-phosphate to assess synthesis of the 45S precursor independently of its processing in the different conditions. These data will be included in the revised version II of the manuscript.

      3. Figure 2C: Please quantify these results to properly evaluate the role of these two phosphorylation sites in MAPK signaling.

      We will repeat these experiments and quantify the results in the new version of Figure 2C.

      4. Please include the RIOK2 pS483 antibody generation methodology used in this study.

      We added this information in the Materials and Methods section of the revised version I of the manuscript (Page 21, Line 22).

      5. In vitro kinase assay methods: Is the recombinant RSK1 the human version of the protein? Please clarify in methods.

      Human recombinant RSK1 was purchased from SignalChem. This information has been added in the revised version I of the manuscript (Page 30, Line 5).

      6. Figure 4B: Please include statistical analysis of the puromycin incorporation assay.

      We performed a statistical analysis of this assay out of 3 replicates. This analysis has been included in the present revised version I of the manuscript (Figure 4B).

      7. Page 13, Line 18: Please explain why RIOK2 co-IP with NOB1 is important.

      We added this explanation in the result section of the revised version I of the manuscript (Page 14, Line 3).

      8. In vitro dissociation assay: There is no control for pulldown of entire pre-40S particles and not just NOB1 protein. Thus, it is unclear if RIOK2 is dissociating from NOB1 or entire pre-40S particles. Please reference previous literature of the methodology of this experiment if applicable. Additionally, please include controls, such as western blotting of ribosomal proteins or northern blotting of rRNA in the pulldown fraction used.

      We agree with Reviewer #2. We have probed the membranes with antibodies detecting LTV1 and ribosomal protein RPS7 to show that the entire pre-40S particle is indeed pulled down. These additional data have been added in Figure 6A of the revised version I of the manuscript and the text has been amended accordingly (Page 14, Line 20).

      9. Page 16, Lines 10-12: The authors state "RSK facilitates the release of RIOK2 and other AMFs", however the only other AMF in this study was NOB1. Please reword appropriately that most likely facilitates release of RIOK2 and other AMFs in a RIOK2 dependent or independent manner if it also phosphorylates other AMFs which possess the motif.

      We agree with Reviewer #2 and we changed the text accordingly (Page 16, Line 11) but we did not introduce the hypothesis that RIOK2 may target directly other AMFs of late pre-40S particles which possess the motif because our in silico screen did not identify consensus RXRXXS/T motifs in any of these factors.
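
      To make the motif criterion concrete, a scan for the consensus RXRXXS/T pattern can be sketched as below. This is only an illustration of how such a pattern can be searched for and is not the in silico screen used in the study; the function name and the toy sequence are invented, and real protein sequences would have to be supplied by the reader.

```python
import re

# Consensus RXRXXS/T: Arg, any residue, Arg, any two residues, then Ser or Thr.
# The lookahead lets overlapping occurrences be reported as separate hits.
RSK_MOTIF = re.compile(r"(?=(R.R..[ST]))")

def find_motif_sites(sequence):
    """Return (1-based position of the S/T residue, matched 6-mer) per hit."""
    return [(m.start() + 6, m.group(1))
            for m in RSK_MOTIF.finditer(sequence.upper())]

# Invented toy sequence, not a real protein.
toy_sequence = "MARKRARSLSAAGRGRAATPK"
sites = find_motif_sites(toy_sequence)
```

      A match to the consensus only nominates a candidate site; as in the manuscript, phosphorylation still has to be confirmed experimentally.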

      Reviewer #2 (Significance (Required)):

      This manuscript is significant due to the lack of mechanistic connection of cellular signaling pathways to pre-rRNA processing. There have been, for the most part, no mechanistic connection of signaling pathways to pre-rRNA processing regulation and none for direct targets of MAPK signaling (Reviewed in Gaviraghi et al 2019). They provide the groundwork for analysis of MAPK signaling in regulation of an assembly factor and inclusion of their motif analysis could provide RSK signaling targets' regulation of specific steps of ribosome biogenesis that remain to be elucidated.

      Although the research delves into a specific mechanism, its audience could be far reaching as it is in the ribosome biogenesis field and MAPK signaling, which have broad implications in cancer and developmental diseases.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors report that inhibition of MAPK signaling via RSK is associated with modest alterations in the relative abundance of human pre-rRNA species, that are most marked for 30S but also visible for 21S - although not clearly shown for 18S-E.

      RIOK2 has two closely spaced sites predicted as RSK targets, one of which was confirmed to be MAPK sensitive and shown to be an RSK substrate in vitro. Substitution of Ser483 with Ala was associated with reduced growth and 18S-E accumulation, consistent with impaired NOB1 cleavage activity. RIOK2-S483A also showed greater pre-ribosome association in vivo and, consistent with this, more stable association in vitro and increased cytoplasmic residence. These effects are clear, although the data do not directly demonstrate their linkage to loss of RSK phosphorylation.

      The mutations were apparently generated directly in the genome of haploid cells, potentially raising concerns that the introduction of a deleterious mutation might have been accompanied by compensatory mutations elsewhere. However, three cell lines gave similar results, mitigating this concern.

      Specific comments:

      1. To help the reader, the authors should directly discuss why they think the data on MAPK inhibition did not reveal a clearer pre-18S cleavage phenotype, as would have been expected for loss of RIOK2 activity.

      This comment is similar to major comment #4 of Reviewer #2.

      Please refer to the above response.

      2. Fig. S3: The degree of RSK depletion with the siRNAs appears very modest, as are the effects on RIOK2-P. Moreover, the double depletion is not clearly better than single depletions. These data should probably be supported by quantitation or withdrawn.

      We agree with Reviewer #3 that the effects shown in this figure are modest, but we originally chose to show these data because they further supported the role of RSK in RIOK2 phosphorylation at S483, complementing Figure 3.

      We have withdrawn this figure from the present revised version I of the manuscript.

      3. Fig. 5D: For 18S-E recovery with RIOK2, is the ratio adjusted for the increase in 18S-E abundance in the mutant, i.e. is recovery increased when adjusted for the increased pre-rRNA abundance?

      In these experiments, the tagged versions of RIOK2 WT and S483A have been expressed ectopically from plasmids in cells expressing the endogenous wild-type protein. RIOK2 S483A does not behave as a dominant negative mutant in these conditions and does not induce 18S-E accumulation, as shown in the northern blot analysis of the 18S-E levels in the cell lysates (lower panel). This information is indicated in the revised version I of the manuscript (Page 13, Line 26).

      Reviewer #3 (Significance (Required)):

      Overall, the analyses on the phenotype of RIOK2-S483A, and the demonstration that this site is an RSK target, appear convincing.

      Caveats are

      1) the phenotype seen on inhibition of RSK, would not have implicated RIOK2 as the obvious candidate for the factor responsible for the observed processing defects;

      We agree with this comment, which has also been raised by Reviewer #2 (Major comment 4). We provide several lines of evidence in the manuscript that RSK phosphorylates RIOK2 on S483 in vivo and in vitro (Figure 3). However, as explained above in response to Reviewer #2, we cannot correlate the in vivo phenotypes resulting from RSK or RIOK2 inactivation for biological reasons. As mentioned in the introduction, RSK regulates multiple substrates at different stages of ribosome biogenesis (translation of RPs and AMFs, Pol I transcription, pre-ribosome maturation and export), whereas RIOK2 is specifically implicated in the cytoplasmic maturation of pre-40S particles. Inactivation of RSK is therefore expected to induce pleiotropic defects in ribosome biogenesis, and in particular early defects (reduced Pol I transcription, 30S precursor accumulation) that preclude observation of the expected phenotype linked to RIOK2 inactivation, i.e. 18S-E accumulation.

      We nevertheless tried to clarify this point as described in the response to Reviewer #2, major comment 4.

      2) the RIOK2-S483A phenotype is not demonstrated to be RSK dependent. This raises the possibility that, although RSK can phosphorylate S483, the effects of the mutation are not due to the loss of this modification.

      As mentioned by Reviewer #3, our data show that RSK can phosphorylate RIOK2 S483 in vitro and in vivo (Figure 3). We believe that Figure 4C strongly suggests that the accumulation of the 18S-E in cells expressing the RIOK2 S483A mutant is due to the loss of S483 phosphorylation, since mutation of S483 to an aspartic acid (S483D), generally considered as a mutation mimicking a phosphorylated serine, does not affect 18S-E maturation. However, although our manuscript provides many lines of evidence identifying RSK as the kinase responsible for RIOK2 phosphorylation at S483, we cannot formally exclude that other AGC kinases involved in growth and proliferation, such as S6K or Akt, may also be involved redundantly or alternatively. Our data presented in Figure 3A, showing that treatment of cells with the RSK inhibitor LJH685 decreases RIOK2 phosphorylation at S483, support a specific role of RSK.

      We developed this point in the discussion section (Page 18, from Line 25).

      With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized as a significant topic.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      There have been mechanistic connections of various signaling pathways to the regulation of ribosome biogenesis steps including rDNA transcription by RNA polymerase I and III, ribosomal protein transcription, and differential mRNA translation efficiency. However, a mechanistic connection of signaling pathways to the pre-rRNA processing and maturation steps of ribosome biogenesis is lacking. The authors set out to provide a specific example of a direct target of MAPK signaling, RSK, that regulates pre-rRNA maturation through the phosphorylation of a ribosome assembly factor (RIOK2), providing for the first time mechanistic insight into MAPK regulation of pre-rRNA maturation.

      The authors observe slight pre-rRNA processing defects upon the use of RSK inhibitors and RSK depletion. They identified several candidate ribosome assembly and modification factors containing the canonical RSK substrate motif, including the RIOK2 kinase. Phosphorylation at this motif was verified to be specifically phosphorylated by RSK1 and 2 isoforms in cells and in an in-vitro kinase assay. The authors produced RIOK2 knock-in eHAP1 cell lines expressing non-phosphorylatable or phosphomimetic versions of RIOK2, observing slowed cellular proliferation, decreases in global translation, slight pre-rRNA processing abnormalities, but not changes in overall mature 18S rRNA levels. More specifically, the authors defined the inability of RIOK2 to be phosphorylated leads to defects in RIOK2 dissociation from the pre-40S ribosomal subunit in an in-vitro assay, and inability for it to be recycled for reuse in pre-ribosome export from the nucleus to the cytoplasm by immunofluorescence.

      Overall, the authors provide an interesting mechanism of MAPK regulation of a ribosome assembly factor RIOK2. However, they fail to provide the necessary reproducibility, controls, quantification, and consistent results between experiments to support their hypotheses.

      Major Comments:

      1. The northern blots reported throughout the manuscript are lacking proper reproducibility and quantification. First, the northern blots are lacking a loading control, which is necessary to report fold changes that are being measured across treatments. Please include a proper loading control (i.e. 7SL or U6 RNAs). Additionally, more rigorous analysis of the pre-rRNA precursor levels through ratio analysis of multiple precursors (RAMP) (Wang et al 2014) can be completed to provide a clearer depiction on which precursor(s) are accumulating. It is unclear for the Figure 1 northern blots if there were replicates completed and what the error bars represent in Figure 1B. Please report replicates, so that statistical analysis can be completed on the differences in precursor relative abundance. This need is emphasized by the small changes observed in pre-rRNA levels (less than 2 fold) between conditions.

      2. The western blots reported throughout the manuscript are lacking proper reproducibility and quantification. For example, the western blots validating RSK1 and RSK2 depletion in Figure 1C lack a proper loading control. Additionally, it is unclear if there are replicates completed and there is lack of statistical analysis to determine if the changes are significant. Please include loading controls, replicates, and quantification of the western blots throughout the manuscript.

      3. Please report the full bioinformatic analysis of the RSK substrate motif search among human AMFs including other AMFs found in this search. A sorted list format would be valuable for the reader to understand other potential RSK substrates involved in ribosome biogenesis.

      4. The authors report that RSK inhibition/depletion leads to accumulation of the 30S pre-rRNA, yet mutation of its target site on RIOK2 or RIOK2 depletion leads to an accumulation of the 18S-E pre-rRNA. Additionally, the phosphomimic mutation of RIOK2 leads to an accumulation of 30S, the opposite of the expected result. Please elaborate on this discrepancy in processing defects observed across experiments. Are there similar results for RSK depletion/inhibition and RIOK2 release from the pre-40S and inability to import into the nucleus? If so, this could provide phenotypic consistency between these two proteins in the proposed pathway to further support the hypothesis.

      5. Mature levels of 18S rRNA are not altered in the RIOK2 mutant cell lines. This could be due to compensation in these mutant cell lines since RIOK2 is essential. Please report the mature 18S rRNA levels upon shRNA depletion and RSK inhibitors to provide insight into if this pathway significantly alters mature 18S rRNAs as a mechanism for the altered translation and proliferation observed.

      Minor Comments:

      1. Figure 1A lower: The authors use an RSK inhibitor LJH685, that does not inhibit RSK phosphorylation S380. Therefore, another verification of RSK inhibition must be used besides RSK-pS380 abundance as for PD184352 inhibition. Please validate the usage of this RSK inhibitor in the experiments by inclusion of quantification of a direct downstream substrate of RSK, such as YB1-pS102 quantification.

      2. Page 7, Lines 8-12: The authors state that RSK knockdown led to increases in the 45S, while the LJH685 treatment led to no changes in 45S levels due to differences in growth conditions. Please elaborate more on how growth conditions would alter 45S pre-rRNA levels. It would be expected that stimulation of the MAPK pathway would increase pre-rRNA transcription compared to steady state growth conditions. However, pre-rRNA processing northern blots are only measuring steady state levels of the precursors. Thus, an rDNA transcription assay would need to be completed to evaluate these differences.

      3. Figure 2C: Please quantify these results to properly evaluate the role of these two phosphorylation sites in MAPK signaling.

      4. Please include the RIOK2 pS483 antibody generation methodology used in this study.

      5. In vitro kinase assay methods: Is the recombinant RSK1 the human version of the protein? Please clarify in methods.

      6. Figure 4B: Please include statistical analysis of the puromycin incorporation assay.

      7. Page 13, Line 18: Please explain why RIOK2 co-IP with NOB1 is important.

      8. In vitro dissociation assay: There is no control for pulldown of entire pre-40S particles and not just NOB1 protein. Thus, it is unclear if RIOK2 is dissociating from NOB1 or entire pre-40S particles. Please reference previous literature of the methodology of this experiment if applicable. Additionally, please include controls, such as western blotting of ribosomal proteins or northern blotting of rRNA in the pulldown fraction used.

      9. Page 16, Lines 10-12: The authors state "RSK facilitates the release of RIOK2 and other AMFs", however the only other AMF in this study was NOB1. Please reword appropriately that most likely facilitates release of RIOK2 and other AMFs in a RIOK2 dependent or independent manner if it also phosphorylates other AMFs which possess the motif.

      Significance:

      This manuscript is significant due to the lack of mechanistic connection of cellular signaling pathways to pre-rRNA processing. There has been, for the most part, no mechanistic connection of signaling pathways to pre-rRNA processing regulation and none for direct targets of MAPK signaling (Reviewed in Gaviraghi et al 2019). The authors provide the groundwork for analysing MAPK signaling in the regulation of an assembly factor, and their motif analysis could identify further RSK signaling targets that regulate specific steps of ribosome biogenesis that remain to be elucidated.

      Although the research delves into a specific mechanism, its audience could be far reaching as it spans the ribosome biogenesis and MAPK signaling fields, which have broad implications in cancer and developmental diseases.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The authors report that inhibition of MAPK signaling via RSK is associated with modest alterations in the relative abundance of human pre-rRNA species, that are most marked for 30S but also visible for 21S - although not clearly shown for 18S-E.

      RIOK2 has two closely spaced sites predicted as RSK targets, one of which was confirmed to be MAPK sensitive and shown to be an RSK substrate in vitro. Substitution of Ser483 with Ala was associated with reduced growth and 18S-E accumulation, consistent with impaired NOB1 cleavage activity. RIOK2-S483A also showed greater pre-ribosome association in vivo and, consistent with this, more stable association in vitro and increased cytoplasmic residence. These effects are clear, although the data do not directly demonstrate their linkage to loss of RSK phosphorylation.

      The mutations were apparently generated directly in the genome of haploid cells, potentially raising concerns that the introduction of a deleterious mutation might have been accompanied by compensatory mutations elsewhere. However, three cell lines gave similar results, mitigating this concern.

      Specific comments:

      1. To help the reader, the authors should directly discuss why they think the data on MAPK inhibition did not reveal a clearer pre-18S cleavage phenotype, as would have been expected for loss of RIOK2 activity.

      2. Fig. S3: The degree of RSK depletion with the siRNAs appears very modest, as are the effects on RIOK2-P. Moreover, the double depletion is not clearly better than single depletions. These data should probably be supported by quantitation or withdrawn.

      3. Fig. 5D: For 18S-E recovery with RIOK2, is the ratio adjusted for the increase in 18S-E abundance in the mutant, i.e., is recovery increased when adjusted for the increased pre-rRNA abundance?

      Significance

      Overall, the analyses on the phenotype of RIOK2-S483A, and the demonstration that this site is an RSK target, appear convincing.

      Caveats are

      1) the phenotype seen on inhibition of RSK would not have implicated RIOK2 as the obvious candidate for the factor responsible for the observed processing defects;

      2) the RIOK2-S483A phenotype is not demonstrated to be RSK dependent. This raises the possibility that, although RSK can phosphorylate S483, the effects of the mutation are not due to the loss of this modification.

      With these provisos, the work is technically good and will be of considerable interest to the field. The post-transcriptional regulation of ribosome synthesis is increasingly recognized as a significant topic.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript addresses an important topic, the posttranscriptional maturation of ribosomes. This topic is inherently interesting because we normally think of ribosome biogenesis as a sequential series of steps that automatically proceeds and cannot be "accelerated" in physiological conditions, but only "delayed" in the presence of genetic mutations. In short, the manuscript proposes that RIOK2 phosphorylation by the action of RSK, below the Ras/MAPK pathway promotes the synthesis of the human small ribosomal subunit.

      I honestly admit that I have some difficulties in reviewing this manuscript. The quality of the presented data is, in general, good. However, overall I find the whole manuscript preliminary and I am not much convinced of the conclusions. Several aspects are superficially analyzed. In short, I think that most of the conclusions are not fully supported by the data because shortcuts are present. The aspects that I found problematic are listed below.

      Biological issue

      1. The authors claim that the effects of the inhibition of the maturation of ribosomes by acting on a pathway upstream of RIOK2 are limited to the 40S subunit. This is far from being a trivial point, for the following reason. RIOK2 is known to affect the maturation of 40S ribosomes. Hence, the fact that using an upstream inhibitor of the MAPK pathway such as PD does not inhibit 60S processing in reality would argue against a biologically relevant control in ribosome maturation (of the MAPK pathway). Have the authors considered this? In a way, also, given the fact that the mutants confirm a role in 18S final maturation, it is a bit complex to put all the data in a clear biological context.

      A number of specific issues will be concisely described.

      Manuscript very well written. Data do not always support the strong conclusions. Low magnitude of the observed effects.

      In the introduction the authors make a general claim that ribosome biogenesis is one of the most energetically demanding cellular activities. This statement has lingered in the literature for 15 years but in reality it has never been formally proved for mammalian cells, and certainly not for HEK293 cells. The original statement, to my knowledge, can be traced to some obscure statement referring to the yeast case and then repeated as a truth. In conclusion, besides being a very banal observation, it should be referenced.

      Growth factors, energy status are not cues but are proteins or metabolites (introduction). Authors write about mTOR without making statements on mTORC1/2. This is very obsolete. Also I am not sure that the choice of Geyer et al., 1982, and subsequent papers makes much sense. At the very minimum TOP mRNA concepts and mTORC1 must be defined.

      The authors claim that their work fills a major gap between known functions of MAPK and cytoplasmic translation. I would not be so sure about it.

      Results. Authors start with a major mistake, i.e. that PMA selectively stimulates the MAPK pathway. Perhaps it stimulates, certainly it does not do it selectively.

      RIOK2 phosphosites are first found by bioinformatics analysis. It should be noted that the predicted phosphosite (S483) is found only in a limited set of datasets from MS databases. The actual importance of this site would not emerge from unbiased studies. Also, there are many other phosphosites that were not analyzed in this study.

      Throughout the paper the authors use the word strongly, significantly, but the actual effects seem in general quite marginal.

      Discussion. The authors claim that they provide solid evidence on MAPK signalling to ribosome maturation. At the very best this is circumstantial evidence for the 40S maturation.

      Figure 1. Unclear why LJH should increase P-ERK. General lack of quantitation (sd, replicates, bars). Experiment done only on a single cell line in a single experimental setup. Very different effects on 21S by LJH, PMA and siRNA for RIOK2. Overall the message given by the authors is to me mysterious.

      Figure 2. Several red flags. For instance in 2C the loaded levels of RIOK2-HA are clearly less than the ones of the other genotypes, hence the conclusion on P-RIOK2 is not convincing. Staining with anti-P RIOK2 lacks controls: how can we be sure that the signal is due to the phosphate? Phosphatase treatment? Why does FBS not lead to ERK staining in HEK293? There are plenty of growth factors in FBS that should lead to ERK phosphorylation. I do not understand this experiment.

      Figure 3. In vitro phosphorylation, if I understood, it relies on a truncated version of RIOK2. Why? Is the folding of the full length protein not permissive to in vitro phosphorylation? HA-RSK3 is less?

      Figure 4. Immunofluorescence is low mag, difficult to understand. I really like the experiments with RIOK2 mutants, however I wonder what about protein levels after the knock-in? Given the 18S phenotype overlap between the phenotype of the RIOK2 loss of function with the S483A, testing protein level becomes of the utmost importance.

      Figure 5. Low quality IFL. Hard to think that histogram quantitation of nuclear versus cytoplasmic staining are reliable in the absence of fractionation, better quantitation, experiment done in other cell lines and so on. However, very beautiful Fig. 5E perhaps the best of the paper shows also mobility shift driven by S483, thus supporting posttranslational modifications.

      Fig. 6. IFL studies are really impossible to interpret. The effects on RIOK2 release (this figure) and 18S maturation (Fig. 5) are very clear and of great quality. Overall conclusions. The manuscript tends to overinflate the meaning of several experiments. What to me is very clear and interesting is that the authors provide clear evidence that S483A mutants have a defect in 40S maturation. Whether this is due to MAPK signalling is only circumstantial. I would suggest building on the strong findings and eliminating ambiguous data.

      Significance

      The paper deals with an important topic, namely whether a regulation of ribosome maturation exists, and how it is mechanistically regulated. In this context, the analysis of the ERK pathway is highly needed considering that most works deal with effects of the PI3K-mTOR pathway, and the parallel, yet important RAS-ERK pathway, is less understood. As a final note, we should consider that S6K downstream of mTOR, and ribosomal S6K, downstream of ERK have been considered to share some substrates.

      The manuscript is interesting, but several statements given by the authors are rather superficial. An example, listed in the previous section, relates to the linguistic usage of mTOR kinase, instead of detailing whether we are dealing with mTORC1 or mTORC2. A second gross mistake is the definition of PMA as a stimulator of the ERK pathway. Although this is certainly true, it is historically not correct, as seminal papers by the group of Parker define this drug as a stimulator of conventional PKC kinases. In short, this paper is a step back in knowledge from the perspective of the literature context.

      Everyone interested in the crosstalk between ribosome maturation and signaling pathways will certainly read this manuscript.

      My expertise is within the ribosome biology and signalling field.

  5. Oct 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Please note that the authors have provided a formatted PDF version of this rebuttal, including additional figures and references, via the Open Science Framework: https://osf.io/5acqp/

      Reviewer #1

      This is an interesting and thorough study characterising human iPSCs with a heterozygous or homozygous mutation in the PI3K pathway that leads to its hyperactivation. They prove that the increased stemness results from enhanced autocrine responsiveness to TGF signalling pathway. The main conclusions are well supported by the presented data. Cutting edge tools and bioinformatic analysis are adequately applied. I have only one important point:

      1) Western blot based validation of TGF pathway activation in wt and mutant iPSCs will be helpful to strengthen the results based on bioinformatic data.

      AUTHORS’ RESPONSE: We thank the Reviewer for the positive evaluation of our work.

      Functional validation of the signalling hypothesis is indeed important, and we did in fact already present supportive data. Current evidence suggests that SMAD2 is the main transcription factor mediating actions of the TGFb/NODAL pathway in an early developmental context [1,2], and we have shown increased phosphorylation of SMAD2 (S465/S467) in PIK3CAH1047R/H1047R iPSCs using RPPA in the two datasets shown in Fig.2.

      We have attempted to demonstrate increased NODAL protein directly in PIK3CAH1047R/H1047R cells, but have been unsuccessful due to poor signal on immunoblotting. We thus opted for functional testing of our hypothesis using the experiment presented in Fig. 5, wherein TGFb (a surrogate for NODAL) is removed from the culture medium. Human iPSCs depend strictly on TGFb/NODAL for maintenance of NANOG expression and thus pluripotency [3,4]. Upon exclusion of TGFb/NODAL from the culture medium of normal human iPSCs, the early responses (prior to overt differentiation) are expected to be: (A) decreased NODAL expression, due to well-established autoregulation [2], then (B) a decrease in NANOG and ultimately POU5F1 (OCT3/4) mRNA levels (see also Introduction, lines 80-90). The evidence in Fig. 5 that PIK3CAH1047R/H1047R fail to exhibit these responses upon exogenous TGFb/NODAL removal supports the notion that these cells autonomously sustain TGFb/NODAL signalling.

      For improved clarity, we have also added the following information to the revised manuscript:

      lines 202-205: “This is consistent with strong NODAL mRNA upregulation and increased pSMAD2 (S465/S467) in PIK3CAH1047R/H1047R iPSCs in the current study (Dataset S2 and RPPA data in Fig. 2, respectively), and with prior evidence of activation of the NODAL/TGFb pathway in homozygous PIK3CAH1047R iPSCs.”

      Reviewer #2

      In this manuscript, Madsen et al have investigated the role of heterozygous versus homozygous PIK3CAH1047R gain-of-function mutation at maintaining stemness of induced pluripotent stem cells (iPSCs). The authors have performed high-depth RNAseq, proteomic, and RPPA analyses to show that biallelic PIK3CA alterations induce stronger activation of the PI3K signaling axis, compared to monoallelic mutations. The authors claim that a higher PI3K signaling dose activates the NODAL/TGF-b pathway, which in turn supports stemness in an autocrine fashion. These are important findings, however, the manuscript and its conclusions can be improved.

      AUTHORS’ RESPONSE: We thank the Reviewer for acknowledging the importance of the work and for their constructive suggestions for improvements.

      The authors have described the role of PIK3CAH1047R gain-of-function mutation in cancer and overgrowth syndromes. However, cancer associated somatic mutations in PIK3CA are mostly heterozygous. Similarly, PIK3CA related overgrowth syndromes (PROS) are caused by post-zygotic mosaic PIK3CA activating mutation. As such, the relevance of homozygous PIK3CA alterations to these pathological conditions is unclear. The authors should elaborate on the biological implications of their findings.

      AUTHORS’ RESPONSE: We disagree with the Reviewer’s comment which implies that homozygous PIK3CA mutations are not relevant to many cancers. In our previous work [5], we provided evidence that many human cancers harbour multiple PIK3CA mutant alleles. Specifically, among cancers with a unique PIK3CA mutation, approximately 50% exhibit multiple copies according to allele copy number analysis. We further demonstrated that a substantial proportion of cancers have multiple different PIK3CA variants or additional oncogenic ‘hits’ within the pathway. These findings have been supported by other recent high-profile papers [6–8]. Such multiple alterations increase activity of the PI3K pathway beyond the level seen with heterozygosity alone [5,6]. This substantial body of literature renders our PIK3CAH1047R iPSC model system highly relevant for studying disease-relevant, dose-dependent oncogenic PIK3CA activation.

      The Reviewer is correct, however, that PROS is caused by postzygotic heterozygous PIK3CA mutations almost exclusively. Observations in homozygous cells are therefore not directly relevant to the pathogenesis of PROS. On the other hand, the heterozygous cells are closely relevant, being human, carefully matched with isogenic controls, and unperturbed by further manipulations such as artificial immortalisation. Our prior studies demonstrated no clear phenotypes in heterozygous cells in the iPSC differentiation paradigm, despite the rock-solid causal nature of heterozygous mutations in PROS. This negative finding, surprising given the dramatic PROS phenotypes, is very important in understanding how best to create disease-relevant PROS models. One intent of the current study was to increase the sensitivity of our transcriptomic analysis, and to combine this with proteomic studies to determine if heterozygous cells really do not exhibit a phenotype. We now show that there are indeed faint echoes in heterozygous cells of the dramatic changes in homozygous cells. We believe that the human growth phenotype is a summative consequence of such small differences in growth behaviours sustained over months and years, highlighting how subtle differences in signalling can lead to dramatic human growth consequences across the lifecourse. Similar observations were also recently made following systematic analyses of oncogenic RAS mutations [9]. The new information we present about heterozygous PIK3CAH1047R cells, while much less “showy” than the cancer-relevant behaviours of homozygous cells, we thus contend is very important for understanding of the PROS phenotype and its experimental modelling. To emphasise this point, we have added the following statements to the abstract and discussion, respectively.

      • lines 56-57: “This work illustrates the importance of allele dosage and expression when artificial systems are used to model human genetic disease caused by activating PIK3CA mutations.”
      • lines 104-106: “We discuss the implications of our findings for understanding and modelling developmental disorders and cancers driven by genetic PI3K activation.”
      • lines 333-340: “Finally, our observations are important for future studies seeking to model human PIK3CA-related diseases. The modest changes observed in heterozygous PIK3CAH1047R cells, in sharp contrast to the radical transcriptional alterations in homozygous cells, emphasise the importance of careful allele dose titration when artificial overexpression systems are used to model disorders caused by genetic PIK3CA activation. Our findings in heterozygous cells are also a reminder that very small effect sizes in cellular systems may summate and result in major human phenotypes over a life course. That such minor changes are found in a cellular study of a rare and severe disorder emphasises the challenges of modelling much more subtle disease susceptibility conferred by GWAS-detected genetic associations, where cellular effect sizes are likely to be smaller still.”

      The role of biallelic PIK3CA mutation is reminiscent of compound mutations in PIK3CA which have also been shown to increase PI3K signaling output. However, double PIK3CA mutations confer enhanced sensitivity to PI3K inhibition (Toska et al. Science 2019). Could the authors kindly speculate on this discrepancy?

      AUTHORS’ RESPONSE: We emphasise first that PIK3CAH1047R/H1047R cells do respond to BYL719 at the signalling level, as demonstrated previously [5] and in the manuscript (revised Figure S5; see also additional Western blot below). Our point is that the cells have undergone a switch to self-sustained stemness. That is, while PIK3CA activation was the driver of the initial change in cell state, the induced stemness phenotype is no longer reversed by removal of that trigger, with our data suggesting that this is now driven by self-sustained TGFb/NODAL signalling. This is in line with the role of this pathway in the maintenance of the pluripotent state. We speculate that this may be important in a cancer context where surviving stem cells may permit cancer persistence after toxic therapies, even if short term growth of tumours is reduced by agents such as PI3K inhibitors.

      Our data are not directly comparable to prior cellular data, for example in Vasan et al. [6], due to: (a) use of a different cell model system and (b) assessment of different functional responses. We would also sound some methodological notes of caution regarding some of the prior studies alluded to, as potentially confounding differences in growth rate in the cells studied were not corrected for. It is well-established that IC50 and Emax values depend on cell division rates, and failure to correct for this can result in artefactual correlations between genotype and drug sensitivity (see, e.g., Hafner et al. Nature Methods 2016: “Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs” [10]).
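
      The growth-rate confounder can be illustrated with a minimal sketch of the growth rate inhibition (GR) value described by Hafner et al. [10], GR = 2^(log2(x_treated/x_start) / log2(x_control/x_start)) - 1. The cell counts below are invented purely for illustration and do not come from this study or from any of the cited papers; the function names are hypothetical.

```python
import math


def relative_viability(x_treated, x_control):
    """Conventional endpoint readout: treated count relative to untreated control."""
    return x_treated / x_control


def gr_value(x_treated, x_control, x_start):
    """GR value: compares growth rates rather than endpoint counts.

    x_start is the cell count at the time of drug addition.
    """
    doublings_treated = math.log2(x_treated / x_start)
    doublings_control = math.log2(x_control / x_start)
    return 2 ** (doublings_treated / doublings_control) - 1


# Hypothetical data: the drug halves the number of divisions in both lines,
# but the "fast" line divides twice as often as the "slow" line.
fast = {"x_start": 1000, "x_control": 16000, "x_treated": 4000}
slow = {"x_start": 1000, "x_control": 4000, "x_treated": 2000}

for name, d in (("fast", fast), ("slow", slow)):
    rv = relative_viability(d["x_treated"], d["x_control"])
    gr = gr_value(d["x_treated"], d["x_control"], d["x_start"])
    print(f"{name}: relative viability = {rv:.2f}, GR = {gr:.3f}")
```

      In this toy case the endpoint ratio suggests that the faster-dividing line is more drug sensitive, whereas the GR values of the two lines are equal, which is the kind of artefact the correction is meant to remove.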

      Similarly, the p110 alpha specific inhibitor, alpelisib, is highly effective against PIK3CA-mutant ER+ breast cancer and PROS. As such, the clinical relevance of the insensitivity of homozygous PIK3CA mutation to PI3K inhibitors is unclear.

      AUTHORS’ RESPONSE: Efficacy of Alpelisib in PROS is currently supported only by unregistered observational studies, but is nevertheless striking. It is not relevant to our findings in homozygous cells, as the Reviewer has previously observed, however.

      As for cancer, in a randomised phase 3 trial that compared Alpelisib/BYL719 with fulvestrant to fulvestrant alone, the overall response (irrespective of PIK3CA mutant status) was indeed greater with the combination treatment (26.6 % vs 12.8 %), with a hazard ratio of 0.65 (95% CI, 0.5 to 0.85) in patients with PIK3CA-mutant cancers versus a hazard ratio of 0.85 (95% CI, 0.58 to 1.25) in those without a PIK3CA mutation [11]. This trial demonstrated the utility of additional PIK3CA mutant-centric stratification, but a substantial proportion of patients with PIK3CA-mutant tumours (>50%) did not benefit from the BYL719 and fulvestrant combination [11]. However, these observations are not directly relevant to this manuscript and are instead included in a separate manuscript focused on PI3K signalling and stemness in human breast cancers (preprint [12]).

      Figure 2: The authors have performed RPPA analysis in the presence of 100 nM BYL719. Alpelisib is commonly used at 1 µM concentration for in vitro experiments, and has a Cmax of ~5 µM. We suggest the authors perform western blot analysis to confirm the results of RPPA.

      AUTHORS’ RESPONSE: We carefully chose the optimal concentration of BYL719 to preserve inhibitor selectivity, and to avoid undue toxicity and confounding off-target effects, rather than copying the dose “commonly used”. The Cmax is not relevant to our use of BYL719 in the current study as a precise tool compound. We refer the Reviewer to the known pharmacological characteristics of this compound [13,14]. According to available evidence, it is only a selective PI3Kα inhibitor at concentrations 250 nM (Table below adapted from Ref. [13]; for formatted version, please see PDF version: https://osf.io/ecmhr/)

      | Enzyme | In vitro IC50 for NVP-BYL719 (nM) |
      | --- | --- |
      | PI3Kα | 4.6 +/- 0.4 |
      | PI3Kα-H1047R | 4.8 +/- 0.4 |
      | PI3Kβ | 1156 +/- 77 |
      | PI3Kδ | 290 +/- 180 |
      | PI3Kγ | 250 +/- 140 |
      | PI4Kβ | 571 +/- 42 |
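
      To give a rough feel for what these IC50 values imply at different working concentrations, the sketch below estimates fractional enzyme inhibition from the mean values in the table above. It assumes a simple one-site model with a Hill slope of 1, inhibition = c / (c + IC50); this model, and the idea that in vitro enzyme IC50 values carry over to cells, are assumptions made here for illustration and are not taken from Ref. [13]. The function names are hypothetical.

```python
# Mean in vitro IC50 values (nM) copied from the table above.
IC50_NM = {
    "PI3Kα": 4.6,
    "PI3Kα-H1047R": 4.8,
    "PI3Kβ": 1156,
    "PI3Kδ": 290,
    "PI3Kγ": 250,
    "PI4Kβ": 571,
}


def fractional_inhibition(conc_nm, ic50_nm):
    """Fraction of enzyme activity inhibited, assuming a Hill slope of 1."""
    return conc_nm / (conc_nm + ic50_nm)


def inhibition_profile(conc_nm):
    """Estimated fractional inhibition of each enzyme at one concentration."""
    return {
        enzyme: round(fractional_inhibition(conc_nm, ic50), 2)
        for enzyme, ic50 in IC50_NM.items()
    }


# Compare the doses discussed in the text: 100 nM, 250 nM and 1 µM.
for conc in (100, 250, 1000):
    print(f"{conc} nM:", inhibition_profile(conc))
```

      Under these assumptions, raising the dose adds little further PI3Kα inhibition while the estimated inhibition of the other isoforms rises steadily, which is the selectivity trade-off referred to above.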

      We have previously demonstrated (Fig. 2C in Ref. [5]) that 100 nM BYL719 is sufficient to restore pAKT (S473) levels in both heterozygous and homozygous PIK3CAH1047R to levels observed in WT cells. This is consistent with the RPPA data reported in the current work (Fig. 2B). Of note, while 500 nM BYL719 completely ablates pAKT irrespective of genotype, we previously noted substantial toxicity [5], precluding use of this or higher doses of BYL719 in our model system. This is in line with a recent Nature Cell Biology study by Yilmaz et al. ([15]) which demonstrated the essential growth-promoting role of the PI3K pathway in human pluripotent stem cells; Yilmaz et al. also demonstrate that compared to somatic cells (fibroblasts), human pluripotent stem cells suffer dramatic effects on growth/survival in response to Torin1/rapamycin [15], overall suggesting that this cell type is exquisitely sensitive to inhibition of the PI3K/AKT/mTOR pathway.

      In the present study we have also confirmed that 250 nM BYL719, used for Fig. 5 experiments, has worked as expected at the level of pAKT (S473) as shown in the below Western blot (see also revised Fig. S5; please access PDF version to view Western blot: https://osf.io/ecmhr/)

      Figures 3 and 4: The authors should expand their RNAseq analysis to demonstrate enrichment of stemness and TGFb signaling in homozygous mutant cells compared to heterozygous cells.

      AUTHORS’ RESPONSE: We thank the Reviewer for this suggestion. The unsupervised MDS plot (Fig. 1A) clearly demonstrates the overlap between wild-type and heterozygous cells, strongly suggesting functional concordance and consistent differences to homozygous counterparts. Indeed, the below count table illustrates that the majority of differentially expressed genes in homozygous versus wild-type cells are also differentially expressed in homozygous versus heterozygous cells, including the direction of the change (please access the PDF version for formatted table: https://osf.io/ecmhr/)

      | Comparison | Differentially expressed gene count |
      | --- | --- |
      | HOMvsWT | 5644 |
      | HOMvsHET | 5764 |
      | HOMvsWT AND HOMvsHET | 4825 (2300 upregulated; 2525 downregulated; 1 discordant) |
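
      A count table of this kind can be derived from two lists of differentially expressed genes as in the minimal sketch below. The gene names and log fold changes are placeholders invented for illustration, not values from our dataset, and the function name is hypothetical.

```python
def concordant_overlap(lfc_a, lfc_b):
    """Count genes shared by two comparisons and whether their direction agrees.

    lfc_a and lfc_b map gene name -> log fold change for genes that passed
    the differential expression threshold in each comparison.
    """
    shared = set(lfc_a) & set(lfc_b)
    up = {g for g in shared if lfc_a[g] > 0 and lfc_b[g] > 0}
    down = {g for g in shared if lfc_a[g] < 0 and lfc_b[g] < 0}
    discordant = shared - up - down
    return {
        "shared": len(shared),
        "up": len(up),
        "down": len(down),
        "discordant": len(discordant),
    }


# Placeholder inputs (hypothetical genes and values).
hom_vs_wt = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 0.8, "GENE_D": -0.6}
hom_vs_het = {"GENE_A": 1.9, "GENE_B": -1.1, "GENE_C": -0.7, "GENE_E": 1.2}

print(concordant_overlap(hom_vs_wt, hom_vs_het))
```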

      We have now performed additional fast gene set enrichment analyses (fgsea; shown below - please access PDF version to view figure: https://osf.io/ecmhr/) using the R package fgsea ([16]) and 14 of the Broad Institute’s 50 Hallmark Gene Set Collection [17], including manual addition of the PLURINET signature [18]. The 14 gene sets were chosen based on their relevance to answering the Reviewer’s question as well as their connection to PI3K signalling. Fold changes for all expressed genes were included in the analyses, without further thresholding in order to minimise bias.

      The results for homozygous vs wild-type comparisons are concordant with our upstream regulator analyses using IPA; as expected, TGFb signalling and PI3K signalling are among the top positively enriched gene sets (NES > 1) in the comparison between homozygous and heterozygous cells. Unsurprisingly, however, the strength of the enrichments is lower when comparing the two PIK3CAH1047R genotypes.

      We are not convinced that including these surplus data will add value to the manuscript and its main message; however, we will leave this decision to the discretion of the Editor (please also refer to our response to the subsequent question from Reviewer 2). Moreover, these data will remain visible in the publicly available rebuttal document.

      The authors should confirm the results of pathway analysis in vitro to show that homozygous PIK3CA mutation confers increased stemness compared to heterozygous mutation.

      AUTHORS’ RESPONSE: This was a key finding in our previous publication [5]. The aim of the current study was to interrogate this phenomenon further through high-depth transcriptomic/signalling analyses.

      Figure 5: Kindly provide direct evidence demonstrating that increased PIK3CA signaling output induces NODAL expression in this experimental setting.

      AUTHORS’ RESPONSE: We have consistently demonstrated increased NODAL mRNA expression (RNAseq data, Fig. S4 and Ref. [5]). Unfortunately, we have been unsuccessful in attempts to obtain good quality immunoblots for NODAL protein in PIK3CAH1047R/H1047R cells (as noted in response to Reviewer 1). We note, in fact, that such documentation of NODAL protein levels, while not unprecedented, is fairly rare.

      Also, please normalize gene expression data to WT cells so it is easy to visualize the changes in NODAL and NANOG expression in homozygous and heterozygous mutants compared to WT iPSCs.

      AUTHORS’ RESPONSE: It is arithmetically more precise to normalise to the highest expression (i.e. that of PIK3CAH1047R/H1047R cells) – thereby avoiding artificial inflation of fold-changes when normalising to very low levels of expression. Ultimately, the relative levels calculated – and the increased expression of NODAL in PIK3CAH1047R/H1047R cells – are identical visually. Only the entirely arbitrary units change. Thus we do not deem normalisation to WT to be necessary or to add value to the analysis.
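
      The statement that only the units change can be checked with a minimal sketch: dividing every value by a different reference rescales the axis but leaves the ratios between genotypes untouched. The expression values below are invented for illustration and are not measurements from this study; the function name is hypothetical.

```python
import math


def normalise(values, reference):
    """Express every value relative to the chosen reference sample."""
    ref = values[reference]
    return {sample: value / ref for sample, value in values.items()}


# Hypothetical expression levels in arbitrary units.
expression = {"WT": 2.0, "HET": 3.0, "HOM": 40.0}

to_wt = normalise(expression, "WT")    # WT becomes 1, HOM becomes 20
to_hom = normalise(expression, "HOM")  # HOM becomes 1, WT becomes 0.05

# The HOM/WT and HET/WT ratios are the same under either normalisation.
for sample in ("HET", "HOM"):
    ratio_wt_scale = to_wt[sample] / to_wt["WT"]
    ratio_hom_scale = to_hom[sample] / to_hom["WT"]
    print(sample, ratio_wt_scale, ratio_hom_scale)
    assert math.isclose(ratio_wt_scale, ratio_hom_scale)
```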

      Kindly quantify Fig. S5.

      AUTHORS’ RESPONSE: These brightfield micrographs were taken as part of routine practice to monitor cell health during maintenance and experimentation, and are suboptimal for direct quantitation due to uneven illumination background and lack of whole-well imaging. Nevertheless, we have now undertaken quantification as the Reviewer suggests, using individual images taken during independent experimental replicates. The results have been added to Fig. S5 and support our assertion that 250 nM BYL719 had a growth inhibitory effect in homozygous PIK3CAH1047R iPSCs. All raw images and associated data have been uploaded to the Open Science Framework (https://osf.io/hbf7x/). The following short method section detailing the image analysis algorithm has also been included in the revised supplementary material:

      “Colony size quantitation from light micrographs

      Routine cell culture light micrographs were acquired on an EVOS FL digital inverted microscope (AMF4300, Thermo Fisher Scientific) using the 4X or 10X objective (final magnification 40X and 100X, respectively). For quantitation, 4X images were used for colony segmentation with Definiens Developer XD software. Background was detected using a contrast threshold; for this each pixel was compared to those in the surrounding 24 pixels (i.e. a 5x5 pixel box), and pixels with low contrast (between -50 and +50) were classified as background. Remaining pixels were classified as colonies, and any holes (pixels that were not initially classified as being part of the colony due to low contrast) were filled. Edges of the resulting colonies were smoothened by shrinking and then growing the colonies by 2 pixels. Finally, colonies less than 2000 pixels in size were reclassified as background. The area of the resulting colonies could then be measured and averaged over each field of view.”
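
      The segmentation steps described above can be sketched in plain Python as follows. This is not the Definiens Developer XD implementation, whose internals are not described here; in particular, defining contrast as the pixel value minus the mean of the 24 surrounding pixels, clamping at image edges, and using 4-connectivity are assumptions made for this illustration. All function names are hypothetical and the test image is synthetic.

```python
from collections import deque


def local_contrast(img, r, c):
    """Pixel value minus the mean of the other 24 pixels in its 5x5 box (edges clamped)."""
    h, w = len(img), len(img[0])
    total = 0
    for dr in range(-2, 3):
        for dc in range(-2, 3):
            if dr or dc:
                total += img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
    return img[r][c] - total / 24


def components(mask, value):
    """4-connected components of cells equal to `value`, as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != value or seen[r][c]:
                continue
            comp, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == value:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(comp)
    return found


def fill_holes(mask):
    """Reclassify background regions that do not touch the image border as colony."""
    h, w = len(mask), len(mask[0])
    for comp in components(mask, False):
        if not any(y in (0, h - 1) or x in (0, w - 1) for y, x in comp):
            for y, x in comp:
                mask[y][x] = True
    return mask


def morph(mask, grow):
    """Grow (dilate) or shrink (erode) the colony mask by one pixel."""
    h, w = len(mask), len(mask[0])

    def window(y, x):
        cells = ((y, x), (y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1))
        return [mask[a][b] for a, b in cells if 0 <= a < h and 0 <= b < w]

    return [[any(window(y, x)) if grow else all(window(y, x)) for x in range(w)]
            for y in range(h)]


def segment_colonies(img, contrast_limit=50, smooth_px=2, min_area=2000):
    """Return the pixel area of each colony that survives all filtering steps."""
    h, w = len(img), len(img[0])
    # Low-contrast pixels (within +/- contrast_limit) are background.
    mask = [[abs(local_contrast(img, r, c)) > contrast_limit for c in range(w)]
            for r in range(h)]
    mask = fill_holes(mask)
    # Smooth edges: shrink by smooth_px, then grow by smooth_px.
    for grow in [False] * smooth_px + [True] * smooth_px:
        mask = morph(mask, grow)
    return [len(comp) for comp in components(mask, True) if len(comp) >= min_area]


# Synthetic 40x40 image: flat background (100) with a textured 20x20 "colony".
image = [[100] * 40 for _ in range(40)]
for r in range(10, 30):
    for c in range(10, 30):
        image[r][c] = 200 if (r + c) % 2 else 0

print(segment_colonies(image, min_area=100))
```

      On the synthetic image the textured block is recovered as a single colony with slightly rounded corners, and it is discarded when the default 2000-pixel size filter is applied.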

      Reviewer #3

      In this manuscript by Madsen et al., a comparison of the transcriptome and proteome in heterozygous and homozygous PIK3CAH1047R human pluripotent stem cells mutants is presented. The authors demonstrate marked alterations in expression at both the protein and RNA level of homozygous mutants compared to wildtype, while heterozygous lines exhibit only minor changes. Multiple analytical approaches are employed to investigate network alterations, leading the authors to suggest a TGFβ-mediated rewiring of key pluripotent genes to induce a state of sustained stemness. Madsen et al. conclude with a set of experiments to functionally implicate NODAL/TGFβ autocrine signalling in PIK3CAH1047R dose-dependent stemness. The key conclusions are not convincing. While the unbiased omics approach sets up this study well, the study suffers from a lack of convincing functional assays (cell biological assays) to test their model and tease apart a phenotype for the het cells. More robust functional experiments are required to support the finding the NODAL/TGFβ signalling mediates the self-sustained stemness, particularly because this is the major novel finding distinguished from the authors previous work.

      AUTHORS’ RESPONSE: We thank the Reviewer for their detailed critique. Our perspective on the robustness and novelty of our findings diverges from that of the Reviewer, however, as we elaborate below.

      While the authors present a comprehensive omics investigation into alterations between wild type, homozygous, and heterozygous mutants, the critical functional experiments are lacking. In Figure 5, the authors seek to support the role of TGFβ in mediated stemness in the homozygous mutants, however, are not able to directly deplete TGFβ due to technical limitations of the culture conditions. Consequentially, the experiments are primarily built on the use of NODAL withdrawal and stimulation. The data presented thus implicate NODAL in the stemness phenotype, but it's not obvious TGFβ is substantially involved, particularly considering the inhibitor subsequently employed also inhibits NODAL type 1 receptors.

      AUTHORS’ RESPONSE: NODAL and TGFb activate shared signalling pathways downstream of their respective receptors, and indeed they (as well as Activin) can be used interchangeably in stem cell culture, which is common practice [19–21]. Commercially available Essential 8/TeSR-E8 is supplemented with TGFb, not NODAL; therefore the factor we have removed is TGFb, prior to any controlled introduction of NODAL (based on strong upregulation of its mRNA in PIK3CAH1047R/H1047R). Any residual TGFb-like ligands will be contributed by Matrigel as outlined in the text (lines 247-251). It is well-established that “NODAL/TGFb signalling” denotes signalling through SMAD2/3/4 (as opposed to BMP signalling through SMAD1/5/8), and this is how we use the term throughout the manuscript. Accordingly, it is functional activation of the “NODAL/TGFb signalling pathway” that we investigate (see also response to Reviewer 1, p.1).

      In summary, we seek not to make a distinct point about TGFb, but rather refer to NODAL/TGFb signalling as a matter of biochemical correctness. For clarity, we now replace mentions of “TGFb signalling” with “NODAL/TGFb signalling” throughout the revised manuscript. We have also revised the legend for Figure 3 to make this clearer.

      Furthermore, there is a paucity of readouts for stemness. For example, a more convincing narrative would include additional expression markers of the core pluripotency network (e.g. OCT4, SOX2, etc.) as well as functional readouts (e.g. NODAL withdrawal and assessment of differentiation) after NODAL stimulation/depletion and comparing across genotypes. Overall, the primary conclusions of this work are not well-evidence by the presented data and the authors should consider additional functional experiments or reframing the narrative.

      AUTHORS’ RESPONSE: We chose the current strategy because we wanted to capture the earliest changes after depletion of NODAL/TGFb signalling, prior to any signalling rewiring triggered by differentiation. In fact, we believe that a strength of this study is our observation of differences in critical stemness markers in spite of the short time course. To aid non-expert readers we offered a primer on stemness genes and the rationale for the markers chosen in the existing introduction (lines 80-90).

      We have further assessed additional stemness and differentiation marker genes in two independent homozygous PIK3CAH1047R cell lines using a high-throughput pluripotent stem cell scorecard (Fig. S4). This replicates the effect on cell marker genes documented by RT-qPCR in Fig.5, while also showing additional reductions in genes that were upregulated in homozygous PIK3CAH1047R cells (MYC, GDF3, FGF4) and which have previously been shown to be highly expressed in pluripotent stem cells (we have now added this additional clarification to the legend of Fig. S4) [22]. Despite the short term treatment, these data also show that no other treatment but SB431542 is capable of triggering expression of early neuroectoderm markers (CDH9, MAP2 and PAPLN) [23], prior to overt morphological changes in the cultures (Fig. S5; higher resolution images are also available via The Open Science Framework: https://osf.io/hbf7x/). Neuroectodermal gene expression is expected upon inhibition of TGFb signalling in human pluripotent stem cells [24,25].

      A key conclusion of this study is there is a dose-dependent stemness phenotype. As this is not explicitly defined, to this reader, it would imply a graded response between wild type, heterozygotes, and homozygotes in the phenotypic and molecular characteristics. However, as is noted particularly in the omics components of the manuscript, there is in fact "near-binary" alteration in the assayed characteristics. Again, this should be qualified more explicitly, but it is more consistent with the data, which suggests the heterozygotes behave very similarly to the wild types, while homozygotes have substantial alterations. I would suggest the authors consider renaming their descriptions, removing "near-binary" and "dose-dependent" to something like "dose-threshold." This suggests after X threshold of oncogenic PI3K signalling, substantial alterations occur; under this threshold (e.g. hets), changes are marginal. In the event however that there may be a more "dose-dependent" effect, I would expect the transcriptomic and proteomic changes observed in the heterozygous cell lines should be seen in the homozygous cell lines (of which they are likely in greater in magnitude in addition to other changes).

      AUTHORS’ RESPONSE: This appears to us to be largely a matter of semantics. In talking of “dose dependency” we were certainly not implying a graded effect (as the Reviewer points out, our findings are far from this, suggesting a sharp dose threshold that triggers widespread changes), and indeed nothing in these words strictly suggests this interpretation. Nevertheless, we are sensitive to the Reviewer’s interpretation of the term, and mindful that it might be shared by other readers. On the other hand, talking of a “near-binary” effect seems to us to be an accurate description of our findings. We have edited the manuscript to minimise ambiguity with the following changes:

      • line 49 “dose” replaced with “strength”: “We demonstrate signalling rewiring as a function of oncogenic PI3K signalling strength, and provide experimental evidence that self-sustained stemness is causally related to enhanced autocrine NODAL/TGFb
      • line 102: “This work provides in-depth characterisation of the near-binary PI3K signalling effects seen in hPSCs ….”
      • lines 195, 198, 317: inserted “allele dose-dependent”

      We would also like to take issue with the case that the Reviewer seems to be making, namely that a more graded change in gene expression across heterozygotes and homozygotes is to be expected. As mentioned in the manuscript (lines 206-210), there is evidence for NODAL/TGFb pathway activation in heterozygous cells. Nevertheless, given the known temporal, context- and dose-dependent effects of this pathway [1,2,26,27] and, importantly, the widely described biological properties of developmental systems (featuring positive feedback loops, bistability and hysteresis; see Ref. [28,29]), we have no reason to expect that transcriptomic and proteomic changes observed in homozygous cell lines will be reproduced in heterozygous cell lines.

      The manuscript would benefit from more direct comparisons between the heterozygotes and homozygotes.

      AUTHORS’ RESPONSE: Please refer to the additional data provided in response to a similar question by Reviewer 2.

      Further to the above point, as the marginal phenotype observed in heterozygotes is a critical point in this paper, the authors would benefit from including heterozygote lines in the functional experiments presented in Fig 5. Inclusion of the hets in these experiments would instill confidence in this reader that the marginal molecular alterations characterized at the proteomic and transcriptomic level is reflected in the lack of functional stemness-sustaining behaviour.

      AUTHORS’ RESPONSE: The lack of stemness-sustaining behaviour in the heterozygous clones was demonstrated across multiple different experiments in our previous work, and further functional studies of early differentiation in these cells seemed a poor use of resources and very unlikely to give useful insights. Given the major disease phenotype associated with the same genetic change (PROS), the relative lack of phenotype in heterozygous cells was surprising and holds obvious implications for disease modelling (see also response to Reviewer 2, pp.2-3), and for how model systems are “calibrated” against human developmental disease. The aim of the current work was to:

        • Determine whether increasing the depth of signalling and transcriptomic analyses would unmask small but important changes in heterozygous mutants that might have been missed in prior studies (i.e. we actively aimed to increase the power of the study for identification of subtle changes), and
        • Characterise in greater depth the signalling and transcriptional changes underpinning the robust threshold effect observed for self-sustained stemness driven by PIK3CAH1047R/H1047R.

      We would further observe that PROS does not feature obvious qualitative errors in tissue specification, but rather excessive growth of more or less normally differentiated tissues. We conceptualise this as reflecting a small incremental growth advantage in normally differentiated tissues of certain lineages that summates to create a major disease phenotype over months and years.

      Thus, without the functional and mechanistic experiments alluded to above, the claims/ conclusions are speculative. In particular, the cancer narrative is irrelevant to the study. Considering both the lack of conclusive differentiation experiments or relevant breast cancer experiments, the discussion on differentiation therapy for breast cancer should be removed.

      AUTHORS’ RESPONSE: The reference to cancer links to a computational study of human breast cancers where we specifically looked at the relationship between strength of PI3K signalling and ‘stemness’ [12], both measured using established transcriptional indices. We have included the bioRxiv reference in our revised manuscript (see l.337). While there is an element of speculation in this cancer observation, we do feel it is important and grounded in this and the bioRxiv study, and would prefer to maintain it. However, if the editors take a different view it can be removed.

      Reproducibility is a concern for this study. The authors should perform more replicates on their experiments (focusing on technical replicates of the lines employed to discern technical vs biological variability). A challenge in reading this manuscript is understanding which replicates were used for which experiments, and whether they are technical or biological (i.e. different lines). While some of the figure legends note this information, it would be helpful to provide clarity throughout the text. In addition, it should be noted that some experiments (e.g. the RPPA analysis in Fig 2B and Fig S3B) show substantial variability between replicates, but because it appears only a single technical replicate from two different cell lines was used, it is impossible to distinguish whether the variability is of a biological or technical nature. The authors would do well to focus on collecting more technical replicates of fewer biological replicates, and then expand to include more biological replicates if initial biological variation is observed.

      AUTHORS’ RESPONSE: We strenuously disagree with the Reviewer on this point. Throughout this manuscript, we have been transparent and thorough in reporting how experiments were performed, including the number of both biological and technical replicates. Representative examples include:

      Legend to Figure 2A (RPPA dataset in growth-replete conditions): “The data are based on 10 wild-type cultures (3 clones), 5 PIK3CAWT/H1047R cultures (3 clones) and 7 PIK3CAH1047R/H1047R cultures (2 clones) as indicated.”

      Legend to Figure 5: “The data are from two independent experiments, with each treatment applied to triplicate cultures of three wild-type and two homozygous iPSC clones.”

      Specifically to address the RPPA studies, and as is clear from the Figure 2 legend, we initially performed RPPA analyses in growth factor-replete conditions with extensive technical and biological replication, arguing against the Reviewer’s point. To aid interpretation, we opted for summarising this large dataset in Venn diagrams (following extensive limma-based statistical analysis, including correction for multiple comparisons and sample interdependence as advised in Ref. [30]). If the Reviewer deems it valuable, we could include a heatmap overview as shown below:

      [To view figure, please access PDF version of this rebuttal on https://osf.io/ecmhr/]

      We took the view that the above representation, while comprehensive, is not particularly informative to the reader. All individual data points for both total and phosphoproteins – with and without normalisation – are plotted as part of separate barplots in the accompanying RNotebook (https://osf.io/d9tca/). These clearly demonstrate that the technical and biological variability in canonical PI3K signalling responses at the level of AKT and immediately downstream of AKT is very low. The same applies to the increased phosphorylation of SMAD2 (S465/S467) in PIK3CAH1047R iPSCs. We include two examples below, and would be happy to include the link to the above RNotebook in the respective Figure legend if the Reviewer deems this helpful.

      [To view figure, please access PDF version of this rebuttal on https://osf.io/ecmhr/]

      The interpretation of the second RPPA experiment (Fig. 2B) in growth factor-depleted conditions is focused entirely on these responses due to their consistency across both datasets (further supported by low-throughput signalling analyses in the previous PNAS publication).

      We had made all raw data and guided analysis scripts for the above RPPA dataset publicly available, and the same is true for all original data as highlighted in the Materials & Methods section. Thus we strongly believe that readers have the opportunity to assess our work and reproduce our analyses/conclusions fully should they wish to do so.

      Finally, we noted in the initial PNAS paper describing these models that we derived and worked with up to 10 independent homozygous PIK3CAH1047R clones, as well as with 3 and 4 independent heterozygous and wild-type clones, respectively. This exceeds the common use of 2 clones (if at all mentioned) in many similar studies in the stem cell literature (e.g. Ref. [31–34]). In our view, derivation of more than two independent clones is crucial for reproducibility in gene editing studies given substantial variability arising from genetic drift [35,36]. We have consistently shown the phenotypic robustness of our mutant clones across the two studies; note, for example, the low technical and biological variability in both heterozygous and homozygous mutants in the transcriptomic data in Fig. 1A. As noted in the manuscript, the high-depth RNAseq data analysis was performed in different clones and independently of the RNAseq reported in Ref. [5], yet yields highly similar results and confirms transcriptional rewiring of PIK3CAH1047R/H1047R iPSCs.

      Throughout the text, the authors frequently reference their previous study in PNAS and often the lines of what is novel in this paper vs. reproduction of previous findings is blurred. The authors would benefit from reducing the frequency of referencing their previous study and focusing on emphasizing the novelty of the present findings.

      AUTHORS’ RESPONSE: We have carefully reviewed all instances of citation of our previous study in the manuscript and have reduced their numbers to improve focus on the current findings as suggested. As noted above, however, the current study builds closely upon the findings of the previous work, and referring to these to put the current work in context is important. Indeed, this is reflected in some of the reviewers’ collective comments and questions which are answered by the prior study. We have carefully reviewed the places in which we have cited our previous study and note that except for 2 citations in the Introduction and 3 more in the Discussion, all remaining citations are in the context of linking new and old data, which we believe is important for clarity as suggested by the reviewers. However, if the editors take a different view we can minimise this and reduce the number of citations.

      Without functional assays to complement and test their models, this manuscript is not a significant advance.

      AUTHORS’ RESPONSE: While we take the Reviewer’s point that further studies could have strengthened the robustness of the evidence supporting a mediating role of NODAL/TGFb signalling in PI3K-driven stemness, we think this assertion is far too sweeping, and neglects numerous facets of the study of use and interest to several fields (as agreed by the other reviewers). To recapitulate some key points of interest/use of this study:

      • Using a carefully derived PIK3CAH1047R iPSC model system and pharmacologically relevant doses of a recently approved PI3Ka-selective inhibitor, we demonstrate that the efficacy of the latter can depend on the strength of PI3K pathway activation and phenotype under investigation – despite expected downregulation of PI3K signalling by Alpelisib, the stemness phenotype is not reversed.
      • We link this to self-sustained TGFb signalling in cells with strong PI3K activation by homozygous PIK3CAH1047R. The link between the two pathways and the underlying rewiring are likely to be relevant in other contexts, as observed recently in a breast epithelial model system [37]. Given the similarity between human pluripotent stem cells and cancer cells, our findings are of wider relevance.
      • Aberrant PI3K activation has been associated with numerous pathologies, so it is important for the field to have well-characterised model systems with endogenous expression of one of the most common PIK3CA mutations. Our thorough characterisation of PIK3CAH1047R iPSCs validates one such model.
      • To our knowledge, this is the first study to provide a comprehensive and integrated characterisation of isoform-specific PI3K signalling and transcriptomic changes in human pluripotent stem cells. This is important because current knowledge of PI3K signalling in human PSCs is largely based on extrapolation of findings from mouse embryonic stem cells, with many previous studies relying on high concentrations of the non-specific pan-PI3K inhibitor LY294002 (the use of which has been discouraged by the PI3K signalling community [38]).

        I believe the narrative was written for pluripotent stem cell biologists but without robust functional and quantitative cell biological assays to test their models, I don't anticipate stem cell biologists will be very interested.

      AUTHORS’ RESPONSE: The Reviewer is incorrect in his/her assertion about the target audience. PI3K signalling plays a key role in numerous disease and physiological processes as well as in development, and is of broad interest to cancer biologists, geneticists, rare disease biologists, biochemists, cell signallers, and endocrinologists among many others. Indeed we started with a primary focus on disease modelling (cancer, PROS) rather than stem cell biology, but because our findings are significant for the role of PI3K in stem cell biology as well as for these diseases, we aimed to make the findings accessible to many of these readers. We refer the Reviewer to our previous response with regard to the significance of this work.

      Minor Comments:

      Consider adding gridlines to the MDS plots for clarity of read

      AUTHORS’ RESPONSE: This is a matter of taste, and as we honestly cannot see how it would enhance appreciation of the very clear clustering, we have decided to leave the plot in its current form.

      In Fig S2, some of the in-figure labelling is incorrect

      AUTHORS’ RESPONSE: We thank the Reviewer for spotting this. We believe the labelling error has now been corrected, and we have further tried to streamline the plot headings, but please do let us know if there is something else that we may have missed.

      In Fig S1C, the authors note poor correlation in the heterozygotes between this and a previous study. It would be helpful to qualify this discrepancy, as it is potentially concerning.

      AUTHORS’ RESPONSE: The sensitivity to detect differential gene expression is high for large fold changes (as seen in PIK3CAH1047R/H1047R mutants) in transcriptomic studies, but declines rapidly for fold changes in expression lines 126-131: “The magnitudes of gene expression changes in PIK3CAH1047R/H1047R cells correlated strongly with our previous findings (Spearman’s rho = 0.74, p WT/H1047R iPSCs (Fig. S1C), as expected given the smaller number and lower magnitude of observed gene expression changes in heterozygous cells, and the lower depth of previous transcriptomic studies.”

      Line 208, the authors state that the small p-value for the homozygotes is suggestive of a dose-dependent effect. This is not the case; it simply suggests a greater probability of the effect being non-random.

      AUTHORS’ RESPONSE: The Reviewer is formally correct, and we apologise for the imprecision of our language. Nevertheless, biological effect size is pertinent to the p value determined, and so our statement, while requiring an inductive leap from the reader, is not wholly invalid. To tidy this up and improve precision we have reworded as follows:

      lines 215-217: “This is in keeping with the much lower effect size in heterozygous cells, and consistent with a critical role for the TGFbeta pathway in mediating the allele dose-dependent effect of PIK3CAH1047R in human iPSCs.”
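
      The relationship between effect size and p value discussed here can be illustrated with a simple sketch. It assumes a one-sample z-test with a known standard deviation, which is not the statistical test used in the manuscript, and all numbers and the function name `two_sided_p` are hypothetical.

```python
import math


def two_sided_p(effect, sd, n):
    """Two-sided p-value of a one-sample z-test for a mean shift `effect`."""
    z = effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


# Same sample size and spread: the larger effect gives the smaller p-value.
p_small_effect = two_sided_p(effect=0.2, sd=1.0, n=25)
p_large_effect = two_sided_p(effect=1.0, sd=1.0, n=25)

# Same small effect, many more samples: the p-value also shrinks,
# which is why a p-value on its own does not measure effect size.
p_small_effect_big_n = two_sided_p(effect=0.2, sd=1.0, n=2500)
```

      With sample size and variability held fixed, a smaller p value does track a larger effect, but only under that condition, which is the distinction drawn between the Reviewer's comment and the reworded sentence.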

      What does the height in Fig 4B correspond to? It would perhaps be of value to scale nodes based on the significance value.

      AUTHORS’ RESPONSE: Fig. 4B illustrates hierarchical clustering of the module eigengenes; the height at which branches merge reflects the similarity of their gene expression. We clarify this in the revised manuscript.

      References

      1 Lee, K. L. et al. (2011) Graded Nodal/Activin signaling titrates conversion of quantitative phospho-Smad2 levels into qualitative embryonic stem cell fate decisions. PLoS Genet. 7.

      2 Hill, C. S. (2018) Spatial and temporal control of NODAL signaling. Curr. Opin. Cell Biol. 51, 50–57.

      3 Xu, R. H. et al. (2008) NANOG is a Direct Target of TGFβ/Activin-Mediated SMAD Signaling in Human ESCs. Cell Stem Cell 3, 196–206.

      4 Vallier, L. et al. (2009) Activin/Nodal signalling maintains pluripotency by controlling Nanog expression. Development 136, 1339–49.

      5 Madsen, R. R. et al. (2019) Oncogenic PIK3CA promotes cellular stemness in an allele dose-dependent manner. Proc. Natl. Acad. Sci. 116, 8380–8389.

      6 Vasan, N. et al. (2019) Double PIK3CA mutations in cis increase oncogenicity and sensitivity to PI3Kα inhibitors. Science 366, 714–723.

      7 Saito, Y. et al. (2020) Landscape and function of multiple mutations within individual oncogenes. Nature 582, 95–99.

      8 Gorelick, A. N. et al. (2020) Phase and context shape the function of composite oncogenic mutations. Nature.

      9 Gillies, T. et al. (2020) Oncogenic mutant RAS signaling activity is rescaled by the ERK/MAPK pathway 1–19.

      10 Hafner, M. et al. (2016) Growth rate inhibition metrics correct for confounders in measuring sensitivity to cancer drugs. Nat. Methods 13, 521–527.

      11 André, F. et al. (2019) Alpelisib for PIK3CA-mutated, hormone receptor-positive advanced breast cancer. N. Engl. J. Med. 380, 1929–1940.

      12 Madsen, R. R. et al. (2020) Relationship between stemness and transcriptionally-inferred PI3K activity in human breast cancer. bioRxiv 2020.07.09.195974.

      13 Fritsch, C. et al. (2014) Characterization of the novel and specific PI3Ka inhibitor NVP-BYL719 and development of the patient stratification strategy for clinical trials. Mol. Cancer Ther. 13, 1117–1129.

      14 Furet, P. et al. (2013) Discovery of NVP-BYL719 a potent and selective phosphatidylinositol-3 kinase alpha inhibitor selected for clinical evaluation. Bioorganic Med. Chem. Lett.

      15 Yilmaz, A. et al. (2018) Defining essential genes for human pluripotent stem cells by CRISPR–Cas9 screening in haploid cells. Nat. Cell Biol. 20, 610–619.

      16 Sergushichev, A. A. (2016) An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv 060012.

      17 Liberzon, A. et al. (2015) The Molecular Signatures Database Hallmark Gene Set Collection. Cell Syst. 1, 417–425.

      18 Müller, F. J. et al. (2008) Regulatory networks define phenotypic classes of human stem cell lines. Nature 455, 401–405.

      19 James, D. et al. (2005) TGFbeta/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 132, 1273–82.

      20 Vallier, L. et al. (2005) Activin/Nodal and FGF pathways cooperate to maintain pluripotency of human embryonic stem cells. J. Cell Sci. 118, 4495–4509.

      21 Chen, G. et al. (2011) Chemically defined conditions for human iPSC derivation and culture. Nat. Methods 8, 424–429.

      22 Adewumi, O. et al. (2007) Characterization of human embryonic stem cell lines by the International Stem Cell Initiative. Nat. Biotechnol. 25, 803–816.

      23 Tsankov, A. M. et al. (2015) A qPCR ScoreCard quantifies the differentiation potential of human pluripotent stem cells. Nat. Biotechnol. 33, 1–15.

      24 Smith, J. R. et al. (2008) Inhibition of Activin/Nodal signaling promotes specification of human embryonic stem cells into neuroectoderm. Dev. Biol. 313, 107–117.

      25 Vallier, L. et al. (2004) Nodal inhibits differentiation of human embryonic stem cells along the neuroectodermal default pathway. Dev. Biol. 275, 403–421.

      26 Sorre, B. et al. (2014) Encoding of temporal signals by the TGF-β Pathway and implications for embryonic patterning. Dev. Cell 30, 334–342.

      27 David, C. J. and Massagué, J. (2018) Contextual determinants of TGFβ action in development, immunity and cancer. Nat. Rev. Mol. Cell Biol. 19, 1–17.

      28 Alon, U. (2007) Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.

      29 Sonnen, K. F. and Aulehla, A. (2014) Dynamic signal encoding-From cells to organisms. Semin. Cell Dev. Biol. 34, 91–98.

      30 Germain, P. L. and Testa, G. (2017) Taming Human Genetic Variability: Transcriptomic Meta-Analysis Guides the Experimental Design and Interpretation of iPSC-Based Disease Modeling. Stem Cell Reports 8, 1784–1796.

      31 Wang, L. et al. (2017) GCN5 Regulates FGF Signaling and Activates Selective MYC Target Genes during Early Embryoid Body Differentiation. Stem Cell Reports 10, 287–299.

      32 Zeng, H. et al. (2016) An Isogenic Human ESC Platform for Functional Evaluation of Genome-wide-Association-Study-Identified Diabetes Genes and Drug Discovery. Cell Stem Cell 0, 1660–1669.

      33 Ho, L. et al. (2015) ELABELA Is an Endogenous Growth Factor that Sustains hESC Self-Renewal via the PI3K/AKT Pathway. Cell Stem Cell 17, 435–447.

      34 Roudnicky, F. et al. (2019) Modeling the effects of severe metabolic disease by genome editing of HPSC-derived endothelial cells reveals an inflammatory phenotype. Int. J. Mol. Sci. 20, 1–10.

      35 Veres, A. et al. (2014) Low incidence of Off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15, 27–30.

      36 Ben-David, U. et al. (2018) Genetic and transcriptional evolution alters cancer cell line drug response. Nature 560, 325–330.

      37 Katsuno, Y. et al. (2019) Chronic TGF-b exposure drives stabilized EMT, tumor stemness, and cancer drug resistance with vulnerability to bitopic mTOR inhibition. Sci. Signal. 12, eaau8544.

      38 Manning, B. D. and Toker, A. (2017) AKT/PKB Signaling: Navigating the Network. Cell 169, 381–405.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript by Madsen et al., a comparison of the transcriptome and proteome in heterozygous and homozygous PIK3CAH1047R human pluripotent stem cells mutants is presented. The authors demonstrate marked alterations in expression at both the protein and RNA level of homozygous mutants compared to wildtype, while heterozygous lines exhibit only minor changes. Multiple analytical approaches are employed to investigate network alterations, leading the authors to suggest a TGFβ-mediated rewiring of key pluripotent genes to induce a state of sustained stemness. Madsen et al. conclude with a set of experiments to functionally implicate NODAL/TGFβ autocrine signalling in PIK3CAH1047R dose-dependent stemness.

      Major Comments:

      1. The key conclusions are not convincing. While the unbiased omics approach sets up this study well, the study suffers from a lack of convincing functional assays (cell biological assays) to test their model and tease apart a phenotype for the het cells. More robust functional experiments are required to support the finding the NODAL/TGFβ signalling mediates the self-sustained stemness, particularly because this is the major novel finding distinguished from the authors previous work.

      • While the authors present a comprehensive omics investigation into alterations between wild type, homozygous, and heterozygous mutants, the critical functional experiments are lacking. In Figure 5, the authors seek to support the role of TGFβ in mediated stemness in the homozygous mutants, however, are not able to directly deplete TGFβ due to technical limitations of the culture conditions. Consequentially, the experiments are primarily built on the use of NODAL withdrawal and stimulation. The data presented thus implicate NODAL in the stemness phenotype, but it's not obvious TGFβ is substantially involved, particularly considering the inhibitor subsequently employed also inhibits NODAL type 1 receptors. Furthermore, there is a paucity of readouts for stemness. For example, a more convincing narrative would include additional expression markers of the core pluripotency network (e.g. OCT4, SOX2, etc.) as well as functional readouts (e.g. NODAL withdrawal and assessment of differentiation) after NODAL stimulation/depletion and comparing across genotypes. Overall, the primary conclusions of this work are not well-evidence by the presented data and the authors should consider additional functional experiments or reframing the narrative.

      • A key conclusion of this study is that there is a dose-dependent stemness phenotype. As this is not explicitly defined, to this reader it would imply a graded response between wild type, heterozygotes, and homozygotes in the phenotypic and molecular characteristics. However, as is noted particularly in the omics components of the manuscript, there is in fact a "near-binary" alteration in the assayed characteristics. Again, this should be qualified more explicitly, but it is more consistent with the data, which suggest that the heterozygotes behave very similarly to the wild types, while homozygotes have substantial alterations. I would suggest the authors consider renaming their descriptions, replacing "near-binary" and "dose-dependent" with something like "dose-threshold." This suggests that above a certain threshold of oncogenic PI3K signalling, substantial alterations occur; under this threshold (e.g. hets), changes are marginal. If, however, there is a more "dose-dependent" effect, I would expect the transcriptomic and proteomic changes observed in the heterozygous cell lines to also be seen in the homozygous cell lines (where they are likely greater in magnitude, in addition to other changes). The manuscript would benefit from more direct comparisons between the heterozygotes and homozygotes.

      • Further to the above point, as the marginal phenotype observed in heterozygotes is a critical point in this paper, the authors would benefit from including heterozygote lines in the functional experiments presented in Fig 5. Inclusion of the hets in these experiments would instill confidence in this reader that the marginal molecular alterations characterized at the proteomic and transcriptomic level are reflected in the lack of functional stemness-sustaining behaviour.

      2. Thus, without the functional and mechanistic experiments alluded to above, the claims/conclusions are speculative. In particular, the cancer narrative is irrelevant to the study. Considering the lack of both conclusive differentiation experiments and relevant breast cancer experiments, the discussion on differentiation therapy for breast cancer should be removed.

      3. Reproducibility is a concern for this study. The authors should perform more replicates on their experiments (focusing on technical replicates of the lines employed to discern technical vs biological variability). A challenge in reading this manuscript is understanding which replicates were used for which experiments, and whether they are technical or biological (i.e. different lines). While some of the figure legends note this information, it would be helpful to provide clarity throughout the text. In addition, it should be noted that some experiments (e.g. the RPPA analysis in Fig 2B and Fig S3B) show substantial variability between replicates, but because it appears only a single technical replicate from two different cell lines was used, it is impossible to distinguish whether the variability is of a biological or technical nature. The authors would do well to focus on collecting more technical replicates of fewer biological replicates, and then expand to include more biological replicates if initial biological variation is observed.

      Minor Comments:

      • Consider adding gridlines to the MDS plots for readability
      • In Fig S2, some of the in-figure labelling is incorrect
      • In Fig S1C, the authors note poor correlation in the heterozygotes between this and a previous study. It would be helpful to qualify this discrepancy, as it is potentially concerning.
      • Line 208, the authors state that the small p-value for the homozygotes is suggestive of a dose-dependent effect. This is not the case; it simply suggests a greater probability of the effect being non-random.
      • What does the height in Fig 4B correspond to? It would perhaps be of value to scale nodes based on the significance value.
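
The Line 208 comment above, that a smaller p-value does not by itself indicate a larger or dose-dependent effect, can be illustrated with a minimal sketch. It uses a simple one-sample z-test rather than the authors' actual analysis; the effect sizes and sample sizes are hypothetical and the function name `two_sided_p` is introduced here for illustration only.

```python
import math


def two_sided_p(effect_size, n):
    """Two-sided p-value of a one-sample z-test.

    effect_size is the standardized effect (mean / standard deviation),
    n is the number of observations. Both are hypothetical inputs.
    """
    z = effect_size * math.sqrt(n)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# A small effect measured many times versus a large effect measured a few times.
small_effect_many = two_sided_p(effect_size=0.2, n=400)
large_effect_few = two_sided_p(effect_size=1.0, n=4)

print(f"small effect, n=400: p = {small_effect_many:.2e}")
print(f"large effect, n=4:   p = {large_effect_few:.2e}")
```

Under these assumptions the five-fold smaller effect yields the smaller p-value, simply because it is estimated more precisely. Comparing effect sizes (e.g. fold changes with confidence intervals) across genotypes would address dose dependence more directly.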

      Significance

      Nature and significance of the advance:

      • Throughout the text, the authors frequently reference their previous study in PNAS, and the line between what is novel in this paper and what reproduces previous findings is often blurred. The authors would benefit from reducing the frequency of referencing their previous study and focusing on emphasizing the novelty of the present findings.

      • Without functional assays to complement and test their models, this manuscript is not a significant advance.

      State what audience might be interested in and influenced by the reported findings.

      • I believe the narrative was written for pluripotent stem cell biologists but without robust functional and quantitative cell biological assays to test their models, I don't anticipate stem cell biologists will be very interested.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      • Stem cell biology, cancer biology, systems biology, mTORC1 signalling

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      As below.

      Significance

      In this manuscript, Madsen et al. have investigated the role of heterozygous versus homozygous PIK3CAH1047R gain-of-function mutation in maintaining stemness of induced pluripotent stem cells (iPSCs). The authors have performed high-depth RNAseq, proteomic, and RPPA analyses to show that biallelic PIK3CA alterations induce stronger activation of the PI3K signaling axis, compared to monoallelic mutations. The authors claim that a higher PI3K signaling dose activates the NODAL/TGF-β pathway, which in turn supports stemness in an autocrine fashion. These are important findings; however, the manuscript and its conclusions can be improved.

      The authors have described the role of the PIK3CAH1047R gain-of-function mutation in cancer and overgrowth syndromes. However, cancer-associated somatic mutations in PIK3CA are mostly heterozygous. Similarly, PIK3CA-related overgrowth syndromes (PROS) are caused by post-zygotic mosaic PIK3CA activating mutations. As such, the relevance of homozygous PIK3CA alterations to these pathological conditions is unclear. The authors should elaborate on the biological implications of their findings.

      The role of biallelic PIK3CA mutation is reminiscent of compound mutations in PIK3CA, which have also been shown to increase PI3K signaling output. However, double PIK3CA mutations confer enhanced sensitivity to PI3K inhibition (Toska et al. Science 2019). Could the authors kindly speculate on this discrepancy? Similarly, the p110 alpha-specific inhibitor alpelisib is highly effective against PIK3CA-mutant ER+ breast cancer and PROS. As such, the clinical relevance of the insensitivity of homozygous PIK3CA mutation to PI3K inhibitors is unclear.

      Figure 2: The authors have performed RPPA analysis in the presence of 100 nM BYL719. Alpelisib is commonly used at 1 µM concentration for in vitro experiments, and has a Cmax of ~5 µM. We suggest the authors perform western blot analysis to confirm the results of RPPA.

      Figures 3 and 4: The authors should expand their RNAseq analysis to demonstrate enrichment of stemness and TGFβ signaling in homozygous mutant cells compared to heterozygous cells.

      The authors should confirm the results of the pathway analysis in vitro to show that homozygous PIK3CA mutation confers increased stemness compared to heterozygous mutation.

      Figure 5: Kindly provide direct evidence demonstrating that increased PIK3CA signaling output induces NODAL expression in this experimental setting. Also, please normalize gene expression data to WT cells so that it is easy to visualize the changes in NODAL and NANOG expression in homozygous and heterozygous mutants compared to WT iPSCs.
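
The normalization to WT requested for Figure 5 could look roughly like the following sketch. The expression values, genotype labels and the helper name `normalize_to_wt` are all hypothetical and are not taken from the manuscript; the only idea shown is dividing every measurement by the mean of the WT replicates so that WT sits at 1.

```python
from statistics import mean


def normalize_to_wt(expression, wt_key="WT"):
    """Express every replicate relative to the mean of the WT replicates.

    expression maps a genotype label to a list of replicate measurements
    on a linear scale (hypothetical data, not from the manuscript).
    """
    wt_mean = mean(expression[wt_key])
    return {
        genotype: [value / wt_mean for value in values]
        for genotype, values in expression.items()
    }


# Hypothetical NODAL expression values for three genotypes, two replicates each.
nodal = {"WT": [1.8, 2.2], "HET": [2.0, 2.4], "HOM": [7.0, 9.0]}
relative = normalize_to_wt(nodal)

for genotype, values in relative.items():
    print(genotype, [round(v, 2) for v in values])
```

Plotted this way, WT replicates scatter around 1 and the fold change of each mutant genotype can be read directly from the axis.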

      Kindly quantify Fig. S5.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      This is an interesting and thorough study characterising human iPSCs with heterozygous or homozygous mutations in the PI3K pathway that lead to its hyperactivation. They prove that the increased stemness results from enhanced autocrine responsiveness to the TGF signalling pathway.

      The main conclusions are well supported by the presented data. Cutting-edge tools and bioinformatic analyses are adequately applied. I have only one important point:

      Major comment:

      1) Western blot-based validation of TGF pathway activation in WT and mutant iPSCs would be helpful to strengthen the results based on bioinformatic data.

      Significance

      Important work for studies on signalling, cancer mutations, modelling cancer in stem cells, pluripotency regulation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Response to the reviewers’ comments:

      We thank both reviewers for their critical evaluation and excellent suggestions to improve the manuscript. We are making all the changes suggested by both reviewers and performing the experiments needed to address all the concerns, specifically those from Reviewer #1. Please find below our response to the reviewers’ comments:

      Reviewer #1:

      This is an interesting study from the Rahaman group that identifies cardiolipin (CL) as a potential binding target for Drp6 recruitment to the nuclear membrane in Tetrahymena (that has a unique nuclear remodeling program). In addition, they identify a residue, I553 in the DTD region, which they claim is a key residue involved in specific CL interactions. While the experiments themselves are technically sound, and are well performed and controlled, I don't find the major conclusion that I553 is involved in direct CL interactions justified or well rationalized. By their own admission (in the discussion), the conservative mutation I553M may perturb local folding and may indirectly affect CL interactions. There is no test of DTD folding with and without the I553M mutation, nor are there other mutations (e.g. I553A and in the vicinity) tested. CD experiments in the absence and presence of CL-containing membranes will likely yield information on the impact of the I553 mutations, while DLS experiments would inform on the hydrodynamic properties (overall 3D fold) of the DTD and the impact of these mutations. CL interactions generally involve a combination of electrostatic and hydrophobic forces. Where do the electrostatic interactions come from? Why would an Isoleucine to Methionine mutation affect the hydrophobic component, even if I553 is the key hydrophobic residue?

      Response:

      We thank the reviewer for noting that the experiments are sound and well performed with appropriate controls. While we agree that the exact mechanism by which I553 provides specificity to cardiolipin binding is not addressed in the present manuscript, our study clearly demonstrates that the isoleucine at position 553 plays an important role in determining cardiolipin specificity and nuclear recruitment. As pointed out by the reviewer, it is possible that changing isoleucine to methionine may affect the local conformation. However, there is no major conformational change in the DTD due to this mutation. This conclusion is based on the clear loss of nuclear localization and cardiolipin interaction for the mutant without any effect on its other properties. The in vitro floatation assay clearly establishes that the mutation acts directly by inhibiting interaction specifically with cardiolipin-containing membranes. It should be further noted that the same domain (DTD) interacts with two other lipids (PS and PA) and that the mutant retains interaction with them, arguing that the conformation of this domain is not significantly changed by the I to M mutation. Consistent with these results, the I553M mutant could be targeted to the nuclear membrane as a complex with wildtype Drp6, further confirming that I553M could form the correct self-assembled structure with wildtype protein required for association with the nuclear membrane. This is further substantiated by comparing all the known biochemical properties, including GTPase activity, membrane binding via the other two lipids, and formation of helical spirals and ring structures. Hence it is clear that I553 provides specificity for binding cardiolipin and for recruitment to the nuclear membrane. We will further test whether there is any local conformational change due to the I to M mutation by fluorescence quenching experiments, and the results will be incorporated in the revised manuscript.

      Regarding the overall folding of the mutant, this is an excellent suggestion by the reviewer. We are planning to perform CD experiments on the I553M mutant and wildtype proteins to assess whether there is any change in overall folding due to the mutation. This result will be incorporated in the revised manuscript.

      The reviewer is right to point out that both electrostatic and hydrophobic interactions are important for interaction with cardiolipin. Electrostatic interaction is important for all phospholipids when interacting with protein and is expected to come from other, positively charged amino acid residues. Electrostatic interaction may contribute to the affinity of the interaction by providing additional binding energy. But because it is common to all phospholipids, it cannot provide specificity for a particular lipid and hence would not discriminate among different phospholipids.

      Regarding the hydrophobic component, the reviewer is correct that both are strongly hydrophobic amino acids and that the loss of cardiolipin interaction in I553M may not be due to a change in hydrophobicity.

      To test whether the loss of cardiolipin interaction is due to the absence of isoleucine rather than being specific to methionine, the reviewer's suggestion to replace I553 with A (alanine) is an excellent one. We are doing the experiments and anticipate incorporating these results in our revised manuscript.

      Reviewer #1 (Significance (Required)):

      The addressed phenomenon is restricted to Tetrahymena and may not have far reaching implications. Regardless, the identification of CL as a binding target for Drp6 at the nuclear membrane of this organism is in itself significant. The conclusion that I553 is the key CL binding residue is however not warranted. Additional experiments are needed to dissect how this residue impacts CL interactions and examine whether the observed effect is direct or indirect.

      Response:

      We thank the reviewer for appreciating the significance of this work. We agree that our data are Tetrahymena specific. However, we believe that the study is relevant for all proteins whose association with target membranes depends on cardiolipin, including many cardiolipin-interacting DRPs (such as DRPs involved in biogenesis and maintenance of mitochondria).

      We greatly appreciate the reviewer's excellent suggestions. Based on these, we are performing the following experiments.

      1. CD experiments to assess overall folding of I553M and wildtype protein
      2. Fluorescence quenching of the tryptophan residue (at amino acid position 548) in the vicinity of I553 to compare the conformation of the mutant with that of wildtype protein.
      3. Evaluation of I553A in nuclear localization and cardiolipin binding. We anticipate that these results will further confirm whether I553 is the key CL binding residue and whether the effect is direct.

      The writing is not clear in some parts and may require a round of language editing. There are no issues with reproducibility.

      Response:

      We thank the reviewer for pointing out the need for language editing. We will edit the language wherever we find it appropriate. We would greatly appreciate it if the reviewer could indicate the portions that need special attention.

      Reviewer #2:

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Dynamin is a GTPase superfamily protein involved in membrane fusion and division. This paper focused on Drp6, one of the eight dynamin superfamily proteins of Tetrahymena, and analyzed its nuclear envelope localization mechanism by a combination of in vivo cytogenetical analysis and in vitro biochemical analysis for the various mutant Drp6 proteins. Results showed that a specific amino acid residue (isoleucine at the 553rd) in the membrane binding domain of Drp6 was required for its nuclear membrane localization, but this residue is not required for ER/endosome localization and GTPase activity. Furthermore, in vitro floating analysis using centrifugation indicated that Drp6 specifically bound to the cardiolipin at the 553rd isoleucine residue and this binding was required for Drp6's nuclear membrane localization. Finally, removal of cardiolipin from the conjugating cells using inhibitor treatment showed that cardiolipin was required for the new macronucleus formation (including the expansion of macronuclear envelope) through the function of Drp6. Based on these results, authors concluded that cardiolipin targets Drp6 to the nuclear membrane in Tetrahymena.

      Major comments:

      The experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing. However, to improve this paper, I have several minor comments to be revised before publication.

      Minor comments:

      1. In the previous paper, it has been shown that GFP-Drp6 is localized in the inner nuclear membrane of both macronucleus and micronucleus. In this paper, however, this point is not clearly stated and is not shown in the figures --- I could not understand such localization pattern of GFP-Drp6 in Fig. 1C and Fig. 3b and the statements in the text. I suggest adding such statements somewhere in Introduction or Result section. Also, add adequate references to the corresponding statements in the text.
      2. Related to the comment 1, I suggest replacing Fig. 1C (images of fixed cells) with Fig. S1B (images of live cells) because nuclear localization of GFP-Drp6 are much clearer in Fig. S1B (live cell) than Fig. 1C (fixed cell), and because fixation may cause artificial redistribution of the proteins. Please add arrows in those figures to point out the position of micronucleus in those figures if necessary.
      3. Similarly, I suggest replacing images of Fig. 5B (fixed cells) with those of Fig. S3 (live cells).
      4. page 7, line 224: GFP-Nup3 is used as a marker protein of the nuclear pore complex (NPC). However, there is no description of how GFP-Nup3 is obtained or made. Add description how this DNA plasmid was obtained or generated.
      5. Related to the comment 4, "Nup3" is first discovered in Malone et al., Eukaryotic Cells, 2009, but also soon after discovered as the name of "MicNup98B" in Iwamoto et al., Curr Biol, 2009 and used in several papers including Iwamoto et al., Genes Cells, 2010; JCS, 2015; JCS 2017; and more. Because Nup3 is the Tetrahymena paralogs of human Nup98 and the name of "Nup98" is well established to call these homologs in various eukaryotes, I suggest adding the name of "MacNup98B" after the word of "Nup3" for reader's better understanding. I also suggest adding appropriate references to refer to this protein as follows: Add Malone et al. 2009 for "Nup3" and Iwamoto et al., 2009 for "MacNup98B."
      6. page 9, line 295: I wonder if "Fig. 3b" may be a mistake of "Fig. 5C." If so, please correct this.
      7. page 10, the second paragraph (lines 311-322): This paragraph discussed the possible involvement of Drp6 in the nuclear envelope expansion of the post-zygotic nucleus. It may be interesting to point out that large-scale nuclear envelope reorganization including the formation of the redundant nuclear envelope and the type-switching of the NPC (from the MIC-type NPC to the MAC-type one) has been reported at this developmental stage (Iwamoto et al., JCS 2015). For example, the peculiar shaped nuclear envelope with the redundant/overlapping nuclear envelope structure can be seen and the MAC-type NPCs rapidly assembles to the expanding nuclear envelope. It may be interesting to point out that cardiolipin and Drp6 may be involved in these phenomena. But it is too speculative and therefore consider adding such a discussion as an option.
      8. page 13, line 412: Is the word "GFP-drp6-I553M" written in italics intended for the gene for the GFP-drp6-I553M protein? If so, protein may be acceptable here. Make sure there are no problems with italicized characters. Also, check if the lowercase letter "d" in "drp6" is OK because large letters are used in other cases.
      9. page 20, figure 1: I recommend switching the positions of HDyn1 and Drp6 in Figure 1a to keep the order in Figure 1b.
      10. page 21, line 671: Add the word "Tetrahymena" before "Drp 6" to pair with the word "human dynamin 1".
      11. page 23, line 729: Remove "and."
      12. page 23, lines 729 and 731: Unify the expression of "cardiolipin" and "Cardiolipin"
      13. page 23, line 732: Add "or" before "10% Phosphatidylserin."
      14. page 24, Figure 3a: Please mark the position of I553M in the figure if possible. Alternatively, indicate the range of amino acid residues after the words "red" and "green" in the figure legend.

      Response:

      We thank the reviewer for the excellent comments that “the experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing.” We also thank the reviewer for the minor comments, which are thorough and very insightful. They will improve the manuscript substantially. We will incorporate all the changes in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      The corresponding author and his colleagues have reported that Tetrahymena Drp6 is localized to the outer nuclear membrane of both macronucleus and micronucleus of Tetrahymena (Elde et al., 2005) and that Drp6 is required for the formation of new macronuclei during nuclear differentiation (Rahaman et al., 2008). Therefore, these parts are not novel.

      The novelty of this study is as follows:

      (1) The discovery of a specific amino acid residue (isoleucine at the 553rd) of Drp6 that is required for its nuclear membrane localization.

      (2) The discovery of a lipid molecule, cardiolipin, as a critical partner for Drp6's nuclear membrane targeting.

      (3) Discovery of involvement of cardiolipin in the new macronucleus formation (the expansion of macronuclear envelope) through the function of Drp6.

      I think their findings are highly novel and will provide new insight into a field of cell biology. Especially, their findings will contribute to understanding how specific proteins are targeted to specific intracellular membranes. In addition, their methods (such as floatation assay) for analyzing the interaction between the protein of interest and lipid/liposomes will become an important tool.

      Response:

      We are very happy to note that the reviewer has pointed out the significance of the present study. We fully agree with the reviewer and appreciate the thorough analysis and excellent conclusions.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Dynamin is a GTPase superfamily protein involved in membrane fusion and division. This paper focused on Drp6, one of the eight dynamin superfamily proteins of Tetrahymena, and analyzed its nuclear envelope localization mechanism by a combination of in vivo cytogenetical analysis and in vitro biochemical analysis for the various mutant Drp6 proteins. Results showed that a specific amino acid residue (isoleucine at the 553rd) in the membrane binding domain of Drp6 was required for its nuclear membrane localization, but this residue is not required for ER/endosome localization and GTPase activity. Furthermore, in vitro floating analysis using centrifugation indicated that Drp6 specifically bound to the cardiolipin at the 553rd isoleucine residue and this binding was required for Drp6's nuclear membrane localization. Finally, removal of cardiolipin from the conjugating cells using inhibitor treatment showed that cardiolipin was required for the new macronucleus formation (including the expansion of macronuclear envelope) through the function of Drp6. Based on these results, authors concluded that cardiolipin targets Drp6 to the nuclear membrane in Tetrahymena.

      Major comments:

      The experimental data presented in this paper are reasonable and the results are solid, and therefore I think the deduced conclusions are convincing. However, to improve this paper, I have several minor comments to be revised before publication.

      Minor comments:

      1. In the previous paper, it has been shown that GFP-Drp6 is localized in the inner nuclear membrane of both macronucleus and micronucleus. In this paper, however, this point is not clearly stated and is not shown in the figures --- I could not understand such localization pattern of GFP-Drp6 in Fig. 1C and Fig. 3b and the statements in the text. I suggest adding such statements somewhere in Introduction or Result section. Also, add adequate references to the corresponding statements in the text.
      2. Related to the comment 1, I suggest replacing Fig. 1C (images of fixed cells) with Fig. S1B (images of live cells) because nuclear localization of GFP-Drp6 are much clearer in Fig. S1B (live cell) than Fig. 1C (fixed cell), and because fixation may cause artificial redistribution of the proteins. Please add arrows in those figures to point out the position of micronucleus in those figures if necessary.
      3. Similarly, I suggest replacing images of Fig. 5B (fixed cells) with those of Fig. S3 (live cells).
      4. page 7, line 224: GFP-Nup3 is used as a marker protein of the nuclear pore complex (NPC). However, there is no description of how GFP-Nup3 is obtained or made. Add description how this DNA plasmid was obtained or generated.
      5. Related to the comment 4, "Nup3" is first discovered in Malone et al., Eukaryotic Cells, 2009, but also soon after discovered as the name of "MicNup98B" in Iwamoto et al., Curr Biol, 2009 and used in several papers including Iwamoto et al., Genes Cells, 2010; JCS, 2015; JCS 2017; and more. Because Nup3 is the Tetrahymena paralogs of human Nup98 and the name of "Nup98" is well established to call these homologs in various eukaryotes, I suggest adding the name of "MacNup98B" after the word of "Nup3" for reader's better understanding. I also suggest adding appropriate references to refer to this protein as follows: Add Malone et al. 2009 for "Nup3" and Iwamoto et al., 2009 for "MacNup98B."
      6. page 9, line 295: I wonder if "Fig. 3b" may be a mistake of "Fig. 5C." If so, please correct this.
      7. page 10, the second paragraph (lines 311-322): This paragraph discussed the possible involvement of Drp6 in the nuclear envelope expansion of the post-zygotic nucleus. It may be interesting to point out that large-scale nuclear envelope reorganization including the formation of the redundant nuclear envelope and the type-switching of the NPC (from the MIC-type NPC to the MAC-type one) has been reported at this developmental stage (Iwamoto et al., JCS 2015). For example, the peculiar shaped nuclear envelope with the redundant/overlapping nuclear envelope structure can be seen and the MAC-type NPCs rapidly assembles to the expanding nuclear envelope. It may be interesting to point out that cardiolipin and Drp6 may be involved in these phenomena. But it is too speculative and therefore consider adding such a discussion as an option.
      8. page 13, line 412: Is the word "GFP-drp6-I553M" written in italics intended for the gene for the GFP-drp6-I553M protein? If so, protein may be acceptable here. Make sure there are no problems with italicized characters. Also, check if the lowercase letter "d" in "drp6" is OK because large letters are used in other cases.
      9. page 20, figure 1: I recommend switching the positions of HDyn1 and Drp6 in Figure 1a to keep the order in Figure 1b. 
      10. page 21, line 671: Add the word "Tetrahymena" before "Drp 6" to pair with the word "human dynamin 1".
      11. page 23, line 729: Remove "and."
      12. page 23, lines 729 and 731: Unify the expression of "cardiolipin" and "Cardiolipin"
      13. page 23, line 732: Add "or" before "10% Phosphatidylserin."
      14. page 24, Figure 3a: Please mark the position of I553M in the figure if possible. Alternatively, indicate the range of amino acid residues after the words "red" and "green" in the figure legend. 

      Significance

      The corresponding author and his colleagues have reported that Tetrahymena Drp6 is localized to the outer nuclear membrane of both macronucleus and micronucleus of Tetrahymena (Elde et al., 2005) and that Drp6 is required for the formation of new macronuclei during nuclear differentiation (Rahaman et al., 2008). Therefore, these parts are not novel.

      The novelty of this study is as follows: (1) The discovery of a specific amino acid residue (isoleucine at the 553rd) of Drp6 that is required for its nuclear membrane localization. (2) The discovery of a lipid molecule, cardiolipin, as a critical partner for Drp6's nuclear membrane targeting. (3) Discovery of involvement of cardiolipin in the new macronucleus formation (the expansion of macronuclear envelope) through the function of Drp6.

      I think their findings are highly novel and will provide new insight into a field of cell biology. Especially, their findings will contribute to understanding how specific proteins are targeted to specific intracellular membranes. In addition, their methods (such as floatation assay) for analyzing the interaction between the protein of interest and lipid/liposomes will become an important tool.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      This is an interesting study from the Rahaman group that identifies cardiolipin (CL) as a potential binding target for Drp6 recruitment to the nuclear membrane in Tetrahymena (that has a unique nuclear remodeling program). In addition, they identify a residue, I553 in the DTD region, which they claim is a key residue involved in specific CL interactions. While the experiments themselves are technically sound, and are well performed and controlled, I don't find the major conclusion that I553 is involved in direct CL interactions justified or well rationalized. By their own admission (in the discussion), the conservative mutation I553M may perturb local folding and may indirectly affect CL interactions. There is no test of DTD folding with and without the I553M mutation, nor are there other mutations (e.g. I553A and in the vicinity) tested. CL interactions generally involve a combination of electrostatic and hydrophobic forces. Where do the electrostatic interactions come from? Why would an Isoleucine to Methionine mutation affect the hydrophobic component, even if I553 is the key hydrophobic residue? Additional experiments are therefore essential to identify the actual residues involved in specific CL interactions. CD experiments in the absence and presence of CL-containing membranes will likely yield information on the impact of the I553 mutations, while DLS experiments would inform on the hydrodynamic properties (overall 3D fold) of the DTD and the impact of these mutations.

      The writing is not clear in some parts and may require a round of language editing. There are no issues with reproducibility.

      Significance

      The addressed phenomenon is restricted to Tetrahymena and may not have far reaching implications. Regardless, the identification of CL as a binding target for Drp6 at the nuclear membrane of this organism is in itself significant. The conclusion that I553 is the key CL binding residue is however not warranted. Additional experiments are needed to dissect how this residue impacts CL interactions and examine whether the observed effect is direct or indirect.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      This is a fascinating and beautifully written article about the possible evolutionary relationship between two major protein superfamilies - the P-loop NTPases and the Rossmanns. Both are ancient and highly diverse superfamilies, containing a significant proportion of all extant domain sequences and were probably amongst the earliest enzyme superfamilies to emerge in evolution. No major evolutionary classification of proteins, such as SCOP, reports evolutionary relationships between them.

      Both share the same structural architecture of a beta-alpha-beta 3-layer sandwich and have an intriguing number of other shared structural features including the location of the binding site for phospho-ligands. However, whilst both bind phosphorylated ribonucleosides, the mode of binding differs and also the manner in which these compounds are exploited. Furthermore, there are differences in the topologies of the folds possibly suggesting distinct evolutionary trajectories. The Rossmanns appear to be more structurally conserved, whilst the P-Loops vary more in their topologies and possibly represent less stable arrangements of beta-sheets and alpha-helices.

      The authors have brought together several strands of evidence to explore possible evolutionary relationships. Detailed structural analyses allow the authors to explicitly detail the significant shared structural features, for example similarities in the mode of binding the phosphate moiety in the ligand. The structural features are well described and there are appropriate illustrations visualising key differences and similarities.

      The shared features of the phosphate binding site likely emerged and were favoured early in evolution, as supported by other analyses reported by Longo et al. However, as the authors point out there are other compelling similarities including the equivalent location of this site in the first beta-loop-alpha element in both superfamilies, which is not a necessary constraint of phosphate binding and the authors support this by giving examples of phosphate binding at the tip of alpha-4. In addition, they provide evidence supporting the common involvement of beta-2 which contains the conserved Asp in the Rossmanns common ancestor. The Walker-B Asp in the P-loops is also at the tip of the beta-strand adjacent to beta-1, as in the Rossmanns - although this is an inserted strand relative to the Rossmann topology. The authors propose feasible evolutionary scenarios for how the P-Loops and Rossmanns may have diverged to acquire additional secondary structure elements extending the common beta-PBL-alpha-beta-Asp feature present in both superfamilies.

      Further compelling evidence is given by detection of a bridging protein - Tubulin - linking the two superfamilies. This has the distinct Rossmann topology but binds GTP in the P-loop NTPase mode. Furthermore, the GTP is hydrolysed by water activated by a ligated metal dication. Final support is given by reporting common sequence themes between the P-loop enzyme HPr kinase/phosphatase and some Rossmann proteins. The authors present further interesting and detailed analyses of similarities between the proteins sharing this unusual theme.

      The evidence provided by the authors for the shared beta-PBL-alpha-beta-Asp fragment seems very strong to me and has been presented in an interesting and informative way. Of course, it is not possible to know the subsequent evolutionary trajectories but the scenarios presented seem plausible.

      We thank the reviewer for their encouraging remarks on our manuscript.

      **I only have minor comments**

      1) SCOP2 provides information on links between superfamilies based on rare sequence or structural features. Have the authors checked this resource for any details on beta-PBL-alpha-beta-ASP fragment? Or perhaps consulted with Alexey Murzin about this feature?

      The classification of Rossmann and P-Loop proteins in SCOP2 is consistent with the ECOD classification scheme. For further confirmation, we wrote to Alexey Murzin, who replied that Rossmanns and P-Loops are annotated as two separate evolutionary lineages, termed “hyperfamilies” in SCOP2. He found our new evidence compelling but noted that, given the current criteria for shared ancestry, P-loops and Rossmanns remain separate lineages.

      2) I was rather confused by the way in which EC annotations were collected for the two superfamilies ie via Pfam – wouldn’t it be better to use SUPERFAMILY as the domain structures would map directly to these sequence relatives. I’m also surprised that they only took the common EC from a Pfam family since the aim of this analysis was to identify how many different enzyme functions the two superfamilies supported. Pfam does not classify by function and so inevitably groups functionally diverse relatives. However, to get the full range of enzyme functions supported by these superfamilies I would have thought all non-redundant EC functions across these constituent Pfam families should be counted. Perhaps I have misunderstood.

      We have updated the analysis to make use of the SUPERFAMILY database and, as per your suggestion, we now count all non-redundant EC numbers. Although the EC number counts have somewhat changed, the major point – that these are exceptionally diverse evolutionary lineages – has not.
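
      The difference between the two counting strategies discussed in this exchange can be illustrated with a minimal sketch. It assumes that "the common EC" means keeping only the most frequent EC number per family, which is one reading of the reviewer's comment; the family names and EC assignments below are hypothetical placeholders, not data from the manuscript.

```python
from collections import Counter

# Hypothetical EC annotations per member sequence, grouped by family.
# Family names and EC assignments are illustrative placeholders only.
annotations = {
    "family_A": ["2.7.1.1", "2.7.1.1", "2.7.1.2"],
    "family_B": ["1.1.1.1", "1.1.1.27", "1.1.1.1"],
    "family_C": ["2.7.1.1", "2.7.1.1", "3.6.4.12"],
}

def most_common_ec_per_family(annotations):
    """Keep one EC number per family (the most frequent), then deduplicate."""
    return {Counter(ecs).most_common(1)[0][0] for ecs in annotations.values()}

def all_nonredundant_ecs(annotations):
    """Collect every distinct EC number seen in any family."""
    return {ec for ecs in annotations.values() for ec in ecs}

n_common = len(most_common_ec_per_family(annotations))
n_all = len(all_nonredundant_ecs(annotations))
print(f"one EC per family: {n_common}, all non-redundant ECs: {n_all}")
```

      In this toy input, taking one EC number per family hides the minority functions within each family, which is the undercounting the reviewer was concerned about.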

      3) The authors refer to a set of previously curated ‘themes’ and allude to a methodology that will be reported in a forthcoming manuscript. The idea of identifying rare themes and then using them to locate very distant homologues is appealing. However, I think some details should be provided here. For example, some brief details on the technology for detecting the themes and thresholds on significance. How rare are they and how conserved do these fragments need to be between superfamilies to join their curated list? Furthermore, how many of these curated themes are similar to the one reported in their article and do they get crosslinks to other superfamilies based on closely related themes? ie how unique is this theme to the P-loop and Rossmanns and are there closely related themes linking these two superfamilies to other superfamilies? I would imagine it is quite a distinct theme but I would have liked to see a few more details on this to reassure that there are no closely related themes.

      We have updated the manuscript to include a more detailed description of the methods used to detect bridging themes shared between the Rossmann and P-Loop evolutionary lineages. In addition, we now include a supplemental table (Table S2) with all of the initial hits from the theme analysis.

      4) The authors have built model structures to allow them to estimate ligand location in proteins with no structural characterisation. It would be helpful if they reported the degree of sequence similarity between the query and template proteins and also the model quality.

      We have updated this section to include more details. In addition, we have identified a structure from the same T-group to serve as our ligand donor. The updated ligand donor is more closely related to 1ko7 than the previous ligand donor, though the positioning of the ligand is effectively unchanged. We note that the global sequence identity to both the previous and new ligand donor is low (less than 30% sequence identity). However, the phosphate binding loops align well in both sequence and structure, as is detailed in the revised Methods section.


      The study by Longo et al. was devoted to evolutionary history of P-loop NTPases and Rossmann fold proteins. Although not related in sequence, the two protein families share some structural features that imply that they could be diverged from a common ancestor. Using bioinformatic analyses, the study under review identified some bridge proteins (of tubulin family) that share themes of both P-loops and Rossmanns, offering a possible support for the common ancestry. A minimum ancestral peptide structure is proposed based on the analysis and its possible diversification trajectory is hypothesized. Even though the divergence scenario is clearly outlined, the authors do not over-interpret the observations and admit that convergence could still explain the scenario. The methodology and results are sufficiently described and conclusions are explained in detail. Although it would be really interesting to design an experimental study to support the conclusion (and I suppose that the authors will do that), that is clearly outside the scope of this bioinformatic study.

      Obtaining experimental evidence for our hypothesis is far from trivial. Modern proteins, including the bridging ones identified here, may not be amenable to exchange due to differing contexts (epistasis). Still, we agree that highlighting experimental directions is a good idea. We have updated the sections "From an ancestral seed to intact domains" and "Conclusion" to include a brief discussion of experiments that may help test our hypotheses about the evolution of these protein lineages.

      I would not propose any major changes to the manuscript as I think that the message is very clear.

      **Minor comments:**

      (1) In the results section, the text is very clear but tends to be repetitive in places. I think the manuscript would be more easily readable if more to the point at some sections.

      We have edited the manuscript to remove cases of unnecessary repetition in the results section and throughout.

      (2) There are probably a few typos or unclear sentences, e.g. pg 5, mid-page, "The core, most common topology..."; pg 12, three lines from the bottom "(where this element in canonical", probably should be "is canonical"; pg 11, mid page "the mode of binding of the catalytic dication of tubuling (often Ca2+)" - all the structures listed in Table S1 list Mg2+, so "often" is a bit misleading.

      We have corrected the unclear sentences and typos noted above, as well as a few others.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The study by Longo et al. was devoted to evolutionary history of P-loop NTPases and Rossmann fold proteins. Although not related in sequence, the two protein families share some structural features that imply that they could be diverged from a common ancestor. Using bioinformatic analyses, the study under review identified some bridge proteins (of tubulin family) that share themes of both P-loops and Rossmanns, offering a possible support for the common ancestry. A minimum ancestral peptide structure is proposed based on the analysis and its possible diversification trajectory is hypothesized.

      Even though the divergence scenario is clearly outlined, the authors do not over-interpret the observations and admit that convergence could still explain the scenario. The methodology and results are sufficiently described and conclusions are explained in detail. Although it would be really interesting to design an experimental study to support the conclusion (and I suppose that the authors will do that), that is clearly outside the scope of this bioinformatic study.

      I would not propose any major changes to the manuscript as I think that the message is very clear.

      Minor comments:

      (1) In the results section, the text is very clear but tends to be repetitive in places. I think the manuscript would be more easily readable if more to the point at some sections.

      (2) There are probably a few typos or unclear sentences, e.g. pg 5, mid-page, "The core, most common topology..."; pg 12, three lines from the bottom "(where this element in canonical", probably should be "is canonical"; pg 11, mid page "the mode of binding of the catalytic dication of tubuling (often Ca2+)" - all the structures listed in Table S1 list Mg2+, so "often" is a bit misleading.

      Significance

      I think this is a very interesting analysis of the evolutionary history of the P-loop and Rossmann fold families, which are considered among the most ancient and abundant protein folds. That makes them of high interest also for the origins of protein structure. The results are not firmly conclusive (because of the limits of such analyses), making the outcomes of the study partly hypothetical. I think outlining suggestions for future experiments that could test the hypothesis would make the study more valuable to a broader audience.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is a fascinating and beautifully written article about the possible evolutionary relationship between two major protein superfamilies - the P-loop NTPases and the Rossmanns. Both are ancient and highly diverse superfamilies, containing a significant proportion of all extant domain sequences and were probably amongst the earliest enzyme superfamilies to emerge in evolution. No major evolutionary classification of proteins, such as SCOP, reports evolutionary relationships between them.

      Both share the same structural architecture of a beta-alpha-beta 3-layer sandwich and have an intriguing number of other shared structural features including the location of the binding site for phospho-ligands. However, whilst both bind phosphorylated ribonucleosides, the mode of binding differs and also the manner in which these compounds are exploited. Furthermore, there are differences in the topologies of the folds possibly suggesting distinct evolutionary trajectories. The Rossmanns appear to be more structurally conserved, whilst the P-Loops vary more in their topologies and possibly represent less stable arrangements of beta-sheets and alpha-helices.

      The authors have brought together several strands of evidence to explore possible evolutionary relationships. Detailed structural analyses allow the authors to explicitly detail the significant shared structural features, for example similarities in the mode of binding the phosphate moiety in the ligand. The structural features are well described and there are appropriate illustrations visualising key differences and similarities.

      The shared features of the phosphate binding site likely emerged and were favoured early in evolution, as supported by other analyses reported by Longo et al. However, as the authors point out there are other compelling similarities including the equivalent location of this site in the first beta-loop-alpha element in both superfamilies, which is not a necessary constraint of phosphate binding and the authors support this by giving examples of phosphate binding at the tip of alpha-4. In addition, they provide evidence supporting the common involvement of beta-2 which contains the conserved Asp in the Rossmanns common ancestor. The Walker-B Asp in the P-loops is also at the tip of the beta-strand adjacent to beta-1, as in the Rossmanns - although this is an inserted strand relative to the Rossmann topology. The authors propose feasible evolutionary scenarios for how the P-Loops and Rossmanns may have diverged to acquire additional secondary structure elements extending the common beta-PBL-alpha-beta-Asp feature present in both superfamilies.

      Further compelling evidence is given by detection of a bridging protein - Tubulin - linking the two superfamilies. This has the distinct Rossmann topology but binds GTP in the P-loop NTPase mode. Furthermore, the GTP is hydrolysed by water activated by a ligated metal dication. Final support is given by reporting common sequence themes between the P-loop enzyme HPr kinase/phosphatase and some Rossmann proteins. The authors present further interesting and detailed analyses of similarities between the proteins sharing this unusual theme.

      The evidence provided by the authors for the shared beta-PBL-alpha-beta-Asp fragment seems very strong to me and has been presented in an interesting and informative way. Of course, it is not possible to know the subsequent evolutionary trajectories but the scenarios presented seem plausible.

      I only have minor comments

      1) SCOP2 provides information on links between superfamilies based on rare sequence or structural features. Have the authors checked this resource for any details on beta-PBL-alpha-beta-ASP fragment? Or perhaps consulted with Alexey Murzin about this feature?

      2) I was rather confused by the way in which EC annotations were collected for the two superfamilies ie via Pfam - wouldn't it be better to use SUPERFAMILY as the domain structures would map directly to these sequence relatives. I'm also surprised that they only took the common EC from a Pfam family since the aim of this analysis was to identify how many different enzyme functions the two superfamilies supported. Pfam does not classify by function and so inevitably groups functionally diverse relatives. However, to get the full range of enzyme functions supported by these superfamilies I would have thought all non-redundant EC functions across these constituent Pfam families should be counted. Perhaps I have misunderstood.

      3) The authors refer to a set of previously curated 'themes' and allude to a methodology that will be reported in a forthcoming manuscript. The idea of identifying rare themes and then using them to locate very distant homologues is appealing. However, I think some details should be provided here. For example, some brief details on the technology for detecting the themes and thresholds on significance. How rare are they and how conserved do these fragments need to be between superfamilies to join their curated list? Furthermore, how many of these curated themes are similar to the one reported in their article and do they get crosslinks to other superfamilies based on closely related themes? ie how unique is this theme to the P-loop and Rossmanns and are there closely related themes linking these two superfamilies to other superfamilies? I would imagine it is quite a distinct theme but I would have liked to see a few more details on this to reassure that there are no closely related themes.

      4) The authors have built model structures to allow them to estimate ligand location in proteins with no structural characterisation. It would be helpful if they reported the degree of sequence similarity between the query and template proteins and also the model quality.

      Significance

      This article presents compelling new evidence on the evolutionary relationship between two major, ancient enzyme superfamilies. As far as I'm aware these insights are novel and the detection of the bridging protein relative and the common 'theme', i.e. beta-PBL-alpha-beta-Asp fragment, is a new discovery.

      This work makes an important contribution to understanding the evolution of two major enzyme superfamilies and the insights can guide future evolutionary studies and protein design studies.

      The audience will be structural and evolutionary biologists, both experimental and computational.

      My expertise is in protein evolution and protein structure analyses and I have published a number of reviews and articles analysing and discussing Rossmann-like superfamilies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RESPONSE TO REVIEWER #1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Ishihara et al. investigate and compare microtubule polymerization/depolymerization dynamics inside vs. at the periphery of microtubule asters in a cell-free Xenopus egg extract system. By tracking EB comets, which localize to growing microtubule ends, they find that the microtubule growth rates and EB comet lifetimes (interpreted as an indicator of microtubule catastrophe rates) are similar between the two spatially-distinct microtubule populations. However, using a tubulin-intensity-difference image analysis, the authors are also able to measure local microtubule depolymerization rates, and they find a significant difference in depolymerization rates of the two populations. Specifically, the authors report that the microtubule depolymerization rates measured within asters are faster than those measured at the periphery.

      **Specific comments:**

      Figure 2.

      In the text, the authors report: "The depolymerization rate was 36.3 ± 7.9 μm/min (mean, std) in the aster interior, compared to 29.2 ± 8.9 μm/min (mean, std) at the aster periphery." This difference is certainly not two-fold (as stated in the abstract). It would also be useful to mark the mean rates on the graph in 2B.

      We removed the words ‘almost two-fold’ in the abstract. In the revision, we will mark the mean rates on Fig. 2B (using vertical lines).
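
      As a rough check on the size of the difference discussed in this exchange, the reported means and standard deviations can be compared directly. This is a minimal sketch: the simple pooled standard deviation assumes equal sample sizes in the two regions, which the text does not state, and the variable names are illustrative.

```python
import math

# Values quoted from the manuscript text (mean, std in µm/min).
interior_mean, interior_std = 36.3, 7.9
periphery_mean, periphery_std = 29.2, 8.9

# Ratio of the mean depolymerization rates (interior / periphery).
fold_change = interior_mean / periphery_mean

# Standardized difference, assuming equal sample sizes (an assumption).
pooled_std = math.sqrt((interior_std**2 + periphery_std**2) / 2)
standardized_diff = (interior_mean - periphery_mean) / pooled_std

print(f"fold change: {fold_change:.2f}")
print(f"standardized difference: {standardized_diff:.2f}")
```

      The ratio of the means comes out at roughly 1.2, consistent with the reviewer's remark that the difference is well short of two-fold.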

      The bimodal shape of the depolymerization rate distributions in 2B is very interesting. This definitely warrants further investigation. At a minimum, the depolymerization rates should be determined at 50 µm intervals, as done for other parameters in Figure 1. Could it be that there are two coexisting populations of microtubules at the same location? Or is there a clear spatial compartmentalization of the two that is not obvious here because the distance interval used for the measurements is too large? This is a very important distinction for the claims of the paper.

      We understand the reviewer’s concern. There are some technical limitations that make the depolymerization measurement more challenging. While we use widefield imaging of EB1-GFP comets to obtain polymerization rates from a field of view spanning 500 microns, we may only use TIRF imaging for depolymerization measurements. In this method, we are limited to observing microtubules very close to the cover slip in a small field of view of 80x80 microns at 500 ms time intervals (movies span 1-2 minutes). One would need to move the TIRF field every 1-2 minutes at 50 micron intervals, but the aster periphery would be changing during this time, so the exact location of the measurement is hard to define. Thus, we opted to image the two spatial extremes: interior (close to the MTOCs) and the very periphery (where MT density is still sparse.)

      Perhaps the largest limitation of this approach is the choice of peripheral regions based on the apparent sparsity of MTs in the TIRF field of view. Indeed, when we examine the depolymerization rate distributions for individual movies separately (see figure below, periphery #1-3 are three individual movies), we observe that some movies have rates as low as 20 µm/min, while others have higher values centered around 36 µm/min. The depolymerization rates for the interior also vary, with mean values ranging from 34.8 to 43.2 µm/min (interior #1-3 are three individual movies). In general, the spread of depolymerization rate within a field of view as well as across different fields of view is much larger than for polymerization. It is possible that this is partly explained by the lack of precise definition of interior vs. periphery in this TIRF-based measurement approach.

      Our data still support the spatial regulation of depolymerization rate. However, there is no clear evidence for a bimodal distribution of depolymerization rate in any given field of view (80x80 micron square region). To clarify this point, we have removed the term “bimodal” from the main text. In the revisions, we will provide this figure as a supplement.

      We thank reviewers #1 and #2 for their critical feedback, which allowed us to clarify this issue of apparent bimodality of the depolymerization rates.

      The authors make a point here that the distribution of measured polymerization rates is fairly narrow. This appears to be in contrast with Figure 1B, where polymerization rates take on a wide range of values. How do the two distributions of polymerization rates obtained by these two methods compare?

      To address this point, we directly compare the standard deviation of the polymerization rate measurements. For Fig. 1B EB1 tracking measurements, std ranges from 7.7-10.5 µm/min for a given spatial bin (as stated in Fig. 1B legend), while for Fig. 2A TIRF measurements std is 4.0 (periphery) and 4.5 µm/min (interior) as stated in the main text. Given that the mean values of polymerization rates are similar, this suggests that the TIRF measurements are less noisy. This further highlights the relative pros and cons of the two measurement methods. To discuss these issues, we have added a new paragraph in the discussion section.

      Figure 3.

      The laser ablation figure and movies are beautiful, but don't seem to add support to the story. Importantly, the authors do not confirm any spatial variability in depolymerization rate with these experiments. As a matter of fact, although the laser ablation experiments are only performed in the aster interior, the measured depolymerization rates appear to be just as consistent with the periphery rates in Figure 2 as they are with the interior rates in Figure 2. (They span quite a large range of values, with the average right in the middle between what was measured for the two areas in Figure 2.)

      Indeed, the values obtained with laser ablation are quite variable, even compared to the physiological depolymerization rate measured via TIRF microscopy. This perhaps reflects the variability of biology as well as the nature of the laser ablation which measures depolymerization rate at the level of microtubule populations. We hope our paper will increase interest in this rarely measured parameter, and perhaps invention of new probes to measure it more accurately and conveniently.

      Given the variability of our measurements, we conclude that the results between the TIRF based approach vs. laser ablation based approach of depolymerization rates are indistinguishable. We agree with the reviewer that the data does NOT argue that laser ablation results are more consistent with the interior TIRF measurements than peripheral TIRF measurements.

      To clarify this point, we remove the following clause “, which was comparable to the modal value of the depolymerization rates in the aster interior (Fig. 2).”

      We change the concluding sentence of our laser ablation paragraph from

      “Overall, these observations suggest that depolymerization dynamics are similar for plus ends following a natural catastrophe vs. ablation in the aster interior.”

      to

      “Overall, these observations confirm that depolymerization rates are variable, and we find no statistical distinction of rates between plus ends following a natural catastrophe vs. ablation.”

      Although the authors report they don't see any correlation between the distance and depolymerization rate, they should still plot the rate as a function of initial cut positions (Figures 3D, 3E).

      To address this concern, we plan to provide a supplemental figure in the revision. Please see the preliminary figure below. Due to technical limitations with the laser ablation system (field of view for 60x magnification), we only have measurements that span 15-100 microns from the center.

      From the single decaying inward wave the authors conclude that microtubules depolymerize fully to their minus ends which are distributed throughout the aster. Can the possibility that depolymerization is stopped by microtubule lattice defects/islands be excluded by these observations?

      The existence of microtubule lattice defects is a recent development in the field and much remains unknown. If we assume that defects are structurally unstable, we predict that the episode of depolymerization will continue even when reaching a defect. If defects are stable and lead to instantaneous rescue of plus ends, we cannot distinguish the defects from minus ends. In this latter scenario, the interpretation of the decaying inward wave requires caution.

      What are the effects of the local increase in tubulin concentration due to the subunit release by depolymerization? What about the release of other lattice-binding MAPs (stabilizers)?

      We are interested in these questions as well. Soluble GDP-bound tubulin, released by depolymerization, is thought to exchange its nucleotide to GTP without need of a GEF, and no GEF is known. The dissociation rate of GDP is ~0.1 [1/sec], for a half-life of ~5 sec (Brylawski and Caplow, 1983, J. of Biol. Chem.), so we believe the tubulin subunits are recycled relatively quickly. It is not entirely obvious whether this necessarily results in a significant increase in ‘soluble’ tubulin concentration given tubulin diffusive transport. We hypothesize the main effect of stabilizing MAPs is on the depolymerization rate as discussed in our model in Fig. 5.
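
      The relationship between a first-order dissociation rate and a half-life can be made explicit with a short calculation. This is a minimal sketch that assumes simple first-order kinetics and takes the ~0.1 per second rate quoted above at face value; the variable names are illustrative.

```python
import math

# First-order dissociation rate quoted in the text (per second).
k_off = 0.1

# For first-order decay, N(t) = N0 * exp(-k * t), so:
half_life = math.log(2) / k_off   # time for half of the GDP to dissociate
mean_lifetime = 1.0 / k_off       # average time before dissociation

print(f"half-life: {half_life:.1f} s, mean lifetime: {mean_lifetime:.1f} s")
```

      Under these assumptions the half-life is on the order of several seconds, the same order of magnitude as the value quoted in the response, which supports the point that released subunits are recycled quickly.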

      Figure 4.

      Is the local depletion of tubulin/EB1 thought to be only within the narrow annulus at ~100 um distance, or is it not measurable on the inside due to the polymer signal? Can the two be separated? Such a sharp transition within a discrete annular region doesn't speak to the relative effects on the inside vs. the outside of the aster?!

      Yes, we also believe the soluble tubulin levels are even lower in the more inner regions of the aster. However, polymerized tubulin accounts for a large part of the fluorescence intensity in these inner regions, and our method does not faithfully reflect the soluble fraction. It will be important for future studies to employ specific methods that may unequivocally distinguish polymer vs. soluble tubulin concentrations (see below).

      More importantly, the local depletion of either tubulin or EB1 is not a good representation of a depletion of a MAP component that associates with the microtubule lattice. Both tubulin and EB1 bind preferentially to microtubule ends, not the lattice. Thus showing a profile of slight local tubulin and/or EB depletion does not seem to be relevant for the proposed model. Rather, overall microtubule polymer mass/density as a function of distance may be more relevant?

      Reviewer #1 makes a valid point that tubulin and EB1 are specifically incorporated to plus ends and not to the entire lattice as we assume for the MAPs in our theoretical model. To address this issue, we analyzed the fluorescence intensity of images obtained for a MAP that associates with the MT lattice, Tau-mCherry (Mooney et al. 2017). This quantification shows a depletion pattern similar to tubulin and EB1. Thus, we believe the local depletion is a general feature. For the revision, we plan to incorporate this Tau-mCherry data in Fig. 4.

      Figure 5.

      The toy model is intuitive and clear, but not sufficient without any experimental investigation. An attempt to quantify the actual distributions of at least one or a few selected proposed MAPs is needed. Is the depletion strongest where microtubule density is highest? What is the ratio of a MAP intensity to microtubule polymer density as a function of distance? How does that relate to local depolymerization rates? What are other testable model predictions that can show support for the proposed mechanism?

      We understand that our proposal is rather speculative, and the goal of this manuscript was to propose a hypothesis that may inspire others working on the assembly of intracellular organelles. Although Tau is not an endogenous component of the egg extract system, we believe that our new quantification of Tau-mCherry depletion adds credibility to our general proposal.

      Microtubule density is roughly uniform within the interior of the aster according to our current understanding (Ishihara et al. 2016 eLife). So the MAP:MT ratio is relatively uniform throughout the aster except at the very periphery where there are very few MTs assembled (i.e. “depletion is weakest where MT density is lowest.”)

      In the future, we may perform (1) FCS measurements of candidate MAPs to directly measure the concentration profile of each candidate MAP in soluble form and (2) depletion/addback experiments to show which MAP most affects the depolymerization rate. Although these experiments are appealing, they require the generation of new molecular reagents as well as the calibration of a highly specialized optical method. Therefore, we decided to limit this paper to the unusual observation of the variation in depolymerization rate and to speculate on the underlying mechanism.

      Also, the table is insufficiently described. Are any or all of these MAPs known to be specific regulators of microtubule depolymerization rates, but not other dynamics parameters?

      There are a large number of MAPs in Xenopus eggs, as there are in all cells, and the degree to which their effects on microtubules have been characterized is variable. To address this comment, we include in the revised manuscript a list of known MAPs that are present in Xenopus egg extract, along with their estimated concentrations from a published proteomic study. We annotate each MAP as to whether it increases or decreases microtubule stability, acknowledging that these data are very incomplete, that in some cases there is disagreement in the literature, and that we are combining pure-protein and whole-cell analyses. This table illustrates the challenge of associating dynamics regulation with any one MAP, since the behavior of microtubules is regulated by all these factors operating in parallel. That said, certain MAPs stand out as candidate depolymerization regulators that have been little studied for effects on dynamics, for example, MAP7.

      In the revision, we propose adding this expanded table as a supplementary table in addition to Table 1.

      | Protein Description | Gene Symbol | Est. Conc. (nM) | MT polymerization/nucleation/rescue? | MT depolymerization/catastrophe? | Lead reference |
      |---|---|---|---|---|---|
      | Microtubule-associated protein RP/EB family member 1 | MAPRE1 | 1800 | Increase | Decrease | PMID: 18364701 |
      | Stathmin | STMN1 | 1600 | Decrease | Increase | PMID: 11792540 |
      | MAP4 | MAP4 | 960 | Increase | Decrease | PMID: 7962090 |
      | Echinoderm microtubule-associated protein-like 2 | EML2 | 580 | Decrease | Increase | PMID: 11694528 |
      | EML4 protein | EML4 | 500 | Increase | Decrease | PMID: 17196341 |
      | Disks large-associated protein 5 | DLGAP5 | 380 | Increase | Decrease | PMID: 16631580 |
      | Cytoskeleton-associated protein 5 | CKAP5 | 300 | Increase | Increase | PMID: 23666085 |
      | Kinesin-like protein KIF2C | KIF2C | 200 | Decrease | Increase | PMID: 12620232 |
      | CAP-Gly domain-containing linker protein 1 | CLIP1 | 190 | na | na | |
      | Cytoskeleton-associated protein 4 | CKAP4 | 160 | Increase | Decrease | PMID: 9799226 |
      | Echinoderm microtubule-associated protein-like 1 | EML1 | 140 | na | na | |
      | Ensconsin | MAP7 | 91 | na | Decrease | PMID: 31391261 |
      | Targeting protein for Xklp2 | TPX2 | 91 | Increase | Decrease | PMID: 26414402 |
      | Microtubule-associated protein 1B | MAP1B | 85 | Increase | Decrease | PMID: 7664878 |
      | MAP1S | MAP1S | 66 | Decrease | Decrease | PMID: 25300793 |
      | Hyaluronan mediated motility receptor | HMMR | 61 | na | na | |
      | MAP7 domain-containing protein 1 | MAP7D1 | 47 | na | na | |
      | Cytoskeleton-associated protein 2 | CKAP2 | 46 | Increase | Decrease | PMID: 15504249 |
      | Microtubule-associated tumor suppressor 1 | MTUS1 | 43 | na | na | |
      | Kinesin-like protein KIF2A | KIF2A | 37 | Decrease | Increase | PMID: 29980677 |
      | CLIP-associating protein 1 | CLASP1 | 30 | Decrease | Decrease | PMID: 29937387 |
      | Microtubule-associated protein RP/EB family member 3 | MAPRE3 | 21 | Increase | Decrease | PMID: 20850319 |
      | MAP7 domain containing 2 protein variant 2 (Fragment) | MAP7D2 | 8 | na | na | |
      | CAP-Gly domain-containing linker protein 4 | CLIP4 | 2 | na | na | |

      Minor comments:

      Figure 1.

      typo in the figure legend: "interior (distance > 300 μm) vs. periphery (50 μm < distance < 280 μm)" There appears to be a clear dip in EB1 density at 100 um (Figure 1C). What could be the cause of that?

      Thank you for catching the typo. We corrected this to “periphery (distance > 300 µm) vs. interior (50 µm < distance < 280 µm)”.

      Figure 2.

      Note that the distances used in Figure 2. to define 'interior' and 'periphery' are completely different than those in Figure 1. (Interior in Figure 1 is defined to be between 50 and 280 um from the MTOC, and exterior larger than 300 um. However, in Figure 2. interior is defined as less than 100 um, and exterior as larger than 200 um.) Given that the asters are actively growing, it would be good to clearly explain how these intervals were defined in each case.

      For both experiments, we had clearly stated the definitions of interior and periphery, either in the figure legends or in the methods section. We have added a new paragraph explaining why we could not choose exactly the same quantitative definitions for these two methods (please also see our reply to Reviewer #2 comment 1).

      In the periphery movie, there are several notable examples of apparent minus-end depolymerization and treadmilling. The authors state these are very rare - perhaps a quantification would be useful here?

      Thank you for pointing this out. We modified the sentence to reflect the outward depolymerization events in the periphery. “We observed few outward-moving depolymerization events (

      Reviewer #1 (Significance (Required)):

      The observation of distinct depolymerization rates within vs. at the periphery of microtubule asters is novel and interesting. However, the manuscript in its current form is rather preliminary. The observation can be significantly strengthened by additional experiments/analysis that would characterize the effect in more detail. Even more importantly, the authors propose a highly speculative (although compelling) mechanism, but make no attempt to test it in any way. This is a major deficiency of the current manuscript that should be addressed prior to publication.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2 that our comments are both overlapping and complementary. I also find Reviewer #2's comments fair and reasonable and see no need for further adjustments.

      RESPONSE TO REVIEWER #2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      SUMMARY

      This paper reports measurements of microtubule dynamics in interphase asters nucleated in Xenopus egg extracts. Dynamics are measured using two methods. First tracking of GFP tagged EB1 protein forming comets at the tips of growing microtubules, as used in other studies, which can only measure growth rates. Second using a recently developed automated tracking based on subtractive difference images of fluorescently labelled microtubules, which can measure both growth and shrinkage rates. The main and novel observation of this paper, using difference image tracking, is that the MT shrinkage rate is ~2 fold faster in the interior of the aster compared with the periphery of the aster, whilst rates of MT polymerisation and catastrophe vary only slightly, if at all. The authors speculate that this might be due to a reduced MAP concentration and occupancy in the aster interior. They also discuss the role of a depletion-dependent increased shrinkage rate as a feedback mechanism to maintain a low MT polymer density in the aster interior.

      MAJOR COMMENTS

      The movies are startling in their beauty and clarity and the key conclusion that the shrinkage rate is significantly faster in the interior compared to the periphery of the aster is convincing.

      The observation that the net MT plus-end growth rate is ~10% faster at the periphery compared to the interior of the aster is supported only by the EB1 tip-tracking method. The difference imaging method shows no significant difference in rates. The authors need to discuss this discrepancy between the established and new methods of analysis. It is insufficient to state that the growth rates obtained by the two methods are "consistent".

      This comment prompts a comparison of the two methods (EB1 tracking vs. TIRF difference imaging). On one hand, EB1 tracking is more sensitive in detecting plus ends and allows large-N observations, so it is more likely to show statistical significance. On the other hand, the EB1 tracking method is noisier (higher standard deviation) than the TIRF-based measurements (see our response to Reviewer #1). In TIRF difference imaging, the exact location of the periphery (relative to the center as well as to the overall microtubule density profile) is hard to evaluate.

      What is consistent between the two methods is the approximate mean value of polymerization rates. The 10% faster polymerization velocity is only suggested by the EB1 tracking method, calling for caution/further investigation. However, the potential relatively small difference in polymerization rate is not the main point of this paper.

      We deleted the sentence in the results section for the TIRF method: “These values of polymerization rates are consistent with EB1 comet tracking (Fig. 1). ” We have added a new paragraph discussing the discrepancies between the methods in reporting polymerization rate.

      The discussion proposing MAP depletion-dependent increased shrinkage rate as a feedback mechanism to limit MT polymer density is reasonable.

      The model and discussion of the role of MAPs might be criticised as highly speculative and unsupported by any experimental data. The authors do acknowledge this. Whether the ratio of data to speculative interpretation is appropriate will be an editorial decision for whichever journal ultimately hosts this.

      Thank you. This is exactly the kind of comment that we wanted to hear from an initiative like Review Commons. It helps us gauge how our work is received and decide to which journal to submit it.

      In particular, since the aster forms by growth from the nucleating bead, early in its formation the final interior MTs must have first formed the peripheral MTs and could therefore enter fresh media and bind MAPs. The authors show by calculation that as the aster expands, these MTs and MAPs become isolated from mixing with the external media. This isolation would then suggest that any MAPs released by dissociation or MT depolymerisation must remain in the interior, and are therefore available to rebind to newly formed MTs. So, it is unclear why the MAPs should be depleted in the interior compared to the periphery, unless expansion of the aster is slowed, in which case additional MAPs could diffuse into the stationary periphery from the surrounding media. The kinetics of MT growth, MAP binding and aster expansion would then also be expected to have an effect on the outcome beyond a simple "depletion" of the internal MAP concentration.

      We use the term “depletion” to mean a significant decrease of MAP from the cytoplasm. As outlined in our toy model, more MTs lead to more MAP binding and depletion of soluble MAPs. Note that the total local abundance of MAP is constant unless there is significant diffusive transport of MAP from one region to another. We argue this transport is ineffective for the large length scale of interphase asters.
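      The depletion argument above can be illustrated with a minimal sketch. It assumes simple 1:1 equilibrium binding between a MAP and lattice binding sites in a closed local volume (total MAP conserved, no diffusive exchange); this is an illustrative simplification and not necessarily the form of the toy model in the manuscript. The function name, the dissociation constant, and the concentrations are hypothetical.

```python
import math


def free_map_fraction(map_total_nM, site_total_nM, kd_nM):
    """Fraction of a MAP that stays soluble at 1:1 binding equilibrium.

    Assumes a closed local volume: total MAP and total lattice binding
    sites are fixed, and only their bound/free partition changes.
    """
    s = map_total_nM + site_total_nM + kd_nM
    # Smaller root of the quadratic for the bound complex concentration.
    bound = (s - math.sqrt(s * s - 4.0 * map_total_nM * site_total_nM)) / 2.0
    return (map_total_nM - bound) / map_total_nM


if __name__ == "__main__":
    # Hypothetical numbers: 100 nM MAP, Kd = 1000 nM, increasing polymer.
    for sites in (0.0, 100.0, 1000.0, 10000.0):
        print(f"sites = {sites:7.0f} nM -> free MAP fraction = "
              f"{free_map_fraction(100.0, sites, 1000.0):.3f}")
```

      In this sketch the soluble fraction falls monotonically as the concentration of lattice sites rises, while the total local MAP abundance stays constant, which is the sense in which we use “depletion”.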

      It is also not clear how the authors' preferred model would account for the suggestion of bimodal shrinkage rates. It is not clear if this is a simplification (binning things into external and internal) applied for the purposes of discussion.

      Please see our comment to Reviewer #1. We now believe there is no evidence for bimodality of depolymerization rates. The spread of the data reflects the variability of depolymerization rates within a given field of view as well as the variability across multiple fields of view.

      MINOR COMMENTS

      Line 71

      Authors reference Gardner et al 2011, when discussing depolymerisation as a zero order process, as showing a free tubulin dimer concentration effect on shrinkage rates. However, the results in Gardner refer to the off rate during MT polymerisation, and measurements of rapid small scale events during overall growth phases and would be applicable to GTP-heterodimers, whereas the extended shrinkage events measured in this paper would presumably apply to post-catastrophe GDP-heterodimer dissociation and may not be comparable. The reference should be omitted or a further explanation given.

      Thanks, good point. We wanted to cite Gardner et al (2011) to make the point that classic assembly models may not always hold, but the reviewer is correct, that paper only looked at concentration dependence of depolymerization at growing ends. The text was changed to:

      “This assumption has been questioned for growing ends (Gardner 2011), but not for shrinking ends to our knowledge.”

      Line 89

      States "density of plus ends is approximately homogenous within interphase asters"

      However, in results section it is stated Line 111 that "the plus end density is lower at the periphery compared to the aster center".

      Please clarify

      The plus end density is approximately homogenous from the center to the periphery of the aster. The density drops only in the most peripheral region, where there are few microtubules.

      Line 135

      The distances given for the interior and periphery appear to be mixed up.

      Thank you, we corrected this.

      Line 277

      "approximately consistent with our Peclet number estimate". 50µm gives a Pe value of 2.8. The Peclet number "significance" is earlier given in terms of "Pe>>1" (Line 255). Please clarify what range of experimental values is required for the argument to hold.

      Our statement was unclear. We modified the sentence in the following way to clarify our point: “The half-width of the depleted zone extended ~50 microns beyond the growing aster periphery, which is smaller than the typical aster radius. This analysis indicated that soluble protein levels may vary between subregions of growing asters due to subunit consumption.”

      Line 404

      needs details of the GFP-EB1 and fluorescent tubulin used in this experiment.

      The detailed concentrations are described for each method in the subsequent sections. To avoid confusion, we removed the sentence in line 404, which omitted details.

      The tubulin depletion measurements detect a 4% reduction in tubulin concentration in the interior versus the exterior, and the same for eGFP-EB1 (Fig.4B). This observation provides important support for the depletion proposal. But the experiments apparently lack a control for potential reduction of fluorescence excitation intensity with depth in these deep specimens (equivalent to the inner filter effect in spectroscopy). Is there a component whose apparent concentration (fluorescence emission intensity) does not decrease by 4% in the interior of the aster?

      Indeed, fluorescence intensity measurements require special attention. Our samples are made by squashing 4 µl of extract under an 18 mm x 18 mm coverslip, and the resulting thickness is 10 microns, which we believe is too small a distance to result in an inner filter effect.
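      The quoted thickness can be checked with simple arithmetic, sketched below under the assumption that the extract spreads evenly under the whole coverslip with no loss at the edges. The function name is hypothetical.

```python
def film_thickness_um(volume_ul, side_mm):
    """Thickness of a liquid film spread evenly under a square coverslip."""
    volume_mm3 = volume_ul          # 1 microliter equals 1 cubic millimeter
    area_mm2 = side_mm * side_mm    # square coverslip footprint
    return volume_mm3 / area_mm2 * 1000.0  # convert mm to micrometers


if __name__ == "__main__":
    print(f"{film_thickness_um(4.0, 18.0):.1f} um")  # prints "12.3 um"
```

      Under that assumption the film is about 12 µm thick, the same order as the 10 microns stated above.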

      In response to Reviewer #2’s request for an example of a component whose fluorescence intensity is uniform, we provide the intensity profile of the inert 10kDa Dextran labeled with Alexa568. This serves as a control for the reviewer’s specific concern with our method. We will incorporate this as a supplementary figure in the revision.

      There is no direct discussion of the relative lifetime of MTs in the interior compared to the exterior of the aster. Catastrophe rates and growth rates are essentially invariant; I think this implies that MT lifetimes are essentially the same in the interior versus the exterior? Please confirm and estimate the lifetime. This could exclude a maturation process whereby one set of MAPs got replaced by another over time?

      Indeed, MT lifetime is a function of four rates: polymerization, depolymerization, catastrophe, and rescue. The figure below shows the MT lifetime as a function of depolymerization rate, assuming the other parameters are fixed at the values we found in our previous report (Ishihara et al. 2016). In regions with a fast depolymerization rate of 40 µm/min, the microtubule lifetime is 0.98 min. As the depolymerization rate decreases to 30 and 25 µm/min, the lifetime increases to 1.5 and 2.4 min, respectively. This implies that the microtubules at the aster periphery are longer lived than those in the interior.
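      A minimal sketch of how such a dependence can arise is given here. It assumes the standard two-state dynamic instability model in the bounded regime, where a microtubule that starts growing from zero length has mean lifetime (v_g + v_s) / (v_s f_cat - v_g f_res). The function name and the default parameter values (v_g = 30 µm/min, f_cat = 3.3 per min, f_res = 2.0 per min) are assumptions chosen for illustration and are not stated in this reply.

```python
def mt_lifetime_min(v_shrink, v_grow=30.0, f_cat=3.3, f_res=2.0):
    """Mean lifetime (min) of a microtubule in the two-state model.

    v_shrink, v_grow: depolymerization and polymerization rates (um/min).
    f_cat, f_res: catastrophe and rescue frequencies (1/min).
    Default values are illustrative assumptions.
    """
    drift = v_shrink * f_cat - v_grow * f_res
    if drift <= 0:
        # Unbounded growth regime: the mean lifetime diverges.
        raise ValueError("parameters are outside the bounded regime")
    return (v_grow + v_shrink) / drift


if __name__ == "__main__":
    for v_s in (40.0, 30.0, 25.0):
        print(f"v_shrink = {v_s:.0f} um/min -> "
              f"lifetime = {mt_lifetime_min(v_s):.2f} min")
```

      With these assumed values the sketch gives lifetimes close to those quoted above and shows the same trend: slower depolymerization lengthens the lifetime.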

      Association and dissociation rate constants have not been measured for most MAPs, but in general we expect binding and unbinding to be fast compared to the MT lifetime of ~1 minute. Most MAPs bind in the low micromolar or high nanomolar regime, which implies dissociation on a timescale of seconds or less. MAP4 and MAP7 were both shown to bind and dissociate rapidly in living cells (PMID: 16714020, PMID: 11719555).
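      The link between affinity and dissociation time can be sketched as follows, using the relation k_off = K_d × k_on. The association rate constants below are hypothetical order-of-magnitude values, not measurements for any particular MAP, and the function name is illustrative.

```python
def dwell_time_s(kd_molar, k_on_per_molar_per_s):
    """Mean bound time (s) from K_d and an assumed association rate constant."""
    k_off = kd_molar * k_on_per_molar_per_s  # dissociation rate constant, 1/s
    return 1.0 / k_off


if __name__ == "__main__":
    # Hypothetical affinities and association rate constants.
    for kd in (1e-6, 1e-7):
        for k_on in (1e6, 1e7):
            print(f"Kd = {kd:.0e} M, k_on = {k_on:.0e} /M/s -> "
                  f"dwell time = {dwell_time_s(kd, k_on):.1f} s")
```

      Under these assumptions the bound time ranges from a fraction of a second to about ten seconds, which is short compared to an MT lifetime of roughly one minute.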

      Reviewer #2 (Significance (Required)):

      This paper is significant as it is the first observation of spatial variation in MT shrinkage rates in an aster. It proposes the broad shape of an underlying mechanism (depletion of stabilising MAPs in the aster interior) and presents sound quantitative arguments, but the experiments do not directly test this mechanism. Aster formation in Xenopus egg extracts is widely used as a model system, and if indeed the spatial variation turns out to be due to spatial depletion of components then this will become a landmark paper. The paper may promote wider use of this method of automated analysis and encourage study of shrinkage rate mechanisms in other systems.

      REFEREES CROSS COMMENTING

      In my opinion the comments of reviewer #1 are fair and reasonable and overlap with and complement my own. In my opinion there is zero conflict requiring adjustment.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      SUMMARY

      This paper reports measurements of microtubule dynamics in interphase asters nucleated in Xenopus egg extracts. Dynamics are measured using two methods. First tracking of GFP tagged EB1 protein forming comets at the tips of growing microtubules, as used in other studies, which can only measure growth rates. Second using a recently developed automated tracking based on subtractive difference images of fluorescently labelled microtubules, which can measure both growth and shrinkage rates. The main and novel observation of this paper, using difference image tracking, is that the MT shrinkage rate is ~2 fold faster in the interior of the aster compared with the periphery of the aster, whilst rates of MT polymerisation and catastrophe vary only slightly, if at all. The authors speculate that this might be due to a reduced MAP concentration and occupancy in the aster interior. They also discuss the role of a depletion-dependent increased shrinkage rate as a feedback mechanism to maintain a low MT polymer density in the aster interior.

      MAJOR COMMENTS

      The movies are startling in their beauty and clarity and the key conclusion that the shrinkage rate is significantly faster in the interior compared to the periphery of the aster is convincing.

      The observation that the net MT plus-end growth rate is ~10% faster at the periphery compared to the interior of the aster is supported only by the EB1 tip-tracking method. The difference imaging method shows no significant difference in rates. The authors need to discuss this discrepancy between the established and new methods of analysis. It is insufficient to state that the growth rates obtained by the two methods are "consistent".

      The discussion proposing MAP depletion-dependent increased shrinkage rate as a feedback mechanism to limit MT polymer density is reasonable.

      The model and discussion of the role of MAPs might be criticised as highly speculative and unsupported by any experimental data. The authors do acknowledge this. Whether the ratio of data to speculative interpretation is appropriate will be an editorial decision for whichever journal ultimately hosts this.

      In particular, since the aster forms by growth from the nucleating bead, early in its formation the final interior MTs must have first formed the peripheral MTs and could therefore enter fresh media and bind MAPs. The authors show by calculation that as the aster expands, these MTs and MAPs become isolated from mixing with the external media. This isolation would then suggest that any MAPs released by dissociation or MT depolymerisation must remain in the interior, and are therefore available to rebind to newly formed MTs. So, it is unclear why the MAPs should be depleted in the interior compared to the periphery, unless expansion of the aster is slowed, in which case additional MAPs could diffuse into the stationary periphery from the surrounding media. The kinetics of MT growth, MAP binding and aster expansion would then also be expected to have an effect on the outcome beyond a simple "depletion" of the internal MAP concentration.

      It is also not clear how the authors' preferred model would account for the suggestion of bimodal shrinkage rates. It is not clear if this is a simplification (binning things into external and internal) applied for the purposes of discussion.

      MINOR COMMENTS

      Line 71 Authors reference Gardner et al 2011, when discussing depolymerisation as a zero order process, as showing a free tubulin dimer concentration effect on shrinkage rates. However, the results in Gardner refer to the off rate during MT polymerisation, and measurements of rapid small scale events during overall growth phases and would be applicable to GTP-heterodimers, whereas the extended shrinkage events measured in this paper would presumably apply to post-catastrophe GDP-heterodimer dissociation and may not be comparable. The reference should be omitted or a further explanation given.

      Line 89 States "density of plus ends is approximately homogenous within interphase asters" However, in results section it is stated Line 111 that "the plus end density is lower at the periphery compared to the aster center". Please clarify

      Line 135 The distances given for the interior and periphery appear to be mixed up.

      Line 277 "approximately consistent with our Peclet number estimate". 50µm gives a Pe value of 2.8. The Peclet number "significance" is earlier given in terms of "Pe>>1" (Line 255). Please clarify what range of experimental values is required for the argument to hold.

      Line 404 needs details of the GFP-EB1 and fluorescent tubulin used in this experiment.

      The tubulin depletion measurements detect a 4% reduction in tubulin concentration in the interior versus the exterior, and the same for eGFP-EB1 (Fig.4B). This observation provides important support for the depletion proposal. But the experiments apparently lack a control for potential reduction of fluorescence excitation intensity with depth in these deep specimens (equivalent to the inner filter effect in spectroscopy). Is there a component whose apparent concentration (fluorescence emission intensity) does not decrease by 4% in the interior of the aster?

      There is no direct discussion of the relative lifetime of MTs in the interior compared to the exterior of the aster. Catastrophe rates and growth rates are essentially invariant; I think this implies that MT lifetimes are essentially the same in the interior versus the exterior? Please confirm and estimate the lifetime. This could exclude a maturation process whereby one set of MAPs got replaced by another over time?

      Significance

      This paper is significant as it is the first observation of spatial variation in MT shrinkage rates in an aster. It proposes the broad shape of an underlying mechanism (depletion of stabilising MAPs in the aster interior) and presents sound quantitative arguments, but the experiments do not directly test this mechanism. Aster formation in Xenopus egg extracts is widely used as a model system, and if indeed the spatial variation turns out to be due to spatial depletion of components then this will become a landmark paper. The paper may promote wider use of this method of automated analysis and encourage study of shrinkage rate mechanisms in other systems.

      REFEREES CROSS COMMENTING

      In my opinion the comments of reviewer #1 are fair and reasonable and overlap with and complement my own. In my opinion there is zero conflict requiring adjustment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Ishihara et al. investigate and compare microtubule polymerization/depolymerization dynamics inside vs. at the periphery of microtubule asters in a cell-free Xenopus egg extract system. By tracking EB comets, which localize to growing microtubule ends, they find that the microtubule growth rates and EB comet lifetimes (interpreted as an indicator of microtubule catastrophe rates) are similar between the two spatially-distinct microtubule populations. However, using a tubulin-intensity-difference image analysis, the authors are also able to measure local microtubule depolymerization rates, and they find a significant difference in depolymerization rates of the two populations. Specifically, the authors report that the microtubule depolymerization rates measured within asters are faster than those measured at the periphery.

      Specific comments:

      Figure 2. In the text, the authors report: "The depolymerization rate was 36.3 ± 7.9 μm/min (mean, std) in the aster interior, compared to 29.2 ± 8.9 μm/min (mean, std) at the aster periphery." This difference is certainly not two-fold (as stated in the abstract). It would also be useful to mark the mean rates on the graph in 2B.

      The bimodal shape of the depolymerization rate distributions in 2B is very interesting. This definitely warrants further investigation. At the minimum, the depolymerization rates should be determined at 50 um intervals, as done for other parameters in Figure 1. Could it be that there are two coexisting populations of microtubules at the same location? Or is there a clear spatial compartmentalization of the two that is not obvious here because the distance interval used for the measurements is too large? This is a very important distinction for the claims of the paper.

      The authors make a point here that the distribution of measured polymerization rates is fairly narrow. This appears to be in contrast with Figure 1B, where polymerization rates take on a wide range of values. How do the two distributions of polymerization rates obtained by these two methods compare?

      Figure 3. The laser ablation figure and movies are beautiful, but don't seem to add support to the story. Importantly, the authors do not confirm any spatial variability in depolymerization rate with these experiments. As a matter of fact, although the laser ablation experiments are only performed in the aster interior, the measured depolymerization rates appear to be just as consistent with the periphery rates in Figure 2 as they are with the interior rates in Figure 2. (They span quite a large range of values with the average right in the middle between what was measured for the two areas in Figure 2.)

      Although the authors report they don't see any correlation between the distance and depolymerization rate, they should still plot the rate as a function of initial cut positions (Figures 3D, 3E).

      From the single decaying inward wave the authors conclude that microtubules depolymerize fully to their minus ends which are distributed throughout the aster. Can the possibility that depolymerization is stopped by microtubule lattice defects/islands be excluded by these observations?

      What are the effects of the local increase in tubulin concentration due to the subunit release by depolymerization? What about the release of other lattice-binding MAPs (stabilizers)?

      Figure 4. Is the local depletion of tubulin/EB1 thought to be only within the narrow annulus at ~100 um distance, or is it not measurable on the inside due to the polymer signal? Can the two be separated? Such a sharp transition within a discrete annular region doesn't speak to the relative effects on the inside vs. the outside of the aster?!

      More importantly, the local depletion of either tubulin or EB1 is not a good representation of a depletion of a MAP component that associates with the microtubule lattice. Both tubulin and EB1 bind preferably to microtubule ends, not lattice. Thus showing a profile of slight local tubulin and/or EB depletion does not seem to be relevant for the proposed model. Rather, overall microtubule polymer mass/density as a function of distance may be more relevant?

      Figure 5. The toy model is intuitive and clear, but not sufficient without any experimental investigation. An attempt to quantify the actual distributions of at least one or a few selected proposed MAPs is needed. Is the depletion strongest where microtubule density is highest? What is the ratio of a MAP intensity to microtubule polymer density as a function of distance? How does that relate to local depolymerization rates? What are other testable model predictions that can show support for the proposed mechanism?

      Also, the table is insufficiently described. Are any or all of these MAPs known to be specific regulators of microtubule depolymerization rates, but not other dynamics parameters?

      Minor comments:

      Figure 1. typo in the figure legend: "interior (distance>300 μm) vs. periphery (50 μm<distance<280 μm)" There appears to be a clear dip in EB1 density at 100 um (Figure 1C). What could be the cause of that?

      Figure 2. Note that the distances used in Figure 2. to define 'interior' and 'periphery' are completely different than those in Figure 1. (Interior in Figure 1 is defined to be between 50 and 280 um from the MTOC, and exterior larger than 300 um. However, in Figure 2. interior is defined as less than 100 um, and exterior as larger than 200 um.) Given that the asters are actively growing, it would be good to clearly explain how these intervals were defined in each case.

      In the periphery movie, there are several notable examples of apparent minus-end depolymerization and treadmilling. The authors state these are very rare - perhaps a quantification would be useful here?

      Significance

      The observation of distinct depolymerization rates within vs. at the periphery of microtubule asters is novel and interesting. However, the manuscript in its current form is rather preliminary. The observation can be significantly strengthened by additional experiments/analysis that would characterize the effect in more detail. Even more importantly, the authors propose a highly speculative (although compelling) mechanism, but make no attempt to test it in any way. This is a major deficiency of the current manuscript that should be addressed prior to publication.

      REFEREES CROSS COMMENTING

      I agree with Reviewer #2 that our comments are both overlapping and complementary. I also find Reviewer #2's comments fair and reasonable and see no need for further adjustments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to express our appreciation for both the Editors’ and Reviewers’ efforts as essential contributions to the peer review process. We highly value the Reviewers’ constructive critique of our manuscript #RC-2020_00434R entitled “A drug repurposing screen identifies hepatitis C antivirals as inhibitors of the SARS-CoV2 main protease.”

      We appreciate the Reviewers’ thoughtful consideration of our work and feel their critiques and recommendations have significantly improved our manuscript. Taken together, we believe the additional data, clarification of data presentation, and revised discussion address the heart of the Reviewers’ previous concerns. Thus we feel the work is ready for reconsideration and will be an impactful addition to the literature appropriate for publication. Below we provide a breakdown and a point by point response to previous review critiques.

      Thank you for your attention. We look forward to your response.

      Best Wishes,

      Brian Kraemer, PhD ▪ Associate Director for Research Geriatric Research Education and Clinical Center ▪ Veterans Affairs Puget Sound Health Care System ▪ Research Professor ▪ Departments of Medicine, Psychiatry and Behavioral Sciences, and Pathology ▪ University of Washington ▪ 1660 South Columbian Way ▪ Seattle, WA 98108 ▪ Phone 206-277-1071 ▪ www.kraemerlab.uw.edu

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Baker et al. report the screening of a collection of ~6,070 drugs for their inhibitory activity against the enzymatic activity of the SARS-CoV-2 Mpro protein in vitro using two peptide substrates. 50 compounds with activity against Mpro were identified and tested for their dose-dependent effect in the same assay. Several hits were identified, among which are approved drugs that target the HCV protease.

      Indeed, there is an urgent need for effective drugs for SARS-CoV-2 infection, and high throughput screenings can discover novel candidates. However, the novelty of this work is quite limited, as former screens have been published with the same target using the same substrates. Moreover, as discussed below the translational impact of the hits discussed is also quite limited, particularly in the absence of antiviral data. Lastly, there are several overstatements in the write up and it will require major editing.

      **Major comments:**

      1. Were there any positive controls previously shown to potently inhibit the SARS-CoV-2 Mpro included in the screen (e.g. ebselen)? How did these perform in this assay?

      When first designing our protease assay, we did use ebselen as the initial control. Ebselen showed low potency in all our assays and was not considered as a positive control subsequently. It should be noted that ebselen failed to work against multiple substrates. It is possible that our buffer conditions prevented ebselen activity. See data plotted below. Once boceprevir was identified as a potent inhibitor, it was used in all subsequent assays as the positive control.

      It will be helpful if the authors would provide info re the 50 hits from prior screens conducted with this library of compounds - how promiscuous are they across screens? How toxic in cell based assays?

      We have updated the table to provide additional useful information as well as a footnote explaining statuses. The compounds in the Broad repurposing library are generally non-toxic and information about them can be found here: https://clue.io/repurposing

      The translational potential of the findings appears to be limited. The calculated IC50s for these drugs in the Mpro assay are very high (10-1000 fold higher) relative to their IC50 in an enzymatic assay involving the HCV protease: boceprevir (IC50 = 0.95 μM vs. 0.084 μM in HCV), ciluprevir (IC50 = 20.77 μM vs. 0.0087 in HCV), telaprevir (IC50 = 15.25 μM vs. 0.050 μM in HCV) (https://aac.asm.org/content/aac/57/12/6236.full.pdf). In the absence of antiviral data, the main statement of the manuscript that "the work presented here supports the rapid evaluation of previous HCV NS3/4A inhibitors for repurposing as a COVID-19 therapy." is thus an overstatement. Even if there is some activity, it is likely to be limited, as with the HIV protease inhibitors, and its chances of eliciting a meaningful clinical effect are low. Moreover, when used in monotherapy, some of these protease inhibitors have a very low genetic barrier to resistance.

      We have reworked the discussion to incorporate these concerns and limitations of our results.
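
      The fold differences behind this comment can be checked directly from the IC50 values quoted above. The following is a minimal sketch, not part of the manuscript or the review: the variable and function names are illustrative, and the ciluprevir HCV value, which is quoted without a unit, is assumed to be in μM like the others.

```python
# IC50 values as quoted in the reviewer's comment, in µM:
# (SARS-CoV-2 Mpro assay, HCV protease assay).
# Assumption: the ciluprevir HCV value (0.0087) is in µM; the quoted text omits the unit.
ic50_um = {
    "boceprevir": (0.95, 0.084),
    "ciluprevir": (20.77, 0.0087),
    "telaprevir": (15.25, 0.050),
}


def fold_change(mpro_ic50, hcv_ic50):
    """Return how many times higher the Mpro IC50 is than the HCV IC50."""
    return mpro_ic50 / hcv_ic50


# Fold difference per drug; larger values mean weaker inhibition of Mpro relative to HCV protease.
fold_changes = {drug: fold_change(*values) for drug, values in ic50_um.items()}

for drug, fc in fold_changes.items():
    print(f"{drug}: Mpro IC50 is about {fc:.0f}-fold higher than the HCV IC50")
```

      With the quoted numbers, the ratios come out at roughly 11-fold for boceprevir, 305-fold for telaprevir and, under the unit assumption above, about 2,400-fold for ciluprevir.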

      There are additional inaccurate or overstatements - e.g. line 61 "Probably the most successful approved antivirals are protease inhibitors such as atazanavir for HIV-1 and simeprevir for hepatitis C. [reviewed in 10 and 11]."

      We have reworded this statement: (Page 4, Lines 61-62)

      “There is precedence for targeting the protease, as this approach has been successful in treating both HIV-1 and hepatitis C (10,11).”

      The manuscript requires editing - e.g. structure of sentences, commas, spacing (including in the abstract) etc.

      The manuscript has been re-proofed throughout (see tracked changes version of manuscript)

      What is the take home message? The statement "Taken together this work suggests previous large-scale commercial drug development initiatives targeting hepatitis C NS3/4A viral protease should be revisited because some previous lead compounds may be more potent against SARS-CoV-2 Mpro than Boceprevir and suitable for rapid repurposing." is unclear.

      The take home message of the manuscript is that HCV-targeting protease inhibitors have potential in blocking the SARS-CoV-2 protease and a more thorough analysis of the space is needed. As the reviewer pointed out, the identified hits boceprevir and narlaprevir are less potent when targeting the SARS-CoV-2 protease as compared to the HCV protease. However, we believe this work does show the potential for screening HCV-targeting protease inhibitors that may not have made it to the clinic. For instance, boceprevir or narlaprevir analogs may be even more potent against Mpro. Further, we believe that these compounds would benefit from further optimization through medicinal chemistry.

      We have expanded the discussion to incorporate issues brought up here and in point 3.

      Reviewer #1 (Significance (Required)):

      Limited. As discussed above

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The SARS-CoV-2 pandemic is causing a serious health crisis globally. There are currently no specific medicines or vaccines to contain this virus. To address this issue, the authors developed an efficient fluorescent Mpro assay system and screened ~6070 previously used drugs in this article. Several compounds with activity against SARS-CoV-2 Mpro in vitro were found. Most hits are hepatitis C NS3/4A protease inhibitors with fair IC50 values. Besides, the authors found that most compounds identified in in silico screens lack activity against Mpro in kinetic protease assays.

      These research results are well proved and reproducible. But there are two minor questions I present below:

      1. In your Mpro assay optimization process you said substrate MCA-AVLQSGFR-K(Dnp)-K-NH2 had drastically lower rates of Mpro catalyzed hydrolysis and were not considered further in your assay development. And in your Fig.1 I saw extremely low RFU changes. But several nice inhibitors were screened using this substrate that was reported in April. Can you explain this result?

      The substrates used in our assay appear to be much more efficiently cleaved, at least with the buffer conditions and Mpro concentrations we tested. Variables including recombinant Mpro purity and activity, differences in assay buffer, and reader sensitivity may all play a role, but our best guess is that the substrate identified by Marcin Drag’s group (https://doi.org/10.1101/2020.04.29.068890) is more readily cleaved by Mpro. Although screening with other reported substrates is feasible given previous results, we believe Ac-Abu-Tle-Leu-Gln-AFC to be superior for use in high throughput screening because its cleavage kinetics yield an improved signal to background ratio.

      To exclude inhibitors possibly acting as aggregators, a detergent-based control should be run at the same time as the IC50 measurements.

      Compound aggregation is a concern, and our assays were all run with detergent in the buffer. Our buffer composition was 20mM Tris pH 7.8, 150mM NaCl, 1mM EDTA, 1mM DTT, 0.05% Triton X-100.

      Reviewer #2 (Significance (Required)):

      Nice work, but the significance of this article is now diminishing. Most of the screened hits have been reported in the last several months. Some inhibitor complex structures have been published or released on the Protein Data Bank. The novelty is missing. I suggest the authors add more results and resubmit.

      **Referees Cross-commenting**

      I agree with the other two reviewers' comments. The significance of this work is diminishing, but it is still of some interest. I think it can be published in a lower-impact journal if the authors address our suggestions.

      We concur with both reviewers that demonstration of antiviral activity would strengthen the impact of the manuscript. However, this work remains outside of the scope of feasibility at our institution. We believe that our screen and hit identification can stand on their own until further translational work can be completed.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this report, Baker et al. show that four inhibitors of hepatitis C virus (HCV) NS3/4 protease (ciluprevir, boceprevir, narlaprevir and telaprevir) are also effective inhibitors of the SARS-CoV-2 main protease (Mpro) in enzymatic assays, with lower IC50 values for narlaprevir and boceprevir (around 1 µM in their assay conditions). HCV NS3/4 inhibitors were identified after screening a library of >6,000 compounds of the Broad Institute, including approved drugs. Screening was done with fluorometric proteolytic assays.

      Experiments have been apparently well-done and results are sound. The manuscript needs editing.

      Reviewer #3 (Significance (Required)):

      Experiments have been apparently well-done and results are sound. However, this is a limited study since there are no data obtained in cell culture and a comparison of IC50 values of the selected drugs against HCV and SARS-CoV-2 proteases is missing. It is difficult to infer whether the drugs would be as effective against SARS-CoV-2 as against HCV, and otherwise, how much the doses should increase in order to have a therapeutic effect.

      The manuscript needs editing (see below) and the Discussion is poor. The results reported by authors are not new, and a discussion of the effects of HCV inhibitors on SARS-CoV-2 replication, based on previous publications is necessary to provide the appropriate context for the study.

      Here are some references on Covid-19 and HCV inhibitors, that in my opinion should be considered for discussion and proper citation. As correctly pointed out by Baker and co-workers, docking studies should be considered with caution, though.

      We appreciate the feedback and have now reworked and expanded the discussion to incorporate reviewer #1 and #3 comments and suggestions.

      1: Ghahremanpour MM, Tirado-Rives J, Deshmukh M, Ippolito JA, Zhang CH, de Vaca IC, Liosi ME, Anderson KS, Jorgensen WL. Identification of 14 Known Drugs as Inhibitors of the Main Protease of SARS-CoV-2. bioRxiv [Preprint]. 2020 Aug 28:2020.08.28.271957. doi: 10.1101/2020.08.28.271957. PMID: 32869018; PMCID: PMC7457600.

      2: Sacco MD, Ma C, Lagarias P, Gao A, Townsend JA, Meng X, Dube P, Zhang X, Hu Y, Kitamura N, Hurst B, Tarbet B, Marty MT, Kolocouris A, Xiang Y, Chen Y, Wang J. Structure and inhibition of the SARS-CoV-2 main protease reveals strategy for developing dual inhibitors against Mpro and cathepsin L. bioRxiv [Preprint]. 2020 Jul 27:2020.07.27.223727. doi: 10.1101/2020.07.27.223727. PMID: 32766590; PMCID: PMC7402059.

      3: Ma C, Sacco MD, Hurst B, Townsend JA, Hu Y, Szeto T, Zhang X, Tarbet B, Marty MT, Chen Y, Wang J. Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Res. 2020 Aug;30(8):678-692. doi: 10.1038/s41422-020-0356-z. Epub 2020 Jun 15. PMID: 32541865; PMCID: PMC7294525.

      4: Ke YY, Peng TT, Yeh TK, Huang WZ, Chang SE, Wu SH, Hung HC, Hsu TA, Lee SJ, Song JS, Lin WH, Chiang TJ, Lin JH, Sytwu HK, Chen CT. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed J. 2020 May 15:S2319-4170(20)30049-4. doi: 10.1016/j.bj.2020.05.001. Epub ahead of print. PMID: 32426387; PMCID: PMC7227517.

      5: Elzupir AO. Inhibition of SARS-CoV-2 main protease 3CLpro by means of α-ketoamide and pyridone-containing pharmaceuticals using in silico molecular docking. J Mol Struct. 2020 Dec 15;1222:128878. doi: 10.1016/j.molstruc.2020.128878. Epub 2020 Jul 10. PMID: 32834113; PMCID: PMC7347502.

      Additional computational studies:

      1: Hosseini FS, Amanlou M. Anti-HCV and anti-malaria agent, potential candidates to repurpose for coronavirus infection: Virtual screening, molecular docking, and molecular dynamics simulation study. Life Sci. 2020 Aug 8;258:118205. doi:10.1016/j.lfs.2020.118205. Epub ahead of print. PMID: 32777300; PMCID:PMC7413873.

      2: Hakmi M, Bouricha EM, Kandoussi I, Harti JE, Ibrahimi A. Repurposing of known anti-virals as potential inhibitors for SARS-CoV-2 main protease using molecular docking analysis. Bioinformation. 2020 Apr 30;16(4):301-306. doi:10.6026/97320630016301. PMID: 32773989; PMCID: PMC7392094.

      3: Chtita S, Belhassan A, Aouidate A, Belaidi S, Bouachrine M, Lakhlifi T. Discovery of Potent SARS-CoV-2 Inhibitors from Approved Antiviral Drugs via Docking Screening. Comb Chem High Throughput Screen. 2020 Jul 30. doi:10.2174/1386207323999200730205447. Epub ahead of print. PMID: 32748740.

      4: Alamri MA, Tahir Ul Qamar M, Mirza MU, Bhadane R, Alqahtani SM, Muneer I, Froeyen M, Salo-Ahen OMH. Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CLpro. J Biomol Struct Dyn. 2020 Jun 24:1-13. doi:10.1080/07391102.2020.1782768. Epub ahead of print. PMID: 32579061; PMCID:PMC7332866.

      5: Bafna K, Krug RM, Montelione GT. Structural Similarity of SARS-CoV2 Mpro and HCV NS3/4A Proteases Suggests New Approaches for Identifying Existing Drugs Useful as COVID-19 Therapeutics. ChemRxiv [Preprint]. 2020 Apr 21. doi: 10.26434/chemrxiv.12153615. PMID: 32511291; PMCID: PMC7263768.

      6: Eleftheriou P, Amanatidou D, Petrou A, Geronikaki A. In Silico Evaluation of the Effectivity of Approved Protease Inhibitors against the Main Protease of the Novel SARS-CoV-2 Virus. Molecules. 2020 May 29;25(11):2529. doi:10.3390/molecules25112529. PMID: 32485894; PMCID: PMC7321236.

      7: Wang J. Fast Identification of Possible Drug Treatment of Coronavirus Disease-19 (COVID-19) through Computational Drug Repurposing Study. J Chem Inf Model. 2020 Jun 22;60(6):3277-3286. doi: 10.1021/acs.jcim.0c00179. Epub 2020 May 4. PMID: 32315171; PMCID: PMC7197972.

      8: Chen YW, Yiu CB, Wong KY. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Res. 2020 Feb 21;9:129. doi: 10.12688/f1000research.22457.2. PMID: 32194944; PMCID: PMC7062204.

      Minor comments:

      We appreciate the time that the reviewer has taken to address grammatical changes and have addressed each throughout the manuscript with tracked changes.

      p.2, line 26: > appears as an attractive

      Manuscript edited

      p.2, line 27: > we show that the existing

      Manuscript edited

      p.2, line 33: > separate numbers and units, eg. 1.10 µM (this is a persisting error that should be corrected throughout the whole ms)

      Manuscript edited

      p.4, line 44: SARS virus should be referred to as SARS-CoV-1 throughout the whole manuscript. MERS-CoV is the name of the virus causing MERS

      Manuscript edited

      p.4, lines 61-62: > the selection of the specific compounds seems to be arbitrary... why atazanavir and not darunavir or other? The sentence should be rewritten.

      Rewritten as: “There is precedence for targeting the protease, as this approach has been successful in treating both HIV-1 and hepatitis C.”

      p.6, line 100: Citing Fig. 2B before completing the description of Fig. 1 is distracting. Authors should think of a better way to describe their results.

      This was a mistake and should have cited Fig 1B. Thank you for catching this.

      p.7, line 116: It is not clear what "10m-20,810" means

      This has been clarified to state: “ΔRFU at 10 minutes = 20,810 relative fluorescence units”

      p.7, lines 125-126: These sentences belong to an introduction, not appropriate in results section.

      We have removed these sentences.

      Figure 2. Part A is not necessary in results (ok for introduction). Black and purple dots in part B is not a good choice since they are difficult to distinguish, maybe orange and black is better.

      We have removed panel A, expanded the size of panel B and changed the color.

      Table 1: Status should be explained in a footnote (i.e the distinction between launched, P2/P3, phase 2, preclinical is not clear).

      The one compound indicated in P2/P3 development is now Phase 3 and the table has been updated. We have added a footnote:

      *Launched = compound approved for humans, though may only be approved for veterinary use in some countries

      Discussion. I think that subheadings are not necessary.

      Subheadings have been removed from the discussion.

      **Referees cross-commenting** I agree with reviewer no. 1 on the limited interest of the study. However, it could be published in a specialized lower-impact journal after addressing issues raised by reviewers 2 and 3 (likely to be completed in less than a month)

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this report, Baker et al. show that four inhibitors of hepatitis C virus (HCV) NS3/4 protease (ciluprevir, boceprevir, narlaprevir and telaprevir) are also effective inhibitors of the SARS-CoV-2 main protease (Mpro) in enzymatic assays, with lower IC50 values for narlaprevir and boceprevir (around 1 µM in their assay conditions). HCV NS3/4 inhibitors were identified after screening a library of >6,000 compounds of the Broad Institute, including approved drugs. Screening was done with fluorometric proteolytic assays.

      Experiments have been apparently well-done and results are sound. The manuscript needs editing.

      Significance

      Experiments have been apparently well-done and results are sound. However, this is a limited study since there are no data obtained in cell culture and a comparison of IC50 values of the selected drugs against HCV and SARS-CoV-2 proteases is missing. It is difficult to infer whether the drugs would be as effective against SARS-CoV-2 as against HCV, and otherwise, how much the doses should increase in order to have a therapeutic effect. The manuscript needs editing (see below) and the Discussion is poor. The results reported by authors are not new, and a discussion of the effects of HCV inhibitors on SARS-CoV-2 replication, based on previous publications is necessary to provide the appropriate context for the study. Here are some references on Covid-19 and HCV inhibitors, that in my opinion should be considered for discussion and proper citation. As correctly pointed out by Baker and co-workers, docking studies should be considered with caution, though.

      1: Ghahremanpour MM, Tirado-Rives J, Deshmukh M, Ippolito JA, Zhang CH, de Vaca IC, Liosi ME, Anderson KS, Jorgensen WL. Identification of 14 Known Drugs as Inhibitors of the Main Protease of SARS-CoV-2. bioRxiv [Preprint]. 2020 Aug 28:2020.08.28.271957. doi: 10.1101/2020.08.28.271957. PMID: 32869018; PMCID: PMC7457600.

      2: Sacco MD, Ma C, Lagarias P, Gao A, Townsend JA, Meng X, Dube P, Zhang X, Hu Y, Kitamura N, Hurst B, Tarbet B, Marty MT, Kolocouris A, Xiang Y, Chen Y, Wang J. Structure and inhibition of the SARS-CoV-2 main protease reveals strategy for developing dual inhibitors against Mpro and cathepsin L. bioRxiv [Preprint]. 2020 Jul 27:2020.07.27.223727. doi: 10.1101/2020.07.27.223727. PMID: 32766590; PMCID: PMC7402059.

      3: Ma C, Sacco MD, Hurst B, Townsend JA, Hu Y, Szeto T, Zhang X, Tarbet B, Marty MT, Chen Y, Wang J. Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Res. 2020 Aug;30(8):678-692. doi: 10.1038/s41422-020-0356-z. Epub 2020 Jun 15. PMID: 32541865; PMCID: PMC7294525.

      4: Ke YY, Peng TT, Yeh TK, Huang WZ, Chang SE, Wu SH, Hung HC, Hsu TA, Lee SJ, Song JS, Lin WH, Chiang TJ, Lin JH, Sytwu HK, Chen CT. Artificial intelligence approach fighting COVID-19 with repurposing drugs. Biomed J. 2020 May 15:S2319-4170(20)30049-4. doi: 10.1016/j.bj.2020.05.001. Epub ahead of print. PMID: 32426387; PMCID: PMC7227517.

      5: Elzupir AO. Inhibition of SARS-CoV-2 main protease 3CLpro by means of α-ketoamide and pyridone-containing pharmaceuticals using in silico molecular docking. J Mol Struct. 2020 Dec 15;1222:128878. doi: 10.1016/j.molstruc.2020.128878. Epub 2020 Jul 10. PMID: 32834113; PMCID: PMC7347502.

      Additional computational studies:

      1: Hosseini FS, Amanlou M. Anti-HCV and anti-malaria agent, potential candidates to repurpose for coronavirus infection: Virtual screening, molecular docking, and molecular dynamics simulation study. Life Sci. 2020 Aug 8;258:118205. doi:10.1016/j.lfs.2020.118205. Epub ahead of print. PMID: 32777300; PMCID:PMC7413873.

      2: Hakmi M, Bouricha EM, Kandoussi I, Harti JE, Ibrahimi A. Repurposing of known anti-virals as potential inhibitors for SARS-CoV-2 main protease using molecular docking analysis. Bioinformation. 2020 Apr 30;16(4):301-306. doi:10.6026/97320630016301. PMID: 32773989; PMCID: PMC7392094.

      3: Chtita S, Belhassan A, Aouidate A, Belaidi S, Bouachrine M, Lakhlifi T. Discovery of Potent SARS-CoV-2 Inhibitors from Approved Antiviral Drugs via Docking Screening. Comb Chem High Throughput Screen. 2020 Jul 30. doi:10.2174/1386207323999200730205447. Epub ahead of print. PMID: 32748740.

      4: Alamri MA, Tahir Ul Qamar M, Mirza MU, Bhadane R, Alqahtani SM, Muneer I, Froeyen M, Salo-Ahen OMH. Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CLpro. J Biomol Struct Dyn. 2020 Jun 24:1-13. doi:10.1080/07391102.2020.1782768. Epub ahead of print. PMID: 32579061; PMCID:PMC7332866.

      5: Bafna K, Krug RM, Montelione GT. Structural Similarity of SARS-CoV2 Mpro and HCV NS3/4A Proteases Suggests New Approaches for Identifying Existing Drugs Useful as COVID-19 Therapeutics. ChemRxiv [Preprint]. 2020 Apr 21. doi: 10.26434/chemrxiv.12153615. PMID: 32511291; PMCID: PMC7263768.

      6: Eleftheriou P, Amanatidou D, Petrou A, Geronikaki A. In Silico Evaluation of the Effectivity of Approved Protease Inhibitors against the Main Protease of the Novel SARS-CoV-2 Virus. Molecules. 2020 May 29;25(11):2529. doi:10.3390/molecules25112529. PMID: 32485894; PMCID: PMC7321236.

      7: Wang J. Fast Identification of Possible Drug Treatment of Coronavirus Disease-19 (COVID-19) through Computational Drug Repurposing Study. J Chem Inf Model. 2020 Jun 22;60(6):3277-3286. doi: 10.1021/acs.jcim.0c00179. Epub 2020 May 4. PMID: 32315171; PMCID: PMC7197972.

      8: Chen YW, Yiu CB, Wong KY. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Res. 2020 Feb 21;9:129. doi: 10.12688/f1000research.22457.2. PMID: 32194944; PMCID: PMC7062204.

      Minor comments:

      p.2, line 26: > appears as an attractive

      p.2, line 27: > we show that the existing

      p.2, line 33: > separate numbers and units, eg. 1.10 µM (this is a persisting error that should be corrected throughout the whole ms)

      p.4, line 44: SARS virus should be referred to as SARS-CoV-1 throughout the whole manuscript. MERS-CoV is the name of the virus causing MERS

      p.4, lines 61-62: > the selection of the specific compounds seems to be arbitrary... why atazanavir and not darunavir or other? The sentence should be rewritten.

      p.6, line 100: Citing Fig. 2B before completing the description of Fig. 1 is distracting. Authors should think of a better way to describe their results.

      p.7, line 116: It is not clear what "10m-20,810" means

      p.7, lines 125-126: These sentences belong to an introduction, not appropriate in results section.

      Figure 2. Part A is not necessary in results (ok for introduction). Black and purple dots in part B is not a good choice since they are difficult to distinguish, maybe orange and black is better.

      Table 1: Status should be explained in a footnote (i.e the distinction between launched, P2/P3, phase 2, preclinical is not clear).

      Discussion. I think that subheadings are not necessary.

      Referees cross-commenting

      I agree with reviewer no. 1 on the limited interest of the study. However, it could be published in a specialized lower-impact journal after addressing issues raised by reviewers 2 and 3 (likely to be completed in less than a month)

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The SARS-CoV-2 pandemic is causing a serious health crisis globally. There are currently no specific medicines or vaccines to contain this virus. To address this issue, the authors developed an efficient fluorescent Mpro assay system and screened ~6070 previously used drugs in this article. Several compounds with activity against SARS-CoV-2 Mpro in vitro were found. Most hits are hepatitis C NS3/4A protease inhibitors with fair IC50 values. Besides, the authors found that most compounds identified in in silico screens lack activity against Mpro in kinetic protease assays.

      These research results are well proved and reproducible. But there are two minor questions I present below:

      1. In your Mpro assay optimization process you said substrate MCA-AVLQSGFR-K(Dnp)-K-NH2 had drastically lower rates of Mpro catalyzed hydrolysis and were not considered further in your assay development. And in your Fig.1 I saw extremely low RFU changes. But several nice inhibitors were screened using this substrate that was reported in April. Can you explain this result?

      2. To exclude inhibitors possibly acting as aggregators, a detergent-based control should be run at the same time as the IC50 measurements.

      Significance

      Nice work, but the significance of this article is now diminishing. Most of the screened hits have been reported in the last several months. Some inhibitor complex structures have been published or released on the Protein Data Bank. The novelty is missing. I suggest the authors add more results and resubmit.

      Referees Cross-commenting

      I agree with the other two reviewers' comments. The significance of this work is diminishing, but it is still of some interest. I think it can be published in a lower-impact journal if the authors address our suggestions.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Baker et al. report the screening of a collection of ~6,070 drugs for their inhibitory activity against the enzymatic activity of the SARS-CoV-2 Mpro protein in vitro using two peptide substrates. 50 compounds with activity against Mpro were identified and tested for their dose-dependent effect in the same assay. Several hits were identified, among which are approved drugs that target the HCV protease.

      Indeed, there is an urgent need for effective drugs for SARS-CoV-2 infection, and high throughput screenings can discover novel candidates. However, the novelty of this work is quite limited, as former screens have been published with the same target using the same substrates. Moreover, as discussed below the translational impact of the hits discussed is also quite limited, particularly in the absence of antiviral data. Lastly, there are several overstatements in the write up and it will require major editing.

      Major comments:

      1. Were there any positive controls previously shown to potently inhibit the SARS-CoV-2 Mpro included in the screen (e.g. ebselen)? How did these perform in this assay?

      2. It will be helpful if the authors would provide info re the 50 hits from prior screens conducted with this library of compounds - how promiscuous are they across screens? How toxic in cell based assays?

      3. The translational potential of the findings appears to be limited. The calculated IC50s for these drugs in the Mpro assay are very high (10-1000 fold higher) relative to their IC50 in an enzymatic assay involving the HCV protease: boceprevir (IC50 = 0.95 μM vs. 0.084 μM in HCV), ciluprevir (IC50 = 20.77 μM vs. 0.0087 in HCV), telaprevir (IC50 = 15.25 μM vs. 0.050 μM in HCV) (https://aac.asm.org/content/aac/57/12/6236.full.pdf). In the absence of antiviral data, the main statement of the manuscript that "the work presented here supports the rapid evaluation of previous HCV NS3/4A inhibitors for repurposing as a COVID-19 therapy." is thus an overstatement. Even if there is some activity, it is likely to be limited, as with the HIV protease inhibitors, and its chances of eliciting a meaningful clinical effect are low. Moreover, when used in monotherapy, some of these protease inhibitors have a very low genetic barrier to resistance.

      4.There are additional inaccurate or overstatements - e.g. line 61 "Probably the most successful approved antivirals are protease inhibitors such as atazanavir for HIV-1 and simeprevir for hepatitis C. [reviewed in 10 and 11]."

      5.The manuscript requires editing - e.g. structure of sentences, commas, spacing (including in the abstract) etc.

      6.What is the take home message? The statement "Taken together this work suggests previous large-scale commercial drug development initiatives targeting hepatitis C NS3/4A viral protease should be revisited because some previous lead compounds may be more potent against SARS-CoV-2 Mpro than Boceprevir and suitable for rapid repurposing." is unclear.
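
The fold differences behind major comment 3 can be checked with a few lines of arithmetic. The sketch below uses only the IC50 values quoted in that comment; the variable and function names are illustrative, and the HCV value for Ciluprevir, which is given without a unit, is assumed here to be in μM like the others.

```python
# IC50 values quoted in major comment 3, as (Mpro assay, HCV protease assay).
# All values are taken to be in micromolar; the unit of the Ciluprevir HCV
# value is not stated in the comment and is an assumption here.
ic50_um = {
    "Boceprevir": (0.95, 0.084),
    "Ciluprevir": (20.77, 0.0087),
    "Telaprevir": (15.25, 0.050),
}


def fold_difference(mpro_ic50, hcv_ic50):
    """Return how many times higher the Mpro IC50 is than the HCV IC50."""
    return mpro_ic50 / hcv_ic50


folds = {drug: fold_difference(*values) for drug, values in ic50_um.items()}

for drug, fold in folds.items():
    print(f"{drug}: about {fold:.0f}-fold higher IC50 against Mpro")
```

Under the stated unit assumption, the ratios for Boceprevir and Telaprevir fall within the quoted 10-1000 fold range, while the ratio for Ciluprevir comes out above it.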

      Significance

      Limited, as discussed above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewers and Revision Plan

      We thank all three reviewers for their time and their comments on our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability.

      The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis is summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruits the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      **Major comments:**

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      The referee is correct. The point here is to show the results we had using this approach for all four proteins under study. For this reason, we do not want to remove this data and prefer to show our results “warts-and-all”. We feel that the shortcomings of our approach are honestly presented and discussed in the manuscript. While only a fraction of chTOG was tagged, we should expect some co-removal after its induced mislocalization. Since we saw no change, we concluded that chTOG is auxiliary.

      The “second best” approach suggested (RNAi of chTOG) is problematic for two reasons. First, chTOG RNAi results in gross changes to spindle structure (multipolar spindles) and it is difficult to pick apart differences in protein partner localization that result from loss of chTOG from those resulting from changes in spindle structure. Second, the paper is about induced mislocalization as a method for determining protein complexes once a normal spindle has formed. So, removing chTOG prior to mitosis is not comparable. If we get the same or different result, does it confirm or conflict with the data we have? Nonetheless, given the discrepancy with our earlier work, we should investigate this further.

      To address this concern, we will stain endogenous clathrin, TACC3 and GTSE1 following chTOG RNAi and measure their relative levels at the spindle.

      Making the chTOG-FKBP-GFP cell line was difficult. As described in the paper, we only recovered heterozygous clones despite repeated attempts. Since submission, we have been made aware of an HCT116 chTOG-FKBP-GFP cell line that is reported to be homozygously tagged (Cherry et al. 2019 doi: 10.1002/glia.23628).

      A note about this cell line has been added to the paper (Results section, final sentence of 1st paragraph).

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extracts loaded - dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage with different reads.

      We used a two-pronged approach for assessing relocalization of protein partners (staining vs transfected constructs). The staining approach is superior since endogenous proteins are examined, but it is limited by antibody specificity. The transfection approach overcomes this limitation but is in turn limited by effects of overexpression and tagging. Together the two approaches allow us, and anyone employing this method, to get a picture of protein complexes. We didn’t want to create the impression that one or other approach is confounded, but the referee is correct that this analysis would benefit from further work.

      Specifically, to address these concerns:

      • We will verify and use alternative chTOG antibodies to try to improve this dataset.
      • We will test the possibility that an untagged allele of GTSE1 remains. We will use western blotting and a summary of our genomic analysis will be added to the paper.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, there is no statistical test applied to this data that I see. This is needed. How do authors determine whether there is an "effect"?

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      **Minor issues:**

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al 2020). The latter is the predominant transcript in NCBI and Ensembl databases. It appears the construct used by the authors omits the first 19 a.a. I do not think using the truncated transcript affects conclusions of the manuscript, but it could generate confusion when identifying residues based on a.a.#s of mutant constructs (Fig 6). This should be somehow clarified.

      We were aware of the longer transcript but were using the 720 residue form since it is the canonical sequence in Uniprot (https://www.uniprot.org/uniprot/Q9NYZ3). We did not know that the 739 form is the predominant transcript. We agree this is unlikely to affect our work but that the numbering may cause confusion.

      We have added a note to the Methods (Molecular Biology section) to accurately describe what we and Rondelet et al. have used.

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye at places. Please relabel this more clearly.

      Apologies for the error.

      We have relabeled Figure 6C,D and also made a similar alteration to Figure 5C.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Reviewer #1 (Significance (Required)):

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, does not localize to spindles, and does not display a mitotic defect after loss. This is an important conclusion, as it would correct the literature and avoid future confusion.

      --

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of either TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper in my opinion is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of protein silencing efficiency and plasmid overexpression problems. This approach seems to alleviate these problems, except in the case of chTOG, whose tagging seems to be lethal in its homozygous variant.

      **Major comments:**

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig 3.) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is data on the consequences for spindle shape and morphology after knocksideways. I have noticed on images in both Figure 3 and Figure 4 that in some cases the distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a big variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Changes of spindle size, tubulin intensity and DNA/kinetochore misalignment upon knocksideways would help readers to better appreciate the findings of the paper, all the more so since the authors on more than one occasion find their motivation in the field of cancer research and the relation of spindle stability to it. Some data connected to this motivation would be of value. Experiments seem reproducible.

      The focus of the paper is on using the knocksideways methodology to understand a protein complex during mitosis, rather than looking at its function. We are not keen to do new experiments that are not part of the central message of the paper. However, the Reviewer is correct that we do already have a dataset that can be mined in the manner described.

      To address this point, we will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that may result from loss of complex members.

      **Minor comments:**

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read. Some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, I miss the number of cells (n); I can't see the number of individual arrows because of the size of the graphs.

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      Reviewer #2 (Significance (Required)):

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines where endogenous proteins can be delocalized rapidly will be of value for scientists working not only in mitosis but also, in the case of clathrin, in research on vesicle formation and trafficking or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis it will surely help and speed up the research concerning the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

      --

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This paper analyses the chTog/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and Clathrin while GTSE1 and chTog are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      **Major comments:**

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof of principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system.

      Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work.

      The authors state that gene targeting was validated using a combination of PCR, sequencing and Western blotting, but show only the results for westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here.

      Another issue is the penetrance of the phenotypes induced by rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea if this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus-rapamycin cohort and this should be moved into the main figure.

      If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      The Reviewer raises two issues here.

      • PCR analysis should be shown. This issue was also partly raised by Reviewer 1. A summary of our PCR analysis was actually included in Table 1, since the analysis we did is pretty unwieldy. We agree though that presenting our evidence for homozygosity of the cell lines would be useful. To address this point, we will add more detail of the PCR and sequencing work done to validate these cell lines.
      • Does knocksideways happen in all cells? The answer to this depends on the transient expression of MitoTrap and sufficient application of rapamycin. We agree that this will be a useful piece of information to add to the manuscript. A related issue is whether knocksideways of complex members affects mitotic progression. We have established through other experiments that rapamycin application to wild-type cells alters mitotic progression, although application of rapalog does not have this effect. Our plan to address these points is 1) to analyze the efficacy of knocksideways that readers can expect to achieve using these, or similar, cells, and 2) to analyze mitotic duration in rapalog-treated cells expressing a rapalog-sensitive MitoTrap.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygous targeted clones, should be included. This can involve more high-resolution live cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark MitoTrap.

      We don’t agree that such an analysis should be included. The focus of this paper is on using the knocksideways methodology to understand a protein complex during mitosis, and not looking at its function. There are several papers on the mitotic phenotypes of these genes probed using RNAi in different cellular systems (examples for chTOG: 10.1101/gad.245603; TACC3/clathrin: 10.1038/emboj.2011.15, 10.1242/jcs.075911, 10.1083/jcb.200911091, 10.1083/jcb.200911120; GTSE1: 10.1083/jcb.201606081). Moreover, our 2013 paper used knocksideways (with RNAi and overexpression) and has a detailed analysis of mitotic progression, microtubule stability, checkpoint activity and kinetochore motions (Cheeseman et al., 2013 doi: 10.1242/jcs.124834).

      New experiments that are not part of the central message of the paper and are unlikely to give new insight are not the best use of our revision efforts for this paper (especially during the pandemic). Having said this, Reviewer 2’s suggestion to use our existing dataset to investigate mitotic phenotypes, will largely answer Reviewer 3’s request.

      We will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that result from the loss of complex members.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4, a convoluted way of demonstrating the change in localisation is shown and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS by using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The images of the cell panels can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged.

      This type of statistical analysis should be uniformly done throughout the MS and also extended to Figure 7.

      The referee raises several issues here with our data presentation and statistical analysis.

      • Our aim in Figures 3 and 4 was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern about these figures which tells us that we went too far towards “data visualization”. To address this point, we will rework Figures 3 and 4 to provide more clear data presentation.
      • The Reviewer’s comments about statistical analysis, however, are not sound. First, it is incorrect to state that simple t-tests can be applied (this is a form of p-hacking). Correction for multiple testing must be done on these datasets. Second, the reviewer arbitrarily states numbers for cells and experimental repeats without considering the effect size or, it seems, understanding the structure of the data that we have collected. Sample sizes are small but they are taken from many independent replicates. Third, and related to the previous point, the fixed and live cell data are structured differently, which means that a uniform data presentation is not possible. The live data has a paired design and each cell is an independent replicate (with replicates done over several trials). The fixed data is unpaired and we have taken measures from several experiments (independent replicates). The point about applying statistical tests to the data is also made by Reviewer 1 and we will use appropriate tests (NHST or estimation statistics) as we re-work the figures.
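
To illustrate why correction for multiple testing matters when each condition is compared with its control, the sketch below applies the Holm-Bonferroni step-down adjustment to a set of p-values. This is only one possible correction and is not necessarily the procedure that will be used in the revised manuscript; the function name `holm_adjust` and the example p-values are hypothetical and are not taken from the data.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four separate comparisons with a control.
raw = [0.010, 0.040, 0.030, 0.200]
adjusted = holm_adjust(raw)

alpha = 0.05
significant_raw = sum(p < alpha for p in raw)
significant_adjusted = sum(p < alpha for p in adjusted)
print(f"below {alpha} before correction: {significant_raw}")
print(f"below {alpha} after correction: {significant_adjusted}")
```

With these made-up values, three comparisons fall below 0.05 when each test is read in isolation but only one does after adjustment, which is the kind of inflation that uncorrected repeated t-tests can produce.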

      Reviewer #3 (Significance (Required)):

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical.

      The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant.

      Overall, if these results can be properly quantified I would recommend publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This paper analyses the chTog/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and Clathrin while GTSE1 and chTog are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      Major comments:

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof of principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system. Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work. The authors state that gene targeting was validated using a combination of PCR, sequencing and Western blotting, but show only the results for westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here. Another issue is the penetrance of the phenotypes induced by rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea if this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus-rapamycin cohort and this should be moved into the main figure. If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygous targeted clones, should be included. This can involve more high-resolution live cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark MitoTrap.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4, a convoluted way of demonstrating the change in localisation is shown and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS by using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The images of the cell panels can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged. This type of statistical analysis should be uniformly done throughout the MS and also extended to Figure 7.

      Significance

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical. The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant. Overall, if these results can be properly quantified I would recommend publication.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of either TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper in my opinion is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of protein silencing efficiency and plasmid overexpression problems. This approach seems to alleviate these problems, except in the case of chTOG, whose tagging seems to be lethal in its homozygous variant.

      Major comments:

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig 3.) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is the data on the consequences for spindle shape and morphology after knocksideways. I have noticed on images in both Figure 3 and Figure 4 that in some cases distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a big variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Changes in spindle size, tubulin intensity and DNA/kinetochore misalignment upon knocksideways would help readers appreciate the findings of the paper. All the more so since the authors on more than one occasion draw their motivation from cancer research and the relation of spindle stability to it. Some data connecting to this motivation would be of value. Experiments seem reproducible.

      Minor comments:

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read. Some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, I miss the number of cells (n); I can't see the number of individual arrows because of the size of the graphs.

      Significance

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines where endogenous proteins can be delocalized rapidly will be of value for scientists working not only in mitosis but also, in the case of clathrin, on vesicle formation and trafficking or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis it will surely help and speed up research concerning the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability. The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis is summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruits the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      Major comments:

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western blot provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extract loaded; dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage with different reads.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, there is no statistical test applied to this data that I see. This is needed. How do the authors determine whether there is an "effect"?

      Minor issues:

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al 2020). The latter is the predominant transcript in NCBI and Ensembl databases. It appears the construct used by the authors omits the first 19 a.a. I do not think using the truncated transcript affects the conclusions of the manuscript, but it could generate confusion when identifying residues based on a.a. numbers of mutant constructs (Fig 6). This should be clarified.

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye in places. Please relabel this more clearly.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Significance

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, does not localize to spindles, and its loss does not produce a mitotic defect. This is an important conclusion to make, as it would correct the literature and avoid future confusion.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We are grateful for the insightful, constructive and very positive reviews provided by the three reviewers. Please find responses to each of the reviewer comments below.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors study proteins localised to the apical end of the highly polarised parasites causing toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether there is a conservation of the apical components in these distantly related parasites as well as in some even more distantly related organisms. This is an important question as the apical part comprises many proteins essential for invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate such as in T. gondii and less elaborate as in P. falciparum. The authors now show that there is a large conservation between the species in the protein makeup of the apical end. The experiments are well performed, displayed and discussed and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy.

      My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites but these are good candidates to rapidly investigate. But showing a functional impact might convince editors at certain journals.

      Authors’ response: The central aim of this study was to ask if the molecular composition of the conoid complex is conserved across Apicomplexa. Functional dissection of proteins is part of an exciting set of subsequent questions and studies that will now follow by us and others. However, careful and thorough phenotyping of gene disruptions is not trivial work, would be most informative to perform in both Toxoplasma and Plasmodium, and is therefore beyond the scope of this project. Regarding the two proteins suggested by this reviewer for follow-up work and the question of ‘essentiality’, that the proteins have not been lost during parasite selection through evolution is clear evidence of their relevance to the biology of Plasmodium.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      Authors’ response: while we initially thought that this change would be suitable, given that the subsequent part of the title is ‘reveals a cryptic conoid feature’ we think it is clearer and more logical to leave this title in its original form. The conoid complex includes the apical polar rings, and these are not considered to be cryptic or previously unrecognised, only the conoid. While our study confirms that there is conservation across all proteome components of the conoid complex, this is secondary to the primary question of this study.

      abstract: not sure about the use of the words instrument and substructures

      Authors’ response: we believe that the use of ‘instrument’ is an appropriate analogy of a tool and not different from the use of ‘machine’ and ‘machinery’ that is widely used in molecular and cellular biology. Similarly, ‘substructure’ acknowledges that within recognised structures, such as the conoid, there is further specific organisation such as the conoid base or apex.

      page 2 last lines: is tubulin monomeric or polymerized?

      Authors’ response: to specify the polymerized state of tubulin as mentioned here the text has been changed to ‘the presence of tubulin polymers’.

      page 3 name protein talked about in 9th line

      Authors’ response: we have now named this protein (RNG2) as suggested.

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      Authors’ response: We feel that it is more appropriate to leave the discussion of the Hu et al (2006) proteomics study, along with various subsequent approaches used in pursuit of discovering conoid-associated proteins, to the discussion as currently occurs. In the introduction we seek to efficiently inform the reader of the current state of knowledge that makes the value and nature of the questions that we have asked in this study apparent. But we do give full credit and evaluation of previous studies in the discussion which we think is the most appropriate place for this.

      first paragraph or results could go into introduction

      Authors’ response: The first paragraph of the Results contains specific detail of just one aspect of this study, the use of hyperLOPIT. This is relevant to the new analysis that we have made of the hyperLOPIT data in this study. We, therefore, believe that it is most appropriately presented here in the Results in association with the new analyses we described. Our aim is that the Introduction is succinct and serves the entire study.

      page 4: add reference after BioID

      Authors’ response: reference added as suggested

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      Authors’ response: It is unclear what this reviewer is requesting with respect to definitions of the conoid on this page. Nevertheless, we have now included a thorough definition of the conoid based on the original electron microscopy studies (fourth paragraph of the Introduction).

      With respect to the technique used to report on YFP-tagged SAS6 in the de Leon et al 2013 study, we now include a fuller description of this previous study as follows:

      ‘The fluorescence imaging used in the de Leon et al study was limited to lower resolution widefield microscopy. Immuno-TEM was also used, however, contrary to their conclusions, did show YFP presence throughout transverse and oblique sections of the conoid consistent with our detection of SAS6L throughout the conoid body.’

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Authors’ response: ‘phenocopied’ replaced, as suggested. Reference added after ookinete stage, as suggested.

      As requested, we have compiled available expression data for the Plasmodium proteins throughout the different zoite stages and will include these data as supplemental material in our subsequent revision.

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      Authors’ response: These are reasonable suggested analogies and we will introduce them in the subsequent revision.

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Authors’ response: This reviewer has not linked this comment to a specific statement on page 9; however, we are cautious not to equate a lack of observed growth defects in experimental scenarios with unimportant or irrelevant proteins. Maintenance, through natural selection and evolution, of the proteins of a structure indicates that they are selectively advantageous and of functional relevance. The two proteins in question are not expressed in the blood stage, so one wouldn’t expect their deletion to have consequence in this stage.

      Apart from classic TEM images, cryo-EM data are also available for the apex of the merozoite and sporozoite. Worth discussing?

      Authors’ response: Following this reviewer’s subsequent suggestion (below), we are now preparing, for the subsequent revision, a schematic of each of the zoite stages of Plasmodium, and these draw on cryo-EM tomography data.

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Authors’ response: These two reports are excellent studies of the polarised development of the cell pellicle during ookinete formation and control of gliding initiation, but don’t specifically relate to the conoid complex structures that are the subject of our study. We, therefore, do not see a logical place to include discussion of these works.

      Reviewer #2 (Significance (Required)):

      The paper provides a conceptual advance over previous data as it shows clearly a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for these important apical structures of apicomplexan parasites and provides a good basis to dissect the molecular functions.

      Authors’ response: We appreciate this reviewer recognising this opportune point in time to more clearly define the terminology applied to these apical structures so that they can be more clearly and easily compared between taxa. We will use the suggested schematic figure (see comment below) that is now in preparation as a basis and guide for a refined nomenclature based on precedent in the literature.

      As it stands all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be and some evolutionary scientists will be interested in this study.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086) the first description of an apical proteome (Hu et al., PLoS Path 2006; 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x) and the discovery of apical located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; 10.7554/eLife.56635) to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance highlighting the similarities and differences

      Authors’ response: This is a good suggestion and we are presently preparing a schematic of all stages studied and supporting this with electron microscopy.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Authors’ response: In our validation of putative conoid-associated proteins identified by the hyperLOPIT+BioID approach we reporter-tagged 18 proteins to resolve their cellular location by microscopy. All 18 were verified as being located at the site of the conoid. So, by this measure there were no false positives. The veracity of the hyperLOPIT data was also confirmed across other cell compartments in our report where 62 proteins were reporter-tagged, from which there were no false positive assignments of cell location (Barylyuk et al., 2020, Cell Host & Microbe, in press: doi:10.1016/j.chom.2020.09.011; bioRxiv: https://doi.org/10.1101/2020.04.23.057125).

      Estimating false negatives is more difficult, but we know that these would occur as for any mass spectrometry-based detection technique. However, we have not claimed to have been exhaustive, nor was this required to answer our central question of are there conserved conoid-associated proteins throughout Apicomplexa? To address this question, we required a good sample of proteins, and the methods that we have employed provided this.

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not.

      Authors’ response: Candidate proteins selected for validation by microscopy were not biased for any known likelihood of being associated with the conoid, other than our proteomics data that we were seeking to test. However, we did give preference to proteins with the following traits: 1) proteins with strong corresponding gene knockout fitness phenotypes from published studies, 2) proteins with some evidence of conserved functional domains, and 3) genes with orthologues found in Plasmodium spp. and other apicomplexans. These traits were chosen with future functional studies in mind where proteins might be more informative of conoid-related functions and relevance in other apicomplexans. All validated proteins, however, were otherwise uncharacterised and, therefore, were not knowingly biased for more likely conoid-association over others discovered by our proteomics approach. We now include the following statement.

      “All proteins selected for validation were previously uncharacterised and with no a priori reason to be identified as conoid-associated other than our proteomics data.”

      If so, how many proteins were localized to the conoid and how many were not?

      Authors’ response: as stated above, we observed no false positives from the sample of 18 protein locations verified by microscopy.

      Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al's 2006 paper, based on the number of peptides retrieved from the conoid enriched vs depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0
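The accession list above can be tallied with a short sketch to check the "14 out of 20" figure in the comment. The dictionary simply transcribes the enriched:depleted peptide counts as listed by the reviewer; the variable names are illustrative, and the 2-fold threshold is borrowed from the authors' response rather than from the reviewer's comment.

```python
# Peptide counts (conoid-enriched, conoid-depleted) transcribed from the list
# above; None marks accessions reported as "not found" in Hu et al. 2006.
hu_2006_counts = {
    "222350": (2, 0), "274120": (3, 0), "291880": (1, 0), "301420": (3, 1),
    "246720": (4, 0), "258090": (10, 0), "266630": (8, 1), "208340": (4, 2),
    "253600": (1, 0), "306350": None, "250840": (1, 0), "292120": None,
    "219070": None, "274160": None, "320030": (7, 1), "227000": (10, 0),
    "278780": None, "284620": None, "295420": (6, 0), "297180": (4, 0),
}

found = {acc: c for acc, c in hu_2006_counts.items() if c is not None}
not_found = sorted(acc for acc, c in hu_2006_counts.items() if c is None)

# Detected only in the conoid-enriched fraction
exclusive = sorted(acc for acc, (enr, dep) in found.items() if dep == 0)
# Detected in both fractions, with at least 2-fold more enriched peptides
enriched_2fold = sorted(
    acc for acc, (enr, dep) in found.items() if dep > 0 and enr >= 2 * dep
)

print(f"listed: {len(hu_2006_counts)}, found: {len(found)}, "
      f"not found: {len(not_found)}")
print(f"exclusive to enriched fraction: {len(exclusive)}, "
      f"at least 2-fold enriched: {len(enriched_2fold)}")
```

The tally only restates the numbers given in the list; it says nothing about whether a given peptide count is sufficient evidence of conoid association.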

      Authors’ response: Proteomic methods and mass spectrometry have experienced revolutionary advances since this 2006 study was conducted. These include improvements in both sensitivity and quantitation accuracy. The Hu et al 2006 study provided an exciting first step towards conoid protein discovery. However, by their original estimation, at least 35% of their putative conoid-specific proteins were identifiable as false positives (e.g. ribosomal proteins) and this estimate could not account for the majority of uncharacterised proteins whose potential for false positive attribution to the conoid was untested. From almost 300 proteins, this study only validated four as associated with the conoid. The further proteins listed above were not validated as conoid proteins in the Hu et al study and, therefore, could not be distinguished from the many false positives reported in their work. In our Table 1, we have acknowledged the Hu et al study for the select proteins that they established as conoid proteins in their study.

      To further assess the utility of this 2006 conoid-enriched proteome we sorted the Hu et al detected proteins on our full hyperLOPIT assignments. Of the proteins that were reported by Hu et al as either exclusive to the conoid-enriched fraction or enriched by at least 2-fold over the conoid-depleted fraction, 15% were assigned to the apical 1 and 2 clusters (representing the relevant compartments to the conoid complex). Thus, according to the hyperLOPIT data these represent the true positives found in this study and 13 of these proteins were independently validated as conoid-associated by us. Significantly, however, 85% of the conoid-exclusive and conoid-enriched proteins from Hu et al (2006) were allocated to a non-apical location with 99% probability by hyperLOPIT, and, during our validation of 62 assignments we verified the alternative location of eight of these. False positives, therefore, greatly outnumbered true positives in this earlier dataset. This high rate of false positives in subcellular isolation proteomics is typical of the challenges that this method faces, and this was the rationale for and strength of the alternative hyperLOPIT approach. Given the overall relatively low level of conoid specificity in the earlier work we do not think that there is value in making specific protein-by-protein reference to it.

      Reviewer #3 (Significance (Required)):

      see above

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This manuscript details the further use of the hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) method that the group has previously published using T. gondii tachyzoites, combining this with BioID and super-resolution microscopy in order to uncover new proteins that form part of a structurally known and functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also, excitingly, showed that not only is this structure found in all invasive forms of Plasmodium (using the P. berghei model) but there is also a different molecular make-up in the blood stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      **Major Comments:** No major comments

      **Minor Comments:**

      1) While both the introduction and discussion are well written and detailed, they could both be a little more concise.

      Authors’ response: We take this as a style recommendation, but we note that the other reviewers commented on the text’s “eloquence” and that the introduction in particular was a “pleasure to read”. We take these comments as votes of confidence in the current form.

      2) For the selection of the 5 new genes in Tg to be tagged (top pg 5), the selection criteria for these 5 were not clear.

      Authors’ response: Please see the same query, and response with modified text, made by Reviewer #3.

      This also leads to the second part of this question, where there appear to be some genes missing from Table 1 and Table S1, specifically those found in both SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. In Table 1 (there are 23) there is no mention of 223790, 281650, 224700, and 293540, but they are in Table S1 (assuming these 4 were not selected in this study for tagging). In Table S1 (there are 25 listed), however, 216080 (AKMT) and 234250 (CIP1), which are in Table 1 as being identified in both SAS6L and RNG2 BioID, are absent. Does this mean there are actually 27, or was the indication of identification in both SAS6L and RNG2 BioID for 216080 (AKMT) and 234250 (CIP1) in Table 1 a mistake?
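The arithmetic behind the reviewer's figure of 27 can be laid out as a small set-counting sketch. It uses only the counts and accession numbers quoted in the comment; the full table contents are not reproduced here, so the shared entries are derived from the totals rather than listed, and the variable names are illustrative.

```python
# Accessions quoted in the comment as present in only one of the two tables
only_in_table_s1 = {"223790", "281650", "224700", "293540"}
only_in_table_1 = {"216080", "234250"}  # AKMT and CIP1

table_1_total = 23   # entries the reviewer counts in Table 1
table_s1_total = 25  # entries the reviewer counts in Table S1

# Entries common to both tables, derived from each table's total
shared_via_table_1 = table_1_total - len(only_in_table_1)
shared_via_table_s1 = table_s1_total - len(only_in_table_s1)

# Distinct proteins across both tables
union_total = shared_via_table_1 + len(only_in_table_1) + len(only_in_table_s1)

print(shared_via_table_1, shared_via_table_s1, union_total)
```

Both routes give the same number of shared entries, so the reviewer's counts are internally consistent; whether the two tables are meant to list the same set of proteins is a separate question.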

      Authors’ response: This reviewer has overlooked that Table 1 reports on all currently known conoid associated proteins, including those not detected in the hyperLOPIT data but reported in the literature, whereas Table S1 is exclusively those proteins detected and assigned as ‘apical’ by hyperLOPIT. The reported BioID-detection for each protein is then made within this framework. Thus, the proteins that occur in only one or the other table do so because they don’t satisfy these two sets of criteria. We have rechecked the numbers reported in the text and they are correct.

      3) Table 1: There is the fitness score for Pf orthologues but no mention of fitness in Pb (the model used) from the PlasmoGEM screens; considering the authors use the Pb model, it would be of interest to add this to the table.

      Authors’ response: The Plasmodium berghei PlasmoGEM gene disruption screen was much more limited in number than that for P. falciparum. Consequently, fitness scores were available for only two of the Plasmodium orthologues for which we have location data. We, therefore, thought it was of limited utility to include these data in Table 1, and these data are in the public domain should a reader seek them.

      4) Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      Authors’ response: Initially we did not make the separate transgenic cell lines for each protein with both the SAS6L and RNG2 markers. This was because one marker was usually sufficient to resolve the relative location of the protein of interest. However, given this reviewer’s comment and the potential for some extra information to be recovered by using both markers, we have now generated all cell lines necessary for this analysis. We are presently completing the imaging of these new cell lines and these data will be included in the subsequent revision.

      5) Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text).

      Authors’ response: See previous comment.

      6) Figure 5: It is a shame that only 7 of the 9 Plasmodium orthologues were included in the super-resolution imaging, as only 2 more are needed to have the complete set.

      Authors’ response: Ideally, we would have been able to achieve this, but the restrictions imposed by the COVID-19 disruption to laboratory access and activities ultimately limited these analyses slightly. However, to answer the central question of whether there is conservation of the Toxoplasma conoid proteome in Plasmodium it was not necessary to perform super-resolution imaging for all of these proteins. The major outcome of this study, therefore, is not affected by this.

      7) Figure 6: As with Figure 5, it would be better if more were included in the super-resolution images in this sporozoite stage.

      Authors’ response: Same response as above. Generation of sporozoites requires passage through the mosquito vector so this is even more resource-intensive than generation of ookinetes that can be differentiated in vitro from mouse-derived parasites. Again, the answers to the central questions posed by this study do not require these further, high resolution, data.

      8) Figure 7: This would be improved by having super-resolution images for at least a selection (or even all 6), possibly even with free merozoites.

      Authors’ response: We did apply 3D-SIM imaging to fixed merozoites; however, unlike for ookinetes and sporozoites, the images of fixed material were inferior to the live-cell GFP imaging that we have included. This likely reflects the poorer fixation properties of Plasmodium merozoites, a challenge widely experienced by Plasmodium researchers. We do not have access to a 3D-SIM microscope within the containment laboratory necessary for handling viable parasites and therefore could not attempt to image live material with this instrument. Again, the answers to the central questions posed by this study do not require these further, high resolution, data.

      9) As numerous new proteins have been identified in 2 different parasites, and with the composition of the conoid differing at different stages, it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises.

      Authors’ response: In response to this reviewer, and reviewer #2’s suggestion, we are now preparing schematic models of the apices of all of the relevant organism stages.

      Reviewer #4 (Significance (Required)):

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant result is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up and constitute this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in its molecular makeup within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combines cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      This manuscript details the further use of the hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) approach that the group has previously published using T. gondii tachyzoites, combining it with BioID and super-resolution microscopy in order to uncover new proteins that form part of the structurally known but functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also excitingly showed that not only is this structure found in all invasive forms of Plasmodium (using the P. berghei model) but there is also a different molecular makeup in the blood-stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      Major Comments: No major comments

      Minor Comments:

      1) While both the introduction and discussion are well written and detailed, they could both be a little more concise.

      2) Selection of the 5 new genes in Tg to be tagged (top of pg 5): the selection criteria for these 5 were not clear. This also leads to the second part of this question, where there appear to be some genes missing from Table 1 and Table S1, specifically those found in both SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. In Table 1 (there are 23) there is no mention of 223790, 281650, 224700, and 293540, but they are in Table S1 (assuming these 4 were not selected in this study for tagging). However, in Table S1 (there are 25 listed), 216080 (AKMT) and 234250 (CIP1), which are marked in Table 1 as identified in both SAS6L and RNG2 BioID, are absent. Does this mean there are actually 27, or was the indication of identification in both SAS6L and RNG2 BioID for 216080 (AKMT) and 234250 (CIP1) in Table 1 a mistake?

      3) Table 1: Fitness scores are given for the Pf orthologues, but there is no mention of fitness in Pb (the model used) from the PlasmoGEM screens. Considering that the authors use the Pb model, it would be of interest to add this to the table.

      4) Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      5) Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text).

      6) Figure 5: It is a shame that only 7 of the 9 Plasmodium orthologues were included in the super-resolution imaging, as only 2 more are needed to have the complete set.

      7) Figure 6: As with Figure 5, it would be better if more were included in the super-resolution images in this sporozoite stage.

      8) Figure 7: This would be improved by having super-resolution images for at least a selection (or even all 6), possibly even with free merozoites.

      9) As numerous new proteins have been identified in 2 different parasites, and with the composition of the conoid differing at different stages, it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises.

      Significance

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant result is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up and constitute this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in its molecular makeup within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combines cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not. If so, how many proteins were localized to the conoid and how many were not? Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al.'s 2006 paper, based on the number of peptides retrieved from the conoid-enriched vs conoid-depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0
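
      The tally quoted above (14 out of 20) can be checked directly against the listed counts. The following is a minimal Python sketch: the dictionary is transcribed from the list above, `None` stands for "not found", and the helper name `summarize` is purely illustrative and not part of any analysis in the manuscript or in Hu et al. 2006.

```python
# Peptide counts as (conoid-enriched, conoid-depleted), transcribed from the list above.
# None marks accessions reported as "not found" in Hu et al. 2006.
hu2006_counts = {
    "222350": (2, 0), "274120": (3, 0), "291880": (1, 0), "301420": (3, 1),
    "246720": (4, 0), "258090": (10, 0), "266630": (8, 1), "208340": (4, 2),
    "253600": (1, 0), "306350": None, "250840": (1, 0), "292120": None,
    "219070": None, "274160": None, "320030": (7, 1), "227000": (10, 0),
    "278780": None, "284620": None, "295420": (6, 0), "297180": (4, 0),
}


def summarize(counts):
    """Return (total accessions, accessions detected, detected only in the enriched fraction)."""
    found = {acc: c for acc, c in counts.items() if c is not None}
    enriched_only = [acc for acc, (enriched, depleted) in found.items() if depleted == 0]
    return len(counts), len(found), len(enriched_only)


total, found, enriched_only = summarize(hu2006_counts)
print(f"{found} of {total} accessions detected; "
      f"{enriched_only} of those seen only in the conoid-enriched fraction")
```

      The split between "enriched only" and "also seen in the depleted fraction" is an extra breakdown added here for illustration; the review itself only cites the overall 14 of 20.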

      Significance

      see above

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The authors study proteins localised to the apical end of the highly polarised parasites causing toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether there is a conservation of the apical components in these distantly related parasites as well as in some even more distantly related organisms. This is an important question as the apical part comprises many proteins essential for invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate such as in T. gondii and less elaborate as in P. falciparum. The authors now show that there is a large conservation between the species in the protein makeup of the apical end. The experiments are well performed, displayed and discussed and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy. My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites but these are good candidates to rapidly investigate. But showing a functional impact might convince editors at certain journals.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      abstract: not sure about the use of the words instrument and substructures

      page 2 last lines: is tubulin monomeric or polymerized?

      page 3: name the protein discussed in the 9th line

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      first paragraph of results could go into introduction

      page 4: add reference after BioID

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Apart from classic TEM images, cryo-EM data are also available for the apex of the merozoite and sporozoite. Worth discussing?

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Significance

      The paper provides a conceptual advance over previous data as it shows clearly a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for this important apical structure of apicomplexan parasites and provides a good basis to dissect the molecular functions. As it stands, all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be, and some evolutionary scientists will be interested in this study.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086), the first description of an apical proteome (Hu et al., PLoS Path 2006; doi: 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x), and the discovery of apically located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; doi: 10.7554/eLife.56635), to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance, highlighting the similarities and differences.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Point-by-point response to reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline. The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes.

      We thank the reviewer for highlighting one of the key strengths of our manuscript, the fact that our analytical approach, using SHAP values, enables contributions of individual variants to be assessed in a genotype-specific manner. This approach provides for a sound, robust, and principled way of describing and understanding the fitness impact of one mutation in the context of (potentially many) others.

      The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states.

      We do indeed argue that machine learning is likely to play an increasing role in making sense of deep mutational scanning data. These scans provide high-resolution information on how fitness maps onto genotype, but the molecular underpinnings of this relationship often remain obscure. It is these “hidden” underpinnings, including the effects of specific mutations on RNA/protein folding, structures, and dynamics, that machine learning approaches can help elucidate.

      The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30 °C), because the lower temperature hinders conformational exchange. Although such cold-sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Finding temperature-related fitness effects that are consistent with impaired conformational exchange was also gratifying for us and we thank the reviewer for highlighting this finding.

      The results would be more convincing if the authors directly measure the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      We appreciate that measurements of splicing activity for individual genotypes would complement and further strengthen our study. We will therefore aim to construct strains for a few key genotypes and assay self-splicing activity using RT-qPCR – an approach we previously used successfully to monitor splicing kinetics of self-splicing introns in yeast mitochondria (see Rudan et al. 2018 eLife 7:e35330). Specifically, we will quantify the fraction of spliced and unspliced transcripts using primers that span the exon-exon and the 3’ exon-intron junction, respectively (the 5’ intron-exon junction is genotypically diverse and would require genotype-specific primers). This will be done under non-selective (-kan) conditions, where the relative fraction of spliced and unspliced transcripts is a function of intrinsic splicing ability and not confounded by selection. We aim to include the master sequence, C2C21, G3C20 and its mirror genotype C3G20, U3 (which restores perfect complementarity in the master sequence), and G5 (inferred from the high-throughput experiment to make a strong negative contribution to fitness).
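
      As an illustration of the quantification described above, the fraction of spliced transcript can be estimated from a pair of RT-qPCR threshold cycles. This is only a minimal sketch under stated assumptions: both primer pairs are taken to amplify with the same efficiency (a perfect doubling per cycle by default), and the function name `spliced_fraction` and the example Ct values are hypothetical rather than taken from the manuscript or from Rudan et al. 2018.

```python
def spliced_fraction(ct_spliced, ct_unspliced, efficiency=2.0):
    """Estimate spliced / (spliced + unspliced) from two threshold cycles.

    Assumes template abundance is proportional to efficiency ** (-Ct) and that
    both amplicons share the same amplification efficiency (an assumption that
    would need to be checked with standard curves in a real experiment).
    """
    # Ratio of unspliced to spliced template: E^(-Ct_u) / E^(-Ct_s) = E^(Ct_s - Ct_u)
    unspliced_over_spliced = efficiency ** (ct_spliced - ct_unspliced)
    return 1.0 / (1.0 + unspliced_over_spliced)


# Hypothetical Ct values, invented for illustration only.
print(spliced_fraction(ct_spliced=20.0, ct_unspliced=20.0))
print(spliced_fraction(ct_spliced=20.0, ct_unspliced=21.0))
```

      A lower Ct for the exon-exon amplicon than for the exon-intron amplicon yields a fraction above one half; unequal primer efficiencies would bias this estimate, which is why the equal-efficiency assumption is flagged above.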

      In interpreting our results, we will consider different mechanisms of splicing failure, such as kinetic problems (slow dissociation of P1ex), misfolding of the intron core, reverse self-splicing, and the use of cryptic splice sites, which has previously been documented (see e.g. Woodson & Cech 1991 Biochemistry 30:2042-2050). We note, however, that a precise mechanistic dissection of the splicing defects of individual variants is not the purpose of this manuscript and we therefore do not aim to establish genotype-specific defects in great molecular detail.

      Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors can explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis but is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      In measuring splicing activity and its link to fitness for a subset of key variants (see point #4), we will include at least one mirror example such as C3G20/G3C20. In addition, we will highlight additional examples of this mirror asymmetry based on the results from our high-throughput screen.

      They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      This question (i.e. whether altered RNA stability rather than splicing efficiency explains differential KNT production and ultimately fitness) has previously been addressed by Guo & Cech (2002) when introducing the knt+intron reporter system. These authors found no difference in mRNA stability in constructs that displayed differential kanamycin resistance. To shore up this conclusion further, we will measure fitness (via colony counts, growth rate or more directly through competitive fitness assays) of the key variants for which we determine splicing activity (see point #4) and then correlate splicing and fitness.

      The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      The reviewer raises an interesting point, that indeed deserves further discussion/explanation. The reviewer is right that, at first sight, high-resolution fitness landscapes like ours do not directly produce a physical (structural) model of the molecule under investigation. They connect genotype and fitness, but the molecular intermediate – a biophysical structure – is not explicit. However, over the last few years, it has become apparent that deep mutational scanning experiments can – both in principle and in practice – yield information that can be leveraged to infer such a physical model. In short, covariation in fitness between residues in a protein or bases in an RNA can be used as inputs for constraint-based modelling of physical interactions. Notably, Schmiedel & Lehner (2019, Nature Genetics 51: 1177-1186) recently demonstrated that deep mutational scanning data can be used in this manner to reconstruct secondary and tertiary protein structure with high accuracy. In principle, the same approach can be used to reconstruct RNA structures. This will require more extensive, molecule-wide fitness data, but our study points towards just this future, even for data collected from structural ensembles.

      When we stated in the original manuscript that deconvolution of the fitness landscape might help to reverse engineer structures, this ability to interpolate between genotype and fitness to reveal hidden biophysical/structural relationships is what we refer to. We will revise the manuscript to make this connection more explicit.

      The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      The capacity of any high-throughput sequencing-based DMS experiment to resolve intermediate phenotypes does indeed depend on a number of things. The reviewer highlights two of these: First, in screens where the phenotype is not binary (dead/alive) but fitness can be measured on a continuous scale, can we – and do we – capture phenotypes with intermediate fitness? What if only the fittest/most active variants survive? This is, ultimately, an empirical question, and one we can answer quite definitively: we do observe a large range of intermediate phenotypes, which – in our study – correspond to intermediate fold-change values. For each genotype, we can provide confidence limits and assess statistical significance. Table S1 provides this information. Our capacity to resolve these intermediate phenotypes is mainly based on three things. One is adequate sequencing depth, as highlighted by the reviewer. The second is the number of biological replicates (N=6) we analyse, which allows us to differentiate biological variability from noise for a large number of genotypes. This is an important aspect of DMS experiments that has often been overlooked (i.e. there are many other studies where only a single replicate is analysed and biological heterogeneity is not taken into account). With six replicates in hand, we can directly estimate variability (as done e.g. in our DESeq2 analysis) and quantify uncertainty so as to guard against overfitting. In our view, this is arguably more important than sequencing depth in deriving appropriate fitness estimates. Finally, we can resolve intermediate phenotypes because we keep the time lag between initial exposure to kanamycin and assaying genotype frequencies relatively short (overnight growth, see Methods). Our experiment is effectively a multi-genotype competition experiment, and we provide a snapshot across the genotype pool at a given time. 
      If we had measured after several days of culture, genotypes with greater relative fitness would have spread further through the population, at the cost of less fit genotypes, many of which would likely have been eliminated. We kept measurement lag relatively short on purpose so that we could see a clear differential response to kanamycin while still being able to catch more than just a handful of the very fittest genotypes.
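
      The role of replicates in resolving intermediate fold-change values can be illustrated with a deliberately simplified calculation. The sketch below is not the DESeq2 analysis referred to above; it swaps in a plain per-replicate log2 frequency ratio with a pseudocount and a t-based 95% confidence interval. The function names, the pseudocount, and all read counts are hypothetical and invented for illustration.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 5 degrees of freedom (six replicates).
T_CRIT_DF5 = 2.571


def log2_fold_changes(sel_counts, unsel_counts, sel_totals, unsel_totals, pseudocount=0.5):
    """Per-replicate log2 ratio of a genotype's read frequency with vs without selection."""
    lfcs = []
    for s, u, s_total, u_total in zip(sel_counts, unsel_counts, sel_totals, unsel_totals):
        freq_selected = (s + pseudocount) / s_total
        freq_unselected = (u + pseudocount) / u_total
        lfcs.append(math.log2(freq_selected / freq_unselected))
    return lfcs


def mean_with_ci(values, t_crit=T_CRIT_DF5):
    """Mean and confidence limits from the standard error across replicates."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - t_crit * sem, mean + t_crit * sem


# Hypothetical read counts for one genotype in six replicates (not real data).
selected = [120, 135, 110, 128, 140, 117]
unselected = [480, 510, 450, 500, 530, 470]
library_sizes = [1_000_000] * 6

lfcs = log2_fold_changes(selected, unselected, library_sizes, library_sizes)
mean, lower, upper = mean_with_ci(lfcs)
print(f"mean log2 fold change = {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

      In this toy setting, a genotype whose interval excludes zero but sits well above the most depleted genotypes would count as an intermediate phenotype; with a single replicate no such interval could be computed at all.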

      In light of the above, it will be apparent that there are no simple answers to the reviewer’s questions about required sequencing depth, levels of stringency, etc. The ability to assign differential fitness across a large population of genotypes hinges on multiple interrelated considerations (sequencing depth, complexity of the final & starting pool, number of replicates). In revising the manuscript, we will highlight some of the key considerations just discussed, bearing in mind that the manuscript cannot possibly discuss all possible pitfalls and requirements of deep mutational scanning experiments in great detail.

      Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      We agree with the reviewer that it would be exciting to link deep mutational scanning results more closely with observable patterns of RNA evolution. This is true both in relation to evolution of P1ex/group I introns specifically and evolution of dynamic RNA structures more generally. Regarding the latter, we note that selection against excess stability has previously been inferred for 5’ UTRs (see e.g. Gu et al. 2010 PLoS Comp Biol 6: e1000664), although our case is slightly different in that a helix still needs to form but be sufficiently unstable to enable swift dissociation. We also note that riboswitches might make for an excellent subject to study asymmetric constraint and selection against excess stability as they involve formation of competing helices (including participation of some but not all nucleotides in more than one helix), their structure/function is well understood, and many examples are known, providing opportunities for evolutionary analysis. We consider this outside the scope of the current study. We will, however, seek to analyse patterns of evolution in P1ex to establish whether they correspond in a meaningful way to the fitness trends we observe in the laboratory. To do so, we will analyse the distribution and evolutionary history of variants across orthologous introns in different Tetrahymena species/strains, with a focus on P1ex, P10 and the surrounding sequence. Fortunately for us, the 23S ribosomal RNA gene in which the intron is embedded has been used as a phylogenetic marker so that intron/exon sequence information is available for a reasonable number of species/strains (see Doerder 2018 J Eukaryot Microbiol 66:182-208). We will generate an alignment of these sequences and ask, for example, whether N2-N5 are subject to different constraints than N18-N21 mirroring our experimental findings. 
      We have previously successfully quantified patterns of variation surrounding self-splicing introns in yeast mitochondria (Repar & Warnecke 2017 Genetics 205:1641-1648). Note here that extending this analysis beyond Tetrahymena is problematic. Specifically, the intron is absent from close relatives of Tetrahymena (Doerder 2018 J Eukaryot Microbiol 66:182-208) and P1-proximal structures of distant relatives are quite variable. In addition, we are looking at intronic regions that are not only adjacent to but also directly interact with exonic sequence. The exonic context in which the intron is embedded therefore matters but will be quite different for more distant group I introns. We therefore think that aligning and comparing distant orthologs has limited merit.
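
      One simple way to ask whether two blocks of alignment columns are under different constraint is to compare their mean per-column Shannon entropy. The sketch below shows only that idea: the toy alignment, the mapping of columns to the two blocks, and the function names are hypothetical stand-ins, not the planned analysis or real Tetrahymena sequences, and gaps would simply be counted as an extra symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def mean_entropy(alignment, positions):
    """Average column entropy over 0-based column indices of equal-length sequences."""
    entropies = [column_entropy([seq[i] for seq in alignment]) for i in positions]
    return sum(entropies) / len(entropies)


# Toy alignment invented for illustration: the first four columns stand in for one
# block of positions and the last four for the other block.
toy_alignment = [
    "ACGUACGU",
    "ACGUACGA",
    "ACGUUCGC",
    "ACGUACGG",
]
block_a = range(0, 4)
block_b = range(4, 8)

print(f"block A mean entropy: {mean_entropy(toy_alignment, block_a):.3f} bits")
print(f"block B mean entropy: {mean_entropy(toy_alignment, block_b):.3f} bits")
```

      Lower entropy indicates less observed variation and hence, potentially, stronger constraint. Entropy ignores phylogenetic relatedness among the sequences, so a real comparison would also need to account for shared ancestry.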

      The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethylsulfate that is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      We appreciate this point about false-friend acronyms. We will either find a different acronym or avoid it altogether.

      Reviewer #1 (Significance (Required)):

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

      We thank the reviewer for highlighting that our analytical approach showcases how deep mutational scanning data can be analysed in an unbiased and systematic manner to better understand the relationship between genotype, molecular phenotype (e.g. structure), and fitness. The reviewer also rightly points to specific results we obtain regarding temperature-related effects and metastability of P1ex/P10. However, we believe that the most important contribution of this work is a more general one, namely our proof-of-principle demonstration that deep mutational scanning data can capture multiple conformational states simultaneously, and that these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. To our knowledge this had not been explicitly demonstrated before and our work provides an important cornerstone for future studies looking to interpret mutational effects in either RNAs or proteins in the light of dynamic structures.

      In light of comments by reviewer #2 below, it is worth reiterating the proof-of-principle nature of this study. Many of the specific results we obtain (e.g. importance of avoiding excess stability in P1ex) are not revolutionary. Indeed, we would be worried if they were. We chose to investigate P1ex because substantial prior work exists that has furnished us with solid positive controls. This independent prior validation allows us to both have great confidence in the data we generate and demonstrate cogently that the two conformational states at the beginning and end of the splicing reaction are captured in the data.

      Finally, we believe our work, in covering a virtually complete genotype space, using multiple replicates to quantify uncertainty in fitness estimates, and using SHAP scores to interpret variant effects in genotype-specific context, sets a new high bar for this type of study and will provide valuable reference data and analytical recipes for future analyses.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      We thank the reviewer for testifying to the validity of our approach and the soundness of our conclusions. Regarding the complexity of the analysis, the reviewer is right in that – for the conclusion that the relative stabilities of two alternative helices are important for fitness – a simpler analysis would have sufficed. However, as elaborated in response to point #11 above, our objective here is not merely to draw specific conclusions about the relative stabilities of P1ex and P10, but more general: a) to demonstrate that a single fitness landscape can be deconvoluted to implicate multiple conformations in fitness defects and b) to provide a basic but powerful recipe for doing so in an unbiased, systematic manner using machine learning.

      We will strive to make the writing clearer so that readers can follow this reasoning and appreciate our analytical choices.

      Major Comments

      The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      We used the word “transient” to describe two alternative RNA structures formed during the life cycle of the intron. Both states (characterized by P1ex and P10 formation) are transient in as much as they disappear as splicing proceeds. In retrospect, we agree with the reviewer that this usage is too loose (given how the term is generally used in the literature) and might evoke the wrong connotations. We will therefore revise the manuscript to eliminate references to P1ex and P10 as transient states, but rather describe them as alternative conformations. Of course, the general point remains true: that deep mutational scanning data should in principle capture all fitness-relevant structural states even if these are transient (in the strict sense of the word).

      "Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      The reviewer is right in saying that a high-throughput pipeline could have been designed to monitor splicing of each genotype directly (rather than assaying fitness of the cell population that represents a particular genotype). We chose not to do so. One reason for this is that monitoring splicing directly would have necessitated design of a more complicated assay. This is because, to monitor splicing efficiency, one would have to monitor both pre-mRNA and mRNA for different genotypes. The former is straightforward (using primers that span the exon-intron junction) but the latter is not: successful splicing removes the genotype-specific information from the mRNA (that information being solely encoded in the intron). This is a solvable problem in principle. One might, for example, introduce barcodes of sufficient complexity in the mRNA that can be linked back to the intron genotype, but doing so would have introduced a further source of error and complicated analysis. We therefore opted for monitoring genotypic fitness by sequencing the plasmids from which the RNAs originate. This does mean that our measurements of fitness are not coupled to a specific molecular phenotype (such as splicing efficiency) – we presume (but are not entirely sure) this is what the reviewer refers to when talking about fitness being on an “arbitrary scale”. However, fitness derived in this manner has the advantage of providing information that does not start from a mechanistic preconception. We ask how a variant affects survival and reproduction of the cell without presuming a specific mechanism, and the results can therefore capture any mechanism, including those that we did not consider initially. The challenge then becomes to tease out possibly multiple mechanisms from unbiased data.

      We will tackle the reviewer’s final comment, regarding analysis of base-pair stability, below in response to one of the minor comments (point #20).

      Minor Comments

      The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      We will revise the entire manuscript to make the writing both clearer and more concise.

      Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired.

      Making explicit reference to “epistasis” is a considered choice. Framing results in terms of epistasis might be less familiar to readers grounded in RNA or protein biophysics/biochemistry, but is very much at the heart of thinking about the genotype-phenotype relationship from an evolutionary perspective, where global descriptions of epistasis are commonplace and usually provide the starting point for thinking about genotype-phenotype relationships, evolution and evolvability. So what seems unnecessarily obscure when seen through the lens of one field, is natural when considered in the context of another. Importantly, it is also the central approach adopted by many if not most prior deep mutational scanning studies (see e.g. Hayden et al. 2011; Pressman et al. 2019; Zhang et al. 2009; Li et al. 2016; Puchta et al. 2016; Domingo et al. 2018; Li and Zhang 2018; Weinreich et al. 2013; Lalić and Elena 2015; Bendixsen et al. 2017 as cited on page 3 of the manuscript) so we think this framing is helpful to compare our results to prior work.

      We expect that the readership will include many researchers interested in mapping genotype-phenotype-fitness relationships who will expect to see global analyses and descriptors of the type we present. We will, however, revise the manuscript to ensure that our description of the findings remains accessible to readers from other fields.

      More specifically, we also note that the fact that mutations are not independent (i.e. epistasis exists) might be trivial from the fact that P1ex is a base-paired helix. The magnitude and direction (“sign”) of epistasis, however, are not. In fact, as we describe, contrary to prior DMS on RNA helices, we find a lot of positive epistasis, reflecting, as we argue, selection against excess stability of P1ex to allow subsequent formation of P10.
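      To make the distinction between the existence of epistasis and its magnitude and sign concrete, here is a minimal sketch. It assumes fitness is expressed on a log scale and that the non-epistatic expectation is additive; the function names and the example values are hypothetical and are not taken from the manuscript.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Pairwise epistasis under an additive null model on log-scale fitness.

    Assumption (not from the manuscript): the expected effect of the double
    mutant is the sum of the two single-mutant effects relative to the
    reference genotype.
    """
    observed = f_ab - f_wt
    expected = (f_a - f_wt) + (f_b - f_wt)
    return observed - expected


def epistasis_sign(e, tol=1e-9):
    """Label the direction of epistasis; `tol` is an arbitrary tolerance."""
    if e > tol:
        return "positive"
    if e < -tol:
        return "negative"
    return "none"


# Hypothetical log-fitness values: each single mutant disrupts a base pair,
# while the double mutant partially restores pairing.
e = pairwise_epistasis(f_wt=0.0, f_a=-1.0, f_b=-1.0, f_ab=-0.5)
print(e, epistasis_sign(e))
```

      In this toy case the double mutant is fitter than the additive expectation, so the epistasis is positive; a double mutant that is less fit than expected would give a negative value.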

      The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      Please see responses to points #11, #12, and #16 above for an elaboration of what we consider to be the main merits of this study and why providing broad measures of epistasis is a sensible choice.

      Figure 1C isn't necessary for the reader to understand the process.

      We are happy to follow editorial guidance as to whether this panel is superfluous and should be removed or is worth including.

      It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      This figure does indeed capture what the reviewer describes: genotype pools in +/-kan are least similar to each other, while 30/37ºC are similar but distinct in the +kan condition and effectively indistinguishable in the -kan condition, in line with expectations. We agree with the reviewer that this information per se is something that would typically be found in a supplementary figure. However, we would advocate for retention of this panel in the main manuscript in this instance because of the way in which it was derived: using the Bray-Curtis dissimilarity index. To our knowledge, this is the first time that Bray-Curtis dissimilarity has been used to quantify, in a principled way, the similarity between genotype pools. Borrowed from the ecology literature, the index captures both richness (number of different species/genotypes in the ecosystem/genotype pool) and relative abundance to provide an integrated measure of genotype diversity. We believe that this measure will be useful for future studies and rather than relegating the figure to the supplement, we would aim to briefly highlight its methodological novelty.
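      For readers unfamiliar with the index, the following is a minimal sketch of how Bray-Curtis dissimilarity between two genotype pools can be computed from read counts. The genotype labels and counts are invented for illustration, and the sketch works on raw counts; whether the manuscript normalises for sequencing depth first is not stated here, so treat that as an open assumption.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two pools of genotype counts.

    Each argument maps a genotype label to a non-negative count. Genotypes
    missing from one pool are treated as having a count of zero there.
    Returns 0.0 for identical pools and 1.0 for pools sharing no genotypes.
    """
    genotypes = set(counts_a) | set(counts_b)
    numerator = sum(abs(counts_a.get(g, 0) - counts_b.get(g, 0)) for g in genotypes)
    denominator = sum(counts_a.get(g, 0) + counts_b.get(g, 0) for g in genotypes)
    if denominator == 0:
        raise ValueError("both pools are empty")
    return numerator / denominator


# Hypothetical pools: one genotype drops out entirely in the first pool.
pool_selected = {"ACGU": 10, "GGCC": 0, "AUAU": 5}
pool_unselected = {"ACGU": 5, "GGCC": 5, "AUAU": 5}
print(bray_curtis(pool_selected, pool_unselected))
```

      Because absent genotypes count as zeros, the index responds both to genotypes dropping out of a pool and to shifts in the abundance of those that remain.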


      Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      We agree with the reviewer that the axes in Figure 3A should be flipped and we will do so in the revised manuscript. We also agree that, when it comes to helical stability of P1ex, the simple expectation would be to see a peak at a certain stability with drop-offs either side, as intimated by Figure 4C. We further agree with the reviewer that Figure 4C is rather indirect and can be made more quantitative by considering helical stability across all genotypes directly. To this end, we will use one of the many tools available that allow prediction of helical stability from primary sequence (e.g. the efn2 function in RNAstructure, as used by Torgerson et al 2018 RNA, see point #24 below) and replace Figure 4C with a more quantitative fitness landscape based on these computations. To provide added confidence in the computations of helical stabilities from primary sequence in the context of our structure, we will also calculate helical stabilities from molecular dynamics simulations for the subset of genotypes we considered previously (Figure 4E/F) and see how inferred stabilities compare.

      There appears to be a missing verb in the legend for figure 3A, second sentence.

      We will fix this error.

      Figure S5 appears to be redundant with Figure 1.

      At first glance, Figure S5 does indeed appear redundant with Figure 1 but it is not. Figure S5 shows the relevant sequence of the group I intron and bordering exons in its native context, i.e. when embedded in the 23S ribosomal RNA gene of Tetrahymena thermophila, whereas Figure 1 shows the genotype of the mutant intron embedded in knt. The sequences are different. We will revise the legend to Figure S5 to make this clearer.

      Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      We will expand Figure S6 to include all base pairs as suggested. We disagree that this is a better analysis compared to what appears in the main text. Rather, it provides a complementary, hypothesis-driven view whereas the analysis in the main text is more systematic and unbiased in approach.


      Reviewer #2 (Significance (Required)):

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      We thank the reviewer for highlighting the paper by Torgerson et al, of which – embarrassingly – we were not aware. We will make reference to this paper in a revised manuscript and highlight that riboswitches might be a good model system to further explore asymmetric constraint and selection against excess stability in an evolutionary context (also see our response to point #9 above).

      As highlighted earlier, we think the main conceptual impact of our work lies not in the description of helical stabilities. Rather, it lies in a) providing a rigorous proof-of-principle that deep mutational scanning can capture multiple conformational states simultaneously, and b) that, using an unbiased machine learning approach, these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. A shift in analytical strategy to “cut to the chase” and narrowly focus on helical stability would be misguided in this context, as we seek to provide not only insights into the data at hand but also lay out a sound and general recipe for analysing similar datasets in the future.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      Major Comments

      1.The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      2."Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      Minor Comments

      1.The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      2.Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired. The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      3.Figure 1C isn't necessary for the reader to understand the process.

      4.It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      5.Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      6.There appears to be a missing verb in the legend for figure 3A, second sentence.

      7.Figure S5 appears to be redundant with Figure 1.

      8.Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      Significance

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      Our expertise is in RNA biochemistry and biophysics. We are not qualified to evaluate the details of several of the computational pipelines described.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline.

      The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes. The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states. The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30°C), because the lower temperature hinders conformational exchange. Although such cold sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Despite these strengths, there are several weaknesses that should ideally be addressed before publication.

      1.The results would be more convincing if the authors directly measure the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      2.Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors can explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis but is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      3.They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      4.The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      5.The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      6.Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      7.The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethylsulfate that is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      Significance

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

  6. Sep 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Dear reviewers,

      Thank you very much for your constructive and helpful remarks and suggestions!

      We marked the changes in the manuscript in yellow.

      Our replies to the specific points:

      Reviewer #1 In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991).

      As suggested, we added the reference to the introduction.

      Reviewer #1 The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      We agree that HCF173 is an obvious candidate, although its interaction could be mediated via additional proteins. Alice Barkan’s group has demonstrated that in maize HCF173 binds to the same region upstream of the translation initiation region (McDermott et al., 2019) where we detected a footprint (Supplemental Figure S11A-D). Furthermore, McDermott et al. showed that the binding sequence is conserved. We would like to analyze this question in more detail, but we currently have no approach available in the lab to specifically isolate psbA mRNA with its bound proteins for this analysis, and therefore have to postpone the answer to this question to future studies.

      Reviewer #2: Important changes to make before full submission: 1) It is becoming clear that the translation efficiency (TE) is often not a calculation of translational output from specific mRNAs but in fact is better to be described as ribosome association. There can be many reasons for increased ribosome association including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to demonstrate directly increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.

      We want to stress that we have chosen a condition that is well known to increase psbA translation in higher plants as shown in the literature with different methods (e.g. Chotewutmontri and Barkan, 2018; Schuster et al., 2020). The protein encoded by psbA, the D1 subunit of photosystem II, has an increased turnover in high light, i.e. a higher amount of D1 has to be produced to compensate for the increased degradation of photodamaged D1 (Mulo et al., 2012; Li et al., 2018).

      Although there is a lot of evidence in the literature for good correlation of translation efficiency as determined by ribosome profiling and protein synthesis, the reviewer raised a valid concern. Ribosome pausing or even ribosome stalling could also cause increased ribosome binding and thereby increased amounts of ribosome footprints. Therefore, we analyzed ribosome pausing in selected genes including psbA and rbcL. The pattern of ribosome pausing was very similar in low and high light (new Supplemental Figure S14), which rules out any ribosome stalling at specific sites or drastic changes in ribosome pausing. To analyze whether there is increased ribosome pausing, we determined the fraction of footprints at pause sites compared to the total number of footprints. We used two different pause scores as cutoffs to determine pause sites. To include as many pausing events as possible, we used a pause score of 1, i.e. everything higher than the mean ribosome density per nucleotide of the corresponding coding region (Gawronski et al., 2018). This fraction was unaltered in low and high light (new Supplemental Figure S14). With a more stringent pause score of 20 (20 times higher ribosome density than the mean), an increase of ribosome pausing in high light was detected for psbA, whereas we did not find differences between high and low light for rbcL and psaA. However, this increase in pausing at the psbA mRNA is insufficient to explain the increase in the total amounts of ribosome footprints. Additional pause scores were tested; the value for the psbA fraction with a pause score of 20, included in Supplemental Figure S14, showed the largest difference.
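      The pause-site calculation described above can be sketched as follows. This is an illustrative reading of the text rather than the pipeline actually used: it assumes the pause score of a position is its footprint density divided by the mean density per nucleotide of the coding region, and that a position counts as a pause site when its score exceeds the cutoff. The function name and the example densities are hypothetical.

```python
def pause_site_fraction(density, cutoff):
    """Fraction of footprints that fall on pause sites within one coding region.

    `density` is a list of footprint counts per nucleotide. A position is
    treated as a pause site when (count / mean count) is strictly greater
    than `cutoff`; the strict comparison is an assumption of this sketch.
    """
    total = sum(density)
    if total == 0:
        return 0.0
    mean_density = total / len(density)
    at_pause_sites = sum(d for d in density if d / mean_density > cutoff)
    return at_pause_sites / total


# Hypothetical coding region with one strongly occupied position.
density = [1, 1, 1, 1, 6]
print(pause_site_fraction(density, cutoff=1))   # permissive cutoff
print(pause_site_fraction(density, cutoff=20))  # stringent cutoff
```

      Comparing this fraction between low and high light at each cutoff indicates whether a change in total footprints is concentrated at pause sites or spread across the coding region.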

      Reviewer #2: **Strongly suggested additions to the manuscript to improve its significance before publication** *1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interacts with the 5' UTR of psbA in a high-light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.*

      See comment to reviewer 1. McDermott et al. (2019) describe HCF173 as relatively specific for psbA. Therefore, we do not assume that other genes with weak Shine-Dalgarno sequences are regulated via HCF173 but via different proteins using a similar molecular mechanism to influence the mRNA secondary structure at the translation initiation region.

      Reviewer #2: **Strongly suggested additions to the manuscript to improve its significance before publication** *2) Mutational analysis of the RBP binding site, together with changes to the secondary structure around the start codon based on the new structure maps, to show the effects of these changes on protein output would provide important new findings on how important RBP binding is compared with the RNA secondary structure changes for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.*

      We agree with the reviewer that this would be a very interesting study. Unfortunately, it requires a larger collection of lines with mutated psbA sequences. Plastid transformation in Arabidopsis thaliana is still technically demanding and time consuming. Even in the case of Nicotiana tabacum, for which plastid transformation is well established, such a project would likely need several years. We therefore think that such a study is beyond the scope of the current manuscript.

      Reviewer #3 1. In this paper, the authors mention that DMS can modify four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from all four nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that under cell-free conditions, DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, fig. 1B). Lars B. Scharff's paper, which is cited by the authors, also mentions that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and known RNA structure using G/U or all four nucleotides to show that DMS reactivity from G and U is also reliable enough to be used. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as that of A/C. I don't think G and U can be used to predict RSS.

      We agree with the reviewer that DMS reactivities at G/U are less reliable than those at A/C. This was shown by Mustoe et al. (2019) and by us for chloroplast rRNAs (Gawronski et al., 2020, Plants). We included a correlation of the known 16S rRNA secondary structure and the DMS reactivities at the different nucleotides (Supplemental Figure S5A) that demonstrates that the DMS reactivities at G/U actually contain information about rRNA secondary structure. This analysis demonstrated again that the reactivities at G/U are less reliable than at A/C. Therefore, we added an analysis of the more reliable A/C for comparison with the results for all four nucleotides (Figure 1D-F, 3C-F).

      Reviewer #3 2. Is the 5' UTR the only region which has an RSS change? If not, how do RSS changes in other regions contribute to translation?

      Translation initiation in plastids is mainly influenced by the secondary structure of the translation initiation region, especially at the cis-elements required for the recognition of the start codon. In addition, we have analyzed several other regions, e.g. the coding regions, the coding regions without the sequences next to the start codon, the end of the coding region, and the complete 5’ UTR (Supplemental Figure S14). We added a more detailed analysis of the changes of secondary structure of the coding region of those genes we focus on (Supplemental Figure S16). This shows that the secondary structure changes of the complete coding region correlate negatively with translation efficiency (see also Supplemental Figure S14G). A similar observation was made in E. coli and was explained by differences in translation initiation, which are mainly influenced by the secondary structure of the translation initiation region (Mustoe et al., 2018).

      Reviewer #3 3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      The parameters for the RNA secondary structure predictions in Figure 2 are not identical (see Figure legend). For all structure predictions, the DMS reactivities were used as constraints, but only for the high light structure was the sequence of the RNA binding protein’s footprint forced to be single-stranded. These structure predictions are included to illustrate the mRNA structures in the presence and absence of an RNA binding protein. These structures are based on the observation that the two halves of the stem loop structure have different DMS reactivities in response to high light. The sequence including the protein footprint has lower DMS reactivities in both low and high light. This is in agreement with both a double-stranded sequence and a protein-bound sequence. In contrast, the other half of the stem loop, the sequence including the cis-elements of the translation initiation region, has increased DMS reactivities in high light, indicating that it is single-stranded. This suggests that there is protein binding in high light preventing the formation of the inhibitory stem loop.

      Reviewer #3 4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should show a translation change under high light conditions. I suggest that the authors label the dot representing psbA.

      Thank you very much for this suggestion! We marked psbA in the correlation plots (Supplemental Figure S12). The changes in the transcript levels are really minor, whereas for some genes the translation efficiency changes (see Figure 4 and Supplemental Figure S13).

      Reviewer #3 5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      The different plant material was chosen because of the different requirements during probing. In this context, we would like to point out that observing the same changes in the translation initiation region in response to high light in different developmental stages is a stronger confirmation than observing the same response at the same developmental stage. This indicates that the response is not specific for a developmental stage.

      Reviewer #3 6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes under high light for 0.5 h. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label several DEGs whose expression changes under high light treatment in Fig. S12.

      Supplemental Figure S12 contains only plastid-encoded RNAs, whereas Huang et al. (2019) focused on nuclear-encoded mRNAs. We clarified the figure legend of Supplemental Figure S12 by adding “of the plastid-encoded genes”. The values for the individual genes can be seen in Supplemental Figure S13.

      Reviewer #3 7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method of MNase footprinting. Does it work for RNA footprinting?

      The methods used are described under “Ribosome profiling (Ribo-seq)” and “Processing of Ribo-seq and RNA-seq reads” in Material and Methods. The approach was very similar to the one used for ribosome profiling, with the difference that smaller read lengths were also included in the analysis (18-40 nt instead of 28-40 nt). We did this because many plastid RNA binding proteins have footprints that are smaller than a ribosomal footprint. The described footprint is the only one detected near the translation initiation region of psbA. Binding of HCF173 was detected by the Barkan group in the same region using a RIP-Seq analysis combined with RNase I digestion (McDermott et al., 2019), which confirms that our approach is working. We added a reference to the method section in the results part to clarify which approach was chosen.
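      As a small illustration of the read-length windows mentioned above, the following sketch filters a list of read lengths with either window. It assumes that both bounds are inclusive; the function name and the example lengths are hypothetical and do not reproduce the processing pipeline of the manuscript.

```python
def select_footprints(read_lengths, min_len, max_len=40):
    """Keep read lengths inside the inclusive window [min_len, max_len]."""
    return [n for n in read_lengths if min_len <= n <= max_len]


# Hypothetical read lengths in nucleotides.
reads = [17, 18, 22, 27, 28, 33, 40, 41]

ribosome_window = select_footprints(reads, min_len=28)   # ribosome footprints only
extended_window = select_footprints(reads, min_len=18)   # also shorter protein footprints
```

      The extended window retains the shorter reads that a ribosome-only window would discard, which is where footprints of smaller RNA binding proteins would be expected.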

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      RNA can fold into secondary and tertiary structures through base-pairing. RNA structure plays a crucial role in gene function and regulation, including transcription, processing, translation and decay. Plants acclimate to fluctuating light conditions to optimize photosynthesis and minimize photodamage. Translational regulation is known to be one strategy of this acclimation. It has been reported that translation of psbA, encoding the D1 reaction center protein of Photosystem II, is increased under high light conditions. The light-controlled psbA translation has been intensively studied and was suggested to be related to redox/thiol signals, the ATP status, and certain proteins. In this manuscript, Gawroński et al. explored the possible link between RNA secondary structure and translational efficiency. They adopted DMS-MaPseq and SHAPE-seq methods to profile the RNA secondary structure in the 5' UTR of psbA under low light and high light conditions. The results showed that the DMS and SHAPE reactivities of the Shine-Dalgarno (SD) sequence, start codon and as-SD region are higher under high light than under the low light control, indicating that the psbA translation initiation region becomes more single-stranded and accessible under high light conditions. MNase digestion and DMS reactivity analysis suggested that protein binding might cause the change of RNA secondary structure of the psbA translation initiation region. In addition, the authors probed the RNA secondary structure of the translation initiation region of rbcL, which encodes the large subunit of Rubisco, and found no change in the RNA structure of rbcL, although the translation of rbcL is also increased under high light conditions. To address the question of why RNA structure changes are related to high-light-dependent translational activation of psbA but not rbcL, plastome-wide translational efficiency and RNA structure were analyzed. The results showed a significant correlation between the RNA secondary structure changes and translational efficiency changes in the chloroplast-encoded mRNAs with weak SDs (such as psbA), but not in those with strong SDs (such as rbcL).

      The light-dependent translational activation of psbA is critical for maintaining photosynthetic homeostasis. Also, the molecular mechanism of the impact of RSS on translation is still elusive. The topic of this study is very important. However, this study just describes the phenomenon of RNA secondary structure changes in the translation initiation region, but does not give further evidence to validate the effect of RNA secondary structure changes on the translational activation of psbA under high light conditions. Besides, the evidence that protein binding causes the RNA structure changes is weak and unclear. In addition, there is much room for improvement in this work.

      1. In this paper, the authors mention that DMS can modify four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from all four nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that under cell-free conditions, DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, fig. 1B). Lars B. Scharff's paper, which is cited by the authors, also mentions that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and known RNA structure using G/U or all four nucleotides to show that DMS reactivity from G and U is also reliable enough to be used. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as that of A/C. I don't think G and U can be used to predict RSS.

      2. Is the 5' UTR the only region which has an RSS change? If not, how do RSS changes in other regions contribute to translation?

      3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should show a translation change under high light conditions. I suggest that the authors label the dot representing psbA.

      5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes under high light for 0.5 h. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label several DEGs whose expression changes under high light treatment in Fig. S12.

      7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method of MNase footprinting. Does it work for RNA footprinting?

      Significance

      see above

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This study uses multiple high-throughput sequencing approaches to probe the secondary structure of the chloroplastic psbA mRNA during low and high light treatments. The authors are able to demonstrate a shift in secondary structure around the start codon of this mRNA in response to the high light treatment as compared to low light conditions. This structural shift is also accompanied by an RBP binding event that may also be involved in regulating the translation from this mRNA in response to high light. I think this study is very interesting and timely. However, I think determining the relative contributions of the secondary structure and RBP binding changes to potential increases in protein output from this mRNA in response to high light would improve this manuscript. I also think directly looking at protein levels through a straightforward Western blot to show increased psbA protein in response to high light treatment is an important addition to this study. I outline my few suggested experimental additions for this manuscript below.

      Important changes to make before full submission:

      1) It is becoming clear that the translation efficiency (TE) is often not a measure of translational output from specific mRNAs but is in fact better described as ribosome association. There can be many reasons for increased ribosome association, including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to demonstrate directly increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.

      Strongly suggested additions to the manuscript to improve its significance before publication

      1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interacts with the 5' UTR of psbA in a high-light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.

      2) Mutational analysis of the RBP binding site, together with changes to the secondary structure around the start codon based on the new structure maps, to show the effects of these changes on protein output would provide important new findings on how important RBP binding is compared with the RNA secondary structure changes for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.

      Significance

      This study definitely focuses on a research topic that is currently of interest and highly timely.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript addresses the regulation of chloroplast translation, an important topic in chloroplast biology. The authors show that specific changes in the secondary structure of the 5'UTR of the psbA mRNA involving the Shine-Dalgarno sequence and the AUG initiation codon can be correlated with changes in translational efficiency during a low light to high light shift. Based on indirect evidence they propose that this may be caused by binding of specific proteins to this region. They also show that this correlation appears to be valid to some extent for other mRNAs with a weak SD sequence. The technical quality of this manuscript is excellent and the manuscript is clearly written.

      Additional remarks

      In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991). The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      Significance

      This work provides valuable new insights into the molecular mechanisms involving the psbA 5'UTR in the initiation of chloroplast translation.

      This work will be of interest to a wide audience interested in the mechanisms of translational regulation.

      My expertise is in chloroplast biogenesis and in assembly and regulation of the photosynthetic apparatus

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Molenaars et al., describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      We really appreciate the positive words from the reviewer.

      To help assessing the value of the method to other approaches, several controls are suggested below:

      1.Fig.1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction)

      We acknowledge the apparent asymmetry in the text; comparing our two-phase method to a single phase lipidomics method indeed suggests a similar comparison for metabolomics. However, our established polar metabolomics method has always been based on this exact two-phase extraction. The current method exclusively asks whether it is possible to integrate our dedicated lipidomics platform into our established two-phase polar metabolomics method, by utilizing the apolar phase that is usually discarded. This way, the method enables comprehensive metabolomics/lipidomics screening while limiting the need of culturing twice the amount of material.

      Our manuscript does not necessarily ask the more fundamental question of the advantages of a one-phase vs two-phase extraction for polar metabolites. Interestingly, one-phase and two-phase metabolomics methods have been compared previously, and the authors of that study showed that the two-phase method achieved broader metabolite coverage, satisfactory extraction reproducibility, acceptable recovery and safety (DOI: 10.1038/srep38885). This is most probably because the cHILIC column is sensitive to contamination, so excluding lipids from the samples is beneficial for measuring polar metabolites. We hence believe that developing a single-phase polar method would be superfluous for the purpose of this study.

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      This is an interesting question that mostly relates to the lyso-lipids that are not detected in the lipid phase of our two-phase extraction. The first point to make is that sample solvents that are used at the final stage of extraction are not compatible between methods. In other words, the solvent we normally use for the lipids phase (xxx) cannot be injected on the cHILIC column. So, in a practical sense, we would not be able to measure these compounds, even if they would technically be dissolved in the other layer. However, we tried a few different alternative approaches to get more information on this point:

      1. We have attempted to integrate the lyso-lipids in the cHILIC measurements, in the polar layer, using the polar sample solvents. This was unsuccessful; no reproducible peaks, not even the internal standards, were measured. We will include a note on these results in our manuscript.
      2. We have, albeit for a different sample matrix, attempted to dissolve both layers of the two-phase extraction in the cHILIC sample solvents. While we cannot guarantee this for all metabolites, it appears that most polar metabolites are exclusively found in the polar layer. We were not able to integrate even a single peak from any of the sugars, amino acids, nucleotides, etc. in the apolar layer dissolved in polar solvents.
      3. We have reconstituted both the polar and apolar layer of our two-phase extraction in 50:50 methanol:chloroform and analyzed them on the lipidomics platform. We did find that some of the lipid internal standards partition to the polar phase, especially LPG (and to a lesser extent LPE and LPA), compared to, for instance, PE, SM, PG and PC, which all end up in the apolar phase.

      We will include these data in the revised manuscript as a supplemental figure, as they demonstrate that the lyso-lipids are poorly measured in the two-phase extraction. This is also why in the text we advise using the dedicated one-phase extraction when interested primarily in these species.

      3.Fig.3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      We agree with this point. We will amend the text to not overstate these advantages.

      Reviewer #1 (Significance (Required)):

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

      We agree; our goal was indeed to create and share a method, we will make sure to emphasize this in our cover letter.

      While the extraction method we use is not novel per se and based on classical extraction procedures, it is important to underscore that we are only now able to use these extractions in combination with high-resolution mass spectrometry. This opens new opportunities for basic discovery. The efficiency we achieve by using both phases of the two-phase procedure makes our method highly attractive for hypothesis generation, especially in sample sets where limited amounts of material are available.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors provide a detailed description of a method to analyse both polar as well as lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover and by using internal standards they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worms community and beyond.

      We are very grateful to receive this positive response from the reviewer and for highlighting the advantages of our described method also beyond the worm community.

      **Major comments:**

      none

      **Minor comments:**

      The correction process using internal standards could be described in a bit more detail.

      In our revised manuscript, we will describe the internal standard use and corrections in more detail in the text. In summary: internal standards are selected for specific metabolites based on their Pearson correlation and %CV. Subsequently, metabolite peak areas are divided by the area of the appropriate internal standard. This corrects for any loss of sample during sample preparation, for instance during the isolation of the two layers.
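      The division step described above can be sketched as follows. This is a minimal illustration that assumes one peak area per sample for both the metabolite and its assigned internal standard; the function name and the numbers are hypothetical, and the selection of the standard by Pearson correlation and %CV is not shown.

```python
def normalize_to_internal_standard(metabolite_areas, standard_areas):
    """Divide each metabolite peak area by the internal standard area
    measured in the same sample."""
    if len(metabolite_areas) != len(standard_areas):
        raise ValueError("one standard area is needed per sample")
    return [m / s for m, s in zip(metabolite_areas, standard_areas)]


# Hypothetical example: samples 2 and 3 lost or gained 20% of material
# during preparation, which scales metabolite and standard alike.
metabolite_areas = [100.0, 80.0, 120.0]
standard_areas = [10.0, 8.0, 12.0]
ratios = normalize_to_internal_standard(metabolite_areas, standard_areas)
```

      Because a proportional loss affects the metabolite and the standard equally, the ratio is unchanged across the three samples, which is the correction the reply refers to.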

      Jenni Watts has written a nice Worm Book chapter on lipids which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript

      We will include a reference to this Worm book chapter reviewing fat regulation in C. elegans in our paper, thank you for the suggestion.

      Reviewer #2 (Significance (Required)):

      see above

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript is well written and well considered. However, there is room for further improvement:

      We thank the reviewer for the positive response and for the suggestions raised.

      1) The authors need to state exactly how many metabolites were measured, not just ">" (as in "semi-quantitative analysis of >100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans"), as they did for the other papers in Table 1.

      We understand that this might appear vague. The notation was a compromise, based on the following considerations:

      1. The maximum number of reported metabolites can be different from the number of analyzed metabolites in a specific experiment or even a specific sample. For instance, our method is perfectly capable of measuring creatine metabolism (we have standards for these metabolites and they can be reliably measured); however, we have not yet been able to detect these metabolites in C. elegans. Some mutants also lose abundance of a certain metabolite to the point of it not being reliably measurable, which means it is filtered out in the bioinformatics.
      2. Since the initial draft of our manuscript we have been able, and will continue to be able, to add new metabolites to our analysis, as we perform a full scan over the range of m/z 50-1200. Because of this, we felt it more accurate to state that we can measure >100 metabolites, instead of a specific number.

      2) Authors also need to clarify on number of samples in the result section while describing the statistical analysis.

      We understand this point raised by the reviewer and will specify not only the number of samples, but also that they are indeed biological replicates. This will be included in the figure legends.

      Reviewer #3 (Significance (Required)):

      This might be an interesting paper for the research community who work with C. elegans (metabolism or in general).

      Thank you. We are in fact utilizing this double extraction for other non-worm samples, such as mouse and human tissues, and we believe this could also benefit the research community beyond the model organism C. elegans.

      The authors must deposit the raw data and make it available for the public, so they could also benefit from this good work.

      It is our full intention to share our data in a convenient and standardized way through for instance the MetaboLights database (https://www.ebi.ac.uk/metabolights/). We agree and changes will be implemented as suggested.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the organic and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further, they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      We thank the reviewer for their comments and suggestions.

      **Major comments:**

      *Are the key conclusions convincing?*

      As a whole the conclusions are convincing and valid.

      We appreciate that the reviewer considers our work convincing and valid.

      *Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?*

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed, and it is not clear how replication was carried out.

      We have in fact performed analyses on both biological replicates and repeated injections of pooled samples to determine robustness. We will clarify the biological replicates in the text and will place the pooled QC samples in the main text with additional explanation and relevant statistics such as the % coefficient of variation (%CV) between them. For clarity, we plotted the %CV of all polar as well as apolar metabolites. For polar metabolites, 97% of the metabolites had a %CV lower than 30. For apolar metabolites, 86% of the metabolites had a %CV lower than 30.
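      The %CV summary given above can be illustrated with a short sketch. It assumes that %CV is the sample standard deviation divided by the mean, multiplied by 100, computed per metabolite across repeated measurements; the function names, the metabolite labels and the values are hypothetical and are not data from the manuscript.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)


def fraction_below_cutoff(areas_by_metabolite, cutoff=30.0):
    """Fraction of metabolites whose %CV across measurements is below cutoff."""
    cvs = [percent_cv(areas) for areas in areas_by_metabolite.values()]
    return sum(cv < cutoff for cv in cvs) / len(cvs)


# Hypothetical peak areas from three injections of a pooled QC sample.
qc_areas = {
    "metabolite_a": [100.0, 102.0, 98.0],   # SD 2, mean 100 -> 2 %CV
    "metabolite_b": [50.0, 100.0, 150.0],   # SD 50, mean 100 -> 50 %CV
}
share_reproducible = fraction_below_cutoff(qc_areas, cutoff=30.0)
```

      Applying the same calculation separately to pooled QC injections and to individually prepared biological replicates would give the analytical and the biological component of the variance.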

      *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.*

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity, as well as coefficients of variation (included in the supplementary material but not clear in the body of the manuscript) would characterize the method best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) sources is critical.

      We have these data and will elaborate on this in our revised manuscript. We will discuss the quality control samples more prominently in the main body of the manuscript, and show one or more figures that specifically address both analytical and biological variance (see rebuttal figure 2). In summary, we assessed this variance using (a) a repeated injection of a pooled QC sample, and (b) biological replicates prepared individually. The latter condition in particular, in which we assess biological variance, is representative of the actual method application. The %CV under these conditions is ≤20% for the majority of metabolites, which is why we consider our method robust.

      *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs and/or reference materials were used, nor what the number of replicates per experimental design was.

      All the data are available. These analyses will be included in the revision.

      *Are the data and the methods presented in such a way that they can be reproduced?*

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      We performed bona fide biological replicates. We will explicitly mention this in the paper together with the other descriptions of our validation protocols.

      *Are the experiments adequately replicated and statistical analysis adequate?*

      As per my above comments.

      **Minor comments:**

      *Specific experimental issues that are easily addressable.*

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc.). See the widely accepted guidelines: Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      We will provide details on the analysis itself in a table. In summary: Samples were measured in a random order, with blanks and QC samples throughout the run.

      *Are prior studies referenced appropriately?*

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      We will include this paper in our references. We would like to note though that this method requires not just an LC system to analyze lipids, but also GC with additional derivatization steps. Our method achieves comprehensive lipidomics using a single technique and no additional derivatization.

      Further, a recent publication goes beyond the work described by the authors using a similar approach: MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses. Ernesto S. Nakayasu, Carrie D. Nicora, Amy C. Sims, Kristin E. Burnum-Johnson, Young-Mo Kim, Jennifer E. Kyle, Melissa M. Matzke, Anil K. Shukla, Rosalie K. Chu, Athena A. Schepmoes, Jon M. Jacobs, Ralph S. Baric, Bobbie-Jo Webb-Robertson, Richard D. Smith, Thomas O. Metz. mSystems May 2016, 1 (3) e00043-16; DOI: 10.1128/mSystems.00043-16

      We will also include this paper, reporting 51 polar metabolites and 84 lipid species, in our references. While we recognize that they also make use of both phases and the protein pellet, we think our method is much more practical in several key ways:

      Our metabolomics platform provides twice as many species and our lipids platform exceeds their analytical capabilities 10 fold. This means a far better coverage of differences within metabolite and lipid classes, allowing for far more intricate patterns to be detected. We show this for instance in our plots comparing carbon chain length to degree of saturation (Fig 4 and S2 in original manuscript); a comparison that is only possible with the data density that our method offers. The MPLEx metabolomics method also requires the use of a GC system and derivatization steps, while our method does not, making it much more user friendly and requiring only a single analytical system.

      *Are the text and figures clear and accurate?*

      Yes

      *Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      The figures, overall, are of exceptional quality.

      As per current scientific consensus, box plots should also be overlaid with the actual datapoints (which was aptly done for the bar charts and other plots).

      The supplementary data, even though comprehensive, are hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      We agree that a readme file would make the supplemental data more understandable. We will provide such a file. For the box plots we will show the actual data points in our revised manuscript.

      Reviewer #4 (Significance (Required)):

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, it's highly significant to set an extraction method standard for the field of C. elegans metabolomics (including myself doing metabolomics and natural products chemistry with LCMS and NMR). However, this manuscript does not cover the technical aspects of the method with sufficient depth to hallmark this method as the standard for the field. Further information is needed to fill the missing gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents). The second aspect of the manuscript, which initially was a welcoming idea (and important), became >50% of the manuscript creating a disconnect between the information set by the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data to a metabolomics repository (e.g. MetaboLights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      We are aware that the extraction itself is an analytical chemistry staple. However, it is precisely in this fact that we find novelty. It should be noted that both of the other papers mentioned by the reviewers that have attempted to integrate lipidomics and metabolomics have had to resort to labor intensive (as well as possibly expensive and destructive) derivatization steps and a separate analysis on GC. Our method does not have these requirements. It is indeed a single and very common extraction, after which each dried phase is reconstituted and immediately injected. But this simplicity is not a concession, as our metabolome coverage is easily more comprehensive than the other mentioned methods. We therefore feel that this simplicity should not discount our currently presented method, but be considered an additional advantage.

      Sequential extractions may be an option to consider. However, we feel that they are less user friendly and unnecessary. Because we use internal standards, it is never an issue to pipet slightly more or less of any particular sample, which makes it easy to avoid the line between solvents.
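
      The reasoning about internal standards in this response can be sketched as follows. This is an illustrative example and not the authors' correction procedure: it assumes that each analyte is reported as the ratio of its peak area to that of a spiked internal standard, and that a pipetting deviation scales the analyte and the standard by the same factor. The compound names and peak areas are hypothetical.

```python
def normalize_to_internal_standard(peak_areas, standard_name):
    """Express each analyte as a ratio to the internal standard's peak area."""
    is_area = peak_areas[standard_name]
    if is_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return {
        name: area / is_area
        for name, area in peak_areas.items()
        if name != standard_name
    }


# Hypothetical peak areas when the full intended volume of a phase is collected.
full_volume = {"internal_standard": 1000.0, "lipid_X": 5000.0, "lipid_Y": 250.0}

# The same extract with 10% less volume collected: under the stated assumption,
# analytes and standard are reduced proportionally.
less_volume = {name: area * 0.9 for name, area in full_volume.items()}

ratios_full = normalize_to_internal_standard(full_volume, "internal_standard")
ratios_less = normalize_to_internal_standard(less_volume, "internal_standard")

print(ratios_full)
print(ratios_less)
```

      Under the proportionality assumption the two sets of ratios agree, which is why a small deviation in the collected volume does not change the reported values. The assumption would not hold if the standard and the analytes partitioned differently between the phases.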

      We will explicitly clarify where we already comply with the standards (such as the analysis of biological replicates and repeated injection of a QC sample) and are confident we can add figures and further information such as deposition of our data to comply with the rest.

      REFEREES CROSS-COMMENTING

      I completely agree with reviewer #1's comments; they are on point and I completely missed these issues. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging; the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely on the need for further description of samples/replication and deposition of data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, calling its publication potential into question.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the organic and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further, they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      Major comments:

      Are the key conclusions convincing?

      As a whole the conclusions are convincing and valid.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed, and it is not clear how replication was carried out.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity, as well as coefficients of variation (included in the supplementary material but not clear in the body of the manuscript) would characterize the method best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) sources is critical.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs and/or reference materials were used, nor what the number of replicates per experimental design was.

      Are the data and the methods presented in such a way that they can be reproduced?

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      Are the experiments adequately replicated and statistical analysis adequate?

      As per my above comments.

      Minor comments:

      Specific experimental issues that are easily addressable.

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc.). See the widely accepted guidelines:

      Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      Are prior studies referenced appropriately?

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      Further, a recent publication goes beyond the work described by the authors using a similar approach:

      MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses Ernesto S. Nakayasu, Carrie D. Nicora, Amy C. Sims, Kristin E. Burnum-Johnson, Young-Mo Kim, Jennifer E. Kyle, Melissa M. Matzke, Anil K. Shukla, Rosalie K. Chu, Athena A. Schepmoes, Jon M. Jacobs, Ralph S. Baric, Bobbie-Jo Webb-Robertson, Richard D. Smith, Thomas O. Metz mSystems May 2016, 1 (3) e00043-16; DOI: 10.1128/mSystems.00043-16

      Are the text and figures clear and accurate?

      Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The figures, overall, are of exceptional quality. As per current scientific consensus, box plots should also be overlaid with the actual datapoints (which was aptly done for the bar charts and other plots). The supplementary data, even though comprehensive, are hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      Significance

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, it's highly significant to set an extraction method standard for the field of C. elegans metabolomics (including myself doing metabolomics and natural products chemistry with LCMS and NMR). However, this manuscript does not cover the technical aspects of the method with sufficient depth to hallmark this method as the standard for the field. Further information is needed to fill the missing gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents).

      The second aspect of the manuscript, which initially was a welcoming idea (and important), became >50% of the manuscript creating a disconnect between the information set by the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data to a metabolomics repository (e.g. MetaboLights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      REFEREES CROSS-COMMENTING

      I completely agree with reviewer #1's comments; they are on point and I completely missed these issues. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging; the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely on the need for further description of samples/replication and deposition of data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, calling its publication potential into question.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript is well written and well considered. However, there is room for further improvement:

      1) The authors need to state exactly how many metabolites were analyzed, not just a lower bound ("semi-quantitative analysis of >100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans"), as they did for the other papers in Table 1.

      2) The authors also need to clarify the number of samples in the results section when describing the statistical analysis.

      Significance

      This might be an interesting paper for the research community working with C. elegans (on metabolism or in general).

      The authors must deposit the raw data and make them publicly available, so that others can also benefit from this good work.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The authors provide a detailed description of a method to analyse both polar and lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover, by using internal standards, they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worm community and beyond.

      Major comments: none

      Minor comments:

      The correction process using internal standards could be described in a bit more detail.

      Jenni Watts has written a nice WormBook chapter on lipids, which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript.

      Significance

      see above

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Molenaars et al., describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      To help assessing the value of the method to other approaches, several controls are suggested below:

      1. Fig. 1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction).

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      3. Fig. 3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      Significance

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their feedback and encouragement. We have now fully revised the manuscript to address all comments. Our specific responses are provided below and we have highlighted changes in the text. The major additions are:

      • analysis of simulated time-courses with lower temporal resolution
      • analysis of ex vivo PER2::LUCIFERASE SCN recordings
      • analysis of simulated time-courses with Poisson distributions of noise
      • plotted summary statistics for several figures
      • mathematical formula and explanation in the Methods

      Overall, these revisions have strengthened our findings and improved the manuscript, particularly in demonstrating that the issues with the chi-square periodogram are not specific to sampling interval or data type.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      **Comments:**

      In Figs 2+3 the authors show that the discontinuity in periodogram coincides with the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differ?

      We have taken a closer look at how each component of the chi-square statistic calculation changes at points where K decreases, and have found that discontinuities do always occur at these points. In addition to the obvious effect of the K * N term on the sudden decreases, we found that the sum of squares of the column means alone (the primary component of the numerator) also changes abruptly at each transition point of K. As a result, the discontinuity magnitude is likely roughly proportional to the amplitude of the chi-square statistic at that point.
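
      The components mentioned in this response can be made concrete with a minimal sketch of the classical chi-square periodogram statistic. This is an assumption-laden illustration rather than the authors' implementation: it assumes the statistic has the form Q_P = K · N · Σ_h (M_h − M)² / Σ_i (x_i − M)², where P is the test period in samples, K = floor(N / P) is the number of complete cycles, M_h are the column means after folding the first K · P points into K rows, and M is the mean of all N points. The function name and the simulated signal are hypothetical.

```python
import math


def chi_square_statistic(x, p):
    """Return (Q_P, K) for a test period of p samples, classical form."""
    n = len(x)
    k = n // p  # number of complete cycles of length p
    if k < 1:
        raise ValueError("test period is longer than the time-course")
    grand_mean = sum(x) / n
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    if total_ss == 0:
        raise ValueError("time-course is constant")
    # Fold the first k*p points into k rows and p columns; average each column.
    col_means = [sum(x[h + j * p] for j in range(k)) / k for h in range(p)]
    between_ss = sum((m - grand_mean) ** 2 for m in col_means)
    return k * n * between_ss / total_ss, k


# Simulated noise-free rhythm: period of 24 samples, 72 samples in total.
x = [math.sin(2 * math.pi * t / 24) for t in range(72)]

for p in range(22, 28):
    q, k = chi_square_statistic(x, p)
    print(f"P = {p:2d}  K = {k}  Q_P = {q:6.2f}")
```

      In this example K drops from 3 to 2 between P = 24 and P = 25, so the factor K · N and the set of points entering the column means both change at that step, which is where a jump in Q_P appears.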

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      In Supplemental Fig. 1C-D, the critical information is the shape of the periodogram and the presence of a discontinuity, so we believe the plot sizes are appropriate.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      We have created a new supplemental figure (Supplemental Fig. 8) by applying the strategy and visualization used in Fig. 6 to SCN PER2::LUC recordings instead of wheel-running data, and have updated the text accordingly.

      Reviewer #1 (Significance (Required)):

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted, however, that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases was lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that, in some circumstances (e.g. where there are discontinuities), CSP could be improved by changing the algorithm. The authors propose different steps to do this (e.g. using their greedy CSP code) and/or using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would increase the utility of the paper's findings for the field.

      Fig. 6 is entirely based on real-world circadian data (mouse wheel-running activity), as is the newly added Supplemental Fig. 8.

      Reviewer #2 (Significance (Required)):

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors identify a serious flaw in a popular method called Chi-squared periodogram (CSP) for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as Lomb-Scargle Periodogram work well without this discontinuity (bias) issue.

      **Major Comments:**

      1. One thing the authors do not include is time series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      Figs. 3A and 6 and newly added Supplemental Fig. 8 use non-integer (24-h) days.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worthwhile, in the interest of completeness, to also consider a lower resolution. There is nothing in this study that ties it to the specific application, is there?

      Although a sampling resolution of 6 minutes is not specific to wheel-running activity, we have added an analysis identical to that of Fig. 5 but with a resolution of 20 minutes (Supplemental Fig. 5). Additionally, the PER2::LUC SCN recordings analyzed in Supplemental Fig. 8 have a sampling resolution of 20 minutes.

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig. 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      We discuss both the error (references to Fig. 5A) and absolute error (references to Fig. 5B) in the text. We feel the interpretation suggested by the reviewer may be too reliant on the results of 3-day simulations, as the apparent underestimation by greedy appears far less substantial in simulations of 6 and 12 days.

      **Minor Comments:**

      1. I would like to see the formulae for the ratio of variances and p-values, to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      We have added the formula for the standard chi-square periodogram to the Methods section.

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev.) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      We have overlaid a black circle representing the median and a vertical black line representing the 5th-95th percentile range onto Fig. 5 and Supplemental Figs. 3-7.

      Reviewer #3 (Significance (Required)):

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      Previous work has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter, as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors identify a serious flaw in a popular method called Chi-squared periodogram (CSP) for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as Lomb-Scargle Periodogram work well without this discontinuity (bias) issue.

      Major Comments:

      1. One thing the authors do not include is time series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer-length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worthwhile, in the interest of completeness, to also consider a lower resolution. There is nothing in this study that ties it to that specific application, is there?

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig. 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      Minor Comments:

      1. I would like to see the formulae for the ratio of variances and the p-values, to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev.) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      Significance

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      Previous work has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter, as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that, in some circumstances (e.g. where there are discontinuities), CSP could be improved by changing the algorithm. The authors propose different steps to do this (e.g. using their greedy CSP code) and/or using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would aid the utility of the paper's findings for the field.

      Significance

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      Comments:

      In Figs 2+3 the authors show that the discontinuity in periodogram coincides with the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differ?
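To make the positions at which K changes concrete, the short Python sketch below counts complete cycles for test periods around 24 h. It assumes, for illustration only, a 3-day recording sampled every 6 minutes and that K is the integer part of the series length divided by the test period; the function name is hypothetical.

```python
def complete_cycles(n_samples, period_in_samples):
    """K: how many whole test periods fit into the recording."""
    return n_samples // period_in_samples


samples_per_hour = 10                       # 6-minute sampling (assumption)
n_samples = 3 * 24 * samples_per_hour       # 3-day recording (assumption)

# K drops by one as soon as the test period exceeds an exact divisor of the length.
for period in (239, 240, 241):
    hours = period / samples_per_hour
    print(f"test period {hours:.1f} h -> K = {complete_cycles(n_samples, period)}")
```

Under these assumptions K falls from 3 to 2 just above 24 h, which is one of the positions where an abrupt change in K can occur.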

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      Significance

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted, however, that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases was lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to thank the reviewers for their comments and suggestions. Our responses to them are listed below. We are hopeful that they will be satisfied with our responses and the changes we made in the revised version of the manuscript.

      REVIEWER #1


      Reviewer #1 (Evidence, reproducibility and clarity (Required)): In this manuscript, Ameen and colleagues report the results of a multidimensional proteomic analysis which combined quantitative proteomics, phosphoproteomics and N-terminomics in an effort to identify neuronal proteins displaying altered abundance or modifications by proteolysis and/or phosphorylation following an excitotoxic insult. Excitotoxicity is known to initiate by over-activation of ionotropic glutamate receptors which allows an increase in intracellular Ca2+ , ultimately leading to activation of proteases. The analysis revealed that glutamate treatment for up to 240 min did not significantly affect the abundance of neuronal proteins but caused dramatic changes in the phosphorylation state of many neuronal proteins. Based upon the phosphopeptides and neo-N-peptides, which contain the neo-N-terminal amino acid residue generated through proteolytic cleavage of intact neuronal proteins during excitotoxicity, the authors identified the proteins that undergo phosphorylation, dephosphorylation and/or enhanced proteolytic processing in excitotoxic neurons. By combining different software packages, they found that these modified proteins form complex interactions that affect signaling pathways regulating survival, synaptogenesis, axonal guidance and mRNA processing. These data suggest that perturbations in the aforementioned pathways mediate excitotoxic neuronal death. Then, the authors showed by Western blot analysis that CRMP2, a crucial regulator of axonal guidance signaling, exhibited enhanced truncation and reduced phosphorylation at specific sites upon glutamate treatment. These events may contribute to injury to dendrites and synapses associated with excitotoxic neuronal death. Furthermore, the authors showed that calpains are responsible for the proteolytic processing and cathepsins for enhanced degradation of proteins during excitotoxicity. 
      Blockage of calpain-mediated cleavage site of the tyrosine kinase Src during excitotoxicity confers neuroprotection in an in vivo model of neurotoxicity. In that regard, over twenty protein kinases are predicted to be activated in excitotoxic neurons. Collectively, this study contributes to the construction of an atlas of phosphorylation and proteolytic processing events that occur during excitotoxicity and as such they can be targeted for therapeutic purposes.

      **Comments** Comment: The identification of potential calpain cleavage sites in neuronal proteins modified during excitotoxicity is an interesting finding of the study. However, the atlas presented appears to miss components such as Kinase D-interacting substrate of 220 kDa (Kidins220), also known as ankyrin repeat-rich membrane spanning (ARMS), a protein recently shown to be cleaved by calpain during excitotoxicity (López-Menéndez et al, 2019, Cell Death and Disease 10, 535).

      Response: The calpain cleavage site of neuronal ARMS/KIDINS220 was mapped to the peptide bond between Asn-1669 and Arg-1670 (Gamir-Morralla, et al. (2015) Cell Death & Diseases 6, e1939). The cleavage is expected to generate two truncated fragments – one of ~185 kDa and another of ~10 kDa at the N-terminal and C-terminal sides, respectively of the cleavage site. Our TAILS analysis failed to detect the 10 kDa fragment which contains the neo-N-terminus generated by calpain cleavage. Here are the possible explanations:

      The neo-N-terminus of the 10 kDa C-terminal fragment is unlikely to be observed in our experiment because the TAILS method relies on the production of peptides by trypsin. The 10 kDa fragment has arginine as its first amino acid, which means that the N-terminal peptide released and isolated by the TAILS method would be a single amino acid.

      In their publication, Gamir-Morralla, et al. showed that the total levels of both intact and degraded ARMS/Kidins220 decreased as a result of ischemic cerebral stroke, suggesting degradation rather than proteolytic processing to generate stable truncated fragments as the final outcome of calpain cleavage of ARMS/Kidins220 (Figure 2b of the publication by Gamir-Morralla, et al.). The TAILS method predominantly detects proteolytic processing, whereas degradation can be more difficult to capture. Degradation often results in peptides containing fewer than 5-6 amino acids that are difficult to align with a single protein, or in transient peptides that may not be detectable in neurons at 240 min after glutamate treatment.

      Overall, it is possible that the truncated Kidins220 fragment is generated but went undetected by the TAILS approach.
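The trypsin argument above can be illustrated with a minimal in-silico digest in Python. This is a sketch under stated assumptions, not the TAILS workflow itself: the sequence is a made-up placeholder rather than the real ARMS/Kidins220 fragment, cleavage is assumed to occur only after arginine (lysines are typically blocked by amine labelling before digestion in TAILS), and the usual rule of no cleavage before proline is applied.

```python
def digest(sequence, cleave_after=("R",)):
    """Split a protein sequence C-terminal to each cleavage residue."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i + 1 == len(sequence)
        next_residue = "" if is_last else sequence[i + 1]
        # Cut after the residue unless it is the last one or is followed by proline.
        if residue in cleave_after and not is_last and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


# Hypothetical fragment whose neo-N-terminus is arginine (placeholder sequence).
fragment = "RSGAVLDERTTPK"
peptides = digest(fragment)
n_terminal_peptide = peptides[0]
```

Here the N-terminal peptide is the single residue R, which is too short to be assigned to a protein by mass spectrometry.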


      Comments: The CRMP2 antibody (Cell Signalling, 35672) used for western blots (figure 5D, also figure S11) and immunofluorescence (figure 5E) is problematic. Copied from https://www.cellsignal.com/products/primary-antibodies/crmp-2-d8l6v-rabbit-mab/35672: Monoclonal antibody is produced by immunizing animals with a synthetic peptide corresponding to residues surrounding Ile546 of human CRMP-2 protein. The truncated CRMP2 (figure 5D) studied in the whole section (residues 1-516 or 1-517, ~57kDa) cannot be recognized by this monoclonal antibody. The detected band with the red letters in figure 5D might represent another cleavage product. In any case, asking Cell Signalling for more information about the exact immunogen might help, but since it's monoclonal and derived from residues surrounding Ile546 it's very hard to include residues before aa516 and the unique epitope recognition upstream of aa516. The whole result section and discussion has to be reconsidered. Alternatively another antibody can be used to repeat those experiments in order to support the hypothesis. Time and resources are very familiar to authors since they have to repeat their previous work with a new antibody. Finally, there are no "western blot" and "immunofluorescence" methods for CRMP2.

      Response: We would like to apologise for incorrectly listing the catalogue number of the anti-CRMP2 antibody purchased from Cell Signalling technology. Rather than the rabbit monoclonal anti-CRMP2 antibody (Cell Signalling, Cat#: 35672), we used the polyclonal anti-CRMP2 antibody (Cell Signalling, Cat#9393) to perform all the Western blot and immunofluorescence analysis in this paper. The e-mail confirming the purchase of this antibody is appended. According to the vendor, the antibody was raised by immunizing rabbits with a synthetic peptide derived from the human CRMP2 sequence. We decided to order this antibody because Zhang, et al. (Sci Rep. 2016; 6: 37050) reported that it could detect the truncated CRMP2 fragments generated by calpain cleavage in primary cortical neurons in vitro in response to axonal damage.

      The procedures of Western blot and immunofluorescence detailing the correct CRMP2 antibody descriptions are added in the revised version of the submitted manuscript.


      Comment: The truncated DCLK1 bands detected in figure S8B cannot be attributed to the proteolytic processing of DCLK1 at the sites described: T311↓S312, S312↓S313 and N315↓G316 (predicted M.W. of the (C-terminal) products: 48.7-49.1kDa (figure S8A), which are too close to be well separated with conventional PAGE). The number and the separation of the bands suggest other cleavage sites. Response: We agree with the reviewer’s comment that conventional SDS-PAGE cannot differentiate the proteolytic products generated by cleavage at the three sites identified by TAILS. Furthermore, the TAILS method cannot detect all peptides generated from a protein during proteolysis. Therefore, validating our results with a Western blot experiment may reveal unidentified peptides in certain cases. We have now added the following statement in the revised manuscript to reflect the presence of other cleavage sites: “Besides detecting the 50-56 kDa truncated fragments, the antibody also cross-reacted with several truncated fragments of ~37-45 kDa. These findings suggest that DCLK1 underwent proteolytic processing at multiple other sites in addition to the three cleavage sites identified by our TAILS analysis.”

      Comment: Could the striking observation that almost all proteolytic processing during excitotoxicity is catalyzed by calpains and/or cathepsins have derived (partially) from unspecific targets of calpeptin such as a subset of tyrosine phosphatases (Schoenwaelder and Burridge, 1999: approx. 1h treatment of fibroblasts with approx. 10x less concentration) or other(s)? Response: Schoenwaelder and Burridge (1999, JBC 274:14359) reported that calpeptin exhibits both protease inhibitor and protease inhibitor-independent activities in fibroblasts. Besides inhibiting calpains and cathepsins, they demonstrated that calpeptin could selectively inhibit a subset of membrane-bound tyrosine phosphatases. Since the TAILS method monitored the protease inhibitor activity of calpeptin, the proteolytic processing events mitigated by calpeptin in neurons during excitotoxicity are likely attributable to its protease inhibitor activity. Additionally, Schoenwaelder and Burridge reported this unconventional protease inhibitor-independent activity of calpeptin in fibroblasts. Since the protein tyrosine kinases expressed in neurons and fibroblasts are different, it is unclear if calpeptin can also exert such activity in neurons.

      Comment: Describing the final part of figure 4C the authors suggest that "Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003). Our findings therefore predict activation of these kinases during excitotoxicity (Figure 4C)." The first question arising here is whether these three kinases are the only ones known to phosphorylate AMPKα. Even if this is true, it is highly speculative to suggest that the findings of the present study predict the activation of these kinases during excitotoxicity, without providing the necessary experimental data, since the increased phosphorylation of AMPK may be an indirect effect of the reduced function of a phosphatase. Thus the proposed model does not hold. Response: Agree. We have therefore revised our interpretation of the results to reflect this possibility. The revised sentence on page 13 reads “Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003), while a member of the metal-dependent protein phosphatase (PPM) family could dephosphorylate T172 of AMPK in cells (Garcia-Haro et al., 2010). Our findings therefore predict activation of these kinases and/or inactivation of the PPM family phosphatase in neurons during excitotoxicity (Figure 4C).”

      Additionally, we also deleted the schematic diagram depicting the possibility of activation of LKB1, CaMKKβ and TAK1 in Figure 4 of the revised manuscript.

      **Minor points**

      Minor Comment: Highlights could present the key points of the study in a more straightforward manner. Response: Agree. We have edited the highlights in our revised manuscript to make them more straightforward.


      Minor comment: Figure 4A is too complicated. Proteins considered as hubs of signaling pathways in neurons should be somehow highlighted to distinguish them.

      Response: Agree. We have now highlighted the signalling hubs by shading them in green in the revised figure. As we merged figures 2 and 4 of the original manuscript, these signalling hubs are presented in Figure 2B of the revised manuscript.

      Minor Comment: The analysis of proteins with enhanced truncation and reduced phosphorylation such as CRMP2 and DCLK1 is fragmented. In addition, the authors should mention the criteria based on which these proteins were selected for further analysis.

      Response: IPA analysis revealed synaptogenesis and axonal guidance as the top-ranked perturbed canonical signalling pathways governed by neuronal proteins undergoing significantly increased proteolytic processing and altered phosphorylation. As CRMP2 and DCLK1 are the key players in these pathways, they were chosen for further biochemical analysis to validate the TAILS results. To address this point, we added a few statements in the sections describing results of biochemical analysis of CRMP2 and DCLK1 in the revised manuscript. The additional sentences on page 13 now read “IPA analysis of the significantly modified neuronal proteins identified in our study predicted perturbation of signalling pathways governing axonal guidance and synaptogenesis in neurons during excitotoxicity (Figure S7). Since CRMP2 (also referred as DPYSL2) is a key player in neuronal axonal guidance and synaptogenesis (Evsyukova et al., 2013) and it underwent significant changes in phosphorylation state and proteolytic processing (Figures 5A and S7), it was chosen for validation of our proteomic results.” The additional sentences on page 15 read ”Similar to CRMP2, DCLK1 is also a key player in regulation of axonal guidance and synaptogenesis (Evsyukova et al., 2013). Since our TAILS results revealed significant proteolytic processing of DCLK1 (Figure S8A), it was chosen for validation of our proteomic results.”


      Minor comment: The potential therapeutic relevance of phosphorylation and proteolytic processing events that occur during excitotoxicity can be further explored. Response: Thanks for the suggestion. We have added a paragraph describing the additional evidence for protein kinase inhibitors and cell-permeable inhibitors blocking calpain cleavage of specific neuronal proteins as potential neuroprotectants to reduce brain damage induced by ischemic stroke. The additional sentences near the end of the Discussion section (page 25) now read “Since CRMP2 is a key player in axonal guidance and synaptogenesis, revealed by our proteomic analysis as the most perturbed cellular processes in excitotoxicity, blockade of its cleavage to form the truncated CRMP fragment is another potential neuroprotective strategy. Indeed, a cell-permeable Tat-CRMP2 peptide encompassing residues 491-508 close to the identified cleavage sites of CRMP2 could block calpain-mediated cleavage of neuronal CRMP2 and protect neurons against excitotoxic cell death (Yang et al., 2016).”


      The additional paragraph at the end of the Discussion section (page 25) now reads: “Besides the neuronal proteins undergoing enhanced proteolytic processing during excitotoxicity, protein kinases predicted by our phosphoproteomic results to be activated during excitotoxicity are also targets for the development of neuroprotective drugs. For example, our results demonstrated significant activation of neuronal AMPK during excitotoxicity, suggesting that aberrant activation of AMPK can contribute to neuronal death. Of relevance, small-molecule AMPK inhibitors could protect against neuronal death induced by ischemia in vitro, and brain damages induced by ischemic stroke in vivo. Likewise, inhibitors of Src and other Src-family kinases were known to protect against neuronal loss in vivo in a rat model of in traumatic brain injury (Liu et al., 2008a; Liu et al., 2017). Future investigation of the role of the excitotoxicity-activated protein kinases in excitotoxic neuronal death will reveal if small-molecule inhibitors of these kinases are potential neuroprotective drug candidates.”


      Minor comment: I am sorry but I could not find Figure 8, which is supposed to show the "In vivo model of NMDA neurotoxicity" (please, see page 30).

      Response: Our apology for the mistake. This should be Figure 6 of the revised manuscript.

      Minor comment: Introduction: O'Collins et al., 2006; Savitz and Fisher, 2007; both references are missing.

      Response: This was an oversight on our part and the references have been added to the revised manuscript.

      Minor comment: Figure S1A-B: vehicle treatment time course is needed. Response: All neurons were cultured in neurobasal media for seven days. The control neurons were incubated in culture media while we started treating the other neurons with glutamate for MTT and LDH assay. The additional paragraph describing the design of the cell viability/death assays in page 32 reads “Primary cortical neurons were incubated for 480 min with and without the addition of 100 μM of glutamate. The control neurons were incubated for 480 min in culture medium. For neurons treated with glutamate for 30 min, 60 min, 120 min and 240 min, they were pre-incubated in culture medium for 450 min, 420 min, 360 min and 240 min, respectively prior to the addition of glutamate to induce excitotoxicity. For neurons treated with glutamate for 480 min, they were treated with glutamate just after seven days of culture in neurobasal media.”


      Minor comment: Figure 5E: Control close-up is missing. Response: A close-up view of the control neurons is now provided in Figure 4E of the revised manuscript.


      Minor comment: "Moreover, the number of CRMP2-containing dendritic blebs in neurons at 240 min of glutamate treatment was significantly higher than that in neurons at 30 min of treatment (inset of Figure 5E)." Such a statistic is not shown in the graph. Response: The statistical analysis results are now added to the revised manuscript in Figure 5E.


      Minor comment: "Consistent with this prediction, our bioinformatic analysis revealed that the identified cleavage sites in most of the significantly degraded neuronal proteins during excitotoxicity are mapped within functional domains with well-defined three-dimensional structures (Figures 6A)." Authors might mean figure S12A? Response: Correct. Our apology for the mislabelling. This has been corrected to “S12A” in the revised manuscript.

      Minor comment: "Neuronal Src was identified by the three criteria of our bioinformatic analysis to be cleaved by calpains to form a stable truncated protein fragment during excitotoxicity (Figures 6A and Table S6)." Authors might mean figure 6D?

      Response: Correct. Our apology for the mislabelling. Because we merged figures 2 and 4 of the original manuscript, this has been corrected to now read “(Figure 5D)” on page 18 of the revised manuscript.

      Minor comment: Figure 2B: Clusters 1, 3, 4 and 6 do not follow treatment trends homogeneously in all time points. For example in cluster 1 there is a phosphopeptide following the pattern 1, 0, -1 and another one following the pattern 0, 1, -1, which is actually a very different pattern even if the end value is stable (-1). The first example could belong to the cluster 6 as well, while the second example to cluster 5. Please elaborate on the rationale behind the categorization. Is there any other clustering method that can be used without making the categorization more complicated? Response: Because we merged Figures 2 and 4 of the original manuscript, this comment relates to the right panel of Figure 2A of the revised manuscript. The rationale behind the categorization of the phosphopeptides into six clusters was based upon the patterns of changes of their abundance (i.e. average of log-2 normalized z-score of phosphopeptide intensity) in three sample groups. We calculated the number of permutations where the number of sample groups in the set (n) = 3 (i.e. control neurons, neurons of 30 min glutamate treatment and neurons of 240 min glutamate treatment) and the number of sample groups in each permutation (r) = 3 (i.e. all three sample groups should be present in each permutation). Hence the number of permutations is 3!/(3-3)! = 6. The six clusters refer to the six possible permutations of the patterns of abundance changes of the identified phosphopeptides rather than the end results.
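The permutation count in this response can be checked with a few lines of Python. The sketch below enumerates the orderings of the three sample groups and, as an assumption made only for illustration, labels a phosphopeptide by the order of its group means from lowest to highest; the group names, the helper function and the handling of ties are hypothetical and not taken from the manuscript.

```python
from itertools import permutations

# The three sample groups named in the response (labels are placeholders).
groups = ("control", "glutamate_30min", "glutamate_240min")

# All orderings of 3 groups taken 3 at a time: 3!/(3-3)! of them.
orderings = list(permutations(groups))


def abundance_ordering(mean_z_by_group):
    """Return the groups sorted from lowest to highest mean z-score.

    Assumes no ties between group means (illustrative simplification).
    """
    return tuple(sorted(groups, key=lambda g: mean_z_by_group[g]))


# Example from the reviewer's comment: the pattern 1, 0, -1 across the groups.
example = abundance_ordering(
    {"control": 1.0, "glutamate_30min": 0.0, "glutamate_240min": -1.0}
)
```

Each phosphopeptide pattern without ties then falls into exactly one of the six orderings, which is the sense in which six clusters exhaust the possibilities.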

      Minor comment: A problem of the manuscript is its length and lack of coherence. Apart from presenting the data from the proteomics, phosphoproteomics and N-terminomics analyses, the authors focus on several different proteins to perform validation experiments and further characterize the biological significance of their modification. Because these proteins do not fall on the same pathway, the authors end up presenting several independent stories that complicate the reader. Response: We agree that proteins that do not operate in the same signalling pathway were chosen for further biochemical analysis. Their choice was justified because they are key players in the most perturbed canonical signalling pathways identified by bioinformatic analysis with the IPA software. We agree that this may complicate the reader. However, it also helps to illustrate that excitotoxic neuronal death is a complicated cell death process caused by dysregulation of multiple neuronal proteins which regulate different cellular processes.

      Minor comment: Moreover, it is necessary for the authors to restructure their introduction, and avoid over-representing previous research on nerinetide, which is not used anywhere in the manuscript. Instead, the introduction must be more focused to better capture the necessity and essence of the present study. Response: We agree. Based on the reviewer’s comments, we decided to restructure the introduction by shortening the description of the results of Nerinetide research. Please refer to the track changes of the revised manuscript for the changes.

      Minor comment: Taking into account figures 1 and S2 I understand that the authors combined samples of neuronal cell cultures (treated or not with Glu) with samples from mouse brains (that have undergone ischemic stroke/TBI or sham operation). If this is the case, why did the authors do that? How did they combine the different samples? And why is this not mentioned anywhere in the main text? Response: For a data-independent acquisition (DIA) based mass spectrometry experiment, it is essential that we first generate a library of identifiable peptides using a standard data-dependent acquisition (DDA) approach. For the DIA type experiment to work, the identified peptides have to be in that library first. Excitotoxicity is a major mechanism of neuronal loss caused by ischemic stroke and traumatic brain injury. We therefore included the brains of sham-operated mice and the brains of mice suffering ischemic stroke and traumatic brain injury to construct the spectral libraries, and that is why the library contains pooled samples from the representative samples. Pre-fractionation of the pooled peptides was also performed to increase the number of identifiable peptides and generate a deeper library.

      Once that library was generated, all samples were analysed individually, each as a separate DIA experiment. The DIA approach then makes use of the generated library for identification and quantitation. This methodology allows for deeper identification and a lower number of missing values. These statements were added in the Methods section of the revised manuscript (page 33).

      Minor comment: Regarding figure 5D, the authors write in the main text "Consistent with our phosphoproteomic results, the truncated fragment CRMP2 fragments could not cross-react with the anti-pT509 CRMP2 antibody (Figure 5D)" In the upper blot the truncated CRMP2 fragment runs well below the 70 kDa marker. However, in the middle panel, where we see the blot with the phospho specific antibody, the respective area of the blot has been cropped, so we cannot see whether the truncated fragment cross-reacts with the phospho specific antibody. Response: The presentation of the western blots in Figure 5D in the revised manuscript are now less cropped and clearly demonstrate there is no cross reactivity of the phospho specific antibody with the truncated fragment. Please refer to the revised Figure 5 for the updated Western blot images.

      Minor comment: It is strange that only 1 and 13 proteins showed significant changes in abundance at 30 and 240 min respectively. Especially after 240 min of glutamate treatment one would expect many proteins to change in their levels, since the neurons are almost eliminated by cell death at that point. How could the authors explain this phenomenon? Additionally, in their previous publication, they showed that many more proteins change significantly in abundance following glutamate treatment (at 30 min and 240 min).

      Response: Even though our global spectral libraries contain over 49,000 identifiable peptides derived from 6524 proteins, only 1696 quantifiable proteins were identified in the DIA mass spectrometry analysis (Figure 1) because we used stringent criteria for their identification: (i) false discovery rate of We agree with the reviewer that many more proteins are expected to change their abundance at 240 min as significant cell death was detected. However, if we had used less stringent false discovery rates of their identification and quantification, included proteins with just one unique identified peptide and lowered the threshold of abundance fold changes, many more proteins with significantly changed abundance would be detected. But we preferred to use these stringent criteria to ensure a high confidence in our identification of neuronal proteins undergoing significant changes during excitotoxicity.*

      In agreement with the low number of neuronal proteins exhibiting significant changes in abundance reported in this manuscript, our previously published study (Hoque et al. (2019) Cell Death & Disease) detected only 26 neuronal proteins undergoing changes in abundance. Hence, we disagree with the reviewer that our previous publication reported many more proteins undergoing changes in abundance in excitotoxicity.

      Reviewer #1 (Significance (Required)): Comment on significance: The manuscript delivers a large amount of data, regarding changes in the proteome, the activation of specific kinases, phosphatases, as well as the molecular pathways that are activated at distinct time points of excitotoxicity. This information could be used in future studies to validate and develop potential therapeutic strategies that could protect against neuronal loss in various neurological disorders. Response: We are excited that Reviewer #1 felt that this large amount of generated data will be useful for subsequent studies to validate and develop novel therapeutic strategies.

      Comment on significance: The same group has very recently published a work very similar to this manuscript (Hoque et al. Cell Death and Disease, 2019). In their previous publication, the authors cover a large part of their current objectives. They again performed a proteomic and phosphoproteomic analysis of mouse primary cortical neurons treated with glutamate for distinct time points, with the aim of identifying changes in expression and phosphorylation state of neuronal proteins upon excitotoxicity. Apart from the N-terminome, which they investigate in their current manuscript, the proteomic and phosphoproteomic analyses are very similar. As such, and because the current manuscript is very extensive, the authors should consider shortening it to include only their novel findings (changes in the N-terminome, the involvement of specific kinases that contribute to excitotoxic neuronal death, the regulatory mechanism of CRMP2, etc).

      Response: Since the coverage of phosphoproteins undergoing changes in neurons during excitotoxicity identified in the current study is much higher than that of phosphoproteins identified in our previously published study, we prefer to retain the description of the phosphoproteomic findings in this manuscript. Nonetheless, we agree that the manuscript needs to be shortened. Our suggestions to shorten the manuscript are listed below:

      1. Move the description and results of global proteomic analysis to supplementary information. Since we made the same observation that only a small number of neuronal proteins undergo significant changes in abundance during excitotoxicity in our previously published study, moving the global proteomic analysis results away from the main text will not adversely impact the quality of the presentation.
      2. For the description of how we classified the identified N-terminal peptides as those derived from degradation and those derived from proteolytic processing, we would like to move it to the supplementary information.

      Comment on significance: The authors should describe in a simpler way the proteomic and bioinformatics analyses they are using in the manuscript. It is difficult to understand the methodology used if you are not an expert in proteomics and bioinformatics. My suggestion is to revise their text and make it simpler and more concise.

      Response: We agree with this criticism. As we are not allowed to make a major revision of the manuscript at this stage, the revised manuscript contains only minor revisions that address all of the comments and suggestions provided by the two reviewers. Further changes will be added in the next revised version. Our suggestions to further restructure the manuscript are listed below:

      Figure S5 depicting the rationale for classification of N-terminal peptides as products of degradation and those of proteolytic processing will be moved to the main text. The description of the rationale in the main text will be revised to help readers who are not experts in proteomics to better understand the rationale. A diagram depicting the workflow of our TAILS method will be added as a supplementary figure. For bioinformatic analysis of the proteomic results, we will provide in the supplementary information the definition of the following terms relevant to Ingenuity Pathway Analysis and PhosphoPath analysis of the perturbed biological processes and signalling pathways: (a) Canonical Signalling Pathways, (b) Cellular Processes and (c) Interaction Networks. A short description of how their identification benefits the mapping of the neurotoxic signalling networks in neurons will be provided in the supplementary information.

      REVIEWER #2


      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Comment: In this article, Ameen and collaborators identify the modified proteins during neuronal excitotoxicity by using an in vitro model in which mouse primary cortical neurons are treated for 30 and 240 min with 100 µM glutamate. They use different approaches: quantitative label-free global and phosphoproteomic methods and a quantitative N-terminomic procedure called Terminal Amine Isotopic Labelling of Substrates (TAILS). Results show that 240 min of glutamate treatment has minimal impact on protein abundance (13 neuronal proteins show significant changes) but induces changes in the phosphorylation state and proteolysis of nearly 900 proteins. A significant proportion of these proteins are involved in signalling pathways regulating cell survival, synaptogenesis and axonal guidance.

      The paper is globally well written and the experiments are convincing. The methodology and the analysis are well described and well explained. The text and each figure are clear and accurate. However, I have just one comment that needs answers and/or clarifications. Thanks for your work. Response: We appreciate the compliment provided by this reviewer on our submitted manuscript.

      **Minor comment:**

      Minor comment: Primary neurons are used at DIV7 and it has been shown that at DIV7 the percentage of astrocytes is relatively low; however, astrocytes play a key role in glutamate recapture and release. It would be relevant to know the percentage of glial cells in the authors' culture model and how astrocytes are involved in glutamate recapture and in excitotoxicity.

      Response: The compositions of the DIV7 cultures are: 94.1+/- 1.1 % neurons, 4.9%+/-1.1% astrocytes, and *

      Reviewer #2 (Significance (Required)):

      Comment on significance: Excitotoxicity is a cell death process involved in many neurological disorders. However, nowadays, there are no existent FDA-approved pharmacological agents targeted to protect against excitotoxicity leading to neuronal death. A better comprehension of excitotoxicity is required to improve prevention, therapy and reparation following the disease.

      With this work, the authors highlighted modified proteins in excitotoxic neurons. Interestingly, a few of these proteins are involved in cell survival, mRNA processing or axonal guidance. This atlas of phosphorylation and proteolytic processing events during excitotoxicity permits the identification of new therapeutic targets such as calpain-mediated cleavage of Src kinase. This atlas will interest many teams working on neurological disorders such as Alzheimer disease, Parkinson disease or stroke. It will help to better characterize the cellular/molecular events involved in neuronal loss and to find new therapeutic targets. Response: In response to this comment and a similar comment by Reviewer 1, we expanded the discussion to include the potential therapeutic value of our findings.

      Comment on significance: My field of expertise: Stroke, cell death, excitotoxicity, signalling pathways and molecular targets, autophagy. I don't have sufficient expertise to evaluate proteomic analysis.

      Response: No response is needed.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this article, Ameen and collaborators identify the modified proteins during neuronal excitotoxicity by using an in vitro model in which mouse primary cortical neurons are treated for 30 and 240 min with 100 µM glutamate. They use different approaches: quantitative label-free global and phosphoproteomic methods and a quantitative N-terminomic procedure called Terminal Amine Isotopic Labelling of Substrates (TAILS). Results show that 240 min of glutamate treatment has minimal impact on protein abundance (13 neuronal proteins show significant changes) but induces changes in the phosphorylation state and proteolysis of nearly 900 proteins. A significant proportion of these proteins are involved in signalling pathways regulating cell survival, synaptogenesis and axonal guidance.

      The paper is globally well written and the experiments are convincing. The methodology and the analysis are well described and well explained. The text and each figure are clear and accurate. However, I have just one comment that needs answers and/or clarifications. Thanks for your work.

      Minor comment:

      Primary neurons are used at DIV7 and it has been shown that at DIV7 the percentage of astrocytes is relatively low; however, astrocytes play a key role in glutamate recapture and release. It would be relevant to know the percentage of glial cells in the authors' culture model and how astrocytes are involved in glutamate recapture and in excitotoxicity.

      Significance

      Excitotoxicity is a cell death process involved in many neurological disorders. However, nowadays, there are no existent FDA-approved pharmacological agents targeted to protect against excitotoxicity leading to neuronal death. A better comprehension of excitotoxicity is required to improve prevention, therapy and reparation following the disease.

      With this work, the authors highlighted modified proteins in excitotoxic neurons. Interestingly, a few of these proteins are involved in cell survival, mRNA processing or axonal guidance. This atlas of phosphorylation and proteolytic processing events during excitotoxicity permits the identification of new therapeutic targets such as calpain-mediated cleavage of Src kinase. This atlas will interest many teams working on neurological disorders such as Alzheimer disease, Parkinson disease or stroke. It will help to better characterize the cellular/molecular events involved in neuronal loss and to find new therapeutic targets.

      My field of expertise: Stroke, cell death, excitotoxicity, signalling pathways and molecular targets, autophagy. I don't have sufficient expertise to evaluate proteomic analysis.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Ameen and colleagues report the results of a multidimensional proteomic analysis which combined quantitative proteomics, phosphoproteomics and N-terminomics in an effort to identify neuronal proteins displaying altered abundance or modifications by proteolysis and/or phosphorylation following an excitotoxic insult. Excitotoxicity is known to be initiated by over-activation of ionotropic glutamate receptors, which allows an increase in intracellular Ca2+, ultimately leading to activation of proteases. The analysis revealed that glutamate treatment for up to 240 min did not significantly affect the abundance of neuronal proteins but caused dramatic changes in the phosphorylation state of many neuronal proteins. Based upon the phosphopeptides and neo-N-peptides, which contain the neo-N-terminal amino acid residue generated through proteolytic cleavage of intact neuronal proteins during excitotoxicity, the authors identified the proteins that undergo phosphorylation, dephosphorylation and/or enhanced proteolytic processing in excitotoxic neurons. By combining different software packages, they found that these modified proteins form complex interactions that affect signaling pathways regulating survival, synaptogenesis, axonal guidance and mRNA processing. These data suggest that perturbations in the aforementioned pathways mediate excitotoxic neuronal death. Then, the authors showed by Western blot analysis that CRMP2, a crucial regulator of axonal guidance signaling, exhibited enhanced truncation and reduced phosphorylation at specific sites upon glutamate treatment. These events may contribute to injury to dendrites and synapses associated with excitotoxic neuronal death. Furthermore, the authors showed that calpains are responsible for the proteolytic processing and cathepsins for enhanced degradation of proteins during excitotoxicity. Blockade of the calpain-mediated cleavage site of the tyrosine kinase Src during excitotoxicity confers neuroprotection in an in vivo model of neurotoxicity. In that regard, over twenty protein kinases are predicted to be activated in excitotoxic neurons. Collectively, this study contributes to the construction of an atlas of phosphorylation and proteolytic processing events that occur during excitotoxicity and as such they can be targeted for therapeutic purposes.

      Comments

      The identification of potential calpain cleavage sites in neuronal proteins modified during excitotoxicity is an interesting finding of the study. However, the atlas presented appears to miss components such as Kinase D-interacting substrate of 220 kDa (Kidins220), also known as ankyrin repeat-rich membrane spanning (ARMS), a protein recently shown to be cleaved by calpain during excitotoxicity (López-Menéndez et al, 2019, Cell Death and Disease 10, 535).

      The CRMP2 antibody (Cell Signaling, 35672) used for western blots (figure 5D, also figure S11) and immunofluorescence (figure 5E) is problematic. Copied from https://www.cellsignal.com/products/primary-antibodies/crmp-2-d8l6v-rabbit-mab/35672: Monoclonal antibody is produced by immunizing animals with a synthetic peptide corresponding to residues surrounding Ile546 of human CRMP-2 protein. The truncated CRMP2 (figure 5D) studied in the whole section (residues 1-516 or 1-517, ~57 kDa) cannot be recognized by this monoclonal antibody. The band labelled in red letters in figure 5D might represent another cleavage product. In any case, asking Cell Signaling for more information about the exact immunogen might help, but since the antibody is monoclonal and was raised against residues surrounding Ile546, it is very unlikely that its epitope includes residues upstream of aa516. The whole results section and discussion have to be reconsidered. Alternatively, another antibody can be used to repeat those experiments in order to support the hypothesis. The authors will be well aware of the time and resources involved, since they would have to repeat their previous work with a new antibody. Finally, there are no "western blot" and "immunofluorescence" methods for CRMP2.

      The truncated DCLK1 bands detected in figure S8B cannot be attributed to the proteolytic processing of DCLK1 at the sites described: T311↓S312, S312↓S313 and N315↓G316 (predicted M.W. of the (C-terminal) products: 48.7-49.1 kDa (figure S8A), which are too close to be well separated by conventional PAGE). The number and the separation of the bands suggest other cleavage sites.

      Could the striking observation that almost all proteolytic processing during excitotoxicity is catalyzed by calpains and/or cathepsins have derived (partially) from nonspecific targets of calpeptin, such as a subset of tyrosine phosphatases (Schoenwaelder and Burridge, 1999: approx. 1 h treatment of fibroblasts with an approx. 10x lower concentration) or other(s)?

      Describing the final part of figure 4C the authors suggest that "Liver kinase B1 homolog (LKB1), CaM kinase kinase β (CaMKKβ) and transforming growth factor‐β‐activating kinase 1 (TAK1) are the known upstream kinases directly phosphorylating T172 of AMPKα to activate AMPK (Herrero-Martin et al., 2009; Woods et al., 2005; Woods et al., 2003). Our findings therefore predict activation of these kinases during excitotoxicity (Figure 4C)." The first question arising here is whether these three kinases are the only ones known to phosphorylate AMPKα. Even if this is true, it is highly speculative to suggest that the findings of the present study predict the activation of these kinases during excitotoxicity without providing the necessary experimental data, since the increased phosphorylation of AMPK may be an indirect effect of the reduced function of a phosphatase. Thus the proposed model does not hold.

      Minor points

      Highlights could present the key points of the study in a more straightforward manner.

      Figure 4A is too complicated. Proteins considered as hubs of signaling pathways in neurons should be somehow highlighted to distinguish them.

      The analysis of proteins with enhanced truncation and reduced phosphorylation such as CRMP2 and DCLK1 is fragmented. In addition, the authors should mention the criteria based on which these proteins were selected for further analysis.

      The potential therapeutic relevance of phosphorylation and proteolytic processing events that occur during excitotoxicity can be further explored.

      I am sorry but I could not find Figure 8, which is supposed to show the "In vivo model of NMDA neurotoxicity" (please, see page 30).

      Introduction: O'Collins et al., 2006; Savitz and Fisher, 2007; both references are missing.

      Figure S1A-B: vehicle treatment time course is needed.

      Figure 5E: Control close-up is missing.

      "Moreover, the number of CRMP2-containing dendritic blebs in neurons at 240 min of glutamate treatment was significantly higher than that in neurons at 30 min of treatment (inset of Figure 5E)." Such a statistic is not shown in the graph.

      "Consistent with this prediction, our bioinformatic analysis revealed that the identified cleavage sites in most of the significantly degraded neuronal proteins during excitotoxicity are mapped within functional domains with well-defined three-dimensional structures (Figures 6A)." Authors might mean figure S12A?

      "Neuronal Src was identified by the three criteria of our bioinformatic analysis to be cleaved by calpains to form a stable truncated protein fragment during excitotoxicity (Figures 6A and Table S6)." Authors might mean figure 6D?

      Figure 2B: Clusters 1, 3, 4 and 6 do not follow treatment trends homogeneously across all time points. For example, in cluster 1 there is a phosphopeptide following the pattern 1, 0, -1 and another one following the pattern 0, 1, -1, which is actually a very different pattern even if the end value is the same (-1). The first example could belong to cluster 6 as well, and the second example to cluster 5. Please elaborate on the rationale behind the categorization. Is there any other clustering method that can be used without making the categorization more complicated?
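
      To make the concern about the two example patterns concrete, the following is a minimal sketch that compares them with two common similarity measures. It is an illustration only: the clustering method and distance metric actually used for Figure 2B are not stated here, and the helper names `pearson` and `euclidean` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# The two example profiles quoted in the comment (three time points each).
pattern_a = [1, 0, -1]
pattern_b = [0, 1, -1]

r = pearson(pattern_a, pattern_b)
d = euclidean(pattern_a, pattern_b)
print(f"Pearson r = {r:.2f}, Euclidean distance = {d:.3f}")
```

      Under these assumptions the two profiles are only moderately correlated, which is consistent with the point that sharing an end value does not by itself make two trajectories similar.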

      A problem with the manuscript is its length and lack of coherence. Apart from presenting the data from the proteomics, phosphoproteomics and N-terminomics analyses, the authors focus on several different proteins to perform validation experiments and further characterize the biological significance of their modification. Because these proteins do not fall on the same pathway, the authors end up presenting several independent stories that confuse the reader.

      Moreover, it is necessary for the authors to restructure their introduction, and avoid over-representing previous research on nerinetide, which is not used anywhere in the manuscript. Instead, the introduction must be more focused to better capture the necessity and essence of the present study.

      Taking into account figures 1 and S2 I understand that the authors combined samples of neuronal cell cultures (treated or not with Glu) with samples from mouse brains (that have undergone ischemic stroke/TBI or sham operation). If this is the case, why did the authors do that? How did they combine the different samples? And why is this not mentioned anywhere in the main text?

      Regarding figure 5D, the authors write in the main text "Consistent with our phosphoproteomic results, the truncated fragment CRMP2 fragments could not cross-react with the anti-pT509 CRMP2 antibody (Figure 5D)" In the upper blot the truncated CRMP2 fragment runs well below the 70 kDa marker. However, in the middle panel, where we see the blot with the phospho-specific antibody, the respective area of the blot has been cropped, so we cannot see whether the truncated fragment cross-reacts with the phospho-specific antibody.

      It is strange that only 1 and 13 proteins showed significant changes in abundance at 30 and 240 min respectively. Especially after 240 min of glutamate treatment one would expect many proteins to change in their levels, since the neurons are almost eliminated by cell death at that point. How could the authors explain this phenomenon? Additionally, in their previous publication, they showed that many more proteins change significantly in abundance following glutamate treatment (at 30 min and 240 min).

      Significance

      The manuscript delivers a large amount of data, regarding changes in the proteome, the activation of specific kinases, phosphatases, as well as the molecular pathways that are activated at distinct time points of excitotoxicity. This information could be used in future studies to validate and develop potential therapeutic strategies that could protect against neuronal loss in various neurological disorders.

      The same group has very recently published a work very similar to this manuscript (Hoque et al. Cell Death and Disease, 2019). In their previous publication, the authors cover a large part of their current objectives. They again performed a proteomic and phosphoproteomic analysis of mouse primary cortical neurons treated with glutamate for distinct time points, with the aim of identifying changes in expression and phosphorylation state of neuronal proteins upon excitotoxicity. Apart from the N-terminome, which they investigate in their current manuscript, the proteomic and phosphoproteomic analyses are very similar. As such, and because the current manuscript is very extensive, the authors should consider shortening it to include only their novel findings (changes in the N-terminome, the involvement of specific kinases that contribute to excitotoxic neuronal death, the regulatory mechanism of CRMP2, etc).

      The authors should describe in a simpler way the proteomic and bioinformatics analyses they are using in the manuscript. It is difficult to understand the methodology used if you are not an expert in proteomics and bioinformatics. My suggestion is to revise their text and make it simpler and more concise.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes two advances. First is the technical development for a protein targeting system called PInT that brings a target protein close to (~320 bp) a DNA sequence of interest. The idea is that localisation of the target protein allows one to distinguish its effects on the DNA sequence either in cis (when targeted) or in trans (when not targeted but expressed at the same level). Since targeting is conveyed by simply adding the small molecule ABA to the experiment, it is easy to compare the two situations. This is a clever idea and it is substantiated by data showing that the components of PInT do not affect triplet repeat instability or gene expression of GFP, into whose gene the PInT system is placed. Moreover, targeting is shown to enable enzymatic activity in the targeted region. Using the DNA methylase DNMT1, there are local increases in DNA methylation. Similarly, targeting the histone deacetylase HDAC5 results in local decreases in histone H3 acetylation.

      We thank the reviewer for a thoughtful and helpful review.

      What is not clear from these experiments, however, is whether the targeted proteins can interact normally with partner proteins to form functional complexes. One necessary control is to add ChIP for at least one interacting protein each for DNMT1 and for HDAC5 and show that targeting permits normal protein-protein interactions. This experiment is straightforward as specific interacting proteins are known and good antibodies to precipitate those proteins are available.

      This is a good suggestion and we plan on doing this experiment in our 59B-Y-HDAC5 and 89B-Y-DNMT1 lines with and without ABA using interacting proteins. The exact interacting protein to be used will depend on antibody availability and quality, which we will test. We will start with UHRF1 and HDAC3 for PYL-DNMT1 and PYL-HDAC5, respectively.

      Overall, PInT would likely be useful for many groups studying the effects of chromatin modifiers on a DNA sequence of interest.

      The second advance is conceptual and is focused more specifically on triplet repeat expansions. The manuscript describes experiments that measure genetic instability of long CAG-CTG repeats with and without protein targeting. The results show that allele size distributions are not significantly affected by targeting either DNMT1 or HDAC5. One curious outcome that is not discussed is contraction frequency in the HDAC5 experiment. Zero contractions are reported compared to 10-20% contractions in the other two experiments. Authors need to provide an explanation.

      The lack of contractions in this experiment is likely due to the lower number of repeats in this line (59 vs 89/91). It is known that longer repeats display a higher frequency of contractions, and contractions are rarely seen in short repeats (Larson et al Neurobiology of Disease 2015, Gomes-Pereira et al PLOS Genet 2007, Morales et al HMG 2020). However, the threshold may be different in our HEK293-derived cells. Of note, we had a clone of 89B-Y-HDAC5 that did not express the expected amount of GFP for unknown reasons and we did not use it here. However, small pool PCRs using this line with 89 repeats showed that contractions were indeed present. Although we cannot rule out that the reason for the contractions is the unknown mutation(s), this observation suggests that the difference is due to the size of the expansion. We have added a comment in the methods section.

      It reads: “We have noted that cell lines with repeats that are mildly expanded (e.g., 59 CAGs) have fewer contractions than longer ones. This is consistent with several studies in the context of DM1 and HD [82], albeit the size threshold for seeing more contractions may be shorter in HEK293-derived cells than in mice.”

      The major issue with this set of experiments is that there is no positive control where instability is shown to be clearly manipulated. A knockdown of FAN1 would be the most likely avenue to pursue for identifying a positive control. This is straightforward to perform since successful FAN1 knockdowns have been described in the literature.

      We agree that a positive control to show that the model behaves as expected is necessary. We will add the experiments proposed by the reviewer in the revised version of the manuscript.

      The manuscript also looks at effects on gene expression measured by GFP fluorescence intensity. The potential significance is to see if disease-causing genes with expanded triplet repeats can be silenced by targeting chromatin-modifying enzymes. In the examples tested here, the answer seems to be no. Expression of DNMT1 or HDAC5 reduces fluorescence even in the absence of targeting. Upon targeting, there is a small further decrease, but the expanded triplet repeat resists this further decrease. Domain analysis of HDAC5 indicates that protein-protein interactions, not deacetylase activity, are important for silencing. The key interaction may be with HDAC3, since small molecule inhibition of HDAC3 relieved repeat length-dependent silencing by HDAC5. It was very curious that targeting HDAC3 actually increased expression, instead of silencing. The explanation for this observation was inadequate.

      We have added the following paragraph to the discussion to address this.

      It reads: “We found that targeting of PYL-HDAC3 increases gene expression slightly, independently of repeat size and in the presence of an inhibitor of its catalytic activity. Although this appears counterintuitive, several studies suggest that this is not unexpected. Specifically, HDAC3 has an essential role in gene expression during mouse development that is independent of its catalytic activity [73]. Moreover, HDAC3 binds more readily to genes that are highly expressed in both human and yeast cells [74,75]. The mechanism or function of HDACs binding to highly expressed genes are currently unknown.”

      The claim on page 16 final paragraph that the manuscript 'settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability' is not supported by the data. Most of the results are negative so it is premature to claim the question is 'settled'.

      We have rephrased all the conclusions about this in the text, emphasizing that we find no evidence of a role in cis, rather than stating that there is no role in cis.

      Overall, with appropriate modifications described here, these experiments would be of interest with regards to potential therapies of triplet repeat expansion diseases, where silencing the expanded gene is the goal.

      **Minor concerns**

      P 4, last line. 59 bp should read 59 repeats - This is now fixed.

      P 5, line 2. 38 bp of what? This is now amended. It reads: “The CAG/CTG repeats affect splicing of the reporter in a length-dependent manner, with longer repeats leading to more robust insertion of an alternative CAG exon that includes 38 nucleotides downstream of the CAG, creating a frameshift [30].”

      P 10, first paragraph. DNA methylation levels rise from ~10% to ~20% with DNMT1 targeting. Is there a good precedent in the literature that the magnitude of this increase can be expected to be biologically meaningful?

      To our knowledge, this is the first time that DNMT1 has been used for targeted epigenome editing. This is therefore the first evidence that targeting DNMT1 leads to silencing of a reporter construct. Nevertheless, this reviewer’s comment stands: is an increase in DNA methylation of 10 to 20% biologically relevant? The answer to this is yes: changes of 10-20% are known to have a functional impact on gene expression in various settings (for example, see the recent study in developing oocytes by Li et al Nature 2018). Furthermore, there is evidence that DNMT1 has weak de novo activity (Li et al Nature 2018, Wang et al Nat Genet 2020), consistent with a small increase in CpG methylation upon targeting. We now acknowledge in the discussion that one reason for the lack of effect upon targeting may be that the changes in CpG methylation are not dramatic enough. We also point out more clearly that changes of 10 to 20% are correlated with changes in repeat instability (Dion et al HMG 2008). We have amended the text to reflect this.

      The results section now reads: “To do so, we performed bisulfite sequencing after targeting PYL-DNMT1 for 30 days. This led to changes of 10 to 20% in the levels of CpG methylation, a modest increase (Fig. 3C), which is in line with the weak de novo methyltransferase activity of DNMT1 (for example see [39,40]). Similar changes in levels of CpG methylation in Dnmt1 heterozygous ovaries and testes were seen to correlate with changes in repeat instability in vivo [31].”

      The discussion now states: “It should be pointed out that there remains the possibility that DNMT1 targeting did not lead to large enough changes in CpG methylation to affect repeat instability.”

      P12 first paragraph. Text describing Fig 5 is confusing. First, GFP expression is referred to in terms of fold decrease, but subsequently in percent. Second, the ABA-induced silencing looks to reduce expression from about 0.6 to 0.5 of control. I presume this is where the claim of 16% comes from but it was not clear. Indeed, this is what we mean.

      We now state: “In 16B-Y-DNMT1 cells, ABA treatment decreased GFP expression by 2.2-fold compared to DMSO treatment alone. Surprisingly, ABA-induced silencing was 1.8 fold compared to DMSO alone, or 16% less efficient in 89B-Y-DNMT1 than in 16B-Y-DNMT1 cells.”

      P 15 paragraph 2. Where does the P value of 0.78 come from? Fig 7B shows no corresponding value. The P-value in figure 7B has now been corrected.

      Reviewer #1 (Significance (Required)):

      See above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      We still do not know whether epigenetics contributes to repeat instability and/or transcriptional activity in unstable CAG/CTG repeat associated pathologies. The aim of this manuscript is to examine whether induced binding of DNMT1 (CpG methylation) or HDAC5 (histone H3 acetylation) modulates CAG/CTG repeat instability and/or gene silencing upon expansion. For this the authors developed a highly sophisticated reporter system (PInT) that allows recruitment of a specific chromatin modifying enzyme (DNMT1/HDAC5) to a GFP reporter near a CAG/CTG expansion, in the course of transcription (Dox-inducible promoter). This is to determine whether the CTGs, when lengthened and transcribed, become unstable or impede gene activity via epigenetic modifications.

      We appreciate the reviewer highlighting the importance of the question that we address here and the usefulness of PInT.

      **Findings:**

      1. Binding of DNMT1 to the reporter results in a modest increase (~10%) in local DNA methylation, with no change in repeat instability.

      3. Targeting HDAC5 to the reporter results in local reduction in histone H3 acetylation, with no effect on repeat stability.

      4. DNMT1/HDAC5 binding reduces GFP intensity differentially, in normal but not expanded alleles.

      5. The N-terminal domain of HDAC5, when mutated, abolishes the reduction in GFP expression levels.

      6. RGFP966 abolishes the allele-specific effect of HDAC5, resulting in a general decrease in GFP expression regardless of repeat tract size.

      7. CTG expanded alleles abolish the reduction in GFP repression by HDAC5 via HDAC3 activity.

      **Conclusions:**

      Based on the results using the PInT reporter assay, the authors claim that:

      1. HDAC5 and DNMT1 do not affect repeat instability in cis.

      2. Expanded CAG/CTGs reduce the efficiency of gene silencing by targeting DNMT1/HDAC5 to the locus.

      3. Gene silencing that is mediated by HDAC5 recruitment can be abolished by inhibition of HDAC3 activity.

      Unfortunately, none of the claims in this manuscript are convincing.

      We note that in the comments below the reviewer does not include a reason why he/she does not find the claims convincing. We therefore cannot address this criticism.

      **General Comments:**

      The major drawback of the PInT experimental approach is that it ignores the importance of the flanking regions and the genomic organization of the endogenous locus. This is a major concern as it makes the conclusions irrelevant to the related loci. In the case of myotonic dystrophy type 1, for example, the reporter should reside within a CpG island, should be positioned immediately next to CTCF binding site(s), and should be transcribed bi-directionally.

      HDAC3 and DNMT1 were found to have effects on repeat instability both at reporters, which do not harbour flanking sequences from disease loci, and indeed at endogenous loci in vivo (Dion et al HMG 2008, Debacker et al PLoS Biol 2012, Suelves et al Sci Rep 2017, Williams et al PNAS 2020). This highlights the fact that cis elements from disease loci are not required for chromatin modifiers to affect repeat instability.

      The reviewer is suggesting a very interesting set of experiments where specific sequences may be added to our reporter and tested for their influence on gene expression and on repeat instability. PInT is ideally suited for this and we have now added a paragraph highlighting this in the discussion. We have also highlighted that the current study aims to isolate the repeats from their cis-elements, specifically to side-step potential locus-specific effects and to look for chromatin modifiers that would be useful for epigenome editing at as many loci as possible.

      Furthermore, only large expansions (at least several hundred copies) can trigger heterochromatin at the DM1 locus. None of these features are recapitulated by the PInT reporter assay, making it difficult to draw any conclusion regarding the role of these chromatin modifying enzymes at the locus.

      This is true for DM1 but untrue for other disease loci. For example, we have shown that there are changes in the flanking chromatin marks at the SCA1 locus of a mouse model with 145 repeats (Dion et al HMG 2008), DNA methylation is also affected near a SCA7 transgene with 92 CAG repeats (Libby et al PLoS Genet 2008) and transgenes containing CAG repeats (without the flanking sequences) lead to silencing regardless of where the transgene is integrated in the genome (Saveliev et al Nature 2003). Moreover, HDAC5 had effects on repeat expansion in a cell-based shuttle system containing as few as 22 CAG repeats (Gannon et al NAR 2012), again suggesting that chromatin modifiers affect repeat instability in a wide range of repeat sizes. We have reviewed this in Dion and Wilson TiG 2009.

      In fact, the authors state in their Discussion that "targeting a chromatin modifying peptide to different loci can have very different effects"!

      This is indeed the case and the reason why we sought to control for locus-specific effects using an exogenous reporter.

      To better substantiate their conclusions the authors must set up an improved model system that takes into account the flanking regions and the 3D genomic organization of the locus (TADs). The preferable approach would be to insert a reporter cassette by homologous recombination into the differentially methylated/acetylated regions near the repeats, and compare between normal vs. expanded alleles.

      We would like to point out that we have recently published a study where we looked at 3D chromatin folding at the DM1 and HD loci and at the GFP transgene used here. We did not find any evidence for changes in TADs that would underlie changes in repeat instability at these loci (Ruiz Buendia et al Sci Advances 2020). We therefore do not think that it would be important to further manipulate 3D genomic organization in this context.

      To be clear, we are not denying that cis elements are likely to have an effect; there is plenty of evidence supporting this. Rather, we are using a reporter assay to disentangle the potential locus-specific (or cis-element-specific) effects from the trans-acting factors. In short, we focus on the trans-acting factors rather than on the cis-elements that the reviewer suggests studying.

      We believe that the addition of the following paragraph highlights the goal of our study and also brings in the idea that cis-acting elements can be studied using PInT.

      It now reads:

      “We designed PInT specifically to isolate expanded repeat tracts from other potential locus-specific cis elements. This is helpful to identify factors that would affect instability and/or gene expression across several diseases. Moreover, both HDAC3 and DNMT1 were found to impact repeat instability at different loci, including at reporter genes [31,33,36,37,45]. These observations highlight that cis-acting elements from disease loci are not required by chromatin modifiers to affect repeat instability. A potential application of PInT includes cloning in specific cis elements, including CTCF binding sites and CpG islands, next to the repeat tract and evaluating their effects on instability with or without targeting. In fact, PInT can be used to clone any sequence of interest near the targeting site and can be applied to a wide array of applications, beyond the study of expanded CAG/CTG repeats.”

      My impression was that there is a lot of data but none of it makes sense.

      The focus of the manuscript is not entirely clear: it starts with monitoring the effect of epigenetics on repeat instability and gene activity, then it shifts to the mechanism by which HDAC5 functions, and ends with the allele-specific effect of HDAC5 on gene expression. I lost my train of thought.

      We have now improved the transitions in this new version of this manuscript. Specifically, at the core of this manuscript is the development of PInT, which is highly versatile and allowed us to study multiple aspects of expanded CAG/CTG repeat biology. We hope that it is now clearer.

      **Other concerns:**

      (1) The modest increase in methylation levels following DNMT1 recruitment (10%, reaching a total of 20% at the most) prevents drawing any conclusions regarding the effect of methylation on stability or expression.

      As mentioned in the response to Reviewer 1 above, although changes of 10% to 20% in CpG methylation are associated with changes in gene expression in a variety of settings, we now point out that one reason for the lack of effect in cis may be that the de novo activity of DNMT1 is too weak to produce an effect.

      (2) The effect of protein targeting on GFP levels should be better defined at the RNA/protein level. Does it act by blocking transcription, by altering splicing, or by altering steady-state levels?

      The exact mechanism remains unclear, and determining it goes beyond the scope of this study. All of these possibilities remain open, as we point out in the discussion.

      (3) Fig 5: the scale is different for A vs. B and C. Also, better to compare the effect of targeting on equal sized expansions (either 91, 89 or 58 repeats).

      We have fixed the scale on the figures.

      Unfortunately, it is not possible to have the same repeat sizes for all the cell lines because by their very nature, repeats are unstable. We have added a note relating to this in the methods.

      It reads: “Notably, it is not possible to obtain several stable lines with the exact same repeat size as they are, by their nature, highly unstable. This is why we have lines with different repeat sizes. Furthermore, the sizes can change over time and upon thawing.”

      (4) Add asterisks for significance in all figures.

      This has now been done.

      (5) Figure 6: show raw data rather than normalized.

      We have now added representative flow cytometry profiles for each construct as a new supplementary figure (S5).

      (6) Figure 7: there is a notable difference in GFP expression levels in untreated wild type control (16 CAG repeats) between A vs. B. Why?

      Fig. 7a shows PYL targeting only, whereas 7b shows the GFP expression upon PYL-HDAC5 targeting. The values for PYL-HDAC5 targeting are lower because targeting it, unlike targeting PYL alone, silences the reporter.

      (7) Avoid redundancy. No need to show schematic representations so many times.

      We believe that the schematics make it clearer for the reader.

      Reviewer #2 (Significance (Required)):

      REFEREES CROSS-COMMENTING

      I totally agree with Reviewer #1 that the PInT targeting system is a potent experimental tool to study the function of specific chromatin binding proteins. However, the significance of the flanking regions is discounted.

      We hope it is now clear that we are not discounting the potential significance of flanking regions and that rather we have designed the system to avoid their potentially complicating effects.

      The fact that the recruitment of HDAC5 has resulted in a significant reduction in acetylated histones provides evidence for that "the targeted proteins can interact normally with partner proteins to form functional complexes". Still, I agree with that the activity of DNMT1 needs to be better established, considering the minor increase in DNA methylation levels.

      We will be using ChIP against interacting proteins of DNMT1 and HDAC5 to address this issue.

      The request for a positive control for repeat instability is totally correct.

      We will be adding this in the revised manuscript.

      It is difficult to discuss the missing effect of HDAC5 on contractions or the unexpected effect of HDAC3 on gene silencing bearing in mind the limits of the experimental system.

      There is no expectation for the effect of HDAC5 on contractions as this has not been studied in any system yet. However, we believe that there are no contractions not because of HDAC5 per se but rather because of the shorter repeat size in this line (see comment to Reviewer 1 above). We have now addressed the “unexpected effect” of HDAC3 by citing a number of studies finding a similar, evolutionarily conserved effect (see comment to Reviewer 1 above).

      I also agree with the statement that "this manuscript settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability", is not supported by the data.

      We have now rephrased our conclusions. In this particular case, we changed ‘settled’ to ‘addressed’. We have also rephrased this in the results headings.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      We still do not know whether epigenetics contributes to repeat instability and/or transcriptional activity in unstable CAG/CTG repeat associated pathologies. The aim of this manuscript is to examine whether induced binding of DNMT1 (CpG methylation) or HDAC5 (histone H3 acetylation) modulates CAG/CTG repeat instability and/or gene silencing upon expansion. For this the authors developed a highly sophisticated reporter system (PInT) that allows recruitment of a specific chromatin modifying enzyme (DNMT1/HDAC5) to a GFP reporter near a CAG/CTG expansion, in the course of transcription (Dox-inducible promoter). This is to determine whether the CTGs, when lengthened and transcribed, become unstable or impede gene activity via epigenetic modifications.

      Findings:

      1. Binding of DNMT1 to the reporter results in a modest increase (~10%) in local DNA methylation, with no change in repeat instability.

      3. Targeting HDAC5 to the reporter results in local reduction in histone H3 acetylation, with no effect on repeat stability.

      4. DNMT1/HDAC5 binding reduces GFP intensity differentially, in normal but not expanded alleles.

      5. The N-terminal domain of HDAC5, when mutated, abolishes the reduction in GFP expression levels.

      6. RGFP966 abolishes the allele-specific effect of HDAC5, resulting in a general decrease in GFP expression regardless of repeat tract size.

      7. CTG expanded alleles abolish the reduction in GFP repression by HDAC5 via HDAC3 activity.

      Conclusions:

      Based on the results using the PInT reporter assay, the authors claim that:

      1. HDAC5 and DNMT1 do not affect repeat instability in cis.

      2. Expanded CAG/CTGs reduce the efficiency of gene silencing by targeting DNMT1/HDAC5 to the locus.

      3. Gene silencing that is mediated by HDAC5 recruitment can be abolished by inhibition of HDAC3 activity.

      Unfortunately, none of the claims in this manuscript are convincing.

      General Comments:

      The major drawback of the PInT experimental approach is that it ignores the importance of the flanking regions and the genomic organization of the endogenous locus. This is a major concern as it makes the conclusions irrelevant to the related loci. In the case of myotonic dystrophy type 1, for example, the reporter should reside within a CpG island, should be positioned immediately next to CTCF binding site(s), and should be transcribed bi-directionally. Furthermore, only large expansions (at least several hundred copies) can trigger heterochromatin at the DM1 locus. None of these features are recapitulated by the PInT reporter assay, making it difficult to draw any conclusion regarding the role of these chromatin modifying enzymes at the locus. In fact the authors state in their Discussion that "targeting a chromatin modifying peptide to different loci can have very different effects"! To better substantiate their conclusions the authors must set up an improved model system that takes into account the flanking regions and the 3D genomic organization of the locus (TADs). The preferable approach would be to insert a reporter cassette by homologous recombination into the differentially methylated/acetylated regions near the repeats, and compare between normal vs. expanded alleles.

      My impression was that there is a lot of data but none of it makes sense.

      The focus of the manuscript is not entirely clear: it starts with monitoring the effect of epigenetics on repeat instability and gene activity, then it shifts to the mechanism by which HDAC5 functions, and ends with the allele-specific effect of HDAC5 on gene expression. I lost my train of thought.

      Other concerns:

      (1) The modest increase in methylation levels following DNMT1 recruitment (10%, reaching a total of 20% at the most) prevents drawing any conclusions regarding the effect of methylation on stability or expression.

      (2) The effect of protein targeting on GFP levels should be better defined at the RNA/protein level. Does it act by blocking transcription, by altering splicing, or by altering steady-state levels?

      (3) Fig 5: the scale is different for A vs. B and C. Also, better to compare the effect of targeting on equal sized expansions (either 91, 89 or 58 repeats).

      (4) Add asterisks for significance in all figures.

      (5) Figure 6: show raw data rather than normalized.

      (6) Figure 7: there is a notable difference in GFP expression levels in untreated wild type control (16 CAG repeats) between A vs. B. Why?

      (7) Avoid redundancy. No need to show schematic representations so many times.

      Significance

      REFEREES CROSS-COMMENTING

      I totally agree with Reviewer #1 that the PInT targeting system is a potent experimental tool to study the function of specific chromatin binding proteins. However, the significance of the flanking regions is discounted. The fact that the recruitment of HDAC5 has resulted in a significant reduction in acetylated histones provides evidence that "the targeted proteins can interact normally with partner proteins to form functional complexes". Still, I agree that the activity of DNMT1 needs to be better established, considering the minor increase in DNA methylation levels. The request for a positive control for repeat instability is totally correct. It is difficult to discuss the missing effect of HDAC5 on contractions or the unexpected effect of HDAC3 on gene silencing bearing in mind the limits of the experimental system. I also agree that the statement that "this manuscript settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability" is not supported by the data.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript describes two advances. First is the technical development for a protein targeting system called PInT that brings a target protein close to (~320 bp) a DNA sequence of interest. The idea is that localisation of the target protein allows one to distinguish its effects on the DNA sequence either in cis (when targeted) or in trans (when not targeted but expressed at the same level). Since targeting is conveyed by simply adding the small molecule ABA to the experiment, it is easy to compare the two situations. This is a clever idea and it is substantiated by data showing that the components of PInT do not affect triplet repeat instability or gene expression of GFP, into whose gene the PInT system is placed. Moreover, targeting is shown to enable enzymatic activity in the targeted region. Using the DNA methylase DNMT1, there are local increases in DNA methylation. Similarly, targeting the histone deacetylase HDAC5 results in local decreases in histone H3 acetylation. What is not clear from these experiments, however, is whether the targeted proteins can interact normally with partner proteins to form functional complexes. One necessary control is to add ChIP for at least one interacting protein each for DNMT1 and for HDAC5 and show that targeting permits normal protein-protein interactions. This experiment is straightforward as specific interacting proteins are known and good antibodies to precipitate those proteins are available. Overall, PInT would likely be useful for many groups studying the effects of chromatin modifiers on a DNA sequence of interest.

      The second advance is conceptual and is focused more specifically on triplet repeat expansions. The manuscript describes experiments that measure genetic instability of long CAG-CTG repeats with and without protein targeting. The results show that allele size distributions are not significantly affected by targeting either DNMT1 or HDAC5. One curious outcome that is not discussed is contraction frequency in the HDAC5 experiment. Zero contractions are reported compared to 10-20% contractions in the other two experiments. Authors need to provide an explanation. The major issue with this set of experiments is that there is no positive control where instability is shown to be clearly manipulated. A knockdown of FAN1 would be the most likely avenue to pursue for identifying a positive control. This is straightforward to perform since successful FAN1 knockdowns have been described in the literature. The manuscript also looks at effects on gene expression measured by GFP fluorescence intensity. The potential significance is to see if disease-causing genes with expanded triplet repeats can be silenced by targeting chromatin-modifying enzymes. In the examples tested here, the answer seems to be no. Expression of DNMT1 or HDAC5 reduce fluorescence even in the absence of targeting. Upon targeting, there is a small further decrease, but the expanded triplet repeat resists this further decrease. Domain analysis of HDAC5 indicates that protein-protein interactions, not deacetylase activity, are important for silencing. The key interaction may be with HDAC3, since small molecule inhibition of HDAC3 relieved repeat length-dependent silencing by HDAC5. It was very curious that targeting HDAC3 actually increased expression, instead of silencing. The explanation for this observation was inadequate. 
      The claim on page 16 final paragraph that the manuscript 'settled a central question for both HDAC5 and DNMT1 and their involvement in CAG/CTG repeat instability' is not supported by the data. Most of the results are negative so it is premature to claim the question is 'settled'. Overall, with appropriate modifications described here, these experiments would be of interest with regards to potential therapies of triplet repeat expansion diseases, where silencing the expanded gene is the goal.

      Minor concerns

      P 4, last line. 59 bp should read 59 repeats

      P 5, line 2. 38 bp of what?

      P 10, first paragraph. DNA methylation levels rise from ~10% to ~20% with DNMT1 targeting. Is there a good precedent in the literature that the magnitude of this increase can be expected to be biologically meaningful?

      P12 first paragraph. Text describing Fig 5 is confusing. First, GFP expression is referred to in terms of fold decrease, but subsequently in percent. Second, the ABA-induced silencing looks to reduce expression from about 0.6 to 0.5 of control. I presume this is where the claim of 16% comes from but it was not clear.

      P 15 paragraph 2. Where does the P value of 0.78 come from? Fig 7B shows no corresponding value.

      Significance

      See above.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Summary

      The authors present well written work on the evolution of proteome size and complexity, and the corresponding changes in chaperone proteins. Interestingly, they find chaperone copy numbers increase linearly with proteome size, despite the increasing 'complexity' of, in particular, post-LECA genomes. They suggest that to address the rise in complexity, organisms express chaperones at higher levels and an expanding network of co-chaperones has evolved across the tree of life.

      Major comments

      Comment-1. Summary reads strangely relative to the rest of the manuscript, and lists facts in a way that makes the purpose of the study confusing. I think most readers will dislike the characterisation of evolution as a progress from simple to complex, and the authors might want to avoid this language throughout the manuscript: bacteria and archaea have also been evolving over this period of time, and have not become more 'complex'? Similarly the authors should reconsider their figure legend titles. As a specific example, 'in the course of evolution' should become 'across the tree of life'.

      Response

      Thank you for these crucial suggestions. We agree with the reviewer, and with Reviewer 2 (see below), that bacteria and archaea have also been evolving since their emergence, so basically, we (humans) and the simplest archaea have the same evolutionary origin. However, we all agree that the simplest archaea/bacteria are far more similar to LUCA than we are. That said, we accept the criticism that putting our analysis in the context of evolutionary time is an over-interpretation given that we have not examined the protein/proteome phylogeny (in relation to proteome complexity; for chaperones we have). We have thus reformulated the figures and text as a comparison across the Tree of Life, rather than a time-dependent evolutionary process. Specifically, as a first step, we revised the Figures to rename the X-axis as “Order of divergence”, rather than “Divergence time (million years)” in the previous version. In the revised main text we emphasized the fact that the branch lengths of the Tree of Life represent the relative order of divergence of the different clades, rather than time. All instances of ‘in the course of evolution’ have been replaced by ‘across the Tree of Life’.

      Secondly, we revised the main text to emphasize the prokaryote vs. eukaryote comparison, rather than comparing organisms that diverged at different time-points. Within bacterial and archaeal domains, proteomes do not seem to expand against the order of divergence (as the reviewer argued, bacteria and archaea have not become more complex, also see Comment-5).

      Thirdly, the word ‘complexity’ has been omitted from the manuscript. The section “The expansion of proteome complexity” now reads as “Proteome expansion by de novo innovations”. In the previous version, increasing complexity in fact implied a torrent of de novo innovations that impose a larger burden on the chaperone machinery. Instead of ‘complexity’, the latter is clearly stated in the revised manuscript.

      In the spirit of these changes, the title of the revised manuscript, figure legend titles, and related section titles have been edited as follows.

      | Element | Submitted version | Revised version |
      | --- | --- | --- |
      | Paper title | On the evolution of chaperones and co-chaperones and the exponential expansion of proteome complexity | On the evolution of chaperones and co-chaperones and the expansion of proteomes across the Tree of Life |
      | Section title | A Tree of Life analysis of the expansion of proteome complexity and chaperones | A Tree of Life analysis of the expansion of proteomes and chaperones |
      | Section title | The expansion of proteome size | The expansion of proteome size across the Tree of Life |
      | Section title | The expansion of proteome complexity | Proteome expansion by de novo innovations |
      | Figure 1 legend title | Expansion of proteome size | Expansion of proteome size across the Tree of Life |
      | Figure 2 legend title | Expansion of proteome complexity | Expansion of proteomes by de novo innovations |

      Further, changes have been made in the Summary and in the main text to exclude any impression that proteomes/organisms have become more complex with time. Rather we emphasized prokaryote versus eukaryote comparison.

      Comment-2. I think the manuscript would be improved if the authors significantly shortened the discussion of genome size evolution- this is fairly well understood, and could be covered briefly, especially as the main focus of the manuscript is on the evolution of chaperone and co-chaperone repertoire. They could also make clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones- perhaps this should be in the discussion? The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.

      Response

      We thank the reviewer for this suggestion. The main focus of our paper is indeed the evolution of chaperones and co-chaperones, but within the context of the expansion of proteomes. With this focus in place, the discussion on proteome size evolution (section: The expansion of proteome size across the Tree of Life) has been revised and shortened to place more emphasis on the prokaryote versus eukaryote comparison.

      The suggestion to provide “clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones” is indeed very useful and we authors sincerely thank the reviewer. To address this suggestion we revised Figure 4 to quantitatively compare the expansion of proteomes and that of chaperones, under one roof. This Figure compares proteome parameters that supposedly demands more chaperone action in all three domains of life and simultaneously summarizes the expansion of the chaperone machinery lacking de novo innovations.

      The first paragraph of the Discussion section has been revised accordingly that walks the reader through the revised Figure 4 and finally introduces to the dichotomy it implies.

      We did not understand the last comment “The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.” We’d be glad to address it upon further clarification.

      Comment-3. The authors state 'protein trees were generated and compared with ToL to account for gene loss and transfer events'. The methodology for this procedure is not given in the manuscript. The authors should back up this point, and make it clear this is why they reconstruct the trees. Currently it is not convincing to me that the authors have found HGT given the considerable phylogenetic uncertainty in the basal events in the tree of life. I also expect the tree of a single protein to potentially lack information due to the short sequence considered and possible lack of power. The authors need to consider whether the data is really of high enough quality to assess this.

      Response

      Thank you for this suggestion. For the various chaperone families, we manually compared the protein trees with the Tree of Life. This is clearly stated in the revised Methods section (see Page 25, Lines 31-32). We agree, however, that identifying HGT events, and more generally interpreting trees of single, highly diverged domains, is tricky. We did our best to address these caveats. Specifically:

      We re-evaluated our work in the light of a recent study (PMID: 32316034). This paper discussed the phylogenetic uncertainties associated with molecular dating and re-evaluated the assignment of several protein families to LUCA. A careful analysis revealed that the reviewer is indeed right: many of the HGT events shown in Figure 3B of the previous version were indistinguishable from phylogenetic uncertainty.

      Accordingly, we revised the section “The core-chaperones emerged in early-diverging prokaryotes”. We removed Figure 3B of the previous version, along with all instances of HGT events mentioned in the main text except one (the archaea-to-Firmicutes HGT of HSP60, which is well supported by the data and was also detected previously). The dating of the emergence of chaperone families was also re-evaluated. Though the major conclusions were not altered, we discussed the phylogenetic uncertainties associated with our work and the overall confidence of each dating analysis. We believe these discussions will be very useful to readers.

      Finally, we note that most of our key assignments (points of emergence and major HGT events) are in agreement with previous works. Specifically: the assignment of HSP20 and HSP60 to LUCA (Sousa et al., 2016; Weiss et al., 2016); the horizontal transfer of HSP60 from archaea to Firmicutes (Techtmann and Robb, 2010); and the horizontal transfer of HSP20 between bacterial clades and between bacteria and archaea (Kriehuber et al., 2010).

      Comment-4. Methods- the authors could consider taking an alternative source of LUCA proteins, rather than those found in 'Nanoarchaeota and Aquificae': it's possible these are not representative of LUCA, and it seems a somewhat arbitrary choice- the authors could consider using one of the available curated sets, such as that generated by Ranea et al. (2006).

      Response

      The reviewer is right that a more robust LUCA set could be used. However, given that the revised manuscript focuses on comparison across the ToL, and foremost on the prokaryote versus eukaryote comparison, we do not think that refining this set is important. This set was used for one purpose only: determining changes in domain length. Moreover, the 38 X-groups used for this analysis are in fact the ones present in all organisms across the ToL. Hence, we kept the original analysis, while mentioning that these 38 X-groups are conserved across the ToL, and removed the argument for LUCA assignment. See Page 5, Line 22.

      Comment-5. The patterns observed might only hold because of differences in the taxa that diverged pre and post LECA? The authors might consider subgroup analyses to ensure this is not the case. The authors could also consider using methods that take phylogeny into account.

      Response

      The reviewer is right that within prokaryotic domains proteomes do not seem to expand. For example, excluding a few early-diverging prokaryotes and parasites, proteome size in bacteria and archaea varies within 2000-3000 proteins per proteome. Only when pre-LECA and post-LECA organisms are compared are significant differences observed. We thank the reviewer for this suggestion. We revised the main text to focus on the prokaryote versus eukaryote comparison. This re-focusing does not change any of our major conclusions, but rather puts our analysis in the right context (see Comment 1).

      Minor comments

      Comment-6. 'Life's habitability has also expanded from its specific niche of emergence-likely deep-sea hydrothermal vents, to highly variable and extreme ranges of temperature, pressure, exposure to high UV-light, dehydration and free oxygen.' This is not really correct, as bacteria and archaea are found worldwide, and in the most extreme environments.

      Response

      Thank you for this suggestion. We removed the above-mentioned sentence.

      Comment-7. 'We reconciled the topology of our tree'- on first read this was not clear, I did not realise the authors were only building trees for subsets of the data- time tree is the best source for the overall topology. The phrase 'manually curated and adjusted' is used in the methods. This language is much too vague, and not a clear explanation of the steps taken.

      Response

      We apologize for this confusion. The overall topology of our Tree of Life is indeed taken from TimeTree. We edited the text on Page 4, Line 4 to clarify this issue.

      The obtained tree topology was manually curated and adjusted to depict eukaryotes stemming from Asgard archaea and Alphaproteobacteria, by an endosymbiosis event. This is clearly mentioned in the Methods section (see Page 22, Lines 24-28).

      Reviewer #2

      Summary

      Rebeaud and colleagues analyze evolution of chaperones compared to the evolution of whole proteome complexity across the entire tree of life. Their principal conclusions are well captured in the following quote from the Discussion:

      "Comparison of the expansion of proteome complexity versus that of core-chaperones presents a dichotomy-a linear expansion of core-chaperones supported an exponential expansion of proteome complexity. We propose that this dichotomy was reconciled by two features that comprise the hallmark of chaperones: the generalist nature of core-chaperones, and their ability to act in a cooperative mode alongside co-chaperones as an integrated network. Indeed, in contrast to core chaperones, there exist a consistent trend of evolutionary expansion of co-chaperones."

      Major comments

      Comment-1. The general theme of the evolution of proteome management is of obvious interest. Unfortunately, the entire analysis is shaky and fails to convincingly ascertain the authors' conclusions. There are many issues. Throughout the manuscript, the authors discuss 'expansion' of the proteome in bacteria, archaea and eukaryotes, creating the impression of a consistent evolutionary trend. No such trend actually exists if one considers the means or medians of proteome sizes within each of the three domains of life (there is a transition to greater complexity in eukaryotes). The maximum complexity, certainly, increases with time which can be attributed to the 'drunkard's walk' effect. This hardly qualifies as 'expansion'.

      Response

      The reviewer is right that within prokaryotes proteomes do not seem to expand significantly. Reviewer-1 raised a similar concern: prokaryotes and eukaryotes have been evolving for the same period of time and have not expanded significantly. We understand the misconception created by the earlier version and we thank the reviewers for pointing it out. Accordingly, we revised the main text to clarify these issues, as described below.

      Firstly, the main text was revised to emphasize the prokaryote versus eukaryote comparison. The reviewer agrees that compared to prokaryotes, “there is a transition to greater complexity in eukaryotes”. This re-focusing does not change any of our major conclusions, but rather provides a systematic comparison that is adequately supported by data.

      Secondly, we revised the Figures to rename the X-axis as “Order of divergence”, rather than “Divergence time (million years)” as in the previous version. We emphasized the fact that the X-axis actually represents the relative order of divergence of the different clades, rather than absolute dates. This emphasis certainly does not create the impression of a consistent evolutionary trend. Instead, combined with the revised main text, it shows that clear trends of proteome expansion are observed only when pre-LECA and post-LECA organisms are compared.

      Comment-2. The authors further claim a 'linear' expansion of the chaperone set and 'exponential' expansion of the total proteome size. These are precise mathematical terms and, as such, require fitting to the respective functions. No such thing in this manuscript. Even apart from that shortcoming, the explanation of both 'linear' and 'exponential' are quite confusing. Thus, when explaining the 'linearity' of chaperone evolution, the authors refer to the lack of major innovation among the chaperones. This is correct in itself but has nothing to do with linearity. Apart from the aforementioned conceptual problems, the estimation of the 'exponential' growth of the proteome are naive, inconsistent and inaccurate.

      Response

      Our use of ‘linear expansion’ versus ‘exponential expansion’ may have been confusing, although we defined quite clearly what we meant by these terms (i.e., that they were not used in the mathematical sense). The statement regarding “the lack of major innovation among the chaperones” was made in the context of this definition and was consistent with it.
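      For readers who want to see what the mathematical sense raised by the reviewer would involve, the following is a minimal sketch, not an analysis performed in the manuscript. It fits a straight line and an exponential curve to the same series and compares their residual errors. The function names and the two data series are hypothetical placeholders, and the exponential is fitted by a simple log-linear regression, which is an assumption of this sketch.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residual_sum_of_squares(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))

def compare_models(xs, ys):
    """Residual sum of squares of a linear and an exponential fit to (xs, ys)."""
    a, b = fit_line(xs, ys)
    linear_pred = [a + b * x for x in xs]
    # Exponential y = c * exp(k * x), fitted as a line on log(y).
    # This requires y > 0 and minimizes error in log space (a simplification).
    log_c, k = fit_line(xs, [math.log(y) for y in ys])
    exp_pred = [math.exp(log_c + k * x) for x in xs]
    return {
        "linear": residual_sum_of_squares(ys, linear_pred),
        "exponential": residual_sum_of_squares(ys, exp_pred),
    }

# Hypothetical series, for illustration only (not data from the manuscript).
order = [1, 2, 3, 4, 5, 6]
doubling_series = [2.0 ** x for x in order]    # grows by a constant factor
steady_series = [3.0 * x + 1.0 for x in order]  # grows by a constant amount

print(compare_models(order, doubling_series))
print(compare_models(order, steady_series))
```

      The model with the smaller residual error describes the series better. With only a handful of clades, such a comparison would have little statistical power.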

      Nonetheless, to avoid confusion, we revised the main text by excluding the ‘linear expansion’ and ‘exponential expansion’ terms. We simply stated that a torrent of de novo innovations has occurred during the expansion of proteomes from prokaryotes to eukaryotes. In contrast, the evolutionary history of core-chaperones lacks such major innovations. Accordingly, the title of the revised manuscript, figure legend titles, and related section titles have been edited as follows.

      Paper title
        Submitted version: On the evolution of chaperones and co-chaperones and the exponential expansion of proteome complexity
        Revised version: On the evolution of chaperones and co-chaperones and the expansion of proteomes across the Tree of Life

      Section title
        Submitted version: A Tree of Life analysis of the expansion of proteome complexity and chaperones
        Revised version: A Tree of Life analysis of the expansion of proteomes and chaperones

      Section title
        Submitted version: The expansion of proteome complexity
        Revised version: Proteome expansion by de novo innovations

      Figure 1 legend title
        Submitted version: Expansion of proteome size
        Revised version: Expansion of proteome size across the Tree of Life

      Figure 2 legend title
        Submitted version: Expansion of proteome complexity
        Revised version: Expansion of proteomes by de novo innovations

      Comment-3. As the base point for the expansion estimates for archaea and eukaryotes, the authors take parasitic forms. Even leaving aside the highly dubious claims that these organisms belong to the clades that diverged first from the respective ancestors, parasites are not an appropriate choice for such estimates because they certainly are products of reductive evolution. For bacteria, inconsistently, the authors choose a free-living form from a dubious ancient clade, and not even the one with the smallest genome. All taken together, this robs the expansion estimates of any substantial meaning.

      Response

      This point is valid overall. We nonetheless adamantly reject the insinuation of “dubious claims that these organisms belong to the clades that diverged first from the respective ancestors”. First, we made no claims to this end, but took the ToL constructed by others (Hedges et al., 2015). Second, the assertion that these claims are dubious needs to be backed up by counter-evidence or data and, with all due respect, neither was provided by the reviewer. What is of concern, however, is that in a symbiont or parasite the chaperones of the host may play a key role, so a comparison to free-living organisms could be misleading. To address this concern, we excluded the obligatory endosymbiont Nanoarchaeum equitans and the parasitic organisms from the expansion estimates, and such discussions are now limited to free-living organisms. Further, as described in the response to Comment-1, the revised manuscript focuses on the prokaryote versus eukaryote comparison.

      Note that phylogenetic analysis often assigns parasitic and symbiotic organisms that have experienced reductive evolution as the earliest diverging clades of their corresponding kingdoms of life. Examples include Nanoarchaeum equitans, an obligate symbiont, assigned as the earliest diverging archaeon (Hedges et al., 2015; Huber et al., 2002; Waters et al., 2003), and the parasitic Excavates, assigned as one of the earliest diverging eukaryotic groups (Burki et al., 2020; Simpson et al., 2002). In accordance with these studies, these parasitic and symbiotic organisms were included in our analysis. We acknowledged this fact in the Methods section (see Page 22, Lines 9-16).

      Comment-4. The authors do make a salient and I think essentially correct observation: chaperones typically comprise about 0.3% of the proteins in any organism. As such, this presents no dichotomy in evolutionary trends to be explained. Surely, as examined and discussed in the paper, eukaryotes also show significant increases in the size and domain content of the encoded proteins, suggesting the possibility that might need more chaperones. However, if this is the explanandum, rather than the number of proteins in the proteome as such, it should be clearly stated. Furthermore, it is quite natural to assume that this increase in protein complexity without a commensurate increase in the chaperone diversity, is enabled by higher expression of the chaperones as suggested in the Discussion of this paper. I doubt there is any big surprise here and even much need for an extended discussion let alone a special publication.

      Response

      As emphasized and shown, eukaryotes not only have larger proteomes in terms of the number of proteins or protein size; they also have a higher content of proteins that are prone to misfolding. This is shown explicitly in Figure 2 (namely, multidomain proteins, repeat proteins, beta-rich proteins, etc.) and is reiterated in a summary figure (suggested by Reviewer 1). Further, in response to Reviewer-3’s suggestion, we showed that eukaryotes feature much higher proportions of aggregation-prone proteins per proteome than prokaryotes (Figure 2E).

      To further clarify, we revised Figure 4 to quantitatively compare the expansion of proteomes with that of chaperones in a single figure. This figure compares the proteome parameters that supposedly demand more chaperone action across all three domains of life and simultaneously summarizes the expansion of the chaperone machinery, which lacks de novo innovations.

      In addition, the first paragraph of the Discussion section was revised to state that from prokaryotes to eukaryotes, proteomes have expanded by duplication-divergence as well as by innovations (de novo emergence of new folds). Thus, it is not about size only (a challenge that a proportional expansion of chaperone genes, i.e., the 0.3%, would resolve) but about proteome composition changing in a way that demands more and more chaperone action.

      We also agree with the assertion that “it is quite natural to assume that this increase in protein complexity without a commensurate increase in the chaperone diversity, is enabled by higher expression of the chaperones”. However, we belong to a group of scientists for whom natural assumptions are insufficient, and think that supporting evidence is of importance.

      Reviewer’s significance statement

      As such, in the opinion of this reviewer, there is no substantial advance over the existing knowledge in this paper. Should the authors wish to revise, they would need to develop robust methodology to measure proteome expansion. That would involve starting from reconstructed ancestors rather than any extant forms (let alone parasites). I doubt that such analysis, non-trivial in itself, reveals any strong, consistent trends other than the well known increase in complexity in eukaryotes.

      Response

      We agree that to assert evolutionary, time-dependent trends one needs to analyze phylogenies and reconstructed ancestors, but still think that a comparison of proteome and chaperone contents along the Tree of Life is meaningful. We thus respectfully, yet adamantly disagree with “no substantial advance over the existing knowledge”. We strongly believe, as does Reviewer-3, that the results and the model presented in this paper are “fascinating to consider and… will stimulate a good deal of important discussion…”.

      Reviewer #3

      Summary

      The manuscript by Rebeaud et al describes phylogenetic analyses of proteome and chaperone complexity. The authors analyzed species across the tree of life to predict the proteome and chaperone properties of ancestors spanning to the last universal common ancestor. Their analyses indicate that many proteome properties increased in complexity over evolutionary time including: average protein length, the number of multi-domain proteins, the size of the proteome, the number of repeat proteins, and the number of beta-superfold proteins that are known to be difficult to fold. Their analyses also indicate an expansion in chaperone families that corresponds to the increase in proteome complexity. Based on their analyses, the authors propose a model where early life relied on a limited number of chaperones (Hsp20 and Hsp60) and that as proteome complexity evolved, so did chaperone complexity. Core chaperones including Hsp90, Hsp70, and Hsp100 evolved relatively early, and later chaperone evolution was driven by the appearance and alterations of co-chaperones and auxiliary factors as well as by increases in the protein abundance of chaperones.

      Major concerns

      Comment-1. This work is appropriately based on phylogenetic inferences, but as such, the limitations and uncertainties of phylogenetic inferences need to be discussed. This in no way takes away from the work, quite the opposite, it would make it richer by encouraging broader interpretations where justified and clear understanding of where support for the model is strongest. Posterior probabilities need to be discussed and the range of properties that a likely ancestor might have based on the data should be discussed. How this impacts the conclusions and models should be discussed. Throughout the manuscript, the authors present most-likely ancestral models (as I understood it), what are the next most likely models? How much power is there to distinguish one model from another? It would be very helpful to have a section describing the limitations and uncertainties of the phylogenetic analyses and how these relate to the main findings and conclusions.

      Response

      We thank the reviewer for this suggestion. Reviewer-1 raised a similar suggestion (see Comment-3). The phylogenetic analysis in our paper included dating the emergence of core- and co-chaperone families, and an attempt to infer their major HGT events, foremost in relation to the origin of eukaryotic chaperones. To highlight the uncertainties of phylogenetic inferences, we re-evaluated our work in the light of a recent study (PMID: 32316034) that carefully analyzed the uncertainties associated with the assignment of several protein families to LUCA.

      Ideally, for a protein family to be assigned to LUCA, there must be a single split of bacterial and archaeal domains at the root of the protein tree with strong bootstrap support, and the inter-domain branches should be longer than the intra-domain branches (PMID: 32316034). In the revised main text we discussed that only the HSP60 protein tree satisfies these criteria. The HSP20 protein tree depicts a clear single split of bacterial and archaeal domains at the root, albeit with weak bootstrap support, and its inter-domain branch lengths are smaller than its intra-domain branch lengths. We discussed that this is indeed a case of phylogenetic uncertainty, which means that the sequence of this small, single-domain chaperone lacks the information needed to make reliable inferences about the basal events in the ToL.
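      The criterion above can be written as a simple check. This is an illustrative sketch only: the class, the function, the 70% bootstrap cutoff and the example values are all hypothetical assumptions, not the procedure or the numbers used in the manuscript or in the cited study.

```python
from dataclasses import dataclass

@dataclass
class ProteinTreeSummary:
    """Hypothetical summary of one protein tree (all fields are assumptions)."""
    single_domain_split_at_root: bool  # one bacteria/archaea split at the root
    root_bootstrap: float              # bootstrap support of that split, in percent
    mean_inter_domain_branch: float    # mean branch length between domains
    mean_intra_domain_branch: float    # mean branch length within domains

def supports_luca_assignment(tree: ProteinTreeSummary,
                             min_bootstrap: float = 70.0) -> bool:
    """Return True only if all three conditions of the criterion hold.

    The default bootstrap cutoff is an assumed value for illustration.
    """
    return (
        tree.single_domain_split_at_root
        and tree.root_bootstrap >= min_bootstrap
        and tree.mean_inter_domain_branch > tree.mean_intra_domain_branch
    )

# Made-up values that mimic the two qualitative situations described above.
well_supported_tree = ProteinTreeSummary(True, 95.0, 1.2, 0.4)
weakly_supported_tree = ProteinTreeSummary(True, 40.0, 0.3, 0.6)

print(supports_luca_assignment(well_supported_tree))
print(supports_luca_assignment(weakly_supported_tree))
```

      A tree that fails the check is not evidence against a LUCA origin; it only indicates that the tree does not carry enough signal to support the assignment.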

      In addition, the HGT events discussed in the previous version appear to be indistinguishable from phylogenetic uncertainties, and we removed all instances of HGT events mentioned in the main text as well as Figure 3B. Only one HGT event, the horizontal transfer of HSP60 from archaea to Firmicutes, which is well supported by the data, is kept in the revised main text. We believe these discussions will be very useful to readers.

      Finally, we note that most of our key assignments (points of emergence and major HGT events) are in agreement with previous works. Specifically: the assignment of HSP20 and HSP60 to LUCA (Sousa et al., 2016; Weiss et al., 2016); the horizontal transfer of HSP60 from archaea to Firmicutes (Techtmann and Robb, 2010); and the horizontal transfer of HSP20 between bacterial clades and between bacteria and archaea (Kriehuber et al., 2010).

      Comment-2. General features that impact foldability, including contact order, should be discussed and what features can be searched for in genomes that relate to these - e.g. beta-rich proteins.

      Response

      Thanks for this valuable idea! Contact order and other predictors of problematic folding are highly relevant, but their analysis is structure-based and hence inapplicable at the proteome (sequence) scale. We did, however, estimate the proportion of aggregation-prone proteins in the proteome. These proteins were identified by the CamSol method, which assigns poorly soluble regions from sequence data. Some of these predicted ‘poorly soluble segments’ do refer to the hydrophobic core of the respective folded state rather than to ‘true’ aggregation hotspots. With this unavoidable potential caveat, it appears that, compared to prokaryotes, aggregation-prone proteins have become nearly 6-fold more frequent in the proteomes of Chordates.

      The following changes were made to accommodate this new analysis:

      Figure 2 is revised to include a new panel (panel-E) that shows the expansion of aggregation-prone proteins in the proteome across the Tree of Life. The same result is summarized in the summary Figure 4.

      A new paragraph entitled “Proteins predicted as aggregation-prone became ~6-fold more frequent in the proteome” is added to the Results section, which describes the principle and the main results (see Page 7, Lines 14-28).

      The methodology is included in the Methods section, in a paragraph entitled “Predicted proportion of aggregation-prone proteins in the proteome” (see Page 24, Lines 17-27). For each representative organism, the percentage of aggregation-prone proteins in the proteome is provided as Data S10.

      This analysis is also included in the revised Abstract: “Proteins prone to misfolding and aggregation, such as repeat and beta-rich proteins, proliferated ~600-fold, and accordingly, proteins predicted as aggregation-prone became 6-fold more frequent in mammalian compared to bacterial proteomes.” See Page 2, Lines 7-9.

      Comment-3. "Core" chaperones needs to be defined.

      Response

      Thank you for this suggestion. We restructured Page 3 Lines 19-23 in the Introduction to clearly explain this aspect. The current text is quoted below.

      “Chaperones can be broadly divided into core- and co-chaperones. Core-chaperones can function on their own, and include ATPases HSP60, HSP70, HSP100, and HSP90 and the ATP-independent HSP20. The basal protein holding, unfolding, and refolding activities of the core-chaperones are facilitated and modulated by a range of co-chaperones such as J-domain proteins (Caplan, 2003; Duncan et al., 2015; Schopf et al., 2017).”

      Minor concerns and thoughts

      Comment-4. This manuscript stimulated me to think about the dynamics between chaperone evolution and proteome evolution. The ability to tolerate proteins that need chaperones seems linked to major evolutionary innovations. Once you have these innovations though, you are addicted to the chaperones - and an expansion of the number of sub-optimal proteins. These ideas seem like they would be valuable to include in the discussion of this work. More generally, it would be wonderful to have a discussion of future directions that this work may spark.

      Response

      This is indeed a fascinating question, or set of questions, that we have also become intrigued by following this work. We introduced a short section, though more of an ‘appetizer’ than a detailed discussion, as we know almost nothing about the co-evolution of new proteins and chaperones.

      Reviewer’s significance statement

      This manuscript provides a fascinating glimpse back in time of a fundamental interplay - between chaperone evolution/addiction and proteome evolution. I am not an expert in phylogenetic analyses so I cannot judge the details of the analyses. As an expert in molecular evolution and chaperones, I found the approach and model fascinating to consider and I believe it will stimulate a good deal of important discussion in these fields. I have one major concern that I feel ought to be addressed in the manuscript and a number of points that I would encourage the authors to consider. I am sure that these can be readily addressed and I look forward to seeing this work published and the further discussion and ideas that it may stimulate.

      Response

      Thank you!

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      The manuscript by Rebeaud et al describes phylogenetic analyses of proteome and chaperone complexity. The authors analyzed species across the tree of life to predict the proteome and chaperone properties of ancestors spanning to the last universal common ancestor. Their analyses indicate that many proteome properties increased in complexity over evolutionary time including: average protein length, the number of multi-domain proteins, the size of the proteome, the number of repeat proteins, and the number of beta-superfold proteins that are known to be difficult to fold. Their analyses also indicate an expansion in chaperone families that corresponds to the increase in proteome complexity. Based on their analyses, the authors propose a model where early life relied on a limited number of chaperones (Hsp20 and Hsp60) and that as proteome complexity evolved, so did chaperone complexity. Core chaperones including Hsp90, Hsp70, and Hsp100 evolved relatively early, and later chaperone evolution was driven by the appearance and alterations of co-chaperones and auxiliary factors as well as by increases in the protein abundance of chaperones.

      Major concerns:

      1. This work is appropriately based on phylogenetic inferences, but as such, the limitations and uncertainties of phylogenetic inferences need to be discussed. This in no way takes away from the work, quite the opposite, it would make it richer by encouraging broader interpretations where justified and clear understanding of where support for the model is strongest. Posterior probabilities need to be discussed and the range of properties that a likely ancestor might have based on the data should be discussed. How this impacts the conclusions and models should be discussed. Throughout the manuscript, the authors present most-likely ancestral models (as I understood it), what are the next most likely models? How much power is there to distinguish one model from another? It would be very helpful to have a section describing the limitations and uncertainties of the phylogenetic analyses and how these relate to the main findings and conclusions.
      2. General features that impact foldability, including contact order, should be discussed and what features can be searched for in genomes that relate to these - e.g. beta-rich proteins.
      3. "Core" chaperones needs to be defined.

      Minor concerns and thoughts:

      1. This manuscript stimulated me to think about the dynamics between chaperone evolution and proteome evolution. The ability to tolerate proteins that need chaperones seems linked to major evolutionary innovations. Once you have these innovations though, you are addicted to the chaperones - and an expansion of the number of sub-optimal proteins. These ideas seem like they would be valuable to include in the discussion of this work. More generally, it would be wonderful to have a discussion of future directions that this work may spark.

      Significance

      This manuscript provides a fascinating glimpse back in time of a fundamental interplay - between chaperone evolution/addiction and proteome evolution. I am not an expert in phylogenetic analyses so I cannot judge the details of the analyses. As an expert in molecular evolution and chaperones, I found the approach and model fascinating to consider and I believe it will stimulate a good deal of important discussion in these fields. I have one major concern that I feel ought to be addressed in the manuscript and a number of points that I would encourage the authors to consider. I am sure that these can be readily addressed and I look forward to seeing this work published and the further discussion and ideas that it may stimulate.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Rebeaud and colleagues analyze evolution of chaperones compared to the evolution of whole proteome complexity across the entire tree of life. Their principal conclusions are well captured in the following quote from the Discussion:

      "Comparison of the expansion of proteome complexity versus that of core-chaperones presents a dichotomy-a linear expansion of core-chaperones supported an exponential expansion of proteome complexity. We propose that this dichotomy was reconciled by two features that comprise the hallmark of chaperones: the generalist nature of core-chaperones, and their ability to act in a cooperative mode alongside co-chaperones as an integrated network. Indeed, in contrast to core chaperones, there exist a consistent trend of evolutionary expansion of co-chaperones."

      The general theme of the evolution of proteome management is of obvious interest. Unfortunately, the entire analysis is shaky and fails to convincingly substantiate the authors' conclusions. There are many issues. Throughout the manuscript, the authors discuss 'expansion' of the proteome in bacteria, archaea and eukaryotes, creating the impression of a consistent evolutionary trend. No such trend actually exists if one considers the means or medians of proteome sizes within each of the three domains of life (there is a transition to greater complexity in eukaryotes). The maximum complexity, certainly, increases with time, which can be attributed to the 'drunkard's walk' effect. This hardly qualifies as 'expansion'. The authors further claim a 'linear' expansion of the chaperone set and an 'exponential' expansion of the total proteome size. These are precise mathematical terms and, as such, require fitting to the respective functions. No such fitting is presented in this manuscript. Even apart from that shortcoming, the explanations of both 'linear' and 'exponential' are quite confusing. Thus, when explaining the 'linearity' of chaperone evolution, the authors refer to the lack of major innovation among the chaperones. This is correct in itself but has nothing to do with linearity. Apart from the aforementioned conceptual problems, the estimates of the 'exponential' growth of the proteome are naive, inconsistent and inaccurate. As the base point for the expansion estimates for archaea and eukaryotes, the authors take parasitic forms. Even leaving aside the highly dubious claims that these organisms belong to the clades that diverged first from the respective ancestors, parasites are not an appropriate choice for such estimates because they certainly are products of reductive evolution. For bacteria, inconsistently, the authors choose a free-living form from a dubious ancient clade, and not even the one with the smallest genome. All taken together, this robs the expansion estimates of any substantial meaning.
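      The referee's point that 'linear' and 'exponential' are claims to be tested by fitting can be illustrated with a minimal sketch. It fits a straight line and an exponential (via a log-linear regression) to the same points and compares residual sums of squares on the original scale. The data below are synthetic and the function names are hypothetical; nothing here reproduces the manuscript's data or analysis.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rss(ys, preds):
    """Residual sum of squares between observations and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

def compare_models(xs, ys):
    """Fit y = a*x + b and y = A*exp(k*x); return the RSS of each fit.

    The exponential is fitted by regressing log(y) on x, so all y must be > 0.
    """
    a, b = least_squares(xs, ys)
    linear_rss = rss(ys, [a * x + b for x in xs])
    k, log_A = least_squares(xs, [math.log(y) for y in ys])
    exponential_rss = rss(ys, [math.exp(log_A + k * x) for x in xs])
    return {"linear": linear_rss, "exponential": exponential_rss}

# Synthetic illustration only: x could stand for divergence time, y for a count.
xs = [0, 1, 2, 3, 4, 5]
exp_like = [2.0 * math.exp(0.5 * x) for x in xs]
lin_like = [3.0 * x + 1.0 for x in xs]

result_exp = compare_models(xs, exp_like)
result_lin = compare_models(xs, lin_like)
print(result_exp, result_lin)
```

      With real data one would also want a criterion that accounts for noise structure and model complexity (for example AIC), and a method that accounts for phylogenetic non-independence; this sketch only shows the basic comparison.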

      The authors do make a salient and, I think, essentially correct observation: chaperones typically comprise about 0.3% of the proteins in any organism. As such, this presents no dichotomy in evolutionary trends to be explained. Surely, as examined and discussed in the paper, eukaryotes also show significant increases in the size and domain content of the encoded proteins, suggesting the possibility that they might need more chaperones. However, if this is the explanandum, rather than the number of proteins in the proteome as such, it should be clearly stated. Furthermore, it is quite natural to assume that this increase in protein complexity, without a commensurate increase in chaperone diversity, is enabled by higher expression of the chaperones, as suggested in the Discussion of this paper. I doubt there is any big surprise here, or even much need for an extended discussion, let alone a special publication.

      Significance

      As such, in the opinion of this reviewer, there is no substantial advance over the existing knowledge in this paper. Should the authors wish to revise, they would need to develop robust methodology to measure proteome expansion. That would involve starting from reconstructed ancestors rather than any extant forms (let alone parasites). I doubt that such an analysis, non-trivial in itself, would reveal any strong, consistent trends other than the well-known increase in complexity in eukaryotes.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors present well written work on the evolution of proteome size and complexity, and the corresponding changes in chaperone proteins. Interestingly, they find chaperone copy numbers increase linearly with proteome size, despite the increasing 'complexity' of, in particular, post-LECA genomes. They suggest that to address the rise in complexity, organisms express chaperones at higher levels and an expanding network of co-chaperones has evolved across the tree of life.

      Major comments:

      -Summary reads strangely relative to the rest of the manuscript, and lists facts in a way that makes the purpose of the study confusing. I think most readers will dislike the characterisation of evolution as progress from simple to complex, and the authors might want to avoid this language throughout the manuscript: bacteria and archaea have also been evolving over this period of time, and have not become more 'complex'? Similarly, the authors should reconsider their figure legend titles. As a specific example, 'in the course of evolution' should become 'across the tree of life'.

      -I think the manuscript would be improved if the authors significantly shortened the discussion of genome size evolution- this is fairly well understood, and could be covered briefly, especially as the main focus of the manuscript is on the evolution of chaperone and co-chaperone repertoire. They could also make clearer quantitative links between protein complexity and the evolution of chaperones and co-chaperones- perhaps this should be in the discussion? The authors might also consider referencing 'The evolution of genome complexity', which could be relevant to this manuscript and might make the work of broader interest.

      -The authors state 'protein trees were generated and compared with ToL to account for gene loss and transfer events'. The methodology for this procedure is not given in the manuscript. The authors should back up this point, and make it clear this is why they reconstruct the trees. Currently it is not convincing to me that the authors have found HGT, given the considerable phylogenetic uncertainty in the basal events in the tree of life. I also expect the tree of a single protein to potentially lack information due to the short sequence considered and possible lack of power. The authors need to consider whether the data are really of high enough quality to assess this.

      -Methods: the authors could consider taking an alternative source of LUCA proteins, rather than those found in 'Nanoarchaeota and Aquificae': it's possible these are not representative of LUCA, and it seems a somewhat arbitrary choice. The authors could consider using one of the available curated sets, such as that generated by Ranea et al. (2006).

      -The patterns observed might only hold because of differences in the taxa that diverged pre and post LECA? The authors might consider subgroup analyses to ensure this is not the case. The authors could also consider using methods that take phylogeny into account.

      Minor comments:

      'Life's habitability has also expanded from its specific niche of emergence, likely deep-sea hydrothermal vents, to highly variable and extreme ranges of temperature, pressure, exposure to high UV-light, dehydration and free oxygen.' This is not really correct, as bacteria and archaea are found worldwide, and in the most extreme environments.

      'We reconciled the topology of our tree': on first read this was not clear; I did not realise the authors were only building trees for subsets of the data. Time tree is the best source for the overall topology. The phrase 'manually curated and adjusted' is used in the methods. This language is much too vague, and not a clear explanation of the steps taken.

      Significance

      The work presents interesting results that suggest that more 'complex' organisms have evolved a strategy to cope with increasing proteome size, and is interesting to researchers in the field of molecular evolution.

      I am a researcher in population genetics and molecular evolution.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This study outlines calcium probes for assessing the poorly understood role of peroxisomes in calcium signaling. The authors suggest that these organelles sequester calcium from either calcium influx across the plasma membrane or from release from the ER/SR. This is important since we need to know more about the roles of these organelles in calcium homeostasis and signaling. However, it needs to be robustly demonstrated that the probes are targeted to the right organelle without confounding contamination from other organelles which can be very significant even for a small degree of mis-targeting.

      Major

      1. The differences between the signals seen with the peroxisome and cytosolic D3 versions are not compelling, other than a dampened spike with the former (higher resting levels, smaller peak). See below for pH concerns.
      2. How clean is the peroxisome distribution? Prove that D3 spillover from its being partially in (or on) other compartments (e.g. cyto, ER) is not contributing to the changes. Selective manipulation of Ca2+ in these other compartments should not affect the peroxisome signal.
        • a. For example, the small changes in the D3-px could be explained by peroxisome not changing at all but rather the other compartments (where larger responses are observed) signal(s) contaminating the response.
        • b. e.g. if in the ER lumen, the signal should be eliminated with SERCA inhibitors (thapsigargin, CPA). They used thapsigargin in cardiac myocytes; why not in HeLa during characterization?
      3. Any Ca2+ reporter will be pH-sensitive to an extent, even D3 (Ca2+ binding, inherent fluorescent proteins).
        • a. It is essential to prove that the signal changes are not due to changes in peroxisomal pH. Target pH-sensitive proteins to the peroxisomal lumen by the same strategy and show that the same Ca2+ interventions do not cause pH changes.
        • b. The authors claim different resting levels of [Ca2+] in cytosol/mitochondria/peroxisome. The resting FRET level also depends on the resting pH of the compartments, which may also be different. Certainly, mitochondria are more alkaline than the cytosol. Again, to interpret these as real Ca2+ differences requires the pH to be accounted for.
      4. I am puzzled by the model, in particular in view of Fig 3. The genetically-encoded calcium indicator (GECI) is allegedly on the cytosolic face of the peroxisome and measuring peri-peroxisomal Ca2+.
        • a. The changes with this reporter look pretty similar to the luminal reporter (save that the resting ratio may be lower). I don't understand how the lumen [Ca2+] > cytosolic [Ca2+] without a higher local [Ca2+] (unless there is an energy-driven uptake mechanism, but then how does this fit in with ER-driven Ca2+ release?).
      5. The claim that resting peroxisome [Ca2+] is higher than cytosol is questionable. Is this a calibration artifact (e.g. compartment pH-differences or the reporter behaves differently in the lumen)? Such a gradient could not be sustained without energy-dependent Ca2+ uptake. The authors make no discussion of this.

      Minor

      1. Quantitate localization. Pearson's coefficients for GECIs and Peroxisomes.
      2. Different upstroke rates of D3 with His vs Cao. Quantify.
      3. Page 5. Line 161. 'Different sites', do the authors mean different sides? Similarly, the Legend of Fig 3.
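      The request to quantify localization with Pearson's coefficients can be sketched as follows. This is a minimal illustration on made-up pixel intensities; the two lists stand for the GECI channel and a peroxisome-marker channel flattened to the same pixel order, and the function and variable names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up intensities for six pixels (same pixel order in both channels).
geci_channel = [10, 50, 200, 30, 180, 15]
peroxisome_marker = [12, 45, 210, 25, 170, 20]

r = pearson(geci_channel, peroxisome_marker)
print(round(r, 3))
```

      In practice the coefficient is usually computed within a cell mask after background subtraction, and reported for several cells alongside a control marker for another compartment such as the ER or cytosol.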

      Significance

      Good peroxisome calcium probes are important to the general calcium signaling field. This is fundamental science of interest to all cell biologists.

      There has been little published on peroxisome calcium, although for example, the Pozzan lab published a paper in JBC in 2008 on a GFP-based lumenally targeted peroxisome probe. There is contradictory data in the field and reliable new approaches are needed.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Sargsyan et al. describes an unappreciated role for peroxisomes in calcium dynamics. Specifically, the authors propose that GPCR/VDCC/SOCE-mediated cytosolic Ca2+ elevation is rapidly sensed by peroxisomes and sequestered. The authors used/generated peroxisome-targeted genetically encoded Ca2+ indicators, which are an elegant and powerful tool to monitor luminal Ca2+ dynamics. While the results and conclusions are novel, there are some important gaps that need to be addressed for consideration for publication in EMBO J.

      Comments:

      Peroxisomes are single-membrane-bound organelles which are conserved across species spanning from yeast to humans. While housing only ~100 proteins, they are responsible for essential steps in lipid metabolism, amino acid metabolism and ROS homeostasis. Unlike other organelles, peroxisomes import fully folded and cofactor-bound proteins into their matrix. Though peroxisomes house specific metabolic functions, there is extensive crosstalk with other organelles, including mitochondria. It is essential to test and define whether silencing/knockdown of mitochondrial Ca2+ transport components like MCU will impact peroxisome Ca2+ uptake upon stimulation with histamine or electrical stimulation.

      Since peroxisomes buffer a significant amount of Ca2+, it is worth testing whether blockade of mitochondrial Ca2+ uptake alters peroxisome-mediated Ca2+ influx. This analysis will provide the Ca2+ uptake rate of mitochondria vs peroxisomes (Mallilankaraman K. et al., Cell 2012 and Nemani N. et al., Science Signaling 2020).

      Because peroxisomal synthesis of plasmalogens is Ca2+ and oxygen tension dependent, it is essential to show that altering Ca2+ controls plasmalogen synthesis.

      In the introduction the authors have stated that "Elevated mitochondrial uptake increases mitochondrial reactive oxygen species (ROS) production and is associated with heart failure and ischemic brain injury (Starkov et al., 2004; Santulli et al., 2015)." These cited articles only remotely link MCU and ROS elevation. It is important to point out that Tomar et al. 2016 Cell Reports clearly demonstrated that genetic ablation of MCU suppresses mROS production that is mitochondrial Ca2+ dependent.

      Significance

      The significance of the work is very high. The authors employ a variety of complementary techniques and experimental systems to demonstrate that peroxisomes indeed buffer a large quantity of Ca2+ upon stimulation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      These are straightforward studies aimed at developing probes to assess peroxisomal Ca2+ at rest and in response to receptor stimulation. The probes were designed to measure intraperoxisomal Ca2+ and the Ca2+ that the peroxisome experiences when cytoplasmic Ca2+ is increased. The probes fill a need in understanding peroxisomal Ca2+ and Ca2+ signaling in general and should be very useful to investigators in the field.

      The comments are aimed to help in improving the studies and taking them to the next stage.

      The grammar needs improvement and the introduction needs sharpening. It is long and, in many places, not to the point. The results and discussion sections are also quite verbose.

      The sidedness of the probes needs to be validated further, especially since the peroxisomal Ca2+ increase follows the cytoplasmic one and the slower reduction rate may result from the environment experienced by the probe. Simple experiments: how do the probes respond to Ca2+ ionophore; is Ca2+ reduced rapidly when removed from the media of the digitonin-permeabilized cells; how do the cytoplasmic and peroxisomal thapsigargin responses compare using the protocols in 2A and 4A? Sidedness of PEX13-D3cpV was not examined.

      Calculations of peroxisomal Ca2+ are based on Kd values reported in the literature. The Kds of D3cpV-px and PEX13-D3cpV should be determined when in the peroxisome in permeabilized cells for the numbers to have any meaning.
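      Why the choice of Kd matters can be seen from the calibration commonly used for ratiometric indicators, assumed here to be a single-site Hill relation, [Ca2+] = Kd * ((R - Rmin) / (Rmax - R))^(1/n). The sketch below is illustrative only: all parameter values are hypothetical and are not taken from the manuscript or from published D3cpV calibrations.

```python
def calcium_from_ratio(r, r_min, r_max, kd, hill=1.0):
    """Convert a FRET ratio to [Ca2+] with a Hill-type calibration (assumed form).

    r      measured ratio (must lie strictly between r_min and r_max)
    r_min  ratio at zero Ca2+
    r_max  ratio at saturating Ca2+
    kd     apparent dissociation constant, in the same units as the result
    hill   Hill coefficient
    """
    if not r_min < r < r_max:
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd * ((r - r_min) / (r_max - r)) ** (1.0 / hill)

# Hypothetical numbers: the same measured ratio, two candidate Kd values (in uM).
ratio, r_min, r_max = 2.0, 1.0, 3.0
with_literature_kd = calcium_from_ratio(ratio, r_min, r_max, kd=0.6)
with_in_situ_kd = calcium_from_ratio(ratio, r_min, r_max, kd=1.2)
print(with_literature_kd, with_in_situ_kd)
```

      Under this relation the estimate scales directly with Kd, so a twofold difference between a literature Kd and the Kd in the peroxisomal environment produces a twofold difference in the reported concentration.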

      How does the localization of the probes look in the differentiated cardiomyocytes? How does it compare to RyRs, VACC, etc.?

      The major weakness of the study is that the probes are used only as a tool. To enhance the study and bring it beyond an excellent technical achievement, the authors should use them to study a significant Ca2+-dependent peroxisomal function and show how the use of the tools illuminates the role of Ca2+ in such a function.

      Significance

      These are straightforward studies aimed at developing probes to assess peroxisomal Ca2+ at rest and in response to receptor stimulation. The probes were designed to measure intraperoxisomal Ca2+ and the Ca2+ that the peroxisome experiences when cytoplasmic Ca2+ is increased. The probes fill a need in understanding peroxisomal Ca2+ and Ca2+ signaling in general and should be very useful to investigators in the field.

      The major weakness of the study is that the probes are used only as a tool. To enhance the study and bring it beyond an excellent technical achievement, the authors should use them to study a significant Ca2+-dependent peroxisomal function and show how the use of the tools illuminates the role of Ca2+ in such a function.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per- knockout cells. Cry- (and potentially Per-)-independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations.

      **Major comments:**

      1) The authors propose that the essential role of mammalian Cryptochrome is to bring the robust oscillation. As the authors analyze in many parts, the robustness of oscillation can be validated by the (relative) amplitude and phase/period variation, both of which should be affected significantly by the method for cell synchronization. Unfortunately, the method for synchronization is not adequately written in this version of supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol" but at least the correspondence between which methods were used in which experiments needs to be clearly explained. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      We thank the reviewer for recognising the importance of different synchronisation protocols. In experiments where bioluminescent CKO rhythms were observed, different synchronisation protocols resulted in similar results when comparing WT with CKO cells. The different synchronisation methods used in each experiment are now specified in the supplementary methods.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under the condition of LL>DD. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is the direct consequence of circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle) because the period length is much different. Is it possible to induce the 16 hr periodicity in CKO mice behavior by 16 hr-L:16 hr-D cycle? Would another plausible possibility be that the 16 hr rhythmicity is the mouse version of internal desynchronization or another type of methamphetamine-induced oscillation/food-entrainable oscillation?

      The reviewer makes an excellent suggestion. As described in the manuscript text (page 13), CKO mice have already been shown to entrain to restricted feeding cycles (Iijima et al., 2005) and we therefore assessed whether CKO rhythms would entrain to a 16h day as suggested. Whilst CKO (but not WT) mice showed 16h behavioural rhythms during entrainment, they were arrhythmic under constant darkness thereafter (Revised Figure S2A). CKO cellular rhythms show reduced robustness under constant conditions ex vivo, and our other work has revealed that CRY-deficiency renders cells much more susceptible to stress (Wong et al, 2020, BioRxiv). The parsimonious explanation, therefore, is that whilst the cellular timing mechanism remains functional when CRY is absent, the amplitude of cellular clock outputs is severely attenuated (as we showed previously in Hoyle et al., Sci Trans Med, 2017) in a fashion that impairs the fidelity of intercellular synchronisation under most conditions in vivo, as well as the molecular mechanisms of entrainment to light-dark cycles.

      With respect to the apparent discrepancy between mean periods of CKO cultured cells (~21h), SCN (~19h) and mice (~17h): this is also observed in WT cells (~26h), SCN (~25h) and mice (~24h), simply with a smaller effect size and longer intrinsic period.

      We believe this difference in effect size can adequately be explained by differences in oscillator coupling, combined with the reduced robustness of CKO timekeeping. In Figure 1F we show that the range of rhythmic periods expressed by cultured CKO fibroblasts (14-30h) is much greater than for their WT counterparts (range of 22-26h), or that which is observed when cellular oscillators are coupled in CKO SCN (19h). Thus the period of CKO oscillations is demonstrably more plastic (less robust) than WT, with a cell-intrinsic tendency towards shorter period which is revealed more clearly when oscillators are coupled.

      In vivo there is more oscillator coupling in the intact SCN than in an isolated slice, from which communication with the caudal and rostral hypothalamus has been removed. Thus it seems plausible that increased coupling in vivo, combined with positive feedback via behavioural cycles of feeding and locomotor activity, resonates with a common frequency which is shorter than in isolated tissue.

      Critically, for both WT and CKO mice/SCN, the circadian period lies within the range of periods observed in isolated fibroblasts. To communicate this rather nuanced point we have inserted the following text into the supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues. In light of short period (~16.5h) locomotor rhythms observed in CKO mice after transition from constant light to constant dark, but failure to entrain to 12h:12h light:dark cycles, it seemed plausible that either CKO mice might entrain to a short 8h:8h light:dark (16h day) or else have a general deficiency to entrainment by light:dark cycles. The data in Figure S2 support the latter possibility, in that neither WT nor CKO mice stably entrained to 16h cycles whereas WT but not CKO mice entrained to 24h days. The bioluminescence oscillations observed in CKO cells conform to the long-established definition of a circadian rhythm (temperature-compensated ~24h period of oscillation with appropriate phase-response to relevant environmental stimuli), whereas the locomotor rhythms observed in CKO mice under quite specific environmental conditions correlate with both the cellular and SCN data to suggest the persistence of capacity to maintain behavioural rhythms close to the circadian range, but which is masked under most circumstances. We suggest that in vivo the (pathophysiological) stress of CRY-deficiency is epistatic to the expression of daily rhythms in locomotor activity following standard entrainment by light:dark cycles and thus, whilst not arrhythmic, also cannot be described as circadian in the strictest sense.”

      3) The authors proposed that CKId/e at least in part is the component of cytoscillator (Fig. 5D), and turnover control of PER (likely to be controlled by CKId/e) may be an interaction point between cytoscillator and canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CKId/e is evaluated in the presence of PER by monitoring the canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of cytoscillator. It would be valuable to ask whether the PF and CHIR have the period-lengthening effect on the Nr1d1:LUC in the CPKO cell.

      Another excellent suggestion, thanks. The experiment, showing similar results in CKO and CPKO cells, was performed and is now reported in Revised Figure S5D. The text was amended as follows: “We found that inhibition of CK1d/e and GSK3-α/β had the same effect on circadian period in CKO cells, CPKO cells, and WT controls (Figure 5A, B, S5A, B, D).”

      Moreover, our data are further supported by findings in RBCs, where CK1 inhibition affects circadian period in a similar manner as in WT and CKO cells (Beale et al, JBR 2019).

      **Minor comments:**

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the data of the CKO cell only show one peak after the release of the constant temperature phase, it is difficult to conclude whether the cell is entrained or just responds to the final temperature shift.

      We agree with the reviewer and have replaced the original figure with another recording that includes an extra circadian cycle in free-running conditions (Revised Figure 2C).

      5) It would be useful for readers to provide information on the known phenotype of TIMELESS knockout flies; TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing the presence of circadian rhythmicity in Tim-knockout flies (even if it is an oscillation seen in limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockout)?

      The reviewer is correct that TIM is widely accepted as an essential component of the circadian clock in flies. Using more sensitive modern techniques however, ~50% of classic Tim01 mutant flies exhibit significant behavioural rhythms in the circadian range under constant darkness, as reported:

      https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914

      For this reason we employed a full gene knockout of the Timeless gene (Lamaze et al., Sci Rep, 2017), where the majority of flies are behaviourally arrhythmic under constant conditions following standard entrainment by light cycles and therefore represents a more appropriate model for CRY-deficient cells.

      We have revised the legend of Figure S2 to include the following:

      “N.B. The generation of Timout flies is reported in Lamaze et al, Sci Rep, 2017. Similar to CRY-deficient mice, whole gene Timeless knockout flies are characterised as being behaviourally arrhythmic under constant darkness following entrainment by light:dark cycles: https://opus.bibliothek.uni-wuerzburg.de/frontdoor/index/index/year/2015/docId/11914”

      5) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2 fold between time = 0 hr and 24 hr in the CKO cell. This amplitude is similar to that observed in WT cell although the peak phase is different. Does the PER2::LUC mRNA level show the oscillation in CKO cells?

      No, we think we have shown convincingly this is not the case. We argue the data in figure 3C show that: (a) there is no circadian variation in mRNA PER2::LUC expression (mRNA levels increase but no trough is observed) and (b) that the temporal relationship between protein and mRNA as observed in WT is broken; i.e. the CRY-independent circadian variation in protein levels cannot be “driven by” changes in transcript levels. Similar results were obtained using transcriptional reporters Per2:LUC and Cry1:LUC (Figure S3E and F). Moreover, our findings are also in line with previous reports, such as Nangle et al. (2014, eLife) and Ode et al. (Mol Cell, 2017).

      6) Figure 3D: the authors discuss the amplitude and variation (whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEF reflect the lower transfection efficiency of the reporter gene, not the nature of circadian oscillators of these cell lines.

      As reported in the methods, these are stable cell lines rather than transiently transfected cells. The detrended luciferase data presented here do not actually reflect raw levels of luciferase protein expression, but rather reflect the amount of deviation from the 24 hour average. To make it easier to compare expression levels of Per2:LUC and Nr1d1:LUC between the different cell lines we have added figure S3H, presenting the average raw bioluminescence levels over 24 hours (after 24 hours of recovery from media change; ie from 24-48 hours). Using these data one can appreciate that expression levels of the Per2 reporter are never lower in CRY KO cells when compared to WT. We hope these data can take away the reviewer’s concerns about expression levels causing the differences observed.
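      The distinction drawn above between raw bioluminescence and detrended data (deviation from the 24 hour average) can be sketched as follows. This is a minimal illustration that assumes hourly sampling and a simple 24-sample moving average; the actual detrending routine used for the figures is not specified in this reply, and the trace is synthetic.

```python
import math

def detrend(trace, window):
    """Subtract a moving average of `window` samples from each point.

    Only points with a full window around them are returned, so the output
    has len(trace) - window + 1 values.
    """
    half = window // 2
    out = []
    for i in range(half, len(trace) - window + half + 1):
        segment = trace[i - half : i - half + window]
        out.append(trace[i] - sum(segment) / window)
    return out

# Synthetic trace: declining baseline plus a 24 h oscillation, sampled hourly.
raw = [1000 - 2 * t + 100 * math.sin(2 * math.pi * t / 24) for t in range(96)]
detrended = detrend(raw, 24)

# Same rhythm on a much brighter baseline, e.g. a higher-expressing reporter line.
brighter = [v + 500 for v in raw]
detrended_brighter = detrend(brighter, 24)
print(max(detrended), max(detrended_brighter))
```

      The two detrended traces are identical even though the raw signals differ by 500 units, which is why average raw bioluminescence has to be reported separately when expression levels are in question.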

      Reviewer #1 (Significance (Required)):

      Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonate SCN and red blood cells. Thus, although the need for Cry as a circadian oscillator has been debated, its essential role as a circadian oscillator remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that the circadian rhythmicity can persist in the absence of Cry. In a more general context, the presence of a non-TTFL circadian oscillator has been one of the major topics in the field of circadian clocks except for the cyanobacteria. In mammals, the authors' group and others have led the discovery of circadian oscillation in the absence of canonical TTFL by showing the redox cycle in red blood cells (O’Neill, Nature 2011). The presence of circadian oscillation in the absence of Bmal1 was also reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of canonical circadian TTFL; thus, this manuscript adds another layer of evidence for the non-TTFL circadian oscillation in mammals. Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community. This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and mouse behavior analysis.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In the canonical model of the mammalian circadian system, transcription factors, BMAL1/CLOCK, drive transcription of Cry and Per genes and CRY and PER proteins repress the BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant opinion was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavior rhythms shown by the Cry1 and Cry2 double knockout mice if they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in the Cry1/Cry2-deficient mice and then provided evidence to suggest the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated from the cytoplasmic oscillator instead of the well-studied transcription and translation feedback loop: constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of the PER protein result in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators.

      **Major points:**

      Line 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of cry1 and cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight cry1 and cry2 double knockout mice after synchronization in the light displayed circadian periods with different lengths and qualities. The paper reported two period lengths from the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86+/-0.4 h (n = 8) and the mean longer period was 24.66+/-0.2 h (n = 9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion on the period difference between studies will be helpful for readers to understand. Period information from the individual mouse should be calculated and shown since big period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      Thanks for this suggestion. The mice used by Ono et al were raised from birth in constant light, whereas we used mice that were weaned and raised in normal LD cycles before being subjected to constant light then constant dark as adults. Instead of the somewhat subjective fitting of regression lines by eye performed by Ono et al, our analysis was performed using the periodogram analysis routine of ClockLab 6.0 with a significance threshold for rhythmicity of p=0.0001. We have now repeated this experiment with 10 adult CKO mice (male and female), and found no evidence for two period lengths, in that the second most significant period was consistently double that of the first. As the reviewer suggests, there is a much broader distribution of CKO mouse periods compared with WT, as we also found in cultured cells and SCN. These new data are now reported in revised Figure S1B & C. We have also included a statement about how our study differs from Ono et al in the supplementary discussion.
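
      As an illustration of how periodogram-based period estimation differs from eye-fitted regression lines, the following is a minimal sketch of a chi-square periodogram. It is not ClockLab's implementation, and the response does not state which periodogram was computed; the function name, the synthetic hourly record and the candidate period range are all assumptions made for the example.

```python
import numpy as np

def chi_square_periodogram(x, periods):
    """Qp statistic (Sokolove-Bushell style) for each candidate period, in samples."""
    x = np.asarray(x, dtype=float)
    qp = {}
    for p in periods:
        k = len(x) // p                              # complete cycles of length p
        seg = x[: k * p]                             # drop the incomplete final cycle
        col_means = seg.reshape(k, p).mean(axis=0)   # mean profile after folding at p
        between = np.sum((col_means - seg.mean()) ** 2)
        total = np.sum((seg - seg.mean()) ** 2)
        qp[p] = k * seg.size * between / total
    return qp

# Hypothetical activity record: 10 days sampled hourly, 24-sample period plus noise.
rng = np.random.default_rng(0)
t = np.arange(240)
activity = np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.3, t.size)

scores = chi_square_periodogram(activity, range(16, 33))
best_period = max(scores, key=scores.get)
print(best_period)
```

      Under the null hypothesis of no rhythm, Qp for a candidate period of P samples is compared against a chi-square distribution with P − 1 degrees of freedom. Folding at integer multiples of the true period also yields a high score in this kind of analysis, which is one way a secondary peak at double the primary period can arise.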

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic while fibroblasts derived from these mice only produce rhythms with very low amplitudes compared with those in WT, which may reflect the difference between the SCN’s rhythm and peripheral clocks. The behavioral phenotype is supposed to be controlled mainly by SCN. However, most molecular analyses in the work were done with MEF and lung fibroblasts. These tissues may not be the best representative of the behavioral phenotype of the CKO mice.

      Behavioural rhythms of CKO mice are significantly less robust than WT, with mean amplitude less than 50% of WT controls (Figures 1A & B, revised S1B). Furthermore, as reported, 40% of CKO SCN slices exhibited PER2::LUC rhythms, compared with 100% of WT SCN slices (as also observed by Maywood et al., PNAS, 2013), and therefore are also less robust by the definition used in this manuscript.

      As now discussed in the revised supplementary discussion:

      “Circadian timekeeping is a cellular phenomenon. Co-ordinated ~24h rhythms in behaviour and physiology are observed in multi-cellular mammals under non-stressed conditions when individual cellular rhythms are synchronised and amplified by appropriate extrinsic and intrinsic timing cues.”

      The objective of this study was to understand the fundamental determinants that allow mammalian cells to generate a circadian rhythm, which we find does not include an essential role for CRY genes/proteins. Thus the cell is the appropriate level of biological abstraction at which to investigate the phenomenon, whereas the SCN and behavioural recordings simply serve to illustrate the competence of CRY-independent timing mechanisms to co-ordinate biological rhythms at higher levels of biological scale which are manifest under some conditions. To reiterate, the behavioural data supports the cellular observations, not the converse.

      Stronger evidence is needed to fully exclude the possibility that in CKO cells the rhythm is generated by PERs' compensation for the loss of Crys to repress BMAL1 and CLOCK. Since the rhythms of Per:LUC or Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to reflect the changes across a circadian cycle in the CKOs if the TTFL still occurs. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice should be very helpful to address the question. Periods of Per:LUC and Nr1d1:LUC from the CLOCKΔ19; CKO should be similar to those in the CKO alone if the transcription feedback does not contribute to their oscillations.

      We agree this would be an interesting experiment. However, the data in this manuscript and Wong et al. (BioRxiv, 2020), whilst not disputing the existence of the TTFL, strongly suggest that it fulfils a different function to that which is currently accepted and is not the mechanism that ultimately confers circadian periodicity upon mammalian cells. CLOCKΔ19 is an antimorphic gain-of-function mutation with many pleiotropic effects. Therefore, if the TTFL is not the basis of circadian timekeeping in mammalian cells, it follows that the CLOCKΔ19 mutation may not elicit its effects on circadian rhythms through delaying the timing of transcriptional activation, as was proposed. As such, whether or not CLOCKΔ19 alters circadian period of CKO cells/mice would not allow the two models to be distinguished in the way that the reviewer envisions.

      Secondly, we cannot detect any interaction between PER2 and BMAL1 in the absence of CRY using an extremely sensitive assay.

      Thirdly, very strong biochemical evidence suggests that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006).

      Finally, in several figures, particularly 3C and 4A, we show that PER2 peaks at the same time in CKO and WT cells, but in CKO cells this is not accompanied by a coincident peak in the mRNA. Thus, even if PER were able to repress BMAL1/CLOCK without CRY, rhythms in PER2 protein level could not be explained by some residual PER/BMAL1-dependent TTFL mechanism.

      To address the reviewer’s concern, however, we have employed mouse red blood cells, which offer unambiguous insight into the causal determinants of circadian timing, as we can be absolutely confident that there is no transcriptional contribution to cellular timekeeping. Briefly, we took fibroblasts and RBCs from WT, short period Tau/Tau and long period Afh/Afh mutant mice. The circadian phenotypes of these mutations are quite well established as arising through the post-translational regulation of PER and CRY proteins, respectively, and they result in short- and long-period PER2::LUC rhythms compared with WT fibroblasts. RBCs do not express PER or CRY proteins, and commensurately no genotype-dependent differences of RBC circadian period were observed (Beale et al, 2020, in submission). In contrast, RBC circadian rhythms are sensitive to pharmacological inhibition of casein kinase 1 (Beale et al., JBR, 2019).

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN. In the fibroblasts, only two genes, Per2 and Nr1d1, have been studied in the work, which cannot be simply expanded to the thousands of circadian controlled genes. Also amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT and no evidence has been provided to show that their weak rhythms are biologically relevant.

      The definition of a circadian rhythm (Pittendrigh, 1960) does not mention biological relevance or stipulate any lower threshold for amplitude. As now stated in the revised text (page 6):

      PER2::LUC rhythms in CKO cells were temperature compensated (Figure 2A, B) and entrained to 12h:12h 32°C:37°C temperature cycles in the same phase as WT controls (Figures 2C), and thus conform to the classic definition of a circadian rhythm (Pittendrigh, 1960) – which does not stipulate any lower threshold for amplitude or robustness.
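
      Temperature compensation is commonly summarised by the temperature coefficient Q10 of the oscillation rate (the reciprocal of the period), where a value close to 1 indicates a rate that barely changes with temperature. A minimal sketch of that calculation follows; the function name and the example periods are hypothetical and are not values taken from Figure 2.

```python
def q10(period_low_h, period_high_h, temp_low_c, temp_high_c):
    """Q10 of the oscillation rate (1/period) between two temperatures."""
    rate_low = 1.0 / period_low_h
    rate_high = 1.0 / period_high_h
    return (rate_high / rate_low) ** (10.0 / (temp_high_c - temp_low_c))

# Hypothetical periods, for illustration only: 24.0 h at 32 °C and 23.5 h at 37 °C.
example_q10 = q10(24.0, 23.5, 32.0, 37.0)
print(round(example_q10, 3))
```

      An uncompensated biochemical rate would typically double or more over a 10 °C rise, so a period that halves across 10 °C corresponds to a Q10 of 2 in this formulation.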

      We make no claims about biological relevance or amplitude in this manuscript, which are addressed in our related manuscript (Wong et al., BioRxiv, 2020). In this related manuscript, we explicitly address whether CRY is necessary for mammalian cells to maintain a circadian rhythm in the abundance of clock-controlled proteins and find that it is not. Indeed, twice as many rhythmically abundant proteins are observed in CKO cells than WT controls, which suggests that, if anything, CRY functions to suppress rhythms in protein abundance rather than to generate them.

      We observe circadian rhythms in the activity of two different bioluminescent reporters, which have already been extensively characterised. The mouse and SCN data in figure 1 are correlative, and simply show that previous published observations are reproducible. PER2::LUC oscillations are not accompanied by Per2 mRNA oscillations. This, together with the absence of a BMAL1-PER2::LUC complex, strongly argues against a model where PER2 oscillations are driven by residual (PER2-driven) transcriptional oscillations.

      We therefore concede the reviewer’s point that we “cannot exclude rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN”. The reviewer will agree, however, that there exists very strong biochemical evidence suggesting that PER has no repressive function in the absence of CRY (Chiou et al., 2016; Kume et al., 1999; Ode et al., 2017; Sato et al., 2006); that there exists no experimental evidence to suggest that PERs can fulfil this function in the absence of CRY in any mammalian cellular context; and finally that our observations are not consistent with the canonical model for the generation of circadian rhythms in mammals.

      We have therefore amended the text to focus on CRY specifically, as follows:

      PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping

      Page 12. “We found that CRY-mediated transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in cellular models. Whilst we cannot exclude the possibility that in the SCN, but not fibroblasts, PER alone may be competent to effect transcriptional feedback repression in the absence of CRY, we are not aware of any evidence that would render this possibility biochemically feasible.”

      **Minor points:** Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Thanks, changed as requested.

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Thanks, changed as requested.

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". Larrondo et al., 2015 paper says "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of the paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination." instead of "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ,"

      This is a good point which, following discussion with Profs Dunlap and Larrondo, we have revised into “no obligate relationship between clock protein turnover and circadian regulation of its activity” – a more accurate summary of their findings.

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In figures 3D and S3E, in CKO and CPKO cells the Per2:LUC data without fitting look better than that of Nr1d1:LUC. But the Nr1d1:LUC rhythm became clear after fitting the raw data. So to better visualize the low amplitude rhythm, if any, of Per2:LUC and compare it with Nr1d1:LUC, the fitted Per2:LUC data in CKOs and CPKOs should be shown in Figures 3D and S3E, as has been done for Nr1d1:LUC.

      Thanks, these data can be found in Figure S3F. The detrended Per2:Luc CKO and CPKO bioluminescence traces were better fit by the null hypothesis (straight line) than a damped sine wave (p>0.05) and so were not significantly rhythmic by the criteria used in this manuscript.

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter" In figure S4D, the y-axis of the PER2::LUC is ~800 while the y-axis of the SV40::LUC is ~600000. The over-expressed LUC by the SV40 promoter might saturate the degradation system in the cell so the comparison is not fair. A weaker promoter with the level similar to Per2 should be used to make the comparison.

      Thank you for this suggestion. In our experience, the SV40 promoter is actually a rather weak promoter compared with CMV, and faithfully facilitates the constitutive (non-rhythmic) expression of heterologous proteins such as Luciferase (Feeney et al., JBR, 2016). It has been shown previously that constitutive over-expression of heterologous proteins such as GFP or even CRY1 does not affect circadian rhythms in fibroblast cells (e.g. Chen et al., Mol Cell, 2009). To address the reviewer’s reasonable concern, however, multiple stable SV40:Luc fibroblast lines were generated by puromycin selection, grown to confluence in 96-well plates, then treated with 25 μg/mL CHX at the beginning of the recording. Random genomic integration of SV40:Luc leads to a broad range of different levels of luciferase expression, evident from the broad range of initial luciferase activities. For each line the decline in luciferase activity was fit with a simple one-phase exponential decay curve (R² ≥ 0.98) to derive the half-life of luciferase in each cell line. There was no significant relationship between the level of luciferase expression and luciferase stability (straight line vs. horizontal line fit p-value = 0.82). Therefore constitutive expression of SV40:Luc in fibroblasts does not affect the cellular protein degradation machinery within the range of expression used for our half-life measurements. These new data are reported in Revised Figure S3H.
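
      The half-life derivation described above can be sketched as follows. This is an illustrative stand-in and not the analysis code used for Figure S3H: it assumes a decay to zero and estimates the rate constant by linear regression on log-transformed activity, rather than by the nonlinear one-phase decay fit mentioned in the response. The function name, time grid and synthetic traces are hypothetical.

```python
import numpy as np

def half_life_from_decay(t_h, activity):
    """Half-life (h) from a one-phase exponential decay, y = y0 * exp(-k * t)."""
    t_h = np.asarray(t_h, dtype=float)
    log_activity = np.log(np.asarray(activity, dtype=float))
    slope, _intercept = np.polyfit(t_h, log_activity, 1)  # ln y = ln y0 - k * t
    k = -slope
    return np.log(2.0) / k

# Hypothetical traces: same decay rate, very different initial expression levels.
t = np.linspace(0.0, 6.0, 25)
true_half_life = 1.5  # hours, assumed for the example
for initial in (8e2, 6e5):
    trace = initial * np.exp(-np.log(2.0) / true_half_life * t)
    print(initial, round(half_life_from_decay(t, trace), 3))
```

      Because the half-life depends only on the rate constant k and not on the initial activity y0, lines with very different expression levels can be compared directly; a dependence of half-life on expression level would show up as a non-zero slope when the fitted half-lives are regressed against initial activity.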

      Line 430, "sigma" to "Sigma".

      Changed

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.

      These are noisy data as they come from freely behaving flies. The mean data was shown in Figure S3A and individual examples in S3B, and look very similar to previous bioluminescence fly recordings of XLG-LUC flies in papers from the Stanewsky lab who have published extensively using this model. The classifications arose from double-blinded analysis of the bioluminescence traces by several individuals, but we agree that this was not clearly communicated in our original submission. In Revised figure S2 we now present the mean bioluminescence traces, with and without damped sine wave vs. straight line fitting, as suggested, which is more consistent with the mammalian cellular data presented elsewhere.

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      The original blots for BMAL1 are shown in figure S3I. PER2::LUC levels were assessed by measuring bioluminescence levels present on the anti-BMAL1 beads, as described in the figure 3B legend.

      Supplemental information Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Changed

      Line 188, "Period CDS", the full name of CDS should be provided the first time it appears.

      Changed to “coding sequence”.

      Reviewer #2 (Significance (Required)): The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): **Summary:** The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. answers the question of whether or not the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices, fibroblasts and in a functional analogue KO of CRY in Drosophila, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, follow the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to the maintenance of oscillation in PER2::LUC stability independent of CRY, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, they implicated the kinases CK1d/e and GSK3 to be involved in regulating PER2::LUC post-translational rhythms through kinase inhibitor studies. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1d/e and GSK3. **Major comments:** The authors have shown sufficient data that under different testing conditions (mice locomotor activity, SCN preps or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in the CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations to some of the experimental work. However, there are some parts of the paper that need clarification to support their conclusions. 1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While they mentioned this in the figure legend, and described the axis transformation in Fig. S1A, they need a justification statement about why they did this in the results.

      Thanks, we have included the following sentence in the results section as requested:

      Figure 1 representative actograms are plotted as a function of endogenous tau (τ) to allow the periodic organisation of rest-activity cycles to be readily discerned; 24h-plotted actograms are shown in Figure S1A and S2A.

      2. In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing this data in the text, will strengthen their claim.

      These analyses are now reported in revised Figure S2 as requested. As described in our response to reviewer 2, the “robustly rhythmic” flies were scored as such through double-blinded analysis by several individuals. We hope the reviewer will appreciate our concern that exclusion of the majority of TIMELESS-deficient flies that were not robustly rhythmic might skew their apparent period by unconscious bias towards favouring traces that most clearly resemble robustly rhythmic WT controls. To avoid any potential bias we therefore included all flies of both genotypes in the analysis of circadian period for the revised figure, as suggested by our other reviewers.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic, i.e. they all appear arrhythmic, without an indicated statistical test. The authors could present better representative data to better reflect the categories.

      As described above, we now show the grouped mean with and without fitting for all flies of both genotypes. The statistical test for rhythmicity and analysis of circadian period is now the same as was performed for the cellular data presented elsewhere.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts in the first three days at 37 °C. How are the conditions here different from fibroblasts in Fig. 1E, where rhythms are seen during the first three days in CKO fibroblasts?

      As discussed in the manuscript, PER2::LUC rhythms in CKO cells and SCN are observed stochastically between recordings i.e. if one dish in a recording showed rhythms, all dishes showed rhythms and vice versa. The media change that occurred after 3 days in Fig 2A, in this case, was sufficient to initiate clear rhythms of PER2::LUC in all experimental replicates. In other experiments, media change did not have this effect. Herculean efforts by multiple lab members over many years, including the PI, have been unable to delineate the basis of this variability – which is discussed at length in the thesis of Dr. David Wong https://www.repository.cam.ac.uk/handle/1810/300610. As such, we clearly state in the discussion:

      “We were unable to identify all of the variables that contribute to the apparent stochasticity of CKO PER2::LUC oscillations, and so cannot distinguish whether this variability arises from reduced fidelity of PER2::LUC as a circadian reporter or impaired timing function in CKO cells. In consequence, we restricted our study to those recordings in which clear bioluminescence rhythms were observed, enabling the interrogation of TTFL-independent cellular timekeeping.”

      4. The authors claimed in the results section, "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations", but Fig. 3C does not show this expected lag between mRNA and protein levels. This needs to be explained.

      No lag is expected in vitro. A lag between PER protein levels and Per mRNA does occur in vivo and is very likely attributable to daily rhythms in feeding (Crosby et al, Cell, 2019), where increased insulin signalling elicits an increase in PER protein production 4-6h after E-box and GRE-stimulated increase in Per transcription.

      When luciferin is saturating intracellularly, PER2::LUC activity correlates most closely with the amount of PER2::LUC protein that was translated during the preceding 1-2h, rather than the total amount of PER2, due to the enzymatic inactivation of the luciferase protein (Feeney et al, JBR, 2016). Consistent with many previous observations, under constant conditions, the rate of nascent PER protein synthesis is largely determined by the level of Per2 mRNA, and thus more similar phases are observed between protein and mRNA in vitro than in vivo.

      We have inserted an additional citation of Feeney et al at this point in the text to make this clear.

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      There are no specific condition differences. As reported in Figure S1B, D & E, the range of CKO cellular periods is simply much broader than for WT cells. Over several dozen experiments the average period was significantly shorter, but the period variance is an equally striking feature of rhythms in these cells which we take as evidence for their lack of robustness.

      *Would additional experiments be essential to support the claims of the paper?*

      1. There is sufficient experimental data to support the major claims; however some suggested experiments are listed below.

        a. If CKO exhibits residual rhythms in PER::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms, or point to previous reference papers which may have already shown such effects. The prediction would be PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      These experiments have largely already been performed (see Chen et al., Mol Cell, 2009; Nangle et al., eLife, 2014; Fan et al., Curr Biol, 2007; Edwards et al., PNAS, 2016) and are cited in this manuscript. As suggested, PER2 rhythms remain intact under CRY1 over-expression, though they are clearly perturbed, but their robustness was not investigated in any detail. We hope to be able to address this important question in our subsequent work.

      The authors found that CK1d/e and GSK3 contribute to CRY-independent PER2 oscillations by showing that addition of kinase inhibitors affects the PER2::LUC period lengths in WT and CKO in the same manner. It would be interesting to know if a) PER2::LUC stability and b) PER2 phosphorylation status are affected in WT and CKO in the presence of the inhibitors, or point to previous reference papers which may already have shown such effects.

      As the reviewer points out, PER2 stability is already reported to be regulated via phosphorylation by GSK3 and CK1. We have made explicit reference to this in the revised manuscript as follows:

      “In contemporary models of the mammalian cellular clockwork CRY proteins are essential for rhythmic PER protein production, however, the stability and activity of PER proteins are also regulated post-translationally (Lee et al., 2009; Philpott et al., 2020; Iitaka et al, 2005).”

      *Are the data and the methods presented in such a way that they can be reproduced?*

      1. The protocol for the inhibitor treatments is not in the main or supplemental methods.

      In the main text methods, in the section on luciferase recordings, we state: “For pharmacological perturbation experiments (unless stated otherwise in the text) cells were changed into drug-containing air medium from the start of the recording. Mock-treatments were carried out with DMSO or ethanol as appropriate.”

      *Are the experiments adequately replicated and statistical analysis adequate?*

      1. All experiments had a sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone, or a combination of amplitude and p-values as determined by RAIN.

      As reported above, we have revised the analysis of the fly data to be consistent with the cellular data reported elsewhere in the manuscript.

      **Minor comments:** *1. Are prior studies referenced appropriately?* Authors may wish to include Fan et al., 2007, Current Biology which demonstrated that cycling of CRY1, CRY2, and BMAL1 is not necessary for circadian-clock function in fibroblasts.

      Apologies for the omission of citation to this excellent paper. Now referenced in the introduction.

      *2. Are the text and figures clear and accurate?* Figures were clear and illustrated well. See minor comments on text below:

      1. Other minor comments

      Main Text: p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      Thanks for this suggestion. During preparation of the manuscript we found that there was some disagreement between authors as to the meaning of robustness in a circadian context. We therefore feel it most necessary to define clearly what we mean by the use of this word to avoid any potential ambiguity.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour" p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references. p14, line 363: "crassa" instead of "Crassa" p17, line 430: "Sigma" instead of "sigma" p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries p19, line 488: "luciferase" instead of Luciferase p20, line 512: "Cell Signaling" instead of "cell signalling" p20, line 526: "single" instead of "Single"

      We thank the reviewer for his/her thoroughness, all of the above have been changed.

      Main figures: Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      This was actually correct.

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text Supplementary text: line 171: "30 mM HEPES" instead of "30mM HEPES" line 184: "Cell Signaling" instead of "cell signalling" Supplementary figures: Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      All of the above have been changed.

      Reviewer #3 (Significance (Required)): This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in the understanding of the underlying reasons behind sustained clock protein rhythmicity like PER in the absence of CRY, since such mechanisms in functional analogs have been shown in other systems, like Neurospora (Larrondo et al., 2015). However, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmic assays before making conclusions pertaining to rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provides incremental evidence for the existence of a cytoscillator, but opens up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The paper "CRYPTOCHROMES confer robustness, not rhythmicity, to circadian timekeeping" by Putker et al. answers the question of whether or not the rhythmic abundance of clock proteins is a prerequisite for circadian timekeeping. They addressed this by monitoring PER2::LUC rhythms in WT and CRY KO (CKO) cells. CRY forms a complex with PER, which in turn represses the ability of CLOCK/BMAL1 to drive the expression of clock-controlled genes, including PER and CRY. Consistent with previous observations, the authors found residual PER2::LUC rhythms in CKO SCN slices, fibroblasts and in a functional analogue KO of CRY in Drosophila, even in the absence of rhythmic Per2 transcription due to the loss of CRY as a negative regulator of the oscillation. They have shown that these rhythms, in the absence of CRY, follow the formal definition of circadian rhythms. They attributed these residual PER2::LUC rhythms to the maintenance of oscillation in PER2::LUC stability independent of CRY, by testing the decay kinetics of luciferase activity when translation is inhibited. Moreover, they implicated the kinases CK1d/e and GSK3 to be involved in regulating PER2::LUC post-translational rhythms through kinase inhibitor studies. They concluded that CRY is not necessary for maintaining PER2::LUC rhythms, but plays an important role in reinforcing high-amplitude rhythms when coupled to a proposed "cytoscillator" likely composed of CK1d/e and GSK3.

      Major comments:

      The authors have shown sufficient data that under different testing conditions (mice locomotor activity, SCN preps or fibroblasts), behavioral rhythms and PER2::LUC rhythms are still observed in the CRY KO (CKO) cells, contrary to a previous study (Liu et al., 2007). They also indicated limitations to some of the experimental work. However, there are some parts of the paper that need clarification to support their conclusions.

      1. In Fig. 1A, the x-axes of the actograms for WT and CKO are different. While they mentioned this in the figure legend, and described the axis transformation in Fig. S1A, they need a justification statement about why they did this in the results.

      2. In an attempt to show conservation of their proposed role for CRY, they tested the model system Drosophila melanogaster where TIMELESS serves as the functional analogue of CRY. While they showed in the figures and described in the text that rhythms still persisted with lower relative amplitude in the TIMELESS-deficient flies, they did not describe any period differences between WT and mutant. Showing the period quantification in Supp. Fig. S2 using the robustly rhythmic datasets, and describing this data in the text, will strengthen their claim.

      In Fig. S2B, there is no clear distinction between the representative datasets shown for poorly rhythmic and arrhythmic, i.e. they all appear arrhythmic, without an indicated statistical test. The authors could present better representative data to better reflect the categories.

      3. In Fig. 2A, the authors note the lack of rhythmicity in the CKO fibroblasts in the first three days at 37 °C. How are the conditions here different from fibroblasts in Fig. 1E, where rhythms are seen during the first three days in CKO fibroblasts?

      4. The authors claimed in the results section, "in contrast and as expected, Per2 mRNA in WT cells varied in phase with co-recorded PER2::LUC oscillations," but Fig. 3C does not show this expected lag between mRNA and protein levels. This needs to be explained.

      5. In Figs. 5A-B, the PER2::LUC periods in the CKO untreated cells seem to vary significantly between A, B, and C. While this could be due to the high variability in the rhythms that were previously described by the authors, the average periods here seem to be longer than the one reported in Fig. 1F. Are there specific condition differences?

      Would additional experiments be essential to support the claims of the paper?

      1. There is sufficient experimental data to support the major claims; however some suggested experiments are listed below.

      a. If CKO exhibits residual rhythms in PER2::LUC, it would be interesting to know how CRY overexpression influences PER2::LUC rhythms; alternatively, the authors could point to previous reference papers which may have already shown such effects. The prediction would be that PER2::LUC levels will still be rhythmic when CRY is overexpressed. What would be the extent of "robustness" conferred by CRY on PER2::LUC rhythms based on CRY KO and overexpression studies?

      b. The authors found that CK1 and GSK3 contribute to CRY-independent PER2 oscillations by showing that addition of kinase inhibitors affects the PER2::LUC period lengths in WT and CKO in the same manner. It would be interesting to know if a) PER2::LUC stability and b) PER2 phosphorylation status are affected in WT and CKO in the presence of the inhibitors; alternatively, the authors could point to previous reference papers which may already have shown such effects.

      Are the data and the methods presented in such a way that they can be reproduced?

      1. The protocol for the inhibitor treatments is not in the main or supplemental methods.

      Are the experiments adequately replicated and statistical analysis adequate?

      1. All experiments had a sufficient number of technical and biological replicates to make valid statistical analyses. For Fig. S2, the authors used RAIN to assess rhythmicity in WT and mutant flies, but it is not clear whether the different categories (rhythmic, poorly rhythmic, and arrhythmic) were based on amplitude differences alone, or a combination of amplitude and p-values as determined by RAIN.

      Minor comments:

      1. Other minor comments

      Main Text:

      p3, line 62; p12, line l32: It doesn't seem necessary or appropriate to cite the dictionary for the definition of robust.

      p4, line l87: "~20 h" rhythms instead of "~20h-hour"

      p3, line 70; p5, line 121; p14, line 380; p16, line 416 and p18, line 458: Close parentheses have been doubled in parenthetical references.

      p14, line 363: "crassa" instead of "Crassa"

      p17, line 430: "Sigma" instead of "sigma"

      p18, lines 464 and 483; p20, line 521: put a space between numerical values and units, to be consistent with other entries

      p19, line 488: "luciferase" instead of Luciferase

      p20, line 512: "Cell Signaling" instead of "cell signalling"

      p20, line 526: "single" instead of "Single"

      Main figures:

      Fig. 2 p37, line 921: close parenthesis was doubled on "red"

      Fig. 4 p41, line 989: "0.1 mM" instead of "0.1 mM" for consistency throughout text

      Supplementary text:

      line 171: "30 mM HEPES" instead of "30mM HEPES"

      line 184: "Cell Signaling" instead of "cell signalling"

      Supplementary figures:

      Fig. S2A "Drosophila melanogaster" instead of "Drosophila Melanogaster"

      Significance

      This paper revisits the previously proposed idea that rhythmic expression of central TTFL components is not essential for circadian timekeeping to persist. However, this paper does not add a significant advance in the understanding of the underlying reasons behind sustained clock protein rhythmicity like PER in the absence of CRY, since such mechanisms in functional analogs have been shown in other systems, like Neurospora (Larrondo et al., 2015). However, this paper does clarify some issues in the field, such as discrepancies between behavioral and cellular rhythms observed in CKO mice, leading future researchers to examine closely the conditions of their CKO rhythmic assays before making conclusions pertaining to rhythmicity. The identification of the kinases as components of the proposed cytosolic oscillator (cytoscillator) needs further validation, but this is perhaps beyond the scope of the paper. The data provides incremental evidence for the existence of a cytoscillator, but opens up opportunities to identify other players, like phosphatases, to establish the connection between the central TTFL and the proposed cytoscillator.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In the canonical model of the mammalian circadian system, transcription factors, BMAL1/CLOCK, drive transcription of Cry and Per genes and CRY and PER proteins repress the BMAL1/CLOCK activity to close the feedback loop in a circadian cycle. The dominant opinion was that CRY1 and CRY2 are essential repressors of the mammalian circadian system. However, this was challenged by persistent bioluminescence rhythms observed in SCN slices derived from Cry-null mice (Maywood et al., 2011 PNAS) and then by persistent behavior rhythms shown by the Cry1 and Cry2 double knockout mice if they are synchronized under constant light prior to free running in the dark (Ono et al., 2013 PLOS One). In the manuscript, the authors first confirmed behavioral and molecular rhythms in the Cry1/Cry2- deficient mice and then provided evidence to suggest the rhythms of Per2:LUC and Nr1d1:LUC in CKOs are generated from the cytoplasmic oscillator instead of the well-studied transcription and translation feedback loop: Constant Per2 transcription driven by BMAL1/CLOCK plus rhythmic degradation of the PER protein result in a rhythmic PER2 level in the absence of both Cry1 and Cry2, which suggests a connection between the classic transcription- and translation-based negative feedback loops and non-canonical oscillators.

      Major points:

      Lines 38-39, "Challenging this interpretation, however, we find evidence for persistent circadian rhythms in mouse behavior and cellular PER2 levels when CRY is absent." The rhythmic behavioral phenotype of cry1 and cry2 double knockout mice was first documented by Ono et al., 2013 PLOS ONE, in which eight cry1 and cry2 double knockout mice after synchronization in the light displayed circadian periods with different lengths and qualities. The paper reported two period lengths from the Cry mutant mice: "An eye-fitted regression line revealed that the mean shorter period was 22.86+/-0.4 h (n = 8) and the mean longer period was 24.66+/-0.2 h (n = 9). The difference of two periods was statistically significant (p < 0.01).", either of which is quite different from the ~16.5 hr period in Figure 1B of the manuscript. A brief discussion of the period difference between studies would help readers. Period information from individual mice should be calculated and shown since big period variations exist among CKO mice (Ono et al., 2013 PLOS One).

      The behavioral phenotype of Cry-null mice and luminescence from their SCNs are robustly rhythmic while fibroblasts derived from these mice only produce rhythms with very low amplitudes compared with those in WT, which may reflect the difference between the SCN's rhythm and peripheral clocks. The behavioral phenotype is supposed to be controlled mainly by SCN. However, most molecular analyses in the work were done with MEF and lung fibroblasts. These tissues may not be the best representative of the behavioral phenotype of the CKO mice.

      Stronger evidence is needed to fully exclude the possibility that in CKO cells, the rhythm is not generated by PERs' compensation for the loss of Crys to repress BMAL1 and CLOCK. Since the rhythms of Per:LUC or Nr1d1:LUC (Figures 3D and S3E) are much weaker than those in WT, molecular analyses might not be sensitive enough to reflect the changes across a circadian cycle in the CKOs if the TTFL still occurs. CLOCKΔ19 mutant mice have a ~4 hr longer period than WT (Antoch et al., 1997 Cell; King et al., 1997 Cell). CLOCKΔ19; CKO cells or mice should be very helpful to address the question. Periods of Per:LUC and Nr1d1:LUC from the CLOCKΔ19; CKO should be similar to those in the CKO alone if the transcription feedback does not contribute to their oscillations.

      Lines 51-52, "PER/CRY-mediated negative feedback is dispensable for mammalian circadian timekeeping" and lines 310-311, "We found that transcriptional feedback in the canonical TTFL clock model is dispensable for cell-autonomous circadian timekeeping in animal and cellular models." The authors have not excluded the possibility that the rhythmic behaviors of the CKO mice are derived from the PERs' compensation for the role of Crys in the feedback loop of the circadian clock in the SCN. In the fibroblasts, only two genes, Per2 and Nr1d1, have been studied in the work, which cannot be simply expanded to the thousands of circadian controlled genes. Also amplitudes of PER2:LUC and NR1D1:LUC in the CKOs are much lower than those in WT and no evidence has been provided to show that their weak rhythms are biologically relevant.

      Minor points:

      Lines 66-67, "...(Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)." to "... (reviewed in Dunlap, 1999; Reppert and Weaver, 2002; Takahashi, 2016)."

      Line 70, "...((Liu et al., 2008..." to "...(Liu et al., 2008..."

      Lines 174-175, "Considering recent reports that transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ...". Larrondo et al., 2015 paper says "however, in such ∆fwd-1 cells, the amount of FRQ still oscillated, the result of cyclic transcription of frq and reinitiation of FRQ synthesis." The point of the paper is "we unveiled an unexpected uncoupling between negative element half-life and circadian period determination." instead of "...transcriptional feedback repression is not absolutely required for circadian rhythms in the activity of FRQ,"

      Lines 249-252, "CKO cells exhibit no rhythm in Per2 mRNA (Figure 3C, D), nor do they show a rhythm in global translational rate (Figure S4A, B), nor did we observe any interaction between BMAL1 and S6K/eIF4 as occurs in WT cells (Lipton et al, 2015) (Figure S4C)." In figures 3D and S3E, in CKO and CPKO cells the Per2:LUC data without fitting look better than that of Nr1d1:LUC. But the Nr1d1:LUC rhythm became clear after fitting the raw data. So to better visualize the low amplitude rhythm, if any, of Per2:LUC and compare with Nr1d1:LUC, the fitted Per2:LUC data in CKOs and CPKOs should be shown in Figure 3D and S3E, as was done for Nr1d1:LUC.

      Lines 258-259, "much less than the half-life of luciferase expressed in fibroblasts under a constitutive promoter" In figure S4D, the y-axis of the PER2::LUC is ~800 while the y-axis of the SV40::LUC is ~600000. The over-expressed LUC by the SV40 promoter might saturate the degradation system in the cell so the comparison is not fair. A weaker promoter with the level similar to Per2 should be used to make the comparison.
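
      The half-life comparison discussed in this point can be illustrated with a minimal sketch, assuming ideal first-order (exponential) decay after translation is inhibited. The traces below are synthetic: the starting signals (~800 and ~600000) echo the y-axis scales quoted above, while the half-lives of 1 h and 5 h are hypothetical values chosen only for illustration and are not taken from the manuscript. Under first-order decay the fitted half-life does not depend on the starting signal; a saturated degradation system, as the comment suggests might occur, would violate that assumption.

```python
import math

def half_life_from_decay(times_h, signal):
    """Estimate a half-life (in hours) by least-squares fitting
    ln(signal) = ln(s0) - k * t, then returning ln(2) / k."""
    logs = [math.log(s) for s in signal]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay rate constant (per hour)
    return math.log(2) / k

def synthetic_trace(times_h, s0, half_life_h):
    """Noise-free first-order decay; illustrative data only."""
    return [s0 * 0.5 ** (t / half_life_h) for t in times_h]

times = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
low_signal = synthetic_trace(times, 800.0, 1.0)      # hypothetical PER2::LUC-like trace
high_signal = synthetic_trace(times, 600000.0, 5.0)  # hypothetical SV40::LUC-like trace

est_low = half_life_from_decay(times, low_signal)
est_high = half_life_from_decay(times, high_signal)
print(f"low-signal half-life:  {est_low:.2f} h")
print(f"high-signal half-life: {est_high:.2f} h")
```

      Because the fit works on the logarithm of the signal, rescaling a trace only shifts the intercept and leaves the slope unchanged, so any dependence of the measured half-life on expression level would point to non-first-order behaviour such as saturation.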

      Line 430, "sigma" to "Sigma".

      In figure S2, the classification of rhythms in Drosophila is not clear since even the "Robustly rhythmic" ones have high background noise. Detrending or fitting the data might be able to improve the quality of the rhythms prior to classification.

      In figure S3B, the original blots for Per2 including Input and IP should be shown.

      Supplemental information

      Line 44, "...(reviewed in (Lakin-Thomas,..." to "...(reviewed in Lakin-Thomas,..."

      Line 188, "Period CDS", the full name of CDS should be provided the first time it appears.

      Significance

      The work suggests a link between the TTFL and non-canonical oscillators, which should be interesting to the circadian field.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This interesting study by Putker et al. showed that circadian rhythmicity persists in several typical circadian assay systems lacking Cry, including Cry knockout mouse behavior and gene expression in Cry knockout fibroblasts. They further demonstrated weak but significant circadian rhythmicity in Cry- and Per- knockout cells. Cry- (and potentially Per-)-independent oscillations are temperature compensated, and CKId/e still has a role in the period regulation of Cry-independent oscillations.

      Major comments:

      1) The authors propose that the essential role of mammalian Cryptochrome is to bring the robust oscillation. As the authors analyze in many parts, the robustness of oscillation can be validated by the (relative) amplitude and phase/period variation, both of which should be affected significantly by the method for cell synchronization. Unfortunately, the method for synchronization is not adequately written in this version of supplementary information. This reviewer has no objection to the "iterative refinement of the synchronization protocol" but at least the correspondence between which methods were used in which experiments needs to be clearly explained. The detailed method may be found in the thesis of Dr. Wong, but the methods used in this manuscript need to be detailed within this manuscript.

      2) The authors revealed that CKO mice have apparent behavioral rhythmicity under the condition of LL>DD. This is an intriguing finding. However, it should be carefully evaluated whether this rhythmicity (16 hr cycle) is the direct consequence of circadian rhythmicity observed in CKO and CPKO cells (24 hr cycle) because the period length is much different. Is it possible to induce the 16 hr periodicity in CKO mice behavior by a 16 hr-L:16 hr-D cycle? Would another plausible possibility be that the 16 hr rhythmicity is the mouse version of internal desynchronization or another type of methamphetamine-induced oscillation/food-entrainable oscillation?

      3) The authors proposed that CKId/e at least in part is a component of the cytoscillator (Fig. 5D), and turnover control of PER (likely to be controlled by CKId/e) may be an interaction point between the cytoscillator and canonical circadian TTFL (Fig. 4). Strictly speaking, this model is not directly supported by the experimental setting of the current manuscript. The contribution of CKId/e is evaluated in the presence of PER by monitoring the canonical TTFL output (i.e. PER2::LUC); thus it is not clear whether the kinase determines the period of the cytoscillator. It would be valuable to ask whether PF and CHIR have the period-lengthening effect on Nr1d1:LUC in the CPKO cell.

      Minor comments:

      4) The authors argue that the CKO cells' rhythmicity is entrained by the temperature cycle (Fig. 2C). Because the data of the CKO cell show only one peak after the release into the constant temperature phase, it is difficult to conclude whether the cell is entrained or just responds to the final temperature shift.

      5) It would be useful for readers to provide information on the known phenotype of TIMELESS knockout flies; TIM is widely accepted as an essential component of the circadian clock in flies; are there any studies showing the presence of circadian rhythmicity in Tim-knockout flies (even if it is an oscillation seen in limited conditions, such as the neonatal SCN rhythm in mammalian Cry knockout)?

      5) Figure 3C shows that the amount of PER2::LUC mRNA changes ~2 fold between time = 0 hr and 24 hr in the CKO cell. This amplitude is similar to that observed in WT cell although the peak phase is different. Does the PER2::LUC mRNA level show the oscillation in CKO cells?

      6) Figure 3D: the authors discuss the amplitude and variation (whether the signal is noisier or not) of reporter luciferase expression between different cell lines. However, a huge difference in the luciferase signal can be observed even in the detrended bioluminescence plot. This reviewer is concerned that some of the phenotypes of CKO and CPKO MEF reflect the lower transfection efficiency of the reporter gene, not the nature of circadian oscillators of these cell lines.

      Significance

      Although Cryptochrome (Cry) has been considered a central component of the mammalian circadian clock, several studies have shown that circadian rhythms are maintained in the absence of Cry, including in the neonate SCN and red blood cells. Thus, although the need for Cry as a circadian oscillator has been debated, its essential role as a circadian oscillator remains established, at least in the cell-autonomous clock driven by the TTFL. This study provides additional evidence that the circadian rhythmicity can persist in the absence of Cry.

      In a more general context, the presence of a non-TTFL circadian oscillator has been one of the major topics in the field of circadian clocks except for the cyanobacteria. In mammals, the authors' and other groups led the discovery of circadian oscillation in the absence of canonical TTFL by showing the redox cycle in red blood cells (O'Neill, Nature 2011). The presence of circadian oscillation in the absence of Bmal1 was also reported recently (Ray, Science 2020). Bmal1(-CLOCK), CRY, and PER compose the core mechanism of canonical circadian TTFL; thus, this manuscript adds another layer of evidence for the non-TTFL circadian oscillation in mammals.

      Overall, the manuscript reports several surprising results that will receive considerable attention from the circadian community.

      This reviewer has expertise in the field of mammalian circadian clocks, including genomics, biochemistry, and mouse behavior analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the Reviewers for the positive assessment of our work and their insightful remarks. Please find below a point-by-point response to each comment.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTrap and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and the methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which comparisons with existing datasets have been performed, and how. Statistical analysis description is sometimes missing (e.g. fig 6e, not clear what the stars on top of the bars stand for, which test was performed and the significance). Moreover, the section of the methods regarding the western blots presented in figure 6 appears to be missing.

      Fig 6e shows the output (log2 fold change) of DESeq2. Genes with a Benjamini-Hochberg adjusted p value

      **Major concern:**

      The most important improvement the authors should consider for their paper is to more specifically attempt to isolate specific effects on translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data - but arguably, this misses mRNAs that are present in the transcriptome and not efficiently recruited onto ribosomes. It appears to be somewhat a lost opportunity to not attempt to test in the dataset (possibly by comparison to RNA-Seq from FACS isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features?). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types but given the lack of a "total transcriptome" reference from the respective cells, it can not be easily interpreted whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to this previously published piece of work.

      Translational changes can be assessed in a cell-type specific manner without artefacts related to dissociation/isolation procedures and are arguably more relevant than transcriptional changes (Haimon et al., Nat. Immunol. 2018). Both, the assessment of translation as well as the investigation of specific cell types differentiates this study from transcriptional profiling studies including Sorce et al. Accordingly, our approach identified > 1000 cell-type specific translational changes that were missed in the Sorce study (Fig. 5a-d).

      We agree however with the reviewer that a comparison of our data with RiboTrap data does not take non-transcribed RNAs into account. We have refrained from such a comparison for several reasons:

      We agree with the reviewer that a systematic comparison of transcriptomes and translatomes in the assessed cell types at every time point would have allowed us to identify genes regulated on a post-transcriptional level. The goal of this study was however to identify biologically relevant prion-induced molecular changes in a cell-type specific manner rather than identify post-transcriptional regulation. To assess the validity of our approach we chose closely related datasets (RiboTrap datasets) to compare our data to. The inclusion of RNAseq datasets from FACS-isolated cells would require an additional 2 years of work since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting for up to 8 months for mice to reach the terminal time point, establishing procedures, generating and analyzing datasets). RNA-Seq from FACS-isolated neurons is problematic due to neuronal processes often being lost during the dissociation/isolation procedures. Additionally, dissociation/isolation procedures typically introduce stress-related artefacts. These procedure-induced changes complicate comparisons with techniques that have been optimized to avoid such artefacts (including the method applied in this manuscript). Differences between transcriptional and translational datasets could thus be either due to post-transcriptional regulation or due to artefact differences and are likely difficult to interpret.

      **Additional suggestions:**

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      Thank you for pointing this out. We have fixed the arrows.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      A co-localization of GFP-positive cells and PV was assessed only in Cre-positive (GFP expressing) mice but not in Cre-negative mice that don’t express GFP. We have clarified this point in the corresponding figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me, how many replicates were analyzed given that in some analyses samples from different time points were combined in a single plot.

      All analyzed samples are listed in Supplementary File 1. We have emphasized this point in the results section.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      We sacrifice mice at the last humane time point possible at which they show terminal disease symptoms, including piloerection, hind limb clasping, kyphosis and ataxia. Intraperitoneally inoculated mice reach that time point at 31 - 32 weeks post inoculation (+/- a few days). Control mice (inoculated with non-infectious brain homogenate) were sacrificed at the same time. We have clarified this point in the methods section.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the translatome is assessed based on a Rpl10-GFP dependent on recombination mediated by cre driven from GFAP promoter it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10-GFP recombination/ expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      We agree with the reviewer that this information would be important to add. We have therefore assessed GFP levels in Rpl10a:GFP mice bred with GFAPCre and Cx3cr1CreER mice. The corresponding western blots are included in Supplementary Figure 11. GFP levels remained constant in terminally ill GFAPCre mice. This is not surprising since even a low GFAP promoter activity is likely to allow sufficient Cre recombinase expression to remove a STOP cassette allowing GFP expression (controlled by the Rosa26 promoter) in GFAPCre mice. In contrast, we observed an increase in GFP expression in terminally ill Cx3cr1CreER mice, which is most likely linked to the increase in microglia numbers. As pointed out in the manuscript, the translational changes we identified cannot reflect differences in cell numbers due to the nature of our assay. This suggests that a difference in GFP expression does not impact our analyses.

      We have added this data to the manuscript.

      6) The western blot analysis of fig 6f-g has been performed using a normalization over calnexin, yet no calnexin signals are shown to support this statement.

      We have included blots of the normalization control calnexin as Supplementary Figure 11a.

      7) Clarify the percentage of non-parenchymal macrophages labeled in the Cx3cr1-creER mouse line, since the authors consider this only to be a minor contamination.

      The labeling of non-parenchymal macrophages using Cx3cr1CreER mice has previously been estimated to be ~1% (Haimon et al., Nat. Immunol. 2018). We have added this information to the manuscript.

      8) Regarding the presentation of the data, Fig 5a would be clearer if in the y axes, for each cell type the order of PrD and Ctrl samples was maintained.

      Fig 5a displays hierarchical clustering based on Euclidean distances. As samples are ordered according to their distance from each other, we cannot change the order as suggested by the reviewer.
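
      Why the sample order follows from the data rather than from a plotting choice can be shown with a minimal sketch of agglomerative clustering on Euclidean distances. Average linkage is assumed here for illustration; the linkage method behind Fig 5a is not stated above. The sample names and two-dimensional profiles are hypothetical toy values, not data from the study.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaf_order(profiles):
    """Average-linkage agglomerative clustering.
    Returns sample names in the order produced by successive merges."""
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        # Mean pairwise distance between members of the two clusters
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find and merge the two closest clusters
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

# Hypothetical toy profiles, deliberately listed in alternating order
profiles = {
    "Ctrl_1": (0.0, 0.1),
    "PrD_1": (5.0, 5.2),
    "Ctrl_2": (0.2, 0.0),
    "PrD_2": (5.1, 4.9),
}
order = leaf_order(profiles)
print(order)
```

      In this toy case the two control-like profiles end up adjacent, as do the two disease-like profiles, even though the input alternates between them. The ordering is an output of the distance matrix, so imposing a fixed PrD/Ctrl order would no longer reflect the clustering.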

      Reviewer #1 (Significance (Required)):

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNAseq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al. study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a real transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Using a series of Cre-driven mouse strains a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP has been combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved, adding wpi and better aligning the cell-types in relation to the time when the cell-types were analysed.

      We have replaced weeks with wpi and changed the alignment of cell types to clarify that all cell types were analyzed at every time point.

      Fig 1b-e: The resolution could be improved to better discern the different cell-types.

      We submitted low-quality figures due to an upload limit but will submit final figures of higher quality. Additionally, we have added higher magnification pictures to better discern the different cell types as Supplementary Fig. 1d-e.

      Fig 4: Astrocytes are categorised into A1 and A2 and microglia based on DAM and homeostatic signature (How does this relate to the M1 and M2 classification?).

      The categorization of microglia into homeostatic and disease-associated (as well as other) microglia has largely replaced the initial categorization into pro-inflammatory M1 and anti-inflammatory M2 microglia (Dubbelaar et al., Front Immunol. 2018). We have therefore opted for the more current categorization. This explanation has also been added to the manuscript.

      Reviewer #2 (Significance (Required)):

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used; CamkIIa (excitatory N), parvalbumin (inhibitory N), GFAP (A), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      **Major comments:**

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should include discussion that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule it out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      We agree with the reviewer that a few critical neuronal genes might be sufficient to induce neurological dysfunction and symptoms and have added this point to the results and discussion. Additionally, we have highlighted that many neuronal genes are glia-enriched and might reflect glia contamination.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      We have added additional information to clarify our definition of terminal stage to the methods.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes," fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example those that accumulate at synapses for fast translation on demand) or the overall proteome, which are not included in their analysis. Though their method cannot detect these components, the authors should examine the implications that such other changes may still be present in the discussion. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      We have removed the claim that “behavioral phenotypes may be ascribed to biochemically undetectable changes” and added the point that a few neuronal changes might be sufficient to induce neuronal dysfunction and symptoms. As stated in the manuscript, we believe that the enrichment of the GO term synaptic transmission in microglia is an artefact. We therefore refrained from further discussing this finding and have highlighted that it is an artefact in the results.

      • *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.* - *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      As discussed above, the inclusion of RNAseq datasets from FACS isolated cells would require an additional 2 years of work since all samples and datasets would need to be newly generated (breeding mice, inoculating mice with prions and waiting for up to 8 months for mice to reach the terminal time point, establishing procedures, generating and analyzing datasets).

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure.

      We have added additional information about the GO analysis to the methods section. The complete list of GO terms is now included as Supplementary File 10.

      Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      We have tried to include 3 replicates per group as suggested by the reviewer. In a few exceptions, we lost an individual sample and one sample had to be excluded due to low quality. In these instances (GFAP_2wpi Ctrl; CamKIIa_CX_term_Ctrl, CamKIIa_CX_term_PrD, Cx3cr1_term_Ctrl and Cx3cr1_term_PrD) we ensured that both replicates showed a high correlation and could still yield reliable results (see below). Consistently, the DESeq2 algorithm (which can also handle just 2 replicates per group) identified differentially translated genes in the terminal samples.

      **Minor comments:**

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells.

      We have added scale bars to all panels. A colocalization is indeed not visible in the uploaded low-quality Figures that were submitted due to the size limit. We believe that a colocalization is visible in the high-quality final pictures but are also happy to provide closer insets upon editorial request.

      Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list.

      We have added details regarding the GO analysis to the methods section, and are now providing the requested information in Supplementary File 10.

      Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to.

      The principal components 1 and 2 are plotted on the x and y axes, respectively. The % of variance explained by these principal components is indicated. We have added this information to the figure legend.
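      To make the axis labels concrete, the following is a minimal sketch of how a "% of variance explained" value arises. It is a toy case with only two variables, not the analysis used for Supplementary Fig. 6; the function name `explained_variance_2d` and the input values are hypothetical. The fraction attributed to each principal component is its eigenvalue of the sample covariance matrix divided by the sum of all eigenvalues.

```python
import math


def explained_variance_2d(xs, ys):
    """Return the fractions of total variance captured by PC1 and PC2.

    Toy PCA for two variables: the eigenvalues of the 2x2 sample
    covariance matrix are computed in closed form.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample (co)variances with the usual n - 1 denominator.
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    disc = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    lam1 = (trace + disc) / 2  # variance along PC1
    lam2 = (trace - disc) / 2  # variance along PC2
    return lam1 / trace, lam2 / trace


# Hypothetical example: two perfectly correlated variables,
# so essentially all variance falls on PC1.
pc1, pc2 = explained_variance_2d([1, 2, 3, 4], [2, 4, 6, 8])
```

      Multiplying each returned fraction by 100 gives the kind of percentage shown next to an axis label.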

      Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way.

      Typically, all significant changes (p adj

      Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray).

      Genes changing in all datasets are colored in green in Fig. 5. Genes changing in all datasets are colored in grey in Supplementary Fig. 9. We have adjusted the corresponding legends. The quality of the figures is very low due to the upload limit. The final figures will be of higher quality.

      Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Both points have been addressed.

      Reviewer #3 (Significance (Required)):

      This work provides an important conceptual advance in prion disease research that glia may be primary drivers of disease equal to or surpassing certain neuronal populations. Though the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons though neurological defects are already present. Historically, neuronal defects have been assumed to occupy the main role, with glia being largely ignored. This echoes recent similar changes in other areas of the neurodegenerative disease field where we are recognizing the important roles of glia in pathogenesis, and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      *Audience:*

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      *Own expertise:*

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

      As discussed above, we have refrained from such a comparison since 1) the scope of this study was to identify biologically relevant prion-induced molecular changes and not to study post-transcriptional regulation, 2) the generation of such a dataset will take ~2 years, and 3) differences between transcriptional and translational changes are likely a combination of post-transcriptional regulation and artefact-induced changes that are probably difficult to interpret.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors sampled actively translated proteins by cell type in the brains of RiboTag expressing mice under the control of cell specific cre recombination to determine changes in the translational profiles. They injected prions IP to induce prion disease. Their model shows little to no neuron loss at the terminal stage due to animal welfare regulations, but neuronal loss is a key hallmark of prion disease, along with gliosis. However, since other groups under different animal welfare regulations have shown that prion injection is sufficient to fully model the disease given enough time, there is sufficient evidence that this model captures early disease pathogenesis. The methodology used here has some clear advantages over previous cell-type isolation methods that require more lengthy sorting procedures. However, proteins with a long half-life or tightly regulated levels (such as TDP-43) are likely underrepresented by this method. The method also depends strongly on the specificity of the cre driver used; CamkIIa (excitatory N), parvalbumin (inhibitory N), GFAP (A), Cx3cr1 (microglia). While there is some off-target expression of the GFAP and Cx3cr1, the overall expression profiles generally match cell-specific transcriptomes obtained by other groups using other methods. They find major changes in astrocytes and microglia at terminal stages, after the onset of neurological symptoms, and comparatively fewer in neurons. Oligodendrocytes are not examined. The authors are commended on a thorough and well-designed study, especially in the comparison of multiple neuronal and glial types simultaneously.

      Major comments:

      Key conclusion 1: "Our results suggest that aberrant translation within glia may suffice to cause severe neurological symptoms and may even be the primary driver of prion disease." This conclusion is well-supported, serving as a hypothesis for future work. The data shows that the most abundant PTG changes are indeed in microglia at 24 wpi, before the onset of symptoms. In addition, although some genes are also differentially translated in the neuronal populations, examination of the Supplemental Tables shows that these are mostly highly expressed glial genes and could represent contamination of the sample during gliosis. The authors may wish to discuss this more prominently to avoid confusion. This data indeed suggests that glial changes alone could be sufficient to produce the neurological symptoms in these mice. However, the authors should include discussion that the two genes changed at 24 weeks in PV neurons (Oprm1, Cyp2s1) do appear to be neuronal and may be relevant to pathogenesis as well. These mRNAs were also decreased in their previous paper conducting bulk sequencing in the hippocampus, according to the authors' online Prion RNAseq Database. Knockout experiments in mouse models have shown that dysregulation of one or a few critical genes in neurons can be sufficient to induce dysfunction and neurological symptoms, and the current evidence does not seem sufficient to rule it out. Fig 3d also suggests that PTGs in PV neurons may be particularly important, even accounting for the additional regions present in the RP analysis.

      Key Conclusion 2: "Cell-type specific changes become only evident at late PrD stages." This conclusion is well supported. However, as the authors noted, due to legal constraints their model represents early to mid disease onset rather than a true terminal environment matching that of patients. Therefore, it would be advantageous to choose a more appropriate name for the "terminal" group, perhaps based on one of the key humane endpoint criteria that would help readers in the field to place these important results in context of the overall disease process.

      Key Conclusion 3: "This suggests that the prion-induced molecular phenotypes reflect major glia alterations, whereas the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes such as altered neuronal connectivity." The authors should modify the second half of this claim. As discussed above, changes to even a few neuronal genes can be sufficient to induce neurodegeneration. The claim that "the neuronal changes responsible for the behavioral phenotypes may be ascribed to biochemically undetectable changes," fails to acknowledge the changes in PV neurons observed in this study, however few they may be. The authors also do not take into account the possible role of transcribed RNAs that are not immediately translated (for example those that accumulate at synapses for fast translation on demand) or the overall proteome, which are not included in their analysis. Though their method cannot detect these components, the authors should examine the implications that such other changes may still be present in the discussion. The authors should also discuss the functions of the few specific PV PTGs and explore their potential relationship with neurodegeneration. This is especially important since the authors acknowledge that a key reason for including PV neurons in the analysis is ample evidence in the literature that they play a role in disease pathogenesis. Finally, the authors note that a top GO term in microglial cells was synaptic transmission. The authors should expand on this finding in the discussion, as the interplay of glia and neurons in the pathogenesis of disease is likely highly relevant.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. - Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Key Conclusion 1: No additional experiments needed. Key Conclusion 2: No additional experiments needed. Key Conclusion 3: No additional experiments needed for a modified statement.

      The data and methods are largely reproducible. Additional information should be provided about the methods for Gene Ontology analysis, how it was controlled, and what was used as a significance measure. Some groups contain only two animals. At least three should be included per group for a minimally robust analysis.

      Minor comments:

      Fig. 1 c-e all panels should have a scale bar. E, closer insets or larger images are needed to see the colocalization in these very small cells. Fig. 5f: To allow interpretation of the Gene Ontology analysis, authors should include the number of genes involved in the pathway and the number of those genes found in their sample input list. Fig. S6: It is not clear from viewing the figure or the legend what the percentages on the axes refer to. Fig. S7: the gene numbers are confusing because they do not match the data in Fig. 4a. It would be helpful to use the same LFC cutoff as in Fig. 4a to avoid misunderstandings by the reader, or explain why no cutoff is used and what information the authors wish to convey by presenting the data that way. Fig S9: The legend indicates that genes changed in all 5 datasets are colored in green, however this is not easily visible on the graphs (appears more gray). Fig. S10: on page 12 Supplementary Fig. 10c is referenced, but likely refers to 10b. Throughout manuscript: It should be RNase, not RNAse.

      Significance

      This work provides an important conceptual advance in prion disease research that glia may be primary drivers of disease equal to or surpassing certain neuronal populations. Though the authors have shown previously that glial changes are dominant in bulk sequencing of the hippocampus, cell type-specific analysis adds an important level of detail to convince the field that few transcriptional changes occur in neurons though neurological defects are already present. Historically, neuronal defects have been assumed to occupy the main role, with glia being largely ignored. This echoes recent similar changes in other areas of the neurodegenerative disease field where we are recognizing the important roles of glia in pathogenesis, and how they may be modulated to treat disease.

      Their findings in PV neurons also may reflect early key changes in this important neuronal population that contribute to neurological symptom onset. They will allow further study of the genes and pathways involved and may lead to additional effective treatments for disease. Finally, the thorough comparison of multiple neuronal and glial populations will allow future investigation of the interplay of neurons and microglia in pathogenesis and shows the importance of studying them synergistically rather than individually.

      Audience:

      The neurodegenerative disease field in general will be interested in the findings. Immunologists, other neuroscientists, and pharmaceutical and other drug development organizations will also be influenced by the work.

      Own expertise:

      Neurodegenerative disease, transgenic mouse models, neuropathology, translational neuroscience

      REFEREE'S CROSS-COMMENTING:

      I agree with Reviewer 1 that a comparison of the total transcriptome with ribosomally active transcripts would aid the interpretation of this work. It would also uncover or refute the presence of cell-type differences in translation efficiency that directly impact the authors' major conclusion that glia are more affected than neurons. I support the request of this additional experiment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Using a series of Cre-driven mouse strains a GFP-tagged version of RPL10a (a ribosomal protein) was targeted to different cell types allowing Dr Scheckel and colleagues to investigate translational changes as prion disease progresses in mice. Their data suggest massive changes in microglia and astrocytes but not neurons. The approach was particularly powerful as ribosome IP has been combined with ribosome profiling. The manuscript is very well written. What might help, however, is to make the figures more accessible (perhaps change some of the labelling?)

      I have only minor comments regarding some of the figures:

      Fig 1a: This scheme could be improved, adding wpi and better aligning the cell-types in relation to the time when the cell-types were analysed. Fig 1b-e: The resolution could be improved to better discern the different cell-types. Fig 4: Astrocytes are categorised into A1 and A2 and microglia based on DAM and homeostatic signature (How does this relate to the M1 and M2 classification?).

      Significance

      Highly significant. I have published on de novo protein synthesis in neurodegenerative disease

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Scheckel et al. report a large dataset on cell type-specific translational profiling of PrD-associated molecular alterations in a mouse model through RiboTRAP and ribosome profiling approaches. They report a more severe alteration in the translatome specifically in astrocytes and microglia as compared to neuronal populations. This highlights that changes in these two cell classes might have a predominant role in the pathology of PrD.

      Data and the methods are presented such that they can be reproduced. The data analysis section of the manuscript could be further elaborated. In particular, it could be clarified which / how comparisons with existing datasets have been performed. Statistical analysis description is sometimes missing (e.g. fig 6e, not clear what the stars on top of the bars stand for, which test was performed and the significance). Moreover, the section of the methods regarding the western blots presented in figure 6 appears to be missing.

      Major concern:

      The most important improvement the authors should consider for their paper is to more specifically attempt to isolate specific effects on translational efficiency of mRNAs. As it stands, the authors largely use RiboTrap data as a reference to compare their footprinting data - but arguably, this misses mRNAs that are present in the transcriptome and not efficiently recruited onto ribosomes. It appears to be somewhat a lost opportunity to not attempt to test in the dataset (possibly by comparison to RNA-Seq from FACS isolated cells as a reference) whether there is a systematic change in translational efficiency (possibly in mRNAs with specific features?). In the current form, the RiboTrap and footprinting approaches largely serve to isolate mRNAs from cre-defined cell types but given the lack of a "total transcriptome" reference from the respective cells, it can not be easily interpreted whether certain transcripts are heavily regulated at the level of translation. Thus, despite using much more advanced methodologies than the Sorce study, the fundamental conclusions emerging from this work are rather similar to this previously published piece of work.
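      As a rough illustration of the comparison suggested here, translational efficiency is commonly estimated as the ratio of ribosome footprint abundance to total mRNA abundance for the same gene, usually on a log2 scale. The sketch below is a minimal example under stated assumptions: the function names, the pseudocount, and the input values are hypothetical, the inputs are assumed to be already normalized for library size, and nothing here is taken from the manuscript's pipeline.

```python
import math


def log2_translational_efficiency(footprint, mrna, pseudocount=1.0):
    """log2 ratio of footprint abundance to mRNA abundance for one gene.

    Both inputs are assumed to be normalized abundances; the pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((footprint + pseudocount) / (mrna + pseudocount))


def delta_te(fp_ctrl, rna_ctrl, fp_prd, rna_prd, pseudocount=1.0):
    """Change in log2 translational efficiency, diseased minus control.

    A value near 0 means the footprint change tracks the mRNA change;
    a clearly non-zero value points to regulation at the level of translation.
    """
    te_ctrl = log2_translational_efficiency(fp_ctrl, rna_ctrl, pseudocount)
    te_prd = log2_translational_efficiency(fp_prd, rna_prd, pseudocount)
    return te_prd - te_ctrl


# Hypothetical gene: mRNA unchanged (3 -> 3) while footprints rise (3 -> 7).
change = delta_te(fp_ctrl=3, rna_ctrl=3, fp_prd=7, rna_prd=3)
```

      Without a total-transcriptome measurement for the same cells, the denominator is unavailable, which is why footprint or RiboTrap data alone cannot separate transcriptional from translational regulation.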

      Additional suggestions:

      1) In Figure 1d the authors point out occasional neuronal cells exhibiting Rpl10a-GFP expression with arrows. It appears that these arrows may have moved during figure preparation - please check/fix if necessary.

      2) In Supplementary Figure 1b and c it appears that the PV labeling is missing in the panel for Rpl10a:GFP controls. If this is intentional please indicate this in the figure legend.

      3) It appears that the authors sequenced a significant number of libraries generated for multiple time points post-inoculation. From the figures and legends it was not entirely clear to me, how many replicates were analyzed given that in some analyses samples from different time points were combined in a single plot.

      4) It was unclear to me how long after inoculation the group of "terminally ill" mice were sacrificed. Somewhere in the text it states that there are 2 months between 24 wpi and terminally ill - but it appears that this was not a preset timepoint but varied from animal to animal based on symptoms. Please clarify.

      5) From the Western blot data in Figure 6f the authors conclude that GFAP expression is upregulated in PrD mice whereas astrocyte number is unchanged. Given that the translatome is assessed based on a Rpl10-GFP dependent on recombination mediated by cre driven from GFAP promoter it is possible that the astrocytic alterations in ribosome footprints are in part a secondary consequence of increased Rpl10-GFP recombination/ expression in PrD mice (due to activation of the GFAP promoter). To estimate the impact of such an effect the authors should compare GFP levels in terminally ill control and PrD mice by western blotting.

      6) The western blot analysis of fig 6f-g has been performed using a normalization over calnexin, yet no calnexin signal is shown to support this statement.

      7) Clarify the percentage of non-parenchymal macrophages that are accounting for the Cx3cr1-creER mouse line since the authors consider this only to be a minor contamination.

      8) Regarding the presentation of the data, Fig 5a would be clearer if in the y axes, for each cell type the order of PrD and Ctrl samples was maintained.

      Significance

      Overall, this is an important and interesting study. Besides its insights into the biology, the transcriptomic data will provide a valuable resource for researchers in the field.

      Previous studies employed bulk RNAseq or microdissection for mapping transcriptomic changes (Majer et al. 2019; Sorce et al. 2020 and others). The Sorce et al. study concluded that astrocytic alterations in the transcriptome are more dominant than neuronal gene expression changes. While the conclusion of the present study remains the same, it is the first to use ribosome profiling to dissect actively translated transcripts over the progression of the pathology in the mouse model. Thus, the data presented here would allow for identifying cell type-specific alterations as well as alterations specifically in mRNA translation which would be missed by bulk RNA-Seq and RNA-Seq on FACS-isolated cells. However, the authors do not fully capitalize on this strength, given that no detailed comparisons to a real transcriptome reference are performed (see above).

      This work is of broad interest to scientists in neurodegeneration as well as glial biology.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer 1

      __Review 1 Summary:__ *In this manuscript, Borah et al showed that Heh2, a component of INM, can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and NPC causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.*

      __Reviewer 1 major comment 1:__ The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind with NPCs in nup133 Δ cells (Fig2, Fig 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) but not related to the structural integrity of NPC.

      Our Response: We agree that this is a possibility, however, we ask the reviewer to also consider that we artificially cluster NPCs using the anchor away system (Figure 3C) and this does not affect Heh2’s association with NPCs. Thus, clustering per se is insufficient to disrupt Heh2 binding to NPCs. We will also make changes in the text to make this point.

      Reviewer 1 major comment 2: In addition, their data showing that the Heh2-NPCs association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D), also disfavor the idea that Heh2 could sense NPC assembly state.

      Our Response: There are three considerations here. The first is that, as this is the first evidence of any kind of “NPC assembly state” sensor, it is difficult to make any assumptions as to what specifically such a sensor would be monitoring. For example, perhaps sensing only the ORC is what is functionally important. Second, for obvious reasons, we only tested non-essential IRC nups, so by definition there is inherent functional redundancy that maintains NPC function, and thus there may be no need to “sense” anything in the absence of these IRC nups. Third, the IRC is essential for NPC assembly. Thus, without an IRC there is no NPC assembly state to sense.

      Reviewer 1 major comment 3: Since some nup knockout strains other than nup133Δ are also known to show NPC clustering (e.g. nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth trying to monitor the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) using these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypic analysis using these mutant cells will be useful to clarify whether the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC Nup knockouts) or could be commonly observed in strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, the Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Our Response: These are excellent points and we agree that there is a need to more thoroughly explore how NPC clustering driven by abrogating the function of other nups impacts Heh2’s association with NPCs. Thus, in a revised manuscript, we would examine Heh2’s association with NPCs in several additional genetic backgrounds where NPCs cluster.

      Reviewer 1 major comment 4: Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Our Response: This is a good point and we will directly address this with Western blotting of some nups.

      Reviewer 1 major comment 5: Figure 5: The authors mentioned (line 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express mCherry tagged Nup as in Fig. 2 or Fig. 3.

      Our Response: Yes, we agree and in fact this observation is consistent with the fact that there is an ER-pool of Heh2 observed in this strain and we observe loss of nup interactions in the affinity purification. We will include a more thorough quantification of this in a revised manuscript and more directly address this in the text.

      **Minor comments:**

      Reviewer 1 minor comment 1: Figure 4A and 4B: The authors should show Scatter plot as in Fig. 2 and Fig. 3.

      We will include this in a revised manuscript.

      Reviewer 1 minor comment 2: Figure 5C: An explanation of the arrowheads is missing in the figure legend.

      Thank you for pointing this out, it will be fixed in a revised manuscript.

      Reviewer 1 minor comment 3: Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      As this truncation lacks INM targeting sequences, it is found throughout the cortical ER. The determinants of Heh2 targeting (including truncations) have been extensively evaluated in King et al. 2006, Meinema et al. 2011 and Rempel et al. 2020. We will make this clearer in the revised manuscript.

      Reviewer 1 minor comment 4: Figure 6B: Nucleoporins should be marked with color circles as in Fig. 1 and Fig. 5.

      This will be done.

      Reviewer 2

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence and consequently this manuscript is premature for publication.

      Our Response: We thank the reviewer for recognizing the potential for a significant conceptual advance for the field but object to the notion that the work is “premature for publication”. This is a highly subjective statement that does not seem to meet the mission or purpose of the Review Commons platform. While it is possible that some of the conclusions drawn in our manuscript might not be fully supported by the data in its current form, there is a substantial body of work here that is certainly publishable.

      Reviewer 2 major comment 1: The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and on face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      Our Response: This was a major oversight on our part and a revised manuscript will contain all relevant details with regards to the MS analysis, including a more detailed description of the excised bands and the quantification of spectra derived from these bands.

      Reviewer 2 major comment 2a: The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC.

      Our Response: These micrographs are not unusual and are in fact of respectable quality. We agree that the apparent “noise” is unfortunate, but this is simply a reality of the yeast system. We remind the reviewer that there are only ~100 to ~200 NPCs per budding yeast nucleus, which is an order of magnitude fewer than in a typical mammalian cell nucleus. Further, the copy number of yeast nups per NPC is half that of the mammalian cell NPC. In addition, budding yeast are spherical with a cell wall that is extremely effective at scattering light; they are also highly autofluorescent (particularly in the red channel). Lastly, unlike in mammalian cells, budding yeast NPCs are mobile on the nuclear envelope. Thus, co-localization is challenging (particularly with the long exposures required to obtain good images). This is why clustering of NPCs in nup133∆ cells has provided one of the key assays in the field to assess whether a given protein associates with NPCs at the level of light microscopy.

      Reviewer 2 major comment 2b: As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization.

      Our Response: The reviewer is correct. Co-localization between Heh2-GFP and any Nup-mCherry is insufficient to assess NPC association in WT cells. In fact, as we point out in Figure 3B, at best one can expect a correlation of r = 0.48 for two well established nups. Thus, to further support the conclusion that Heh2 associates with NPCs, we established the Nsp1-FRB NPC clustering assay (Figure 3).

      Reviewer 2 major comment 2c: The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be suitable, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      Our Response: This is included in Figure 3D.

      Reviewer 2 major comment 3a. Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged.

      Our Response: As the middle cell in the panel is undergoing cell division, these cells are clearly viable. All our imaging is performed on mid-log phase cultures.

      Reviewer 2 major comment 3b. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex?

      Our Response: The resolution of a conventional light microscope is, at best, 200 nm in x, y. As NPCs are 100 nm in diameter, even two NPCs side-by-side cannot be resolved. The IRC is tens of nm away from the cytoplasmic filaments thus any nup is relevant for a co-localization analysis with a light microscope.
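As a rough check on the resolution figure quoted above, the Abbe diffraction limit can be worked through. The emission wavelength (about 510 nm, typical for GFP) and the objective numerical aperture (1.4) are illustrative assumptions, not values reported in the manuscript or the response.

```latex
% Abbe lateral resolution limit, with assumed values
% \lambda = 510 nm (GFP emission) and NA = 1.4 (oil-immersion objective)
d = \frac{\lambda}{2\,\mathrm{NA}}
  = \frac{510\ \mathrm{nm}}{2 \times 1.4}
  \approx 182\ \mathrm{nm}
```

Under these assumptions, d is larger than the ~100 nm NPC diameter stated above, so two adjacent NPCs, and nup subcomplexes that sit tens of nm apart within one NPC, would fall inside a single diffraction-limited spot.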

      Reviewer 2 major comment 3c: Additionally, the experiments shown in panels A and C are not directly comparable, ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and present on both faces of the NPC. This experiment should be repeated with scNup84 to match panel C, additionally a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      Our response: These are in fact directly comparable within the limits of resolution of light microscopy as described above. Viability assays are not required here as both nups are essential and perturbation to their function would lead to inviability.

      Reviewer 2 major comment 4: Figure 3, the authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, Rapamycin may also have an effect independent from the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that Rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      Our response: This is a good point and important control that we performed in prior studies, see Colombi et al., JCB, 2013. We will be more explicit in describing that this control has been done.

      Reviewer 2 major comment 5: Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization expected with Nup192? As a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      Our response: This will be included in a revised manuscript.
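For readers less familiar with the metric discussed here, the following is a minimal sketch of a Pearson correlation coefficient computed over paired pixel intensities from two channels. The function name `pearson_r` and the intensity values are hypothetical and purely illustrative; this is not the authors' analysis pipeline, which is not described in this exchange.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pixel intensities along a nuclear envelope trace.
green = [10, 20, 30, 40, 50]          # e.g. a GFP channel
red_matching = [12, 22, 32, 42, 52]   # rises and falls with the green signal
red_opposite = [50, 40, 30, 20, 10]   # bright where the green signal is dim

r_match = pearson_r(green, red_matching)
r_opposite = pearson_r(green, red_opposite)
print(r_match, r_opposite)
```

Values near +1 indicate that the two signals vary together, values near -1 indicate anticorrelation, and values near 0 indicate no linear relationship.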

      Reviewer 2 major comment 6: Figure 4. Pom152-mCherry localizes at both the NE and strongly within the cytoplasm, which is unexpected given typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot is necessary.

      Our response: This is not localization in the cytoplasm but is in fact autofluorescence from the yeast vacuole. We regret we were not more explicit in describing this and we will make the manuscript more accessible for the non yeast expert. In order to perform the Western blot analysis for all strains requested by the reviewer would require a battery of antibodies to the endogenous proteins to directly assess how tagging influences nup levels, which we do not have (nor does anyone else that we are aware of). This is also not standard practice in the field as it is an onerous and unnecessary burden.

      Reviewer 2 major comment 7: Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      Our response: We will include negative controls for these specific experiments to show that this is a non specific band.

      Reviewer 2 major comment 8: Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      Our response: This will be done.

      Reviewer 2 major comment 9: Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      Our response: As this represents both Heh2-GFP and heh2-1-570-GFP, we will keep it as is to avoid confusion.

      Reviewer 2 major comment 10: Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      Our response: Ok.

      Reviewer 2 major comment 11: Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. Instead state there was no correlation.

      Our response: Ok.

      Reviewer 2 major comment 12: In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail, it is not clear from the presented data why this would occur.

      Our response: Ok.

      Reviewer 2 comment on significance: the manuscript is premature for publication.

      Our Response: Such a statement has no relevance to this form of review, as a decision as to whether a study is premature for publication should be made by journal editors, not reviewers. We would argue quite strongly that we have definitively shown that Heh2 binds to NPCs, that it does so in multiple evolutionarily distant yeasts and that this binding is functionally relevant. For example, we can specifically disrupt the association of Heh2 with NPCs with a specific domain deletion and observe a loss of function phenotype (e.g. NPC clustering). What all three reviewers agree on is that the concept of an “NPC assembly state sensor” needs additional data to be fully supported, although we note that this reviewer did not provide any suggestions for how we might achieve this goal. We further note that we added the qualifier “may” into the title of the work. We will therefore perform additional experiments, as outlined in our responses to Reviewer 1, to support this conclusion in order to introduce this as a new concept in the field.

      Reviewer Comment from Cross Commenting: It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection but I do not see this paper to be a candidate for a broad impact journal.

      Our Response: The statement that this manuscript is premature for publication is an opinion and does not seem to reflect the sentiment of the other reviewers. It is also confounding that this reviewer suggests that this work lacks rigor. With the exception of the omission of the MS analysis (our fault), the data are of high quality and rigorously quantified. Our assertion of rigor and data quality is based on our collective team’s many decades-long history of publishing and reviewing papers at the highest levels in this field. Questions as to the quality of the data as stated by this reviewer (and only this reviewer) in fact address limitations of light microscopy and the yeast system more generally in this one respect.


      Reviewer 3

      Reviewer 3 Summary part a: This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRB system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters.

      Our Response: Thank you to the reviewer for recognizing the value of this work.

      Reviewer 3 Summary part b: Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Our Response: We regret this was not made more clear but the idea that there is a pool of Heh2 at the POM and a pool at the INM is an important conclusion of the work and was stated in the results - we’ll re-emphasize in the revised discussion. As to whether MAN1 or LEMD2 has a similar NPC association, we hypothesize that MAN1 but not LEMD2 will indeed interact with NPCs in mammalian cells. This is based on considering that we show that both the budding and fission yeast orthologues of MAN1 share this association so unless it was lost in evolution, this is a likely outcome of future studies.

      Reviewer 3 Significance statement a: The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact.

      Our Response: We tested the nup133Δ background first as this is the standard approach for assessing NPC-association of a given protein so we felt this would be logical for a reader in the field. Further, while the disruption of Heh2’s binding by loss of Nup133 may be a complication, we prefer to see it as an opportunity for discovery. As described in our manuscript, we have chosen to interpret this result in the context of a new biological function/concept with Heh2 being a novel “NPC assembly state” sensor. While one could argue that we have not fully met this bar yet, we will perform additional experiments as outlined in our response to reviewer 1 to help support this compelling conclusion.

      Reviewer 3 Significance statement b: What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation since Heh2 deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      Our Response: We agree.


      Reviewer 3 Significance statement c: I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control, perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here are too preliminary and fail to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      Our Response: This statement nicely articulates the challenge with this manuscript as there are some solid findings (that Heh2 binds specifically to NPCs etc.) but also a provocative finding (that loss of Nup133 breaks Heh2’s interaction with NPCs despite not physically interacting). Thus, there is a decision to be made about whether there is value in introducing a novel concept to the field once additional data is provided in a revised manuscript.

      Reviewer 3 Cross commenting: I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

      Our Response: We appreciate that the reviewer does not have any questions about the quality of our data, but we argue that we have in fact presented the most coherent interpretation of the data as it currently stands. As described above, we intend to attempt to solidify this model by performing experiments suggested by reviewer 1.



      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Reply to the Reviewers

      I thank the Referees for their...

      Referee #1

      1. The authors should provide more information when...

      Responses

      The typical domed appearance of a hydrocephalus-harboring skull is apparent as early as P4, as shown in a new side-by-side comparison of pups at that age (Fig. 1A). Though this is not stated in the MS

      1. Figure 6: Why has only... Response: We expanded the comparison

      Minor comments:

      2. The text contains several... Response: We added...

      Referee #2

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This is quite an interesting manuscript that explores the relationship between an INM protein, Heh2, and NPCs. It represents an extension of earlier work performed by this group in which it was shown that the HEH2 gene shares genetic interactions with the genes encoding various nucleoporins. Heh2 belongs to an intriguing family of conserved proteins that includes its orthologue, Heh1, as well as human MAN1 (LEMD3) and LEMD2, among others. Each of these proteins contains two transmembrane domains with the N- and C-terminal regions extending into the nucleoplasm. The two TM domains are separated by a short lumenal loop.

      In this study, the authors show that a population of Heh2 is associated with Nups of the NPC inner ring complex. This was demonstrated initially in pulldown experiments. The authors go on to show that when NPCs are caused to aggregate, by physical tethering employing an FKBP/FRB system in combination with Rapamycin, Heh2, but not Heh1, colocalizes with the NPC clusters. Although not stated explicitly in the manuscript, this would imply that there is a population of Heh2 that resides in the NPC membrane domain, with the remainder in the INM. As an idle question, is there any evidence for a similar localization of MAN1 or LEMD2 in mammals? I am guessing probably not.

      Significance

      The complications arise when the authors show that an alternative method of NPC aggregation (although they did this first), involving Nup133 deletion, results in failure of Heh2 to co-aggregate. In other words, Nup133 is required for the association of Heh2 with NPCs. The issue here is that there is no evidence for an interaction between Heh2 and Nup133, and furthermore that loss of Nup133 (a Y complex component of the outer ring complex) leaves the inner ring complex intact. What is clear, however, is that Heh2 seems to be required to inhibit NPC aggregation since Heh2 deficient cells exhibit NPC clusters. The association between Heh2 and IRC Nups resides in the C-terminal nucleoplasmic winged helix domain. The N-terminal domain, in contrast, confers INM localization.

      I must admit, I am in two minds about this manuscript. The data clearly show that Heh2 is associated with IRC components and I agree with the authors that this protein may well have a role in NPC assembly quality control perhaps in the guise of a chaperone. However, I find it hard to come up with a convincing model for the effects of Nup133. On the one hand, one could make an argument that the data presented here is too preliminary and fails to provide a complete story. On the other hand, it does provide an intriguing foundation for future studies and I do feel positively disposed towards it. In short, I have no fundamental complaints about the science, I am just uncertain as to whether the study is ready for publication.

      REFEREES CROSS COMMENTING

      I have no fundamental disagreements with either of the other two reviewers. The comment from Reviewer#2 summarises this quite neatly. While I have fewer concerns about the quality of the data as presented, I think we all agree that at best the study is preliminary. What the authors need to do is to construct a coherent model that will account for the observations described here and then to design experiments that will test this model. I'm not suggesting that they must have a complete story, but they do need to go beyond what is in the current manuscript.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Borah et al. present a biochemical and cell biological examination of the inner nuclear membrane (INM) protein Heh2 and its putative interactions with the nuclear pore complex (NPC). The potential conceptual advance of this study is that Heh2 interacts with the NPC, while mutations believed to trigger NPC mis-assembly are shown to abolish interaction with Heh2, leading to the hypothesis that Heh2 is a sensor for NPC assembly states within the INM. The conclusions would undoubtedly be of broad interest to the nucleocytoplasmic transport field, but the evidence provided thus far is insufficient to build confidence and consequently this manuscript is premature for publication.

      Specific comments:

      (1)The TAP-tag Heh1/Heh2 pulldowns are the most significant experiment presented, and on face value provide compelling evidence that Heh2 interacts with the NPC. It is stated that mass spectrometry (MS) was used to confirm the identities of the labeled bands, yet there is no methods section, nor any MS data reported in the manuscript. Given the large number of unspecified proteins observed in these gels, and the single-step pulldown methodology used, knowledge of the contaminants present may aid in elucidating how Heh2 pulls down NPC components. Consequently, within the supplementary materials, the authors must indicate which regions of the gel were excised for MS analysis and provide a table listing all of the proteins that were detected for each sample, including the number of unique/expected peptides observed.

      (2)The representative micrographs provided across Figures 2, 3, 4, 5 and 6 are very noisy. Particularly in the case of the mCherry labeled nucleoporins, this is both unusual and unfortunate given this is used to infer colocalization of Heh2 with the NPC. As a result it is unclear whether this experiment can be used to differentiate between NPC colocalization vs. nuclear envelope colocalization. The authors should include negative controls for an alternative NE membrane protein that doesn't bind the NPC, which would be expected to exhibit a reduced level of colocalization with NPC proteins when compared to Heh2. For example, Heh1 would be suitable, given the clear-cut negative pulldown data and its prior usage as a negative control in Figure 4.

      (3)Figure 2. The rim staining for the Nup82-mCherry in the WT background is unusually punctate, bringing into question the viability of the cells imaged. Why has ScNup82, a cytoplasmic filament component, been selected for colocalization experiments when Heh2 is proposed to interact with the inner ring complex? Additionally, the experiments shown in panels A and C are not directly comparable, ScNup82 is an asymmetric cytoplasmic nucleoporin, while SpNup107 is located in the Y-shaped Nup84 nucleoporin complex and present on both faces of the NPC. This experiment should be repeated with scNup84 to match panel C, additionally a viability dot spot assay and western blot analysis of the labeled proteins should be conducted.

      (4)Figure 3, the authors use yeast strains where proteins are tagged with FRB and FKBP12 domains, which dimerize upon the addition of rapamycin inducing NPC clusters. The authors then observe the effect this has on Heh2 NPC colocalization. However, Rapamycin may also have an effect independent from the induced dimerization event. Negative controls should be performed in strains lacking the FRB and FKBP12 tagged proteins to demonstrate that Rapamycin doesn't modify Heh2 localization independently of NPC clustering.

      (5)Figure 4. The authors provide a qualitative description of the colocalization presented, while in all other instances they calculate a Pearson correlation coefficient. This is significant because Heh2 appears to be evenly distributed within the NE of the DMSO control (panel B). Given the presented hypothesis, isn't colocalization expected with Nup192? As a minimum, a Pearson correlation coefficient analysis should be conducted and added to Figure 4.

      (6)Figure 4. Pom152-mCherry localizes at both the NE and strongly within the cytoplasm, which is unexpected given typical rim staining phenotypes observed previously for both Pom152-YFP and Pom152-GFP strains (Katta, ..., Jaspersen et al., Genetics (2015) & Upla, ..., Fernandez-Martinez et al., Structure (2017), respectively). Given the unusually weak rim staining observed throughout, viability assays of the strains listed in Table S1 and protein expression analysis of the tagged nucleoporins via western blot is necessary.

      (7)Figure 5A. The TAP-tagged pulldowns from ∆Pom152 and ∆Nup133 strains appear to be from a different round of experiments than the previous deletion strains presented. Interestingly, there appears to be an additional band at approximately 250 kDa in both cases that is not present in any other experiments. This band could be a contaminant observed due to different experimental conditions, or a protein that exclusively binds to Heh2 in the ∆Pom152 and ∆Nup133 background. Either way the authors should identify this protein with MS to address this ambiguity.

      (8)Figure 6B. Please label the nucleoporin bands in the TAP-tagged pulldowns.

      (9)Figure 6D. Please specify Heh2-GFP clustering in the y-axis.

      (10)Under the results section titled 'Heh2 binds to specific nups in evolutionarily distant yeasts', the authors state that spHeh2 co-purifies with "several specific species". The meaning is unclear; this sentence should be rephrased and the specific species clearly described.

      (11)Under the results section titled 'Heh2 fails to interact with NPCs lacking Nup133', the authors refer to a Pearson correlation coefficient of -0.03 as a clear anticorrelation. Instead, state that there was no correlation.

      (12)In the discussion, the authors state that "clustering itself may sterically preclude an interaction with Heh2". The text should be expanded to explain this in more detail; it is not clear from the presented data why this would occur.
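
Points (5) and (11) both turn on how a Pearson correlation coefficient is read: values near +1 or -1 indicate correlation or anticorrelation, while a value such as -0.03 indicates essentially no linear relationship. The following is a minimal sketch of the calculation, assuming the two fluorescence channels have been flattened to equal-length lists of per-pixel intensities. The function name and the toy intensity values are hypothetical illustrations, not taken from the manuscript or its analysis pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pixel intensities for a GFP and an mCherry channel.
gfp = [1, 2, 3, 4]
colocalized = [2, 4, 6, 8]      # rises with GFP
excluded = [8, 6, 4, 2]         # falls as GFP rises
unrelated = [1, -1, -1, 1]      # no linear relationship to GFP

r_colocalized = pearson(gfp, colocalized)
r_excluded = pearson(gfp, excluded)
r_unrelated = pearson(gfp, unrelated)
```

In this toy data only the second pair would count as an anticorrelation; the third pair is the situation a coefficient close to zero describes.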

      Significance

      The manuscript is premature for publication.

      REFEREES CROSS COMMENTING

      It seems to me that all reviewers agree that the manuscript is premature for publication. The data thus far do not support the conclusion that Heh2 may be an NPC assembly sensor, nor do they provide any mechanistic insight. Reading the comments of the other two reviewers makes me more negative, as it is clear that the paper also lacks scientific rigor. The manuscript is a great starting point for a rigorous dissection but I do not see this paper as a candidate for a broad impact journal.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Borah et al showed that Heh2, a component of INM, can be co-purified with a specific subset of nucleoporins. They also found that disrupting interactions between Heh2 and NPC causes NPC clustering. Lastly, they showed that the knockout of Nup133, which does not physically interact with Heh2, causes the dissociation of Heh2 from NPCs. These findings led the authors to propose that Heh2 acts as a sensor of NPC assembly state.

      Major comments:

      The authors claimed that Heh2 acts as a sensor of NPC assembly state, as evidenced by their finding that Heh2 fails to bind with NPCs in nup133Δ cells (Fig. 2, Fig. 5). However, there is a possibility that the association between Heh2 and NPCs is merely affected by the clustering of the NPCs (as the authors discussed) but not related to the structural integrity of the NPC. In addition, their data showing that the Heh2-NPC association is not easily disrupted by knocking out the individual components of the IRC (Fig. 5A and 5D) also disfavor the idea that Heh2 could sense NPC assembly state. Since some nup knockout strains, other than nup133Δ, are also known to show NPC clustering (e.g. nup159 (Gorsch JCB 1995) and nup120 (Aitchison JCB 1995; Heath JCB 1995)), it will be worth trying to monitor the localization of Heh2 and its interaction with nucleoporins (by Heh2-TAP) using these strains. While Nup159 is a member of the cytoplasmic complex, Nup120 is an ORC nucleoporin. Thus, biochemical and phenotypical analysis using these mutant cells will be useful to clarify if the striking phenotypes the authors found are specific to the nup133 knockout strain (or ORC Nup knockouts) or could be commonly observed in the strains that show NPC clustering. Another interesting point is that Nup159 shows strong interaction with Heh2, even in nup133Δ cells. As the authors mentioned, Nup159-Heh2 interaction may not be sufficient for Heh2-NPC association, but it could be important for NPC clustering.

      Figure 4C: Is it known that rapamycin treatment in this strain did not affect the protein levels of nucleoporins? Otherwise, the authors should confirm this by western blotting (at least some of them).

      Figure 5: The authors mentioned (line 256-257) that "in all cases the punctate, NPC-like distribution of Heh2-GFP was retained (Fig 5D)". However, nup107 KO strain seems to show more diminished punctate staining as compared with other strains. To clarify this, the authors should express mCherry tagged Nup as in Fig. 2 or Fig. 3.

      Minor comments:

      Figure 4A and 4B: The authors should show Scatter plot as in Fig. 2 and Fig. 3.

      Figure 5C: An explanation of the arrowheads is missing in the figure legend.

      Figure 6: Is there any information as to where Heh2 (316-663) is localized in the cell?

      Figure 6B: Nucleoporins should be marked with color circles as in Fig. 1 and Fig. 5.

      Significance

      Heh2 has been implicated in the quality control of NPC assembly; however, the molecular mechanism of how Heh2 interacts with and affects NPC assembly/function remained largely unknown. The relationship between Heh2 and specific nucleoporins shown in this study is novel and interesting. While the data are of good quality overall and convincing, the current manuscript still lacks molecular mechanistic insights. In particular, it is not clear if the observed phenotypes are due to structural defects of the NPC or to NPC clustering.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionally rich in tyrosine codons. These genes are enriched for those involved in electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs. Comment 1: The authors should include a short diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      Response: We thank the reviewer for their suggestion. Pre-tRNA fragmentation is still a largely unexplored field, but an initial introduction is best drawn from pre-tRNA processing, where there is a cleavage event for pre-tRNAs with an intron. This is a complex subject, but a recent review from Hopper and Nostramo has done an excellent job in describing the current field in yeast and vertebrate species (Hopper and Nostramo, Front. Genet., 2019). We have added this citation and new text in the manuscript about pre-tRNA processing for general readers to follow up on. We feel that a supplementary figure might be a bit too brief in describing the knowns and unknowns of pre-tRNA processing and fragmentation.

      Comment 2: I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      Response: We thank the reviewer for this very interesting comment and insight, which had not occurred to us. The relationship between this response and oxidoreductase regulation could be a factor in both the tRNA and tRF modulations seen in our cells. Interestingly, we find that many oxidoreductase genes (such as the NDUF family) are bound by hnRNPA1 by CLIP. In new data, we have done stability experiments with the tRF (new Fig 7E-F) to show the regulon of hnRNPA1 is modulated with overexpression and LNA against the tRF, revealing that this tRNA fragmentation response modulates expression of certain oxidoreductase genes. However, we do not see clear and significant differences for ETC genes in particular. As hnRNPA1 is known to act as both a promoter and destabilizer of genes depending on context, it is likely that further and more detailed work will be needed to parse this hypothesis out in future studies.

      Comment 3: In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.

      Response: We have identified an error in our manuscript where the overlap actually identifies 109 proteins rather than the 102 reported in the original manuscript. We apologize for this oversight. As for the overlap proteins, we plotted the downstream proteins detected in the proteome by mass spectrometry based on Tyr-codon content. As explained in the text, the targets we tested were chosen for having higher than median levels of Tyr codons, as seen in the histogram, and for showing some of the greatest reduction after Tyr tRNA-GUA depletion (Fig S4A). The other proteins found in the overlap will fall in a similar pattern along the histogram.

      Comment 4: Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Response: We apologize for the confusion. The model in Fig 7F was supposed to denote the pre-tRNA with the trailer and leader sequences intact initially, then lost with processing to mature tRNA. To make it clearer, we have now labeled the first species as “Pre-tRNA.”

      Reviewer #1 (Significance (Required)): This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      Response: We thank the reviewer for their very positive and insightful comments and feedback.

      REFEREES CROSS-COMMENTING I have the following comments on other reviewers' critiques. Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs. Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon dependent vs indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

      Response: We thank the reviewer for their tremendous insights. We are in agreement regarding the three points in the cross-comments.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1). The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage. Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions. Comment 1: The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. 
Pulse-chase experiments with nucleotides such as EU would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.

      Response: We thank the reviewer for their suggestion and agree that testing for a transcriptional effect using a pulse-chase experiment would further support these findings. We are grateful to both reviewer 1 and reviewer 2 in the cross-comments for recognizing that the tRNA repression response we see is too rapid to be a transcriptional response and that the fact that this tRNA depletion response occurs concomitantly with the tRF generation supports our model that this is a pre-tRNA fragmentation response. It would be of interest for future studies to also examine the impact of cellular stress on tRNA transcription.

      Comment 2: To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      Response: We thank the reviewer for their suggestion. Originally, we had also thought of this experiment and attempted to test this hypothesis. Upon experimentation, we ran into technical challenges that prevented us from drawing any conclusions. The problems were that we were unable to develop a cell line that stably overexpressed the Tyr tRNA-GUA and had to settle for a transient overexpression that only lasted for a couple of days (Fig S3B). For transient transfection, we used Lipofectamine 3000 (Invitrogen) that has associated cell toxicities and requires a control RNA transfection in lipofectamine. In addition, H2O2 in itself is a stress. The simultaneous occurrence of these two stresses led to a combination of cell death and cell growth for the control and experimental group. Given the high variability, we were unable to draw any conclusions on cell growth with this combination. We hope to identify a way to stably overexpress Tyr tRNA-GUA in the future to address this hypothesis.

      Comment 3: Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2-exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      Response: We would like to clarify that out of the three genes in Fig. S5A, only EPCAM mRNA levels were significantly reduced with H2O2-exposure while no changes were observed in the mRNA levels of USP3 or SCD. It is difficult to ascertain the reason for EPCAM mRNA reduction, but one hypothesis is that it reflects timing and steady state levels. Levels of mRNAs seen with knockdown of YARS or tRNA represent steady state levels where mRNA decay and transcriptional changes can be easily seen. Following H2O2, the data are collected at 24 hours, which may be before mRNA effects can be fully appreciated. We have edited the text to clarify the uncertainty involved. We agree with the reviewer’s insightful comment, find these differences to be interesting, and will consider them in future studies to better understand the interplay between translation and mRNA levels in the context of tRNA depletion.

      Comment 4: In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?

      Response: We thank the reviewer for their important suggestion. We have expanded the analysis to look at codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The 5 most changed codons are labeled with UAC, a codon for the tyrosine amino acid, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when at the ribosome A-site, is most affected with depletion of the corresponding tRNA. The text has also been edited to reflect our new analysis providing further evidence that ribosomal stalling could occur upon depletion of this tRNA. The gray outline around the regression line represents the 95% confidence interval.

      Fig S5D

      As seen in Fig 5F, a significant overlap was noted for genes with the lowest translational efficiency and tyrosine enrichment. We did further analysis to test if a direct and linear relationship exists between tyrosine codon density and reduced translational efficiency on the global scale (i.e. does more stalling occur with more tyrosine codons on a global scale). We again see that a reduced translational efficiency is significantly correlated with tyrosine codon enrichment (above median parameters) in the tRNA knockdown ribosome profiling data. However, our analysis on a direct relationship between codon density and translational efficiency is inconclusive. This analysis is limited given the sequencing depth and number of experimental replicates available and we lack the statistical power to draw strong conclusions. To prevent overstating our claims, we have omitted any conclusions regarding this second analysis.

      Comment 5: MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.

      Response: The reviewer is correct in that the data from Fig S1B shows Leu-tRFs with higher induction. Our text was meant to suggest we focused on tRF-TyrGUA due to higher band intensity seen on northern blot validation. We have edited the text in the manuscript to clarify this.

      Reviewer #2 (Significance (Required)): The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      Response: We thank the reviewer for the kind comments and feedback.

      REFEREES CROSS-COMMENTING Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly? Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed. However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely. Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

      Response: We thank the reviewer for these insightful comments. As mentioned above, five minutes is likely too rapid for a transcriptional response to be the main effect of H2O2 on Tyr-tRNA GUA. Moreover, the concomitant appearance of the tRF at this time-point makes tRNA fragmentation the most parsimonious and likely explanation rather than transcriptional repression, which would not cause a tRNA fragment to occur concurrently. Moreover, extraction and sequencing of the tRF shows it likely derives from the pre-tRNA as a 5’ leader sequence is present. We appreciate the reviewer’s suggestion and scholarly willingness to reassess their own hypothesis.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner. **Major comments:** Comment 1: There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      Response: We thank the reviewer for the positive comments and feedback.

      Comment 2: The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      Response: We agree that the reviewer is correct in suggesting that the 3’ tRNA-Tyr might also accumulate. However, we disagree that any accumulation of the 3’ tRF might be relevant in our particular model for multiple reasons. As supported by reviewer 1’s cross-comments, cross-hybridization between isoacceptors (GUA vs AUA) would be unlikely as Tyr-AUA could not even be detected by the initial 5’ tRF probe. Additionally, the sequences for Tyr-GUA are different with no nucleotide alignment from Tyr-AUA. Furthermore, the extraction and sequencing of the 5’ tRF (Fig S6B) confirms the 5’ leader sequence unique to the pre-tRNA (also noted by reviewer 1). While the 3’ half of many Tyr-GUA are similar, we find selective binding of our RNA binding proteins only to the 5’ tRF. The 3’ tRF may play some role in binding to other proteins in cell regulatory pathways but such experiments would be outside the scope of this study.

      Comment 3: The analysis of the proteomic and ribosome profiling experiments seem rather limited, or based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.

      Response: We have identified an error in our manuscript where the overlap identified 109 proteins and not 102 as reported previously. We apologize for this oversight. While the reviewer is correct in that identifying codon dependent changes for all 3500+ proteins detected would offer greater insight, our study was specifically focused on tyrosine as we observed this tRNA to become depleted and our experimental system modulated this specific tRNA. As for the second point on Tyr tRNA level effects on translation, we felt that the most rigorous course would be to assess causality rather than an association for this tRNA and its codon in regulating a target gene. The only way to do this is to perform mutagenesis and reporter studies. Our codon dependent reporter clearly shows a direct effect on translation in a tyrosine-codon dependent manner. As for translational regulation for two different transcripts with the same Tyr codon content, the molecular mechanisms that could dictate these differences are unclear. The reviewer has already brought up possibilities in the next comment regarding Tyr codons in 5’ or 3’ ends or consecutive Tyr codons. These are all interesting hypotheses, and others in the field have devoted entire publications to understanding how and why codon interactions and localizations impact translation (see Gamble et al., Cell 2016, Kunec and Osterrieder, Cell Reports 2016, Gobet et al., PNAS 2020). While these further analyses would be interesting, our current experimental data would be insufficient to properly address these questions. We have focused on a specific tRNA, its fragment, and demonstrated direct effects of the tRNA on the codon-dependent translation of a specific growth-regulating target gene and the tRNA fragment on the modulation of the activity of the RNA binding protein it binds to with respect to its regulon. 
We believe that these findings individually reveal causal roles for this tRNA and tRF in downstream gene regulation and collectively reveal a previously unappreciated post-transcriptional response. We hope the reviewer agrees with us regarding the already deep extent of the studies and that further such analyses beyond this tRNA are outside the scope and focus of this current study.

      Comment 4: The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein and the list of proteins containing higher than median abundance of Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This will allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Moreover, is there anything special about these Tyr codon-enriched transcripts where they are affected at the level of translation but not the other Tyr-codon enriched transcripts? For example, are these transcripts enriched at the 5' or 3' ends for Tyr codons? Do these transcripts exhibit multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      Response: For the proteins identified in the mass spectrometry and overlap listed in Fig 4A, Tyr codon abundance was calculated by dividing the number of Tyr amino acids present by the total number of amino acids for each protein. For genes with different isoforms possible, the principal isoform, using ENSEMBL, was used for calculations. We are also happy to provide the entire list of proteins. Additionally, please see above response to comment 3. We wish to emphasize that the goal of identification of these proteins was to identify downstream targets of this response for functional studies, which we have done. We have identified downstream genes that become modulated by this response and that regulate cell growth, consistent with the phenotype of the tRNA. We then demonstrated a direct causal tRNA-dependent codon-based response with a specific target gene using mutagenesis.
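
The calculation described in this response (number of Tyr residues divided by total residues, then comparison against the median) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the protein identifiers and the short sequences are hypothetical, and the sketch does not reproduce the isoform selection or any other step of the authors' actual pipeline.

```python
from statistics import median


def tyr_fraction(protein_seq):
    """Fraction of residues that are tyrosine (one-letter code 'Y')."""
    seq = protein_seq.upper()
    if not seq:
        raise ValueError("empty protein sequence")
    return seq.count("Y") / len(seq)


def above_median_tyr(proteins):
    """Return the median Tyr fraction and the names of proteins above it.

    `proteins` maps a protein name to its amino-acid sequence
    (assumed here to be the principal isoform).
    """
    fractions = {name: tyr_fraction(seq) for name, seq in proteins.items()}
    cutoff = median(fractions.values())
    enriched = sorted(name for name, frac in fractions.items() if frac > cutoff)
    return cutoff, enriched


# Hypothetical toy sequences, for illustration only.
toy_proteins = {
    "protA": "MYYK",  # 2 of 4 residues are Tyr
    "protB": "MKKK",  # no Tyr
    "protC": "MAYK",  # 1 of 4 residues is Tyr
}

cutoff, enriched = above_median_tyr(toy_proteins)
```

With a full proteome table, the same cutoff could be reported alongside the list of Tyr-enriched proteins, which is the information requested in Comment 4.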

      While we agree that the additional analysis the reviewer is requesting to determine what constitutes heightened translational sensitivity to this response is interesting, we believe this is a challenging question for future studies. It is possible that enrichment at 5’ or 3’ or concentration of tyrosine codons could cause increased sensitivity. Ideally, one would have information on a larger set of proteins so that such challenging questions could be better statistically bolstered. Ultimately, the requested experiments that go beyond our current work would require further analyses and experiments to allow firm conclusions to be drawn. As the other reviewers state and this reviewer agrees, we have uncovered the initial discovery regarding this tRNA fragmentation response and provided mechanistic characterization. Future studies, which are beyond the scope of the current work, will undoubtedly further characterize features of this response.

      Comment 5: The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      Response: We thank the reviewer for their suggestion. We have expanded the analysis to look at codon usage scatterplots across all codons for shTyr and shControl replicates (Fig S5D). The 5 most changed codons are labeled with UAC, a codon for the tyrosine amino acid, being the most affected (red arrow). Consistent with our model, a tyrosine codon, when at the ribosome A-site, is most affected with depletion of the corresponding tRNA. The text has also been edited to reflect our new analysis providing further evidence that ribosomal stalling might occur with depletion of a given tRNA. The gray outline around the regression line represents the 95% confidence interval.
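A codon-level comparison of the kind described in this response can be sketched as follows. This is a hedged illustration only: the function names, the use of a log2 ratio of normalized A-site codon frequencies, and the counts are all assumptions made up for this example; they are not the authors' method or data.

```python
import math


def codon_frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw A-site codon counts into fractions of the total."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rank_codon_changes(control: dict[str, int],
                       knockdown: dict[str, int],
                       top: int = 5) -> list[tuple[str, float]]:
    """Rank codons by absolute log2 change in A-site frequency."""
    f_ctrl = codon_frequencies(control)
    f_kd = codon_frequencies(knockdown)
    changes = {
        codon: math.log2(f_kd[codon] / f_ctrl[codon])
        for codon in f_ctrl
        if codon in f_kd and f_ctrl[codon] > 0 and f_kd[codon] > 0
    }
    return sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


# Made-up counts for three codons, for illustration only.
control = {"UAC": 100, "GCU": 300, "AAA": 600}
knockdown = {"UAC": 200, "GCU": 300, "AAA": 500}
ranked = rank_codon_changes(control, knockdown)
```

With real data, the counts would come from ribosome-protected fragments assigned to an A-site position, and replicates would be compared before any codon is labeled as most affected.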

      Fig S5D

      Comment 6: Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      Response: We would like to emphasize that resolution of ribosomal profiling data at the codon level for specific genes requires a high number of reads and replicates to draw accurate conclusions. There is an inherent level of stochasticity when mapping RPFs to specific genes and as a result, our analysis revolved around Tyr-enriched vs Tyr-low populations as this analysis was appropriate for our sequencing depth and number of replicates. To be able to conclusively make claims regarding ribosome pausing or stalling for specific genes, we would likely need further experimentation than can be currently done. However, we are currently conducting the requested bioinformatic analysis and have promising preliminary transcript-level data supporting our model.

      Comment 7: The results with hnRNPA1 and SSB/La are extremely preliminary and simply show binding of tRNA fragments but no biological relevance. We realize that the Authors attempted to see if Tyr-tRNA fragments impacted RNA Pol III RNA but found no effect. A potential experiment would be to perform HITS-CLIP on H2O2-treated cells to see if stress-induced tRNA fragments bind to SSB/La or hnRNPA1. In this case, at least the Authors would link the oxidative stress results found in Figure 1 and 2 with La/SSB and hnRNPA1.

      Response: We agree with the reviewer that a tRF function was not established in the manuscript. As a result, we have recently completed experiments looking at mRNA stability of the hnRNPA1 regulon in the context of overexpressing the tRF as well as using LNA to inhibit this Tyr-tRF (Fig 7E-F). Our data shows, in an hnRNPA1-dependent manner, that its regulon can be functionally regulated by Tyr-tRF. With tRF overexpression and RNAi-mediated depletion of hnRNPA1, a right shift in transcript stability is seen. Importantly, when we do the converse experiment with tRF inhibition in the same RNAi-mediated reduction of hnRNPA1, we see a left shift. These complementary experiments provide data that the Tyr-tRF has a functional role when bound to hnRNPA1 by modulating the regulon of hnRNPA1 and expand the scope of this manuscript and extend the pathway defined downstream of this tRNA fragmentation event.

      Fig 7E-F

      Comment 8: The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, there is still a substantial amount of tyrosine tRNA fragments in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of Tyrosine tRNA fragments but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      Response: While there are certainly tRFs still apparent with DIS3L2 depletion (Fig S7F-I), we note significant impairment of tRF induction with DIS3L2 knockdown/knockout with multiple different methods in C. elegans and human cells. This data supports our conclusion that tRF generation is dependent on DIS3L2 as this ribonuclease is necessary to elicit the full Tyr-tRF response. We do not make claims that Tyr-tRFs are solely or completely dependent on DIS3L2. There must be other RNases involved given the data highlighted by the reviewer. To this point, we have added clarifying text that DIS3L2 depletion does not completely eliminate the tRF induction.

      Comment 9: Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      Response: An immunoblot of DIS3L2 depletion in human cells has now been added as a supplementary figure (Fig S7I). Depletion in C. elegans was confirmed through sequencing of a mutation, as is standard in the field. The wild-type PCR product is 1 nt longer (859 bp) than the mutant product (858 bp), which carries a CTC-to-TAG nonsynonymous mutation preceding a single-nucleotide deletion.

      Wild-type disl-2: GTTGAAGCCGCAGGGC[CTC]ACTCAGACAGCTACAGG

      disl-2 (syb1033): GTTGAAGCCGCAGGGC[TAG]-CTCAGACAGCTACAGG
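The two alleles shown above can be compared programmatically. The short Python sketch below uses only the bracketed alignment exactly as written in this response. It checks the 1 nt length difference and whether the substituted triplet is a stop codon. The helper name `parse_allele` is hypothetical, and treating the bracketed triplet as an in-frame codon is an assumption based on the bracket notation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_allele(aligned: str) -> tuple[str, str]:
    """Split 'LEFT[XXX]RIGHT' into (bracketed triplet, ungapped sequence)."""
    left, rest = aligned.split("[")
    codon, right = rest.split("]")
    # Remove alignment gaps ('-') to recover the actual nucleotide sequence.
    ungapped = (left + codon + right).replace("-", "")
    return codon, ungapped


wild_type = "GTTGAAGCCGCAGGGC[CTC]ACTCAGACAGCTACAGG"
mutant = "GTTGAAGCCGCAGGGC[TAG]-CTCAGACAGCTACAGG"

wt_codon, wt_seq = parse_allele(wild_type)
mut_codon, mut_seq = parse_allele(mutant)

# Difference in length between the displayed wild-type and mutant windows.
length_difference = len(wt_seq) - len(mut_seq)
# True when the substituted triplet is one of the three DNA stop codons.
mutant_is_stop = mut_codon in STOP_CODONS
```

The same check on the full PCR products would be expected to reproduce the 859 bp versus 858 bp difference stated in the response, although only the window shown here is available to test.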

      Fig S7I

      Comment 10: The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. This is because of the major caveat that knockdown of tyrosine-tRNA or tyrosyl-tRNA synthetase is likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS), they might overlap in some manner but not necessarily define a coordinated response. Thus, a glaring gap in this paper is a clear, mechanistic link between H2O2-induced changes in translation versus the changes in expression when either tRNA-Tyr or YARS is depleted. Thus, it is too preliminary to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when it could all be pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Response: We thank the reviewer for the feedback. While we agree that indirect effects may exist, we do not make any claims that our pathway is the only one required to have translation effects. The text for Fig 4A already acknowledges the pleiotropic effects of tRNA depletion. Our data shows that H2O2 stress leads to a depletion of Tyr tRNA-GUA and that depletion of this tRNA through multiple complementary methods has a codon-dependent effect on protein expression. We hope the reviewer agrees that the reduction of a specific target gene in a tyrosine codon-dependent manner (demonstrated by mutagenesis) and the binding of the tRF directly to an RBP and the modulation of the regulon of this RBP by this tRF (demonstrated by gain- and loss-of-function studies) demonstrates a direct role of this response on specific downstream target genes rather than pleiotropy. This is in keeping with the cross-comments of reviewer 1, where Fig 5D shows a direct Tyr codon link between H2O2 and downstream effects. As a result, we feel that our conclusions of a pathway (not the only pathway) are valid. However, the conclusion of a “regulatory logic” might not be interpreted in the same way by all readers and we have thus changed the text to reflect a more nuanced position.

      **Minor comments:**

      Comment 11: Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      Response: We thank the reviewer for their correction. We have changed all instances of “tyrosyl” to “tyrosine” in the text.

      Comment 12: In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      Response: We thank the reviewer for their correction. We have changed the promoter text in the figure. In the methods for the construct, we have included which USP3 was used and would be happy to include further information if requested.

      Comment 13: Please provide original blots for each of the replicates in: Figure 4C, n=4; Figure 4A, n=9; Figure 4D, n=3; Figure 5D, n=3.

      Response: There appears to be an unintentional mislabeling of the requested blots by the reviewer. The original blots for Fig 4C, Fig 5A, Fig 5D, and Fig 6D have been made available in a separate file for reviewers.

      Reviewer #3 (Significance (Required)): This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) to regulate translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosine synthetase that could interest researchers in the field of tRNA biology.

      Response: We thank the reviewer for the positive comments regarding our demonstration of a conserved stress response, acknowledging the intriguing nature of our findings that will be a starting point for future studies and that our work will be of interest to researchers in the field of tRNA biology. We hope that the very positive comments of reviewers 1 and 2, the cross-comments of reviewer 1 in response to reviewer 3’s comments regarding the specificity of this response, and our inclusion for reviewer 3 of additional data on the function of the tRF in regulating the activity of the hnRNPA1 RNA binding protein defining a post-transcriptional pathway and additional corroborating requested codon-level computational analyses provide compelling support that our findings indeed represent a major advance for the field.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The major findings in this manuscript are: 1.) Oxidative stress in human cells causes a decrease in tyrosine tRNA levels and accumulation of tyrosine tRNA fragments; 2.) The depletion of tyrosyl-tRNA synthetase or tyrosine tRNAs in human cells results in altered translation of certain genes and reduced cell growth and 3.) hnRNPA1 and SSB/La can bind tyrosine tRNA fragments. There is also preliminary evidence that the DIS3L2 endonuclease contributes to the appearance of tyrosine tRNA fragments upon oxidative stress. Based upon these results, the Authors conclude that tyrosine tRNA depletion is part of a conserved stress-response pathway to regulate translation in a codon-based manner.

      Major comments:

      •There is a considerable amount of data in this paper and the experiments are performed in a generally rigorous manner. Sufficient details are provided for reproducing the findings and all results have been provided to appropriate databases (RNA-Seq and ribosome profiling).

      •The manuscript uses a probe against the 5' half of Tyrosine tRNA for Northern blotting. However, tRNA probes can be prone to cross-hybridization, especially with some tRNA isoacceptors being similar in sequence. Thus, the blots in Figure 2 and Supplemental Figures should be probed with an oligonucleotide against the 3' half of tRNA-Tyr. This will confirm the pre- and mature tRNA-Tyr bands detected with the 5' probe. Moreover, this will determine whether 3' tRNA-Tyr fragments accumulate.

      •The analysis of the proteomic and ribosome profiling experiments seems rather limited, at least based upon what was presented in this manuscript. If additional analyses were performed, then they should be included as well, even if they yielded negative results. For example, the manuscript identifies 102 proteins that decrease after tRNA-Tyr depletion and YARS-depletion with a certain threshold of Tyr codon content. We realize the Authors were trying to find potential genes that are modulated under all three conditions. However, this does not provide information whether there is a relationship between a certain codon such as Tyr and protein abundance if only binning into two categories representing below and above a certain codon content. The Authors should plot the abundance change of each detected protein versus each codon and determine the correlation coefficient. This analysis is important for substantiating the conclusion of a codon-based system of specifically modulating transcripts enriched for certain codons. Otherwise, how could changes in tRNA-Tyr levels modulate codon-dependent gene expression if two different transcripts with the same Tyr codon content exhibit differences in translation? Moreover, this analysis should be performed with all the other codons as well.

      •The Authors should provide the specific parameters used to calculate the median abundance of Tyr codons in a protein and the list of proteins containing higher than median abundance of Tyr codon content. Moreover, the complete list of 102 candidate genes should also be provided. This will allow one to determine what percentage of these Tyr-enriched proteins exhibited a decrease in levels. Moreover, is there anything special about these Tyr codon-enriched transcripts where they are affected at the level of translation but not the other Tyr-codon enriched transcripts? For example, are these transcripts enriched at the 5' or 3' ends for Tyr codons? Do these transcripts exhibit multiple consecutive Tyr codons? This deeper analysis would enrich the findings in this manuscript.

      •The ribosome profiling results are condensed into two panels of Figure 5E and 5F. We recommend the ribosome profiling experiment be expanded into its own figure with more extensive analysis and comparison beyond just looking at tRNA-Tyr. This could reveal insight into other codons that are impacted coordinately with Tyr codons and perhaps strengthen their conclusion. As an example of a more thorough analysis of ribosome profiling and proteomics, we point the Authors to this recent paper: Lyu et al. 2020 PLoS Genetics, https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008836

      •Moreover, one would expect that the mRNAs encoding USP3, EPCAM and SCD would exhibit increased ribosome occupancy. Thus, the authors should at least provide relative ribosome occupancy information on these transcripts to provide evidence that the decrease in protein levels is indeed linked to ribosome pausing or stalling.

      •The results with hnRNPA1 and SSB/La are extremely preliminary and simply show binding of tRNA fragments but no biological relevance. We realize that the Authors attempted to see if Tyr-tRNA fragments impacted RNA Pol III RNA but found no effect. A potential experiment would be to perform HITS-CLIP on H2O2-treated cells to see if stress-induced tRNA fragments bind to SSB/La or hnRNPA1. In this case, at least the Authors would link the oxidative stress results found in Figure 1 and 2 with La/SSB and hnRNPA1.

      •The manuscript concludes that "Tyrosyl tRNA-GUA fragments are generated in a DIS3L2-dependent manner" based upon data in Supplemental Figure S7. However, there is still a substantial amount of tyrosine tRNA fragments in both worms and human cells depleted of DIS3L2. Thus, DIS3L2 could play a role in the formation of Tyrosine tRNA fragments but it is too strong a claim to say that tRNA fragments are "dependent" upon DIS3L2. We suggest that the Authors soften their conclusions.

      •Moreover, what is the level of DIS3L2 depletion in the worm and human cell lines? The Authors should provide the immunoblot of DIS3L2 that was described in the Materials and Methods.

      •The key conclusions of "a tRNA-regulated growth suppressive oxidative stress response pathway" and an "underlying adaptive codon-based gene regulatory logic inherent to the genetic code" are overstated. This is because of the major caveat that knockdown of tyrosine-tRNA or tyrosyl-tRNA synthetase is likely to trigger numerous indirect effects. While the authors validate that three proteins are expressed at lower levels under all three conditions (H2O2, tRNA-Tyr and YARS), they might overlap in some manner but not necessarily define a coordinated response. Thus, a glaring gap in this paper is a clear, mechanistic link between H2O2-induced changes in translation versus the changes in expression when either tRNA-Tyr or YARS is depleted. Thus, it is too preliminary to conclude that tRNA depletion is part of a "pathway" and "regulatory logic" when it could all be pleiotropic effects. At the very least, the authors should discuss the possibility of indirect effects to provide a more nuanced discussion of the results obtained using two different cell systems and oxidative stress.

      Minor comments:

      •Tyrosyl-tRNAs refers to the aminoacylated form of tRNA. We recommend that all instances of tyrosyl-tRNA be changed to tyrosine tRNA or tRNA-Tyr which is more generic and provides no indication as to the aminoacylation status of a tRNA.

      •In Figure 5C, the promoter is drawn as T7, which is a bacteriophage promoter. While the plasmid used in this manuscript (psiCHECK2) does contain a T7 promoter, mammalian gene expression is driven from the SV40 promoter. Thus, the relevant label in Figure 5C should be "SV40 promoter". Moreover, additional details should be provided on how the construct was made (such as sequence information etc.).

      •Please provide original blots for each of the replicates in:

      Figure 4C, n=4

      Figure 4A, n=9

      Figure 4D, n=3

      Figure 5D, n=3

      Significance

      This manuscript provides evidence that specific tRNAs are depleted upon oxidative stress as part of a conserved stress-response pathway in humans (and worms) to regulate translation in a codon-based manner. Unfortunately, the manuscript attempts to tie together results from different conditions and systems without providing any definitive links that suggest a "pathway" involved in the oxidative stress response. The findings in this paper provide a useful starting point but fall short of being a major advance due to the lack of a clear mechanism. However, there are intriguing results in this manuscript based upon the cell lines depleted of tRNA-Tyr or tyrosine synthetase that could interest researchers in the field of tRNA biology.

      This review is written from the perspective of a researcher with expertise in RNA processing, RNA biology and translation regulation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This very interesting study from Sohail Tavazoie's lab describes the consequences of oxidative stress on the tRNA pool in human epithelial cell lines. As previously described, the authors observed that tRNA fragments were generated upon exposure of cells to ROS. In addition, the authors made the novel observation that specific mature tRNAs were also depleted under these conditions. In particular, the authors focused on tyrosyl tRNA-GUA, which was decreased ~50% after 24 hours of ROS exposure, an effect attributable to a decrease in the pre-tRNA pool. Depletion of tyrosyl tRNA resulted in reduced translation of specific mRNAs that are enriched in tyr codons and likely contributed to the anti-proliferative effects of ROS exposure. In addition, the authors demonstrated that the tRFs produced from tyr tRNA-GUA can interact with specific RNA binding proteins (SSB and hnRNPA1).

      The major contribution of this paper is the novel finding that stress-induced tRNA fragmentation can result in a measurable reduction of specific mature tRNAs, leading to a selective reduction in translation of mRNAs that are enriched for the corresponding codons. Previously, studies of tRNA fragmentation largely focused on the functions of the tRFs themselves and it was generally believed that the mature tRNA pool was not impacted sufficiently to reduce translation. The findings reported here therefore add a new dimension to our understanding of the cellular consequences of stress-induced tRNA cleavage.

      Overall, the data are of high quality, the experiments are convincing, and the conclusions are well supported. I have the following suggestions that would further strengthen the study and bolster the conclusions.

      1.The authors have not formally demonstrated that the reduction in pre-tRNA in H2O2-treated cells is a consequence of pre-tRNA cleavage. It is possible that reduced transcription contributes to this effect. Pulse-chase experiments with nucleotides such as EU would provide a tractable approach to demonstrate that a labelled pool of pre-tRNA is rapidly depleted upon H2O2 treatment, which would further support their model. Since the response occurs rapidly (within 1 hour), it would be feasible to monitor the rate of pre-tRNA depletion during this time period in control vs. H2O2-treated cells.

      2.To what extent is the growth arrest that results from H2O2 treatment attributable to tyr tRNA-GUA depletion (Fig. 3A)? Since the reduction in tRNA levels is only partial (~50%), it should be feasible to restore tRNA levels by overexpression (strategy used in Fig. 3E, S3B) and determine whether this measurably rescues growth in H2O2-treated cells.

      3.Knockdown of YARS/tyr tRNA-GUA resulted in reduced expression of EPCAM, SCD, and USP3 at both the protein and mRNA levels (Fig. 4C-D, S4C). In contrast, H2O2-exposure reduced the abundance of these proteins without affecting mRNA levels (Fig. 5A-B, S5A). The authors should comment on this apparent discrepancy. Perhaps translational stalling induces No-Go decay, but it is unclear why this response would not also be triggered by ROS.

      4.In addition to the analyses of ribosome profiling in Fig. 5E-F, it might also be helpful to show a metagene analysis of ribosome occupancy centered upon UAC/UAU codons (for an example, see Figure 2 of Schuller et al., Mol Cell, 2017). This has previously been used as an effective way to visualize ribosome stalling at specific codons. Additionally, do the authors see a global correlation between tyrosine codon density and reduced translational efficiency in tRNA knockdown cells?

      5.MINOR: On pg. 4, the authors state that tRF-tyrGUA is the most highly induced tRF, but Fig. S1B appears to show stronger induction of tRF-LeuTAA.

      Significance

      The major advance provided by this work is the demonstration that stress-induced tRNA cleavage can reduce the abundance of the mature tRNA pool sufficiently to impact translation. Moreover, the effect on mature tRNAs is selective, resulting in the reduced translation of a specific set of mRNAs under these conditions. These findings reveal previously unknown consequences of oxidative stress on gene expression and will be of interest to scientists working on cellular stress responses and post-transcriptional regulation.

      REFEREES CROSS-COMMENTING

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Here is what I was thinking: The generation of tRFs does not generally result in reduction in levels of the mature tRNAs. So you can imagine a scenario where oxidative stress causes tRF generation from the mature tyr tRNA (which does not impact its steady-state levels), as is the case for other tRNAs. At the same time, decreased transcription would reduce the pre-tRNA pool, leading to a delayed reduction in mature tRNA, as observed.

      However, looking back at the data, I see that after only 5 min of H2O2 treatment, the authors observed reduced pre-tRNA and increased tRFs (Fig. 2A). This seems very fast for a transcriptional response, which would presumably require some kind of signal transduction. In addition, when you consider the amount of tRFs produced in Fig. S2C, it is hard to imagine that this would not impact the mature tRNA pool if they were derived from there. So I agree that the transcriptional scenario seems unlikely.

      Nevertheless, I think that looking at pre-tRNA degradation directly with the pulse-chase strategy would strengthen their story, so I would like to give the authors this suggestion. However, I am fine with listing this as an optional experiment which would enhance the paper but should not be essential for publication.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Huh et al. reports that oxidative stress causes fragmentation of a specific tyrosine pre-tRNA, leading to two parallel outcomes. First, the fragmentation depletes the mature tRNA, causing translational repression of genes that are disproportionally rich in tyrosine codons. These genes are enriched for those involved in the electron transport chain, cell cycle and growth. Second, the fragmentation generates tRNA fragments (tRFs) that bind to two known RNA binding proteins. Finally, the authors identify a nuclease that is needed for efficient formation of tyrosine tRFs.

      The authors should include a short diagram indicating the various known steps of pre-tRNA fragmentation (perhaps as a supplement) for general readers.

      I find the enrichment for mitochondrial electron transport chain (ETC) curious. The ETC includes several oxidoreductases, which may be rich in tyrosine as it is a common amino acid used in electron transfer. The depletion of the tyrosine tRNA from among many tRNAs under oxidative stress may not be incidental but related to an attempt by the cell to decrease oxygen consumption to avoid further oxidative damage. The authors could further mine their data to corroborate this hypothesis. For example, are the ETC genes among the targets of the RNA binding proteins targeted by tyrosine tRFs? This could potentially connect the effects of mature tRNA depletion and tRFs.

      In figure 4A, the authors should provide the tyrosine codon content of the overlap genes and show how much it differs from a randomly selected sample.

      Fig.6F, lower panel: the model should show pre-tRNA, as opposed to mature tRNA, because it is the former that is fragmented.

      Significance

      This study is comprehensive and novel, and includes several orthogonal and complementary approaches to provide convincing evidence for the conclusions. The main discovery is significant because it presents an important advance in post-transcriptional control of gene expression. The process of tRF formation was previously thought not to affect the levels of mature tRNA. This study changes that understanding by describing for the first time the depletion of a specific mature tRNA as its precursor form is fragmented to generate tRFs. Finally, the authors identify DIS3L2 as a nuclease involved in fragmentation. This is also an important finding as the only other suspected nuclease, albeit with contradictory evidence, is angiogenin. Collectively, the findings of this study would be of interest to a broad group of scientists. I only have a few minor comments and suggestions (see above).

      REFEREES CROSS-COMMENTING

      I have the following comments on other reviewers' critiques.

      Regarding the concern that the disappearance of the pre-tRNA could be a transcriptional response (reviewer 2), I think that the appearance of tRFs makes this scenario unlikely. If pre-tRNA levels decreased due to transcriptional repression, wouldn't one expect that both tRNA and the tRF levels diminish concomitantly?

      Reviewer 3 raises the issue of cross hybridization in Northern blots. The authors indicate that they "could not detect the other tyrosyl tRNA (tRNA Tyr AUA) in MCF10A cells by northern blot..." (page 6). Also, they gel extracted tRFs and sequenced them (figure S6B), directly identifying the fragments. I think these findings mitigate the concern of cross hybridization and clearly identify the nature of tRFs.

      Finally, I think that the codon-dependent reporter experiment (figure 5D) addresses many issues surrounding codon dependent vs indirect effects. In that experiment, the authors mutate 5 tyrosine codons of a reporter gene and demonstrate that the encoded protein is less susceptible to repression in response to oxidative stress.

  7. Aug 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Response to the Referees

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes.

      They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      **Major points:**

      1. This year Hasle et al. described a very similar approach that they name Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al. 2020 MSB paper is only mentioned together with the other methods in the introduction in one sentence (ref #19 in this manuscript):

      " Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al. paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR-based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      We agree with the reviewer that we need to describe the Hasle et al. paper (now ref #20 in the revised manuscript) in more detail and expand our description of other applications that could be performed with the method. For this purpose, we have made the following changes:

      We have modified the relevant paragraph in the Introduction.

      p.3 the second paragraph

      Recently, an imaging based method named “visual cell sorting” was described that uses the photo-convertible fluorescent protein Dendra2 to enrich phenotypes optically, enabling pooled genetic screens and transcription profiling(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). Here, we developed an analogous approach to execute an imaging-based pooled CRISPR screen using optical enrichment by automated photo-activation of the photo-activatable fluorescent protein, PA-mCherry.

      We have also added the following paragraph in the Discussion.

      p.14 line 1

      In our study, optical enrichment was utilized for pooled CRISPR screens on phenotypes identifiable through microscopy. However, optical enrichment can be used for other purposes, as demonstrated previously(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020). In a recent study by Hasle et al.(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), the process of separating cells by FACS after optical enrichment was termed “visual cell sorting”. This method was used to evaluate hundreds of nuclear localization sequence variants in a pooled format and to identify transcriptional regulatory pathways associated with paclitaxel resistance using single cell sequencing(Hasle, N.; Cooke, A.; Srivatsan, S.; Huang, H.; Stephany, J. J.; Krieger, Z.; Jackson, D.; Tang, W.; Pendyala, S.; Monnat, R. J., Jr.; Trapnell, C.; Hatch, E. M.; Fowler, D. M. 2020), demonstrating the broad applicability and power of this approach beyond CRISPR screening.

      2. While I understand that the authors mean conversion from the dark state to the fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader since typically photo-conversion refers to a change in color. I would suggest sticking to the term photo-activation.

      We thank the reviewer for pointing this out and to avoid future confusion, we restricted the usage of photo-conversion to specifically indicate conversion of fluorescence from one color into another: e.g. when talking about the published visual cell sorting paper in which Dendra2 is used as a photo-convertible fluorescent protein. We use photo-activation in reference to the activation of PA-mCherry in our work.

      3. For validation of the hits coming from the nuclear size screen: Did the authors have any controls making sure that the right targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins that are known to induce division errors display the nuclear fragmentation that the authors also observe) but especially for the ones that are less known or unknown to induce any nuclear size change, it will be important to demonstrate the specificity of the targets.

      To validate hits from the nuclear size screen, we verified successful transduction of the corresponding sgRNA constructs by FACS analysis, but have not yet confirmed the knockdown. Before final journal publication, we propose to perform RT-qPCR on our 15 gene hits before and after knockdown to measure the knockdown efficiency for each gene separately.

      In addition, it is not clear from the figure legends and the material and methods if these phenotypes are verified by 3-4 gRNAs they use in the validation. Are the histograms representative of a single experiment with one gRNA or a combination of gRNAs in different experiments? Methods of replication of the data presented in Fig4 is unclear.

      We apologize for the confusion. These phenotypes were verified with pools of 3-4 sgRNAs and the histograms are representative of a single replicate infected with a mixed 3-4 sgRNA pool. We have modified the legend to Figure 5 (original Fig. 4) and the method section to explain this point.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.

      The description of the experiment and information about the selected sgRNAs has been added in the Method section as follows:

      p.23

      Verification of hits from nuclear size screen

      For each hit in the nuclear size screen, the two sgRNAs with the highest phenotypic score in the screen and the two sgRNAs with the highest score predicted by the CRISPRi-v2 algorithm24 were selected and pooled to generate a mixed sgRNA pool of 3-4 sgRNAs (detailed information in Supplementary file 8). Cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were transduced with pooled sgRNAs targeting each gene and puromycin selected for 2 days to prepare for imaging. Cells were then seeded into 96-well glass bottom imaging dishes. Images were collected the next day and nuclear size was measured using the Auto-PhotoConverter µManager plugin. To focus on cells with successful transduction, BFP was co-expressed on the sgRNA construct and only cells with BFP intensity above a threshold value were included in nuclear size measurements. This BFP threshold was established by comparing the average BFP intensity of cells with and without sgRNA transduction (Fig.S3a).
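
The BFP filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the per-cell `(bfp_intensity, nuclear_area)` tuples, and the threshold value are hypothetical and are not taken from the Auto-PhotoConverter plugin.

```python
from statistics import median


def median_nuclear_area(cells, bfp_threshold):
    """Median nuclear area of cells whose BFP intensity exceeds the threshold.

    `cells` is a list of (bfp_intensity, nuclear_area_um2) tuples; both the
    layout and the threshold are hypothetical stand-ins for the real
    measurements.
    """
    kept = [area for bfp, area in cells if bfp > bfp_threshold]
    if not kept:
        raise ValueError("no cells passed the BFP threshold")
    return median(kept)


# Made-up measurements: two dim (untransduced) and three bright cells.
cells = [(120.0, 180.0), (950.0, 240.0), (1100.0, 260.0),
         (80.0, 175.0), (1300.0, 310.0)]
print(median_nuclear_area(cells, bfp_threshold=500.0))  # 260.0
```

Only the three cells above the threshold contribute to the median, so untransduced cells do not dilute the measured phenotype.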

      We agree with this important point and have changed the figure legend of Fig. 5c (original Fig. 4c) to just describe the plot:

      c, The ratios between median level of nuclear size measured from microscopy and H2B-mGFP fluorescence or FSC signal measured from FACS after knockdown, were plotted separately. TACC3, confirmed to be a control gene, was used for comparison (Grey bar).

      The typo has been corrected.

      Reviewer #1 (Significance (Required)):

      I think the idea of performing pooled screens coupled to microscopy is exciting and this approach has definitely more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach that is described in this manuscript is convincing and although the fact that the analysis part will require more work (to adapt the software to recognise different types of phenotypic readouts) in the future to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas such as to calculate enrichments with different photo-activation times (2sec vs 100ms) followed by separation of these populations by FACS.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a uManager protocol and discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight-hour experiment is impressive; and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      **General comments**

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (eg https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      Seeding density

      Time elapsed (in hours) between cell plating and optical enrichment

      The number of fields of view examined

      The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted

      The total time taken (in hours) to perform imaging and photoconversion

      The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments). The gating protocol is described for the genetic screen but not for the control experiments.

      We agree with the reviewer and apologize for the confusion that arose from our description. We also thank the reviewer for suggesting the use of established methods. However, MAUDE, an analysis method for sorting-based CRISPR screens with multiple expression bins, might not be suitable for our study since 1) the distribution of mCherry fluorescence intensity reflects photo-activation efficiency rather than sgRNA effect, and 2) only one sorting bin is collected for each experimental condition. Our analysis is adapted from an existing method from the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing).

      We agree with the reviewer regarding clarifying other points and rewrote the following part in the Method section:

      p. 20

      mIFP proof-of-principle screen, Nuclear size screen, FSC screen and H2B-mGFP screen

      For the mIFP proof-of-principle screen, mIFP positive cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP mIFP-NLS) and mIFP negative cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “mIFP sgRNA library” (CRISPRa library with 860 elements, see Supplementary file 5) and the “control sgRNA library” (CRISPRa library with 6100 elements, see Supplementary file 6) separately. For the nuclear size screen, FSC screen and H2B-mGFP screen, cells (hTERT-RPE1 dCas9-KRAB-BFP PA-mCherry H2B-mGFP) were stably transduced with the “nuclear size library” (CRISPRi library with 6190 elements, see Supplementary file 7). To guarantee that cells receive no more than one sgRNA per cell, BFP was expressed on the same sgRNA construct and cells were analyzed by FACS the day after transduction. The experiment only continued when 10-15% of the cells were BFP positive. These cells were further enriched by puromycin selection (a puromycin resistance gene was expressed from the sgRNA construct) for 3 days to prepare for imaging. For FSC and H2B-mGFP screens, cells were then subjected to FACS sorting. Cells before FACS (unsorted sample for FSC and H2B-mGFP screens) and top 10% cells based on either FSC signal (high FSC sample) or GFP fluorescence signal (high GFP sample) were separately collected and prepared for high throughput sequencing. For mIFP proof-of-principle screen and nuclear size screen, cells were then seeded into 96-well glass bottom imaging dishes (Matriplate, Brooks) and imaged starting from the morning of the next day (around 15 hr after plating). A series of densities ranging from 0.5E4 cells/well to 2.5E4 cells/well with 0.5E4 cells/well interval were selected and seeded. The imaging dish with cells around 70% confluency was selected to be screened on the imaging day. For mIFP proof-of-principle screen, a single imaging plate was performed for each replicate while 4 imaging plates per replicate were imaged for the nuclear size screen. 
When executing multiple imaging runs, 2 consecutive runs could be imaged on the same day (day run and night run). 64 (8x8, day run) or 81 (9x9, night run) fields of view were selected for each imaging well and each field of view was subjected to an individual round of imaging directly followed by photo-activation. Around 200-250 cells were present in each given field of view and 60% to 80% surface area of each well was covered. Either mIFP positive cells or cells passing the nuclear size filter were identified and photo-activated automatically using the Auto-PhotoConverter µManager plugin. The total time to perform imaging and photo-activation of a single 96-well imaging dish with around 1.5 million cells was around 8 hr. The night run generally took longer, since more fields of view were included than in the day run. Cells were then harvested by trypsinization and pooled into a single tube for isolation by FACS. Sorting gates were pre-defined using samples with different photo-activation times (e.g. 0s, 200ms, 2s) and detailed gating strategies are described in Supplementary file 1. Sorted samples were used to prepare sequencing samples.
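
The reported throughput can be checked with simple arithmetic from the numbers above. The sketch below assumes that all 96 wells of a plate are imaged, which the text does not state explicitly; the function name is hypothetical.

```python
WELLS_PER_PLATE = 96  # assumption: every well of the 96-well dish is imaged


def cells_imaged_per_plate(fields_per_well, cells_per_field,
                           wells=WELLS_PER_PLATE):
    """Approximate number of cells visited by the microscope per plate."""
    return wells * fields_per_well * cells_per_field


# Day run: 8x8 = 64 fields per well, 200-250 cells per field.
print(cells_imaged_per_plate(64, 200))  # 1228800
print(cells_imaged_per_plate(64, 250))  # 1536000

# Night run: 9x9 = 81 fields per well.
print(cells_imaged_per_plate(81, 200))  # 1555200
print(cells_imaged_per_plate(81, 250))  # 1944000
```

Under this assumption a day run covers roughly 1.2 to 1.5 million cells, in line with the figure of around 1.5 million cells per plate quoted above.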

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      To address these, we have added the following sentences:

      p. 4 line 16

      A photo-activatable fluorescent protein was chosen over a photo-convertible fluorescent protein to increase the number of channels available for imaging. PA-mCherry was chosen to leave the better performing green channel open for labeling of other cellular features. Moreover, non-activated PA-mCherry has low background fluorescence in the mCherry channel (Fig. S1b), and it can be activated to different intensities when photo-activated for various amounts of time.

      p. 14 line 10

      Phenotypes of interest should be identifiable under the microscope and generally require fluorescent labeling. Commonly used fluorescence microscopes use four channels for fluorescent imaging with little spectral overlap: blue, green, red and far red. In our study, the red channel was occupied by cell labeling with PA-mCherry and the blue channel was used to estimate sgRNA transduction efficiency. Since sgRNA transduction efficiency can be measured by other approaches, the blue channel could be used together with the remaining two channels to label cellular structures. Combining bright field imaging with deep learning can be used to reconstruct the localization of fluorescent labels(Ounkomol, C.; Seshamani, S.; Maleckar, M. M.; Collman, F.; Johnson, G. R. 2018), making it possible to use bright field imaging to further expand the phenotypes that can be studied with our technique.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      We thank the reviewer for this remark and revised all figures accordingly. Points and fonts were enlarged, and schematics were simplified or removed.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      We agree with the reviewer and the following adverbs have been removed: “highly” in abstract and page 15; “easily” on pages 4 and 11; “completely” on page 11 and three “only” on page 12.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      We apologize for this omission. The sequencing data have been deposited in GEO (GSE156623) and will be made available before final publication. The following section has been added to address this.

      p. 24

      DATA AND SOFTWARE AVAILABILITY

      The raw and processed data for the high throughput sequencing results have been deposited in the NCBI GEO database under accession number GSE156623. The plugin Auto-PhotoConverter developed for the open source microscope control software μManager (Edelstein, A. D.; Tsuchida, M. A.; Amodaj, N.; Pinkard, H.; Vale, R. D.; Stuurman, N. 2014) has been deposited on GitHub (https://github.com/nicost/mnfinder).

      **Specific comments**

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.

      We apologize for our poor terminology. Our original definition of “specificity” is the same as “precision” suggested by the reviewer. To avoid future confusion, we have changed all relevant occurrences of “specificity” into “precision”. The following sentence was modified to clarify the definition:

      p. 5 line 15

      To evaluate the precision (the fraction of called positives that are true positives) of this assay, all cells were collected and analyzed by FACS after image analysis and photo-activation (Fig. 2d and 2e). We calculated precision as the fraction of photo-activated cells (mCherry positive cells) that are true positives (mIFP-mCherry double positive cells) (Fig. 2f).

      Measuring recall is complicated because the microscope is unable to visit all locations in the imaging plate, hence recall will depend on the fraction of cells actually “seen” by the microscope. For the screening strategy employed in the nuclear size screen, recall is not as important as precision, since lower recall rates are compensated for by screening larger cell numbers. We therefore did not attempt to measure recall directly.
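
The precision defined above reduces to a ratio of two FACS counts. A minimal sketch, with hypothetical cell counts that are not data from the manuscript:

```python
def precision(double_positive, mcherry_positive):
    """Fraction of photo-activated (mCherry+) cells that are true positives.

    double_positive: cells positive for both mIFP and mCherry
    mcherry_positive: all cells positive for mCherry
    """
    if mcherry_positive <= 0:
        raise ValueError("no photo-activated cells were counted")
    if double_positive > mcherry_positive:
        raise ValueError("double positives cannot exceed mCherry positives")
    return double_positive / mcherry_positive


# Hypothetical counts: 800 of 1000 photo-activated cells also express mIFP.
print(precision(800, 1000))  # 0.8
```

Recall would additionally need the number of true positives that were never photo-activated, which, as noted above, depends on how much of the plate the microscope visits.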

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      We agree with the reviewer and have changed the following description:

      p. 5 line 20

      The precision varied with the initial percentage of mIFP positive cells and ranged from 80% to ~100% (initial percentage of mIFP positive cells ranging between 2.3% and 43.7%) (Fig. 2f). Precision is expected to fall below 80% with initial percentage of mIFP positive cells less than 2.3%. However, these results indicate that optical enrichment can be used to identify hits with high precision even at relatively low hit rates.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      We apologize for our confusing description. To avoid the confusion, we rewrote the paragraph describing the experiment and added a schematic (Fig. 3a) to better describe this experiment. We also simplified the result by just presenting the lower right panel of original Fig. 2b (current Fig. 3b) and moved the other data into supplementary figures (Fig. S2).

      p. 6 line 4

      mIFP negative cells and mIFP positive cells were separately infected with two different CRISPRa sgRNA libraries (6100 sgRNAs for mIFP negative cells; 860 sgRNAs for mIFP positive cells) at a low multiplicity of infection (MOI) to guarantee a single sgRNA per cell. Note that in these experiments, the sgRNAs only function as barcodes to be read out by sequencing, but do not cause phenotypic changes as the cells do not express corresponding CRISPR reagents. These two populations were then mixed at a ratio of 9:1 mIFP negative cells: mIFP positive cells. We again used mIFP expression as our phenotype of interest (outlined in Fig. 3a). Two biological replicates were performed and at least 200-fold coverage of each sgRNA library was guaranteed throughout the screen, including library infection, puromycin selection, imaging/photo-activation and FACS.
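
The coverage and infection-rate figures can be related with a short calculation. The sketch below assumes that the number of sgRNAs per cell follows a Poisson distribution, which is a common modeling assumption and is not stated in the manuscript; the function names are hypothetical. The 10-15% BFP-positive window is taken from the Methods text quoted earlier.

```python
import math


def cells_for_coverage(library_size, fold_coverage=200):
    """Minimum number of cells needed to keep the stated library coverage."""
    return library_size * fold_coverage


def moi_from_infected_fraction(infected_fraction):
    """Poisson estimate of the MOI from the fraction of infected cells."""
    return -math.log(1.0 - infected_fraction)


def multiply_infected_share(infected_fraction):
    """Among infected cells, the Poisson share carrying more than one sgRNA."""
    moi = moi_from_infected_fraction(infected_fraction)
    exactly_one = moi * math.exp(-moi)
    return 1.0 - exactly_one / infected_fraction


print(cells_for_coverage(6100))  # 1220000
print(cells_for_coverage(860))   # 172000
for fraction in (0.10, 0.15):
    print(fraction, round(multiply_infected_share(fraction), 3))
```

Under the Poisson assumption, an infection rate of 10-15% means that roughly 5-8% of infected cells would carry more than one sgRNA, so the large majority of selected cells carry a single guide.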

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      We have clarified our description. There are no clear BFP+ and BFP- populations but instead one continuous population due to the background expression of BFP from the dCas9 construct: dCas9-KRAB-BFP (which is now clearly indicated in the manuscript). On top of the dCas9-KRAB-BFP, another BFP is encoded on the sgRNA construct, which leads to a higher BFP expression level.

      There was no gating in the experiment. The grey dots in the figure represent wild type cells without viral transduction, while the red dots (partially covered by the grey dots) represent cells infected with the two negative control sgRNAs. We mistakenly stated in the legend of original Fig. S3 (current Fig. S3a) that these were FACS data; however, the data were acquired by imaging. We apologize for the confusion and thank the reviewer for detecting the issue. We completely rewrote the legend to Fig. S3a (original Fig. S3) to clarify.

      We now include the number of cells analyzed and the number of replicates for the other flow plots and histograms in the manuscript.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      To address the reviewer’s point, we have included the following sentence:

      p. 8 line 23

      We defined nuclear size as the 2D area in square microns measured by H2B-mGFP using an epifluorescence microscope, as determined by automated image analysis (Fig. 4a and Supplementary file 2).
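
As an illustration of this size metric, a 2D area in square microns follows from the number of pixels in a segmented nucleus and the calibrated pixel size. The pixel size and pixel count below are hypothetical and are not values from the manuscript.

```python
def nuclear_area_um2(n_pixels, pixel_size_um):
    """Area of a segmented nucleus: pixel count times the area of one pixel."""
    return n_pixels * pixel_size_um ** 2


# Hypothetical example: 2000 pixels at 0.325 µm per pixel.
print(nuclear_area_um2(2000, 0.325))
```

With these made-up values the nucleus covers about 211 square microns.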

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen applies to the FSC and GFP screens.

      We clarified the description of the statistical analysis of the screen (see new/changed text below). Mann-Whitney p-values for the two replicates were calculated independently. The Mann-Whitney U test was not performed against a randomized set of phenotypic scores, but against the phenotypic scores of the 22 non-targeting control sgRNAs that were part of the library. Because there are only 22 control sgRNAs (adding more control sgRNAs would increase the size of the library and reduce the number of genes that can be screened within a given amount of time), the statistical significance of testing genes against these controls is not expected to be very high, and direct approaches such as correction for multiple hypothesis testing are not expected to yield hits. Instead, we calculated a score combining the severity (phenotypic score) and the trustworthiness (Mann-Whitney p-value) of the phenotype (a method previously developed in the Weissman lab at UCSF: https://github.com/mhorlbeck/ScreenProcessing, ref. 24). We thank the reviewer for suggesting false discovery rate control as a better method for choosing a cutoff. We modified our original analysis and now determine the threshold of our score based on a calculated empirical false discovery rate (eFDR). We used this approach to maximize the number of true hits and relied on a repeat of the screen and follow-up testing of hits to narrow down true hits. We added the following part to the Method section and added an analysis example to the supplementary files (Supplementary file 9).

      p. 22

      Bioinformatic analysis of the screen

      Analysis was based on the ScreenProcessing pipeline developed in the Weissman lab (https://github.com/mhorlbeck/ScreenProcessing) (Horlbeck, M. A.; Gilbert, L. A.; Villalta, J. E.; Adamson, B.; Pak, R. A.; Chen, Y.; Fields, A. P.; Park, C. Y.; Corn, J. E.; Kampmann, M.; Weissman, J. S. 2016). The phenotypic score (ε) of each sgRNA was quantified as previously defined (Kampmann, M.; Bassik, M. C.; Weissman, J. S. 2013) (Supplementary file 9). For the mIFP proof-of-principle screen, the phenotypic score of each group was the average score of the two sgRNAs assigned to the group, averaged between the two replicates unless otherwise described. For the nuclear size screen, FSC screen and H2B-mGFP screen, genes were scored based on the average phenotypic scores of the sgRNAs targeting them. For the nuclear size screen, phenotypic scores were further averaged between 4 runs for each replicate. For the nuclear size screen, FSC screen and H2B-mGFP screen, sgRNAs were first clustered by transcription start site (TSS) and scored by the Mann-Whitney U test against 22 non-targeting control sgRNAs included in the library. Since only 22 control sgRNAs were included, significance of hits was assessed by comparison with simulated negative controls that were generated by random assignment of all sgRNAs in the library, and phenotypic scores of these simulated negative controls were scored in the same way as phenotypic scores for genes. A score η that includes the phenotypic score and its significance was calculated for each gene and simulated negative control. The optimal cut-off for score η was determined by calculating an empirical false discovery rate (eFDR) at multiple values of η as the number of simulated negative controls with score η higher than the cut-off (false positives) divided by the sum of genes and simulated negative controls with score η higher than the cut-off (all positives). 
The cut-off score η resulting in an eFDR of 0.1% was used to call hits for further analysis (Supplementary file 9). An example analysis is described in detail in Supplementary file 9 and raw counts and phenotypic scores for all four screens are listed in Supplementary file 10 and 11.
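
The eFDR definition above can be sketched in a few lines. This is an illustration only, not the ScreenProcessing code: the function names, the toy η scores, and the rule of taking the smallest cut-off that meets the target are assumptions made for the example, and η itself is treated as already computed.

```python
def empirical_fdr(gene_scores, control_scores, cutoff):
    """eFDR at a cut-off: simulated controls above it divided by everything above it."""
    false_positives = sum(score > cutoff for score in control_scores)
    all_positives = false_positives + sum(score > cutoff for score in gene_scores)
    if all_positives == 0:
        return 0.0  # nothing is called a hit, so nothing is falsely called
    return false_positives / all_positives


def pick_cutoff(gene_scores, control_scores, target_efdr):
    """Smallest observed score whose eFDR is at or below the target.

    Scanning observed scores in ascending order is a choice made for this
    sketch; returns None if no candidate cut-off meets the target.
    """
    for cutoff in sorted(set(gene_scores) | set(control_scores)):
        if empirical_fdr(gene_scores, control_scores, cutoff) <= target_efdr:
            return cutoff
    return None


# Toy η scores, made up for illustration only.
genes = [0.1, 0.5, 2.0, 3.0, 4.0]
simulated_controls = [0.05, 0.2, 0.3, 1.0]

print(empirical_fdr(genes, simulated_controls, 0.3))  # 1 control / 5 positives
print(pick_cutoff(genes, simulated_controls, 0.1))    # 1.0
```

In the toy data, the cut-off must rise to 1.0 before no simulated control scores above it, leaving three genes as hits.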

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      We agree with the reviewer and have changed the sentence into:

      p. 10 line 17

      These data suggest that a direct measurement utilizing a microscope can provide different information and reveal hits that are inaccessible using other screening approaches.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      We agree with the reviewer and have modified the sentence with a citation:

      p. 12 line 20

      This is significantly faster than in situ methods which process millions of cells over a period of a few days(Feldman, D.; Singh, A.; Schmid-Burgk, J. L.; Carlson, R. J.; Mezger, A.; Garrity, A. J.; Zhang, F.; Blainey, P. C. 2019).

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      We have added more description in the discussion section:

      p. 13 the second paragraph

      Optical enrichment screening is also possible for phenotypic screens with relatively low hit rates (defined as the fraction of all genes screened that are true hits). The ability to detect hits at low hit rates in our method depends on multiple factors, including: 1) the penetrance of the phenotype; 2) cellular fitness effect of the phenotype; 3) detection and photo-activation accuracy of the phenotype; 4) limitations imposed by FACS recovery and sequencing sample preparations of low cell numbers. The first three factors vary with the phenotype of interest. We optimized the genomic DNA preparation protocol (Methods), and are now able to process sequencing samples from a few thousand cells, enabling screens of low hit rate phenotypes. In our nuclear size screen, more than 1.5 million cells were analyzed during each run, with 2000-4000 cells recovered after FACS sorting. The hit rate of this screen was 2.76%, similar to optical CRISPR screens performed in an arrayed format (de Groot, R.; Luthi, J.; Lindsay, H.; Holtackers, R.; Pelkmans, L. 2018), demonstrating the possibility to apply our approach to investigate phenotypes with low hit rates.

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      The relevant description has been moved to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      To address this point, we have added the following sentences to the legend of Fig. 5:

      The cell population is heterogeneous due to inefficient knockdown, incomplete puromycin selection, and penetrance of the phenotype. A BFP was expressed from the same sgRNA construct. Only cells with high BFP intensity, indicating successful sgRNA transduction, were included for data analysis as described in Methods.

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test, etc). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      We applied the Kolmogorov-Smirnov test and the corresponding sentence was changed into:

      p. 10 last line

      *14 out of 15 hits were confirmed to be real hits (Kolmogorov-Smirnov test two tailed p-value
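      For readers unfamiliar with the test, the following is a minimal, standard-library Python sketch of a two-sample Kolmogorov-Smirnov comparison of the kind discussed here. It is an illustration under stated assumptions, not the authors' analysis: the function name `ks_two_sample` is hypothetical, the p-value uses the asymptotic Kolmogorov series with a common small-sample correction factor, and the toy numbers are invented. An established implementation such as `scipy.stats.ks_2samp` would normally be preferred.

```python
import math


def ks_two_sample(a, b):
    """Return (D, approximate two-sided p-value) for two samples.

    D is the largest vertical distance between the two empirical CDFs.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the true value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Toy nuclear-area values (arbitrary units), invented for illustration.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
knockdown = [6.0, 7.0, 8.0, 9.0, 10.0]
d_stat, p_value = ks_two_sample(control, knockdown)
print(d_stat, p_value)
```

      The asymptotic p-value is only approximate for samples this small; with the thousands of cells typical of imaging data the approximation is much closer.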

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      The figures were simplified to just show the example images.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      We agree with the reviewer and changed the color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      We agree with the reviewer’s point; “most” has been changed to “some”.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      The bars represent the ratio of the median level of H2B-mGFP intensity (the axis is now labeled with "GFP" rather than "FITC", the colloquial name for the channel used on the FACS machine) measured by FACS and the median nuclear size of the same population of cells measured by microscopy. We plan to perform additional experiments to measure DNA content using a DNA dye in the same cell by microscopy so that we will be able to correlate these on a cell by cell basis. Data will be added before final publication.

      Reviewer #2 (Significance (Required)):

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting (has been around) in principle. The key advantage is that it is relatively simple, accessible to many groups as it does not require robotics. However, the manuscript is so badly written and hard to follow, that it makes it difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking totally. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens and the computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens which are of course the current gold-standard.

      We do not in any way claim that the presented method replaces arrayed screens. However, most current sgRNA libraries are pooled libraries, and the few available arrayed sgRNA libraries are expensive and difficult to maintain, hence our methods to screen pooled sgRNA libraries are timely and useful. Comparisons with arrayed screens are unwarranted as no claims are made with respect to arrayed screens.

      We have clarified the manuscript in many places, and hope it is now readable and better understandable by more readers with diverse backgrounds.

      **Specific points:**

      The specificity test (Fig 1) does not make sense how it is described. If the authors spike a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also the formula used is questionable or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      We used the term specificity inadvertently and should have used precision, as also pointed out by Referee 2. This has been corrected in the current manuscript. We picked the mIFP phenotype as this was a proof of principle screen to clarify the performance of our screening approach and needed a phenotype that can be measured both by microscopy and FACS. We demonstrate that multi-parametric read-outs are possible, but do not think that the first demonstration of new technology needs such an application.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positive which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      We improved the description of this experiment. To clarify, we used mIFP in a proof of concept screen to validate whether sgRNAs infecting mIFP positive cells can be distinguished from those infecting mIFP negative cells. No phenotypic signature other than the mIFP signal is used (as described in the text). As customary in pooled screens, a primary comparison was made between the positive (optically selected) cells and the complete population. To improve the clarity of this screen, we further described the concept of pooled sgRNA screens, which may have made this section harder to follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      There are no phenotypes induced in the mIFP population (as now explicitly explained in the text). The mIFP population is isolated using optical enrichment, and we test our ability to discriminate the sgRNAs present in the enriched population. It is unsurprising that comparing to the negatively selected population (which is not possible in most other pooled screens) is significantly better than comparing against the total population (as customary in pooled screens).

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      There are no arrayed screens performed in our study.

      Reviewer #3 (Significance (Required)):

      Overall, there is no sufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      This study reports a rapid and high-throughput CRISPR-based phenotypic screen approach consisting of selecting cells with phenotypes of interest, labeling them by photo-conversion and isolating them by FACS. The idea of the method is interesting (has been around) in principle. The key advantage is that it is relatively simple, accessible to many groups as it does not require robotics. However, the manuscript is so badly written and hard to follow, that it makes it difficult to judge the technology, to really understand how the experiments were done and whether the results are interpreted correctly. Strictly speaking, it is unclear whether and how good scientific practices (GSP) have been followed, as the description of the experiments is sometimes lacking totally. Consequently, it is impossible to seriously evaluate this study and judge whether the technology described is really promising. It is probably less sensitive than arrayed screens, in all likelihood can miss hits that affect growth, cannot capture as many phenotypic classes as one would like from high-content screens and the computational and experimental workflow is more complicated. It is puzzling that the authors don't even compare the results with arrayed screens which are of course the current gold-standard.

      Specific points:

      The specificity test (Fig 1) does not make sense how it is described. If the authors spike a certain percentage of cells that can be photoconverted, when analysing the outcome, there will be three classes: mIFP positive, mIFP/mCherry positive and negative. How can they calculate specificity if they do not know whether they converted all mIFP cells? Also the formula used is questionable or is there an error? Furthermore, it is totally unclear how many cells were used and how they were scanned. If they took 90 negative cells and 10 mIFP cells, getting them all back is easy. If they start with 10e9 cells, the specificity should be quantified. Furthermore, the phenotype they pick is an easy and convenient one. Much more challenging is to apply it on a multi-parametric phenotype. Again, this is now the gold standard.

      In their first sgRNA assay, it is not possible to have a clear idea of what groups they are talking about. Do they mean they get phenotypic signatures which they group? How? They need to describe what they do. Here, only ~3500 genes are scanned (the 6843 is both populations and you only select from the mIFP neg population) and it took them 8hrs. This means for the genome it would require ~60h which is indeed fast. However, this experiment is not clearly described. They cannot select the negative population since there is no fluorescent marker (except false positive which are around 1.7%). So I assume they just randomly pick cells (they should really explain much better what they do!). Why go through the hassle? If these sequences are supposed to be a negative population, just pick them in the computer. Also, they cannot calculate an enrichment compared to the negative population, since two different libraries were infected. Again, I can't follow.

      I find their results about calculating scores based only on true negatives surprising. The average phenotypic score is improved from 3 to 5, which is enormous. This suggests that the phenotypes induced in the mIFP population are extremely common. These results are hard to interpret given the poor description of the experiment. It is possible that it is the same dataset as in 1, but in that case, the false negatives must be rare since the negatives can be selected by absence of both mCherry and mIFP.

      In the nuclear size screen, 6000 sgRNAs were screened. To array so many sequences would require 20 plates. They required ~40h for imaging one replicate. This is slow, imagine the time with a 60x lens.

      Significance

      Overall, there is no sufficient evidence in this manuscript to convince this reviewer that this method is valid and truly powerful. I cannot support publication in its present form.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Yan et al. present optical enrichment, a method for conducting pooled optical screens. Optical enrichment works by combining microscopy to mark cells of interest using the PA-mCherry photo-activatable fluorescent protein with FACS to recover them. The method is similar to other methods (Photostick, Visual Cell Sorting), and provides an alternative to in situ sequencing/FISH methods. The authors use optical enrichment to conduct a pooled optical CRISPRi screen for nuclear size. They identify and exhaustively validate hits, showing that optical enrichment works for its intended purpose. The development of a uManager protocol and discussion of the number of sgRNAs required for a genetic screen using optical enrichment were welcome. The authors' reported throughput of 1.5 million cells per eight hour experiment is impressive; and the demonstrated use of low cell number input for next generation sequencing appears promising. Overall, the manuscript is well written, the methods clear and the claims supported by the data presented.

      General comments

      -I found the analysis and scoring methods to be lacking, both in terms of the clarity of description and in terms of what was actually done. The authors might consider using established methods (eg https://www.biorxiv.org/content/10.1101/819649v1.full). In any case, they should revise the text to clarify what was done and address the other concerns raised below.

      -Relatedly, details regarding how to perform the experiments described are lacking. It is not clear from the text, figures, "Online Methods" section, and Supplementary Files whether all imaging is performed before activation, or whether each field of view is subject to an individual round of imaging followed by activation. It is also unclear whether cells in 96 well plates are sorted as 96 separate tubes or pooled into a single tube prior to sorting. Furthermore, at a minimum, the following details are requested for each optical enrichment "run". These details are critical considerations for those who seek to use optical enrichment in their own laboratories:

      • Seeding density
      • Time elapsed (in hours) between cell plating and optical enrichment
      • The number of fields of view examined
      • The median number of cells per field of view; the proportion of each plate's surface area that is imaged and photo-converted
      • The total time taken (in hours) to perform imaging and photoconversion
      • The gating protocol used for sorting by FACS (preferably including a figure with example gates for one or two experiments)

      The gating protocol is described for the genetic screen but not for the control experiments.

      -The authors use PA-mCherry. There are a variety of other photo-activatable fluorophores available, and it would be good for them to comment on why they chose PA-mCherry. Also, since the method is supposed to be used for generic pooled optical screens, it would be good for the authors to comment on what colors remain available for imaging cellular structures.

      -In general, the figures are hard to read, with most space being dedicated to beautiful but complex schematics/workflows. Points and fonts should be bigger, and the authors should consider revising the schematics to take up less space.

      -There is extensive use of editorializing adverbs. Adverbs such as "highly" (abstract and page 15), "easily" (pages 4 and 11), "completely" (page 11), and "only" (page 12) are unnecessary at best and unsupported by the data at worst (e.g. cells are not "completely" separable with 100 ms photo-conversion, see page 11 and Figure 1C). Please remove "completely" from page 11 and consider removing other adverbs as well.

      -Apologies if I missed it, but I couldn't find a data availability statement. Sequencing reads from the experiments should be deposited in SRA or GEO and made available upon publication.

      Specific comments

      Pages 5/6 - The authors present experiments that show that optical enrichment is highly specific for desired cells. But, they should consider presenting precision (fraction of called positives that are true positive) and recall (fraction of all true positives that are called positive) instead. I think these relate more directly to a pooled optical screen than specificity.
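      The two metrics suggested here follow directly from the definitions given in parentheses. A minimal Python sketch, with a hypothetical function name and invented cell counts that are not taken from the manuscript:

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP): fraction of called positives that are true.
    Recall    = TP / (TP + FN): fraction of all true positives that are called."""
    called = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 80 sorted cells are truly positive, 20 sorted
# cells are not, and 20 truly positive cells were missed.
precision, recall = precision_recall(true_pos=80, false_pos=20, false_neg=20)
print(precision, recall)
```

      Neither metric depends on the number of true negatives, which is why they describe a screen with rare positives more directly than specificity does.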

      Page 6 - Related to the above point, the authors state "These results indicate the assay yields reliable hit identification regardless of the percentage of hits in the library." This statement seems too strong given that the authors looked at specificity experimentally with a mixture of ~1% mIFP positive cells. In fact, hits might be much less than 1% of the total population of cells, and specificity would certainly fall from the 80% measured at 1% of the total population. The authors should do a bit more to fairly discuss their ability to find rare hits.

      Pages 6/7 - The authors perform a validation experiment using two different sgRNA libraries, infecting mIFP- and mIFP+ cells separately. Then, they demix these populations via optical enrichment, sequence and compute a phenotype score for sgRNAs or groups of sgRNAs. The way the experiment is described and visualized is extremely confusing. If I understood correctly (and I am not sure that I did), the bottom right panel of Figure 2b shows that if sgRNAs are (randomly?) paired AND two replicates are combined then optical enrichment nearly perfectly separates all (combined, paired) sgRNAs in the two libraries. The authors should rewrite this section, especially clarifying what is meant by "1 sgRNA/group and 2 sgRNA/group," and consider changing Figure 2b (perhaps just show the lower right panel?).

      Page 8 - Related to Supplementary Figure 3, why are there not clear BFP+ and BFP- populations but instead one continuous population? How was the gating determined (e.g. how was the boundary between red and gray picked)? Here, and generally, flow plots and histograms of flow plots should indicate the number of cells. If replicates were performed, they should be included.

      Page 8 - "Nuclear sizes...". The authors should say in the main text what size metric was used.

      Page 9 - I am a little confused about the statistical analysis of the screen. In Supplementary File 1, the authors state that p-values were "calculated based on comparison between the distribution of all the phenotypic scores of sgRNAs targeting to the gene/assigning in the group and the one of negative control sgRNAs in the libraries." I presume this means that all phenotypic scores (across replicates) of all sgRNAs targeting each gene were included in a Mann Whitney U test with a single randomized set of phenotypic scores. If that's right, it seems like an odd way to get p-values. Better would be a randomization test, where a null distribution of phenotypic scores for each gene is built by randomizing sgRNA-level scores many times. Then the actual phenotypic score is compared to the randomized null distribution, yielding a p-value. In any case, the authors must clarify what they did in the main text and Supplementary File 1.
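      One way to read the randomization test proposed above is sketched below in Python. This is an interpretation, not the referee's or the authors' procedure: the function name, the use of the mean as the gene-level score, the one-sided direction (larger scores are more extreme) and the number of iterations are all assumptions made for illustration.

```python
import random


def randomization_p_value(all_sgrna_scores, gene_sgrna_scores,
                          n_iter=10000, seed=0):
    """One-sided randomization p-value for a gene-level score.

    The null distribution is built by repeatedly drawing, without
    replacement, as many sgRNA scores as the gene has and averaging them.
    """
    rng = random.Random(seed)
    k = len(gene_sgrna_scores)
    observed = sum(gene_sgrna_scores) / k
    at_least_as_extreme = 0
    for _ in range(n_iter):
        draw = rng.sample(all_sgrna_scores, k)
        if sum(draw) / k >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)


# Toy library of 100 sgRNA scores, invented for illustration.
library = [i / 100 for i in range(100)]
print(randomization_p_value(library, [0.99, 0.98, 0.97], n_iter=2000))
```

      The smallest p-value this estimator can return is 1 / (n_iter + 1), so the number of iterations limits how far into the tail a gene can be resolved.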

      Page 9 - It does not appear that the p-values presented in Figure 3c have been adjusted for multiple hypothesis testing. This should be done.
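      The comment does not name a correction method. As one common choice, a Benjamini-Hochberg adjustment can be sketched in a few lines of Python; the function name is hypothetical and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Genes whose adjusted value falls below a chosen threshold would then be called at that false discovery rate.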

      Page 9 - "A value of the top 0.1 percentile of control groups was used as a cutoff for hits." Why? This seems arbitrary. It seems like appropriate false-discovery rate control would enable a more rigorous method for choosing a cutoff.

      Page 9 - The same comments regarding analysis and scoring of the optical enrichment screen apply to the FSC and GFP screens.

      Page 9 - "These data suggest that a direct measurement utilizing a microscope can provide significant improvement in hit yield even for phenotypes that could be indirectly screened with other approaches." I think this conclusion is too strong. It rests on the assumption that the FSC/GFP phenotypes should have the same set of hits as the microscope phenotype (larger nuclear area). This may not be the case. For example, genes whose inactivation increases GFP expression would be hits in the former, but not latter case. The authors should moderate this statement.

      Page 11 - "This is significantly faster than the in situ methods." The authors should provide a citation and an actual comparison to the speed of in situ methods.

      Page 12 - I think the authors could say a bit more about the possibility of low hit rate screens. How low do they think it is feasible to go? What hit rates are expected based on existing arrayed optical screens?

      Page 14 - It is weird that the discussion includes a fairly important couple of paragraphs that seem to belong in the results (e.g. the text surrounding Figure 4b and c). Obviously, I don't want to prescribe stylistic changes, but I suggest the authors consider moving this description of the experiments/analyses to the results.

      Page 14 - The authors validate their hits individually, and observe that expression of hit sgRNAs does increase nuclear size in some cells. But, many/most cells remain control-like in these validation experiments. The authors should comment on why this is the case (e.g. inefficient knockdown, cell cycle effects, etc).

      Page 14 - It would be nice to formally compare the control and sgRNA distributions in each panel of 4a and Supplementary Figure 5 (e.g. with a Kolmogorov-Smirnov test, etc). That would allow a more precise statement to be substituted for "14 out of 15 hits (the exception was TACC3) were confirmed to be real hits, with cells exhibiting larger nuclei after knock down (Fig. 4a and Fig. S5)," which is not quantitative.

      Figure 2a - I am not sure it is necessary to show the entire workflow again. The first and possibly last panels are the informative ones here.

      Figure 3a - Same comment as above - these workflow panels take up a lot of real estate and I suggest simplifying them if possible.

      Figure 3c - At least on my PDF/screen, the "scrambled control" points appear very light gray and are impossible to find. They should be an easier to spot color.

      Figure 4b - "Most cells developed a larger cellular size and higher H2B-mGFP level after knock down." I think it would be more accurate to say that the median cell size/GFP level increased, or that some cells developed larger sizes/median GFP levels.

      Figure 4c - I don't understand "Normalized FITC/nuclear size." Do the bars show the mean/median of a population (if so, why not show a dot plot or box plot or violin plot)? Also, what is FITC (I presume it's GFP levels)?

      Figure 4c - "Most cells maintained a constant ratio between nuclear size and DNA content..." I'm not sure where DNA content came from. Are the authors assuming that their H2B-mGFP is a proxy for DNA content? Or was some other measurement made? If the former, is there a citable reason why this is a good assumption?

      Significance

      I don't generally comment on significance in reviews. Since ReviewCommons is specifically asking, I'll say that this manuscript describes optical enrichment, a method that is an extension of previous work and is substantially similar to a previously published method, Visual Cell Sorting. However, given the timing, it is obvious that these authors have been working independently on optical enrichment. Since the application is distinct, and optical enrichment incorporates some nice features like software to make it easier to execute, it is clearly of independent value.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript Yan et al describe a method to perform imaging based pooled CRISPR screens based on photoactivation followed by selection and sorting of the cells with the desired phenotypes. They establish a system in mammalian RPE-1 cells where they integrate a photo-activatable mCherry, identify the cells of interest under the microscope based on a phenotype, automatically activate the mCherry fluorescence in these cells and then sort the desired populations by FACS. They demonstrate the reliability of their enrichment method and finally use this approach to look for factors that regulate nuclear size by a targeted pooled CRISPR screen.

      Major points:

      1. This year Hasle et al described a very similar approach that they name Visual Cell Sorting. In this case, they use a photoconvertible fluorescent protein (green-to-red conversion) to select cells with a certain visual cellular phenotype and enrich those by FACS. The Hasle et al 2020 MSB paper is only mentioned together with the other methods in the introduction in one sentence (ref #19 in this manuscript):

      "Recently, several in situ sequencing15,16 and cell isolation methods17-20 were developed which allow microscopes to be used for screening. However, these methods contain non-high throughput steps that limit their scalability."

      I think the current citation of the Hasle et al paper is not really fair. The idea and the execution of the two approaches are almost exactly the same. Here, the authors concentrate on a CRISPR based application, but obviously the applications of the method are not limited to that. The authors should discuss how these similar ideas can be used in several different applications.

      2. While I understand that the authors mean conversion from the dark state to fluorescent state when they describe their photo-activatable mCherry, I think the term "photo-conversion" can be confusing for the general reader since typically photo-conversion refers to a change in color. I would suggest sticking to the term photo-activation.
      3. For validation of the hits coming from the nuclear size screen: Did the authors have any controls making sure that the right targets were down-regulated? This might be obvious for some of the targets (e.g. CPC proteins that are known to induce division errors display the nuclear fragmentation that the authors also observe) but especially for the ones that are less known or unknown to induce any nuclear size change, it will be important to demonstrate the specificity of the targets. In addition, it is not clear from the figure legends and the material and methods if these phenotypes are verified by the 3-4 gRNAs they use in the validation. Are the histograms representative of a single experiment with one gRNA or a combination of gRNAs in different experiments? The method of replication for the data presented in Fig4 is unclear.

      Minor points:

      1. Related to major point #3: I could not find much experimental info on how the hits from the screen were verified in materials and methods.
      2. The legend of Figure 4c is not describing what the plot is showing. Instead it tells the readers the authors' interpretation of the data.
      3. Figure S1b there is a typo

      Significance

      I think the idea of performing pooled screens coupled to microscopy is exciting and this approach definitely has more potential than the Craft-ID approach that the authors also discuss in their manuscript. In addition, the approach that is described in this manuscript is convincing, and although the analysis part will require more work in the future (to adapt the software to recognise different types of phenotypic readouts) to make it accessible to the scientific community, the authors present sufficient evidence that the system can be robust. They also present some clever ideas, such as calculating enrichments with different photo-activation times (2sec vs 100ms) followed by separation of these populations by FACS.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We would like to express our utmost gratitude to the three anonymous reviewers for their constructive and insightful comments on our manuscript. We broadly agree with all comments made and have uploaded a preliminary revised version with changes highlighted in bold. We now deal with each of the reviewer comments in turn.

      Reviewer #1

      L50-52: Can you predict where the unmapped read came from? Could viral infections be the source as in land plants?

      Having done a crude examination of unmapped reads, we couldn't find compelling evidence of them being of viral origin. In fact, the unmapped fraction was in the same range as seen for other sRNA libraries in our lab, which we found to occur for a number of reasons, such as sequencing errors, incomplete assembly, and differences between the sequenced lines and the reference line. Those all result in unmapped reads, which is also caused by the stringent mapping we employed (0 mismatches).

      L67-68, which is the explanation?

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper and we have thus decided to remove this sentence.

      Fig 1D the reference to the A,C,G,U 5' should be re-positioned within Figure 1D panel space.

      Thanks, this has been addressed.

      Figure 3: it could be a supplementary figure based on the relevance given in the manuscript to this point.

      We agree, and have moved Fig3 to Supplement.

      P5, line 107: while commenting on strand bias there seems to be a mistake in strong bias definition, it should be x 0.8, not "strong bias (0.2

      Thank you for pointing this out; we have now corrected this error in the text.

      P5, line 110: marked changes regarding locus size are not as striking in my opinion, in particular log size 6 and following, which is not marked in the graph (the cut off between 6 and 8). Maybe this curve should be split into two distribution graphs based on some important features (as repetitiveness?) that might allow a better definition of cut-offs.

      Thank you for pointing this out. You are correct that the changes in the density distribution are not as striking for locus size. A great deal of deliberation on our part went into deciding what to do about this. In the end, we decided that for the size classes there was benefit in having several different classes with the understanding that having additional potentially redundant cut-offs would not adversely affect the analysis. In doing this, we were partially driven by the albeit subtle changes in the curve, but also by the desire to have size classes that were biologically relevant and informative. For example, a locus 3000nt captures the long tail. However, we neglected to fully explain these subtleties in our decision-making, something we have now rectified through some added explanation in the text. These choices were validated by the way size classes are differentially associated with different locus clusters in Figure 8.

      Fig 5: the legend has the C subfigure twice, the second should be D.

      Thank you for highlighting this. It has now been corrected.

      Table 1: I believe the data would be better presented in a plot, potentially something similar to the plot in Figure 1 A and B. The numbers are already presented in the supplementary spreadsheet.

      Thanks for pointing this out. We agree with this suggestion and have replaced Table 1 with a Figure (Fig 5) which is indeed a better way to present those results.

      Fig 6A: The boxplots regarding Stability of the clusters should be better described. What exactly does the y-axis in each "small plot" represent?

      Thank you for pointing this out; we understand that this isn't clear at the moment. Briefly, for this analysis we performed the clustering multiple times, each time with a random sample of the loci (with replacement) of the same size as the original dataset. We then calculated the proportion of loci that retained their original clustering. We have clarified this in the figure legend and also elaborated on the approach in the methods section to ensure that it is better described.
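      The resampling procedure described in this reply can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: it assumes k-means as the clustering step, and the helper names (`kmeans`, `agreement`, `bootstrap_stability`) and all parameters are hypothetical. Cluster labels from each resampled run are matched to the reference run by trying every relabeling, which is only practical for small k.

```python
import random
from itertools import permutations


def nearest(p, centres):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])))


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    centres = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # An empty group keeps its previous centre.
        new_centres = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def agreement(ref, new, k):
    """Best fraction of matching labels over all relabelings of `new`."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for r, n in zip(ref, new) if r == perm[n])
        best = max(best, hits)
    return best / len(ref)


def bootstrap_stability(points, k, n_boot=20, seed=0):
    """Proportion of points keeping their reference cluster, per bootstrap run."""
    rng = random.Random(seed)
    ref_centres = kmeans(points, k, rng)
    ref_labels = [nearest(p, ref_centres) for p in points]
    scores = []
    for _ in range(n_boot):
        # Resample with replacement, same size as the original dataset.
        sample = [rng.choice(points) for _ in points]
        centres = kmeans(sample, k, rng)
        labels = [nearest(p, centres) for p in points]
        scores.append(agreement(ref_labels, labels, k))
    return scores


# Toy data: two small point clouds standing in for loci in MCA space.
cloud_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
cloud_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
scores = bootstrap_stability(cloud_a + cloud_b, k=2)
```

      Each entry of `scores` corresponds to one resampled run; a boxplot of such values per k would resemble the kind of stability panel discussed for Fig. 6A, under the assumptions stated above.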

      P6, line 142: analyses of stability and variance shows 7 as the optimal k, while gap statistics and NMI suggested 6 as the optimal. It is not clear why 6 was preferred. The MCA section in Methods is unclear regarding this point too.

      Thank you for querying this. The process of choosing the appropriate value of k is a complicated one and we appreciate that the explanation could be clearer. After your comment, we re-visited our decision-making process and were reassured that a k value of 6 rather than 7 was indeed appropriate. The stability plots in Fig. 6A start with k=2 and it can be clearly seen for k=6 that stability is comparatively high for dimensions 7-10. Indeed, k values of 2, 3 and 6 seem to be the only feasible values. k=7 is fairly unstable for all dimensions from 1-8. We have reworded the methods to make this clearer.

      Fig S2-S5: please check legends, they are identical, although they should cover examples of loci in LC2 through LC5. These figures are not cited in the text, only S1 and S2.

      Thanks for pointing this out. This is now corrected and we have referenced all figures in the main text.

      Fig 9: I suggest using different colors in density plots to ease interpretation. LC tracks could share a color and Gene, TEs, DNA meth, and All loci should have a different color each.

      A good suggestion - this has been replotted with different colours.

      Supplementary Files S1: The full-annotated locus map should be provided as a spreadsheet file or as a text (.csv) file, not as a pdf file.

      Thanks for pointing this out. We originally submitted this file in GFF format. We are not sure why it got converted. We will make sure it is in the appropriate format in the final version, especially having suffered from the pains of pdf tables ourselves in the past.

      I may be misunderstanding Fig. 6E, but it looks strange that the observed sum-of-squares is smooth, but the expected is not. Is it possible that the in-figure reference is inverted?

      Indeed, the colours were inverted. Thanks a lot for that spot, we have now swapped them around.

      Reviewer #2

      I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      Thank you for this very insightful comment, which has helped us to reflect on the methodological approach. While it is likely that there are some RNA breakdown products picked up in the sRNA sequencing, we do not think that the locus map as a whole is undermined by this. For example, 54% of loci have a predominance for 21-nt sRNAs and 18% for 20-nt sRNAs, so the majority of sRNA loci do have a predominance for a specific RNA size.

      However, your point does raise a very valid concern with implications for the interpretation of LC4. Although we posit some explanations for these loci (e.g. DCL-mediated sRNA production without an accessory protein to provide PAZ domain-like sRNA measurement), given the very strong strand bias and association with genic regions we do agree that there is a risk that these loci predominantly represent degradation fragments. Therefore, we have now reworded how we discuss LC4 in the discussion to reflect this. This also reveals a key advantage of the clustering approach in that, should LC4 indeed represent degradation products, they have been successfully grouped together into a separate cluster such that they don't undermine the insights gained from the other locus clusters.
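      The reviewer's suggestion of tracking the proportion of "DCL-sized" RNAs within each locus could be prototyped along the following lines. This is a hedged sketch, not an analysis performed in the manuscript: the size set (20 and 21 nt, chosen here only because those sizes are mentioned in the reply), the 0.5 cut-off, the function names and the example loci are all assumptions made for illustration.

```python
from collections import Counter


def dcl_fraction(read_lengths, dcl_sizes=(20, 21)):
    """Fraction of a locus's reads whose length is in the assumed DCL size set."""
    if not read_lengths:
        return 0.0
    counts = Counter(read_lengths)
    return sum(counts[s] for s in dcl_sizes) / len(read_lengths)


def flag_loci(loci, min_fraction=0.5, dcl_sizes=(20, 21)):
    """Label each locus by whether its reads are concentrated at DCL sizes."""
    return {
        name: ("DCL-like"
               if dcl_fraction(lengths, dcl_sizes) >= min_fraction
               else "possible degradation")
        for name, lengths in loci.items()
    }


# Hypothetical loci: one with a narrow size distribution, one with a wide one.
example = {
    "locus_A": [21, 21, 21, 20, 21, 22],
    "locus_B": [15, 18, 24, 27, 30, 21],
}
flags = flag_loci(example)
```

      A flag of this kind would be best read alongside strand bias and exon overlap rather than on its own, since a narrow size distribution is only one of the features the reviewer lists.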

      One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      Thank you for pointing this out. In our pursuit to share as many details as possible, we appreciate that this can be too verbose, as rightfully noticed here. In order not to compromise detail too much, we have created a second, toned-down version as a CSV file, which now includes essential details such as name, position and LC. As for the gff, we kept all details in since it loads quickly in a genome browser, but also into other tools such as R, in which those features can be used as efficient filters.
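      Deriving a slimmed-down table from a verbose GFF3 file, as described in this reply, might look like the sketch below. It relies only on the standard nine-column GFF3 layout with `key=value;` attributes; the attribute keys `ID` and `LC`, the example record and the function names are assumptions, since the actual field names in Sup File S1 are not given here.

```python
import csv
import io


def parse_attributes(field):
    """Turn a GFF3 attribute string 'k1=v1;k2=v2' into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def slim_rows(gff_lines, keep=("ID", "LC")):
    """Yield seqid, start, end, strand plus the selected attributes per feature."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        yield [cols[0], cols[3], cols[4], cols[6]] + [attrs.get(k, "") for k in keep]


def to_csv(gff_lines, keep=("ID", "LC")):
    """Return the slimmed table as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["seqid", "start", "end", "strand", *keep])
    writer.writerows(slim_rows(gff_lines, keep))
    return out.getvalue()


# Hypothetical record; real files would be read line by line from disk.
demo = [
    "##gff-version 3\n",
    "chromosome_1\tsegmentation\tsRNA_locus\t100\t250\t.\t+\t.\t"
    "ID=CSRL000001;LC=LC3;note=verbose text\n",
]
csv_text = to_csv(demo)
```

      Keeping the full GFF3 alongside such a table preserves the detailed attributes for filtering while giving browser users a lighter file.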

      Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      We have included another table, Sup Table S6 (i.e. "Supp_Table_S6_library_ENA_accession"), which includes direct ENA accession numbers as well as numerous other metadata.

      Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      The snapshots have been improved to ensure better readability.

      Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      Thank you for querying this. After much closer inspection of the papers cited by Casas-Mollano et al. as evidence of the 23nt peak, the evidence for the 23nt peak doesn't seem that strong and may even be a mistake on their part. Nonetheless, it is far from a critical piece of information for this paper and we have thus decided to remove this sentence.

      Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      We agree and have changed it accordingly.

      Figure 4 caption: The acronym "CRSL" should be defined.

      CRSL has now been duly defined in the manuscript.

      Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      We have used the appropriate bibtex code to reference this Zenodo share (https://zenodo.org/record/3862405/export/hx). The current citation format somehow omits some information. We hope this will be fixed by the publisher, but we have also provided the full DOI address in the “additional information” section just in case. We will keep an eye on how it comes out.

      Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Thank you for this suggestion, we have adapted the title to make it more descriptive.

      Reviewer #3

      …the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Thank you for these insightful comments. As we followed a very similar methodological approach to that used to produce the Arabidopsis sRNA locus map published in Hardcastle et al. (2018), we wanted to take the opportunity to compare the results and build upon the ongoing discussion concerning the evolution of sRNA mechanisms in Chlamydomonas (e.g. Valli et al. 2016). Your point about the possibility of an ancestral progenitor with greater diversity that was then lost is very valid. You are also of course correct about the limitations to what can be concluded from this study and the limited comparisons that can be made. We see our approach as a useful tool for hypothesis generation which can be complemented by more in-depth exploration in the future. With this in mind, and taking on board your comments, we have elaborated on our discussion of the evolutionary implications of our study, which we hope now gives a more balanced account.

      I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      That information was indeed missing, thanks for bringing it up. We have now included this in the gff file (column LC) as well as in another cleaner table (Supp_Table_S7_loci_class_annotation).

      Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      Thanks for pointing this out; this was an error on our part, and the captions have now been corrected. Unfortunately, due to the ongoing pandemic-related restrictions, we have so far been unable to get a genome browser session running to create more locus figures.

      If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      Thank you for this suggestion. In fact, not all annotation features were used predictively in the MCA and clustering process, and so these "supplementary" annotations as outlined in supplementary table S3 can provide some cross-validation. With that in mind, we have now included an additional heatmap as a supplementary figure which shows associations for some of these supplementary annotations as well as corresponding explanations in the text. Further validation is provided by the chromosome tracks in figure 9 showing the distinct genomic distributions of each locus cluster despite chromosomal location not being a factor in the clustering.

      Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      Thank you for pointing this out, we have now corrected this error.

      Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      Table 1 has been replaced by the new Fig 5, annotation/abbreviations should now be more obvious.

      Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Thank you for pointing this out. We have now moved the reference to the paragons to earlier in the section where we introduce the six clusters. We hope this is now clearer.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript presents a detailed map of sRNA (precursor) loci in the green alga Chlamydomonas reinhardtii based on large volumes of sequencing data (145 sRNA libraries). The locus map based on a false discovery rate of less than 0.05 had 6164 loci, covering 4.1% of the Chlamydomonas reference genome. Individual loci were annotated based on both intrinsic features, such as sRNA size, 5'-nucleotide, strand bias and phasing pattern, and extrinsic features, such as sRNA expression, genotype and overlap with genomic attributes (e.g., genes, transposons, methylation levels).

      By using the intrinsic and extrinsic features of each sRNA locus and Multiple Correspondence Analysis (MCA) approaches, the sRNA loci were clustered into six distinct classes, referred to as locus class (LC) 1-6. This strategy is partly validated by the grouping of well-characterized Chlamydomonas miRNAs into the same cluster, LC3.

      As the authors state, this data-driven approach is valuable for hypothesis generation since (with the possible exception of LC3) the biogenesis and function of most sRNA loci (and of the corresponding locus classes) remain uncharacterized in Chlamydomonas. The analysis provides a framework to facilitate future characterization of the diverse types of sRNAs in this model algal system.

      However, the evolutionary implications are not clear. The authors state in the abstract that "These results are consistent with the idea that there was diversification in sRNA mechanisms after the evolutionary divergence of algae from higher plant lineages." Although in the end this may prove to be correct, the only species compared are Arabidopsis thaliana (as representative of land plants) and Chlamydomonas reinhardtii (as representative of green algae). With this very limited information it is not possible to infer the sRNA loci (much less sRNA mechanisms) in an ancestral species. It remains formally possible that an ancestral progenitor species had a greater diversity of sRNA loci that were subsequently lost in a selective manner in specific lineages. Moreover, the diversity of sRNA loci may not correlate strictly with the diversity of the RNAi machinery since, at least some loci, do not appear to be associated with RNAi components such as Dicer or Argonaute.

      Some specific comments:

      1. I may have missed it but I could not find a table listing the specific sRNA loci assigned to each of the locus classes. It would be very useful to provide the class annotation of each sRNA locus in order to facilitate future analyses of sRNA biogenesis and function.

      2. Figures S2 to S5 have the same legend but they correspond to different loci. It would be useful to provide for each locus class, as supplementary figures, two examples of typical sRNA loci.

      3. If information is available, the paper would be strengthened by some locus class validation based on features not used to generate the classification.

      4. Pg 5, line 108. I think you mean "strong bias (0.2 > x > 0.8)."

      5. Pg 7, Table 1. Some of the annotation features are obvious but some abbreviations may need clarification using footnotes.

      6. Pg 8, lines 156-157. This sentence is not clear. Additionally, the legends to Figures S2-S5 do not refer to LC2 paragon (CSRL003890).

      Significance

      Chlamydomonas reinhardtii is a model unicellular green alga, the lineage of which diverged from land plants approximately one billion years ago. Chlamydomonas encodes a great number of diverse small RNAs. However, the biogenesis and function of the majority of these sRNAs are not known. By grouping sRNA loci into specific classes (based on intrinsic and extrinsic features), this manuscript provides a framework that will facilitate the future characterization of sRNAs in Chlamydomonas and, very likely, in other algal species. This information may also contribute to our understanding of the evolution of sRNA loci within eukaryotes.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript describes the annotation of small RNA-producing loci from the green alga Chlamydomonas reinhardtii. A large number of small RNA-sequencing datasets were analyzed and used to create genome-wide annotations of small RNA-producing loci. These loci were annotated based on several features, and then classified into six major groups based on these features.

      Major comments:

      Are the key conclusions convincing? --> Yes.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? --> No

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation. --> Yes, additional analyses should be conducted, see itemized list below.

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments. --> Perhaps a few weeks to a month of analysis and revision time.

      Are the data and the methods presented in such a way that they can be reproduced? --> Yes.

      Are the experiments adequately replicated and statistical analysis adequate? --> Yes.

      SPECIFIC COMMENTS:

      1. I am concerned that the methodology used does not adequately distinguish small RNA loci that are attributable to random RNA degradation products from loci that truly fit the DCL / AGO paradigm. I think this is critical to maximize the utility of the annotations for the community. This issue was not directly addressed in the current version of the manuscript. There is cause for concern: 64% of the annotations overlap with protein-coding genes (lines 116-117), 55% with exons (line 118), and 41% of loci show strong strand bias (lines 123-124). These are all associations expected for breakdown products of mRNAs. Furthermore, only 11% of the loci were found to be dependent on CrDCL3 (line 123). Small RNA sequencing data from the other 2 DCL mutants are not yet available (line 211). One way that has been effective in angiosperms is to track the proportion of "DCL-sized" RNAs within all RNAs from each locus. Loci comprised of random degradation products will be single-stranded, generally touching exons, and have a very wide size distribution. In contrast, loci where the small RNAs are truly created by a DCL protein will have a very narrow size distribution. In any event, I think a strong effort to identify and flag small RNA loci that are less likely to be DCL / AGO silencing RNAs, and more likely to be degradation products, would be an important change to this study.

      MINOR COMMENTS:

      2. One of the key results likely to be used by others is the final GFF3 file (Sup File S1). The Description fields in this file are extremely verbose. Do these load well on a genome browser? I suggest it might be good to store most of the information currently in the Description field in a separate flat file, and limit the GFF3 descriptions to key information (locus name, the LC group).

      3. Sup Table S1 would be much more useful for future researchers if it had a column with the direct accession numbers for the raw sequencing libraries.

      4. Figures showing genome browser snapshots are too small; the text is mostly illegible on screen and when printed. This includes Figure 4 and Figures S1-S5.

      5. Lines 67-68: This is unclear to me. Did the authors do Northerns? Please clarify / re-write.

      6. Figure 2B: X-axis label, perhaps change to "number of reads in library" for clarity.

      7. Figure 4 caption: The acronym "CRSL" should be defined.

      8. Line 387: Reference #29 (line 509): There is not enough information here to find the data.

      9. Style suggestion on title: What is "secret" about the genome? I didn't really understand that first part of the title. Perhaps consider revision to make it more factual and less literary. Just "A small RNA locus map for Chlamydomonas reinhardtii"?

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.:

      This study provides a genome-wide annotation of small RNA-producing loci from Chlamydomonas reinhardtii. This will serve as a useful data resource for researchers working with this model system. The results overall confirm what is known from previous studies of Chlamy small RNAs: they are rather distinct from angiosperm small RNAs and from animal small RNAs.

      Place the work in the context of the existing literature (provide references, where appropriate).:

      This may be the first study to provide a genome-wide annotation (as opposed to a focused effort) for Chlamy small RNA populations.

      State what audience might be interested in and influenced by the reported findings:

      Chlamy researchers, especially those interested in gene silencing and genome annotations, and small RNA specialists with interest in annotations and in wide phylogenetic comparisons.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. :

      Plant microRNAs, siRNAs, genetics, and genomics.