  1. Dec 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for their constructive suggestions, which have substantially improved this work. We have comprehensively revised the manuscript, and detail individual responses below:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study by Forbes et al describes and characterizes a 2nd generation peptide-based inhibitor of the MYB:CBP interaction, termed CRYBMIM, which they use to study MYB:cofactor interactions in leukemia cells. CRYBMIM has improved properties relative to the MYBMIM peptide, and displays greater potency in biochemical and cell-based assays. Using a combination of epigenomics and biochemical screens, the authors define a list of candidate MYB cofactors whose functional significance as AML dependencies is supported by analysis of the DepMap database. Using genomewide profiling of TF and CBP occupancy, the authors provide evidence that CRYBMIM treatment reprograms the interactome of MYB in a manner that disproportionately changes specific cis-elements over others. Stated differently, the overall occupancy pattern of many TFs/cofactors shows gains and losses at specific cis elements, resulting in a complex modulation of MYB function and changes in transcription in leukemia cells. Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited. I list below a few control experiments that would clarify the specificity of CRYBMIM.

      1) Does CRYBMIM bind to other KIX domains, such as that of MED15? It would be important to evaluate the specificity of this peptide for whether it binds to other KIX domains.

      Response: We analyzed all known human KIX domain sequences, and found that the most similar one to CBP/P300 is MED15 (38% identity), as shown in revised Supp. Fig. 2D. The sequence similarity of the remaining human KIX domains is substantially lower. To determine the specificity of CRYBMIM in binding CBP/P300 versus MED15, we exposed human AML cell extracts to biotinylated CRYBMIM immobilized on streptavidin beads versus beads alone. Whereas CRYBMIM binds efficiently to CBP/P300, it does not exhibit any measurable binding to MED15 (even though MED15 is highly expressed), as shown in revised Supp. Fig. 2E, and reproduced for convenience below. While this does not exclude the possibility that CRYBMIM binds to other proteins, the biochemical specificity observed here, combined with the genetic requirement of CBP for cellular effects of CRYBMIM as shown by a genome-wide CRISPR screen (Fig. 1B and below), indicates that CRYBMIM is a specific ligand of CBP/P300. The manuscript has been revised on pages 4-5 and 6 accordingly.

      2) Similarly, it would be useful to perform a mass spec analysis of all nuclear factors that associate with streptavidin-immobilized CRYBMIM. This again would help the reader to understand the specificity of this peptide.

      Response: We agree with the reviewer that macromolecular ligands like CRYBMIM may interact with cellular proteins in complex ways. To define specific effects, we utilized four orthogonal strategies, explained below.

      First, we purified the CBP-containing nuclear complex using immunoprecipitation and determined its composition by mass spectrometry proteomics. This analysis revealed 833 proteins that are specifically associated with CBP (revised Table S6). Although technically feasible, the fact that CBP is associated with hundreds of proteins would make the experiment suggested by the reviewer difficult to interpret, because it would be a major challenge to distinguish proteins bound directly by the peptide versus proteins purified indirectly by virtue of the fact that CRYBMIM binds to CBP/P300, which in turn binds to many other proteins. While we recently developed improved methods for cross-linking mass spectrometry proteomics that permit the identification of direct protein-protein interactions (Ser, Cifani, Kentsis 2019, https://doi.org/10.1021/acs.jproteome.9b00085), we believe that these experiments are beyond the scope of the current manuscript, which already includes 40 new figure panels as part of this revision.

      In lieu of this experiment, we purified the CBP-containing nuclear complex after treatment with CRYBMIM or control using immunoprecipitation and determined its composition by targeted Western blotting. This analysis revealed that RUNX1, LYL1 and SATB1 are specifically associated with CBP (revised Fig. 14B), among which RUNX1 is specifically remodeled in the MYB:CBP/P300 complex upon CRYBMIM binding. This transcription factor recruitment and remodeling support the idea of CRYBMIM’s specificity for the MYB:CBP/P300 complex.

      Second, to define the specificity of CRYBMIM, we used glycine mutants of CRYBMIM and its parent MYBMIM, CG3 and TG3, respectively, in which residues that form key salt bridge and hydrophobic interactions with KIX are replaced with glycines, but otherwise retain all other features of the active probes. Both CG3 and TG3 exhibit significantly reduced effects on the viability of AML cell lines, consistent with the specific effects of CRYBMIM (Fig. 3D).

      To confirm that this is due to CBP binding, we purified cellular CBP/P300 by binding to biotinylated CRYBMIM, and observed that it can be efficiently competed by excess of free CRYBMIM, but not TAT (Fig. 2E).

      Finally, to establish definitively that cellular CBP is responsible for CRYBMIM effects, we generated isogenic cell lines that are either deficient or proficient for CBP using CRISPR genome editing. This experiment demonstrated that CBP deficiency confers significant resistance to CRYBMIM, indicating that CBP is required for CRYBMIM-mediated effects (revised Figure 4), and reproduced below. We revised the manuscript on pages 21, 8, 6 and 9 accordingly.

      3) The major limitation of this study which modestly lessens my enthusiasm of this work is that the mechanistic model of MYB-sequestered TFs proposed here is based on a face-value interpretation of IP-MS data coupled with ChIP-seq data. Normally, I would expect such a mechanism to be supported with some additional focused biochemical experiments of specific interactions, to complement all of the omics approaches. For example, can the authors evaluate and/or validate further how MYB physically interacts with LYL1, CEBPA, SPI1, or RUNX1. Are these interactions direct or indirect? Which domains of these proteins are involved? Does CRYBMIM treatment modulate the ability of these proteins to associate with one another in a co-IP? Do these interactions occur in normal hematopoietic cells? A claim is made throughout this study that these are aberrant TF complexes, but I believe more evidence is required to support this claim.

      Response: We appreciate the reviewer’s comment and totally agree with this point. To examine how MYB aberrantly assembles transcription factors in AML, we performed MYB co-immunoprecipitation (co-IP) in a panel of seven genetically diverse AML cell lines with varying susceptibility to CRYBMIM, chosen to represent the common and refractory forms of human AML. Here, we confirmed co-assembly of CBP/P300, LYL1, E2A, LMO2 in all AML cell lines tested, and cell type-specific co-assembly of SATB1 and CEBPA, as shown in revised Fig. 8A, which are in agreement with the IP-MS and ChIP-seq results. We further corroborated these findings by co-IP studies of CBP/P300, as shown in the revised Fig. 8B. We performed similar co-IP experiments in normal hematopoietic progenitor cells, and found most of the co-assembled factors in AML cells were not observed in normal cells except for CBP/P300 and LYL1, as shown in the revised Figure 9E. Combined with the apparently aberrant expression of E2A and SATB1 in AML cells but not normal blood cells, this leads us to conclude that MYB assembles aberrant transcription factor complexes in AML cells. These complexes can be remodeled by peptidomimetic inhibitors, leading to their redistribution on chromatin, suppression of oncogenic gene expression and induction of cellular differentiation. We confirmed this mechanism by direct biochemical experiments in AML cells, demonstrating disassembly and remodeling of CBP/P300 complexes, as shown in the revised Figure 14. At least some of these interactions are direct, given the known direct binding between MYB and CEBPA (Oelgeschläger, Nuchprayoon, Lüscher, Friedman 1996, https://doi.org/10.1128/mcb.16.9.4717). We revised the manuscript text on pages 13, 15 and 21 accordingly.

      Reviewer #1 (Significance (Required)):

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited.

      Response: We appreciate this sentiment and completely agree with the reviewer. The phenomenon reported in this work represents the first of its kind demonstration of the aberrant organization of transcription factor control complexes in cancer, and its pharmacologic modulation. We believe that this concept will serve as a transformative paradigm for understanding oncogenic gene control and the development of effective therapies for its definitive treatment.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript reports the generation of a new and improved peptide mimetic inhibitor of the interaction between MYB and CBP/P300. The original MYBMIM inhibitor of this interaction, reported recently by the same laboratory, was modified by addition and substitution of peptide sequences from CREB, thus improving the affinity of the resulting CRYBMIM peptide to CBP/P300. The improved inhibitor profile results in increased anti-AML efficacy of CRYBMIM over MYBMIM. The authors go on to examine the mechanism underlying the anti-AML activity of CRYBMIM by integrating gene expression analysis, chromatin immunoprecipitation sequencing and mass spectrometric protein complex identification in human AML cells. I have some minor questions the authors may wish to comment on:

      1) The relocation of MYB, along with CBP/P300, to genes controlling myeloid differentiation (clusters 4 and 9) upon CRYBMIM treatment is reminiscent of the increased binding of MYB to myeloid pro-differentiation genes in AML cells following RUVBL2 silencing, recently reported in Armenteros-Monterroso et al. 2019 Leukemia 33:2817. Do the authors know if there is any overlap between genes in either of the clusters and the list reported in the latter study?

      Response: We thank the reviewer for making this suggestion. We also observe both RUVBL2 and RUVBL1 in the protein complex specifically associated with MYB (Fig. 7A and B). We compared the gene expression changes induced by CRYBMIM with those reported by Armenteros-Monterroso et al in 2019 (https://doi.org/10.1038/s41375-019-0495-8), and found that 37% of the genes upregulated by RUVBL2 silencing were shared with genes induced by CRYBMIM treatment. In addition, upregulated genes in clusters 4 and 9 included myeloid differentiation-related genes, such as JUN, FOS and FOSB, which were also induced by RUVBL2 silencing. We revised the manuscript to reflect this association on page 12.
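
      As an illustration only, the kind of overlap reported above (the share of one upregulated gene list that also appears in another) can be sketched as follows. The function name `overlap_fraction` and the toy gene lists are hypothetical; `GENE_A`, `GENE_B` and `GENE_C` are placeholders, and the lists are not the gene sets analyzed in the manuscript.

```python
def overlap_fraction(reference_genes, query_genes):
    """Return the fraction of reference_genes found in query_genes,
    together with the sorted list of shared genes."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene set is empty")
    shared = reference & set(query_genes)
    return len(shared) / len(reference), sorted(shared)


# Hypothetical toy lists (placeholders, not data from either study)
ruvbl2_up = ["JUN", "FOS", "FOSB", "GENE_A", "GENE_B"]
crybmim_up = ["JUN", "FOS", "FOSB", "GENE_C"]

fraction, shared = overlap_fraction(ruvbl2_up, crybmim_up)
print(f"{fraction:.0%} of the reference list is shared: {shared}")
```

      The denominator here is the reference list (genes upregulated by RUVBL2 silencing in the comparison above); choosing the other list as the reference would generally give a different percentage.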

      2) Could the authors comment on a possible mechanism to explain the co-localization of MYB and CBP/P300 to the loci in clusters 4 and 9 following CRYBMIM treatment? Is it possible that CBP/P300 is recruited by other transcription factors to these loci, independently of binding to MYB? Or is the binding of CBP/P300 to MYB at these loci somehow more resistant to disruption by CRYBMIM?

      Response: The reviewer has focused on an interesting point. At least for cluster 9, these genes exhibit gain of CBP/P300 in association with RUNX1 (Figure 12A), which we confirm by direct biochemical studies of MYB and CBP/P300 complexes immunoprecipitated from AML cells (revised Figure 14B-C). These experiments show that CRYBMIM treatment disrupts the MYB:CBP/P300 complexes, leading to the increased assembly of CBP/P300 with RUNX1. These findings are consistent with a dynamic competition mechanism that governs the availability of CBP/P300 for transcriptional co-activation, in which distinct transcription factors compete for limiting amounts of CBP/P300. This possible mechanism is discussed in the revised manuscript (pages 18-19 and 21).

      3) In the first paragraph of page 9, the text states: "Previously, we found that MYBMIM can suppress MYB:CBP/P300-dependent gene expression, leading to AML cell apoptosis that required MYB-mediated suppression of BCL2 (Ramaswamy et al., 2018)." I think this is a typo, since in this study, MYBMIM treatment results in loss of MYB binding to the BCL2 gene and consequent reduction in BCL2 expression. Do the authors mean 'MYBMIM-mediated suppression of BCL2' or 'loss of MYB-mediated activation of BCL2'?

      Response: We thank the reviewer and have corrected this typographic error in the text.

      4) The authors explain the failure of excess CREBMIM to displace CBP/P300 from immobilised CREBMIM (Figure 1E-F) by the nature of the CREB:CBP/P300 interaction. Does this imply that CREBMIM is unable to disrupt the interaction between CREB and CBP/P300 in living cells and that the CBP/P300 purified from native MV4;11 lysates by immobilised CREBMIM was from a pool not associated with CREB?

      Response: We thank the reviewer for making this point. Indeed, we reproducibly observe that CRYBMIM binding to CBP can be competed with excess free CRYBMIM, but CREBMIM binding cannot be competed by excess CREBMIM. This may be due to the different stabilities of the CBP complexes that are available for binding in cells. Alternatively, it is also possible that CREB binding to CBP, as reflected by CREBMIM, has a relatively slow dissociation rate, as compared to MYB, as reflected by CRYBMIM. We have begun to purify cellular CBP complexes (revised Fig. 8 and response to comment 2 for Reviewer 1), and aim to define their determinants in future studies, as enabled by the introduction of CRYBMIM, CREBMIM and MLLMIM probes in the current work.

      Reviewer #2 (Significance (Required)):

      Based on this integrative analysis, the authors propose a convincing hypothesis, involving the assembly of aberrant transcription factor complexes and sequestration of P300/CBP from genes involved in normal myeloid development, for the oncogenic activity of MYB in AML. As well as the obvious therapeutic potential of the CRYBMIM inhibitor itself, the data reported here reveal multiple avenues for future investigation into novel anti-AML therapeutic strategies. This is an innovative and important study.

      This study will be of interest to scientists and clinicians involved in leukaemia research as well as cancer biology in general.

      My field of expertise: leukaemia biology, leukaemia models, aberrant transcription factor activity in leukaemia

      Response: We appreciate and agree with this assessment.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes an improved MYB-mimetic peptide (cf the group's earlier work published in Nature Communications, 2018) and its effects on AML cell lines. It also describes - and this constitutes the majority of the paper - the dynamics of chromatin occupancy by MYB and other associated transcription factors upon disruption of the MYB-CBP/P300 interaction. The authors suggest this represents a shift from an oncogenic program to a myeloid differentiation program.

      **Major comments:**

      Regarding the improved affinity, and biological activity, of CRYBMIM:

      1.Improved affinity of CRYBMIM cf MYBMIM: clearly, it is improved, but not by a lot. By MST the increased affinity is about 3x. In terms of effects on AML cell viability: there is no direct comparison, and this should be included. In the group's previous paper there is no direct estimate for MYBMIM but it looks like the IC50 is between 10 and 20 micromolar so the effect is again around 2.5 fold. Also, the effects of the amino acid substitutions in CG3 are also very small (2.4x) given that 3 critical residues are altered. This is quite concerning.

      Response: As pointed out by the reviewer, CRYBMIM exhibits a several-fold increase in binding affinity, as measured using purified proteins in vitro. A similar increase in cellular potency is observed after short-term treatment of AML cells, as shown in revised Figure 3C, and reproduced below. However, increasing the duration of treatment to several days leads to substantial improvement in apparent cellular potency (Figure 3G). For example, while MYBMIM induces approximately 100-fold reduction in cell viability of MV411 cells, CRYBMIM induces more than 1,000-fold reduction. Similarly, whereas MYBMIM exhibited relatively modest effects on OCIAML3 and SKM1 cells, CRYBMIM induces more than 1,000-fold reduction in cell viability. As we show in the revised manuscript, this appears to be due to the combination of increased biochemical affinity and specific proteolysis of MYB, which cooperate to induce extensive remodeling of MYB transcriptional complexes and gene expression (revised Figure 11). In all, this exemplifies how pharmacologic modulators of protein interactions can achieve significantly improved biological potency from relatively modest affinity effects, a concept that recently has been successfully used to develop a variety of PROTACs that leverage this “event-driven” as opposed to occupancy-driven pharmacology. The manuscript has been revised on pages 8 and 18 to clarify this point.

      2.Does CRYBMIM really "spare" normal hematopoietic cells? Not according to Fig 2E, where there is only a 2-fold difference in IC50.

      Response: To better define the relative toxicity of CRYBMIM and MYBMIM, we examined their effects on the growth and survival of normal hematopoietic progenitor cells as compared to AML cells using colony forming assays in methylcellulose under more physiologic conditions in the presence of human hematopoietic cytokines (revised Figure 3E, and reproduced below). While CRYBMIM significantly reduced the clonogenic capacity, growth and survival of MV411 AML cells, there were no significant effects on the total clonogenic activity of normal CD34+ human umbilical cord blood progenitor cells under these conditions. At the highest dose, CRYBMIM induced modest reduction in CFU-MG colony formation, and modest increase in BFU-E colony formation of normal hematopoietic progenitor cells. We revised the manuscript to indicate that CRYBMIM “relatively spares” normal blood progenitor cells on page 8.

      Response: We appreciate the attention to this issue. In the original manuscript, we showed dose-response curves of cord blood progenitor cells cultured in suspension supplemented with fetal bovine serum, a system that is known to induce inappropriate hematopoietic cell differentiation (https://doi.org/10.1016/j.molmed.2017.07.003). In the revised manuscript, we show results of colony formation assays of hematopoietic progenitor cells cultured in serum-free, semi-solid conditions supplemented with human hematopoietic cytokines (revised Figures 3E and 3F). This is a more physiologic system which more faithfully maintains normal hematopoietic cell differentiation, as compared to the cellular differentiation induced by fetal bovine serum-containing media lacking hematopoietic growth factors, as used in the experiments in our original manuscript. To establish a positive control, in addition to treating AML cells under the same condition, we used doxorubicin, which is part of current treatment of patients with AML, and which in our experiments exhibits significant and pronounced reduction in the clonogenic capacity, growth and survival of normal blood progenitor cells (revised Figure S3B). The manuscript has been revised on page 8 accordingly.

      3.Fig 2F doesn't include any lines that express very low or undetectable levels of MYB. Some of these should be included to further examine specificity.

      Response: We have now tested CRYBMIM against a large panel of non-hematopoietic tumor and non-tumor cell lines, with varying degrees of MYB expression. Some of those cells exhibit high levels of MYB gene expression and MYB genetic dependency, which are at least in part correlated with susceptibility to CRYBMIM (revised Figure S4, and reproduced below). The manuscript has been revised on page 8 accordingly.

      Effects on gene expression and MYB binding:

      Data on MYB target gene expression and apoptosis/differentiation, and the conclusions drawn per se are sound, but:

      5.Fig S3 seems to show that MYB protein is lost on treatment with CRYBMIM. This isn't even mentioned in the text but raises a whole range of major questions eg why is this the case? Is this what is responsible for the loss of MYB-p300 interaction and/or biological effects on AML cells? Is this what is responsible for the effects on MYB target gene expression in Fig 3 and MYB binding to chromatin in Fig 4? This must be addressed.

      Response: We have revised the manuscript to include this discussion, and performed additional experiments to define this phenomenon. We confirmed rapid reduction in MYB protein levels upon CRYBMIM treatment on the time-scale of one to four hours in diverse AML cell lines (revised Figure 11), with the rate of MYB protein loss correlating to the cellular susceptibility to CRYBMIM (revised Figure 11, and reproduced below). The manuscript has been revised on page 18 accordingly.

      This is consistent with the specific proteolysis of MYB induced by the peptidomimetic remodeling of the MYB:CBP/P300 complex. We confirmed this by combined treatment with the proteasomal/protease inhibitor MG132 (revised Figure 11C, and reproduced below). This effect was specific because overexpression of BCL2, which blocks MYBMIM-induced apoptosis (Ramaswamy et al, Kentsis, https://doi.org/10.1038/s41467-017-02618-6), was unable to rescue CRYBMIM-induced proteolysis of MYB, arguing that MYB proteolysis is a specific effect of CRYBMIM rather than a non-specific consequence of apoptosis. The manuscript has been revised on page 18 accordingly.

      6.Fig 4 and the accompanying text are a bit hard to follow, but if I understood them correctly, I am surprised that the "gained MYB peaks" don't include the MYB binding motif itself? This at least deserves some comment. Also, there doesn't seem to have been any attempt to integrate the ChIP-Seq data with the expression data of Fig 3. This would provide clearer insights into the identities and types of MYB-regulated genes that are directly affected by suppression of CBP/p300 binding to MYB.

      Response: We thank the reviewer for this suggestion. The revised manuscript now includes a comprehensive and integrated analysis of chromatin and gene expression dynamics (revised Figures 13A and 13B). In contrast to the model in which blockade of MYB:CBP/P300 induces loss of gene expression and loss of transcription factor and CBP/P300 chromatin occupancy, we also observed a large number of genes with increased expression and gain of CBP/P300 occupancy (revised Figure 13A-B, and reproduced below). This includes numerous genes that control hematopoietic differentiation, such as FOS, JUN, and ATF3. As a representative example, in the case of FOS, we observed that CRYBMIM-induced accumulation of CBP/P300 was associated with increased binding of RUNX1, and eviction of CEBPA and LYL1 (revised Figure 13C). Thus, the absence of “gained MYB peaks” is due to the redistribution of CBP/P300 with alternative transcription factors, such as RUNX1. In all, these results support the model in which the core regulatory circuitry of AML cells is organized aberrantly by MYB and its associated co-factors including LYL1, CEBPA, E2A, SATB1 and LMO2, which co-operate in the induction and maintenance of oncogenic gene expression, as co-opted by distinct oncogenes in biologically diverse subtypes of AML (revised Figure 14). This involves apparent sequestration of CBP/P300 from genes controlling myeloid cell differentiation. Thus, oncogenic gene expression is associated with the assembly of aberrantly organized MYB transcriptional co-activator complexes, and their dynamic remodeling by selective blockade of protein interactions can induce AML cell differentiation. The manuscript has been revised on page 20-21 accordingly.

      7.The MS studies on MYB-interacting proteins seem very interesting and novel. I am not an expert on MS, though, so I'd suggest this section be reviewed by someone who is. Moreover, I was unable to see the actual data from this study because the material I was provided with didn't include Table S4 and S5.

      Response: We appreciate this point. For this reason, we have deposited all of our mass spectrometry data to be openly available via PRIDE (accession number PXD019708), and also openly provide all of the analyzed data via Zenodo (https://doi.org/10.5281/zenodo.4321824), as additionally provided in the Supplementary Material for this manuscript.

      *Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?*

      8.Claims regarding biological activity, specificity and improvements cf MYBMIM should be moderated given the small size of these effects as mentioned above (points 1 and 3).

      Response: As explained in detail in response to comments 1-3 above (page 12-14 of this response), we have substantially revised the manuscript to incorporate both new experimental results and additional explanations (pages 6-8).

      9.I found the description of the studies related to Figs 5 and 6 somewhat difficult to follow and convoluted. While changes in MYB and CBP/p300 chromatin occupancy clearly occur on M CRYBMIM treatment, it is not clear that the complexes seen on genes prior to treatment represent "aberrant" complexes. These may just be characteristic of undifferentiated (myeloid) cells. The authors appear to argue that because some of the candidate co-factors show "apparently aberrant expression in AML cells" based on comparison of (presumably mRNA) expression data with normal cells, the presence of these factors in the complexes make them "aberrant" (moreover, the "aberrancy score" of Fig 5 C is not defined anywhere, as far as I can see). This inference is drawing a rather long bow, given that the AML-specific factors may not actually be absent from the complexes in normal cells. So this conclusion should be moderated if a more direct MS comparison cannot be provided (for which I understand the technical difficulties).

      Response: We have now measured protein abundance levels of key transcription factors assembled with MYB in AML cells in various normal human hematopoietic cells (revised Figure 9, and reproduced below). We found that most transcription factors that are assembled with MYB in diverse AML cell lines could be detected in one or more normal human blood cells, albeit with variable abundance, with the exception of CEBPA and SATB1 that were measurably expressed exclusively in AML cells (revised Figure 9A). Using unsupervised clustering and principal component analysis, we defined the combinations of transcription factors that are associated with aberrant functions of MYB:CBP/P300, as defined by their susceptibility to peptidomimetic remodeling (revised Figure 9B-D). In addition, we directly examined the physical assembly of MYB with key transcription factors in normal hematopoietic cells using co-immunoprecipitation studies (revised Figure 9E). In agreement with the physical association of MYB seen in AML cell lines, we observed association with CBP/P300 and LYL1 in normal hematopoietic cells. However, we did not observe physical association with E2A and SATB1 in normal cells, which indicates aberrant association of these in AML cell lines. This leads us to propose that these complexes are aberrantly assembled, at least in part due to the inappropriate transcription factor co-expression. The manuscript has been revised on page 15 accordingly.

      *Would additional experiments be essential to support the claims of the paper?*

      Response: As explained in detail in response to comment 5 above (page 16 of this response), we have carried out extensive studies of the specific proteolysis of MYB. We conclude that MYB transcription complexes are regulated both by MYB:CBP/P300 binding and by specific factor proteolysis, and can be induced by its peptidomimetic blockade in AML cells. Such “event-driven” pharmacology is emerging as a powerful tool to modulate protein function in cells, and studies reported in our work should enable its translation into improved therapies for patients, and improved probes for basic science.

      11.Provision of a positive control for the experiment of Fig S2.

      Response: As explained in detail in response to comment 2 above (pages 13-14 of this response), we precisely defined the effects of CRYBMIM and MYBMIM on the clonogenic capacity, growth and survival of normal hematopoietic progenitor cells in serum-free, methylcellulose media supplemented with human hematopoietic cytokines. These experiments showed relatively modest effects (9.3 ± 3.8% reduction) of CRYBMIM on normal cells (Figure 3E), as compared to substantial inhibition (54 ± 2.4% reduction) of the growth and survival of AML cells (Figure 3E). For comparison, doxorubicin led to more than 98% reduction in clonogenic capacity (revised Figure S3B).
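
      For readers who wish to reproduce this kind of summary from raw colony counts, a minimal sketch of a percent-reduction calculation is given here. It assumes paired control and treated counts per replicate and reports the mean with the sample standard deviation; whether the ± values quoted above are standard deviations or standard errors is not stated in this response, so that choice, the function names and the toy counts are all assumptions.

```python
from statistics import mean, stdev


def percent_reduction(control_count, treated_count):
    """Percent reduction of treated colonies relative to the control."""
    if control_count <= 0:
        raise ValueError("control count must be positive")
    return 100.0 * (control_count - treated_count) / control_count


def summarize_reduction(control_counts, treated_counts):
    """Mean and sample standard deviation of per-replicate reductions.

    Assumes the two lists are paired replicate by replicate and that
    there are at least two replicates (required by stdev).
    """
    reductions = [
        percent_reduction(control, treated)
        for control, treated in zip(control_counts, treated_counts)
    ]
    return mean(reductions), stdev(reductions)


# Hypothetical colony counts for three replicates (not data from the study)
avg, spread = summarize_reduction([100, 100, 100], [90, 92, 88])
print(f"{avg:.1f} ± {spread:.1f}% reduction")
```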

      12.*Are the data and the methods presented in such a way that they can be reproduced?*

      -Mostly yes

      Response: The revised manuscript includes a complete description of all methods, including a detailed supplement, listing technical details, with all analyzed data available openly via Zenodo (https://doi.org/10.5281/zenodo.4321824).

      13.*Are the experiments adequately replicated and statistical analysis adequate?*

      -Mostly yes

      Response: All experiments were performed in at least three replicates, with all quantitative comparisons performed using appropriate statistical tests, as explained in the manuscript.

      **Minor comments:**

      *Specific experimental issues that are easily addressable.*

      -These are mostly indicated above.

      In addition:

      14.Why is BCL2 expression down-regulated by MYBMIM but not CRYBMIM?

      Response: We made the same observation, and attribute this difference to the fact that BCL2 expression is regulated by several transcription factors, including CEBPA, which is affected by CRYBMIM but not MYBMIM. Similar to MYBMIM treatment, MYB occupancy at the BCL2 enhancer was reduced upon CRYBMIM treatment. However, new binding sites of other factors, such as CBP/P300 and RUNX1, appeared simultaneously, suggesting that redistribution of transcription factors following CRYBMIM treatment can affect transcriptional regulation of BCL2 expression (revised Figure S9 and shown below).

      *Are prior studies referenced appropriately?*

      -Yes

      *Are the text and figures clear and accurate?*

      15.Generally, although some details are missing eg what aberrancy score in Fig 5C means.

      Response: Thank you for pointing this out. We have revised this figure to clarify this score, which is defined as the ratio of gene expression in AML cells relative to normal hematopoietic progenitor cells (revised Figure 7C).
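
      The ratio definition given above can be written out as a short sketch. Only the ratio itself comes from the response; the function name, the optional pseudocount (a common guard against division by zero) and the example values are assumptions made for illustration and may not match the computation used for the figure.

```python
def aberrancy_score(aml_expression, normal_expression, pseudocount=0.0):
    """Ratio of expression in AML cells to expression in normal
    hematopoietic progenitor cells.

    pseudocount is an assumed, optional stabilizer added to both values;
    with the default of 0.0 the result is the plain ratio.
    """
    denominator = normal_expression + pseudocount
    if denominator <= 0:
        raise ValueError("normal expression plus pseudocount must be positive")
    return (aml_expression + pseudocount) / denominator


# Hypothetical expression values in arbitrary units (not data from the study)
score = aberrancy_score(8.0, 2.0)
print(score)
```

      Under this definition, a score above 1 indicates higher expression in AML cells than in normal progenitor cells, and a score below 1 indicates lower expression.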

      16. **Do you have suggestions that would help the authors improve the presentation of their data and conclusions?**

      -The title of this manuscript could and I think should be changed. The term "therapeutic" is not appropriate because no therapeutic agents are described in the m/s nor is any form of AML, even experimentally, treated. Also "CBP" should be replaced with CBP/P300, especially since most evidence suggests that P300 is the likely more important partner of MYB (e.g. Zhao et al 2011).

      Response: We agree and have revised the title to clarify the significance of this work: “Convergent organization of aberrant MYB complexes controls oncogenic gene expression in acute myeloid leukemia.” We have revised the manuscript to indicate CBP/P300.

      17. It would be worth discussing the core observation that disruption of the MYB-CBP/P300 interaction actually results in changes in MYB DNA binding. That this would occur is not at all obvious, because CBP/p300 doesn't interact with MYB's DNA binding domain nor does it have intrinsic DNA binding activity.

      Response: We thank the reviewer for this comment, and agree that remodeling of the MYB complex must affect the binding of MYB and other cofactors to DNA, at least in part mediated by potential acetylation by CBP/P300 (page 24).

      Reviewer #3 (Significance (Required)):

      **The Nature and Significance of the Advance**

      1) The major significance of this work lies in the chromatin occupancy and MYB complex studies. There are a number of very interesting findings including the apparent redistribution of MYB and/or CBP/P300 upon treatment with CRYBMIM. These suggest a series of changes in factors associated with particular gene sets involved in myeloid differentiation, although as mentioned above particular target genes are not specifically identified. However the pathways corresponding to these are listed in Table S6.

      Response: We have revised the manuscript to include the target genes in revised Supplemental Table 4 as well as DESeq2 tables (deposited in Zenodo, https://doi.org/10.5281/zenodo.4321824).

      2) The new peptide design (CRYBMIM) is interesting but its differences in binding and biological effects of MYBMIM are mostly incremental. See above.

      Response: We respectfully disagree and would like to explain how this work is significant both for conceptual and technical reasons. First, while the biochemical affinity of CRYBMIM is quantitatively increased compared with MYBMIM, this quantitatively increased affinity translates into qualitatively improved biological potency, as a result of “event-driven” pharmacology that characterizes pharmacologic protein interaction modulators (please also see response to Reviewer 3, comment 1, page 6 of this response). MYBMIM suppresses the growth and survival mostly of MLL-rearranged leukemias, whereas CRYBMIM does so for the vast majority (10 out of 11) of studied subtypes of AML. This now enables its therapeutic translation, as we are currently pursuing in collaboration with Novartis. Second, its improved biological activity led to the discovery of the previously unknown and unanticipated CBP/P300 sequestration mechanism of oncogenic gene control. We use this discovery to develop a precise model of aberrant gene control in AML that for the first time unifies previously disparate observations into a general mechanism. This is highly significant because it provides shared molecular dependencies for most subtypes of AML, a long-standing conundrum in cancer biology.

      *Place the work in the context of the existing literature (provide references, where appropriate).*

      -This m/s builds on and extends the report from the same group in Nature Communications (2018), which described the earlier peptide MYBMIM, some effects on MYB target genes and on AML cells. It and the previous paper also draw on the findings regarding the role of the MYB-CBP/P300 interaction in myeloid leukemogenesis (Pattabiraman et al 2014) and on previous genome-wide studies of MYB target genes (Zhao et al 2011; Zuber et al 2011).

      *State what audience might be interested in and influenced by the reported findings.*

      -This m/s will likely be of interest to scientists interested in MYB per se, in AML, in cancer genomics and transcriptional regulation.

      *Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.*

      -My expertise: AML, experimental hematology, transcription, MYB, cancer genomics

      3) As mentioned above, I feel that additional expertise is required to review the MS studies.

      Response: We have deposited all raw data in PRIDE (accession number PXD019708) and all processed data in Zenodo (https://doi.org/10.5281/zenodo.4321824), making it available for the community for further analysis.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      This manuscript describes an improved MYB-mimetic peptide (cf the group's earlier work published in Nature Communications, 2018) and its effects on AML cell lines. It also describes - and this constitutes the majority of the paper - the dynamics of chromatin occupancy by MYB and other associated transcription factors upon disruption of the MYB-CBP/P300 interaction. The authors suggest this represents a shift from an oncogenic program to a myeloid differentiation program.

      Major comments:

      Regarding the improved affinity, and biological activity, of CRYBMIM:

      1. Improved affinity of CRYBMIM cf MYBMIM: clearly, it is improved, but not by a lot. By MST the increased affinity is about 3x. In terms of effects on AML cell viability: there is no direct comparison, and this should be included. In the group's previous paper there is no direct estimate for MYBMIM but it looks like the IC50 is between 10 and 20 micromolar so the effect is again around 2.5 fold. Also, the effects of the amino acid substitutions in CG3 are also very small (2.4x) given that 3 critical residues are altered. This is quite concerning.

      2. Does CRYBMIM really "spare" normal hematopoietic cells? Not according to Fig 2E, where there is only a 2-fold difference in IC50.

      3. Fig 2E and Supp Fig S2 appear to be contradictory. The latter shows no effect of 20 micromolar CRYBMIM on colony formation by normal CD34+ cells, in complete contrast to killing with IC50 of 12.8 micromolar in Fig 2E. There is no +ve control for Fig S2, i.e. does the peptide work under colony assay conditions? This MUST be addressed.

      4. Fig 2F doesn't include any lines that express very low or undetectable levels of MYB. Some of these should be included to further examine specificity.

      Effects on gene expression and MYB binding:

      Data on MYB target gene expression and apoptosis/differentiation, and the conclusions drawn per se are sound, but:

      5. Fig S3 seems to show that MYB protein is lost on treatment with CRYBMIM. This isn't even mentioned in the text but raises a whole range of major questions, e.g. why is this the case? Is this what is responsible for the loss of MYB-p300 interaction and/or biological effects on AML cells? Is this what is responsible for the effects on MYB target gene expression in Fig 3 and MYB binding to chromatin in Fig 4? This must be addressed.

      6. Fig 4 and the accompanying text are a bit hard to follow, but if I understood them correctly, I am surprised that the "gained MYB peaks" don't include the MYB binding motif itself. This at least deserves some comment. Also, there doesn't seem to have been any attempt to integrate the ChIP-Seq data with the expression data of Fig 3. This would provide clearer insights into the identities and types of MYB-regulated genes that are directly affected by suppression of CBP/p300 binding to MYB.

      7. The MS studies on MYB-interacting proteins seem very interesting and novel. I am not an expert on MS, though, so I'd suggest this section be reviewed by someone who is. Moreover, I was unable to see the actual data from this study because the material I was provided with didn't include Table S4 and S5.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      8. Claims regarding biological activity, specificity and improvements cf MYBMIM should be moderated given the small size of these effects as mentioned above (points 1 and 3).

      9.I found the description of the studies related to Figs 5 and 6 somewhat difficult to follow and convoluted. While changes in MYB and CBP/p300 chromatin occupancy clearly occur on M CRYBMIM treatment, it is not clear that the complexes seen on genes prior to treatment represent "aberrant" complexes. These may just be characteristic of undifferentiated (myeloid) cells. The authors appear to argue that because some of the candidate co-factors show "apparently aberrant expression in AML cells" based on comparison of (presumably mRNA) expression data with normal cells, the presence of these factors in the complexes make them "aberrant" (moreover, the "aberrancy score" of Fig 5 C is not defined anywhere, as far as I can see). This inference is drawing a rather long bow, given that the AML-specific factors may not actually be absent from the complexes in normal cells. So this conclusion should be moderated if a more direct MS comparison cannot be provided (for which I understand the technical difficulties).

      Would additional experiments be essential to support the claims of the paper?

      1. Address the issue of the apparent loss of MYB protein upon CRYBMIM treatment. If this is occurring, the whole premise of the subsequent work is undermined.

      12. Provision of a positive control for the experiment of Fig S2.

      Are the data and the methods presented in such a way that they can be reproduced?

      -Mostly yes

      Are the experiments adequately replicated and statistical analysis adequate?

      -Mostly yes

      Minor comments:

      Specific experimental issues that are easily addressable.

      -These are mostly indicated above.

      In addition:

      -Why is BCL2 expression down-regulated by MYBMIM but not CRYBMIM?

      Are prior studies referenced appropriately?

      -Yes

      Are the text and figures clear and accurate?

      -Generally, although some details are missing, e.g. what aberrancy score in Fig 5C means.

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -The title of this manuscript could and I think should be changed. The term "therapeutic" is not appropriate because no therapeutic agents are described in the m/s nor is any form of AML, even experimentally, treated. Also "CBP" should be replaced with CBP/P300, especially since most evidence suggests that P300 is the likely more important partner of MYB (e.g. Zhao et al 2011).

      -It would be worth discussing the core observation that disruption of the MYB-CBP/P300 interaction actually results in changes in MYB DNA binding. That this would occur is not at all obvious, because CBP/p300 doesn't interact with MYB's DNA binding domain nor does it have intrinsic DNA binding activity.

      Significance

      The Nature and Significance of the Advance

      -The major significance of this work lies in the chromatin occupancy and MYB complex studies. There are a number of very interesting findings including the apparent redistribution of MYB and/or CBP/P300 upon treatment with CRYBMIM. These suggest a series of changes in factors associated with particular gene sets involved in myeloid differentiation, although as mentioned above particular target genes are not specifically identified. However the pathways corresponding to these are listed in Table S6.

      -The new peptide design (CRYBMIM) is interesting but its differences in binding and biological effects cf MYBMIM are mostly incremental. See above.

      Place the work in the context of the existing literature (provide references, where appropriate).

      -This m/s builds on and extends the report from the same group in Nature Communications (2018), which described the earlier peptide MYBMIM, some effects on MYB target genes and on AML cells. It and the previous paper also draw on the findings regarding the role of the MYB-CBP/P300 interaction in myeloid leukemogenesis (Pattabiraman et al 2014) and on previous genome-wide studies of MYB target genes (Zhao et al 2011; Zuber et al 2011).

      State what audience might be interested in and influenced by the reported findings.

      -This m/s will likely be of interest to scientists interested in MYB per se, in AML, in cancer genomics and transcriptional regulation.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      -My expertise: AML, experimental hematology, transcription, MYB, cancer genomics

      -As mentioned above, I feel that additional expertise is required to review the MS studies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This manuscript reports the generation of a new and improved peptide mimetic inhibitor of the interaction between MYB and CBP/P300. The original MYBMIM inhibitor of this interaction, reported recently by the same laboratory, was modified by addition and substitution of peptide sequences from CREB, thus improving the affinity of the resulting CRYBMIM peptide to CBP/P300. The improved inhibitor profile results in increased anti-AML efficacy of CRYBMIM over MYBMIM. The authors go on to examine the mechanism underlying the anti-AML activity of CRYBMIM by integrating gene expression analysis, chromatin immunoprecipitation sequencing and mass spectrometric protein complex identification in human AML cells.

      I have some minor questions the authors may wish to comment on:

      1) The relocation of MYB, along with CBP/P300, to genes controlling myeloid differentiation (clusters 4 and 9) upon CRYBMIM treatment is reminiscent of the increased binding of MYB to myeloid pro-differentiation genes in AML cells following RUVBL2 silencing, recently reported in Armenteros-Monterroso et al. 2019 Leukemia 33:2817. Do the authors know if there is any overlap between genes in either of the clusters and the list reported in the latter study?

      2) Could the authors comment on a possible mechanism to explain the co-localization of MYB and CBP/P300 to the loci in clusters 4 and 9 following CRYBMIM treatment? Is it possible that CBP/P300 is recruited by other transcription factors to these loci, independently of binding to MYB? Or is the binding of CBP/P300 to MYB at these loci somehow more resistant to disruption by CRYBMIM?

      3) In the first paragraph of page 9, the text states: "Previously, we found that MYBMIM can suppress MYB:CBP/P300-dependent gene expression, leading to AML cell apoptosis that required MYB-mediated suppression of BCL2 (Ramaswamy et al., 2018)." I think this is a typo, since in this study, MYBMIM treatment results in loss of MYB binding to the BCL2 gene and consequent reduction in BCL2 expression. Do the authors mean 'MYBMIM-mediated suppression of BCl2' or 'loss of MYB-mediated activation of BCL2'?

      4) The authors explain the failure of excess CREBMIM to displace CBP/P300 from immobilised CREBMIM (Figure 1E-F) by the nature of the CREB:CBP/P300 interaction. Does this imply that CREBMIM is unable to disrupt the interaction between CREB and CBP/P300 in living cells and that the CBP/P300 purified from native MV4;11 lysates by immobilised CREBMIM was from a pool not associated with CREB?

      Significance

      Based on this integrative analysis, the authors propose a convincing hypothesis, involving the assembly of aberrant transcription factor complexes and sequestration of P300/CBP from genes involved in normal myeloid development, for the oncogenic activity of MYB in AML. As well as the obvious therapeutic potential of the CRYBMIM inhibitor itself, the data reported here reveal multiple avenues for future investigation into novel anti-AML therapeutic strategies. This is an innovative and important study.

      This study will be of interest to scientists and clinicians involved in leukaemia research as well as cancer biology in general.

      My field of expertise: leukaemia biology, leukaemia models, aberrant transcription factor activity in leukaemia

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The study by Forbes et al describes and characterizes a 2nd generation peptide-based inhibitor of the MYB:CBP interaction, termed CRYBMIM, which they use to study MYB:cofactor interactions in leukemia cells. The CRYBMIM has improved properties relative to the MYBMIM peptide, and display more potency in biochemical and cell-based assays. Using a combination of epigenomics and biochemical screens, the authors define a list of candidate MYB cofactors whose functional significance as AML dependencies is supported by analysis of the DepMap database. Using genomewide profiling of TF and CBP occupancy, the authors provide evidence that CRYBMIM treatment reprograms the interactome of MYB in a manner that disproportionately changes specific cis-elements over others. Stated differently, the overall occupancy pattern of many TFs/cofactors shows gains and losses at specific cis elements, resulting in a complex modulation of MYB function and changes in transcription in leukemia cells.

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited. I list below a few control experiments that would clarify the specificity of CRYBMIM.

      1) Does CRYBMIM bind to other KIX domains, such as of MED15. It would be important to evaluate the specificity of this peptide for whether it binds to other KIX domains.

      2) Similarly, it would be useful to perform a mass spec analysis to all nuclear factors that associate with streptavidin-immobilized CRYBMIM. This again would be help the reader to understand the specificity of this peptide.

      The major limitation of this study which modestly lessens my enthusiasm of this work is that the mechanistic model of MYB-sequestered TFs proposed here is based on a face-value interpretation of IP-MS data coupled with ChIP-seq data. Normally, I would expect such a mechanism to be supported with some additional focused biochemical experiments of specific interactions, to complement all of the omics approaches. For example, can the authors evaluate and/or validate further how MYB physically interacts with LYL1, CEBPA, SPI1, or RUNX1. Are these interactions direct or indirect? Which domains of these proteins are involved? Does CRYBMIM treatment modulate the ability of these proteins to associate with one another in a co-IP? Do these interactions occur in normal hematopoietic cells? A claim is made throughout this study that these are aberrant TF complexes, but I believe more evidence is required to support this claim.

      Significance

      Overall, this is a strong, well-written study, with clear experimental results and relatively straightforward conclusions. The therapeutic potential of modulating MYB in cancer is enormous, and hence I believe this study will attract a broad interest in the cancer field and will likely be highly cited.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 The authors study allostery with a beautiful genotype-phenotype experiment to study the fitness landscape of an allosteric lac repressor protein. The authors make a mutational library using error prone pcr and measure the impact on antibiotic resistance protein expression at varying levels of ligand, IPTG, expression. After measuring the impact of mutations authors fill-in the missing data using a neural net model. This type of dose response is not standard in the field, but the richness of their data and the discovery of the "band pass" phenomena prove its worth here splendidly. Using this mixed experimental/predicted data the authors explore how each mutation alters the different parameters of a hill equation fit of a dose response curve. Using higher order mutational space the authors look at how mutations can qualitatively switch phenotypes to inverted or band-stop dose-response curves. To validate and further explore a band-stop novel phenotype, the authors focused on a triple mutant and made all combinations of the 3 mutations. The authors find that only one mutation alone alters the dose-response and only in combination does a band-stop behavior present itself. Overall this paper is a fantastic data heavy dive into the allosteric fitness landscape of protein. Overall, the data presented in this paper is thoroughly collected and analyzed making the conclusions well-based. We do not think additional experiments nor substantial changes are needed apart from including basic experimental details and more biophysical rationale/speculation as discussed in further detail below.
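The Hill-equation description of a dose-response curve mentioned in this summary can be sketched as follows. This is a minimal illustration, assuming the common four-parameter form with baseline output G0, saturating output G∞, half-maximal concentration EC50 and Hill coefficient n (the parameters named later in this response); the function name and the numeric values are hypothetical and not taken from the manuscript.

```python
def hill_response(ligand, g0, g_inf, ec50, n):
    """Output of a four-parameter Hill dose-response curve.

    Assumed form: G(L) = G0 + (G_inf - G0) * L^n / (EC50^n + L^n).
    """
    if ligand < 0:
        raise ValueError("ligand concentration must be non-negative")
    if ligand == 0:
        return g0
    # Fractional activation rises from 0 (no ligand) to 1 (saturating ligand).
    activation = ligand**n / (ec50**n + ligand**n)
    return g0 + (g_inf - g0) * activation


# Illustrative values only: a sensor that rises from 1 to 10 output units.
normal = [hill_response(c, g0=1.0, g_inf=10.0, ec50=5.0, n=2.0) for c in (0, 5, 500)]
# Swapping the baseline and saturating levels gives an inverted curve.
inverted = [hill_response(c, g0=10.0, g_inf=1.0, ec50=5.0, n=2.0) for c in (0, 5, 500)]
```

A single Hill curve of this form is monotonic, so it can represent normal and inverted responses but not the band-pass or band-stop shapes discussed in the review, which rise and then fall (or fall and then rise) with ligand concentration.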

      The authors do a genotype-phenotype experiment that requires extensive deep sequencing experiments. However, right now quite a bit of basic statistics on the sequencing is missing. Baseline library quality is somewhat shown in supplementary fig 2 but the figure is hard to interpret. It would be good to have a table that states how many of all possible mutations at different mutation depths (single, double, etc) there are. Similarly, sequencing statistics are missing- it would be useful to know how many reads were acquired and how much sequencing depth that corresponds to. This is particularly important for barcode assignment to phenotype in the long-read sequencing. In addition, a synonymous mutation comparison is mentioned but in my reading that data is not presented in the supplemental figures section.

      We thank the reviewer for this succinct summary of the manuscript and the results. We appreciate the reviewer identifying data of interest that were not included in the original manuscript. We agree that this information is necessary to consider the results. Specific changes are summarized in the comments below.

      The paper is very much written from an "old school" allostery perspective with static end point structures that are mutually exclusive - eg. p5l10 "relative ligand-binding affinity between the two conformations" - however, an ensemble of conformations is likely needed to explain their data. This is especially true for the bandpass and inverted phenotypes they observe. The work by Hilser et al is of particular importance in this area. We would invite the authors to speculate more freely about the molecular origins of their findings.

      We agree with the suggestion to adopt a modern allosteric perspective. We have changed the language throughout the manuscript to align with the ensemble model of allostery. We continue to frame results using the Monod-Wyman-Changeux model, which reliably predicts LacI activity from biophysical parameters and is not exclusive of more modern models of allostery.

      **Minor** There are a number of small modifications. In general this paper is very technical and could use with some explanation and discussion for relevance to make the manuscript more approachable for a broader audience. P1L23: Ligand binding at one site causes a conformational change that affects the activity of another > not necessarily true - and related to using more "modern" statistical mechanical language for describing allostery.

      We agree with the reviewer’s comment. We have addressed it by adopting language in line with a more modern view of allostery, for example:

      “With allosteric regulation, ligand binding at one site on a biomolecule changes the activity of another, often distal, site. Switching between active and inactive states provides a sense-and-response function that defines the allosteric phenotype.”

      P2L20: The core experiment of this paper is a selection using a mutational library. In the main body the authors mention the library was created using mutagenic pcr but leave it at that. More details on what sort of mutagenic pcr was used in the main body would be useful. According to the methods error prone pcr was used. Why use er-pcr vs deep point mutational libraries? Presumably to sample higher order phenotype? Rationale should be included. Were there preliminary experiments that helped calibrate the mutation level?

      We agree that justifying the decision to use error-prone PCR for library construction would be helpful. We have added text to the main manuscript that explains this decision and reflects on its consequences.

      “We used error-prone PCR across the full lacI CDS to investigate the effects of higher-order substitutions spread across the entire LacI sequence and structure.”

      And

      “Novel phenotypes emerged at mutational distances greater than one amino acid substitution, highlighting the value in sampling a broader genotype space with higher-order mutations. Furthermore, the untargeted, random mutagenesis approach used here was critical for finding these novel phenotypes, as the genotypes required for these novel phenotypes were unpredictable.”

      P2L20: Baseline library statistics would be great in a table for coverage, diversity, etc especially as this was done by error prone pcr vs a more saturated library generation method. This is present in sup fig2 but it's a bit complicated.

      To more clearly convey the diversity within the library, we have included a heatmap of amino acid substitution counts found within the library (Supplementary Fig. 4). Additionally, we have added Supplementary Table 1, which lists the distribution of mutational distances of LacI variants found within the library, and the corresponding coverage of all possible mutations for each mutational distance.
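The "coverage of all possible mutations for each mutational distance" can be made concrete with a short counting sketch. It assumes a 360-residue LacI and that any of the 19 other standard amino acids may appear at each substituted position; error-prone PCR reaches only a subset of these through single-nucleotide changes, so the authors' denominator may differ. The function names and the observed count are hypothetical.

```python
from math import comb

LACI_LENGTH = 360   # residues in LacI
ALTERNATIVES = 19   # assumption: every other standard amino acid is allowed


def possible_variants(k, length=LACI_LENGTH):
    """Number of distinct sequences exactly k substitutions from wild type."""
    return comb(length, k) * ALTERNATIVES**k


def coverage(observed, k, length=LACI_LENGTH):
    """Fraction of all k-substitution variants that were observed."""
    return observed / possible_variants(k, length)


singles = possible_variants(1)           # 360 positions x 19 alternatives
doubles = possible_variants(2)           # grows combinatorially with distance
example = coverage(observed=3420, k=1)   # hypothetical observed count
```

The rapid growth of the denominator with mutational distance is why a random library can cover single substitutions well while sampling only a small fraction of higher-order variants.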

      P2L26: How were FACS gates drawn? This is in support fig17 - should be pointed to here.

      We agree that a better description of the FACS process would be helpful. To address this we have included Supplementary Fig. 2, showing flow cytometry measurements of the library before and after FACS. Additionally, we have extended the description of the FACS process:

      “The initial library had a bimodal distribution of G0, as indicated by flow cytometry results, with a mode at low fluorescence (near G0 of wildtype LacI), and mode at higher gene expression. To generate a library in which most of the LacI variants could function as allosteric repressors, we used fluorescence activated cell sorting (FACS) to select the portion of the library with low fluorescence in the absence of ligand, gating at the bifurcation of the two modes (Sony SH800S Cell Sorter, Supplementary Fig. 2).”

      P3L4: Where is the figure/data for the synonymous SNP mutations? This should be in the supplement.

      We agree this data is necessary to support the claim that LacI function was not impacted by synonymous mutations. We have included a new Supplementary Fig. 9, which shows the distribution of Hill equation parameters for LacI variants that code for the wild-type amino acid sequence, but with non-identical coding DNA sequences. Additionally, we included in the main text the results of a statistical analysis that compared all synonymous sequences in the library:

      “We compared the distributions of the resulting Hill equation parameters between two sets of variants: 39 variants with exactly the wild-type coding DNA sequence for LacI (but with different DNA barcodes) and 310 variants with synonymous nucleotide changes (i.e. the wild-type amino acid sequence, but a non-wild-type DNA coding sequence). Using the Kolmogorov-Smirnov test, we found no significant differences between the two sets (p-values of 0.71, 0.40, 0.28, and 0.17 for G0, G∞, EC50, and n respectively, Supplementary Fig. 9).”
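The two-sample Kolmogorov-Smirnov comparison described above rests on a simple statistic: the largest vertical gap between the empirical cumulative distributions of the two samples. The sketch below computes only that statistic in plain Python; it is not the authors' analysis code, the sample values are invented, and the reported p-values would in practice come from a statistics package (for example `scipy.stats.ks_2samp`).

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i/len(a) and j/len(b) are the two empirical CDFs evaluated at x.
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical EC50 estimates for exact wild-type and synonymous variants.
wild_type_ec50 = [1, 2, 3, 4]
synonymous_ec50 = [3, 4, 5, 6]
gap = ks_statistic(wild_type_ec50, synonymous_ec50)
```

A statistic near 0 means the two distributions are nearly indistinguishable, and a value of 1 means they do not overlap at all; whether a given gap is significant depends on both sample sizes.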

      P3L20: The authors use a deep neural network to predict variants that were not covered in the screen. However, the library was generated by error-prone PCR, meaning there could be multiple mutations resulting in the same amino acid change. The model's performance was determined by looking at withheld data; however, error-prone PCR could result in multiple nonsynonymous mutations of the same amino acid. For testing, were mutations truly withheld or was there overlap, given that several mutations are represented by different codon combinations? Was the withheld data for the machine learning withholding specific substitutions?

      We thank the reviewer for identifying the need to clarify this critical data analysis. Data was held-out at the amino acid level, and so no overlap between the training and testing datasets occurred. We have clarified the description of the method in the main text:

      “We calculated RMSE using only held-out data not used in the model training, and the split between held-out data and training data was chosen so that all variants with a specific amino acid sequence appear in only one of the two sets.”
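A split of this kind, in which every variant sharing an amino acid sequence lands in the same set, can be sketched as a grouped split. This is a minimal illustration rather than the authors' code: the record layout `(barcode, aa_sequence)`, the function name, the 20% held-out fraction and the seed are all assumptions.

```python
import random
from collections import defaultdict


def split_by_protein(variants, held_out_fraction=0.2, seed=0):
    """Split (barcode, aa_sequence) records so each aa_sequence is in one set only."""
    groups = defaultdict(list)
    for barcode, aa_sequence in variants:
        groups[aa_sequence].append((barcode, aa_sequence))

    sequences = sorted(groups)                  # deterministic order before shuffling
    random.Random(seed).shuffle(sequences)
    n_held_out = round(len(sequences) * held_out_fraction)

    # Whole groups are assigned, so synonymous or re-barcoded copies never straddle sets.
    held_out = [v for s in sequences[:n_held_out] for v in groups[s]]
    train = [v for s in sequences[n_held_out:] for v in groups[s]]
    return train, held_out


# Ten hypothetical protein sequences, each carried by two different barcodes.
records = [(f"bc{2 * i + r}", f"protein_{i}") for i in range(10) for r in range(2)]
train_set, held_out_set = split_by_protein(records)
```

Splitting on individual barcodes instead would let the same protein appear in both sets through different codon combinations, which is the leakage the reviewer asks about. Library implementations of the same idea exist, such as scikit-learn's `GroupShuffleSplit`.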


      In addition, higher order protein interactions are complicated and idiosyncratic. I am surprised how well the neural net performs on higher order substitutions. P4L4: Authors find mutations at the dimer/tetramer interfaces but don't mention whether polymerization is required. is dimerization required for dna binding? Tetramerization?

      We agree with the reviewer that, overall, a description of LacI structure and function would improve the presentation of the reported results. As such, we have added Supplementary Table 2, which defines the structural features discussed throughout the manuscript. Additionally, we have strived to describe the relevant structural and functional role of specific amino acids that are discussed in the text. Finally, we have also added a paragraph to the main text that summarizes the structure and function of LacI.

      “The LacI protein has 360 amino acids arranged into three structural domains [22–24]. The first 62 N-terminal amino acids form the DNA-binding domain, comprising a helix-turn-helix DNA-binding motif and a hinge that connects the DNA-binding motif and the core domain. The core domain, comprising amino acid positions 63-324, is divided into two structural subdomains: the N-terminal core and the C-terminal core. The full core domain forms the ligand-binding pocket, core-pivot region, and dimer interface. The tetramerization domain comprises the final 30 amino acids and includes a flexible linker and an 18 amino acid α-helix (Fig. 3, Supplementary Table 2). Naturally, LacI functions as a dimer of dimers: Two LacI monomers form a symmetric dimer that further assembles into a tetramer (a dimer of dimers).”
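The domain boundaries quoted above can be written as a small lookup, which is handy when annotating substitutions by region. This sketch uses only the boundaries stated in the quoted paragraph; the function name and labels are illustrative, and positions 325-330 are returned as unassigned because the quoted text places them in neither the core domain (63-324) nor the final 30 residues.

```python
LACI_LENGTH = 360  # residues, as stated in the quoted paragraph


def laci_domain(position):
    """Map a 1-based LacI residue position to the domain named in the text."""
    if not 1 <= position <= LACI_LENGTH:
        raise ValueError(f"position must be between 1 and {LACI_LENGTH}")
    if position <= 62:
        return "DNA-binding domain"
    if position <= 324:
        return "core domain"
    if position > LACI_LENGTH - 30:   # the final 30 amino acids (331-360)
        return "tetramerization domain"
    return "unassigned in the quoted description"


# Example: label a few residue positions by domain.
labels = {pos: laci_domain(pos) for pos in (1, 62, 63, 324, 327, 331, 360)}
```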

      P4L8: Substitutions near the dimer interface both impact g0 and ec50, which authors say is consistent with a change in the allosteric constant. Can authors explain their thinking more in the paper to make it easier to follow? Are the any mutations in this area that only impact g0 or ec50 alone? Why may these specific residues modify dimerization?

      We agree that a more in-depth discussion of the possible mechanisms behind these phenotypic changes would improve the manuscript. We have added discussion throughout the subsection “Effects of amino acid substitutions on LacI phenotype.” We believe this added discussion improves the manuscript and clarifies the relationship between the observed allosteric phenotypes and the molecular mechanisms behind them.

      Overall, we have made a number of changes in the manuscript that we hope will address these concerns.

      P4L8: The authors discuss the allosteric constant extensively within the paper but do not explain it. It would be helpful to have an explanation of this to improve readability. This explanation should include the statistical mechanical basis of it and some speculation about the ways it manifests biophysically.

      The allosteric constant is a critical concept, and we agree that it must be defined and discussed clearly throughout the manuscript. We have greatly expanded the discussion of the effects of single amino acid substitutions, and in the process we give examples of biochemical changes in the protein, and how they may affect the allosteric constant. We think this added text improves the manuscript and helps clarify the allosteric constant and the biomolecular processes that affect it.

      P4L1-16: Authors see mutations in the dimerization region that impact either G0 and Gsaturated in combination with Ec50 but not g0 and gsaturated together. Maybe we do not fully understand the hill equation but why are there no mutations that impact both g0 and gsaturated seen in support fig 13c? Why would mutations in the same region potentially impacting dimerization impact either g0 or gsaturated? What might be the mechanism behind divergent responses?

      It is important to recognize that the dimer interface does not just support the formation of dimers. There are many points of contact along the dimer interface that change when LacI switches between the active and inactive states. So, the dimer interface also helps regulate the balance between the active and inactive states. Our results show that different substitutions near the dimer interface can push this balance toward either the active or the inactive state to varying degrees. We’ve added text throughout the description of single-substitution effects to give specific examples and added a new paragraph at the end of that section to provide additional discussion and context. With regard to the more specific question of changes to both G0 and G∞, the models indicate that simultaneous changes to those Hill equation parameters require an unusual combination of biophysical changes. To clarify this point, we added a short paragraph to the text:

      “None of the single amino acid substitutions measured in the library simultaneously decrease G∞ and increase G0 (Supplementary Fig. 20c). This is not surprising, since substitutions that shift the biophysics to favor the active state tend to decrease G∞ while those that favor the inactive state tend to increase G0, and the biophysical models2,14,15 indicate that only a combination of parameter changes can cause both modifications to the dose-response. The library did, however, contain several multi-substitution variants that simultaneously decrease G∞ and increase G0. These inverted variants, and their associated substitutions, are discussed below.”
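For readers less familiar with the parameters named in this exchange, the following is a minimal sketch of a Hill-type dose-response curve. It assumes the commonly used form G(L) = G0 + (G∞ − G0)·L^n / (EC50^n + L^n); the function name and all numerical values are illustrative assumptions and are not taken from the manuscript.

```python
def hill(ligand, g0, g_inf, ec50, n):
    """Hill-type dose-response: gene expression as a function of ligand concentration.

    g0    -- expression with no ligand (G0)
    g_inf -- expression at saturating ligand (G-infinity)
    ec50  -- ligand concentration giving the half-way response
    n     -- Hill coefficient (steepness of the transition)
    """
    fraction = ligand**n / (ec50**n + ligand**n)
    return g0 + (g_inf - g0) * fraction


# Hypothetical parameter sets, for illustration only.
wild_type_like = dict(g0=100.0, g_inf=25000.0, ec50=100.0, n=2.0)
# An "inverted" curve in the sense used above: higher G0 and lower G-infinity.
inverted_like = dict(g0=20000.0, g_inf=500.0, ec50=100.0, n=2.0)

for label, params in [("wild-type-like", wild_type_like), ("inverted-like", inverted_like)]:
    curve = [round(hill(ligand, **params)) for ligand in (0.0, 10.0, 100.0, 1000.0)]
    print(label, curve)
```

In this form, G0 and G∞ set the two plateaus independently, which is why a variant that raises one plateau while lowering the other shows up as a change in both parameters at once.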


      P4L29: for interpretability it would be good to explain what log-additive effect means in the context of allostery.

      We agree that this information would be useful to the reader and have added additional text to explain log-additivity. We thank the reviewer for pointing out this oversight.

      “Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50. That is, the proportional effects of two individual amino acid substitutions on the EC50 can be multiplied together. For example, if substitution A results in a 3-fold change, and substitution B results in a 2-fold change, the double substitution, AB, behaving log-additively, results in a 6-fold change.”
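Stated as an equation, log-additivity for a double substitution AB can be sketched as follows. The superscript notation (wt for wild type, A and B for the single substitutions) is assumed here for illustration and may differ from the notation used in the manuscript.

```latex
\frac{EC_{50}^{AB}}{EC_{50}^{\mathrm{wt}}}
  = \frac{EC_{50}^{A}}{EC_{50}^{\mathrm{wt}}} \times \frac{EC_{50}^{B}}{EC_{50}^{\mathrm{wt}}}
\quad\Longleftrightarrow\quad
\ln\frac{EC_{50}^{AB}}{EC_{50}^{\mathrm{wt}}}
  = \ln\frac{EC_{50}^{A}}{EC_{50}^{\mathrm{wt}}} + \ln\frac{EC_{50}^{B}}{EC_{50}^{\mathrm{wt}}}
```

With the fold-changes from the quoted example, the right-hand side is ln 3 + ln 2 = ln 6, which corresponds to the 6-fold change for AB.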

      P4L34-P5L19: This section is wonderful. Really cool results and interesting structural overlap! P5L34 Helix 9 of the protein is mentioned but it's functional relevance is not. This is common throughout the paper - it would be useful for there to be an overview somewhere to help the reader contextualize the results with known structural role of these elements.

      We agree with the reviewer that this information would help to contextualize the results. We have made a number of changes to address this. First, we have added Supplementary Table 2, which describes the structural features of LacI. Second, we have added a paragraph overviewing the structure and function of LacI. Third, we have expanded the section “the effects of individual amino acid substitutions on the function of LacI” to discuss the structural or biochemical impact of specific substitutions. We thank the reviewer for this suggestion.

      P5L39: The authors identified a triple mutant with the band-stop phenotype then made all combination of the triple mutant. Of particular interest is R195H/G265D which is nearly the same as the triple mutant. It would be nice if the positions of each of these mutations and have some discussion to begin to rationalize this phenotype, even if to point out how far apart they are and that there is no easy structural rationale!

      We appreciate the reviewer highlighting this area of interest. We have added structural information to Fig. 6, which indicates the positions of the amino acid substitutions that result in the band-stop phenotype, as well as a short discussion in the main text:

      “To further investigate the band-stop phenotype, we chose a strong band-stop LacI variant with only three amino acid substitutions (R195H/G265D/A337D). These three positions are distributed distally on the periphery of the C-terminal core domain, and the role that each of these substitutions plays in the emergence of the band-stop phenotype is unclear.”

      P6L9: There should be more discussion of the significance of this work directly compared to what is known. For instance, negative cooperativity is mentioned as an explanation for bi-phasic dose response but this idea is not explained. Why would the relevant free energy changes be more entropic? Another example is the reverse-TetR phenotype observed by Hillen et al.

      We agree that more discussion is necessary to frame the results reported in the manuscript. To address this, we have added additional discussion throughout the manuscript that relates the results to the current understanding of allostery. Also, in the Conclusion, we added specific examples that lead us to link the ideas of bi-phasic dose response, negative cooperativity, and entropy/disorder. We believe these additions have improved the manuscript and we thank the reviewer for this suggestion.

      P6L28: The authors mention that phenotypes exist with genotypes that are discoverable with genotype-phenotype landscapes. This study due to the constraints of error prone pcr were somewhat limited. How big is the phenotypic landscape? Is it worth doing a more systematic study? What is the optimal experimental design: Single mutations, doubles, random - where is there the most information. How far can you drift before your machine learning model breaks down? How robust would it be to indels?

      The reviewer raises some excellent questions here, some of which are appropriate subjects for future work. The optimal experimental design depends on the objective: If the goal is to understand every possible mutation, a systematic site-saturation approach would be more appropriate. However, the landscape of a natural protein is limited by its wild-type DNA coding sequence, and so some substitutions are inaccessible (due to the arrangement of the codon table). The approach we took allowed us to characterize most of the accessible amino acid substitutions, while also allowing us to identify novel functions that would not have been identified with other approaches. We have added a short passage to the main text to discuss this (below). With regard to the DNN model, in the manuscript (SI Fig. 14), we show how the predictive accuracy degrades with mutational distance from the wild-type. It is possible that the type of DNN that we used could handle indels, since it effectively encodes each variant as a set of step-wise changes from the wild-type. But as with all machine-learning methods, it would require training with a dataset that included indels.

      “Novel phenotypes emerged at mutational distances greater than one amino acid substitution, highlighting the value in sampling a broader genotype space with higher-order mutations. Furthermore, the untargeted, random mutagenesis approach used here was critical for finding these novel phenotypes, as the genotypes required for these novel phenotypes were unpredictable.”

      Figures: Sup figs 3-7: The comparison of library-based results and single mutants is a great example of how to validate genotype-phenotype experiments!

      Thank you.

      Supp fig 5.: Missing figure number.

      We appreciate the reviewer catching this error and have attempted to properly label all figures and tables in this revision. Thank you.

      Supp fig7: G0 appears to have very poor fit between library vs single mutant version. Why might this be? R^2 would likely be better to report here as opposed to RMSE as RMSE is sensitize to the magnitude of the data such that you cannot directly compare RMSE of say 'n' to G0.

      We agree that these are important discussion points and have addressed this concern with an expanded discussion in the main text, as well as the addition of coefficient of correlation (R^2) in the caption for Figure 2 (previously supplementary figure 7). We believe these additions contribute meaningfully to the manuscript, and they address the concerns of the reviewer. The additional text reads:

      “We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as: $\exp[\mathrm{RMSE}(\ln(x))]$, where $\mathrm{RMSE}(\ln(x))$ is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results. The accuracy for the gene expression levels (G0 and G∞) was better at higher gene expression levels (typical for G∞) than at low gene expression levels (typical for G0), which is expected based on the non-linearity of the fitness impact of tetracycline (Supplementary Figs. 10-11). Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”

      Sup fig13c: it is somewhat surprising that mutations only appear to effect g0 and not gsaturated. This implies that basal and saturated activity are not coupled. Is this expected? Why or why not?

      This comment is partially addressed with a response above (P4L1-16). Coupled gene expression increases do occur, especially with substitutions at the start codon that result in fewer copies of LacI in the cell. In this instance, both G0 and G∞ are increased. Otherwise, changes to multiple biophysical parameters are required to increase both G0 and G∞.

      Reviewer #1 (Significance (Required)): Allostery is hard to comprehend because it involves many interacting residues propagating information across a protein. The Monod-Wyman-Changeux (MWC) and Koshland, Nemethy, and Filmer (KNF) models have been a long standing framework to explain much of allostery, however recent formulations have focused on the role of the conformational ensemble and a grounding in statistical mechanics. This manuscript focuses on the functional impact of mutations and therefore contribution of the amino acids to regulation. The authors unbiased approach of combining a dose-response curve and mutational library generation let them fit every mutant to a hill equation. This approach let the authors identify the allosteric phenotype of all measured mutations! The authors found inverted phenotypes which happen in homologs of this protein but most interesting is the strange and idiosyncratic 'Band-stop' phenotype. The band-stop phenotype is bi-phasic that will hopefully be followed up with further studies to explain the mechanism. This manuscript is a fascinating exploration of the adaptability of allosteric landscapes with just a handful of mutations. Genotype-phenotype experiments allow sampling immense mutational space to study complex phenotypes such as allostery. However, a challenge with these experiments is that allostery and other complicated phenomena come from immense fitness landscapes altering different parameters of the hill equation. The authors approach of using a simple error prone pcr library combined with many ligand concentrations allowed them to sample a very large space somewhat sparsely. However, they were able to predict this data by training and using a neural net model. I think this is a clever way to fill in the gaps that are inherent to somewhat sparse sampling from error prone pcr. The experimental design of the dose response is especially elegant and a great model for how to do these experiments. 
With some small improvements for readability, this manuscript will surely find broad interest to the genotype-phenotype, protein science, allostery, structural biology, and biophysics fields. We were prompted to do this by Review Commons and are posting our submitted review here: Willow Coyote-Maestas has relevant expertise in high throughput screening, protein engineering, genotype-phenotype experiments, protein allostery, data mining, and machine learning. James Fraser has expertise in structural biology, genotype-phenotype experiments, protein allostery, protein dynamics, protein evolution, etc.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The authors use deep mutational scanning to infer the dose-response curves of ~60,000 variants of the LacI repressor and so provide an unprecedently systematic dataset of how mutations affect an allosteric protein. Overall this is an interesting dataset that highlights the potential of mutational scanning for rapidly identifying diverse variants of proteins with desired or unexpected activities for synthetic biology/bioengineering. The relatively common inverted phenotypes and their sequence diversity is interesting, as is the identification of several hundred genotypes with non-sigmoidal band-stop dose-response curves and their enrichment in specific protein regions. A weakness of the study is that some of the parameter estimates seem to have high uncertainty and this is not clearly presented or the impact on the conclusions analysed. A second shortcoming is that there is little mechanistic insight beyond the enrichments of mutations with different effects in different regions of the protein. But as a first overview of the diversity of mutational effects on the dose-response curve of an allosteric protein, this is an important dataset and analysis.

      **Comments**

      **Data quality and reproducibility**

      "The flow cytometry results confirmed both the qualitative and quantitative accuracy of the new method (Supplementary Figs. 3-7)"

      • There need to be quantitative measures of accuracy in the text here for the different parameters.

      We believe this comment is addressed along with the following two comments.

      • Sup fig 7 panels should be main text panels - they are vital for understanding the data quality In particular, the G0 parameter estimates from the library appear to have a lower bound ie they provide no information below a cytometry Go of ~10^4. This is an important caveat and needs to be highlighted in the main text. The Hill parameter (n) estimate for wt (dark gray) replicate barcodes is extremely variable - why is this?

      • In general there is not a clear enough presentation of the uncertainty and biases in the parameter estimations which seem to be rather different for the 4 parameters. Only the EC50 parameter seems to correlate very well with the independent measurements.

      We thank the reviewer for identifying a need for more information on the accuracy of this method. So, we have moved Supplementary Fig. 7 to the main text (Fig 2 in the revised manuscript) and have added coefficients of correlation to each Hill equation parameter in that figure caption. Furthermore, we have added new data (Supplementary Fig. 11), which shows the uncertainty associated with different gene expression levels. Finally, we have added a discussion on the accuracy of this method for each parameter of the Hill equation to the main text. Estimation of the Hill coefficient (n) from data is often highly uncertain and variable, because that parameter estimate can be highly sensitive to random measurement errors at a single point on the curve. The estimate for the wild type appears to be highly variable because the plot contains 53 replicate measurements. So, the plotted variability represents approximately 2 standard deviations. The spread of wild-type results in the plot is consistent with the stated RMSE for the Hill coefficient. Furthermore, the Hill coefficient is not used in any of the additional quantitative analysis in our manuscript, partially because of its relatively high measurement uncertainty, but also because, based on the biophysical models, it is not as informative of the underlying biophysical changes.

      “We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as: $\exp[\mathrm{RMSE}(\ln(x))]$, where $\mathrm{RMSE}(\ln(x))$ is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results. The accuracy for the gene expression levels (G0 and G∞) was better at higher gene expression levels (typical for G∞) than at low gene expression levels (typical for G0), which is expected based on the non-linearity of the fitness impact of tetracycline (Supplementary Figs. 10-11). Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”
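The fold-accuracy figure described in the quoted text (the exponential of the root-mean-square difference of log-transformed parameters) can be sketched as below. This is an illustration under stated assumptions rather than the analysis code used for the manuscript: the function name and the paired values are hypothetical.

```python
import math


def fold_accuracy(library_values, cytometry_values):
    """Return exp(RMSE(ln x)) for paired, strictly positive measurements.

    The result reads as a typical multiplicative disagreement between the
    two methods: 1.0 means perfect agreement, 2.0 means roughly 2-fold.
    """
    log_differences = [
        math.log(lib) - math.log(cyt)
        for lib, cyt in zip(library_values, cytometry_values)
    ]
    rmse = math.sqrt(sum(d * d for d in log_differences) / len(log_differences))
    return math.exp(rmse)


# Hypothetical paired EC50 estimates for three variants (illustration only).
library_ec50 = [50.0, 200.0, 90.0]
cytometry_ec50 = [100.0, 100.0, 90.0]
print(fold_accuracy(library_ec50, cytometry_ec50))
```

Working in log space means that a 2-fold overestimate and a 2-fold underestimate count equally, which suits parameters such as EC50 that span orders of magnitude.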

      • The genotypes in the mutagenesis library contain a mean of 4.4 aa substitutions and the authors us a neural network to estimate 3 of the Hill equation parameters (with uncertainties) for the 1991/2110 of the single aa mutations. It would be useful to have an independent experimental evaluation of the reliability of these inferred single aa mutational effects by performing facs on a panel of single aa mutants (using single aa mutants in sup fig 3-7, if there are any, or newly constructed mutants).

      We agree that the predictive performance of the DNN requires experimental validation. We evaluated the performance by withholding data from 20% of the library, including nearly 200 variants with single amino acid substitutions, and then compared the predicted effect of those substitutions to the measured effect. The results of this test are reported in Supplementary Fig. 14. Additionally, we have adjusted the main text to more clearly explain the evaluation process.

      “To evaluate the accuracy of the model predictions, we used the root-mean-square error (RMSE) for the model predictions compared with the measurement results. We calculated RMSE using only held-out data not used in the model training, and the split between held-out data and training data was chosen so that all variants with a specific amino acid sequence appear in only one of the two sets.”
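One way to obtain a split with the property described above (every variant sharing an amino acid sequence lands in the same set) is to assign each sequence to a set by hashing it. The sketch below is an assumption for illustration, not the procedure used for the manuscript; the function name, the barcode labels, and the toy sequences are hypothetical, and the 20% default mirrors the held-out fraction mentioned in the response.

```python
import hashlib


def assign_split(amino_acid_sequence, held_out_fraction=0.2):
    """Deterministically map an amino acid sequence to 'held_out' or 'train'.

    Because the assignment depends only on the sequence, variants that
    differ in barcode or DNA sequence but encode the same protein always
    fall into the same set.
    """
    digest = hashlib.sha256(amino_acid_sequence.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return "held_out" if bucket < held_out_fraction else "train"


# Hypothetical (barcode, amino acid sequence) records; bc1 and bc2 encode
# the same protein and must therefore share a set.
variants = [("bc1", "MKPVTLYD"), ("bc2", "MKPVTLYD"), ("bc3", "MKAVTLYD")]
splits = {barcode: assign_split(sequence) for barcode, sequence in variants}
print(splits)
```

Grouping by sequence prevents the same protein from appearing on both sides of the split, which would otherwise make the held-out error look better than it is.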

      • fig3/"Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50." How additive are the other 2 parameters? this analysis should also be presented in fig 3. If they are not as additive is it simply because of lower accuracy of the measurements? If the mutational effects are largely additive, then a simple linear model (rather than the DNN) could be used to estimate the single mutant effects from the multiple mutant genotypes.

      We agree with the reviewer that exploring the log-additivity of the Hill equation parameters is informative, and have included Supplementary Figure 21, which displays this information. Furthermore, we expanded the discussion of log-additivity on all three parameters in the main text:

      “Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50. That is, the proportional effects of two individual amino acid substitutions on the EC50 can be multiplied together. For example, if substitution A results in a 3-fold change, and substitution B results in a 2-fold change, the double substitution, AB, behaving log-additively, results in a 6-fold change. Only 0.57% (12 of 2101) of double amino acid substitutions in the measured data have EC50 values that differ from the log-additive effects of the single substitutions by more than 2.5-fold (Fig. 4). This result, combined with the wide distribution of residues that affect EC50, reinforces the view that allostery is a distributed biophysical phenomenon controlled by a free energy balance with additive contributions from many residues and interactions, a mechanism proposed previously1,39 and supported by other recent studies17, rather than a process driven by the propagation of local, contiguous structural rearrangements along a defined pathway.

      A similar analysis of log-additivity for G0 and G∞ is complicated by the more limited range of measured values for those parameters, the smaller number of substitutions that cause large shifts in G0 or G∞, and the higher relative measurement uncertainty at low G(L). However, the effects of multiple substitutions on G0 and G∞ are also consistent with log-additivity for almost every measured double substitution variant (Supplementary Fig. 21).”
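The comparison described in the quoted text, in which a measured double-substitution EC50 is checked against the log-additive expectation with a 2.5-fold cutoff, can be sketched as follows. The function name and the EC50 values are hypothetical and chosen only to mirror the 3-fold and 2-fold example; this is not the analysis code behind Fig. 4.

```python
def log_additive_check(ec50_wt, ec50_a, ec50_b, ec50_ab, threshold=2.5):
    """Compare a measured double-substitution EC50 with the log-additive expectation.

    Returns (expected_ec50, fold_deviation, deviates), where fold_deviation
    is always >= 1 regardless of the direction of the discrepancy.
    """
    expected = ec50_wt * (ec50_a / ec50_wt) * (ec50_b / ec50_wt)
    ratio = ec50_ab / expected
    fold_deviation = max(ratio, 1.0 / ratio)
    return expected, fold_deviation, fold_deviation > threshold


# Hypothetical values: A is a 3-fold change, B is a 2-fold change.
expected, deviation, flagged = log_additive_check(
    ec50_wt=100.0, ec50_a=300.0, ec50_b=200.0, ec50_ab=650.0
)
print(expected, round(deviation, 3), flagged)
```

A double substitution is flagged only when its measured EC50 is more than 2.5-fold above or below the product of the single-substitution effects.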

      **Presentation/clarity of text and figures**

      • The main text implies that the DNN is trained to predict 3 parameters of the Hill equation but not the Hill coefficient (n). This should be clarified / justified in the main text.

      We agree that the decision to exclude the parameter ‘n’ requires explanation in the main text. To address this, we have added to the main text:

      “Measurements of the Hill coefficient, n, had high relative uncertainties for both barcode-sequencing and flow cytometry, and so the parameter n was not used in any quantitative analysis.”

      and

      “We trained the model to predict the Hill equation parameters G0, G∞, and EC50 (Supplementary Fig. 13), the three Hill equation parameters that were determined with relatively low uncertainty by the library-scale measurement.”

      • The DNN needs to be better explained and justified in the main text for a general audience. How do simpler additive models perform for phenotypic prediction / parameter inference?

      We agree with the reviewer that the DNN needs to be justified in the main text. As part of the revision plan, we propose to compare the predictive performance of the DNN to an additive model.

      • Ref 14. analyses a much smaller set of mutants in the same protein but using an explicit biophysical model. It would be helpful to have a more extensive comparison with the approach and conclusions to this previous study.

      Throughout the manuscript, we frame the results and discussion in terms of the referenced biophysical model. Using the model, we describe the biophysical effects that a substitution may have on LacI, based on observed changes to function associated with that substitution. We also comment briefly on the limitations of this model when applied to the extensive dataset presented here.

      “Most of the non-silent substitutions discussed above are more likely to affect the allosteric constant than either the ligand or operator affinities. Within the biophysical model, those affinities are specific to either the active or inactive state of LacI, i.e. they are defined conditionally, assuming that the protein is in the appropriate state. So, almost by definition, substitutions that affect the ligand-binding or operator-binding affinities (as defined in the models) must be at positions that are close to the ligand-binding site or within the DNA-binding domain. Substitutions that modify the ability of the LacI protein to access either the active state or inactive state, by definition, affect the allosteric constant. This includes, for example, substitutions that disrupt dimer formation (dissociated monomers are in the inactive state), substitutions that lock the dimer rigidly into either the active or inactive state, or substitutions that more subtly affect the balance between the active and inactive states. Thus, because there are many more positions far from the ligand- and DNA- binding regions than close to those regions, there are many more opportunities for substitutions to affect the allosteric constant than the other biophysical parameters. Note that this analysis assumes that substitutions don’t perturb the LacI structure too much, so that the active and inactive states remain somehow similar to the wild-type states. Our results suggest that this is not always the case: consider, for example, the substitutions at positions K84 and M98 discussed above and the substitutions resulting in the inverted and band-stop phenotypes discussed below.”

      • Enrichments need statistical tests to know how unexpected that results are e.g. p5 line 12 "67% of strongly inverted variants have substitutions near the ligand-binding pocket"

      We agree that this information is necessary to interpret the results. We have included p-values (previously reported only in the Methods section) throughout the main text of the manuscript.

      The publication by Poelwijk et al. was considered extensively when planning this work, and failing to cite that manuscript would have been tremendously unjust. We have included it, as well as a few additional references that have identified and discussed inverted LacI variants. We sincerely thank the reviewer for identifying this oversight.

      • What mechanisms do the authors envisage that could produce the band-stop dose response curves? There is likely previous theoretical work that could be cited here. In general there is little discussion of the biophysical mechanisms that could underlie the various mutational effects.

      We agree with the reviewer that discussing the biophysical mechanisms that underlie many of the reported mutations is important for understanding the results. We have expanded the subsection “Effects of amino acid substitutions on LacI phenotype” to include discussion of several of the key substitutions (or groups of substitutions) and their potential biophysical effects. Additionally, we consider mechanisms that may underlie the band-stop sensor, and propose one model that could explain the band-stop phenotype:

      “In particular, the biphasic dose-response of the band-stop variants suggests negative cooperativity: that is, successive ligand binding steps have reduced ligand binding affinity. Negative cooperativity has been shown to be required for biphasic dose-response curves42,43. The biphasic dose-response and apparent negative cooperativity are also reminiscent of systems where protein disorder and dynamics have been shown to play an important role in allosteric function1, including catabolite activator protein (CAP)44,45 and the Doc/Phd toxin-antitoxin system46. This suggests that entropic changes may also be important for the band-stop phenotype. A potential mechanism is that band-stop LacI variants have two distinct inactive states: an inactive monomeric state and an inactive dimeric state. In the absence of ligand, inactive monomers may dominate the population. Then, at intermediate ligand concentrations, ligand binding stabilizes dimerization of LacI into an active state which can bind to the DNA operator and repress transcription. When a second ligand binds to the dimer, it returns to an inactive dimeric state, similar to wildtype LacI. This mechanism, and other possible mechanisms, do not match the MWC model of allostery or its extensions2,13–15 and require a more comprehensive study and understanding of the ensemble of states in which these band-stop LacI variants exist.”

      • "This result, combined with the wide distribution of residues that affect EC50, suggests that LacI allostery is controlled by a free energy balance with additive contributions from many residues and interactions." 'additive contributions and interactions' covers all possible models of vastly different complexity i.e. this sentence is rather meaningless.

      We have attempted to contextualize this statement by adding additional discussion and references. We hope these additions give more meaning to this section.

      “This result, combined with the wide distribution of residues that affect EC50, reinforces the view that allostery is a distributed biophysical phenomenon controlled by a free energy balance with additive contributions from many residues and interactions, a mechanism proposed previously1,39 and supported by other recent studies17, rather than a process driven by the propagation of local, contiguous structural rearrangements along a defined pathway.”

      • fig 4 c and d compress a lot of information into one figure and I found this figure confusing. It may be clearer to have multiple panels with each panel presenting one aspect. It is also not clear to me what the small circular nodes exactly represent, especially when you have one smaller node connected to two polygonal nodes, and why they don't have the same colour scale as the polygonal nodes.

      We agree with the reviewer that figure 4 (or Figure 5 in the revised manuscript) contains a lot of information. The purpose of this figure is to convey the structural and genetic diversity among the sets of inverted variants and band-stop variants. We designed this figure to convey this point at two levels: a brief overview, where the diversity is apparent by quickly considering the figure, and at a more informative level, with some quantitative data and structurally relevant points highlighted. We have modified the caption slightly, in an effort to improve clarity.

      • line 25 - 'causes a conformational change' -> 'energetic change' (allostery does not always involve conformational change

      We thank the reviewer for this comment and have adopted more modern language to describe allostery throughout the manuscript.

      • sup fig 5 legend misses '5'

      We thank the reviewer for pointing this out; we have attempted to number all figures and tables more carefully.

      • sup fig 7. pls add correlation coefficients to these plots (and move to main text figures).

      We agree that this information is of interest and have included this data as main text Figure 2. In addition, we have included coefficients of correlation in the caption of this figure.

      • Reference 21 is just a title and pubmed link

      We thank the reviewer for identifying this error; we have corrected it in the references.

      • "fitness per hour" -> growth rate

      To ensure that this connection is clearly established, when we introduce fitness for the first time, we clarify that it relates to growth rate:

      “Consequently, in the presence of tetracycline, the LacI dose-response modulates cellular fitness (i.e. growth rate) based on the concentration of the input ligand isopropyl-β-D-thiogalactoside (IPTG).”

      Also, we define ‘fitness’ in the Methods section:

      “The experimental approach for this work was designed to maintain bacterial cultures in exponential growth phase for the full duration of the measurements. So, in all analysis, the Malthusian definition of fitness was used, i.e. fitness is the exponential growth rate__58__.”

      • page 6 line 28 - "discoverable only via large-scale landscape measurements" - directed evolution approaches can also discover such genotypes (see e.g. Poelwijk /Tans paper). Please re-phrase.

      We agree with the reviewer and have adjusted the main text accordingly.

      “Overall, our findings suggest that a surprising diversity of useful and potentially novel allosteric phenotypes exist with genotypes that are readily discoverable via large-scale landscape measurements.”

      • pls define jargon the first time it is used e.g. band-stop and band-pass

      We agree that all unconventional terms should be explicitly defined when used, and we have attempted to define the band-pass and band-stop dose-response curves more clearly in the main text:

      “These include examples of LacI variants with band-stop dose-response curves (i.e. variants with high-low-high gene expression; e.g. Fig. 1e, Supplementary Fig. 7), and LacI variants with band-pass dose-response curves (i.e. variants with low-high-low gene expression; e.g. Supplementary Fig. 8).”
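      To make the two definitions concrete, the following minimal Python sketch labels a dose-response curve by its overall shape. It is purely illustrative and is not the classification procedure used in the manuscript; the function name `classify_curve`, the threshold `min_fold`, and the example values are all hypothetical.

```python
import numpy as np

def classify_curve(expression, min_fold=2.0):
    """Label a dose-response curve sampled at increasing ligand concentration.

    expression: gene expression values ordered by increasing IPTG.
    min_fold:   hypothetical threshold; an interior extremum must differ from
                both endpoints by at least this factor to count.
    """
    g = np.asarray(expression, dtype=float)
    first, last = g[0], g[-1]
    interior = g[1:-1]
    if interior.size == 0:
        return "undetermined"
    lowest, highest = interior.min(), interior.max()
    # high-low-high: an interior dip well below both ends
    if lowest * min_fold <= min(first, last):
        return "band-stop"
    # low-high-low: an interior peak well above both ends
    if highest >= min_fold * max(first, last):
        return "band-pass"
    if last >= min_fold * first:
        return "sigmoidal (increasing)"
    if first >= min_fold * last:
        return "inverted (decreasing)"
    return "flat"

# Made-up expression values, for illustration only
print(classify_curve([1000, 900, 100, 120, 950]))
print(classify_curve([100, 150, 1000, 900, 110]))
```

      A real analysis would also need to account for measurement uncertainty at each concentration, which this sketch ignores.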

      **Methods/data availability/ experimental and analysis reproducibility:**

      • The way that growth rate is calculated on page 17 equation 1- This section is confusing. Please be explicit about how you accounted for the lag phase, what the lag phase was, and total population growth during this time. In addition, please report the growth curves from the wells of the four plates, the final OD600 of the pooled samples, and exact timings of when the samples were removed from 37 degree incubation in a table. These are critical for calculating growth rate in individual clones downstream.

      We thank the reviewer for identifying the need to clarify this section of text. The ‘lag’ in this section referred to a delay before tetracycline began impacting the growth rate of cells. To address this, we have changed ‘lag’ in this context to ‘delay.’ Furthermore, we have attempted to clarify precisely the cause of this delay, and how we accounted for it in calculating growth rates:

      For samples grown with tetracycline, the tetracycline was only added to the culture media for Growth Plates 2‑4. Because of the mode of action of tetracycline (inhibition of translation), there was a delay in its effect on cell fitness: Immediately after diluting cells into Growth Plate 2 (the first plate with tetracycline), the cells still had a normal level of proteins needed for growth and proliferation and they continued to grow at nearly the same rate as without tetracycline. Over time, as the level of proteins required for cell growth decreased due to tetracycline, the growth rate of the cells decreased. Accordingly, the analysis accounts for the variation in cell fitness (growth rate) as a function of time after the cells were exposed to tetracycline. With the assumption that the fitness is approximately proportional to the number of proteins needed for growth, the fitness as a function of time is taken to approach the new value with an exponential decay:

      (3)

      where μ_i^tet is the steady-state fitness with tetracycline, and α is a transition rate. The transition rate was kept fixed at α = log(5), determined from a small-scale calibration measurement. Note that at the tetracycline concentration used during the library-scale measurement (20 µg/mL), μ_i^tet was greater than zero even at the lowest G(L) levels (Supplementary Fig. 10). From Eq. (3), the number of cells in each Growth Plate for samples grown with tetracycline is:
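      As a hedged sketch of the model described in words above (not a reproduction of Eq. (3) or of the cell-number expression), one form consistent with "approaching the new value with an exponential decay" is μ(t) = μ_tet + (μ_0 − μ_tet)·exp(−αt), whose time integral gives the log fold-change in cell number. The symbol `mu_0` for the fitness before tetracycline takes effect, the use of the natural logarithm for α, and the time unit of hours are assumptions made for illustration.

```python
import math

ALPHA = math.log(5)  # transition rate from the text; natural log and 1/h units are assumptions

def fitness(t, mu_0, mu_tet, alpha=ALPHA):
    """Growth rate at time t after tetracycline exposure.

    mu_0:   fitness without tetracycline (hypothetical symbol).
    mu_tet: steady-state fitness with tetracycline.
    """
    return mu_tet + (mu_0 - mu_tet) * math.exp(-alpha * t)

def log_fold_growth(t, mu_0, mu_tet, alpha=ALPHA):
    """ln(N(t)/N(0)), i.e. the integral of fitness from 0 to t."""
    return mu_tet * t + (mu_0 - mu_tet) * (1.0 - math.exp(-alpha * t)) / alpha

# Illustrative (made-up) growth rates
print(fitness(0.0, mu_0=1.0, mu_tet=0.2))
print(log_fold_growth(3.0, mu_0=1.0, mu_tet=0.2))
```

      In this form the cells initially grow at nearly the tetracycline-free rate and relax toward the steady-state rate on a timescale of 1/α, matching the delay described in the response.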

      • What were the upper and lower bounds of the measurements? (LacI deletion vs Tet deletion / autofluoresence phenotype - true 100% and true 0% activity). Knowing and reporting these bounds will also allow easier comparison between datasets in the future.

      We agree that knowing the limitations of the measurement is important for contextualizing the results. To address this point, we have included Supplementary Fig. 11, which shows the uncertainty of the measurement across gene expression levels.

      • Please clarify whether there was only 1 biological replicate (because the plates were pooled before sequencing)? Or if there were replicates present an analysis of reproducibility.

      We thank the reviewer for pointing out the ambiguity in the original manuscript. The library-scale measurement reported here was completed once. The 24 growth conditions were spread across 96 wells, so each condition occupied 4 wells, and the 4 wells were combined prior to DNA extraction. We have clarified this process in the methods by removing ‘duplicate’:

      “Growth Plate 2 contained the same IPTG gradient as Growth Plate 1 with the addition of tetracycline (20 µg/mL) to alternating rows in the plate, resulting in 24 chemical environments, with each environment spread across 4 wells.”
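      The arithmetic behind "24 chemical environments, with each environment spread across 4 wells" can be checked with a short sketch. It assumes a standard 8-row by 12-column plate with the IPTG gradient running across the 12 columns and tetracycline in every second row; the orientation of the gradient and which rows received tetracycline are assumptions, not details taken from the manuscript.

```python
from collections import Counter

ROWS = "ABCDEFGH"        # 8 rows of a standard 96-well plate
COLUMNS = range(1, 13)   # 12 columns; assumed to carry the IPTG gradient

def plate_layout():
    """Map each well name to (IPTG gradient step, tetracycline present?)."""
    layout = {}
    for r, row in enumerate(ROWS):
        with_tet = (r % 2 == 1)  # assumption: every second row gets tetracycline
        for col in COLUMNS:
            layout[f"{row}{col}"] = (col, with_tet)
    return layout

# Count how many wells share each (IPTG step, tetracycline) environment
wells_per_environment = Counter(plate_layout().values())
print(len(plate_layout()), len(wells_per_environment), set(wells_per_environment.values()))
```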

      Despite there being only a single library-scale measurement, the accuracy and reliability of the results are supported by many distinct biological replicates within the library (i.e. LacI variants with the same amino acid sequence but with different barcodes, see new Supplementary Fig. 9), as well as over 100 orthogonal dose-response curve measurements completed with flow cytometry (Figure 2). We believe these support the reproducibility of the work and we have included statistical analysis on the accuracy of the library-scale measurement results.

      “To test the accuracy of the new method for library-scale dose-response curve measurements, we independently verified the results for over 100 LacI variants from the library. For each verification measurement, we chemically synthesized the coding DNA sequence for a single variant and inserted it into a plasmid where LacI regulates the expression of a fluorescent protein. We transformed the plasmid into E. coli and measured the resulting dose-response curve with flow cytometry (e.g. Fig. 1e). We compared the Hill equation parameters from the library-scale measurement to those same parameters determined from flow cytometry measurements for each of the chemically synthesized LacI variants (Fig. 2). This served as a check of the new library-scale method’s overall ability to measure dose-response curves with quantitative accuracy. The accuracy for each Hill equation parameter in the library-scale measurement was: 4-fold for G0, 1.5-fold for G∞, 1.8-fold for EC50, and ± 0.28 for n. For G0, G∞, and EC50, we calculated the accuracy as: exp[RMSE(ln(x))], where RMSE(ln(x)) is the root-mean-square difference between the logarithm of each parameter from the library-scale and cytometry measurements. For n, we calculated the accuracy simply as the root-mean-square difference between the library-scale and cytometry results (Supplementary Fig. 7).”
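      The two accuracy metrics described in this passage can be written out as a minimal sketch. The function names and the example numbers are hypothetical; only the formulas (the exponential of the root-mean-square difference of the logarithms, and a plain root-mean-square difference for n) come from the quoted text.

```python
import numpy as np

def fold_accuracy(library_values, cytometry_values):
    """exp[RMSE(ln x)]: fold-accuracy used for G0, G-infinity and EC50."""
    diff = (np.log(np.asarray(library_values, dtype=float))
            - np.log(np.asarray(cytometry_values, dtype=float)))
    return float(np.exp(np.sqrt(np.mean(diff ** 2))))

def rms_difference(library_values, cytometry_values):
    """Plain root-mean-square difference, as used for the Hill coefficient n."""
    diff = (np.asarray(library_values, dtype=float)
            - np.asarray(cytometry_values, dtype=float))
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up values: every library estimate is exactly twice the cytometry value
print(fold_accuracy([2.0, 4.0, 8.0], [1.0, 2.0, 4.0]))
```

      Under this definition, a value of 1 means perfect agreement, and a value of 2 means the two measurements typically differ by a factor of two in either direction.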

      • Please provide supplementary tables of the data (in addition to the raw sequencing files). Both a table summarising the growth rates, inferred parameter values and uncertainties for genotypes and a second table with the barcode sequence counts across timepoints and associated experimental data.

      We agree that access to this information is critical. Due to the size of the associated data, we have made this data available for download in a public repository. We direct readers to the repository information in the “Data Availability” statement:

      “The raw sequence data for long-read and short-read DNA sequencing have been deposited in the NCBI Sequence Read Archive and are available under the project accession number PRJNA643436. Plasmid sequences have been deposited in the NCBI Genbank under accession codes MT702633, and MT702634, for pTY1 and pVER, respectively.

      The processed data table containing comprehensive data and information for each LacI variant in the library is publicly available via the NIST Science Data Portal, with the identifier ark:/88434/mds2-2259 (https://data.nist.gov/od/id/mds2-2259 or https://doi.org/10.18434/M32259). The data table includes the DNA barcode sequences, the barcode read counts, the time points used for the library-scale measurement, fitness estimates for each barcoded variant across the 24 chemical environments, the results of both Bayesian inference models (including posterior medians, covariances, and 0.05, 0.25, 0.75, and 0.95 posterior quantiles), the LacI CDS and amino acid sequence for each barcoded variant (as determined by long-read sequencing), the number of LacI CDS reads in the long-read sequencing dataset for each barcoded variant, and the number of unintended mutations in other regions of the plasmid (from the long-read sequencing data).

      Code Availability

      All custom data analysis code is available at https://github.com/djross22/nist_lacI_landscape_analysis.”

      Reviewer #2 (Significance (Required)):

      The authors present an unprecedentedly systematic dataset of how mutations affect an allosteric protein. This illustrates the potential of mutational scanning for rapidly identifying diverse variants of allosteric proteins / regulators with desired or unexpected activities for synthetic biology/bioengineering.

      Previous studies have identified inverted dose-response curve phenotypes for lacI https://www.cell.com/fulltext/S0092-8674(11)00710-0 but using directed evolution i.e. they were not comprehensive in nature.

      The audience of this study would be protein engineers, the allostery field, synthetic biologists and the mutation scanning community and evolutionary biologists interested in fitness landscapes.

      My relevant expertise is in deep mutational scanning and genotype-phenotype landscapes, including work on allosteric proteins and computational methods.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this interesting manuscript the authors developed an ingenious high-throughput screening approach which utilizes DNA barcoding to select variants of LacI proteins with different allosteric profiles for IPTG control, using E. coli fitness (growth rate) in a range of antibiotic concentrations as a readout, thus providing a genotype-phenotype map for this enzyme. The authors used a library of 10^5-10^ variants of LacI expressed from a plasmid and screened for distinct IPTG activation profiles under different conditions including several antibiotic stressors. As a result they identified various patterns of activation including normal (sigmoidal increase), inverted (decrease) and unusual stop-band where the dependence of growth on [IPTG] is non-monotonic. The study is well-conceived, well executed and provides statistically significant results. The key advance provided by this work is that it allows the identification of specific mutations in LacI connected with one of three allosteric profiles. The paper is clearly written, all protocols are explained, and it can be reproduced in a lab that possesses proper expertise in genetics.

      Reviewer #3 (Significance (Required)):

      The significance of this work is that it discovered libraries of LacI variants which give rise to distinct profiles of allosteric control of activation of specific genes (in this case antibiotic resistance) by the Lac mechanism. The barcoding technology allowed the identification of specific mutations which are (presumably) causal of changes in the way allosteric activation of LacI by IPTG works. As such it provides a rich, highly resolved dataset of LacI variants for further exploration and analysis.

      Alongside these strengths, several weaknesses should also be noted:

      1. First and foremost the paper does not provide any molecular-level biophysical insights into the impact of various types of mutations on molecular properties of LacI. Do the mutations change binding affinity to IPTG? Binding site? Communication dynamics? Stability? The diagrams of connectivity for the stop-band mutations (Fig.4) do not provide much help as they do not say which molecular properties of LacI are affected by mutations and why certain mutations have a specific effect on allostery. A molecular level exploration would make this paper much stronger.

      We address this comment with comment (2), below.

      2. In the same vein a theoretical MD study would be quite illuminating in answering the key unanswered question of this work: Why do mutations have various and pronounced effects on allosteric regulation by LacI? I think publication of this work should not be conditioned on such a study but again adding it would make the work much stronger.

      We appreciate the reviewer’s comments and agree that investigating the molecular mechanisms driving the phenotypic changes identified in this work is a compelling proposition. Throughout the manuscript, we identify positions and specific amino acid substitutions that affect the measurable function of LacI, and occasionally discuss the biophysical effects that may underlie these changes. We have expanded the discussion to include possible molecular-level effects.

      The dataset reported here identifies many potential candidates for molecular-level study, either computationally or experimentally. However, this manuscript is scoped to report a large-scale method to measure the genotype-phenotype landscape of an allosteric protein, and a limited investigation into the emergence of novel phenotypes that are identified in the landscape.

      3. Lastly a recent study PNAS v.116 pp.11265-74 (2019) explored a library of variants of E. coli Adenylate Kinase and showed the relationship between allosteric effects due to substrate inhibition and stability of the protein. Perhaps a similar relationship can be explored in this case of LacI.

      We thank the reviewer for highlighting this publication. We agree with the reviewer that similar effects may play a role in the activity of LacI. Establishing such a relationship would require additional experimentation and, we think, is outside the scope of the submitted manuscript. We hope, however, that follow-up studies using this dataset will investigate this phenomenon and other related mechanisms that may underlie the band-stop phenotype and other observed effects.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this interesting manuscript the authors developed an ingenious high-throughput screening approach which utilizes DNA barcoding to select variants of LacI proteins with different allosteric profiles for IPTG control, using E. coli fitness (growth rate) in a range of antibiotic concentrations as a readout, thus providing a genotype-phenotype map for this enzyme. The authors used a library of 10^5-10^ variants of LacI expressed from a plasmid and screened for distinct IPTG activation profiles under different conditions including several antibiotic stressors. As a result they identified various patterns of activation including normal (sigmoidal increase), inverted (decrease) and unusual stop-band where the dependence of growth on [IPTG] is non-monotonic. The study is well-conceived, well executed and provides statistically significant results. The key advance provided by this work is that it allows the identification of specific mutations in LacI connected with one of three allosteric profiles. The paper is clearly written, all protocols are explained, and it can be reproduced in a lab that possesses proper expertise in genetics.

      Significance

      The significance of this work is that it discovered libraries of LacI variants which give rise to distinct profiles of allosteric control of activation of specific genes (in this case antibiotic resistance) by the Lac mechanism. The barcoding technology allowed the identification of specific mutations which are (presumably) causal of changes in the way allosteric activation of LacI by IPTG works. As such it provides a rich, highly resolved dataset of LacI variants for further exploration and analysis.

      Alongside these strengths, several weaknesses should also be noted:

      1. First and foremost the paper does not provide any molecular-level biophysical insights into the impact of various types of mutations on molecular properties of LacI. Do the mutations change binding affinity to IPTG? Binding site? Communication dynamics? Stability? The diagrams of connectivity for the stop-band mutations (Fig.4) do not provide much help as they do not say which molecular properties of LacI are affected by mutations and why certain mutations have a specific effect on allostery. A molecular level exploration would make this paper much stronger.
      2. In the same vein a theoretical MD study would be quite illuminating in answering the key unanswered question of this work: Why do mutations have various and pronounced effects on allosteric regulation by LacI? I think publication of this work should not be conditioned on such a study but again adding it would make the work much stronger.
      3. Lastly a recent study PNAS v.116 pp.11265-74 (2019) explored a library of variants of E. coli Adenylate Kinase and showed the relationship between allosteric effects due to substrate inhibition and stability of the protein. Perhaps a similar relationship can be explored in this case of LacI.
    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The authors use deep mutational scanning to infer the dose-response curves of ~60,000 variants of the LacI repressor and so provide an unprecedentedly systematic dataset of how mutations affect an allosteric protein. Overall this is an interesting dataset that highlights the potential of mutational scanning for rapidly identifying diverse variants of proteins with desired or unexpected activities for synthetic biology/bioengineering. The relatively common inverted phenotypes and their sequence diversity are interesting, as is the identification of several hundred genotypes with non-sigmoidal band-stop dose-response curves and their enrichment in specific protein regions. A weakness of the study is that some of the parameter estimates seem to have high uncertainty, and this is not clearly presented, nor is its impact on the conclusions analysed. A second shortcoming is that there is little mechanistic insight beyond the enrichments of mutations with different effects in different regions of the protein. But as a first overview of the diversity of mutational effects on the dose-response curve of an allosteric protein, this is an important dataset and analysis.

      Comments

      Data quality and reproducibility

      "The flow cytometry results confirmed both the qualitative and quantitative accuracy of the new method (Supplementary Figs. 3-7)"

      • There need to be quantitative measures of accuracy in the text here for the different parameters.
      • Sup fig 7 panels should be main text panels - they are vital for understanding the data quality. In particular, the G0 parameter estimates from the library appear to have a lower bound, i.e. they provide no information below a cytometry G0 of ~10^4. This is an important caveat and needs to be highlighted in the main text. The Hill parameter (n) estimate for wt (dark gray) replicate barcodes is extremely variable - why is this?
      • In general there is not a clear enough presentation of the uncertainty and biases in the parameter estimations, which seem to be rather different for the 4 parameters. Only the EC50 parameter seems to correlate very well with the independent measurements.
      • The genotypes in the mutagenesis library contain a mean of 4.4 aa substitutions and the authors use a neural network to estimate 3 of the Hill equation parameters (with uncertainties) for 1991/2110 of the single aa mutations. It would be useful to have an independent experimental evaluation of the reliability of these inferred single aa mutational effects by performing FACS on a panel of single aa mutants (using single aa mutants in sup fig 3-7, if there are any, or newly constructed mutants).
      • fig3/"Combining multiple substitutions in a single protein almost always has a log-additive effect on EC50." How additive are the other 2 parameters? this analysis should also be presented in fig 3. If they are not as additive is it simply because of lower accuracy of the measurements? If the mutational effects are largely additive, then a simple linear model (rather than the DNN) could be used to estimate the single mutant effects from the multiple mutant genotypes.
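      The "simple linear model" raised in the last point can be sketched as an ordinary least-squares fit in which each variant's shift in ln(EC50) is modelled as the sum of the effects of its substitutions. This is only an illustration of the log-additive idea, not an analysis performed in the manuscript; the function name, substitution labels (`m1`, `m2`, `m3`) and numbers below are all hypothetical.

```python
import numpy as np

def fit_additive_effects(genotypes, log_ec50_shift, substitutions):
    """Least-squares estimate of per-substitution effects on ln(EC50).

    genotypes:      list of sets of substitution labels, one set per variant.
    log_ec50_shift: ln(EC50 of variant / EC50 of wild type) for each variant.
    substitutions:  ordered list of substitution labels to fit.
    """
    # Indicator matrix: rows are variants, columns are substitutions
    X = np.array([[1.0 if s in g else 0.0 for s in substitutions]
                  for g in genotypes])
    y = np.asarray(log_ec50_shift, dtype=float)
    effects, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(substitutions, effects))

# Synthetic, perfectly additive example with hypothetical substitutions
true_effects = {"m1": 0.5, "m2": -1.0, "m3": 2.0}
variants = [{"m1", "m2"}, {"m2", "m3"}, {"m1", "m3"}, {"m1", "m2", "m3"}]
shifts = [sum(true_effects[s] for s in v) for v in variants]
print(fit_additive_effects(variants, shifts, ["m1", "m2", "m3"]))
```

      With real data, the residuals of such a fit would indicate how far the measurements depart from log-additivity, subject to the measurement uncertainty of each parameter.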

      Presentation/clarity of text and figures

      • The main text implies that the DNN is trained to predict 3 parameters of the Hill equation but not the Hill coefficient (n). This should be clarified / justified in the main text.
      • The DNN needs to be better explained and justified in the main text for a general audience. How do simpler additive models perform for phenotypic prediction / parameter inference?
      • Ref 14. analyses a much smaller set of mutants in the same protein but using an explicit biophysical model. It would be helpful to have a more extensive comparison with the approach and conclusions of this previous study.
      • Enrichments need statistical tests to know how unexpected the results are e.g. p5 line 12 "67% of strongly inverted variants have substitutions near the ligand-binding pocket"
      • missing citation: Poelwijk et al 2011 https://www.cell.com/fulltext/S0092-8674(11)00710-0 previously reported an inverted dose-response curve for a lacI mutant.
      • What mechanisms do the authors envisage that could produce the band-stop dose response curves? There is likely previous theoretical work that could be cited here. In general there is little discussion of the biophysical mechanisms that could underlie the various mutational effects.
      • "This result, combined with the wide distribution of residues that affect EC50, suggests that LacI allostery is controlled by a free energy balance with additive contributions from many residues and interactions." 'additive contributions and interactions' covers all possible models of vastly different complexity i.e. this sentence is rather meaningless.
      • fig 4 c and d compress a lot of information into one figure and I found this figure confusing. It may be clearer to have multiple panels with each panel presenting one aspect. It is also not clear to me what the small circular nodes exactly represent, especially when you have one smaller node connected to two polygonal nodes, and why they don't have the same colour scale as the polygonal nodes.
      • line 25 - 'causes a conformational change' -> 'energetic change' (allostery does not always involve conformational change)
      • sup fig 5 legend misses '5'
      • sup fig 7. pls add correlation coefficients to these plots (and move to main text figures).
      • Reference 21 is just a title and pubmed link
      • "fitness per hour" -> growth rate
      • page 6 line 28 - "discoverable only via large-scale landscape measurements" - directed evolution approaches can also discover such genotypes (see e.g. Poelwijk /Tans paper). Please re-phrase.
      • pls define jargon the first time it is used e.g. band-stop and band-pass

      Methods/data availability/ experimental and analysis reproducibility:

      • The way that growth rate is calculated on page 17 equation 1- This section is confusing. Please be explicit about how you accounted for the lag phase, what the lag phase was, and total population growth during this time. In addition, please report the growth curves from the wells of the four plates, the final OD600 of the pooled samples, and exact timings of when the samples were removed from 37 degree incubation in a table. These are critical for calculating growth rate in individual clones downstream.
      • What were the upper and lower bounds of the measurements? (LacI deletion vs Tet deletion / autofluoresence phenotype - true 100% and true 0% activity). Knowing and reporting these bounds will also allow easier comparison between datasets in the future.
      • Please clarify whether there was only 1 biological replicate (because the plates were pooled before sequencing)? Or if there were replicates present an analysis of reproducibility.
      • Please provide supplementary tables of the data (in addition to the raw sequencing files). Both a table summarising the growth rates, inferred parameter values and uncertainties for genotypes and a second table with the barcode sequence counts across timepoints and associated experimental data.

      Significance

      The authors present an unprecedentedly systematic dataset of how mutations affect an allosteric protein. This illustrates the potential of mutational scanning for rapidly identifying diverse variants of allosteric proteins / regulators with desired or unexpected activities for synthetic biology/bioengineering.

      Previous studies have identified inverted dose-response curve phenotypes for lacI https://www.cell.com/fulltext/S0092-8674(11)00710-0 but using directed evolution i.e. they were not comprehensive in nature.

      The audience of this study would be protein engineers, the allostery field, synthetic biologists and the mutation scanning community and evolutionary biologists interested in fitness landscapes.

      My relevant expertise is in deep mutational scanning and genotype-phenotype landscapes, including work on allosteric proteins and computational methods.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors study allostery with a beautiful genotype-phenotype experiment to study the fitness landscape of an allosteric lac repressor protein. The authors make a mutational library using error-prone PCR and measure the impact on antibiotic resistance protein expression at varying levels of the ligand, IPTG. After measuring the impact of mutations, the authors fill in the missing data using a neural net model. This type of dose response is not standard in the field, but the richness of their data and the discovery of the "band pass" phenomenon prove its worth here splendidly.

      Using this mixed experimental/predicted data the authors explore how each mutation alters the different parameters of a Hill equation fit of a dose-response curve. Using higher order mutational space the authors look at how mutations can qualitatively switch phenotypes to inverted or band-stop dose-response curves. To validate and further explore a novel band-stop phenotype, the authors focused on a triple mutant and made all combinations of the 3 mutations. The authors find that only one mutation alone alters the dose-response and that band-stop behavior presents itself only in combination. Overall this paper is a fantastic data-heavy dive into the allosteric fitness landscape of a protein.

      Major

      Overall, the data presented in this paper is thoroughly collected and analyzed making the conclusions well-based. We do not think additional experiments nor substantial changes are needed apart from including basic experimental details and more biophysical rationale/speculation as discussed in further detail below.

      The authors do a genotype-phenotype experiment that requires extensive deep sequencing experiments. However, right now quite a bit of basic statistics on the sequencing is missing. Baseline library quality is somewhat shown in supplementary fig 2 but the figure is hard to interpret. It would be good to have a table that states how many of all possible mutations at different mutation depths (single, double, etc) there are. Similarly, sequencing statistics are missing- it would be useful to know how many reads were acquired and how much sequencing depth that corresponds to. This is particularly important for barcode assignment to phenotype in the long-read sequencing. In addition, a synonymous mutation comparison is mentioned but in my reading that data is not presented in the supplemental figures section.

      The paper is very much written from an "old school" allostery perspective with static end point structures that are mutually exclusive - eg. p5l10 "relative ligand-binding affinity between the two conformations" - however, an ensemble of conformations is likely needed to explain their data. This is especially true for the bandpass and inverted phenotypes they observe. The work by Hilser et al is of particular importance in this area. We would invite the authors to speculate more freely about the molecular origins of their findings.

      Minor

      There are a number of small modifications. In general this paper is very technical and could do with some explanation and discussion of relevance to make the manuscript more approachable for a broader audience.

      P1L23: Ligand binding at one site causes a conformational change that affects the activity of another > not necessarily true - and related to using more "modern" statistical mechanical language for describing allostery.

      P2L20: The core experiment of this paper is a selection using a mutational library. In the main body the authors mention the library was created using mutagenic pcr but leave it at that. More details on what sort of mutagenic pcr was used in the main body would be useful. According to the methods error prone pcr was used. Why use er-pcr vs deep point mutational libraries? Presumably to sample higher order phenotype? Rationale should be included. Were there preliminary experiments that helped calibrate the mutation level?

      P2L20: Baseline library statistics would be great in a table for coverage, diversity, etc especially as this was done by error prone pcr vs a more saturated library generation method. This is present in sup fig2 but it's a bit complicated.

      P2L26: How were FACS gates drawn? This is in support fig17 - should be pointed to here.

      P3L4: Where is the figure/data for the synonymous SNP mutations? This should be in the supplement.

      P3L20: The authors use a deep neural network to predict variants that were not covered in the screen. However, the library was generated by error-prone PCR, meaning there could be multiple mutations resulting in the same amino acid change. The model's performance was determined by looking at withheld data; however, error-prone PCR could result in multiple nonsynonymous mutations of the same amino acid. For testing, were mutations truly withheld or was there overlap, given that several substitutions are represented by different codon combinations? Was the withheld data for the machine learning withholding specific substitutions?

      In addition, higher order protein interactions are complicated and idiosyncratic. I am surprised how well the neural net performs on higher order substitutions.

      P4L4: The authors find mutations at the dimer/tetramer interfaces but don't mention whether polymerization is required. Is dimerization required for DNA binding? Tetramerization?

      P4L8: Substitutions near the dimer interface impact both G0 and EC50, which the authors say is consistent with a change in the allosteric constant. Can the authors explain their thinking more in the paper to make it easier to follow? Are there any mutations in this area that impact G0 or EC50 alone? Why might these specific residues modify dimerization?

      P4L8: The authors discuss the allosteric constant extensively within the paper but do not explain it. It would be helpful to have an explanation of this to improve readability. This explanation should include the statistical mechanical basis of it and some speculation about the ways it manifests biophysically.

      P4L1-16: The authors see mutations in the dimerization region that impact either G0 or Gsaturated in combination with EC50, but not G0 and Gsaturated together. Maybe we do not fully understand the Hill equation, but why are no mutations that impact both G0 and Gsaturated seen in Supp. Fig. 13c? Why would mutations in the same region, potentially impacting dimerization, impact either G0 or Gsaturated? What might be the mechanism behind the divergent responses?
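
      For reference, the form of the Hill equation we have in mind is sketched below. This parameterization is our assumption and may differ from the one used in the manuscript; the function name `hill` and the numerical values are hypothetical. In this form G0 and Gsaturated are independent plateaus, so a substitution could in principle shift one without the other.

```python
def hill(ligand, g0, g_sat, ec50, n):
    """Four-parameter Hill curve (assumed form):
    G(L) = G0 + (Gsat - G0) * L^n / (EC50^n + L^n)."""
    frac = ligand**n / (ec50**n + ligand**n)
    return g0 + (g_sat - g0) * frac


# Hypothetical parameter values, for illustration only.
wild_type = dict(g0=100.0, g_sat=1000.0, ec50=10.0, n=2.0)
leaky = dict(wild_type, g0=300.0)  # only the basal level is changed

for ligand in (0.0, 10.0, 1e9):
    # The two curves differ at low ligand and converge at saturation.
    print(ligand, hill(ligand, **wild_type), hill(ligand, **leaky))
```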

      P4L29: For interpretability, it would be good to explain what a log-additive effect means in the context of allostery.

      P4L34-P5L19: This section is wonderful. Really cool results and interesting structural overlap!

      P5L34: Helix 9 of the protein is mentioned but its functional relevance is not. This is common throughout the paper; it would be useful to have an overview somewhere to help the reader contextualize the results with the known structural roles of these elements.

      P5L39: The authors identified a triple mutant with the band-stop phenotype, then made all combinations of its constituent mutations. Of particular interest is R195H/G265D, which is nearly the same as the triple mutant. It would be nice to show the positions of each of these mutations and have some discussion to begin to rationalize this phenotype, even if only to point out how far apart they are and that there is no easy structural rationale!

      P6L9: There should be more discussion of the significance of this work directly compared to what is known. For instance negative cooperativity is mentioned as an explanation for bi-phasic dose response but this idea is not explained. Why would the relevant free energy changes be more entropic? Another example is the reverse-TetR phenotype observed by Hillen et al.

      P6L28: The authors mention that phenotypes exist with genotypes that are discoverable with genotype-phenotype landscapes. This study, due to the constraints of error-prone PCR, was somewhat limited. How big is the phenotypic landscape? Is it worth doing a more systematic study? What is the optimal experimental design (single mutations, doubles, random), and where is there the most information? How far can you drift before your machine learning model breaks down? How robust would it be to indels?

      Figures:

      Sup figs 3-7: The comparison of library-based results and single mutants is a great example of how to validate genotype-phenotype experiments!

      Supp fig 5.: Missing figure number.

      Supp fig7: G0 appears to have a very poor fit between the library and single-mutant versions. Why might this be? R^2 would likely be better to report here than RMSE, as RMSE is sensitive to the magnitude of the data, such that you cannot directly compare the RMSE of, say, 'n' to that of G0.
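
      The scale dependence can be seen in a small sketch with made-up numbers (not data from the manuscript). R^2 is computed here as the coefficient of determination relative to the identity line, which is one possible choice; the function names are our own.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error; carries the units and scale of the data."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination; unchanged by a common rescaling."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical single-mutant (reference) and library-derived values.
single = [1.0, 2.0, 3.0, 4.0]
library = [1.1, 1.9, 3.2, 3.8]

# The same relative agreement, expressed on a 1000-fold larger scale.
scale = 1000.0
single_big = [scale * v for v in single]
library_big = [scale * v for v in library]

print(rmse(single, library), rmse(single_big, library_big))
print(r_squared(single, library), r_squared(single_big, library_big))
```

      The RMSE grows with the scale factor while R^2 stays the same, which is why R^2 allows a fairer comparison between parameters of very different magnitude.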

      Sup fig13c: It is somewhat surprising that mutations appear to affect only G0 and not Gsaturated. This implies that basal and saturated activity are not coupled. Is this expected? Why or why not?

      Significance

      Allostery is hard to comprehend because it involves many interacting residues propagating information across a protein. The Monod-Wyman-Changeux (MWC) and Koshland, Nemethy, and Filmer (KNF) models have been long-standing frameworks for explaining much of allostery; however, recent formulations have focused on the role of the conformational ensemble and a grounding in statistical mechanics. This manuscript focuses on the functional impact of mutations and therefore the contribution of the amino acids to regulation. The authors' unbiased approach of combining a dose-response curve and mutational library generation let them fit every mutant to a Hill equation. This approach let the authors identify the allosteric phenotype of all measured mutations! The authors found inverted phenotypes, which occur in homologs of this protein, but most interesting is the strange and idiosyncratic 'band-stop' phenotype. The band-stop phenotype is bi-phasic and will hopefully be followed up with further studies to explain the mechanism. This manuscript is a fascinating exploration of the adaptability of allosteric landscapes with just a handful of mutations.

      Genotype-phenotype experiments allow sampling immense mutational space to study complex phenotypes such as allostery. However, a challenge with these experiments is that allostery and other complicated phenomena come from immense fitness landscapes altering different parameters of the Hill equation. The authors' approach of using a simple error-prone PCR library combined with many ligand concentrations allowed them to sample a very large space somewhat sparsely. However, they were able to predict this data by training and using a neural net model. I think this is a clever way to fill in the gaps that are inherent to somewhat sparse sampling from error-prone PCR. The experimental design of the dose response is especially elegant and a great model for how to do these experiments.

      With some small improvements for readability, this manuscript will surely be of broad interest to the genotype-phenotype, protein science, allostery, structural biology, and biophysics fields.

      We were prompted to do this by Review Commons and are posting our submitted review here:

      Willow Coyote-Maestas has relevant expertise in high throughput screening, protein engineering, genotype-phenotype experiments, protein allostery, data mining, and machine learning.

      James Fraser has expertise in structural biology, genotype-phenotype experiments, protein allostery, protein dynamics, protein evolution, etc.

      Referees cross-commenting

      Seems like our biggest issues are: better uncertainty estimates of the parameters and more biophysical/mechanistic explanation/speculation. The uncertainty estimates might be tricky with the deep learning approach. The more biophysical speculation will require some re-writing around an ensemble rather than a static structure perspective.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):**

      This study addresses the role of IL-13 in promoting lung damage following migration of the helminth N. brasiliensis larvae from the circulatory system to the lung. The work clearly shows using IL13-/- mice that Nb-elicited IL-13 immunity at days 2-6 post-infection reduces pathology. The authors demonstrate an association with reduced eosinophils but no effect on neutrophil numbers.

      Proteomic analysis identifies a number of molecules known to be involved in protecting against type 2 pathologies such as relm-a and SP-D.

      The authors then identify a clear requirement for IL-13 in driving relm-a expression.

      Finally, the authors present a whole lung RNA transcript profile which largely supports their proteomic observations.

      Taken together the work presents a sound case for IL-13 being an important player in protecting against initial lung pathology.

      **Major requests:**

      The paper is really very interesting and important. To an extent it questions existing dogma of IL-13 being a driver of lung inflammation.

      Addressing the following could hopefully be achieved using archived samples or with an acceptable amount of extra experimental work.

      Figure 1: D2 and D6 Lung IL-13 concentrations (ELISA) in WT mice would set the scene for the paper's story*

      We agree that showing IL-13 concentrations in the lung would nicely set the stage for the role of IL-13 during __Nippostrongylus__ infection. In the current paper, we showed IL-13 mRNA levels in Figure 3 but in a revised version, we will include D2 and D6 mRNA data in Figure 1. We attempted to quantify IL-13 protein levels in the BAL fluid of infected WT mice on D2 and D6 post-infection. However, IL-13 in the BAL was below the levels of detection for our ELISA assay. Therefore, we would need to measure IL-13 protein in total lung homogenates but we do not have material archived at present. If the editor feels this is a critical piece of data we will perform repeat experiments.

      Figure 2: The authors should add evidence that function/activity of neutrophils/eosinophils is changed/not changed: e.g. granzyme, MBP, EPO release in BAL and/or lung.

      As supported by referee 3, we feel that measuring functional readouts of neutrophils and eosinophils, while interesting, is currently outside the scope of the paper. Further, with respect to eosinophils, we see a major reduction in total eosinophil numbers in IL-13-deficient mice, which would likely result in a reduction in the level of functional molecules such as MBP. Thus, these readouts in the BAL may not be a reliable indicator of cellular function, and the results would be difficult to interpret in light of altered cell numbers.

      Additionally, some data showing changes in epithelial stress related cytokines such as IL-23 and IL-33 would be informative (IHC and /or ELISA).

      The reviewer makes a good suggestion that would complement our proteomics/pathway analysis. As described in our comment below regarding Foxa2 pathways, we do have additional data showing epithelial cell defects in the absence of IL-13 and will add this to a revised manuscript. While we do see a trend for a reduction in IL-33 mRNA in infected IL-13-deficient mice, it is difficult to correlate this with functional protein. If requested, we can perform additional analyses to measure IL-23 and/or IL-33 protein levels in archived BAL fluid or by IHC of lung sections.

      *The following will require a new experiment:

      The authors present a strong case for RELMa being associated with/driven by IL-13 responses. The following I feel would prove that IL-13 driven RELMa is important in reducing lung pathology. Can enhanced lung pathology or cell responses associated with pathology be reduced/altered by dosing Nb infected IL13-/- mice with recombinant relma or by restimulating BAL cells (for example) from IL-13-/- mice. This team is well placed to comment on the potential for such an in vivo experiment to be feasible.

      Or could the authors also test the ability of other candidate molecules to reduce lung pathology? Would, for example, i/n dosing of IL-13-/- mice with AMCase, BRP39 or SP-D protect against pathology? It would be expected to be the case for SP-D.*

      Our previous study has shown that RELM-a plays an important yet highly complex role during lung repair (see Sutherland et al. 2018: https://doi.org/10.1371/journal.ppat.1007423). The suggested experiments would advance our understanding of the function of RELM-a and other effector molecules during type 2 immunity and repair. However, it is unlikely that the impact of IL-13 will be due to a single effector molecule (as supported by Reviewer 3) and thus these types of experiments would shift the focus of the paper from the impact of IL-13 to understanding specific function of type 2 effectors. Since our study deals more broadly with the function of IL-13 rather than the downstream effectors, we hope that this will open up further investigation of these specific molecules for the wider community to take forward.

      *Reviewer #1 (Significance (Required)):

      The manuscript places IL-13 as an important initiator of early protection from acute lung damage. This is important as it is to an extent a non-canonical role for this cytokine. This is also important as IL-13 can be manipulated therapeutically. To maximise potential application of such drugs requires detailed understanding of the various contextual roles of IL-13. This study provides such evidence.

      The authors identify a range of target mediators.

      This is an important body of work that is useful for understanding how acute lung damage can be regulated.

      This work will be of interest to Type 2 immunologists, any researcher with an interest in pulmonary inflammation as well as mucosal immunity.

      I make these suggestions/comments based on my own background in Type 2 immunity, lung inflammation and parasitic helminth infection and immunity.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Chenery et al report that IL-13 plays a critical role in protecting mice from lung damage caused by the infection of a nematode, Nippostrongylus brasiliensis, in WT or IL-13 knockout mice (IL-13 eGFP knock-in mice, Neill et al., Nature 2010). Phenotypically, they demonstrated that IL-13 genetic deficiency resulted in more severe lung injuries and haemorrhaging following the larvae migratory infections. Through the proteomic and transcriptomic profiling, they identified gene-expression changes involved in the cellular stress responses, e.g. up-regulating the expression of epithelial-derived type 2 molecules, controlled by IL-13. They also found that type 2 effector molecules including RELM-alpha and surfactant protein D were compromised in IL-13 knockout mice. Thus, they proposed that IL-13 has tissue-protective functions during lung injury and regulates epithelial cell responses during type 2 immunity in this acute setting. Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge. However, this work could be improved and more impactful by further performing the following suggested experiments.

      Major points:

      1. It may not be accurate to claim that "IL-13 played a critical role in limiting tissue injury ... in the lung following infection" since IL-13 participates in both repelling worms and activating tissue reparative responses. It is very hard to distinguish these two kinds of responses with the current experimental settings because the much higher worm burden led to more direct lung damage in IL-13-/- mice than WT counterparts.*

      The reviewer raises an important point that we will need to clarify in a revised manuscript. Based on several studies, the role of IL-13 in mediating __Nippostrongylus__ expulsion occurs in the small intestine, after the parasites have already cleared the lung tissue. The number of worms in the lung does not differ at the time points we are investigating. We have qRT-PCR data measuring __Nippostrongylus__-specific actin levels, which we and others have previously shown to accurately reflect worm numbers. We can therefore demonstrate that the differences in lung damage do not reflect a difference in the number of larvae in the lungs of IL-13 KO mice compared to WT mice. These data will be incorporated into the manuscript to better clarify this point.

      *2. It would be more informative if the authors could perform the RNA-seq analysis on the IL-13-responsive cell type such as airway epithelial cells (goblet cell) by comparing WT vs IL13-/- in the context of lung damage caused by Nippostrongylus brasiliensis infection.*

      RNA-sequencing of specific cells would indeed be an excellent experiment that would reveal more IL-13-dependent processes in our model. However, this would be a considerable undertaking at this stage (as reviewer 3 has pointed out). Nonetheless, our extended analysis of the Foxa2 pathway as requested below has highlighted a number of genes regulated by IL-13, which are known to be involved in epithelial cell function.

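      *3. Figure 6C, the transcriptional profiling of mouse lungs revealed that the Foxa2 pathway was significantly up-regulated in the IL-13-/- infected mice. This is an important finding because this pathway plays a critical role in the process of alveolarization and inhibiting goblet cell hyperplasia. In order to validate this finding, some components in this pathway could be further examined.*

      We agree with the reviewer that showing additional validation data to support the Foxa2 defect in IL-13-deficient mice would strengthen our paper’s overall message. We have additional qRT-PCR data of IL-13-dependent genes regulated by Foxa2 (__Clca1, Muc5ac, Ccl11, and Foxa3__) that clearly support this epithelial cell-specific defect that we can readily incorporate into the revised paper.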

      *Reviewer #2 (Significance (Required)):

      Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge.

      **Referees cross-commenting**

      To Reviewer #1's Review: fair and constructive

      To Reviewer #3's Review: agree in general.*

      * Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Allen, Sutherland and colleagues utilize IL-13 deficient mice to investigate the function of IL-13 in the early response to lung tissue damage induced by helminth infection. They demonstrate that IL-13 deficiency has significant effects on the acute tissue response to helminth infection (at day 2 and 6 post-infection). Particularly, IL-13 deficiency results in increased lung hemorrhaging, and more pronounced lung tissue damage evidenced by increased gaps in the alveolar architecture. They perform proteomic and transcriptomic profiling of the lungs to determine IL-13-induced pathways and demonstrate many protein and gene expression changes in the absence of IL-13. These include dysregulated collagens, reduced epithelial-derived proteins RELMalpha and surfactant protein D, downregulated pathways related to cellular stress, and increased genes associated with the Foxa2 pathway.

      Overall, the key conclusions are convincing, and the study design, methods and data analysis are clear, rigorous and thorough.

      **Minor Comments:**

      1. The authors concluded that lung epithelial cells are more sensitive to IL-13 than IL-4, but the intranasal injection of both proteins showed a similar induction of RELMα - investigation into this difference would be useful. Alternatively, providing an explanation for these different findings could be helpful.*

      Our suggestion that epithelial cells are likely to be more sensitive to IL-13 was based both on our data and the existing literature. We would agree that we do not have definitive evidence for this. Indeed, because the type 2 receptor can respond to both IL-4 and IL-13 this issue is difficult to easily resolve experimentally. We will expand on this in a revised manuscript, making our explanations clearer whilst acknowledging the alternative explanations.

      *2. Providing data by immunofluorescence or flow cytometry of non-epithelial expression of RELMalpha following intranasal versus intraperitoneal injection of IL-4 versus IL-13 would be useful.*

      This is a good suggestion and we have additional flow cytometry data looking at hematopoietic cell expression of RELM-a from these experiments that we can incorporate into the revised manuscript. We have found that airway macrophages were another source of RELM-a in the lung and mirrored the airway epithelial cell responses to both intraperitoneal and intranasal delivery of IL-4 and IL-13.

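      *3. Discussion of IL-13Ra1 deficient mice would be useful, in particular the study by Karo-Atar and Munitz in Mucosal Immunology 2016, showing that IL13Ra1 is protective against bleomycin-induced pulmonary injury (PMID: 26153764). Comparing their data with the gene expression datasets from this study would be useful (acknowledging the caveat that IL-4 effects through the type 2 receptor would also be abrogated in these IL13Ra1 mice).*

      We agree that a comparison of IL-13Ra1 versus IL-13 deficiency should be included in the discussion of our manuscript. These authors found epithelial-specific defects in IL-13Ra1-deficient mice such as Clca1 (aka Clca3), RELM-a, and chitinase-like proteins even under homeostatic conditions, which is highly consistent with our data. This study also found that IL-13Ra1 deficiency led to increased bleomycin-induced pathology and, together with our data, offers further insight into the IL-13/IL-13Ra1 axis during lung injury. We will add these points to our discussion and will attempt to directly compare their gene expression data set with our data to find more overlapping genes between the two mouse strains and disease models.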

      *4. The authors reference Chen et al. Nature Medicine 2012, but do not discuss the finding in this paper that neither IL-4-/- nor IL13Ra1-/- have increased lung hemorrhage. This might be a mouse strain issue and worthwhile discussing.*

      This is indeed a very important point we will address in a revised discussion. IL-4Ra-deficient mice did show increased bleeding in the Chen et al. study that was not seen in the IL-13Ra1 KO, suggesting IL-4 alone is sufficient to limit bleeding. This is in contrast to our study, where we found increased bleeding in IL-13 KO mice independent of IL-4. However, a major difference between the studies is the background strain of mice used, which was BALB/c in the Chen et al. study versus C57BL/6 mice we used in our study. In addition to differences in IL-4 and IL-13 levels between the strains, we have unpublished observations of major differences in vascular integrity with BALB/c much more prone to bleeding, which is an active area of investigation in the lab. Although we have yet to unravel these differences mechanistically, they could explain differential requirements for IL-4 versus IL-13 to limit bleeding between the two strains.

      *5. Reference 32 and 36 (Sutherland PLoS pathogens) are duplicates*

      Our apologies, we will fix the reference duplication.

      *Reviewer #3 (Significance (Required)):

      This study addresses the specific function of IL-13 in acute helminth infection of the lung, which has not previously been studied, as most studies investigate the combined function of IL-4 and IL-13 through IL-4 receptor KO or Stat6 KO mice.

      It is a thorough, well-conducted and well-organized study with high quality data using 'omics' strategies to profile IL-13-induced genes and proteins. Their data identifies intriguing pathways that are dependent on IL-13, opening new avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Therefore this study provides conceptual and technical advances. Additionally, since agents targeting IL-4 and IL-13 are in clinical trials or employed therapeutically for pulmonary disorders, the findings from these studies are clinically relevant. It would however have been useful to validate some of these pathways and demonstrate epithelial-specific outcomes for IL-13-induced tissue protection.

      Previous studies using IL4RKO have shown that IL-4 and IL-13 are necessary to protect from acute tissue damage in helminth infection (Chen, Nature medicine - referred to by authors). Other studies have investigated IL-13 in fibrosis and granulomatous inflammation (papers referenced by authors, and Ramalingam Nature Immunology 2009). Last, one study shows that IL-13Ra1 signaling is important for protection in bleomycin-induced lung injury, findings using a different transgenic mouse, which are relevant for this study and may be useful to discuss (Karo-Atar, Mucosal Immunology 2016).

      As stated above - the data in this manuscript identify intriguing pathways that are dependent on IL-13, opening new, exciting avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Their data is also unique as it combines proteomics and transcriptomics, and identifies previously unappreciated IL-13 regulated pathways such as cellular stress and Foxa2, which would be interesting to investigate further.

      **Referees cross-commenting**

      To Reviewer 1: The suggested data for Figure 1 (IL-13 concentrations) could be useful, but suggested experiments for Figure 2 could be outside the main focus of the paper.

      For the main suggested experiment: treatment of IL-13-/- with RELMalpha, this could be useful. One caveat is that RELMalpha might not be the only effector molecule downstream of IL-13, so the authors may not get a definitive answer. An alternative (although not as RELMalpha-specific) would be to treat IL13KO mice with FcIL-4 or FcIL-13, the latter of which drives RELMalpha, and look at whether FcIL-13 is more protective than FcIL-4.*

      We agree that rescue experiments could provide insights into the relative protective effects of IL-4 versus IL-13. However, it might be challenging to interpret the results, in part because of the difficulty in establishing physiologically relevant doses and timing and the fact that IL-4 will also signal through the type 2 receptor. These difficulties are reflected in the interpretation of our current data as discussed above (pt. 1 reviewer 3). Although we have found IL-4 and IL-13 delivery experiments valuable and have used them in many of our papers, we have always been cautious in our interpretation, as we typically use these at supraphysiological doses. However, this is an experiment we would consider if the editors felt it essential to the story.

      To Reviewer 2: I agree with points 1 and 3 - especially with point 3, which would give more in-depth understanding into the functional outcomes of the IL-13 -> FoxA2 pathway identified. For point 2, RNA-seq of epithelial cells would be informative, but may be beyond the scope of the project.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study, Allen, Sutherland and colleagues utilize IL-13 deficient mice to investigate the function of IL-13 in the early response to lung tissue damage induced by helminth infection. They demonstrate that IL-13 deficiency has significant effects on the acute tissue response to helminth infection (at day 2 and 6 post-infection). Particularly, IL-13 deficiency results in increased lung hemorrhaging, and more pronounced lung tissue damage evidenced by increased gaps in the alveolar architecture. They perform proteomic and transcriptomic profiling of the lungs to determine IL-13-induced pathways and demonstrate many protein and gene expression changes in the absence of IL-13. These include dysregulated collagens, reduced epithelial-derived proteins RELMalpha and surfactant protein D, downregulated pathways related to cellular stress, and increased genes associated with the Foxa2 pathway.

      Overall, the key conclusions are convincing, and the study design, methods and data analysis are clear, rigorous and thorough.

      Minor Comments:

      1. The authors concluded that lung epithelial cells are more sensitive to IL-13 than IL-4, but the intranasal injection of both proteins showed a similar induction of RELMα - investigation into this difference would be useful. Alternatively, providing an explanation for these different findings could be helpful.
      2. Providing data by immunofluorescence or flow cytometry of non-epithelial expression of RELMalpha following intranasal versus intraperitoneal injection of IL-4 versus IL-13 would be useful.
      3. Discussion of IL-13Ra1 deficient mice would be useful, in particular the study by Karo-Atar and Munitz in Mucosal Immunology 2016, showing that IL13Ra1 is protective against bleomycin-induced pulmonary injury (PMID: 26153764). Comparing their data with the gene expression datasets from this study would be useful (acknowledging the caveat that IL-4 effects through the type 2 receptor would also be abrogated in these IL13Ra1 mice).
      4. The authors reference Chen et al. Nature Medicine 2012, but do not discuss the finding in this paper that neither IL-4-/- nor IL13Ra1-/- have increased lung hemorrhage. This might be a mouse strain issue and worthwhile discussing.
      5. References 32 and 36 (Sutherland, PLoS Pathogens) are duplicates.

      Significance

      This study addresses the specific function of IL-13 in acute helminth infection of the lung, which has not previously been studied, as most studies investigate the combined function of IL-4 and IL-13 through IL-4 receptor KO or Stat6 KO mice.

      It is a thorough, well-conducted and well-organized study with high quality data using 'omics' strategies to profile IL-13-induced genes and proteins. Their data identifies intriguing pathways that are dependent on IL-13, opening new avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Therefore this study provides conceptual and technical advances. Additionally, since agents targeting IL-4 and IL-13 are in clinical trials or employed therapeutically for pulmonary disorders, the findings from these studies are clinically relevant. It would however have been useful to validate some of these pathways and demonstrate epithelial-specific outcomes for IL-13-induced tissue protection.

      Previous studies using IL4RKO have shown that IL-4 and IL-13 are necessary to protect from acute tissue damage in helminth infection (Chen, Nature medicine - referred to by authors). Other studies have investigated IL-13 in fibrosis and granulomatous inflammation (papers referenced by authors, and Ramalingam Nature Immunology 2009). Last, one study shows that IL-13Ra1 signaling is important for protection in bleomycin-induced lung injury, findings using a different transgenic mouse, which are relevant for this study and may be useful to discuss (Karo-Atar, Mucosal Immunology 2016).

      As stated above - the data in this manuscript identify intriguing pathways that are dependent on IL-13, opening new, exciting avenues to explore for IL-13-mediated protective roles in acute lung tissue damage. Their data is also unique as it combines proteomics and transcriptomics, and identifies previously unappreciated IL-13 regulated pathways such as cellular stress and Foxa2, which would be interesting to investigate further.

      Referees cross-commenting

      To Reviewer 1: The suggested data for Figure 1 (IL-13 concentrations) could be useful, but suggested experiments for Figure 2 could be outside the main focus of the paper.

      For the main suggested experiment: treatment of IL-13-/- with RELMalpha, this could be useful. One caveat is that RELMalpha might not be the only effector molecule downstream of IL-13, so the authors may not get a definitive answer. An alternative (although not as RELMalpha-specific) would be to treat IL13KO mice with FcIL-4 or FcIL-13, the latter of which drives RELMalpha, and look at whether FcIL-13 is more protective than FcIL-4.

      To Reviewer 2: I agree with points 1 and 3 - especially with point 3, which would give more in-depth understanding into the functional outcomes of the IL-13 -> FoxA2 pathway identified. For point 2, RNA-seq of epithelial cells would be informative, but may be beyond the scope of the project.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Chenery et al report that IL-13 plays a critical role in protecting mice from lung damage caused by the infection of a nematode, Nippostrongylus brasiliensis, in WT or IL-13 knockout mice (IL-13 eGFP knock-in mice, Neill et al., Nature 2010). Phenotypically, they demonstrated that IL-13 genetic deficiency resulted in more severe lung injuries and haemorrhaging following the larvae migratory infections. Through the proteomic and transcriptomic profiling, they identified gene-expression changes involved in the cellular stress responses, e.g. up-regulating the expression of epithelial-derived type 2 molecules, controlled by IL-13. They also found that type 2 effector molecules including RELM-alpha and surfactant protein D were compromised in IL-13 knockout mice. Thus, they proposed that IL-13 has tissue-protective functions during lung injury and regulates epithelial cell responses during type 2 immunity in this acute setting. Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge. However, this work could be improved and more impactful by further performing the following suggested experiments.

      Major points:

      1. It may not be accurate to claim that "IL-13 played a critical role in limiting tissue injury ... in the lung following infection" since IL-13 participates in both repelling worms and activating tissue reparative responses. It is very hard to distinguish these two kinds of responses with the current experimental settings because the much higher worm burden led to more direct lung damage in IL-13-/- mice than WT counterparts.
      2. It would be more informative if the authors could perform the RNA-seq analysis on the IL-13-responsive cell type such as airway epithelial cells (goblet cells) by comparing WT vs IL13-/- in the context of lung damage caused by Nippostrongylus brasiliensis infection.
      3. Figure 6C, the transcriptional profiling of mouse lungs revealed that the Foxa2 pathway was significantly up-regulated in the IL-13-/- infected mice. This is an important finding because this pathway plays a critical role in the process of alveolarization and inhibiting goblet cell hyperplasia. In order to validate this finding, some components in this pathway could be further examined.

      Significance

      Overall, the manuscript was clearly written and a number of findings were interesting and expected compared to the published knowledge.

      Referees cross-commenting

      To Reviewer #1's Review: fair and constructive

      To Reviewer #3's Review: agree in general.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This study addresses the role of IL-13 in promoting lung damage following migration of the helminth N. brasiliensis larvae from the circulatory system to the lung. The work clearly shows using IL13-/- mice that Nb-elicited IL-13 immunity at days 2-6 post-infection reduces pathology. The authors demonstrate an association with reduced eosinophils but no effect on neutrophil numbers.

      Proteomic analysis identifies a number of molecules known to be involved in protecting against type 2 pathologies such as relm-a and SP-D.

      The authors then identify a clear requirement for IL-13 in driving relm-a expression.

      Finally, the authors present a whole lung RNA transcript profile which largely supports their proteomic observations.

      Taken together the work presents a sound case for IL-13 being an important player in protecting against initial lung pathology.

      Major requests:

      The paper is really very interesting and important. To an extent it questions existing dogma of IL-13 being a driver of lung inflammation.

      Addressing the following could hopefully be achieved using archived samples or with an acceptable amount of extra experimental work.

      Figure 1: D2 and D6 Lung IL-13 concentrations (ELISA) in WT mice would set the scene for the paper's story.

      Figure 2: The authors should add evidence that function/activity of neutrophils/eosinophils is changed/not changed: e.g. granzyme, MBP, EPO release in BAL and/or lung. Additionally, some data showing changes in epithelial stress related cytokines such as IL-23 and IL-33 would be informative (IHC and/or ELISA).

      The following will require a new experiment:

      The authors present a strong case for RELMa being associated with/driven by IL-13 responses. I feel the following would prove that IL-13 driven RELMa is important in reducing lung pathology. Can enhanced lung pathology or cell responses associated with pathology be reduced/altered by dosing Nb infected IL13-/- mice with recombinant RELMa or by restimulating BAL cells (for example) from IL-13-/- mice? This team is well placed to comment on the potential for such an in vivo experiment to be feasible.

      Or could the authors also test the ability for other candidate molecules to reduce lung pathology? Would for example i/n dosing of IL-13-/- mice with AMCase, BRP39 or SP-D protect against pathology? It would be expected to be the case for SP-D.

      Significance

      The manuscript places IL-13 as an important initiator of early protection from acute lung damage. This is important as it is to an extent a non-canonical role for this cytokine. This is also important as IL-13 can be manipulated therapeutically. To maximise potential application of such drugs requires detailed understanding of the various contextual roles of IL-13. This study provides such evidence.

      The authors identify a range of target mediators.

      This is an important body of work that is useful for understanding how acute lung damage can be regulated.

      This work will be of interest to Type 2 immunologists, any researcher with an interest in pulmonary inflammation as well as mucosal immunity.

      I make these suggestions/comments based on my own background in Type 2 immunity, lung inflammation and parasitic helminth infection and immunity.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for carefully reading our manuscript. We found their comments to be incredibly thoughtful and constructive and greatly appreciate their feedback. We are confident that addressing the reviewers’ concerns will strengthen our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this manuscript entitled 'Combinatorial patterns of graded RhoA activation and uniform F-actin depletion promote tissue curvature' by Denk-Lobnig et al. the authors study the organisation of junctional F-actin during the process of mesoderm invagination during gastrulation in the model Drosophila. Following on from previous work by the same lab that identified and analysed a multicellular myosin II gradient across the mesoderm important for apical constriction and tissue bending, the authors now turn their attention to actin. Using imaging of live and fixed samples, they identify a patterning of F-actin intensity/density at apical junctions that they show is dynamically changing going into mesoderm invagination and is set up by the upstream transcription factors driving this process, Twist and Snail. They go on to show, using genetic perturbations, that both actin and the previously described myosin gradient are downstream of regulation and activation by RhoA, that in turn is controlled by a balance of RhoGEF2 activation and RhoGAP C-GAP inactivation. The authors conclude that the intricate expression patterns of all involved players, that all slightly vary from one another, is what drives the wild-type distinctive cell shape changes in particular rows of cells of the presumptive mesoderm and surrounding epidermis.

      This is a very interesting study analysing complex and large-scale cell and tissue shape changes in the early embryo. Much has been learned over the last decade and more about many of the molecular players and their particular behaviours that drive the process, but how all upstream regulators work together to achieve coordinated tissue-scale behaviours is still not very well understood, and this study adds important insights into this.

      The experiments seem well executed and support the conclusion drawn, but I have a few comments and questions that I feel the authors should address to strengthen their argument.

      We thank the reviewer for their interest in the paper and their helpful comments.

      **General points:**

      1. The authors state early on that they chose to focus on junctional rather than apical medial F-actin, but it is unclear to me really what the rationale behind that is. In much of the authors' earlier work, they study the very dynamic behaviour of the apical-medial actomyosin that drives the apical cell area reduction in mesodermal cells required for folding. They have previously analysed F-actin in the constricting cells, but have only focused on the most constricting central cell rows (Coravos, J. S., & Martin, A. C. (2016). Developmental Cell, 1-14). The role of junctional F-actin compared to the apical-medial network on which the myosin works to drive constriction is much less clear: it could stabilize overall cell shape or modulate physical malleability or compliance of cells, or it could more actively be involved in implementing the 'ratchet' that needs to engage to stabilise a shrunken apical surface. I would appreciate more explanation or guidance on why the authors chose not to investigate apical-medial F-actin across the whole mesoderm and adjacent ectoderm, but rather focused on junctional F-actin, especially explaining better throughout what they think the role of the junctional F-actin they measure is.

      We focused on the junctional/lateral F-actin pool because this is where tissue-wide patterns in intensity variation are observed, especially when looking across the mesoderm-ectoderm boundary. Indeed, when we compare the apical-medial F-actin of marginal mesoderm cells to ectoderm cells in cross sections, we find no apparent difference, whereas there is a striking difference in junctional/lateral F-actin density (Fig. 1B, C; Supplemental Fig. 1A, D). We provide some preliminary en face views of the medial-apical surface in our response to Point 2, and we will obtain higher resolution images from live and fixed embryos to better show the network organization. We agree with the reviewer that this requires added justification. Therefore, we will: 1) Provide higher resolution images of apical-medial F-actin comparing different regions of mesoderm and ectoderm, and 2) revise the text to better justify why we chose junctional/lateral F-actin to focus our tissue-level analysis and to elaborate more on what we think the role of junctional/lateral F-actin in this process may be.

      Comparing the F-actin labeling in the above previous paper to the stainings/live images shown here, they look quite different. This is most likely due to the authors here not showing the whole apical area but focusing on junctional, i.e. below the most apical region. It is not completely clear to me from the paper at what level along the apical-basal axis the authors are analysing the junctional F-actin. Supplemental Figure 2 seems to suggest about half-way down the cell, which would be below junctional levels. Could the authors indicate this more clearly, please? Overall, I would appreciate if the authors could supply some more high-resolution images of F-actin from fixed samples (which I assume will give the better resolution) of how F-actin actually looks in the different cells with differing levels. Is there for instance a visible change to F-actin organisation? And could this help explain the observed changes in intensity and their function?

      We apologize for the confusion; we were referring to ‘junctions’ as the lateral contacts between cells, as opposed to the adherens junctions at the apical surface. We have modified the text to use the term ‘lateral’ rather than ‘junctional’ F-actin, so as to avoid this confusion. The difference in cortical F-actin staining is not restricted to a particular apical-basal position, but extends along the length of the lateral domain (Fig. 1B, C). As far as we can tell the actin is bundled and underlies the entire cell circumference. We will: 1) better define the apical-basal position within the cell that we are showing, and 2) show high-resolution en face images of F-actin at different apical-basal positions, across different cell positions, in live and fixed embryos to better justify our focus on lateral F-actin (similar orientation, but higher resolution/quality than preliminary live data below).

      Along the same lines of thought as in point 2): Dehapiot et al. (Dehapiot, B., ... & Lecuit, T. (2020). Assembly of a persistent apical actin network by the formin Frl/Fmnl tunes epithelial cell deformability. Nature Cell Biology, 1-21) have recently shown for the process of germband extension and amnioserosa contraction that two pools of F-actin can be observed, a persistent pool not dependent on Rho[GTP] and a Rho-[GTP] dependent one. Could the authors comment on what they think might occur in the mesoderm, are similar pools present here as well?

      1. As the authors state themselves, Rho does not only affect actin via diaphanous, but of course also myosin via Rock. So it would be good to reflect this more in the interpretation and discussion of data, as the causal timeline could be complex.

      We thank the reviewer for reminding us to address this point and to discuss this excellent recent paper. We have not observed a persistent medial actin network in mesoderm cells or ectoderm cells at this stage (i.e. prior to germband extension). It was previously shown in mesoderm cells that pulsed myosin contractions condense the medio-apical F-actin network, but that this is often followed by F-actin network remodeling and that total F-actin levels decrease during apical constriction (Mason et al., 2013). Furthermore, Rho-kinase inhibition in mesoderm cells significantly disrupts this network, but does not inhibit the rapid assembly and disassembly of apical F-actin cables, which could reflect elevated actin turnover (Mason et al., 2013; Jodoin et al., 2015). To address the reviewer’s points, we 1) now include a paragraph in the Discussion to discuss the Dehapiot et al. paper (Comment 3) and how various pools of F-actin and Rock/myosin may shape the tissue (Comment 4) (lines 404-408), and 2) will image the apical surface of mesoderm and ectoderm at this stage and also during germband extension (as a positive control) in order to determine whether there is a persistent network.

      **More specific comments to experiments and figures:**

      1. Reduction of junction function by alpha-catenin-RNAi: how strong is the reduction in catenin? Could they label a-catenin in fixed embryos? The authors conclude the original pre-constriction patterning of F-actin intensity is not dependent on intact junctions, but they show that the increase in F-actin in the mesodermal cells concomitant with apical constriction is in fact impaired in the RNAi. Thus, the authors can also not conclude whether the continued accumulation of myosin and its persistence depend on intact junctions. The initial set-up of the myosin gradient in terms of intensity distribution is unaffected, but clearly dynamics, subcellular pattern, interconnectivity etc. of myosin are affected and thus may well depend on some mechanical feed-back. I find this section of the manuscript slightly overstated and feel the conclusion should be more cautious.

      We thank the reviewer for pointing this out; we completely agree that we should have been more precise with our language. Our main conclusion was that myosin accumulation in a gradient does not require ‘sustained mechanical connectivity’. We felt it was important, given our model of transcriptional patterning, to show that some patterns did not result from mechanics or even apical constriction. Alpha-catenin knock-down provides the cleanest and most severe disruption of adhesion that we can accomplish at this developmental stage. We showed that alpha-catenin-RNAi resulted in: a) almost no intercellular connectivity in myosin structures (Yevick et al., 2019), and b) no apical constriction (this study, Fig. 3B).

      We: 1) revised the text in this section, clarifying that we are only referring to the gradient and that other myosin properties clearly do depend on mechanics, 2) will include data better showing the extent of the alpha-catenin knockdown and its effects on junctions and actomyosin.

      Figure 1 versus Figure 2: Why do the Utrophin-ABD virtual cross-sections look so fuzzy and bad in comparison to phalloidin labelled F-actin in the virtual cross-section in Fig. 1B and C? The labelling shown in 2B and D does not even look very junctional...

      We apologize that we did not explain the difference in visualization methods more clearly. For live images (Figure 2), we used a projection of cross-sections, which includes 20 µm length along the anterior-posterior (AP) axis. This projection method is less dependent on the specific AP position of the cross-section and the specific cells being shown. We did this because the projection helps to visualize the tissue pattern in live images where fluorescence images are noisier than fixed images, which exhibit cleaner labeling (Fig. 1). To address this point, we plan to: 1) Edit the text to make the method of visualization clearer, and 2) fix snail and twist mutant embryos and also provide thin cross-sections analogous to Fig. 1.

      Figure 5 C and D: the control gradients for myosin shown in C and D are completely different, for C the half-way height cell row is deduced as 5 whereas the (in theory identical) control measure in D has row 3 at halfway height! Why is this? Putting all curves together in the same panel would suggest that that C control curve is very similar to RhoGEF2-OE! This can't be right.

      The reason for the different width of the gradients in these controls is the Sqh::GFP copy number. All of our experiments perturbing Rho were carefully controlled so as to ensure the same copy number of the fluorescent marker that we were visualizing. For technical reasons, we were only able to get 1 copy of the Sqh::GFP into the RhoGEF2-OE background. Having two copies of the Sqh::GFP appears to have a slightly activating effect; in fact, the reviewer might have noticed that ventral furrows with 2 copies Sqh::GFP (and a wider gradient) have lower curvature, consistent with our main conclusion (Fig. 7 C). The effects of fluorescently tagged markers were a concern for us and so we were careful to show that the effects of changing RhoA activity on tissue curvature occur regardless of the fluorescent marker (i.e., Sqh::GFP or Utr::GFP, Fig. 7 and Sup. Fig. 7).

      Still in Figure 5: Panels C and D again, but for apical area: are the control and C-GAP-RNAi or RhoGEF2-OE curves significantly different? What statistics were used on this?

      We thank the reviewer for this point. We did not include statistical comparisons of the gradient width originally, because we felt that it does not completely capture the difference between the two curves and that presenting the curves instead lets readers examine the intricacies of the data as a whole. However, to address the reviewer’s point, we will add statistical comparisons for apical area as well as myosin and actin patterns.

      Supplemental Figure 1: Panels in D: I appreciate this control, but would really also like to see the same control at a stage when folding has commenced and stretched cells are present at the margin of the mesoderm. How homogenous does the GAP43 label look in those?

      We will add a more apical projection (with quantification) of this embryo, in which folding has already commenced, to the revised manuscript, so its stage is clearer.

      Supplemental Figure 5: Panel 5 B: the authors conclude that the myosin gradient under RhoGEF2 RNAi is not smaller, but looking at the curves it in fact looks wider. They also mention that the overall level of myosin in this condition is lower than the control...

      We will include quantification of absolute levels in Supplemental Figure 5 to compare overall levels. We will also statistically compare RhoGEF2 RNAi and control gradients and update our conclusions accordingly.

      Following on from the above, a comment on Figure 7: - The authors use RhoGEF2 RNAi stating that it affects the actin pattern, but the myosin pattern also seems affected. In line 318 the authors state that they use this condition to look at how junctional actin density affects curvature. I find this phrase misleading as it might lead the readers to think that RhoGEF2 RNAi only affects junctional F-actin, although it also affects myosin patterning.

      We thank the reviewer for catching this, that’s a good point. We have revised the text in lines 317-326 to more accurately describe the effect of RhoGEF2-RNAi on actin and myosin patterning, and to connect this to the effect on curvature.

      • Line 311: confusingly, the authors state that an increase in the actomyosin gradient affects curvature. But it is only the myosin gradient that is increased, while the junctional actin gradient is flatter than the control in both C-GAP RNAi and RhoGEF2 OE (the distinction is even made by authors line 243). Could this be clarified?

      We thank the reviewer for pointing out this imprecision on our part and have revised Line 311 to more precisely describe the individual effects on myosin and F-actin pattern changes upon RhoA perturbation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Mesoderm invagination during Drosophila gastrulation has been a paradigm for how regionally restricted gene expression locally activates Rho signalling and for how subsequently activated acto-myosin drives cell shape changes which in turn lead to a change in tissue morphology. Despite the numerous studies on this subject and a good understanding of the overall process, several important aspects have remained elusive so far. Among these is the dynamics of cortical and junctional F-actin and its contribution to the shape changes of cells and tissue. Previous studies have focused on MyoII, the "active" component of the actin cytoskeleton. The dynamics of the "passive" counterpart, namely actin filaments, has been neglected, although it is clear that Rho signalling controls both branches.

      We thank the reviewer for the tough questions. The reviewer raises important points that, even if not all feasible to address experimentally, can be addressed by being more precise with our language and conclusions.

      1. Although I clearly acknowledge the efforts taken by the authors to define a function of cortical (junctional) F-actin in apical constriction and furrow formation, several central aspects of the study are not sufficiently resolved and conclusive. Rho signalling controls MyoII via Rok and F-actin via formin/dia, among other less defined targets. The role of MyoII and cortical contraction could be conclusively sorted out, since inhibition of Rok affects the MyoII branch but not the other branches. A similar approach, i.e. a specific inhibition/depletion without affecting the other branch, has not been taken yet for the F-actin branch. The authors have not resolved this issue. When analysing the mutants, the authors cannot distinguish the effect of Rho signalling on the MyoII and F-actin branch. For this reason the changes in F-actin distribution in the mutants are linked to changes in Myo activity and thus a function cannot be assigned to F-actin. In order to derive a specific role of F-actin distribution for furrow formation, the authors need to find experimental ways to affect F-actin levels without affecting MyoII, for example by analysing mutants for dia or other formins.

      The reviewer’s assertion that Rok and Diaphanous only affect myosin and actin, respectively, is oversimplified. For example, in mammals, Rok regulates the Lim-Kinase/Cofilin pathway and thus F-actin (Geneste et al., JCB, 2002). The ‘F-actin branch’ of the RhoA pathway has been examined in multiple previous studies of mesoderm invagination (Fox and Peifer, 2007; Homem et al., 2008; Mason et al., 2013). We did not include diaphanous mutants in this tissue-level study because diaphanous mutants and actin drugs: a) affect RhoA signaling (Munjal et al., 2015; Coravos et al., 2016; Michaux et al., 2018), b) disrupt adherens junctions and tissue integrity (Homem et al, 2008; Mason et al., 2013), and c) have a preponderance of cellularization defects (Afshar et al., 2000). However, we agree with the reviewer that this could potentially be interesting, and so we 1) will look at the tissue-level pattern in Diaphanous-depleted embryos, 2) will analyze tissue-level actomyosin patterns in Rok inhibitor-injected embryos, and 3) have added a section to the Discussion (lines 418-432) explaining past work in this area and why we did not provide data on diaphanous mutants. A caveat of the proposed experiments is that actin and myosin ‘branches’ may be too interconnected to be conclusively separated.

      The authors employ a discontinuous spatial axis by the cell number. Although there are good arguments to understand and treat the cells as units, there are also good arguments for using a scale with absolute distance. I have doubts that the graded distributions presented by the authors are a result of this scaling with cell units. When looking at panel B of Fig 1 or Fig. 2A,B, for example, a sharp step-like distribution is visible at the boundary between mesoderm and ectoderm anlage. In contrast, the F-actin intensity distribution is graded after quantification. The graded distribution appears not to be a consequence of averaging because an even sharper step is very obvious in a projection along the embryonic axis as shown in panel B and D of Fig. 2, for example. The difference between a sharp step in the images and a graded distribution after quantification with a spatial axis in cell number is obvious for a-catenin in Fig. 3D and Rho signalling in Fig. 4 B. As the authors base their central conclusion (see headline) on the graded distribution, resolving the issue of spatial scale is a prerequisite of publication.

      We thank the reviewer for their point. It is an excellent idea and we have included representative plots with a continuous spatial scale in addition to our cell-based analysis (see below, each trace is average line intensity for 1 embryo). The spatially resolved analysis shows similar patterns for F-actin, myosin and RhoA pathway components as the cell-based metric and we plan to include this data as Supplemental Fig. 3 and 4 in a revised version of the manuscript.

      The authors put the spatial distribution of Rho signalling and F-actin into the center of their conclusion. They do so by affecting the pattern with mutants in twist/snail and varying upstream factors of Rho signalling. With respect to myo activation this has been done previously, although possibly with less detail, and it is no new insight that the width of the mesoderm anlage and corresponding Rho signalling domain has a consequence on the shape of the groove and furrow. To maintain the conclusion of the manuscript that spatially graded Rho signalling contributes to tissue curvature, more convincing ways to change the pattern of Rho signalling are needed. Changing the balance of GEF and GAP shows the importance of Rho signalling and possibly signalling levels but not the contribution of its spatial distribution.

      A strength of our study was that we were able to stably ‘tune’ Rho signaling pattern and then follow tissue shape at later stages to determine the connection between the two. We respectfully disagree with the statement that, “with respect to myosin activation this has been done previously”. In past work, we expanded myosin activation by modifying embryonic cell fate, including changes in dorsal cell fates (Heer et al. 2017; Chanet et al., 2017). Here, we directly manipulate RhoA signaling, maintaining the width of the mesoderm anlage (see images below).

      A central conclusion of our study is that RhoA activation level determines the width of myosin activation within a normally sized mesoderm anlage, which has not been done before. The genetic approach presented in the paper was the best way we found to manipulate the spatial pattern of myosin/actin in a stable manner that lasts through invagination. It is worth noting that this approach allowed us to carefully ‘tune’ the level of RhoA activation so as to avoid elevating RhoA levels to the point that it disrupts signaling polarity within the cell (Mason et al., 2016). In our hands, optogenetic manipulation of RhoA, which requires continuous optical input, was less robust because: a) 2D tissue flow precludes delivering a consistent level of activation to given cells over the time course of invagination, b) tissue folding (i.e. 3D deformation) dramatically alters how much light is delivered to the mesoderm cells.

      To address the reviewer’s point, we: 1) edited the Discussion to explicitly state that we did not alter the pattern of RhoA activation without altering RhoA signaling levels (lines 339-343), 2) plan to include Snail or Twist stainings showing that the width of the mesoderm anlage is not altered by changes in RhoA signaling so there is no confusion about this point, and 3) plan to include a mechanical model that compares how altering signaling levels vs. altering the spatial distribution of signaling affect fold curvature, respectively.

      Reviewer #2 (Significance (Required)):

      The question of a contribution of F-actin is addressed in this manuscript. The authors quantify F-actin in fixed and living embryos at two prominent steps in ventral furrow formation, (1) shortly prior to onset of apical constrictions and (2) when the groove has formed. They distinguish junctional and "medial" cortical F-actin. They employ a discontinuous spatial axis, the number of cells away from the ventral midline but not an absolute scale (see my notes below). The measurements are applied to wild type and mutant embryos affecting the transcriptional patterning (twist, snail), adherens junctions, and Rho signalling. The authors claim to reveal by their measurements a graded distribution of F-actin intensities with a peak at the ventral midline and a second peak at the boundary between mesoderm and ectoderm with a low point in the stretching cells of the mesectoderm. The authors further claim to reveal a graded distribution of Rho signalling components within the mesoderm anlage. Based on these data the authors conclude that graded Rho signalling and depletion of F-actin promote tissue curvature.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Previous work has shown that mesoderm invagination at the ventral midline of the Drosophila embryo requires precise spatial regulation of actomyosin levels in order to fold the tissue. In this work, Denk-Lobnig and colleagues further investigate the spatial distribution of myosin and F-actin in the mesoderm and how these patterns are established. The authors identify an F-actin pattern at the apical cell junctions that emerges upon folding, with elevated levels in the cells around the ventral midline, a decrease in junctional F-actin in the marginal mesoderm, and then an increase at the mesoderm-ectoderm border. They identify Snail and Twist as regulating different aspects of establishing this F-actin pattern. Additionally, by modulating RhoA activity (downstream of Twist) the authors are able to alter the width of the actomyosin pattern without affecting the width of the mesoderm tissue, which in turn affects the curvature of the tissue fold and the post-fold lumen size.

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved. I suggest to address the following points.

      We thank the reviewer for their interest in our work and their important suggestions.

      MAJOR

      1. Line 127 and Figure 1E: The authors argue that there is an anticorrelation between F-actin distribution and cell areas. However, an R-squared value of 0.1083 rather suggests little-to-no correlation. The authors should evaluate the statistical significance of that correlation.

      To indicate whether the relationship between F-actin distribution and cell areas is significant, we will report the p-value for the F-test for overall significance for our regression analysis, as well as sample size, of this data in the revised manuscript. The F-statistic for this analysis is F = 89.2, p-value = 4.7e-20.
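      A low R-squared and a very small p-value are not contradictory, because the F-statistic of a simple linear regression grows with sample size. The sketch below illustrates this relation, assuming a regression with a single predictor; the function name and the sample sizes are hypothetical and are not taken from the manuscript.

```python
def f_statistic_from_r2(r_squared: float, n_samples: int) -> float:
    """F-statistic for a simple linear regression (one predictor).

    F = R^2 / (1 - R^2) * (n - 2), with 1 and n - 2 degrees of freedom.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if n_samples < 3:
        raise ValueError("need at least 3 samples")
    return r_squared / (1.0 - r_squared) * (n_samples - 2)


# R^2 is the value quoted by the reviewer; the sample sizes are hypothetical.
for n in (20, 100, 500):
    print(n, round(f_statistic_from_r2(0.1083, n), 2))
```

      For a fixed R-squared, F increases linearly with n - 2, so a weak but real trend measured over many cells can still be highly significant.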

      Figure 5: claims that the width of the actomyosin gradient is affected by the various perturbations should be supported with statistical analysis. For example, the half-maximal gradient position for each individual myosin trace could be calculated (instead of using the mean trace), displayed using a box plot, and tested for significance using the Mann-Whitney U test, as in Figure 7. This is slightly complicated by the fact that the control group in Figure 5C is the same as the control group in Figure 3E, which needs to be carefully considered. Also, similar calculations should be made for the F-actin data in Fig 5E-G since throughout the rest of the paper, the authors refer to the width of the "actomyosin gradient" which implicates both myosin and actin.

      We thank the reviewer for this point. We will include statistical comparisons for myosin gradients in the revised manuscript. To allow for multiple comparisons using the same control, we plan to use Kruskal-Wallis testing, which is analogous to one-way ANOVA for non-parametric data, and a post-hoc test to determine which pairs have significantly different distributions.

      We will update the language in the manuscript to distinguish between actin and myosin patterns. As our main conclusion is that F-actin depletion levels are changed by RhoA in marginal mesoderm cells, we will statistically compare this between groups.
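      The analysis discussed in this response could be sketched roughly as follows: compute a half-maximal position for each individual trace, then compare the groups with a Kruskal-Wallis test. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names and example traces are hypothetical, the p-value uses the chi-square approximation and is only valid here for exactly three groups (2 degrees of freedom), and the post-hoc step is omitted.

```python
import math


def half_max_position(trace):
    """First (interpolated) cell row where a trace falls to half its maximum.

    trace[i] is the intensity at cell row i counted from the ventral midline.
    """
    peak = max(trace)
    half = peak / 2.0
    start = trace.index(peak)
    for i in range(start, len(trace) - 1):
        a, b = trace[i], trace[i + 1]
        if a >= half > b:
            return i + (a - half) / (a - b)  # linear interpolation between rows
    return float(len(trace) - 1)  # trace never drops below half-maximum


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks and a tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                                # size of this tie block
        rank_of[pooled[i]] = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    correction = 1.0 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2.0)


# Hypothetical normalized traces (one list per embryo) for three conditions.
conditions = {
    "control": [[1.0, 0.9, 0.6, 0.3, 0.1], [1.0, 0.8, 0.5, 0.2, 0.1]],
    "perturbation_a": [[1.0, 1.0, 0.9, 0.6, 0.2], [1.0, 0.9, 0.8, 0.4, 0.1]],
    "perturbation_b": [[1.0, 0.6, 0.3, 0.1, 0.1], [1.0, 0.4, 0.2, 0.1, 0.1]],
}
widths = [[half_max_position(t) for t in traces] for traces in conditions.values()]
h_stat = kruskal_wallis_h(widths)
print(h_stat, p_value_three_groups(h_stat))
```

      Working on per-embryo half-maximal positions rather than on the mean trace gives each group a distribution that a rank-based test can compare. With the small groups used in this toy example the chi-square approximation is rough, so the printed p-value is only indicative.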

      Line 142 and Figure 2B-C: I was confused by the description of the snail phenotype:

      • a. the claim that in snail mutants actin levels are uniform: based on Figure 2C, I'd say that F-actin levels decrease across the mesoderm moving away from the ventral midline, and that the main issue is with the accumulation of actin in the distal end of the mesoderm. The authors should better justify the claim that F-actin levels are uniform in snail mutants (or remove it). Maybe comparing F-actin levels in the first four or five rows of the mesoderm?

      • b. how about the increase of F-actin in the distal mesoderm, just adjacent to the ectoderm boundary? Why is it gone in snail mutants?

      1. We agree that the intensity in all embryos appears to decrease on the sides of the embryos when imaged in this way, but it is also clear that there is no abrupt increase in F-actin density going into the ectoderm. In our experience, the edge effect is due to the distance of the side of the embryo from the coverslip rather than actual lower F-actin density. This is suggested by: a) the fact that all snail mutant embryos peak at the center of the image even though they are not all oriented with the ventral side perfectly on top, and b) all embryos exhibit an intensity decrease within the ectoderm toward the edges of the image that are further away from the coverslip (Fig. 2 C, E, F). We will: 1) modify the text to include an explanation, and 2) fix and stain snail and twist mutant cross-sections that will not exhibit this effect of imaging depth, for comparison.
      2. We show in Figure S1C that in wild-type, F-actin does not actually increase in cells at the ectoderm boundary, but merely decreases in lateral mesoderm cells. Thus, it is likely that snail mutant embryos are merely lacking patterning in the mesoderm, where snail is active.

      With alpha-catenin-RNAi, F-actin depletion across the mesoderm still occurs, but junctional F-actin levels are not increased around the midline. While some explanations are offered in the text, the reason for this phenotype seems important for the story. The text in lines 204-205 suggests that F-actin that would normally be localized to the apical junctions is instead being drawn into medioapical actomyosin foci. Is this idea supported by evidence that medioapical F-actin in control embryos is lower than in alpha-catenin RNAi?

      We appreciate the reviewer’s suggestion to explain this more thoroughly. We find that in alpha-catenin-RNAi and even arm (β-catenin) mutant embryos, junctional complexes (i.e., E-cadherin) are drawn into the myosin spot through continuous contractile flow (see below and Martin et al., 2010 for arm). To make this clear in the manuscript, we plan to: 1) include data showing the effects of alpha-catenin RNAi on F-actin and E-cadherin localization in fixed embryos, which is now included in Supplemental Figure S3, and 2) include live imaging of UtrGFP-labeled alpha-catenin RNAi embryos.

      Figure 6A: there is a correlation between cell position and the productivity of myosin pulses, which the authors attribute to the RhoA gradient. This should be more definitively demonstrated by:

      • a. Plot and calculate the correlation between RhoA levels (measured with the RhoA probe) and the change in cell area caused by a contraction pulse. Is this a significant correlation?

      • b. How does myosin persistence change when RhoA is manipulated, e.g. in RhoA overexpressing embryos or in RhoA RNAi?

      It has already been shown that there is a correlation between myosin amplitude and apical constriction amplitude (Xie et al., 2015). Apical myosin and Rho-kinase localization depends entirely on RhoA activity (Mason et al., 2016) and Rho-kinase co-localizes precisely with myosin in both space and time (Vasquez et al., 2014). Changing levels of the RhoA regulator C-GAP has been shown to affect myosin persistence and the productivity of apical constriction, with higher C-GAP causing less productive constriction (Mason et al., 2016). We plan to update the text to connect the observation with what has been shown in previous studies and to make statements regarding causality on the tissue-level more cautious. However, our observation further shows how cytoskeletal activity is patterned across the mesoderm, so we think it has value and that it should be included in this paper. An in-depth study of the connection between RhoA regulators and myosin persistence/pulsing is beyond the scope of the present study, especially considering possible COVID-19 restrictions. Making these connections will require substantial effort in the future.

      MINOR

      1. The authors should indicate if the myosin shown in Figure 1A is junctional or medioapical. If it is junctional, does medioapical myosin better match junctional F-actin and cell areas? Similarly, if they are showing medioapical myosin, how does junctional myosin compare to junctional actin? It seems to me that consistently comparing the patterns of junctional F-actin and medioapical myosin (and RhoGEF2, RhoA, and ROCK in Figure 4) could be somewhat misleading, as the pools compared localize in different subcellular compartments.

      The myosin images shown throughout the paper are medioapical myosin. Junctional myosin in mesoderm cells is lower in intensity and cannot easily be seen by live imaging. We agree that it is important for the reader to see all pools of these proteins. Therefore, we will include in a supplemental figure high resolution images of actin and myosin at both apical and subapical positions for midline mesoderm, marginal mesoderm, and ectoderm cells at the time of folding. We will also justify why the analyzed pools were chosen, respectively.

      Most of the intensity traces for myosin and F-actin are presented as normalized intensity, relative to the highest intensity in the trace. However, there are claims throughout the text about the relative levels of myosin (ex. Line 241) and F-actin (conclusions based on Fig. 2B-D) that should be supported by quantification. It seems that changes in intensity for both F-actin and myosin, in addition to shape of the gradient, would contribute to the understanding of actomyosin regulation in this tissue. However, if intensities cannot be directly compared between groups due to variation in imaging settings or staining protocols, there should be no claims made about changes in overall F-actin or myosin intensity.

      We appreciate the point made by the reviewer here. To address this point, we will provide data for absolute levels in relevant cases and be more precise in our conclusions.

      The significance of the correlation in Figure 7E should be quantified.

      We will report the p-value for the F-test for overall significance for our regression analysis of this data. The F-statistic for this analysis is F = 15.6, p-value = 0.00103.

      Supplemental Figure 2: does the segmentation image match the second Z reslice immediately above? It does not appear so, or perhaps they are just not aligned. Having the two match would be more convincing of the segmentation technique.

      We will ensure that matching images are used for this figure.

      Reviewer #3 (Significance (Required)):

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved.

      This is a great point. It is important to note that our conclusions required us to ‘tune’ the expression of GEF and the depletion of GAP with GAL4 drivers to get expression levels that do not dramatically affect RhoA polarity within mesoderm cells, but that alter the tissue level pattern within the mesoderm. Furthermore, we were cautious in making sure that our perturbations that elevate RhoA activation level did not lead to elevated myosin in the ectoderm (Fig. 5A and B). It is worth noting that RhoGEF2 is still full-length in all cases and has all of the normal regulatory domains that allow its activity to be restricted to the mesoderm at this stage. To more explicitly show the effect of our perturbations on ectoderm cells, we plan to include higher resolution images comparing myosin and F-actin organization/levels in the ectoderm for our manipulations of RhoA signaling.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Previous work has shown that mesoderm invagination at the ventral midline of the Drosophila embryo requires precise spatial regulation of actomyosin levels in order to fold the tissue. In this work, Denk-Lobnig and colleagues further investigate the spatial distribution of myosin and F-actin in the mesoderm and how these patterns are established. The authors identify an F-actin pattern at the apical cell junctions that emerges upon folding, with elevated levels in the cells around the ventral midline, a decrease in junctional F-actin in the marginal mesoderm, and then an increase at the mesoderm-ectoderm border. They identify Snail and Twist as regulating different aspects of establishing this F-actin pattern. Additionally, by modulating RhoA activity (downstream of Twist) the authors are able to alter the width of the actomyosin pattern without affecting the width of the mesoderm tissue, which in turn affects the curvature of the tissue fold and the post-fold lumen size.

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved. I suggest to address the following points.

      MAJOR

      1. Line 127 and Figure 1E: The authors argue that there is an anticorrelation between F-actin distribution and cell areas. However, an R-squared value of 0.1083 rather suggests little-to-no correlation. The authors should evaluate the statistical significance of that correlation.
      2. Figure 5: claims that the width of the actomyosin gradient is affected by the various perturbations should be supported with statistical analysis. For example, the half-maximal gradient position for each individual myosin trace could be calculated (instead of using the mean trace), displayed using a box plot, and tested for significance using the Mann-Whitney U test, as in Figure 7. This is slightly complicated by the fact that the control group in Figure 5C is the same as the control group in Figure 3E, which needs to be carefully considered. Also, similar calculations should be made for the F-actin data in Fig 5E-G since throughout the rest of the paper, the authors refer to the width of the "actomyosin gradient" which implicates both myosin and actin.
      3. Line 142 and Figure 2B-C: I was confused by the description of the snail phenotype:
        • a. the claim that in snail mutants actin levels are uniform: based on Figure 2C, I'd say that F-actin levels decrease across the mesoderm moving away from the ventral midline, and that the main issue is with the accumulation of actin in the distal end of the mesoderm. The authors should better justify the claim that F-actin levels are uniform in snail mutants (or remove it). Maybe comparing F-actin levels in the first four or five rows of the mesoderm?
        • b. how about the increase of F-actin in the distal mesoderm, just adjacent to the ectoderm boundary? Why is it gone in snail mutants?
      4. With alpha-catenin-RNAi, F-actin depletion across the mesoderm still occurs, but junctional F-actin levels are not increased around the midline. While some explanations are offered in the text, the reason for this phenotype seems important for the story. The text in lines 204-205 suggests that F-actin that would normally be localized to the apical junctions is instead being drawn into medioapical actomyosin foci. Is this idea supported by evidence that medioapical F-actin in control embryos is lower than in alpha-catenin RNAi?
      5. Figure 6A: there is a correlation between cell position and the productivity of myosin pulses, which the authors attribute to the RhoA gradient. This should be more definitively demonstrated by:
        • a. Plot and calculate the correlation between RhoA levels (measured with the RhoA probe) and the change in cell area caused by a contraction pulse. Is this a significant correlation?
        • b. How does myosin persistence change when RhoA is manipulated, e.g. in RhoA overexpressing embryos or in RhoA RNAi?

      MINOR

      1. The authors should indicate if the myosin shown in Figure 1A is junctional or medioapical. If it is junctional, does medioapical myosin better match junctional F-actin and cell areas? Similarly, if they are showing medioapical myosin, how does junctional myosin compare to junctional actin? It seems to me that consistently comparing the patterns of junctional F-actin and medioapical myosin (and RhoGEF2, RhoA, and ROCK in Figure 4) could be somewhat misleading, as the pools compared localize in different subcellular compartments.
      2. Most of the intensity traces for myosin and F-actin are presented as normalized intensity, relative to the highest intensity in the trace. However, there are claims throughout the text about the relative levels of myosin (ex. Line 241) and F-actin (conclusions based on Fig. 2B-D) that should be supported by quantification. It seems that changes in intensity for both F-actin and myosin, in addition to shape of the gradient, would contribute to the understanding of actomyosin regulation in this tissue. However, if intensities cannot be directly compared between groups due to variation in imaging settings or staining protocols, there should be no claims made about changes in overall F-actin or myosin intensity.
      3. The significance of the correlation in Figure 7E should be quantified.
      4. Supplemental Figure 2: does the segmentation image match the second Z reslice immediately above? It does not appear so, or perhaps they are just not aligned. Having the two match would be more convincing of the segmentation technique.

      Significance

      The authors have conducted an elegant quantitative analysis of the distribution of actin, myosin and several of their regulators across the tissue. The study makes an attempt at integrating a large amount of information into a model of tissue folding, and the concept of mechanical gradients is exciting and still underexplored. I am concerned that the interpretation of some results focuses on specific details but ignores larger scale effects (e.g. potential effects of some of the manipulations on the ectoderm, and the impact that that could have on tissue folding are largely ignored). The statistical analysis of several results should also be improved.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Mesoderm invagination during Drosophila gastrulation has been a paradigm for how regionally restricted gene expression locally activates Rho signalling and for how subsequently activated acto-myosin drives cell shape changes which in turn lead to a change in tissue morphology. Despite the numerous studies on this subject and a good understanding of the overall process, several important aspects have remained elusive so far. Among these is the dynamics of cortical and junctional F-actin and its contribution to the shape changes of cells and tissue. Previous studies have focused on MyoII, the „active" component of the actin cytoskeleton. The dynamics of the „passive" counterpart, namely actin filaments, has been neglected, although it is clear that Rho signalling controls both branches.

      1. Although I clearly acknowledge the efforts taken by the authors to define a function of cortical (junctional) F-actin in apical constriction and furrow formation, several central aspects of the study are not sufficiently resolved and conclusive. Rho signalling controls MyoII via Rok and F-actin via formin/dia, among other less defined targets. The role of MyoII and cortical contraction could be conclusively sorted out, since inhibition of Rok affects the MyoII branch but not the other branches. A similar approach, i.e. a specific inhibition/depletion without affecting the other branch, has not been taken yet for the F-actin branch. The authors have not resolved this issue. When analysing the mutants, the authors cannot distinguish the effect of Rho signalling on the MyoII and F-actin branch. For this reason the changes in F-actin distribution in the mutants are linked to changes in Myo activity and thus a function cannot be assigned to F-actin. In order to derive a specific role of F-actin distribution for furrow formation, the authors need to find experimental ways to affect F-actin levels without affecting MyoII, for example by analysing mutants for dia or other formins.
      2. The authors employ a discontinuous spatial axis by the cell number. Although there are good arguments to understand and treat the cells as units, there are also good arguments for using a scale with absolute distance. I have doubts that the graded distributions presented by the authors are a result of this scaling with cell units. When looking at panel B of Fig 1 or Fig. 2A,B, for example, a sharp step-like distribution is visible at the boundary between mesoderm and ectoderm anlage. In contrast, the F-actin intensity distribution is graded after quantification. The graded distribution appears not to be a consequence of averaging because an even sharper step is very obvious in a projection along the embryonic axis as shown in panel B and D of Fig. 2, for example. The difference between a sharp step in the images and a graded distribution after quantification with a spatial axis in cell number is obvious for a-catenin in Fig. 3D and Rho signalling in Fig. 4B. As the authors base their central conclusion (see headline) on the graded distribution, resolving the issue of spatial scale is a prerequisite of publication.
      3. The authors put the spatial distribution of Rho signalling and F-actin into the center of their conclusion. They do so by affecting the pattern with mutants in twist/snail and varying upstream factors of Rho signalling. With respect to myo activation this has been done previously, although possibly with less detail, and it is no new insight that the width of the mesoderm anlage and corresponding Rho signalling domain has a consequence on the shape of the groove and furrow. To maintain the conclusion of the manuscript that spatially graded Rho signalling contributes to tissue curvature, more convincing ways to change the pattern of Rho signalling are needed. Changing the balance of GEF and GAP shows the importance of Rho signalling and possibly signalling levels but not the contribution of its spatial distribution.

      Significance

      The question of a contribution of F-actin is addressed in this manuscript. The authors quantify F-actin in fixed and living embryos at two prominent steps in ventral furrow formation, (1) shortly prior to onset of apical constrictions and (2) when the groove has formed. They distinguish junctional and „medial" cortical F-actin. They employ a discontinuous spatial axis, the number of cells away from the ventral midline but not an absolute scale (see my notes below). The measurements are applied to wild type and mutant embryos affecting the transcriptional patterning (twist, snail), adherens junctions, and Rho signalling. The authors claim to reveal by their measurements a graded distribution of F-actin intensities with a peak at the ventral midline and a second peak at the boundary between mesoderm and ectoderm with a low point in the stretching cells of the mesectoderm. The authors further claim to reveal a graded distribution of Rho signalling components within the mesoderm anlage. Based on these data the authors conclude that graded Rho signalling and depletion of F-actin promote tissue curvature.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript entitled 'Combinatorial patterns of graded RhoA activation and uniform F-actin depletion promote tissue curvature' by Denk-Lobnig et al. the authors study the organisation of junctional F-actin during the process of mesoderm invagination during gastrulation in the model Drosophila. Following on from previous work by the same lab that identified and analysed a multicellular myosin II gradient across the mesoderm important for apical constriction and tissue bending, the authors now turn their attention to actin. Using imaging of live and fixed samples, they identify a patterning of F-actin intensity/density at apical junctions that they show is dynamically changing going into mesoderm invagination and is set up by the upstream transcription factors driving this process, Twist and Snail. They go on to show, using genetic perturbations, that both actin and the previously described myosin gradient are downstream of regulation and activation by RhoA, that in turn is controlled by a balance of RhoGEF2 activation and RhoGAP C-GAP inactivation. The authors conclude that the intricate expression patterns of all involved players, that all slightly vary from one another, is what drives the wild-type distinctive cell shape changes in particular rows of cells of the presumptive mesoderm and surrounding epidermis.

      This is a very interesting study analysing complex and large-scale cell and tissue shape changes in the early embryo. Much has been learned over the last decade and more about many of the molecular players and their particular behaviours that drive the process, but how all upstream regulators work together to achieve coordinated tissue-scale behaviours is still not very well understood, and this study adds important insights into this.

      The experiments seem well executed and support the conclusion drawn, but I have a few comments and questions that I feel the authors should address to strengthen their argument.

      General points:

      1. The authors state early on that they chose to focus on junctional rather than apical medial F-actin, but it is unclear to me really what the rationale behind that is. In much of the authors' earlier work, they study the very dynamic behaviour of the apical-medial actomyosin that drives the apical cell area reduction in mesodermal cells required for folding. They have previously analysed F-actin in the constricting cells, but have only focused on the most constricting central cell rows (Coravos, J. S., & Martin, A. C. (2016). Developmental Cell, 1-14). The role of junctional F-actin compared to the apical-medial network on which the myosin works to drive constriction is much less clear, it could stabilize overall cell shape or modulate physical malleability or compliance of cells, or it could more actively be involved in implementing the 'ratchet' that needs to engage to stabilise a shrunken apical surface. I would appreciate more explanation or guidance on why the authors chose not to investigate apical-medial F-actin across the whole mesoderm and adjacent ectoderm, but rather focused on junctional F-actin, especially explaining better throughout what they think the role of the junctional F-actin they measure is.
      2. Comparing the F-actin labeling in the above previous paper to the stainings/live images shown here, they look quite different. This is most likely due to the authors here not showing the whole apical area but focusing on junctional, i.e. below the most apical region. It is not completely clear to me from the paper at what level along the apical-basal axis the authors are analysing the junctional F-actin. Supplemental Figure 2 seems to suggest about half-way down the cell, which would be below junctional levels. Could the authors indicate this more clearly, please? Overall, I would appreciate if the authors could supply some more high-resolution images of F-actin from fixed samples (which I assume will give the better resolution) of how F-actin actually looks in the different cells with differing levels. Is there for instance a visible change to F-actin organisation? And could this help explain the observed changes in intensity and their function?
      3. Along the same lines of thought as in point 2): Dehapiot et al. (Dehapiot, B., ... & Lecuit, T. (2020). Assembly of a persistent apical actin network by the formin Frl/Fmnl tunes epithelial cell deformability. Nature Cell Biology, 1-21) have recently shown for the process of germband extension and amnioserosa contraction that two pools of F-actin can be observed, a persistent pool not dependent on Rho[GTP] and a Rho-[GTP] dependent one. Could the authors comment on what they think might occur in the mesoderm, are similar pools present here as well?
      4. As the authors state themselves, Rho does not only affect actin via diaphanous, but of course also myosin via Rock. So it would be good to reflect this more in the interpretation and discussion of data, as the causal timeline could be complex.

      More specific comments to experiments and figures:

      1. Reduction of junction function by alpha-catenin-RNAi: how strong is the reduction in catenin? Could they label a-catenin in fixed embryos? The authors conclude the original pre-constriction patterning of F-actin intensity is not dependent on intact junctions, but they show that the increase in F-actin in the mesodermal cells concomitant with apical constriction is in fact impaired in the RNAi. Thus, the authors can also not conclude whether the continued accumulation of myosin and its persistence depend on intact junctions. The initial set-up of the myosin gradient in terms of intensity distribution is unaffected, but clearly dynamics, subcellular pattern, interconnectivity etc. of myosin are affected and thus may well depend on some mechanical feed-back. I find this section of the manuscript slightly overstated and feel the conclusion should be more cautious.
      2. Figure 1 versus Figure 2: Why do the Utrophin-ABD virtual cross-sections look so fuzzy and bad in comparison to phalloidin labelled F-actin in the virtual cross-section in Fig. 1B and C? The labelling shown in 2B and D does not even look very junctional...
      3. Figure 5 C and D: the control gradients for myosin shown in C and D are completely different, for C the half-way height cell row is deduced as 5 whereas the (in theory identical) control measure in D has row 3 at halfway height! Why is this? Putting all curves together in the same panel would suggest that the C control curve is very similar to RhoGEF2-OE! This can't be right.
      4. Still in Figure 5: Panels C and D again, but for apical area: are the control and C-GAP-RNAi or RhoGEF2-OE curves significantly different? What statistics were used on this?
      5. Supplemental Figure 1: Panels in D: I appreciate this control, but would really also like to see the same control at a stage when folding has commenced and stretched cells are present at the margin of the mesoderm. How homogenous does the GAP43 label look in those?
      6. Supplemental Figure 5: Panel 5 B: the authors conclude that the myosin gradient under RhoGEF2 RNAi is not smaller, but looking at the curves it in fact looks wider. They also mention that the overall level of myosin in this condition is lower than the control...
      7. Following on from the above, a comment on Figure 7:
        • The authors use RhoGEF2 RNAi stating that it affects the actin pattern, but the myosin pattern also seems affected. In line 318 the authors state that they use this condition to look at how junctional actin density affects curvature. I find this phrase misleading as it might lead the readers to think that RhoGEF2 RNAi only affects junctional F-actin, although it also affects myosin patterning.
        • Line 311: confusingly, the authors state that an increase in the actomyosin gradient affects curvature. But it is only the myosin gradient that is increased, while the junctional actin gradient is flatter than the control in both C-GAP RNAi and RhoGEF2 OE (the distinction is even made by authors line 243). Could this be clarified?

      Significance

      Morphogenesis of organs, and how these highly coordinated processes are driven by transcriptional events, local control (of for instance cytoskeletal behaviour), is a major field in developmental and cell biology. Advances over the last decade have led to a much better understanding of the role of myosin (in the form of actomyosin) in defining cell and therefore tissue shape in morphogenesis. The role and control of actin organisation, that the myosin depends upon for its action, is much less understood. Thus this study will add an important piece of understanding of the basic control of morphogenesis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their enthusiastic support for our work and their insightful comments and suggestions which we believe strengthen the manuscript. Below we detail how we propose to respond to each of the specific points raised by each reviewer.

      Reviewer #1:

      1). It is convincingly shown that adding insulator elements (cHS4) reduces crosstalk between the two PAX6 CREs tested (Fig. 3). However, it is unclear if this approach will work for other CREs. This point should be discussed, and perhaps the authors could give some troubleshooting advice (e.g. adding more insulators or trying different insulator elements?).

      We will include these suggestions in the discussion and describe some ongoing efforts to characterise another insulator element in our assay.

      2). All CREs used in proof-of-concept experiments in this work have well known activities in zebrafish embryos. A new/uncharacterized CRE has not been tested yet using this system. It is unclear from the workflow (Fig. 1B) what happens if the CRE does not drive detectable levels of EGFP/mCherry. How does one determine whether lack of reporter expression is due to technical problem (with the transgene or phiC31 integration) or that the CRE is not active in zebrafish? Perhaps adding a PCR-based genotyping step could address this potential problem?

      We will include a PCR-based genotyping assay in the description of the assay pipeline and discuss its utility in assessing successful integration events as suggested by the reviewer.

      3). Other limitations of the system should also be discussed. For example, the system appears to be useful for identifying variant CREs that result in a change (either loss or gain) of temporal or spatial activity, but it is not clear how subtle changes in expression level (either slightly increased or decreased) would be identified or quantified. Perhaps other approaches could be used in combination with this system to fully analyze mutant CRE activity. Another limitation is that this approach is only applicable to CREs that are active in the first few days of zebrafish embryonic development.

      We will include these suggestions in the discussion and clearly address the limitations of the system.

      **Minor points:**

      i) Although it is discussed in the previous work published in PLoS Genetics, it is probably worth mentioning here why the gata2 minimal promoter was chosen for the reporter system.

      The choice of the gata2 promoter in our constructs was based on previously published work from our group. We will re-iterate and reference these studies in the workflow description.

      ii) It would be helpful if the cHS4 element is briefly described (e.g. "insulator element") in the Fig. 1 legend.

      We will modify the figure legend according to the suggestion.

      iii) It is not clear from the manuscript whether the new reagents reported here, including dual reporter vectors and transgenic attB landing site zebrafish strains, will be made available to the scientific community, or how these reagents would be distributed.

      We will include a section describing our plans for distribution of the reagents and tools described in the manuscript. All the vectors will be deposited in Addgene for distribution and all the zebrafish lines will be openly shared with the scientific community.

      Reviewer #2:

      1. The dual reporter system uses EGFP and mCherry to report the activities of two different CREs in the same animal. However, EGFP and mCherry have drastically different fluorescence properties which have not been measured particularly well in vivo and especially not in zebrafish. They have different maturation times (mCherry is much quicker). Both are quite stable in vivo, but mCherry is particularly stable in cell culture and in vivo, even resisting lysosomal degradation (EGFP does not - it is acid and protease sensitive) (Katayama et al., 2008; McWilliams et al., 2016). Often, promoter activity assays in zebrafish employ short lived "destabilized" FPs, such as destabilized GFP and destabilized dsRed. With stable FPs, false positives could be reported due to the fluorescent signal remaining for a long period of time after promoter activity has ceased. Replacing the traditional FPs with destabilized versions could be one way to improve the temporal resolution of this assay. This is probably not necessary to do in the present study but might be a worthy future direction.

      We will discuss these points among the possible limitations of our assay and will also endeavour to incorporate these suggestions in future versions of our assays.

      However, no matter which pair of FPs is chosen, there will be differences in signal intensity/brightness and decay rate. Thus, the FP swap experiments should be employed for any experiment claiming a temporal (Fig. 4) or quantitative (Fig. 5) difference between CRE activation or deactivation. If the EGFP/mCherry swap experiments show the same results, the confidence in the assay will be significantly bolstered.

      We estimate the proposed experiments to take about 4 months to allow for molecular cloning of the FP swapped constructs, injection into the "landing" strain, raising to sexual maturity (2.5 mo), screening for founders, and performing the imaging. These are the only two suggested experiments I would need to feel confident in the results and to recommend publication.

      We appreciate the reviewer’s suggestion but would point out that we included dye-swaps for the PAX6-CREs described in Figure 3 in this manuscript. The dye-swap experiments for SBE2WT/SBE2Mut were described in our previous work published in PLoS Genetics. However, to increase the confidence of the readers in our current system, we will include the other suggested dye swaps in the revised version of our manuscript.

      Reviewer #3:

      **Major comments**

      1. First, given the importance of quality landing lines for the methodology, I would like to see more clarity and emphasis on validation of the Shh-SBE2 landing pad in the main text. Based on supplemental tables 1 and 2, this reviewer is somewhat unclear on whether there is one or three lines with Shh-SBE2 based landing pads (one site is mentioned in table 1, but table 3 mentions three F0 lines, and the text is ambiguous). The authors also state that the Shh-SBE2 landing pad is a single copy integration, but the data supporting this conclusion does not appear to be included (linker mediated PCR does not rule out other integrations).

      We will provide a detailed description of the landing lines addressing all the concerns raised by the reviewer.

      It would also be useful to have clearer numbers indicating the reproducibility of the expression pattern in F1 animals. Do 100% of F1 progeny from multiple crosses that carry the integration show the expression pattern in image 2 A? If there is variability, how much, and how many fish were examined? This reviewer also wonders whether appropriate expression of Shh-SBE2 in this landing site is enough to call it neutral. For example, perhaps position effects might be observed with a different, weaker CRE in this site? Better documentation will allow for more widespread and appropriate use of the landing pad.

      We will expand the description for the part of the pipeline the reviewer is referring to, providing the details of transgene segregation.

      Similar concerns apply to the integration of test constructs. To evaluate the practicality of the approach, it would be useful to have numbers reporting the frequency of recovering F1 individuals with PhiC31-mediated integration of the reporter into the desired landing site. It is also important to provide better documentation of the degree of reproducibility in expression patterns between F1 progeny. Numbers of embryos imaged and fraction with the indicated expression pattern are needed for all data in the main text. At minimum, gross expression patterns should be examined in at least 10 F1 larvae. If there is variability between individuals, some image documentation of this in supplementary data would be welcome.

      We will include the suggested information in the results and provide the supplementary data as suggested by the reviewer.

      **Minor comments:**

      i) For figure 1, it may be clearer to present generation of the landing pad lines and screening of CREs using these lines in separate figure panels: (B) for generation of landing pads, and (C) for CRE analysis.

      We will modify figure 1 as suggested.

      ii) Landing pads that were less effective might also be moved out of figure 2 to the supplemental material, to help improve clarity and to allow for focus on the tools with the most utility.

      We will modify figure 2 as suggested.

      iii) Scale bars should be included in all images.

      This will be done for all the images.

      iv) In some cases, image labeling somewhat obscures the relevant features.

      We will rectify this in the revised version.

      v) To help evaluate consistency, in all relevant figures (4, 5, sup fig 3, etc.) the number of embryos examined should be included in the legend.

      We will modify the figure legends to include this information.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Bhatia addresses a longstanding need for rigorous methods to directly compare the effectiveness of cis-regulatory elements (CREs) during vertebrate embryogenesis. The manuscript describes a method for simultaneous quantitative assessment of the spatial and temporal activity of wild-type and mutant CREs using live imaging in zebrafish embryos. The approach takes advantage of a predefined neutral docking site, and dual-CRE reporter cassette that can be integrated into this site using PhiC31. Using this method, the authors demonstrate subtle differences in the spatial and temporal dynamics of two shh CREs that have been previously reported to have similar domains of activity, and they demonstrate changes in CRE activity in embryos harboring a disease specific mutation in the SBE2 CRE.

      Major comments

      Overall, this manuscript describes a valuable tool, and the key conclusions regarding its need and utility are convincing. However, some additional documentation of methods, key reagents, and numbers would be of value.

      First, given the importance of quality landing lines for the methodology, I would like to see more clarity and emphasis on validation of the Shh-SBE2 landing pad in the main text. Based on supplemental tables 1 and 2, this reviewer is somewhat unclear on whether there is one or three lines with Shh-SBE2 based landing pads (one site is mentioned in table 1, but table 3 mentions three F0 lines, and the text is ambiguous). The authors also state that the Shh-SBE2 landing pad is a single copy integration, but the data supporting this conclusion does not appear to be included (linker mediated PCR does not rule out other integrations). It would also be useful to have clearer numbers indicating the reproducibility of the expression pattern in F1 animals. Do 100% of F1 progeny from multiple crosses that carry the integration show the expression pattern in image 2 A? If there is variability, how much, and how many fish were examined? This reviewer also wonders whether appropriate expression of Shh-SBE2 in this landing site is enough to call it neutral. For example, perhaps position effects might be observed with a different, weaker CRE in this site? Better documentation will allow for more widespread and appropriate use of the landing pad.

      Similar concerns apply to the integration of test constructs. To evaluate the practicality of the approach, it would be useful to have numbers reporting the frequency of recovering F1 individuals with PhiC31-mediated integration of the reporter into the desired landing site. It is also important to provide better documentation of the degree of reproducibility in expression patterns between F1 progeny. Numbers of embryos imaged and fraction with the indicated expression pattern are needed for all data in the main text. At minimum, gross expression patterns should be examined in at least 10 F1 larvae. If there is variability between individuals, some image documentation of this in supplementary data would be welcome.

      Presumably nearly all of this data has already been collected during validation of the tools and just isn't reported clearly, so these updates would not require significant time or cost.

      Minor comments:

      With respect to clarity, while the authors do an excellent job of explaining the rationale for their system, the details of execution in the manuscript can be difficult to follow at times. Below are minor suggestions to help the reader follow more easily.

      For figure 1, it may be clearer to present generation of the landing pad lines and screening of CREs using these lines in separate figure panels: (B) for generation of landing pads, and (C) for CRE analysis.

      Landing pads that were less effective might also be moved out of figure 2 to the supplemental material, to help improve clarity and to allow for focus on the tools with the most utility.

      Scale bars should be included in all images.

      In some cases, image labeling somewhat obscures the relevant features.

      To help evaluate consistency, in all relevant figures (4, 5, sup fig 3, etc.) the number of embryos examined should be included in the legend.

      Significance

      This manuscript is significant as it provides useful tools for direct comparison of CRE activity in stable transgenic embryos, where two CREs are integrated into a single genomic location. The method offers an advance in efficiency and rigor compared to past approaches. As a zebrafish researcher, it is easy to recognize the value of having a transgenic line with a validated neutral landing site for transgene analysis, and having a well-designed construct for detailed in vivo comparison of CRE activity.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This study presents a dual fluorescent protein (FP) reporter system to determine differential activities of cis-regulatory elements (CREs) on transcription factor behavior in an in vivo setting. The strategy uses the PhiC31 system to ensure single copy insertion into a consistent genomic locus and is an important improvement over the authors' previous work using a similar system with random genomic integration and separated FP constructs. Because some genomic loci are more accessible than others, comparing the activities of randomly inserted CREs cannot be quantitative and requires generation and comparison of multiple lines for each CRE to validate. The bulk of this study is validation of the new specifically inserted, dual FP system, including showing that insulator sequences between the CREs of interest are necessary to prevent crosstalk. The last two figures demonstrate the utility of the system to interrogate spatial and temporal regulation of CRE variants and the quantitative expression levels of a mutant and WT CRE pair. This is an exciting tool with clear potential to uniquely compare CRE activities in vivo, and the results are clearly presented. However, given that the impact of this study is as a technical improvement over previous methods and that it is aimed to demonstrate the robustness and utility of the reporter system, additional controls are necessary to demonstrate that FP choice does not influence the temporal or quantitative readouts.

      The dual reporter system uses EGFP and mCherry to report the activities of two different CREs in the same animal. However, EGFP and mCherry have drastically different fluorescence properties which have not been measured particularly well in vivo and especially not in zebrafish. They have different maturation times (mCherry is much quicker). Both are quite stable in vivo, but mCherry is particularly stable in cell culture and in vivo, even resisting lysosomal degradation (EGFP does not - it is acid and protease sensitive) (Katayama et al., 2008; McWilliams et al., 2016). Often, promoter activity assays in zebrafish employ short lived "destabilized" FPs, such as destabilized GFP and destabilized dsRed. With stable FPs, false positives could be reported due to the fluorescent signal remaining for a long period of time after promoter activity has ceased. Replacing the traditional FPs with destabilized versions could be one way to improve the temporal resolution of this assay. This is probably not necessary to do in the present study but might be a worthy future direction. However, no matter which pair of FPs is chosen, there will be differences in signal intensity/brightness and decay rate. Thus, the FP swap experiments should be employed for any experiment claiming a temporal (Fig. 4) or quantitative (Fig. 5) difference between CRE activation or deactivation. If the EGFP/mCherry swap experiments show the same results, the confidence in the assay will be significantly bolstered.

      We estimate the proposed experiments to take about 4 months to allow for molecular cloning of the FP swapped constructs, injection into the "landing" strain, raising to sexual maturity (2.5 mo), screening for founders, and performing the imaging. These are the only two suggested experiments I would need to feel confident in the results and to recommend publication.

      Significance

      The impact of this study is as a technical improvement over previous methods and is aimed to demonstrate the robustness and utility of the reporter system.

      The manuscript is geared towards zebrafish experts with an interest in the imaging of intracellular and transcriptional processes.

      Our laboratory has expertise in zebrafish developmental genetics and live imaging of reporters.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is a technical manuscript that describes a new transgenic reporter system in zebrafish that is designed to simultaneously test the activity of two cis-regulatory elements (CREs) in the same living embryo. This is an extension of previous work from the authors that established methods to compare two CREs in transgenic zebrafish embryos (published in PLoS Genetics; DOI: 10.1371/journal.pgen.1005193). Here, to address the problem of position effects caused by random transgene integration, the authors have created a dual reporter transgene that can be integrated into a specific neutral site (using phiC31 recombination) in the zebrafish genome. Expression of different fluorescent proteins (EGFP and mCherry) is regulated by two CREs of interest in the zebrafish embryo, which allows visualization of the temporal and spatial activity of the CREs in real time during embryonic development. The authors propose this system could be used to directly compare wild-type and mutant CREs, and then provide several lines of evidence that establish proof-of-concept. Overall, the results are clearly presented, and the conclusions are convincing. The description of methods (including supplemental tables) is extensive, which will facilitate reproducibility. The manuscript is succinct, and describes a useful approach to characterize CREs. However, I have a few points for the authors to consider:

      Major points:

      1) It is convincingly shown that adding insulator elements (cHS4) reduces crosstalk between the two PAX6 CREs tested (Fig. 3). However, it is unclear if this approach will work for other CREs. This point should be discussed, and perhaps the authors could give some troubleshooting advice (e.g. adding more insulators or trying different insulator elements?).

      2) All CREs used in proof-of-concept experiments in this work have well known activities in zebrafish embryos. A new/uncharacterized CRE has not been tested yet using this system. It is unclear from the workflow (Fig. 1B) what happens if the CRE does not drive detectable levels of EGFP/mCherry. How does one determine whether lack of reporter expression is due to technical problem (with the transgene or phiC31 integration) or that the CRE is not active in zebrafish? Perhaps adding a PCR-based genotyping step could address this potential problem?

      3) Other limitations of the system should also be discussed. For example, the system appears to be useful for identifying variant CREs that result in a change (either loss or gain) of temporal or spatial activity, but it is not clear how subtle changes in expression level (either slightly increased or decreased) would be identified or quantified. Perhaps other approaches could be used in combination with this system to fully analyze mutant CRE activity. Another limitation is that this approach is only applicable to CREs that are active in the first few days of zebrafish embryonic development.

      Minor points:

      1) Although it is discussed in the previous work published in PLoS Genetics, it is probably worth mentioning here why the gata2 minimal promoter was chosen for the reporter system.

      2) It would be helpful if the cHS4 element is briefly described (e.g. "insulator element") in the Fig. 1 legend.

      3) It is not clear from the manuscript whether the new reagents reported here, including dual reporter vectors and transgenic attB landing site zebrafish strains, will be made available to the scientific community, or how these reagents would be distributed.

      Significance

      This work introduces a new method to analyze cis-regulatory element (CRE) activity in vivo. By generating transgenic zebrafish with a neutral phiC31 landing site for reporter transgene integration, this work improves on previous methods by overcoming the problem of position effects caused by random transgene integration. This will be a useful approach to characterize CREs during embryonic development, and variant CREs associated with human disease. This paper will be of interest to developmental biologists, and geneticists trying to understand CRE activity. I have expertise in zebrafish genetics, with extensive experience using Tol2 transgenesis, and some experience using phiC31 recombination. The experimental approach described here is straightforward, and will be easy to apply in labs with experience in zebrafish transgenesis and in imaging fluorescent protein expression in embryos.

  2. Nov 2020
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Septins are highly conserved small GTPase cytoskeletal proteins that function as molecular scaffolds for dynamic cell wall and plasma membrane-remodeling, as well as diffusion barriers restricting movement of membrane and cell wall-associated molecules. Recent work has started to unravel the functional connections between the septins, cell wall integrity MAPK pathway signaling, and lipid metabolism; however, most studies have focused on a small sub-set of septin monomers and/or were conducted primarily in yeast-type fungi. Here the authors show in the filamentous fungus A. nidulans that the core hexamer septins are required for proper coordination of the cell wall integrity pathway, and that all septins are involved in lipid metabolism. Sphingolipids in particular, but not sterols or phosphoinositides, contribute to the localization and stability of core septins at the plasma membrane. The experiments are simple and clear, and therefore the conclusion is convincing. In the Fig. 8 model, I would like to see the situation in the septin mutants.

      We thank the reviewer for the positive comments. In response to the request from this reviewer and a similar one from reviewer 2 for more on the effect of the loss of individual septins, we added text clarifying the roles of core hexamer, core octamer and noncore septins throughout the manuscript including in the legend to Fig 8 (li 439-444) and the discussion (li 388-402). Please see responses to reviewer 2 comments for more detail.

      Reviewer #1 (Significance (Required)):

      Since the localization of cell wall synthesis proteins, lipid domains and septins is likely to be interdependent, it is sometimes difficult to evaluate whether an effect is direct or indirect. Comprehensive analyses like those performed here are helpful for gaining an overview of the field.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary**

      The study by Mela and Momany describes the function of core septins of A. nidulans and links it with the requirement for the cell wall integrity pathway and the sphingolipids, which are required for membrane and cell wall stability. The study is of interest for the fungal genetics community, and the authors have conducted a substantial amount of work in a field in which they have substantial experience. However, one of the main weaknesses of the manuscript is the assumption about whether the CWI pathway controls the septin function or if the core septins control it.

      We agree that while our data clearly indicate interactions between the septins and the CWI pathway, which component controls the other is not clear. We have modified the text to address this concern in several places as detailed in responses to the reviewer’s specific comments below.

      **Major comments**

      In the abstract, the authors claim that double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway. However, from the experiments shown, it is difficult to be convinced about that. The authors should make efforts to make it clear in the manuscript and the discussion. For example:

      -Line 25-26 (abstract): "Double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway."

      We agree that while the double mutant analysis shows interaction of septins with the CWI pathway, the evidence for them being downstream is not strong. We have revised the abstract as follows:

      Li 29-30: Double mutant analysis with ΔmpkA suggested core septins interact with the cell wall integrity pathway.

      -Line 181-182; 219-220 (results) "Double mutant analyses suggest core septins modulate the cell wall integrity pathway downstream of the kinase cascade." This conclusion is one of the most important of the manuscript. However, this reviewer argues that it cannot be convincingly addressed unless at least the phosphorylation of the MAP kinase MpkA in the septin mutant backgrounds is evaluated under conditions of cell stress and sphingolipid biosynthesis inhibition. The genetic analysis alone may not be enough to infer if septins control the CWI or the other way around. There may be compensatory effects when the CWI pathway is impaired. For example, most of the septin and mpkA double mutants seem to suppress the defect of the delta mpkA under cell wall stress. The authors should consider this idea.

      Although we discuss the epistasis experiments as one possible interpretation, we agree the genetic analysis is not enough to definitively show that the septins are upstream of the CWI pathway or the other way around. The suppression of cell wall defects by deletion of septins in a mpkA null mutant background under cell wall stress suggests a bypass of the CWI pathway for remediation of the cell wall or some other alternate regulatory node. One possible interpretation of these data could be that when normal CWI function is inactivated through deletion of the final kinase, in addition to deletion of septins (possibly acting as negative regulators of CWI components), there may be a parallel node by which cell wall remediation could still occur.

      Wording throughout the abstract, results, and discussion has been modified accordingly.

      Li 29-30: Double mutant analysis with ΔmpkA suggested core septins interact with the cell wall integrity pathway.

      Li 208-209: Double mutant analyses suggest the core septin aspB cdc3 modulates the cell wall integrity pathway in the ∆mpkA background under cell wall stress.

      Li 221-225: When challenged with low concentrations of CASP and CFW, the ∆aspBcdc3 ∆mpkAslt2 and ∆aspE ∆mpkA slt2 mutants were more sensitive than ∆aspBcdc3 and ∆aspE single mutants, but suppressed the colony growth defects of ∆mpkA slt2. The novel phenotype of the double mutants shows that septins are involved in cell wall integrity and raises the possibility that they act in a bypass or parallel node for remediation of cell wall defects (Fig 4).

      Li 227-228: Fig 4. Double mutant analyses suggest core septins modulate the cell wall integrity pathway.

      Li 464-468: Double mutant analyses between septins and CWI pathway kinases also support a role for core septins in maintaining cell wall integrity under stress (Fig 4). Suppression of cell wall defects under cell wall stress by deletion of septins in an ∆mpkA slt2 background suggests a parallel node by which septins negatively regulate cell wall integrity pathway sensors or kinases could exist.

      There is no clear evidence in the manuscript that the core septins AspA, AspB, AspC, and AspD are epistatic in A. nidulans. Therefore, the authors' choice of using different Asp deletion mutants as a proxy for all the septin mutants is questionable. For example, there is no mention of why AspB was chosen for Figure 2 (chitin and β-1,3-glucan deposition), and AspA was chosen for Figure 3 (chitin synthase localization), since these experiments are correlated. The same is true for Figure S1, where AspB and AspE were used. One can wonder if some of the core septins would have a major impact on the chitin content.

      We agree with the reviewer that not all four core septins are equivalent. Previously published work from our lab shows that AspACdc11, AspBCdc3, AspCCdc12, and AspDCdc10 form octamers and that AspACdc11, AspBCdc3, and AspCCdc12 form hexamers, that both of these heteropolymers co-exist, and that the noncore septin AspE is not part of either core heteropolymer, though it appears to influence them possibly through brief interactions (Lindsay et al., 2010; Hernandez-Rodriguez et al., 2012; Hernandez-Rodriguez et al., 2014). This previous work also clearly shows that strains in which the hexameric septins have been deleted (ΔaspA, ΔaspB, and ΔaspC) have very similar phenotypes while strains in which the octamer-exclusive septin has been deleted (ΔaspD) have different phenotypes.

      In our attempt to simplify the current manuscript we discussed the four core septins as a group. In retrospect this caused us to miss important distinctions on the roles of hexamer vs octamer septins and we are grateful to the reviewer for pointing this out. We have modified language throughout the revised manuscript to specify whether results and interpretations apply to core hexamer septins, core octamer septins, the noncore septin, or individual septins. This more detailed analysis has given us several new ideas to test in future work.

      While we cannot exclude the possibility that interesting results might be produced by analyzing null alleles of each individual septin gene for all experiments, we agree with the cross-reference by Reviewer #3 that there is a very low likelihood that we would see different results by analyzing all individual septins within each subgroup (hexamer, octamer or noncore).

      To the reviewer’s questions on choice of septins for Fig 2, Fig 3, and Fig S1:

      ΔaspA, ΔaspB, and ΔaspC showed similar sensitivity to cell wall-disturbing agents in the plate-based assays in Fig 1 and are all part of the core hexamer. We have modified text including the figure legends to make it clear which septins were used in the experiments and which group they belong to.

      In a related comment about Figure 3, the reallocation of chitin synthases in the absence of septins is very interesting, but consider that all the core septin genes should be tested. Without a fully functioning cell wall, the formation of septa will be impaired. It makes their results less surprising.

      In the case of Fig 3, we were unable to recover ChsB-GFP in the ΔaspB or ΔaspC backgrounds but were able to recover it in the ΔaspA background. We have clarified as follows:

      Li184-187: To determine the localization of synthases, a chitin synthase B-GFP (chsB-GFP) strain was crossed with strains in which core hexamer septins were deleted. After repeated attempts, the only successful cross was with core hexamer deletion strain ∆aspA cdc11.

      In Figure 3, Panels A and B, chitin was also labeled by Calcofluor White, which clearly shows that the formation of septa was not impaired even in the septin null mutant background (this is in agreement with previous work from our lab, which shows that septa still form in individual septin null mutants). The results showed that, unlike in WT cells, chitin synthase is not only absent from most branch tips in the septin null mutant background, but seems to be limited primarily to longer (presumably actively growing/non-aborted) branches; these findings were surprising to us, considering that other major cell wall synthesis events, such as targeting of cell wall synthases to septa during septation, appeared to be unimpaired (based on the presence of fully developed, chitin-labeled septa).

      The labeling of septa by calcofluor is now noted in the legend to Figure 3 as follows:

      Li 201: Calcofluor White labeling shows the presence of the polymer chitin at septa, main hyphal tips, branches, and …

      Why was chitin synthase B chosen to be analyzed in terms of reallocation? How many chitin synthases are in the A. nidulans genome? This rationale should be explained in the manuscript.

      We have added the following:

      Lines 173-182: A. nidulans contains six genes for chitin synthases: chsA, chsB, chsC, chsD, csmA, and csmB. Chitin synthase B localizes to sites of polarized growth in hyphal tips, as well as developing septa in vegetative hyphae and conidiophores, a pattern very similar to septin localization. Deletion of chitin synthase B shows severe defects in most filamentous fungi analyzed thus far, and repression of the chitin synthase b gene expression in chsA, chsC, and chsD double mutants exacerbated growth defects from a number of developmental states observed in each single mutant, suggesting it plays a major role in chitin synthesis at most growth stages (Fukuda et al., 2009). For these reasons, we chose chitin synthase B as a candidate to observe in septin mutant background for possible defects in localization.

      Figure 3 and Figure 4. The authors should make efforts to quantify the phenotypes they claim. They are overall very subtle, especially for Figure 3. Also, a decrease of fluorescence is a tricky observation that should be better reported by quantification.

      Line scans of aniline blue and CFW label were conducted and added as Fig S1. Quantitation was performed and added as Fig S3. See authors’ response to Reviewer #3 below for details.

      Again, in Figures 5, 6, and 7, it is clear that the different septins respond differently when ergosterol or sphingolipid synthesis is impaired. It also raises the question again of whether there are differences in the roles of septin genes. Can the authors use previous information about differences in septin function to improve the model (Figure 8)?

      As described above, we have modified the manuscript throughout to clarify which phenotypes are seen for core hexamer, core octamer, and noncore septin deletions. As the reviewer notes, these are especially relevant for the sphingolipid-disrupting agents. Our model includes interaction of septins with sterol rich domains that contain both sphingolipids and ergosterol. Because it is not yet clear how subgroups of septins interact with each other and are organized at SRDs, we show all core septins in our model without distinguishing hexamers and octamers in the drawing, but we have now added text to clarify roles and outstanding questions.

      The changes are summarized in the abstract as follows:

      Li 37-40: Our data suggest that the core hexamer and octamer septins are involved in cell wall integrity signaling with the noncore septin playing a minor role; that all five septins are involved in monitoring ergosterol metabolism; that the hexamer septins are required for sphingolipid metabolism; and that septins require sphingolipids to coordinate the cell wall integrity response.

      The clarifications are reflected in the Figure 8 legend (and associated sections of the discussion) as follows:

      Li 436-441: As described in the text, our data suggest that all five septins are involved in cell wall and membrane integrity coordination. The core septins that participate in hexamers appear to be most important for sphingolipid metabolism while all septins appear to be involved in ergosterol metabolism and cell wall integrity. Because SRDs contain both sphingolipids and ergosterol and because it is not yet clear how subgroups of septins interact with each other at SRDs, we show all core septins in our model without distinguishing hexamers and octamers.

      For the above-discussed reasons, the conclusion on lines 384-388 (discussion) is not completely supported by the experiments shown in the manuscript. The authors need to make a better structured and more straightforward story emphasizing the stronger points and reducing descriptions of more speculative points.

      As discussed above, we have made changes throughout the manuscript to clarify which subgroups of septins are involved in which process and to refine our conclusions accordingly. The beginning of the discussion section has been changed as follows:

      Li 384-399: Our data show that A. nidulans septins play roles in both plasma membrane and cell wall integrity and that distinct subgroups of septins carry out these roles. Previous work has shown that the five septins of A. nidulans form hexamers (AspACdc11, AspBCdc3, and AspCCdc12) and octamers (AspACdc11, AspBCdc3, AspCCdc12, and AspDCdc10) and that the noncore septin AspE does not appear to be a stable member of a heteropolymer (20). The current work suggests that though all septins are involved in coordinating cell wall and membrane integrity, the roles of hexamers, octamers, and the noncore septin are somewhat different. Core hexamer septins appear to be most important for sphingolipid metabolism, all five septins appear to be involved in ergosterol metabolism, and core septins are most important for cell wall integrity pathway with the noncore septin possibly playing a minor role. As summarized in Figure 8 and discussed in more detail below, our previous and current data are consistent with a model in which: (A) All five septins assemble at sites of membrane and cell wall remodeling in a sphingolipid-dependent process; (B) All five septins recruit and/or scaffold ergosterol and the core hexamer septins recruit and/or scaffold sphingolipids and associated sensors at these sites, triggering changes in lipid metabolism; and (C) The core septins recruit and/or scaffold cell wall integrity machinery to the proper locations and trigger changes in cell wall synthesis. The noncore septin might play a minor role in this process.

      Minor comments: Overall, the figure captions could be shortened. They are too descriptive and contain details that are easily inferred from the images and from the materials and methods.

      Legends to the following figures have been streamlined by removing portions that belong in the methods: Figure 2, Fig 3, and Fig 6

      The authors made every effort to cover the precedent literature, but the manuscript has 115 references. The authors should evaluate if all the cited literature is extremely relevant. The manuscript would benefit from that conciseness.

      Because this manuscript addresses septins, ergosterol, sphingolipids, cell wall integrity, and multiple different pathways, there is a lot of literature underlying our approaches. Our strong preference is to cite primary literature, however we can shorten our reference list by relying on reviews if requested by the journal.

      Line 124, 493: Replace 10ˆ7, 10ˆ4 with 10⁷, 10⁴, etc.

      “10^7” and all other scientific notation was altered to replace carets “^7” with superscripts “⁷” throughout.

      The use of fludioxonil as a probe to detect cell wall impairment is perhaps out of context. This drug responds primarily to the HOG pathway and also responds to oxidative damage. So, these results could be suppressed.

      Previous work by Kojima et al., 2006 showed that in addition to the HOG pathway, cell wall integrity is required for resistance to fludioxonil treatment. C. neoformans cell wall integrity mutants bck1, mkk1, and mpk1 (Aspergillus nidulans bckA, mkkA, and mpkA homologues) all exhibit hypersensitivity to fludioxonil, and this was shown to be remediated by the addition of osmotic stabilizers, suggesting cell wall impairment was involved in the growth defect produced by this treatment. Although this drug seems to act primarily through the HOG pathway, the CWI and HOG pathways have been shown to antagonize/negatively regulate one another through a parallel pathway (SVG pathway in yeast) (Lee and Elion, 1999). It has been hypothesized that internal accumulation of glycerol by constitutive activation of the HOG pathway causes decreased cell wall integrity. Due to the apparent cross-pathway control between the HOG and CWI pathways, as well as the high level of conservation of these pathway components in filamentous fungi, we thought this treatment was rightfully dual-purposed to investigate both cell wall impairment in the septin mutants and any possible involvement of the HOG pathway. This seems to be a reasonable drug treatment to look at cell wall impairment that is not likely to be redundant with the modes of action observed in the other Figure 1 treatments (e.g. CFW, Congo Red, and Caspofungin).

      The text clarifies this point as follows: Li 110-112: Fludioxonil (FLU), a phenylpyrrole fungicide that antagonizes the group III histidine kinase in the osmosensing pathway and consequently affects cell wall integrity pathway signaling (Fig 1)(58-67).

      Line 140: "exposure" would be more appropriate than architecture. Please also consider that the differences in the cell wall reported in Figure S1 are very subtle. Are they relevant?

      The differences in the cell wall content reported in Figure S1 (Figure S2 in the revised manuscript) showed that the peak for 4-Glc was almost identical in WT and aspB null mutant, however the overall ratio of peaks switched, where 4-GlcNac content exceeded the 4-Glc content in the mutant compared to WT. By comparison, this was not the case with the septin aspE null mutant. Although this could be considered a ‘subtle’ change in chitin content, we believe this was an important unbiased analysis of the cell wall polysaccharide content and addressed some of the cell wall sensitivity phenotypes we observed, not only between WT and the septin mutants, but also between the septin null mutants which showed sensitivity to cell wall disturbing agents (i.e. aspA, aspB, and aspC) vs. those that did not show significant sensitivity (e.g. aspE). For these reasons we believe this warranted at the very least a supplemental figure for these data.

      Though our idea of cell wall architecture includes changes in polymer exposure, as pointed out by the reviewer, others might use the phrase to mean only content changes. To avoid this misunderstanding, we have replaced the word “architecture” with “organization” in Li 147-148: These data show that cell wall organization is altered in ∆aspB cdc3 and raise the possibility that it might be altered in other core hexamer septin null mutants as well.

      Line 144: explain briefly what it is about and why it was chosen instead of the total detection of chitin sugar monomers. Line 538: Cell wall extraction section. Is this a new method? There is no supporting literature.

      We chose this method because it provides an analysis of all cell wall polysaccharide components and associated linkages. Detection of chitin sugar monomers would have also been a reasonable analysis if this were the only component of the cell wall we were investigating initially. The results showed differences in cell wall chitin content, so these were the data we presented.

      This was addressed on lines 574-576: “Cell walls were isolated from a protocol based on (Bull, 1970); cell wall extraction and lyophilization were conducted as previously described in (Guest and Momany, 2000) with slight modifications listed in full procedure below.”

      The results described on lines 232-257 are marginal to the study and are not exploited by the authors to address the central question of the manuscript, which is the role of the CWI pathway, septins, and sphingolipids. This section could be suppressed or very briefly mentioned in the preceding section.

      We agree that these data did not show any additional involvement of septins in the Calcineurin and cAMP-PKA pathways, and the relevance of the TOR signaling pathway connection is still quite unclear. For this reason, these data were added as a supplemental figure. On the other hand, there are a number of important signaling pathways which have been shown to affect the Cell Wall Integrity pathway directly and indirectly (these three pathways in particular), which is part of the central question of the manuscript. Considering such extensive ‘cross-talk’ between pathways (references produced on Line 65) in filamentous fungi, we felt it necessary to inspect possible involvement of these pathways in septin function via plate-based assays and feel that this is most clearly communicated as its own brief section in the text.

      Reviewer #2 (Significance (Required)):

      The topic of the manuscript is highly relevant to the fungal biology field and employs a very important genetic model. The cooperation of signaling pathways in main aspects of fungal physiology is the main significant contribution of this manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:** In this work the authors use genetic analysis in Aspergillus nidulans to identify phenotypes of septin mutants that point to roles for septins in coordinating the cell wall integrity pathway with lipid metabolism in a manner involving sphingolipids. Most of the major conclusions derive from monitoring the effects of combined genetic or chemical manipulations that target specific components of the pathways of interest. Additionally, the authors monitor the subcellular localization of septins, cell-wall modifying enzymes, and components of the cell wall itself.

      **Major comments:** The key conclusions are convincing, with the unavoidable caveat that null mutations of this sort and chemical inhibitors of these kinds could have unanticipated effects, such as upregulation of unexpected pathways or other compensatory alterations. The authors qualify their conclusions appropriately in this regard.

      The methods are explained very clearly and the data are presented appropriately. In some cases results are shown as representative images illustrating altered localization of a protein or a cell wall component. The changes observed in the experimental conditions are fairly obvious, but some quantification would not be difficult and would likely make the results even more obvious. For example, the Calcofluor White staining patterns might be nicely quantified by linescans along the hyphal length, and the same is true for AspB-GFP localization upon addition of drugs.

      We thank the reviewer for the positive comments and have made the suggested changes as follows:

      Line scans of aniline blue and CFW label were conducted and added as Fig S1. Text has been modified accordingly (Li 140-147).

      Quantification of Chitin synthase-GFP localization and CFW staining and statistical analysis have now been added as Figure S3 and main text (Li 187-191) has been modified accordingly.

      I could imagine one simple experiment that might generate interesting and relevant results, but by no means would this be a critical experiment for this study. In yeast, exposure to Calcofluor triggers increased chitin deposition in the wall. It would be interesting to know how Calcofluor staining looks in WT or septin-mutant cells that have been growing in the presence of Calcofluor for some time, particularly with regard to the localization of chitin deposition in these cells. Such experiments could help connect the idea of septins as sensors of membrane lipid status and also effectors of CWI signaling.

      This is a cool idea that we will pursue in future work. Thanks!

      **Minor comments:** • Body text refers to Figure 1A and 1B but the figure itself does not have panels labeled A or B.

      Figure 1 was revised to show panels A and B labeled clearly.

      • Line 885: "S3" is missing from the beginning of the title of the figure.

      “S” was added to the figure title.

      Reviewer Identity: This is Michael McMurray, PhD, Associate Professor of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus

      Reviewer #3 (Significance (Required)):

      This is an important conceptual advance in our understanding of septin function because previous work in fungal septins mostly points toward them being important in directing or restricting the localization of other proteins that modify the cell wall or plasma membrane. This new work suggests that septins can play a sensing role, as well. As a fungal (budding yeast) septin researcher myself, I think that other fungal septin researchers would be very interested in these results, and I also think the broader septin community would appreciate it. Additionally, those studying fungal cell wall and plasma membrane biogenesis and coordination, including the Cell Wall Integrity Pathway, will be interested.

      REFEREES CROSS COMMENTING

      After reading Reviewer #1's comments, I agree that it would be appropriate to modify the wording of the authors' conclusions about where the septins lie in the CWI pathway (upstream or downstream). While they do mention that there may be other ways to interpret their results, a reader would have to search for the mention of these caveats and if the reader did not, then the strong conclusion statements might be taken as fact.

      The abstract, main text, and discussion have been modified to show that while there is evidence that the septins interact with the CWI pathway, it is not clear which component is upstream vs downstream. See response to reviewer 2 above for details.

      On the other hand, I don't think additional experiments looking at deletions of the other core septins will be worthwhile. I think that there is sufficient evidence to suspect that any single core septin deletion mutant will behave similarly to another, and therefore that any one can be taken as representative. While it's possible that the authors might find something informative by looking at other mutants, I personally find the likelihood too low to justify additional experimentation along those lines.

      Based on results from previous work from our lab, there are two subgroups of core septins in A. nidulans (hexamer and octamer) and septins within subgroups appear to behave similarly. The results from the current work support this idea with the same groups of mutants behaving in very similar ways. So, the core hexamer septins, AspACdc11, AspBCdc3, and AspCCdc12 can be used to make predictions about each other, but not about the octamer-exclusive septin AspDCdc10 or the noncore septin AspE. We agree with reviewer 3 that repeating analysis on multiple septins within a subgroup is not likely to give new insight. However, we were not careful in the original version of the manuscript to distinguish between core hexamer and octamer septins. As detailed in the response to reviewer 2 above, we have modified the manuscript throughout to make clear which subgroup of septins were being examined and to put conclusions into this context.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this work the authors use genetic analysis in Aspergillus nidulans to identify phenotypes of septin mutants that point to roles for septins in coordinating the cell wall integrity pathway with lipid metabolism in a manner involving sphingolipids. Most of the major conclusions derive from monitoring the effects of combined genetic or chemical manipulations that target specific components of the pathways of interest. Additionally, the authors monitor the subcellular localization of septins, cell-wall modifying enzymes, and components of the cell wall itself.

      Major comments:

      The key conclusions are convincing, with the unavoidable caveat that null mutations of this sort and chemical inhibitors of these kinds could have unanticipated effects, such as upregulation of unexpected pathways or other compensatory alterations. The authors qualify their conclusions appropriately in this regard.

      The methods are explained very clearly and the data are presented appropriately. In some cases results are shown as representative images illustrating altered localization of a protein or a cell wall component. The changes observed in the experimental conditions are fairly obvious, but some quantification would not be difficult and would likely make the results even more obvious. For example, the Calcofluor White staining patterns might be nicely quantified by linescans along the hyphal length, and the same is true for AspB-GFP localization upon addition of drugs.

      I could imagine one simple experiment that might generate interesting and relevant results, but by no means would this be a critical experiment for this study. In yeast, exposure to Calcofluor triggers increased chitin deposition in the wall. It would be interesting to know how Calcofluor staining looks in WT or septin-mutant cells that have been growing in the presence of Calcofluor for some time, particularly with regard to the localization of chitin deposition in these cells. Such experiments could help connect the idea of septins as sensors of membrane lipid status and also effectors of CWI signaling.

      Minor comments:

      • Body text refers to Figure 1A and 1B but the figure itself does not have panels labeled A or B.

      • Line 885: "S3" is missing from the beginning of the title of the figure.

      Reviewer Identity: This is Michael McMurray, PhD, Associate Professor of Cell and Developmental Biology, University of Colorado Anschutz Medical Campus

      Significance

      This is an important conceptual advance in our understanding of septin function because previous work in fungal septins mostly points toward them being important in directing or restricting the localization of other proteins that modify the cell wall or plasma membrane. This new work suggests that septins can play a sensing role, as well. As a fungal (budding yeast) septin researcher myself, I think that other fungal septin researchers would be very interested in these results, and I also think the broader septin community would appreciate it. Additionally, those studying fungal cell wall and plasma membrane biogenesis and coordination, including the Cell Wall Integrity Pathway, will be interested.

      REFEREES CROSS COMMENTING

      After reading Reviewer #1's comments, I agree that it would be appropriate to modify the wording of the authors' conclusions about where the septins lie in the CWI pathway (upstream or downstream). While they do mention that there may be other ways to interpret their results, a reader would have to search for the mention of these caveats and if the reader did not, then the strong conclusion statements might be taken as fact. On the other hand, I don't think additional experiments looking at deletions of the other core septins will be worthwhile. I think that there is sufficient evidence to suspect that any single core septin deletion mutant will behave similarly to another, and therefore that any one can be taken as representative. While it's possible that the authors might find something informative by looking at other mutants, I personally find the likelihood too low to justify additional experimentation along those lines.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The study by Mela and Momany describes the function of core septins of A. nidulans and links it with the requirement for the cell wall integrity pathway and the sphingolipids, which are required for membrane and cell wall stability. The study is of interest for the fungal genetics community, and the authors have conducted a substantial amount of work in a field in which they have substantial experience. However, one of the main weaknesses of the manuscript is the assumption about whether the CWI pathway controls septin function or the core septins control it.

      Major comments

      In the abstract, the authors claim that double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway. However, from the experiments shown, it is difficult to be convinced about that. The authors should make efforts to make it clear in the manuscript and the discussion.

      For example:

      -Line 25-26 (abstract): "Double mutant analysis suggested core septins function downstream of the final kinase of the cell wall integrity pathway."

      -Line 181-182; 219-220 (results) "Double mutant analyses suggest core septins modulate the cell wall integrity pathway downstream of the kinase cascade."

      This conclusion is one of the most important of the manuscript. However, this reviewer argues that it cannot be convincingly addressed if at least the phosphorylation of the MAP kinase MpkA in the septin background is not evaluated under conditions of cell stress and sphingolipid biosynthesis inhibition. The genetic analysis alone may not be enough to infer if septins control the CWI or the other way around. There may be compensatory effects when the CWI pathway is impaired. For example, most of the septin and mpkA double mutants seem to suppress the defect of the delta mpkA under cell wall stress. The authors should consider this idea.

      There is no clear evidence in the manuscript that the core septins AspA, AspB, AspC, and AspD are epistatic in A. nidulans. Therefore, the authors' choice of using different Asp deletion mutants as a proxy for all the septin mutants is questionable. For example, there is no mention of why AspB was chosen for Figure 2 (chitin and β-1,3-glucan deposition), and AspA was chosen for Figure 3 (chitin synthase localization) since these experiments are correlated. The same is true for Figure S1 where AspB and AspE were used. One can wonder if some of the core septins would have a major impact on the chitin content.

      In a related comment about Figure 3, the reallocation of chitin synthases in the absence of septins is very interesting, but consider that all the core septin genes should be tested. Without a fully functioning cell wall, the formation of septa will be impaired. It makes their results less surprising.

      Why was chitin synthase B chosen to be analyzed in terms of reallocation? How many chitin synthases are in the A. nidulans genome? This rationale should be explained in the manuscript.

      Figure 3 and Figure 4. The authors should make efforts to quantify the phenotypes they claim. They are overall very subtle, especially for Figure 3. Also, a decrease of fluorescence is a tricky observation that should be better reported by quantification.

      Again, in Figures 5, 6, and 7, it is clear that the different septins respond differently when ergosterol or sphingolipid synthesis is impaired. It also raises the question again of whether there are differences in the roles of septin genes. Can the authors use previous information about differences in septin function to improve the model (Figure 8)?

      For the above-discussed reasons, the conclusion on lines 384-388 (discussion) is not completely supported by the experiments shown in the manuscript. The authors need to make a better structured and more straightforward story emphasizing the stronger points and reducing descriptions of more speculative points.

      Minor comments

      Overall, the figure captions could be shortened. They are too descriptive and contain details that are easily inferred from the images and from the materials and methods.

      The authors made every effort to cover the precedent literature, but the manuscript has 115 references. The authors should evaluate if all the cited literature is extremely relevant. The manuscript would benefit from that conciseness.

      Line 124, 493: Replace 10ˆ7, 10ˆ4 with 10⁷, 10⁴, etc.

      The use of fludioxonil as a probe to detect cell wall impairment is perhaps out of context. This drug responds primarily to the HOG pathway and also responds to oxidative damage. So, these results could be suppressed.

      Line 140: "exposure" would be more appropriate than architecture. Please also consider that the differences in the cell wall reported in Figure S1 are very subtle. Are they relevant?

      Line 144: explain briefly what it is about and why it was chosen instead of the total detection of chitin sugar monomers. Line 538: Cell wall extraction section. Is this a new method? There is no supporting literature.

      The results described on lines 232-257 are marginal to the study and are not exploited by the authors to address the central question of the manuscript, which is the role of the CWI pathway, septins, and sphingolipids. This section could be suppressed or very briefly mentioned in the preceding section.

      Significance

      The topic of the manuscript is highly relevant to the fungal biology field and employs a very important genetic model. The cooperation of signaling pathways in main aspects of fungal physiology is the main significant contribution of this manuscript.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Septins are highly conserved small GTPase cytoskeletal proteins that function as molecular scaffolds for dynamic cell wall and plasma membrane remodeling, as well as diffusion barriers restricting movement of membrane and cell wall-associated molecules. Recent work has started to unravel the functional connections between the septins, cell wall integrity MAPK pathway signaling, and lipid metabolism; however, most studies have focused on a small subset of septin monomers and/or were conducted primarily in yeast-type fungi.

      Here the authors show in the filamentous fungus A. nidulans that the core hexamer septins are required for proper coordination of the cell wall integrity pathway, and that all septins are involved in lipid metabolism. Sphingolipids in particular, but not sterols or phosphoinositides, contribute to the localization and stability of core septins at the plasma membrane.

      The experiments are simple and clear, and the conclusions are therefore convincing. In the Fig. 8 model, I would like to see the situation in the septin mutants.

      Significance

      Since the localization of cell wall synthesis proteins, lipid domains and septins is likely to be interdependent, it is sometimes difficult to evaluate whether an effect is direct or indirect. Comprehensive analyses like those performed here are helpful for gaining an overview of the field.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors use focused-ion beam (FIB) milling coupled with cryo-electron tomography and subtomogram averaging to uncover the structure of the elusive proximal and distal centrioles, as well as different regions of the axoneme in the sperm of 3 mammalian species: pig, horse, and mouse. The in-situ tomograms of the sperm neck region beautifully illustrate the morphology of both the proximal centriole, confirming the partial degeneration of mouse sperm, and intriguingly, asymmetry in the microtubule wall of pig sperm. In distal centrioles, the authors show that in all mammalian species, microtubule doublets of the centriole wall are organized around a pair of singlet microtubules. The presented segmentation of the connecting piece is beautiful and nicely shows the connecting piece forming a nine-fold, asymmetric chamber around the centrioles. The authors further use subtomogram averaging to provide the first maps of the mammalian central pair and identify sperm-specific radial spoke-bridging barrel structures. Lastly, the authors perform further subtomogram averaging to show the connecting site of the outer dense fibers to the microtubule doublet of the proximal principal piece and confirm the presence of the TAILS microtubule inner protein complex (Zabeo et al, 2018) in the singlet microtubules occupying the tip of sperm tails.

      The manuscript provides the clearest insight into flagellar base morphology to date, giving insight into the morphological difference between different mammalian cilia and centriole types. The manuscript is suitable for publication, once the following questions are addressed.

      We are ecstatic that the reviewer shares our enthusiasm for this work. We are particularly grateful that the reviewer appreciates the significance of the unique, and hitherto under-explored biology of the sperm centrioles and the flagellar base.

      **Major Points:**

      How many centrioles and axonemes were used in generating the averages presented in the paper? If too few samples were used, especially in centrioles undergoing dramatic remodeling or degeneration, the reality of MIPs and MAPs being present might be completely affected. For instance, in Figure 1d, the authors present a cryoET map of the centriole microtubule triplet. However, centrioles are divided into several regions with different accessory elements. Here, the authors could show the presence of only part of the A-C linker. The A-C linker covers only 40% of the centriole, so does it mean that this centriole is made only of the accessories that characterize the proximal side of the centriole? In the same vein, what were the boundaries governing subtomogram extraction? For example, in the distal centriole, were microtubules extracted from just before the start of the transition zone, to the end of the microtubule vaulting, more pronounced at the end of the proximal region? There are known heterogeneities in centriole, as well as flagella, ultrastructure along the proximal-distal axis. If no pre-classification was performed for subtomogram longitudinal position along the centriole and axoneme, structural features may be averaged out, and/or be present without reflecting their real longitudinal localization. Classification should be applied here if this was not done.

      These are all valid points. Because there is no easy way to target the PC/DC when cryo-FIB milling, and because there is only one of each structure in every cell, the chances of catching them in ~150-nm-thin lamellae are slim (not to mention the number of things that can and do go wrong when doing cryo-ET on lamellae). As such, the averages of the PC were generated from 3 tomograms (3 cells) and those of the DC from 2 tomograms (2 cells). We do have more tomograms with the PC/DC, but these were used for segmentation/visual inspection since we only used the best tomograms for averaging. These numbers are not entirely atypical for cryo-FIB datasets; the only other in situ centriole structures are from 5-6 centrioles (from Chlamydomonas, from Le Guennec et al 2020 doi: 10.1126/sciadv.aaz4137 and Klena et al 2020 doi: 10.15252/embj.2020106246).

      To allow readers to adjust their interpretations according to the small number of cells analysed, we explicitly stated the number of animals/cells/tomograms used to generate averages in Table S1. Furthermore, we amended the text to clarify which regions of the centrioles our averages represent. These changes are detailed below:

      (1) proximal centriole

      The lamellae used for averaging PC triplets caught mostly the proximal end of the centriole, and essentially all of the particles come from the most proximal ~ 400 nm. In a sense, this was a form of pre-classification. We now state explicitly that our structure represents only the proximal region and that proximal/distal differences may be identified in the future (see section on distal centriole below). Despite the limited particle number, we are confident in the presence of the MIPs as these are also visible in the raw data (the striations in Fig. 1a, now Fig. 1d, for instance). Page 7, Line 165 was edited accordingly as well as the legend to Fig. 1.

      (2) distal centriole

      The subtomograms used for the DC average were extracted from the region of the distal centriole closest to the base of the axoneme (i.e., the region marked “distal centriole” in Fig. 2h-i). Because the DC doublet average in Fig. 2j was generated from very few particles, we tried to be very conservative when interpreting it. Page 9, Line 216 was edited accordingly, as was the legend to Fig. 2.

      (3) axoneme

      We did attempt to average the axoneme from different regions of flagella (midpiece, proximal principal piece, distal principal piece). This is shown in Fig. 6d-l. The major difference we found was at the doublet-ODF connection. We did not find any striking differences in MIP densities, or in radial spoke densities along the proximodistal axis. As such, the averages in Fig. 5 are from the entire principal piece (but not the midpiece), which we state in the figure legend.

      Because mammalian sperm flagella are very long, it is possible that we missed more subtle differences. We now state this in the Discussion (page 20, line 491):

      **Minor Points:**

      • In line 3, motile cilia are not only used to swim; they can also move liquid or mucus, for instance.

      Done. Page 3, line 64

      • In line 175, the authors stated "a prominent MIP associated with protofilament A9, was also reported in centrioles isolated from CHO cells (Greenan et al. 2018) and in basal bodies from bovine respiratory epithelia (Greenan et al 2020)". Actually, this MIP has been seen in many other centrioles from other species, such as Trichonympha (https://doi.org/10.1016/j.cub.2013.06.061 ), Chlamydomonas, and Paramecium ( DOI: 10.1126/sciadv.aaz4137 ). Citing these studies will reinforce the evolutionary conservation of this MIP and therefore its potential crucial role in the A microtubule.

      We thank the reviewer for pointing out these very important papers; we have added them to the manuscript (page 7, lines 175-176).

      • In Line 178, the authors stated: "Protofilaments A9 and A10 are proposed to be the location of the seam (Ichikawa et al. 2017)". High-resolution cryoEM maps confirmed it: https://doi.org/10.1016/j.cell.2019.09.030 . This publication should be cited. Moreover, authors should also refer to this paper when discussing MIPs in the microtubule doublet.

      Done (page 7, lines 178-179 and page 13, line 329).

      We also now cite Ma et al (along with Ichikawa et al 2019 doi: 10.1073/pnas.1911119116 and Khalifa et al 2020 doi: 10.7554/eLife.52760) in the Discussion when alluding to high-resolution structures as a possible means of identifying MIPs (page 19, line 479).

      • In Line 187-189 the authors stated, "We resolved density of the A-C linker (gold) which is associated with protofilaments C9 and C10." The A-C linker interconnects the triplets of the proximal centriole (Guichard et al. 2013, Li et al. 2019, Klena et al. 2020) with distinct regions binding the C-tubule, as shown by the authors in gold, as well as an A-link, making contact with the A-tubule through various protofilaments in a species-specific manner, but always on protofilament A9. The authors may have identified the A-link, labeled in green, on the outside of protofilament A8/A9 in Figure 1d.

      We thank the reviewer for pointing this out. The position of the olive green density associated with A8/A9 is indeed consistent with the A-link, and this is also now illustrated more clearly in the new version of Fig. 1e (now Fig. 1h, see below). We accordingly edited page 8, lines 187-188.

      • In Figure 1e, the authors provide a 9-fold representation of the centriole based on their map. How relevant is this model? The distance between triplets is inconsistent here, which has not been observed before. Did they use true 3D coordinates to generate this model? The A-C linker, which is only partially reconstructed, does not contact the A microtubule. Is this really the case? Did the authors see that the A-link density of the A-C linker has disappeared? If these points are not clearly specified, this representation might be misleading.

      In order to avoid misleading readers, we replaced this panel with a model generated directly by plotting back the averages into their original positions and orientations in the tomogram (new Fig. 1h). This model now shows that the olive green density on A8/A9 is in the right position to form part of the A-C linker (as Reviewer 1 correctly pointed out in their previous point). We have amended the figure legend accordingly. We also described how the plotback was generated in the Materials and Methods section (page 26, line 648).

      As the reviewer points out, the distance between triplets does indeed seem inconsistent in the plotback. This is an interesting observation, but we feel it is a bit too preliminary to discuss in detail here. This can be explored in a follow-up study more focused on sperm centriole geometry.

      • The nomenclature regarding MIPs is sometimes confusing in this manuscript. For example, in lines 228-229 "We then determined the structure of DC doublets, revealing the presence of MIPs distinct from those in the PC." Does this include the gold and turquoise labeled structures in Figure 2j? These densities appear to correspond to the inner scaffold stem in the gold density presented in Figure 2j, and armA, presented in the turquoise density (Li et al. 2011, Le Guennec et al. 2020). The presence of this Stem here is important as it correlates with the presence of the molecular players making up the inner scaffold (POC5, POC1B, CENTRIN): https://doi.org/10.1038/s41467-018-04678-8

      While we were initially very conservative with interpreting the DC doublet average (as stated above it comes from very few particles), we agree with the reviewer’s assessment that the gold and turquoise densities in Fig. 2j are consistent with the Stem and armA respectively of the inner scaffold. Because the inner scaffold contributes to centriole rigidity, it will be interesting to determine if and how it changes during remodelling of the atypical DC in mammalian sperm. Intriguingly, at least some inner scaffold components (including POC5, POC1B) reorganise into two rods in the mammalian sperm DC (Fishman et al 2018 doi: 10.1038/s41467-018-04678-8). We expanded the section on the DC average (page 9, lines 218-220):

      • The segmentation data showing that the connecting piece is composed of column vaults emanating from the striated columns are compelling and beautiful. However, it is important to note how many pig sperm proximal centrioles had immediate-short triplet side contact with the Y-shaped segmented column 9, as well as how many mouse centrioles have the two electron-dense structures flanking the striated columns.

      Done. Material and Methods Page 25, lines 615-619.

      The resolution of the mammalian central pair is an important development brought by this work. The structural similarity between the central pair of pig and horse is convincing. However, with only 281 subtomograms being averaged for the murine central pair, corresponding to an estimated resolution of 49Å, the absence of the helical MIP of C1 with 8 nm periodicity suggests that there is simply not enough signal to capture it in the average. The same could be said for the smaller MIP displayed in Figure 4 c, panel ii. This point should be clearly stated.

      We agree with the reviewer that the quality of the mouse CPA structure is not on par with the pig and horse CPA structures. We now explicitly state this caveat in the text (page 11, lines 276-277):

      Another piece of compelling data presented in this study is the attachment of the outer dense fibers to the axoneme of the midpiece and proximal and distal principal pieces. From the classification data presented along the flagellar length, it is clear that the only ODF contact made with the axoneme is at the proximal principal piece. However, this is far from obvious in the native top view images presented. Is it possible to include a zoomed inset of the connection between the A-tubule and the ODF?

      We are very happy that the reviewer finds this data exciting. As Fig. 6 is quite cluttered as is, we instead tried to better annotate the cross-section views of the axoneme by tracing one doublet-ODF pair in each image (or only a doublet in the case of the distal principal piece). This shows that there is a gap between the doublet and the ODF in the midpiece, and that there is no such gap in the principal piece. We also hope that annotating one doublet-ODF pair helps the reader see that the same pattern holds true for the other doublets/ODFs. The legend to Fig. 6 was changed accordingly.

      Reviewer #1 (Significance (Required)):

      This work is of good quality and provides crucial information on the structure of the centriole and axoneme in 3 different species. This work complements previous work well.

      The audience for this type of study is large as it is of interest to researchers working on centrioles, cilia, and sperm cell architecture.

      We are pleased that the reviewer appreciates the quality of our work and sees its interest for a broad audience.

      My expertise is cryo-tomography and centriole biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Leung et al. used state-of-the-art EM imaging techniques, including FIB cryo-milling, Volta Phase plate, cryo-electron tomography and subtomogram averaging, to study the structure of sperm flagella from three mammalian species, pig, horse and mouse. First, they described two unique centrioles in the sperm, the PC and the DC. They found the PCs are composed of a mixture of triplet and doublet MTs. In contrast, the DCs are composed mainly of doublet and singlet MTs. By using subtomogram averaging, they identified a number of accessory proteins, including many MIPs bound to the MT wall. Many are unique to the mammalian sperm. They further described the connecting piece region of the sperm enclosing the centrioles and found an asymmetric arrangement. Furthermore, the authors presented the structure of sperm axonemes from all three species. These include the DMT and the CPA. Finally, they described the tail region of the sperm and described how the DMTs transitioned to the singlet MTs.

      This is a beautiful piece of work! It is by far the most comprehensive structural study of mammalian sperm cells. These findings will serve as a valuable resource for structure and function analysis of the mammalian flagella in the future. Now the stage is set for identifying the molecular nature of the structures and densities described in this study.

      We thank the reviewer for their positive evaluation! We are very happy that they share our excitement for the work, and that they also see it as “setting the stage” for future studies at the molecular level.

      The manuscript is clearly written. The data analysis is thorough. The conclusions are solid and not overstated. I don't have any major issues for its publication. A number of minor suggestions are listed below. Most are related to the figures and figure legends.

      Figure 1d, the figure legend should mention this is the subtomogram average of PC triplet MTs from pig sperm, though this is mentioned in the text. Also, for convenience, the color codes for the MIPs should be mentioned in the figure legend.

      Done.

      Figure 2J, similarly, the figure legend should mention this is the subtomogram average of DC doublets. It also needs a description of the color codes of the identified MIPs. For the DMT, please indicate the A- and B-tubule, which are colored in light or dark blue.

      Done, except that we would prefer not to enumerate the MIPs. We did not name them or discuss them extensively in the main text, and we do not want to over-interpret them at this point because the average is from a relatively small number of particles. However, we did specify that the gold and turquoise densities on the luminal surface are consistent with the inner scaffold. The figure legend was edited accordingly.

      Line 228, "We then determined the structure of DC doublet by subtomogram averaging"

      Done.

      For both Fig 2 and Fig 3. the DC doublets are colored in dark and light blue, please specify which is the A- or B-tubule in the figure legends.

      Done.

      Line 273, need space between "goldenrod"

      We would prefer to keep “goldenrod” spelled as is since this is how the color is referred to in Chimera and ChimeraX.

      Figure 4. Need to expand the figure legend. Panels i, ii, iii, iv are cut-through views of the lumen of CPA microtubules C1 and C2.

      Done.

      Line 338, Interestingly, the RS1 barrel is radially distributed asymmetrically around the axoneme

      Done.

      Figure 5, need color codes for the arrowheads (light pink, pink, magenta) in panels i~n,

      Done.

      Figure 7, (a-c) please use arrowheads to indicate the location of caps in the singlet MT.

      Done.

      Reviewer #2 (Significance (Required)):

      This is a beautiful and significant work - by far the most comprehensive analysis of mammalian sperm structure

      We are thrilled that the reviewer appreciates the novelty of our work.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is a very interesting study that explores the structural diversity of mammalian sperm flagella, in pig, mouse and horse, at high resolution using cryo-FIB milling and cryo-tomography. The study provides the first in situ cryo-EM structure of a mammalian centriole and describes a number of microtubule-associated structures, such as MIPs and plugs at the plus-end of microtubules, that have not been reported so far. Additionally, the authors identify several asymmetries in the overall structure of the flagellum in the three species, which have implications for the understanding of the flagellar beat and waveform geometry in sperm, which are discussed by the authors. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

      With the exception of a couple of "eclectic word choices" in the Introduction (see detailed feedback in Minor Comments), the manuscript is also well written. Image acquisition and analysis are sound.

      We thank the reviewer for positively evaluating our work. We are glad that they feel our study will “serve as a reference” to inform future studies.

      However, I have some suggestions that should help the authors to strengthen their claims and present their results. The study is in principle suitable for publication, once the following points are addressed:

      **Major comments:**

      • A major concern is that it is not clear how many animals, sperm cells and lamellae the authors used to acquire the data presented in the manuscript. This information needs to be provided, because it is not uncommon to encounter aberrant flagella, even in a wildtype animal. The authors should state how many animals, and how many flagella per animal, were analyzed, in order to allow the reader to form an opinion on the reliability of their observations.

      • The figures are esthetically pleasing; however, the figure legends should be carefully revised to include necessary information about color codes and image annotations.

      We thank the reviewer for raising these points. We completely agree that the numbers of animals and cells are important pieces of information. As such, we now explicitly state the number of animals/cells/tomograms used for each average in Table S1. For more qualitative observations (such as the relationship between the asymmetry of the pig sperm PC and the Y-shaped segmented columns), we now state the number of cells and animals in which we see each feature (see detailed response to Reviewer 1).

      **Minor comments:**

      • Line 26. I do not think that the word "menagerie" is properly used in this context.

      • Line 29. The same is true for the word "Bewildering" in this sentence.

      We apologise for our somewhat eclectic word choice. We see the reviewer’s point that unconventional word choice may distract readers, so we replaced these two words with ‘diverse’ and ‘an extensive’, respectively.

      • Line 286 "Our structures of the CPA are the first from any mammalian system, and our structures of the doublets are the first from any mammalian sperm, thus filling crucial gaps in the gallery of axoneme structures." Sentences like this one would fit much better in the Conclusions or at least in the Discussion.

      We thank the reviewer for this suggestion, but we would prefer to keep this sentence where it is, if possible. We think it is useful to tell the audience upfront why these structures are significant, especially since readers who aren’t deep in the field may be bogged down by all the details.

      • Line 377 "Large B-tubule MIPs have so far only been seen in human respiratory cilia (Fig. 5j) and in Trypanosoma (the ponticulus, Fig. 5n), but the morphometry of these MIPs differs from the helical MIPs in mammalian sperm." Please insert the citations for the studies about respiratory cilia and Trypanosoma flagella.

      Done.

      • In Figure 1. What do the stars shown in panel a and a' indicate?

      We indeed failed to specify what the asterisks/stars indicate. They are meant to emphasise that the electron-dense material in the lumen of the PC is continuous with the CP. We have now specified this in the text (page 10, line 245).

      Given the complexity of the structures that compose the flagellar system of sperms, it would be helpful to add an illustration of the sperm with careful annotation of the centriole structures and the various segments of the flagellum.

      This is an excellent suggestion. To help orient readers, we added three panels to Fig. 1 (Fig. 1a-c) showing low-magnification images of whole sperm cells. We annotated different parts of the flagellum (neck, midpiece, principal piece, endpiece) so that readers can refer back to these panels in case they want to know which part of the cell the averages are from.

      • Figure 2. Explanation of the used color codes is missing. Additionally, the authors should include an explanation for the black and white arrows and for the 2 insets in i.

      Done. For the color code, please see response to Reviewer 2. For the black and white arrows, we edited the figure legend.

      • In "(j) In situ structure of the pig sperm DC with the tubulin backbone in grey and microtubule inner protein densities colored individually" ...it should be written "...sperm DC microtubule doublet..."

      Done.

      • In this figure, but also in every other figure that shows centriole, axoneme, or even microtubule averages it is important to indicate the microtubule polarity. Please add the symbol + and - to indicate microtubule polarity in the figures.

      Done. In order to avoid overcrowding, we only labelled the pig structures as the horse and the mouse structures are always shown in the same orientations as the pig.

      • Figure 3. Additional to the images in a,b, and c, the original tomographic slices (without segmentation) should be shown here, to allow the reader to visualize the structure.

      We now include three additional supplementary movies slicing through the respective tomograms.

      • Figure 7. Scale bars are missing in d-f.

      Done.

      • Scale bars are missing in most Supplementary figures.

      Done.

      • Table S1. The Information about horse and mouse centriole data is missing.

      The reviewer is correct, but this information is missing because we did not average from the horse and the mouse. For the mouse, the triplets were in various stages of degeneration, resulting in heterogeneity that precluded us from averaging. For the horse, we simply did not catch enough centrioles to generate a meaningful structure.

      Reviewer #3 (Significance (Required)):

      This study provides several novel structural insights into the sperm flagellum structure that have implications for the understanding of the flagellar beat and waveform geometry in sperm. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

      It is great to see that the reviewer appreciates the novelty of our work.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      This is a very interesting study that explores the structural diversity of mammalian sperm flagella, in pig, mouse and horse, at high resolution using cryo-FIB milling and cryo-tomography. The study provides the first in situ cryo-EM structure of a mammalian centriole and describes a number of microtubule-associated structures, such as MIPs and plugs at the plus-end of microtubules, that have not been reported so far. Additionally, the authors identify several asymmetries in the overall structure of the flagellum in the three species, which have implications for the understanding of the flagellar beat and waveform geometry in sperm, which are discussed by the authors. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat. With the exception of a couple of "eclectic word choices" in the Introduction (see detailed feedback in Minor Comments), the manuscript is also well written. Image acquisition and analysis are sound.

      However, I have some suggestions that should help the authors to strengthen their claims and present their results. The study is in principle suitable for publication, once the following points are addressed:

      Major comments:

      • A major concern is that it is not clear how many animals, sperm cells and lamellae the authors used to acquire the data presented in the manuscript. This information needs to be provided, because it is not uncommon to encounter aberrant flagella, even in a wildtype animal. The authors should state how many animals, and how many flagella per animal, were analyzed, in order to allow the reader to form an opinion on the reliability of their observations.
      • The figures are esthetically pleasing; however, the figure legends should be carefully revised to include necessary information about color codes and image annotations.

      Minor comments:

      • Line 26. I do not think that the word "menagerie" is properly used in this context.
      • Line 29. The same is true for the word "Bewildering" in this sentence.
      • Line 286 "Our structures of the CPA are the first from any mammalian system, and our structures of the doublets are the first from any mammalian sperm, thus filling crucial gaps in the gallery of axoneme structures." Sentences like this one would fit much better in the Conclusions or at least in the Discussion.
      • Line 377 "Large B-tubule MIPs have so far only been seen in human respiratory cilia (Fig. 5j) and in Trypanosoma (the ponticulus, Fig. 5n), but the morphometry of these MIPs differs from the helical MIPs in mammalian sperm." Please insert the citations for the studies about respiratory cilia and Trypanosoma flagella.
      • In Figure 1. What do the stars shown in panel a and a' indicate? Given the complexity of the structures that compose the flagellar system of sperms, it would be helpful to add an illustration of the sperm with careful annotation of the centriole structures and the various segments of the flagellum.
      • Figure 2. Explanation of the used color codes is missing. Additionally, the authors should include an explanation for the black and white arrows and for the 2 insets in i.
      • In "(j) In situ structure of the pig sperm DC with the tubulin backbone in grey and microtubule inner protein densities colored individually" ...it should be written "...sperm DC microtubule doublet..."
      • In this figure, but also in every other figure that shows centriole, axoneme, or even microtubule averages it is important to indicate the microtubule polarity. Please add the symbol + and - to indicate microtubule polarity in the figures.
      • Figure 3. Additional to the images in a,b, and c, the original tomographic slices (without segmentation) should be shown here, to allow the reader to visualize the structure.
      • Figure 7. Scale bars are missing in d-f.
      • Scale bars are missing in most Supplementary figures.
      • Table S1. The Information about horse and mouse centriole data is missing.

      Significance

      This study provides several novel structural insights into the sperm flagellum structure that have implications for the understanding of the flagellar beat and waveform geometry in sperm. Although this study does not provide novel mechanistic information on the function of the described structures, it will undoubtedly serve as a reference for future theoretical and empirical work on the role of these structures in shaping the flagellar beat.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this study, Leung et al. used state-of-the-art EM imaging techniques, including FIB cryo-milling, Volta Phase plate, cryo-electron tomography and subtomogram averaging, to study the structure of sperm flagella from three mammalian species, pig, horse and mouse. First, they described two unique centrioles in the sperm, the PC and the DC. They found the PCs are composed of a mixture of triplet and doublet MTs. In contrast, the DCs are composed mainly of doublet and singlet MTs. By using subtomogram averaging, they identified a number of accessory proteins, including many MIPs bound to the MT wall. Many are unique to the mammalian sperm. They further described the connecting piece region of the sperm enclosing the centrioles and found an asymmetric arrangement. Furthermore, the authors presented the structure of sperm axonemes from all three species. These include the DMT and the CPA. Finally, they described the tail region of the sperm and described how the DMTs transitioned to the singlet MTs.

      This is a beautiful piece of work! It is by far the most comprehensive structural study of mammalian sperm cells. These findings will serve as a valuable resource for structure and function analysis of the mammalian flagella in the future. Now the stage is set for identifying the molecular nature of the structures and densities described in this study.

      The manuscript is clearly written. The data analysis is thorough. The conclusions are solid and not overstated. I don't have any major issues for its publication. A number of minor suggestions are listed below. Most are related to the figures and figure legends.

      Figure 1d, the figure legend should mention this is the subtomogram average of PC triplet MTs from pig sperm, though this is mentioned in the text. Also, for convenience, the color codes for the MIPs should be mentioned in the figure legend.

      Figure 2J, similarly, the figure legend should mention this is the subtomogram average of DC doublets. It also needs a description of the color codes of the identified MIPs. For the DMT, please indicate the A- and B-tubule, which are colored in light or dark blue.

      Line 228, "We then determined the structure of DC doublet by subtomogram averaging"

      For both Fig 2 and Fig 3. the DC doublets are colored in dark and light blue, please specify which is the A- or B-tubule in the figure legends.

      Line 273, need space between "goldenrod"

      Figure 4. The figure legend needs to be expanded. Panels i, ii, iii, and iv are cut-through views of the lumen of CPA microtubules C1 and C2.

      Line 338, Interestingly, the RS1 barrel is radially distributed asymmetrically around the axoneme

      Figure 5, need color codes for the arrowheads (light pink, pink, magenta) in panels i-n.

      Figure 7, (a-c) please use arrowheads to indicate the location of caps in the singlet MT.

      Significance

      This is a beautiful and significant work - by far the most comprehensive analysis of mammalian sperm structure.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this study, the authors use focused-ion beam (FIB) milling coupled with cryo-electron tomography and subtomogram averaging to uncover the structure of the elusive proximal and distal centrioles, as well as different regions of the axoneme in the sperm of 3 mammalian species: pig, horse, and mouse. The in-situ tomograms of the sperm neck region beautifully illustrate the morphology of the proximal centriole, confirming its partial degeneration in mouse sperm and, intriguingly, revealing asymmetry in the microtubule wall in pig sperm. In distal centrioles, the authors show that in all mammalian species, microtubule doublets of the centriole wall are organized around a pair of singlet microtubules. The presented segmentation of the connecting piece is beautiful and nicely shows the connecting piece forming a nine-fold, asymmetric chamber around the centrioles. The authors further use subtomogram averaging to provide the first maps of the mammalian central pair and identify sperm-specific radial spoke-bridging barrel structures. Lastly, the authors perform further subtomogram averaging to show the connecting site of the outer dense fibers to the microtubule doublet of the proximal principal piece and confirm the presence of the TAILS microtubule inner protein complex (Zabeo et al, 2018) in the singlet microtubules occupying the tip of sperm tails. The manuscript provides the clearest insight into flagellar base morphology to date, giving insight into the morphological difference between different mammalian cilia and centriole types. The manuscript is suitable for publication, once the following questions are addressed.

      Major Points: How many centrioles and axonemes were used in generating the averages presented in the paper? If too few samples were used, especially in centrioles undergoing dramatic remodeling or degeneration, the apparent presence of MIPs and MAPs might be strongly affected. For instance, in Figure 1d, the authors present a cryoET map of the centriole microtubule triplet. However, centrioles are divided into several regions with different accessory elements. Here, the authors could show the presence of only part of the A-C linker. The A-C linker covers only 40% of the centriole, so does it mean that this centriole is made only of the accessories that characterize the proximal side of the centriole? Along the same lines, what were the boundaries governing subtomogram extraction? For example, in the distal centriole, were microtubules extracted from just before the start of the transition zone, to the end of the microtubule vaulting, more pronounced at the end of the proximal region? There are known heterogeneities in centriole, as well as flagella, ultrastructure along the proximal-distal axis. If no pre-classification was performed for subtomogram longitudinal position along the centriole and axoneme, structural features may be averaged out, and/or present without reflecting their real longitudinal localization. Such classification should be applied here if it has not been already.

      Minor Points:

      • In line 3, motile cilia are not only used to swim, they can move liquid or mucus for instance.
      • In line 175, the authors stated "a prominent MIP associated with protofilament A9, was also reported in centrioles isolated from CHO cells (Greenan et al. 2018) and in basal bodies from bovine respiratory epithelia (Greenan et al 2020)". Actually, this MIP has been seen in many other centrioles from other species, such as Trichonympha (https://doi.org/10.1016/j.cub.2013.06.061 ), Chlamydomonas, and Paramecium ( DOI: 10.1126/sciadv.aaz4137 ). Citing these studies will reinforce the evolutionary conservation of this MIP and therefore its potential crucial role in the A microtubule.
      • In line 178, the authors stated: "Protofilaments A9 and A10 are proposed to be the location of the seam (Ichikawa et al. 2017)". High-resolution cryoEM maps confirmed it: https://doi.org/10.1016/j.cell.2019.09.030 . This publication should be cited. Moreover, the authors should also refer to this paper when discussing MIPs in the microtubule doublet.
      • In Line 187-189 the authors stated, "We resolved density of the A-C linker (gold) which is associated with protofilaments C9 and C10." The A-C linker interconnects the triplets of the proximal centriole (Guichard et. al. 2013, Li et. al. 2019, Klena et. al. 2020) with distinct regions binding the C-tubule, as shown by the authors in gold, as well as an A-link, making contact with the A-tubule through various protofilaments in a species-specific manner, but always on protofilament A9. The authors may have identified the A-link, labeled in green, on the outside of protofilament A8/A9 in Figure 1d.
      • In Figure 1e, the authors provide a 9-fold representation of the centriole based on their map. How relevant is this model? The distance between triplets is inconsistent here, which has not been observed before. Do they use true 3D coordinates to generate this model? The A-C linker, which is only partially reconstructed, does not contact the A microtubule. Is this really the case? Did the authors see that the A-link density of the A-C linker has disappeared? If these points are not clearly specified, this representation might be misleading.
      • The nomenclature regarding MIPs is sometimes confusing in this manuscript. For example, in lines 228-229 "We then determined the structure of DC doublets, revealing the presence of MIPs distinct from those in the PC." Does this include the gold and turquoise labeled structures in Figure 2j? These densities appear to correspond to the inner scaffold stem in the gold density presented in Figure 2j, and armA, presented in the turquoise density (Li et. al. 2011, Le Guennec et. al. 2020). The presence of this Stem here is important as it correlates with the presence of the molecular player making the inner scaffold (POC5, POC1B, CENTRIN): https://doi.org/10.1038/s41467-018-04678-8
      • That the connecting piece is composed of column vaults emanating from the striated columns is supported by compelling and beautiful segmentation data. However, it is important to note how many pig sperm proximal centrioles had immediate-short triplet side contact with the Y-shaped segmented column 9, as well as how many mouse centrioles have the two electron-dense structures flanking the striated columns.

      The resolution of the mammalian central pair is an important development brought by this work. The structural similarity between the central pair of pig and horse is convincing. However, with only 281 subtomograms being averaged for the murine central pair, corresponding to an estimated resolution of 49Å, the absence of the helical MIP of C1 with 8 nm periodicity suggests that there is simply not enough signal to capture it in the average. The same could be said for the smaller MIP displayed in Figure 4 c, panel ii. This point should be clearly stated.

      Another piece of compelling data presented in this study is the attachment of the outer dense fibers to the axoneme of the midpiece and proximal and distal principal pieces. From the classification data presented along the flagellar length, it is clear that the only ODF contact made with the axoneme is at the proximal principal piece. However, this is far from obvious in the native top view images presented. Is it possible to include a zoomed inset of the connection between the A-tubule and the ODF?

      Significance

      This work is of good quality and provides crucial information on the structure of centriole and axoneme in 3 different species. This work complements well the previous works. The audience for this type of study is large as it is of interest to researchers working on centrioles, cilium, and sperm cell architecture.

      My expertise is cryo-tomography and centriole biology

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      This manuscript follows on from previous work from the Rhind lab to investigate whether the load of MCMs at origins is a factor in when the origin activates (as a population average) during S phase. The authors use budding yeast and an auxin degron system to modulate the levels of an MCM subunit. This allows them to titrate down the concentration of the MCM hexamer and observe the effect. Crucially, they assay both the reduction in MCM load at origins and the subsequent replication dynamics in the same experiment. This is the power of their approach and allows them to rigorously test their hypothesis.

      **Major comments**

      1. I found the introductory paragraph discussing the Rhind lab hypothesis about the possibility of multiple MCM being loaded at origins somewhat misleading. The first paragraph of the discussion was much clearer. However, I feel that the introductory paragraph should deal with the difference between the two proposals: 0-1 MCM-DH per origin (de Moura et al), vs 0-50+ MCM-DH (Yang et al). It is also important to note that Foss et al find that "In budding yeast, [MCM] complexes were present in sharp peaks comprised largely of single double-hexamers" - i.e. consistent with 0-1 MCM-DH per origin.

      To improve the balance of the introduction, I think the authors should briefly introduce the concepts behind the 0-1 MCM-DH per origin; this was defined as origin competence by Stillman and clearly described by McCune et al (2008; see figure 8) prior to the work from de Moura et al.

      Furthermore, in the discussion the authors should be more even-handed. To date there is no data to conclusively rule one way or the other in distinguishing between single vs multiple MCMs. The authors cite Lynch et al and state "overexpression of origin-activating factors in S phase causes most all origins to fire early in S phase, consistent with most origins having at least one MCM loaded". However, Lynch et al report equivalent (roughly equal) origin efficiencies, but the assay doesn't distinguish between all going up to high efficiency or all going to a lower intermediary efficiency. Given that fork factors (polymerases, etc) are likely to become limiting at some point (or checkpoints could be activated due to limited dNTP supplies) it would seem plausible that uniform origin efficiency could be a consequence of less than maximal origin firing. As part of this discussion it would be useful for the authors to include what conclusions have been reached on MCM load from in vitro systems (with chromatin substrates).

      Because the main focus of the paper is not dependent on whether MCM stoichiometry varies from 0 to 1 or 0 to many, we had relegated our discussion of absolute stoichiometry to the Discussion. However, it is clear from multiple reviewers' comments that it is something very much on readers' minds. Therefore, we have now included a brief introduction to the 0-to-1 and 0-to-many scenarios in the Introduction and moved the bulk of the discussion of the data supporting the two scenarios to the Discussion.

      2. The authors are not the first to look at the consequence of reduced MCM concentrations on origin function. This was essentially the basis for the MCM screen undertaken by Bik Tye's lab that first identified the MCM genes. In addition to temperature sensitive mutants, the Tye group also examined heterozygotes (Lei et al., 1996) to show a differential effect on the ability of two origins to support plasmid replication. The authors' findings are entirely consistent with these early studies, particularly since ARS416 (formerly ARS1) was found to be highly sensitive to reduced MCM levels and ARS1021 (formerly ARS121) was found to be insensitive to MCM levels. The authors find a significant reduction in MCM load at ARS416, but the MCM load at ARS1021 is unaltered by reduced MCM concentration. It would be worth the authors noting this consistency. The authors do cite the Lei study, but not in this context. The original MCM screen was published here:

      Maine, G., Sinha, P., Tye, B. (1984). Mutants of S. cerevisiae defective in the maintenance of minichromosomes Genetics 106(3), 365 - 385.

      Furthermore, at the end of the discussion the authors state that "it will be interesting to dissect the specific cis- and trans-acting factors that make origins sensitive or resistant to changes in MCM levels". The equivalent effect reported by the Tye lab has already been dissected by the Donaldson lab (Nieduszynski et al., 2006) and perhaps it would be worth briefly mentioning their findings.

      We have included both of these literature precedents in the Discussion.

      3. The authors should show the flow cytometry data for each of their cell cycle experiments, if only in supplementary figures. This is important to allow a reader (and reviewer) to judge the level of synchrony achieved when interpreting the results.

      This data is now included as Figure S1.

      4. I think the authors should show the ChIP signal at some example origins, including ones sensitive and insensitive to the reduction in MCM concentration. Currently all the high resolution ChIP data (i.e. over 1400 bp, e.g. Fig 3a) is presented as meta-analyses of many origins.

      We will include this analysis in a subsequent revision.

      5. When describing the results in Fig 4a the authors focus on changes (highlighted in black boxes) that fit their expectation. However, there are other sites that should at least be mentioned that don't seem to fit the authors' model, e.g. ARS517, ARS518. It would be worth discussing what fraction of the timing data can be explained by the reduced MCM load.

      We now explicitly point out that Figures 4c and 4d address this issue of the robustness of the correlation. Although there is significant variation, as the reviewer points out, the trend is seen genome wide. As it happens, both ARS517 and ARS518 do fit the model reasonably well. They have intermediate loss of MCM signal and intermediate delay in timing.

      **Minor comments**

      -These data, rather than this data (throughout).

      I suspect that the journal style and/or copy editors will make the final call. However, I will point out that although 'data' is most certainly plural in Latin, its predominant modern English usage is as a mass noun, such as water or sand or information. In general, users do not think of, or use, 'data' as a collection of discrete elements, each one being a 'datum', a contention supported by the very infrequent use of the word datum. For instance, in a ChIP-seq experiment, what is a datum? Each individual read? Each individual nucleotide in each read? The quality score for each individual nucleotide in each read? Each pixel in each image from the sequencer? When one wants to refer to an individual piece of data, common usage is to refer to a data point, just as one would refer to a grain of sand. Moreover, if 'data' were plural, it would be incorrect to use it in phrases such as "there is very little data available". Would the reviewer really suggest using "there are very few data available"?

      -the authors should clearly state in figure legends what window size has been used in analysing genomic data.

      All analyses were done using 1kb windows, as now stated in the figure legends.

      -in figure 2a the authors show pairwise comparisons between conditions, it would be nice to see the 3rd pairwise comparisons perhaps as a supplementary figure

      We have included the third comparison in Figure 2a.

      -in figure 2c it would be clearer to use the same colour for the lines and the points

      The regression lines are in the same colors as the data points they fit. x=y is shown in blue for comparison, as now noted in the figure legend.

      -the authors should avoid the use of red/green colour combinations in their figures (see: https://thenode.biologists.com/data-visualization-with-flying-colors/research/)

      All figures will be redrawn in colorblind-accessible colors in a subsequent revision.

      -in the text the authors state "ORC binding to the ACS and subsequent MCM loading is a directional process dependent on a ACS- site and a similar but inverted nearby sequence (Xu et al., 2006)". I think it would be more appropriate to cite the following study here:

      Coster, G., Diffley, J. (2017). Bidirectional eukaryotic DNA replication is established by quasi-symmetrical helicase loading Science (New York, NY) 357(6348), 314 - 318. https://dx.doi.org/10.1126/science.aan0063

      The Coster reference has been included.

      -the list of factors that influence replication timing should include Rif1, whereas it is less clear that Rpd3 acts within the unique genome (as opposed to indirectly via repetitive DNA, e.g. rDNA)

      Rif1 has been added to the list.

      -figure 4 - it might help to mark the centromere on panel a. Also, why do the ChIP peaks and annotated origins appear to line up so poorly?

      The shift between the peaks and the ACS positions was introduced during the construction of the figure. Thanks for catching it. The alignment has been corrected and the centromere annotation has been added.

      -figure 4d - would it not be better to use fraction of lost MCM signal on the x-axis as in previous figures?

      If T_rep was a linear function of MCM stoichiometry, fraction lost would work as well as amount lost. However, we find that there is a lower correlation between fraction of MCM signal lost and T_rep delay than between absolute MCM signal lost and T_rep delay, suggesting a more complicated relationship.

      -"with galactose or raffinose, to induce or repress Mcm2-7 overexpression, respectively." This is incorrect, raffinose does not repress this promoter (that requires glucose).

      Fixed.

      -the S. pombe spike in is a great addition to the over expression experiments. It's a shame that it wasn't included in the auxin experiments.

      Yes, we agree.

      -why does the data in fig 5d appear to be at much lower resolution than the previous ChIP data?

      The resolution was inadvertently reduced during the rendering of the figure. The resolution has been restored.

      -in the sequencing analysis pipeline for MCM ChIP the authors use a 650 bp upper size limit; why have such a large threshold compared to the size of a nucleosome? Are the analyses and findings sensitive to this size threshold?

      Although the MNase digestion was optimized to produce mostly mononucleosomal-sized digestion, some di- and very little tri- nucleosomal fragments still remain. In order to capture as many of the MCM-protected immunoprecipitated fragments as possible, the upper limit was set at 650 bp (up to 4 nucleosomes-worth of DNA). However, there is a very minimal contribution from fragments larger than mononucleosomes, qualitatively as well as quantitatively in 1kb windows around origins. Figure 3a provides a qualitative depiction of the contribution of dinucleosomes (input, ~300bp).

      -the repliscope package was published here:

      Batrakou, D., Müller, C., Wilson, R., Nieduszynski, C. (2020). DNA copy-number measurement of genome replication dynamics by high-throughput sequencing: the sort-seq, sync-seq and MFA-seq family. Nature Protocols 15(3), 1255 - 1284. https://dx.doi.org/10.1038/s41596-019-0287-7

      The reference has been corrected.

      Reviewer #1 (Significance):

      This work builds upon a body of work from the Rhind group (and others) to determine the contribution of MCM load to replication origin activation dynamics. To my mind this is the most convincing dataset and analysis to date and goes a long way to supporting the model that the efficiency of MCM loading is a major factor in determining the mean replication time of an origin. As the authors state, they are still not able to distinguish between two different models of MCM load (single vs multiple). It would be interesting for the authors to discuss how these two models could be distinguished in the future (perhaps with single cell/molecule experiments).

      This study will be of interest to those in the fields of DNA replication and genome stability.

      My field of expertise is DNA replication and replication origin function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      **Summary:**

      This is a nice study that characterizes the consequences of limiting or increasing Mcm expression on the replication program. Prior ChIP experiments in yeast have observed that not all origins exhibit the same level of Mcm enrichment and that increased Mcm enrichment was correlated with origin activity. These observations led to two different models -- a) that multiple Mcm2-7 double hexamer complexes are loaded at some origins and b) a probabilistic model where the differential enrichment of Mcm2-7 reflected the fraction of cells in a population that had loaded the Mcm2-7 complex at a specific origin. While the titration experiments presented here don't provide any conclusive support for either model, they do provide some novel and relevant insights for the replication field, in part, due to the increased resolution and quantification afforded by the MNase ChIP-seq approach (and S. pombe spike in). The authors very nicely demonstrate that origins are differentially sensitive to Mcm2-7 depletion and that loss of Mcm2-7 loading results in an altered replication timing profile. The origins most impacted by loss of Mcm2-7 are 'weak' origins as described by the Fox group. Intriguingly, the authors find that the 5X overexpression of Mcm2-7 does not perturb the relative Mcm2-7 loading at individual origins, but rather instead globally represses Mcm2-7 association at all origins. They also find that overexpression of both Cdt1 and Mcm2-7 is detrimental to the cell (although no obvious replication phenotype was observed). Finally, the authors present a reasonable interpretation of their data in the context of models for replication timing which was very well articulated.

      **Major Comments:**

      From the methods it appears that different analyses were performed with different replicates?

      "Replicate #1 was used for all analyses except for V plots, for which the higher resolution Replicate #2 was used."

      Ideally all of the conclusions should be supported by all the replicates independently, or if the replicates are concordant -- they should be merged (at a similar sequencing depth) prior to doing the analyses. Even the v-plots with merged replicates will be informative due to the greater sequencing depth.

      Though we agree that greater sequencing depth would be informative for aggregation analysis, we think that one of the main strengths of our study is the analysis of MCM quantitation and replication timing in the same population of cells. Although the experiments were performed in exactly the same way, there are always slight biological or temporal differences between the replicates, due to the complicated nature of the experimental design. This variation increases the noise between the MCM ChIP and the replication timing analyses. Therefore, we analyzed the replicates separately. However, we did do all of the analyses on both replicates and got similar results. We have now explicitly stated as much.

      The authors should provide a separate analysis for the larger nucleosomal sized fragments and smaller putative MCM double hexamer fragments with regards to the Mcm loading and relationship to ACS and orientation. They may represent an interesting intermediate with mechanistic consequences for the interpretation.

      We will include the suggested analysis in a subsequent revision.

      The authors should present the v-plots and an analysis of which side the Mcm's load for the overexpression studies. I was surprised that there was no further in-depth analysis for these two extremes. Perhaps similar conclusions will be reached, but it should at least be mentioned/presented as a supplementary figure.

      We will include the suggested analysis in a subsequent revision.

      **Minor Comments:**

      This is largely semantic, but the majority of MNase ChIP-seq signal recovered is associated with the nucleosomes and not in the NDR and as the signal in the NDR is differentially sensitive to digestion, I would suggest rephrasing the following sentence:

      "In contrast to previous genome-wide reports (Belsky et al., 2015), but in agreement with recent in-vitro cryo-EM structures (Miller et al., 2019), we also observe MCM signal in the nucleosome-depleted region (NDR) of origins. "

      to :

      "In agreement with a previous genome-wide report (Belsky et al., 2015), we found that the bulk of the MCM signal was associated with nucleosomal sized fragments; however the increased resolution afforded by our approach allowed us to also detect protected fragments in the NDR as predicted by recent in vitro cryo-EM structures..."

      We have modified the sentence as suggested.

      As a sanity check, please double check V-plots and presence of small fragments with the digestion conditions. In the Henikoff manuscript the bulk of sub-nucleosomal fragments were lost with the longer digestion time. Specifically, the TF footprints were more pronounced with minimal digestion. While it might be argued that the longer digestion more tightly resolved the binding site, in many cases they were completely lost with the 20 minute digestion. This is just a simple check -- I don't doubt the results as reported given the experimental conditions are very different. For example, the Henikoff manuscript did not use cross linking or an antibody enrichment step.

      We double checked and confirmed that more small fragments are found in the more digested library. The reason that we see more small fragments when we digest more, in contrast to the observation in the Henikoff paper, is presumably that MCM has a larger footprint than a transcription factor and protects that footprint more effectively.

      Last paragraph of the "MCM associates with nucleosomes section" which reports that the Mcm2-7 complex is loaded up or downstream from the ACS independent of orientation should cite Belsky 2015 (Figure 5 and discussion) for the initial observation.

      Done.

      The authors argue that the global reduction in MCM loading associated with overexpression may be a technical artifact given that all origins exhibit a proportional reduction in Mcm2-7 loading. However, this is exactly what the S. pombe spike in control is intended for. The relative difference between individual origins resulting from Mcm2-7 depletion would still be evident without the spike in. The authors do discuss different possibilities, but I would not be so keen to discard this as a technical artifact.

      We, too, are reluctant to dismiss this result as a technical artifact. However, we are at a loss to offer any other explanation. We raise a handful of biological possibilities in the Discussion, but dismiss each one as failing to account for our results. We would be happy to entertain other suggestions.

      Reviewer #2 (Significance):

      This work has several advances that will be appreciated by the replication field -- including a high resolution view of Mcm2-7 loading in the context of chromatin; the impact of titrating (low and high) MCM expression on MCM loading and replication timing program; and a well reasoned discussion of how different models of MCM loading would impact origin activation and replication timing program. The work builds on prior studies in the field (e.g. Belsky 2015). While some of the conclusions regarding the localization of the Mcm2-7 complex relative to the ACS and surrounding nucleosomes are confirmatory, the increased resolution provides new insight (like the enrichment of small fragments in the NDR) that could be further strengthened by additional analysis (see above).

      My expertise is DNA replication and chromatin.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, the authors use Auxin-mediated degradation of Mcm4 to reduce the concentration of the MCM helicase complex in yeast, and determine the effects of this reduction on both MCM-origin association (interpreted as MCM loading) by MNase-MCM-ChIPSeq and on replication origin function by Sync-Seq replication timing experiments (deep sequencing of a yeast population as it progresses through a synchronized S-phase). Complementary experiments testing the effect of induced MCM complex over-expression on MCM-origin association are also performed.

      The authors find that reducing Mcm4 levels (and thus loading-competent MCM complexes) causes yeast cells to be more sensitive to DNA replication stress. In addition, not all origins are equally susceptible to reductions in MCM levels; the origins that do lose MCM binding at reduced MCM levels show a reduction in activity and an associated delay in their replication time under those conditions. Finally, over-expression of the MCM complex has no effect on MCM-origin association or origin function, suggesting that MCM levels are not limiting for origin licensing in yeast under normal lab conditions. The strengths of the study are the well-executed experiments and very nice data that are presented. However, there are several weaknesses. The authors make conclusions that are not supported by their data; and several of the outcomes are not at all unexpected based on extensive published studies in yeast and mammalian cells, raising issues about whether this study advances and/or clarifies the current gaps in the field. While some of the relevant past studies were referenced, the authors did not place their own study in the context of published work and current models in the field, which reduced the scholarly value of their study. Because the work was not placed in context of the field, some of the rationale and conclusions were misleading.

      **Some specific major comments:**

      1,The title is misleading. The authors have clearly shown that when MCM levels are made limiting in an engineered system, some origins are substantially less active, which means that these origin loci are replicated "passively" (i.e. by a Replication Fork (RF) emanating from a distal origin) rather than actively (i.e. by "firing" and initiating replication). Their own replication data show that. But this competition is only revealed when MCM levels are artificially/experimentally lowered. What is the evidence that competition for MCM complexes among individual origins establishes replication timing patterns in yeast? If anything, the over-expression experiment suggests the opposite--that MCM levels are not limiting and therefore do not play a substantial role in establishing the replication timing patterns that are observed in yeast. Instead those patterns appear to result primarily from the fact that MCM complex activation factors are present in limiting concentrations relative to origins.

      We agree with the reviewer's analysis and have revised the title to "The Capacity of Origins to Load MCM Establishes Replication Timing Patterns".

      2,The abstract states that "the number of MCMs loaded onto origins has been proposed to be a key determinant of when those origins initiate DNA replication during S-phase". While it is true that this lab has proposed this model in budding yeast, the current study performs no experiments that directly address this model--i.e. that (i) individual origins possess a different number of MCM complexes and/or (ii) that these differences underlie timing differences. They acknowledge this point in their Discussion--a ChIPSeq experiment is an ensemble experiment--there is no way to know that differences in MCM signals correspond to a different number of MCM complexes per origin versus a difference in the fraction of cells that contain an MCM complex at all at a given origin. But this statement in the abstract, combined with their conclusion in the same section of the paper: "Our results support a model in which the loading activity of origins, controlled by their ability to recruit ORC and compete for MCM, determines the number of helicases loaded, which in turn affects replication timing" implies that they have tested a model that they have not tested. Given how quickly readers "skim" the literature these days, a misleading abstract can do a lot of damage to a field. The results presented in this study neither support nor refute the model for the number of helicases loaded per origin, and the fact that reducing origin licensing efficiency by making the major substrate limiting reduces the number of licensed origins in a cell population is fully expected based on the current state of the field.

      Four questions are addressed in this comment. The first is whether there is variable MCM stoichiometry at origins. The second is whether that variation ranges from 0 to 1 or 0 to many. The third is whether the variation in stoichiometry affects replication timing. The fourth is how this variation in stoichiometry comes about.

      Our work is based on the conclusion, supported by a substantial body of literature, that MCM loading stoichiometry varies among origins. Our data in this paper further supports this conclusion.

      As the reviewer notes, and as we had tried to make clear, the data in this paper do not address the range of the variation. Moreover, as we also tried to make clear, our hypotheses, results and conclusions are not affected by whether the range is 0 to 1 or 0 to many.

      This paper focuses on Questions 3 and 4. We have reworked the introduction to make these distinctions more clear.

      We have also corrected the abstract to refer to "the stoichiometry", instead of "the number", of MCMs.

      3,The rationale for the study as stated in the Introduction: "Although the molecular biochemistry of initiation at individual origins continues to be elucidated in great detail (Bleichert, 2019), the mechanism governing the time at which different regions of the genome replicate has remained largely elusive (Boos and Ferreira, 2019)" is also misleading. In fact, in budding yeast (and other organisms) there have been several advances in this area particularly with respect to DNA replication origin activation. The S-phase origin activation factors are limiting for origin function, and factors such as Ctf19 at centromeres and Fkh1/2 at non-centromeric early-acting origins help to directly recruit the limiting S-phase factor, Dbf4, to origins. It is misleading to ignore this substantial progress and not make an effort to place this current study, which is important and one of the first to look directly at MCM loading control in yeast, into a relevant context with respect to what is known. What's interesting is that this S-phase model assumes/requires that most origins are, in fact, licensed and thus that differences in licensing efficiency are not a major driver of replication timing patterns in yeast. But we do not know why there are only subtle differences in MCM loading---this study may help explain that.

      We have broadened the scope of our Introduction and Discussion to address these points. However, it is not the case that "there are only subtle differences in MCM loading". MCM ChIP-seq (, and this paper) and MCM ChEC-seq both show well over ten-fold variation in MCM stoichiometry at origins. We have now explicitly made this point in the Introduction.

      4,The authors link the differential ability of MCM loading deficiencies when MCM is made limiting to differences in ORC binding categories. The "weak" origins, that presumably bind ORC weakly, were most affected by reductions in MCM. Are these origins less efficient than the other categories, DNA and chromatin-dependent (using the origin efficiency metric data from the Whitehouse lab) where MCM binding is not reduced as much? In normal cells are these early or late origins? Is the idea that the role of excess MCM is to achieve a sufficient number of "back up" origins per cell to deal with potential stress, as proposed by the Blow and Schwob labs in tissue culture cells many years ago? It seems likely that the data reported here are in fact confirmations of those early studies in mammalian cells---which is useful to know even if not unexpected.

      We will include the suggested analyses in a subsequent revision.

      Excess MCM do, as has been long appreciated and as we discuss, contribute to replication-stress tolerance. However, that is not a major point of our paper.

      5,Aren't the results that losing MCM signal corresponds to loss of origin activity peaks entirely expected? The same result would be obtained if you made a point mutation in that origin's ACS. Of course preventing an origin from being licensed will delay that region's replication time in S-phase because it now must be replicated passively. Licensing affects replication timing patterns because the MCM complex is the substrate for limiting S-phase factors, but that is far different from concluding that the number of MCMs at an origin is what controls the time in S-phase when an origin is activated.

      Yes, "the results that losing MCM signal corresponds to loss of origin activity peaks [are] entirely expected". However, this is not the important result. The key result is that the distribution of MCM at origins is not uniformly affected, which leads to our conclusions that, in wild-type cells, origin capacity dominates MCM stoichiometry and that, when MCM become limiting, origin activity (probably determined by ORC affinity) becomes critical—neither of which were expected results. In any case, the expected correlation between MCM loading and origin activity was observed as a consequence of measuring MCM stoichiometry and replication timing and is an obvious analysis to include, so we did so.

      6,The authors stated that they measured MCM abundance for the 43% of origins that are not known to be controlled by the multiple mechanisms that have been shown to control origin replication time. Is this because they think that MCM loading contributes to the timing control of only these origins? Was MCM loading not affected at any of these other origins when MCM levels were reduced? Are those 43% of origins in the "weak" binding category in terms of ORC? The rationale for eliminating so many origins from these analyses was not clear.

      We propose that the probability of origin activation is the product of the stoichiometry of MCM at the origin and the rate of MCM activation, which may be affected by trans-acting factors. For the 43% of origins for which there is no known trans-acting regulation, the correlation with stoichiometry is stronger. However, the correlation holds when looking at all origins, too. The suggestion to look at only the 57% of origins with known trans-acting regulation is a good one. We will include this analysis and the other suggested analyses in a subsequent revision.
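
      As a compact way of stating the proposal above, it can be sketched in symbols. The notation is introduced here purely for illustration and is not taken from the manuscript: $P_i$ is the probability per unit time that origin $i$ activates, $n_i$ is the MCM stoichiometry at that origin, and $k_i$ is the activation rate per loaded MCM, which trans-acting factors may raise or lower.

```latex
P_i \;\propto\; n_i \, k_i
```

      Under this sketch, origins with no known trans-acting regulation would share a similar $k_i$, so differences in $P_i$ would track $n_i$ more closely, which is consistent with the stronger correlation described for that subset.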

      7,Doesn't the data in Figure 4c at 0 mM auxin support the conclusion that differences in MCM ChIP signals have negligible effects on origin activation time, in contrast to the publication by Das, 2015 from this lab? Or is the point that these origins are sensitive to reductions in MCM levels and the more sensitive they are the more delayed their replication time (but again, doesn't that have to be true? If they are losing MCM signals they cannot function as origins, so they are replicated passively and, by definition, will show delayed replication timing. An origin is defined as such by a loaded MCM complex.)

      No. The reason the correlation in 4c is not as good as in our previous work is that in Das 2015 we compared origin-activation efficiency (calculated from our stochastic model in Yang 2010), instead of T_rep, which we used here. T_rep is a convolution of origin-activation time and passive-replication time, reducing the correlation. The important observation is that the correlation gets better as MCM levels are reduced.

      The correlation between MCM stoichiometry and activation efficiency may seem trivial, but just because a model is simple does not mean it is not correct. If stoichiometry were the only factor regulating origin activation, we would expect a stronger correlation. So, we conclude that there are other factors at play, quite possibly the trans-acting factors that the reviewer mentions in their second point. However, if stoichiometry played no role, we would expect no correlation. So, we propose that MCM stoichiometry is "an important determinant of replication timing".

      8,I do not understand the conclusions from Figure 4d. There is an extremely small positive correlation between how much of an MCM signal is lost and delay in replication time of an origin, but this correlation is not surprising as an unlicensed origin cannot be an origin and will be replicated passively. What seems most surprising about these data is that the effect is so weak, not that it exists. There is quite a lot of scatter in this plot at 500 uM auxin, with some origins losing a given amount of signal (x) and being only slightly delayed in replication time, and others losing the same amount of signal (x) and being substantially delayed. What underlies this outcome?--Are the ones that are not substantially delayed closer to origins that have not been affected at all by MCM reductions? Why is the correlation so weak? The other regulators of origin activation time have stronger and more precise effects--for example the centromere-control can be precisely eliminated so that only the replication time of the centromere-proximal origins is delayed.

      We believe that much of the noise in Figure 4d is due, as the reviewer suggests, to passive replication of origins which lose most of their MCM signal and become inactive but happen to reside next to origins which don’t lose any MCM signal and fire early. An excellent example is ARS510 (see Figure 4a). ARS510 loses most of its MCM signal and clearly loses its initiation peak in the T_rep plot. However, because it is next to ARS511, which does not lose much MCM signal and which remains an efficient origin, ARS510 is still replicated early. We will include this example in a subsequent revision.
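
      The passive-replication effect described above can be illustrated with a deliberately simplified, deterministic sketch. It assumes each origin fires at a single fixed time and forks move at a constant speed. The function name `t_rep`, the positions, the firing times and the fork speed are all hypothetical values chosen for illustration, not measurements from this study.

```python
import math


def t_rep(locus_kb, origins, fork_speed_kb_per_min):
    """Earliest time a locus is replicated, either by an origin at that
    locus firing or by a fork arriving from another origin."""
    return min(
        fire_min + abs(locus_kb - pos_kb) / fork_speed_kb_per_min
        for pos_kb, fire_min in origins
    )


# Hypothetical numbers, for illustration only (not measured values).
FORK_SPEED = 2.0          # kb per minute
neighbor = (10.0, 10.0)   # efficient origin 10 kb away, firing at 10 min

# Origin at 0 kb is active and fires at 12 min.
active = t_rep(0.0, [(0.0, 12.0), neighbor], FORK_SPEED)

# Same origin never fires (math.inf), so the locus is replicated
# passively by the fork coming from the neighboring origin.
passive = t_rep(0.0, [(0.0, math.inf), neighbor], FORK_SPEED)

print(active, passive)
```

      In this toy case the locus loses its own initiation event entirely, yet its replication time shifts by only a few minutes because a nearby efficient origin covers it. That is the kind of scatter a T_rep-based comparison would show.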

      9,Multiple studies in yeast and mammalian cells indicate that MCM subunits are in excess relative to other licensing and S-phase initiation factors, so it is not unexpected that over-expressing MCM did not lead to enhanced levels of licensing. It seems much more plausible that Cdc6 or Cdt1 or both factors are present in limiting amounts for MCM loading, so I did not understand the point of over-producing MCM subunits. If the "weak" origins are the ones that are most dramatically affected by reducing MCM to "limiting" levels, isn't the question whether you can increase licensing at these origins when you over-produce a factor that is likely limiting for licensing, such as Cdt1 or Cdc6 (or both) while leaving MCM at its normal levels. The fact that MCM levels are not limiting for licensing is not surprising and, if anything, argues against these levels having a regulatory role in origin activation timing---which seems to be the opposite of what the authors want to conclude.

      Orc1-6, Cdc6 and Cdt1 are all substoichiometric to MCM. However, they all act catalytically to load MCM. So, although they may be kinetically limiting, they do not prevent most or all MCMs being loaded in wild-type cells. The fact that overexpressing MCMs (with or without Cdt1) does not allow for more MCM loading suggests that under normal conditions origins are saturated with MCMs and have little or no capacity to load more MCM, even when given plenty of time to do so. From this result, we conclude that origin capacity is a major determinant of MCM loading in wild-type cells. From our MCM-reduction experiments, we also conclude that, when MCM is limiting, origin competition affects which origins load MCMs faster. However, we agree with the reviewer's first point, that our title gave the incorrect impression that we concluded that origin competition is the primary determinant of MCM loading in wild-type cells. Thus, as suggested, we have changed the title. We have also reworked the Introduction and Discussion to more clearly explain that competition is only a determining factor when MCMs are limited.

      In summary, I think the technical aspects of the experiments were quite strong, but I do not think that the experiments answered the question that was posed by the authors.

      **Minor points:**

      Many places where "This data" should be changed to "These data". Data are plural.

      See comments on this point in the response to Reviewer #2.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study, the authors use Auxin-mediated degradation of Mcm4 to reduce the concentration of the MCM helicase complex in yeast, and determine the effects of this reduction on both MCM-origin association (interpreted as MCM loading) by MNase-MCM-ChIPSeq and on replication origin function by Sync-Seq replication timing experiments (deep sequencing of a yeast population as it progresses through a synchronized S-phase). Complementary experiments testing the effect of induced MCM complex over-expression on MCM-origin association are also performed.

      The authors find that reducing Mcm4 levels (and thus loading-competent MCM complexes) causes yeast cells to be more sensitive to DNA replication stress. In addition, not all origins are equally susceptible to reductions in MCM levels; the origins that do lose MCM binding at reduced MCM levels show a reduction in activity and an associated delay in their replication time under those conditions. Finally, over-expression of the MCM complex has no effect on MCM-origin association or origin function, suggesting that MCM levels are not limiting for origin licensing in yeast under normal lab conditions. The strengths of the study are the well-executed experiments and very nice data that are presented. However, there are several weaknesses. The authors make conclusions that are not supported by their data; and several of the outcomes are not at all unexpected based on extensive published studies in yeast and mammalian cells, raising issues about whether this study advances and/or clarifies the current gaps in the field. While some of the relevant past studies were referenced, the authors did not place their own study in the context of published work and current models in the field, which reduced the scholarly value of their study. Because the work was not placed in context of the field, some of the rationale and conclusions were misleading.

      Some specific major comments:

      1,The title is misleading. The authors have clearly shown that when MCM levels are made limiting in an engineered system, some origins are substantially less active, which means that these origin loci are replicated "passively" (i.e. by a Replication Fork (RF) emanating from a distal origin) rather than actively (i.e. by "firing" and initiating replication). Their own replication data show that. But this competition is only revealed when MCM levels are artificially/experimentally lowered. What is the evidence that competition for MCM complexes among individual origins establishes replication timing patterns in yeast? If anything, the over-expression experiment suggests the opposite--that MCM levels are not limiting and therefore do not play a substantial role in establishing the replication timing patterns that are observed in yeast. Instead those patterns appear to result primarily from the fact that MCM complex activation factors are present in limiting concentrations relative to origins.

      2,The abstract states that "the number of MCMs loaded onto origins has been proposed to be a key determinant of when those origins initiate DNA replication during S-phase". While it is true that this lab has proposed this model in budding yeast, the current study performs no experiments that directly address this model--i.e. that (i) individual origins possess a different number of MCM complexes and/or (ii) that these differences underlie timing differences. They acknowledge this point in their Discussion--a ChIPSeq experiment is an ensemble experiment--there is no way to know that differences in MCM signals correspond to a different number of MCM complexes per origin versus a difference in the fraction of cells that contain an MCM complex at all at a given origin. But this statement in the abstract, combined with their conclusion in the same section of the paper: "Our results support a model in which the loading activity of origins, controlled by their ability to recruit ORC and compete for MCM, determines the number of helicases loaded, which in turn affects replication timing" implies that they have tested a model that they have not tested. Given how quickly readers "skim" the literature these days, a misleading abstract can do a lot of damage to a field. The results presented in this study neither support nor refute the model for the number of helicases loaded per origin, and the fact that reducing origin licensing efficiency by making the major substrate limiting reduces the number of licensed origins in a cell population is fully expected based on the current state of the field.

      3,The rationale for the study as stated in the Introduction: "Although the molecular biochemistry of initiation at individual origins continues to be elucidated in great detail (Bleichert, 2019), the mechanism governing the time at which different regions of the genome replicate has remained largely elusive (Boos and Ferreira, 2019)" is also misleading. In fact, in budding yeast (and other organisms) there have been several advances in this area particularly with respect to DNA replication origin activation. The S-phase origin activation factors are limiting for origin function, and factors such as Ctf19 at centromeres and Fkh1/2 at non-centromeric early-acting origins help to directly recruit the limiting S-phase factor, Dbf4, to origins. It is misleading to ignore this substantial progress and not make an effort to place this current study, which is important and one of the first to look directly at MCM loading control in yeast, into a relevant context with respect to what is known. What's interesting is that this S-phase model assumes/requires that most origins are, in fact, licensed and thus that differences in licensing efficiency are not a major driver of replication timing patterns in yeast. But we do not know why there are only subtle differences in MCM loading---this study may help explain that.

      4,The authors link the differential ability of MCM loading deficiencies when MCM is made limiting to differences in ORC binding categories. The "weak" origins, that presumably bind ORC weakly, were most affected by reductions in MCM. Are these origins less efficient than the other categories, DNA and chromatin-dependent (using the origin efficiency metric data from the Whitehouse lab) where MCM binding is not reduced as much? In normal cells are these early or late origins? Is the idea that the role of excess MCM is to achieve a sufficient number of "back up" origins per cell to deal with potential stress, as proposed by the Blow and Schwob labs in tissue culture cells many years ago? It seems likely that the data reported here are in fact confirmations of those early studies in mammalian cells---which is useful to know even if not unexpected.

      5,Aren't the results that losing MCM signal corresponds to loss of origin activity peaks entirely expected? The same result would be obtained if you made a point mutation in that origin's ACS. Of course preventing an origin from being licensed will delay that region's replication time in S-phase because it now must be replicated passively. Licensing affects replication timing patterns because the MCM complex is the substrate for limiting S-phase factors, but that is far different from concluding that the number of MCMs at an origin is what controls the time in S-phase when an origin is activated.

      6,The authors stated that they measured MCM abundance for the 43% of origins that are not known to be controlled by the multiple mechanisms that have been shown to control origin replication time. Is this because they think that MCM loading contributes to the timing control of only these origins? Was MCM loading not affected at any of these other origins when MCM levels were reduced? Are those 43% of origins in the "weak" binding category in terms of ORC? The rationale for eliminating so many origins from these analyses was not clear.

      7,Doesn't the data in Figure 4c at 0 mM auxin support the conclusion that differences in MCM ChIP signals have negligible effects on origin activation time, in contrast to the publication by Das, 2015 from this lab? Or is the point that these origins are sensitive to reductions in MCM levels and the more sensitive they are the more delayed their replication time (but again, doesn't that have to be true? If they are losing MCM signals they cannot function as origins, so they are replicated passively and, by definition, will show delayed replication timing. An origin is defined as such by a loaded MCM complex.)

      8,I do not understand the conclusions from Figure 4d. There is an extremely small positive correlation between how much of an MCM signal is lost and delay in replication time of an origin, but this correlation is not surprising as an unlicensed origin cannot be an origin and will be replicated passively. What seems most surprising about these data is that the effect is so weak, not that it exists. There is quite a lot of scatter in this plot at 500 uM auxin, with some origins losing a given amount of signal (x) and being only slightly delayed in replication time, and others losing the same amount of signal (x) and being substantially delayed. What underlies this outcome?--Are the ones that are not substantially delayed closer to origins that have not been affected at all by MCM reductions? Why is the correlation so weak? The other regulators of origin activation time have stronger and more precise effects--for example the centromere-control can be precisely eliminated so that only the replication time of the centromere-proximal origins is delayed.

      9,Multiple studies in yeast and mammalian cells indicate that MCM subunits are in excess relative to other licensing and S-phase initiation factors, so it is not unexpected that over-expressing MCM did not lead to enhanced levels of licensing. It seems much more plausible that Cdc6 or Cdt1 or both factors are present in limiting amounts for MCM loading, so I did not understand the point of over-producing MCM subunits. If the "weak" origins are the ones that are most dramatically affected by reducing MCM to "limiting" levels, isn't the question whether you can increase licensing at these origins when you over-produce a factor that is likely limiting for licensing, such as Cdt1 or Cdc6 (or both) while leaving MCM at its normal levels. The fact that MCM levels are not limiting for licensing is not surprising and, if anything, argues against these levels having a regulatory role in origin activation timing---which seems to be the opposite of what the authors want to conclude.

      In summary, I think the technical aspects of the experiments were quite strong, but I do not think that the experiments answered the question that was posed by the authors.

      Minor points:

      Many places where "This data" should be changed to "These data". Data are plural.

      Significance

      Significance: see above

      Referees Cross Commenting

      Reviewer 3. My overall conclusions about this study are that the data are extremely nice and useful to the field, but that their potential to advance the field or clarify it for 'outsiders' is limited by 1, a biased, model-centric presentation that fails to put the work in context of a lot of strong previous work. Some of the conclusions cannot even be tested by the experimental design. 2, some of the data analyses, for example the parsing of origins for analyses of MCM effects versus effects on replication time, seem arbitrary and were not clearly justified. 3, The correlation between reductions in MCM loading and Trep delay seemed weak, even after parsing for origins expected to experience the largest effects, which is actually kind of interesting, but was ignored in favor of the pre-determined interpretation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This is a nice study that characterizes the consequences of limiting or increasing Mcm expression on the replication program. Prior ChIP experiments in yeast have observed that not all origins exhibit the same level of Mcm enrichment and that increased mcm enrichment was correlated with origin activity. These observations led to two different models -- a) that multiple Mcm2-7 double hexamer complexes are loaded at some origins and b) a probabilistic model where the differential enrichment of Mcm2-7 reflected the fraction of cells in a population that had loaded the Mcm2-7 complex at a specific origin. While the titration experiments presented here don't provide any conclusive support for either model, they do provide some novel and relevant insights for the replication field, in part, due to the increased resolution and quantification afforded by the MNase ChIP-seq approach (and S. pombe spike in). The authors very nicely demonstrate that origins are differentially sensitive to Mcm2-7 depletion and that loss of Mcm2-7 loading results in an altered replication timing profile. The origins most impacted by loss of Mcm2-7 are 'weak' origins as described by the Fox group. Intriguingly, the authors find that the 5X overexpression of Mcm2-7 does not perturb the relative Mcm2-7 loading at individual origins, but rather instead globally represses Mcm2-7 association at all origins. They also find that overexpression of both Cdt1 and Mcm2-7 is detrimental to the cell (although no obvious replication phenotype was observed). Finally, the authors present a reasonable interpretation of their data in the context of models for replication timing which was very well articulated.

      Major Comments:

      From the methods it appears that different analyses were performed with different replicates?

      "Replicate #1 was used for all analyses except for V plots, for which the higher resolution Replicate #2 was used."

      Ideally all of the conclusions should be supported by all the replicates independently, or if the replicates are concordant -- they should be merged (at a similar sequencing depth) prior to doing the analyses. Even the V-plots with merged replicates will be informative due to the greater sequencing depth.

      The authors should provide a separate analysis for the larger nucleosomal sized fragments and smaller putative MCM double hexamer fragments with regards to the Mcm loading and relationship to ACS and orientation. They may represent an interesting intermediate with mechanistic consequences for the interpretation.

      The authors should present the v-plots and an analysis of which side the Mcm's load for the overexpression studies. I was surprised that there was no further in-depth analysis for these two extremes. Perhaps similar conclusions will be reached, but it should at least be mentioned/presented as a supplementary figure.

      Minor Comments:

      This is largely semantic, but the majority of MNase ChIP-seq signal recovered is associated with the nucleosomes and not in the NDR and as the signal in the NDR is differentially sensitive to digestion, I would suggest rephrasing the following sentence:

      "In contrast to previous genome-wide reports (Belsky et al., 2015), but in agreement with recent in-vitro cryo-EM structures (Miller et al., 2019), we also observe MCM signal in the nucleosome-depleted region (NDR) of origins. "

      to:

      "In agreement with a previous genome-wide report (Belsky et al., 2015), we found that the bulk of the MCM signal was associated with nucleosomal sized fragments; however the increased resolution afforded by our approach allowed us to also detect protected fragments in the NDR as predicted by recent in vitro cryo-EM structures..."

      As a sanity check, please double check V-plots and presence of small fragments with the digestion conditions. In the Henikoff manuscript the bulk of sub-nucleosomal fragments were lost with the longer digestion time. Specifically, the TF footprints were more pronounced with minimal digestion. While it might be argued that the longer digestion more tightly resolved the binding site, in many cases they were completely lost with the 20 minute digestion. This is just a simple check -- I don't doubt the results as reported given the experimental conditions are very different. For example, the Henikoff manuscript did not use cross-linking or an antibody enrichment step.

      Last paragraph of the "MCM associates with nucleosomes section" which reports that the Mcm2-7 complex is loaded up or downstream from the ACS independent of orientation should cite Belsky 2015 (Figure 5 and discussion) for the initial observation.

      The authors argue that the global reduction in MCM loading associated with overexpression may be a technical artifact given that all origins exhibit a proportional reduction in Mcm2-7 loading. However, this is exactly what the S. pombe spike-in control is intended for. The relative difference between individual origins resulting from Mcm2-7 depletion would still be evident without the spike-in. The authors do discuss different possibilities, but I would not be so keen to discard this as a technical artifact.
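
The point about the spike-in can be illustrated with a small numerical sketch: a proportional reduction at all origins is invisible when counts are scaled to the sample's own library size, but shows up when counts are scaled to a constant exogenous spike-in. All counts, origin labels, and function names below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def normalize_to_library(origin_counts):
    """Scale each origin's count by the total count of the same sample."""
    total = sum(origin_counts.values())
    return {name: count / total for name, count in origin_counts.items()}


def normalize_to_spike_in(origin_counts, spike_in_reads):
    """Scale each origin's count by the number of spike-in reads recovered."""
    return {name: count / spike_in_reads for name, count in origin_counts.items()}


# Hypothetical read counts at two origins; the overexpression sample has
# half the signal at every origin, with the same amount of spike-in recovered.
control = {"origin_A": 1000, "origin_B": 500}
overexpression = {"origin_A": 500, "origin_B": 250}
spike_in_control = 10000
spike_in_overexpression = 10000

lib_control = normalize_to_library(control)
lib_overexpression = normalize_to_library(overexpression)
spk_control = normalize_to_spike_in(control, spike_in_control)
spk_overexpression = normalize_to_spike_in(overexpression, spike_in_overexpression)

# Library-size scaling hides the global change; spike-in scaling reveals it.
print(lib_control == lib_overexpression)
print(spk_overexpression["origin_A"] / spk_control["origin_A"])
```

Under these assumed numbers the library-scaled profiles are identical, while the spike-in-scaled signal is halved at each origin, which is why a uniform drop measured against a spike-in need not be a normalization artifact.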

      Significance

      This work has several advances that will be appreciated by the replication field -- including a high resolution view of Mcm2-7 loading in the context of chromatin; the impact of titrating (low and high) MCM expression on MCM loading and replication timing program; and a well reasoned discussion of how different models of MCM loading would impact origin activation and replication timing program. The work builds on prior studies in the field (e.g. Belsky 2015). While some of the conclusions regarding the localization of the Mcm2-7 complex relative to the ACS and surrounding nucleosomes are confirmatory, the increased resolution provides new insight (like the enrichment of small fragments in the NDR) that could be further strengthened by additional analysis (see above).

      My expertise is DNA replication and chromatin.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript follows on from previous work from the Rhind lab to investigate whether the load of MCMs at origins is a factor in when the origin activates (as a population average) during S phase. The authors use budding yeast and an auxin degron system to modulate the levels of an MCM subunit. This allows them to titrate down the concentration of the MCM hexamer and observe the effect. Crucially, they assay both the reduction in MCM load at origins and the subsequent replication dynamics in the same experiment. This is the power of their approach and allows them to rigorously test their hypothesis.

      Major comments

      1. I found the introductory paragraph discussing the Rhind lab hypothesis about the possibility of multiple MCM being loaded at origins somewhat misleading. The first paragraph of the discussion was much clearer. However, I feel that the introductory paragraph should deal with the difference between the two proposals: 0-1 MCM-DH per origin (de Moura et al), vs 0-50+ MCM-DH (Yang et al). It is also important to note that Foss et al find that "In budding yeast, [MCM] complexes were present in sharp peaks comprised largely of single double-hexamers" - i.e. consistent with 0-1 MCM-DH per origin.

      To improve the balance of the introduction, I think the authors should briefly introduce the concepts behind the 0-1 MCM-DH per origin; this was defined as origin competence by Stillman and clearly described by McCune et al (2008; see figure 8) prior to the work from de Moura et al. Furthermore, in the discussion the authors should be more even-handed. To date there is no data to conclusively rule one way or the other in distinguishing between single vs multiple MCMs. The authors cite Lynch et al and state "overexpression of origin-activating factors in S phase causes most all origins to fire early in S phase, consistent with most origins having at least one MCM loaded". However, Lynch et al report equivalent (roughly equal) origin efficiencies, but the assay doesn't distinguish between all going up to high efficiency or all going to a lower intermediary efficiency. Given that fork factors (polymerases, etc) are likely to become limiting at some point (or checkpoints could be activated due to limited dNTP supplies) it would seem plausible that uniform origin efficiency could be a consequence of less than maximal origin firing. As part of this discussion it would be useful for the authors to include what conclusions have been reached on MCM load from in vitro systems (with chromatin substrates).

      2. The authors are not the first to look at the consequence of reduced MCM concentrations on origin function. This was essentially the basis for the MCM screen undertaken by Bik Tye's lab that first identified the MCM genes. In addition to temperature sensitive mutants, the Tye group also examined heterozygotes (Lei et al., 1996) to show differential effect on the ability of two origins to support plasmid replication. The authors' findings are entirely consistent with these early studies, particularly since ARS416 (formerly ARS1) was found to be highly sensitive to reduced MCM levels and ARS1021 (formerly ARS121) was found to be insensitive to MCM levels. The authors find a significant reduction in MCM load at ARS416, but the MCM load at ARS1021 is unaltered by reduced MCM concentration. It would be worth the authors noting this consistency. The authors do cite the Lei study, but not in this context. The original MCM screen was published here: Maine, G., Sinha, P., Tye, B. (1984). Mutants of S. cerevisiae defective in the maintenance of minichromosomes Genetics 106(3), 365 - 385. Furthermore, at the end of the discussion the authors state that "it will be interesting to dissect the specific cis- and trans-acting factors that make origins sensitive or resistant to changes in MCM levels". The equivalent effect reported by the Tye lab has already been dissected by the Donaldson lab (Nieduszynski et al., 2006) and perhaps it would be worth briefly mentioning their findings.

      3. The authors should show the flow cytometry data for each of their cell cycle experiments, if only in supplementary figures. This is important to allow a reader (and reviewer) to judge the level of synchrony achieved when interpreting the results.

      4. I think the authors should show the ChIP signal at some example origins, including ones sensitive and insensitive to the reduction in MCM concentration. Currently all the high resolution ChIP data (i.e. over 1400 bp, e.g. Fig 3a) is presented as meta-analyses of many origins.

      5. When describing the results in Fig 4a the authors focus on changes (highlighted in black boxes) that fit their expectation. However, there are other sites that should at least be mentioned that don't seem to fit the authors' model, e.g. ARS517, ARS518. It would be worth discussing what fraction of the timing data can be explained by the reduced MCM load.

      Minor comments

      -These data, rather than this data (throughout).

      -the authors should clearly state in figure legends what window size has been used in analysing genomic data.

      -in figure 2a the authors show pairwise comparisons between conditions, it would be nice to see the 3rd pairwise comparisons perhaps as a supplementary figure

      -in figure 2c it would be clearer to use the same colour for the lines and the points

      -the authors should avoid the use of red/green colour combinations in their figures (see: https://thenode.biologists.com/data-visualization-with-flying-colors/research/)

      -in the text the authors state "ORC binding to the ACS and subsequent MCM loading is a directional process dependent on a ACS- site and a similar but inverted nearby sequence (Xu et al., 2006)". I think it would be more appropriate to cite the following study here: Coster, G., Diffley, J. (2017). Bidirectional eukaryotic DNA replication is established by quasi-symmetrical helicase loading Science (New York, NY) 357(6348), 314 - 318. https://dx.doi.org/10.1126/science.aan0063

      -the list of factors that influence replication timing should include Rif1, whereas it is less clear that Rpd3 acts within the unique genome (as opposed to indirectly via repetitive DNA, e.g. rDNA)

      -figure 4 - it might help to mark the centromere on panel a. Also, why do the ChIP peaks and annotated origins appear to line up so poorly?

      -figure 4d - would it not be better to use fraction of lost MCM signal on the x-axis as in previous figures?

      -"with galactose or raffinose, to induce or repress Mcm2-7 overexpression, respectively." This is incorrect, raffinose does not repress this promoter (that requires glucose).

      -the S. pombe spike in is a great addition to the over expression experiments. It's a shame that it wasn't included in the auxin experiments.

      -why does the data in fig 5d appear to be at much lower resolution than the previous ChIP data?

      -in the sequencing analysis pipeline for MCM ChIP the authors use a 650 bp upper size limit; why have such a large threshold compared to the size of a nucleosome? Are the analyses and findings sensitive to this size threshold?

      -the repliscope package was published here:

      Batrakou, D., Müller, C., Wilson, R., Nieduszynski, C. (2020). DNA copy-number measurement of genome replication dynamics by high-throughput sequencing: the sort-seq, sync-seq and MFA-seq family. Nature Protocols 15(3), 1255 - 1284. https://dx.doi.org/10.1038/s41596-019-0287-7

      Significance

      This work builds upon a body of work from the Rhind group (and others) to determine the contribution of MCM load to replication origin activation dynamics. To my mind this is the most convincing dataset and analysis to date and goes a long way to supporting the model that the efficiency of MCM loading is a major factor in determining the mean replication time of an origin. As the authors state, they are still not able to distinguish between two different models of MCM load (single vs multiple). It would be interesting for the authors to discuss how these two models could be distinguished in the future (perhaps with single cell/molecule experiments).

      This study will be of interest to those in the fields of DNA replication and genome stability.

      My field of expertise is DNA replication and replication origin function.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Abrams and Nance describes how the polarity proteins PAR-6 and PKC-3/aPKC promote lumen extension of the unicellular excretory canal in C. elegans. Using tissue-specific depletion methods they find that CDC-42 and the RhoGEF EXC-5/FGD are required for luminal localization of PAR-6, which recruits the exocyst complex required for lumen extension. Interestingly, they show that the ortholog of the mammalian exocyst receptor, PAR-3, is dispensable for luminal membrane extension. Overall, this is a well-written and interesting manuscript.

      1. Because depletion of PAR-3 in the canal causes milder defects than depletion of PAR-6 or CDC-42, the authors suggest that they cannot rule out the possibility that an alternative isoform of PAR-3 is expressed and buffering the defect. They should perform canal-specific RNAi-mediated depletion of the entire PAR-3 gene to determine if this is true.

      We agree with Reviewer 1 that it would be useful to provide additional evidence that an alternative isoform of PAR-3 lacking the ZF1 degron is not expressed. While tissue-specific RNAi could be used, we have not been successful obtaining complete knockdown in previous tissue-specific RNAi experiments. Moreover, this approach does not target any maternal PAR-3 protein that may be inherited by the excretory cell. As an alternative approach to address this point, we will analyze par-3::zf1::yfp/par-3(null) worms following excretory-cell-specific expression of zif-1, and compare to par-3::zf1::yfp/par-3::zf1::yfp siblings. We would expect the excretory cell phenotype to become more severe if additional, ‘phenotype-buffering’ forms of PAR-3 were present, or if there was incomplete degradation of PAR-3::ZF1::YFP in our previous experiments.

      2. The authors suggest that GTP-loaded (activated) CDC-42 recruits PAR-6 to the luminal membrane. It would be nice if they could use a biosensor, such as the GBD-WSP-1 reagent from Buechner's lab to confirm that EXC-5 depletion also reduces activated CDC-42, as would be expected. This should be achievable since there is strong CDC-42 signal, even in the cytoplasm.

      This is an excellent suggestion. We will utilize a CDC-42 biosensor – an integrated cdc42p::gfp::wsp-1(gbd) strain created in our lab and previously validated and characterized (Zilberman et al. 2017). We have confirmed that the biosensor is detected in the excretory canal and appears enriched at or near the lumenal membrane. We will cross the biosensor into the exc-5::zf1::mScarlet background. This will allow us to assess lumenal enrichment, and using heat shock inducible ZIF-1, determine if there is a reduction in biosensor lumenal enrichment when EXC-5::ZF1::mScarlet is acutely degraded.

      If the biosensor is difficult to measure at the canal lumen, an alternative approach would be to use an available exc-5 null allele to examine genetically if cdc-42 and exc-5 are acting in the same pathway. We could cross CDC-42exc(-) larvae into exc-5(rh232) and quantify excretory canal phenotypes. If CDC-42 and EXC-5 are indeed functioning in the same pathway we would expect no enhancement of the CDC-42exc(-) phenotype.

      3. Related to point 2, (i) does mutation of the CRIB domain of PAR-6 impair its recruitment to the luminal membrane, and (ii) does this mutant exacerbate canal defects when PAR-3 is depleted?

      (i) Our lab has previously generated and characterized a transgenic par6P::par-6(**CRIB)::gfp strain (Zilberman et al., 2017). We will examine this strain to determine if expression is detectable in the excretory canal, and if so, we will compare lumenal enrichment of PAR-6(CRIB)::GFP to control worms expressing wild-type PAR-6::GFP.

      (ii) This is a very interesting experiment, as it would help address if the mild phenotype observed in PAR-3 depleted animals is due to the remaining PAR-6 that is recruited by CDC-42. Our lab has previously shown that par6P::par-6(**CRIB)::gfp cannot rescue the embryonic lethality of a par-6 mutant, in contrast to par-6::gfp (Zilberman et al. 2017). This indicates that the CRIB domain is needed for PAR-6 function during embryogenesis and suggests that CRIB domain mutations introduced by CRISPR would almost certainly be lethal, precluding analysis of the excretory cell.

      As an alternative experiment, we would determine if PAR-3 localizes to the lumenal membrane independently of CDC-42; such a finding would imply that PAR-3 and CDC-42 likely have independent contributions to PAR-6 localization (rather than CDC-42 promoting PAR-6 localization by localizing PAR-3). To do this, we will degrade ZF1::YFP::CDC-42 in the excretory cell and examine the localization of PAR-3::mCherry compared to controls. We have all of the strains needed for this experiment.

      4. The authors hypothesize that partial recruitment of PAR-6 by CDC-42 is sufficient for luminal membrane extension to explain the mild defects caused by PAR-3 depletion. Since depletion of PAR-6 and CDC-42 alone causes milder canal truncations, the authors should co-deplete these proteins (as well as PAR-3 and CDC-42) to determine if there is an additive effect.

      This is an excellent suggestion in principle. However, it is not possible to know in any given degradation experiment whether the targeted protein is completely degraded; we can only say it is no longer detectable by fluorescence. Thus, any degron allele (in the presence of ZIF-1) could behave like a strong hypomorph rather than a null. It would not be possible to interpret double degradation experiments in such a case, as a more severe phenotype in the double could simply be a result of combining two hypomorphic alleles, further reducing pathway activity even if the genes function together in the same pathway. To interpret this experiment properly, a null allele of at least one of the genes would have to be used. This is not possible since par and cdc-42 null mutants are lethal and there is also maternal contribution. As an alternative to these double depletion experiments, we will deplete PAR-6::ZF1::YFP or PAR-3::ZF1::YFP in exc-5 null mutant larvae, as unlike cdc-42, exc-5 is not an essential gene.

      5. In figure 2, the authors show that depletion of PKC-3 causes more severe canal truncations than depletion of PAR-6. Since these proteins function in the same complex, what do they think is the reason for this difference? This point could be discussed more in the manuscript.

      As described in the previous point, incomplete degradation could produce modestly different phenotypes even for genes that act in the same pathway. Therefore, it is not possible to determine whether PAR-6 and PKC-3 have different roles using this approach. We will add text to the discussion bringing up this point.

      6. Related to point 5, more experiments with PKC-3 should be done to determine if, for example, localization of SEC-10 is affected in the same way as by ablation of PAR-3, PAR-6 and CDC-42.

      We agree, and will address this point by acutely degrading ZF1::GFP::PKC-3 and examining transgenic SEC-10::mCherry, as we have done for other par genes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The manuscript by Abrams & Nance describes a precise investigation of the role of PAR proteins in the recruitment of the exocyst during and after the extension of the C. elegans excretory canal. State-of-the-art genetic techniques are used to acutely deplete proteins only in the targeted cell, and examine the localization of endogenously expressed markers. Experiments are well described and carefully quantified, with systematic statistical analysis. The manuscript is easy to follow and the bibliography is very good. Most conclusions are well supported.

      1) I am not entirely convinced by the presence of CDC-42 at the lumenal membrane (Fig3G); it seems to be more sub-lumenal than really lumenal. It peaks well before PAR-6 (Fig3H), which itself seems slightly less apical than PAR-3 (Fig3F). Could you use super-resolution microscopy (compatible with endogenous expression levels) to more precisely localize CDC-42? Similar point for PAR-3 and PAR-6, which do not seem to colocalize completely - a longitudinal line scan along the lumenal membrane might provide the answer even without super-resolution; this could help explain why these two proteins do not have the same function. These suggestions are easy to do provided the authors can have access to super-resolution (Airyscan to name it; although other methods will be perfectly acceptable I believe it is the simplest one).

      We agree that the CDC-42 localization peak does not precisely match the PAR-6 peak. As the reviewer notes, resolving the subcellular localization of these two proteins will not be feasible using standard confocal microscopy. We will image the ZF1::YFP::CDC-42; PAR-6:mKate strain using a Zeiss LSM 880 with Airyscan to determine if their subcellular localization patterns are distinct.

      To examine PAR-3 and PAR-6 colocalization at the lumen, we will acquire additional confocal images of the PAR-6-ZF1-YFP; PAR-3-mCherry strain and examine colocalization of the clusters along the lumenal membrane. As a positive control for two proteins that should co-localize, we will image ZF1::GFP::PKC-3; PAR-6-mKate; these two proteins bind directly and co-localize in nearly all cells in which they have been examined.

      2) The same group has described a CDC-42 biosensor to detect its active form. It could be used here to precisely pinpoint where active CDC-42 is required: in the cytoplasm? At the lumenal membrane? Colocalizing with what other protein? This will require the expression of a transgene under an excretory cell specific promoter and a simple injection strategy while helping to strengthen the description of the CDC-42 role.

      See Reviewer 1 point #2.

      3) As the authors certainly know, there is a PAR-6 mutation which prevents its binding to CDC-42. They could express this construct in the excretory canal (a simple extrachromosomal array should be sufficient) to validate the direct interaction between these proteins in this cell.

      See Reviewer 1 point #3.

      4) What is the lethality of ZIF-1-mediated depletion of the various factors under the exc promoter? Can homozygous strains be maintained? Authors just have to add a sentence in the Mat&Met section.

      All of the strains with excretory cell-specific degradation we have examined are viable when grown on NGM plates. We will add this point to the materials and methods.

      Provided that the authors have access to an Airyscan, all the questions asked here can be answered in two months (one month for constructs, one month for injection and data analysis) at a very minor cost.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Strengths of this manuscript include the use of endogenously tagged proteins (rather than over-expressed transgenes) for high resolution imaging and a cell-type specific acute depletion strategy that avoids complicating pleiotropies and allows tests of molecular epistasis. While some results were fairly expected based on prior studies of Cdc42, PAR proteins, and the exocyst in other tissues or systems, differences in the requirements for par-6 and pkc-3 vs. par-3 strongly suggest that the former genes play more important roles in exocyst recruitment. I was also excited to see a connection made between EXC-5 and PKC-3 localization.

      1. Lumen formation vs. lumen extension. The abstract and introduction use these two terms almost interchangeably, but they are not the same and more care should be taken to avoid the former term. The data here do not demonstrate any roles for par or other genes in lumen formation, but do demonstrate roles in lumen extension and organization/shaping.

      We agree and will correct wording to indicate that lumen extension is affected.

      2. Related to the above, mutant phenotypes here are surprisingly mild and variable. The authors discuss possible reasons for the particularly mild phenotype of par-3 mutants, but don't specifically address the mild phenotypes of the others. Clearly quite a bit of polarization and apical membrane addition occurs in ALL of the mutants. Is this because those early steps use other/redundant molecular players, or is depletion too late or incomplete to reveal an early role?

      We agree with Reviewer 3 and will bring up these points in the discussion. Degradation of proteins strongly predicted to function together (RAL-1 and SEC-5; PAR-6 and PKC-3) produces similar although not identical phenotypes; as discussed above we consider it likely that these differences reflect minor differences in degradation efficiency below our ability to detect by fluorescence. As Reviewer 3 points out, the excretory-specific driver we use to express ZIF-1 may not be active at the very earliest stages of lumen formation, and degradation could take 45 minutes or more after the promoter becomes active (Armenti et al, 2014). Thus, we agree that phenotypes could be more severe if it were possible to completely deplete each tagged protein prior to the onset of lumen formation. However, this caveat does not change the interpretations of our experiments since all proteins are degraded with the same driver. We have avoided mentioning that the phenotypes we observe reflect the ‘null’ phenotype for these reasons. We will emphasize these points in the discussion.

      The authors introduce a new reagent, "excP" (the promoter for T28H11.8), which they use to drive canal cell expression of ZIF-1 for their degron experiments. Please provide more information about when in embryogenesis this promoter becomes active, how that compares to when the par genes, sec-5, ral-1 and cdc-42 are first expressed, and what canal length is at that time. It would also be helpful to show the timeframe for degron-based depletion using this reagent (Figure 1C shows only depletion at L4, days later).

      Publicly available single cell RNA seq data (https://pubmed.ncbi.nlm.nih.gov/31488706/ and https://cello.shinyapps.io/celegans_explorer/) suggest that canal expression of the endogenous T28H11.8 gene doesn't really ramp up until the 580-650 minute timepoint, which is several hours after par gene canal expression (270-390 minutes) and the initiation of canal lumen formation (bean stage, 400-450 minutes). These data suggest that excP might come on too late to test requirements in lumen formation and early stages of extension. This caveat should be at least mentioned.

      See point #2 above. We agree that providing more information on expression from the T28H11.8 promoter would be important for interpreting the severity of phenotypes. We will raise this point in the discussion, and include existing published expression data and a more detailed analysis of the excP::mCherry transgene.

      3. There are two major aspects to the mutant phenotypes observed here: short lumens and cystic lumens. A short lumen makes sense intuitively, but the cysts could use a little more explanation. (What are cysts? What is thought to be the basis of their formation?). It is intriguing that cysts in sec-5 vs. ral-1 mutants (Figure 1) and par-6 vs. pkc-3 mutants (Figure 4) seem to have a very different size and overall appearance. Are these consistent differences, and if so, what could be the explanation for them?

      This is an interesting point. Since it is not practical to perform time-lapse imaging to watch canal cysts form, we analyzed only L1 and L4 larvae. We believe from our imaging that these are discontinuous regions of the lumen. One explanation for the expansion and dilation of the cystic lumens by L4 stage could be that the canal lumen has been expanded by fluid buildup resulting from a defect in canal function in osmoregulation, but we have not tested this directly. The reviewer also raises an interesting point regarding different appearances of cysts in SEC-5 and RAL-1 depleted larvae compared to PAR-6 and PKC-3. It is possible that these differences arise because SEC-5 and RAL-1 might direct whether vesicles will fuse at all, whereas PAR proteins direct where they will fuse in the cell (i.e. there could be fusion at basal surfaces, or just reduced apical fusion). We will bring up these points in the discussion.

      4. The authors did not test if PKC-3, like PAR-6, is required to recruit exocyst to the canal cell apical membrane, but their prior studies in the embryo suggested that it is (Armenti et al 2014). They also did not test if EXC-5 is required to recruit PAR-6 and the exocyst (along with PKC-3), or if CDC-42 is required to recruit PKC-3 (along with PAR-6). There seems to be an assumption that PAR-6 and PKC-3 are regulated and function in a common manner (as is often the case), but that has not been demonstrated here specifically. The basis for this assumption and alternatives to the linear model should be acknowledged.

      As discussed above (Reviewer 1 point #6), we will directly test whether PKC-3 is required to recruit SEC-10::mCherry to the lumenal membrane. We agree with Reviewer 3 that we have not shown that PAR-6 and PKC-3 always function similarly, although this is expected based on their similar phenotypes and co-dependent functions in other cells. We will mention this caveat in the discussion.

      5. EXC-5 is presumed to act upstream of CDC-42 based on shared phenotypes and the known Rho GEF activity of its mammalian homologs. However, direct evidence for this is currently lacking. In future, the authors might test if depleting EXC-5 affects CDC-42 activation/GTP-loading by using CDC-42 biosensors that have been reported in the literature (e.g. Lazetic et al 2018).

      See Reviewer 1 point #2.

      Minor comments:

      Figure 1, Figure 4, Figure S3, Figure S4 Blue color/CFP indicates the apical/luminal membrane or the apical region of the canal cytoplasm, not the actual lumen as the labels suggest. The lumen is a hollow cavity on the opposite side of the plasma membrane from these markers, and it is shown as white in the Figure 1A upper right cartoon.

      Thank you for pointing this out. We will correct the figure labelling.

      Figure 2, Figure S2 I'm not confident in the statistical analysis used here (Fisher's Exact test on two bins, 50% canal length), given that four length bins (not two) were defined. I recommend consulting a statistician.

      Our rationale for using two bins for the statistical analysis was that control larvae nearly all have a similar canal length (L1 stage: 99% of larvae have canal length that is 51-75% of body length; L4 stage: 98% of larvae have canal length that is 76-100% of body length), making it straightforward to ask if mutants are shorter. We chose not to make more granular phenotypic comparisons, as we cannot rule out that subtle differences in degradation efficiency, rather than differences in biological function, underlie any differences in canal length of the degron mutants. We will consult with a statistician to determine if this is an acceptable way to statistically compare controls and mutants.
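
The two-bin comparison under discussion can be sketched as follows: counts in four canal-length bins are collapsed at the 50% boundary into "short" and "long", and a two-sided Fisher's exact test is run on the resulting 2x2 table. The larval counts are hypothetical (the control row only mimics the roughly 99% figure quoted above), and the helper names are illustrative; this is not the authors' analysis code.

```python
from math import comb


def collapse_at_half(bin_counts):
    """Collapse counts for the bins (0-25%, 26-50%, 51-75%, 76-100%)
    into (short, long) around the 50% canal-length boundary."""
    return bin_counts[0] + bin_counts[1], bin_counts[2] + bin_counts[3]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical larval counts per bin (0-25%, 26-50%, 51-75%, 76-100%).
control_bins = [0, 1, 99, 0]
mutant_bins = [10, 50, 40, 0]

control_short, control_long = collapse_at_half(control_bins)
mutant_short, mutant_long = collapse_at_half(mutant_bins)
p_value = fisher_exact_two_sided(control_short, control_long,
                                 mutant_short, mutant_long)
print(p_value)
```

Collapsing to two bins answers only whether the mutants are shorter than the 50% boundary more often than controls; it discards the ordering among the four bins, which is the information the reviewer's concern is about.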

      p.3 "Born during late embryogenesis..." Actually, the canal cell is born at ~270 minutes after first cleavage, which is in the first half of embryogenesis, not what I would call "late".

      We agree and will correct the wording.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The C. elegans excretory canal cell is a classic model for studying single cell tubulogenesis, where a cell establishes an intracellular apical domain that extends to form a lumen. Prior studies in this system have identified a set of gene products that localize to the growing apical domain and/or are important for its organization and size, but the molecular pathways through which these various gene products act remain poorly understood. Here, Abrams and Nance are able to connect the dots among several of these to flesh out a pathway for apical membrane addition. Specifically, they demonstrate that CDC-42 is needed to recruit PAR-6, and that PAR-6 is needed to recruit the exocyst to the apical membrane and to promote proper apical membrane growth and organization. EXC-5, a candidate GEF for CDC-42, also appears to act in this pathway.

      Strengths of this manuscript include the use of endogenously tagged proteins (rather than over-expressed transgenes) for high resolution imaging and a cell-type specific acute depletion strategy that avoids complicating pleiotropies and allows tests of molecular epistasis. While some results were fairly expected based on prior studies of Cdc42, PAR proteins, and the exocyst in other tissues or systems, differences in the requirements for par-6 and pkc-3 vs. par-3 strongly suggest that the former genes play more important roles in exocyst recruitment. I was also excited to see a connection made between EXC-5 and PKC-3 localization.

      Major comments:

      1.Lumen formation vs. lumen extension. The abstract and introduction use these two terms almost interchangeably, but they are not the same and more care should be taken to avoid the former term. The data here do not demonstrate any roles for par or other genes in lumen formation, but do demonstrate roles in lumen extension and organization/shaping.

      2.Related to the above, mutant phenotypes here are surprisingly mild and variable. The authors discuss possible reasons for the particularly mild phenotype of par-3 mutants, but don't specifically address the mild phenotypes of the others. Clearly quite a bit of polarization and apical membrane addition occurs in ALL of the mutants. Is this because those early steps use other/redundant molecular players, or is depletion too late or incomplete to reveal an early role?

      The authors introduce a new reagent, "excP" (the promoter for T28H11.8), which they use to drive canal cell expression of ZIF-1 for their degron experiments. Please provide more information about when in embryogenesis this promoter becomes active, how that compares to when the par genes, sec-5, ral-1 and cdc-42 are first expressed, and what canal length is at that time. It would also be helpful to show the timeframe for degron-based depletion using this reagent (Figure 1C shows only depletion at L4, days later).

      Publicly available single cell RNA seq data (https://pubmed.ncbi.nlm.nih.gov/31488706/ and https://cello.shinyapps.io/celegans_explorer/) suggest that canal expression of the endogenous T28H11.8 gene doesn't really ramp up until the 580-650 minute timepoint, which is several hours after par gene canal expression (270-390 minutes) and the initiation of canal lumen formation (bean stage, 400-450 minutes). These data suggest that excP might come on too late to test requirements in lumen formation and early stages of extension. This caveat should be at least mentioned.

      3.There are two major aspects to the mutant phenotypes observed here: short lumens and cystic lumens. A short lumen makes sense intuitively, but the cysts could use a little more explanation. (What are cysts? What is thought to be the basis of their formation?). It is intriguing that cysts in sec-5 vs. ral-1 mutants (Figure 1) and par-6 vs. pkc-3 mutants (Figure 4) seem to have a very different size and overall appearance. Are these consistent differences, and if so, what could be the explanation for them?

      4.The authors did not test if PKC-3, like PAR-6, is required to recruit exocyst to the canal cell apical membrane, but their prior studies in the embryo suggested that it is (Armenti et al 2014). They also did not test if EXC-5 is required to recruit PAR-6 and the exocyst (along with PKC-3), or if CDC-42 is required to recruit PKC-3 (along with PAR-6). There seems to be an assumption that PAR-6 and PKC-3 are regulated and function in a common manner (as is often the case), but that has not been demonstrated here specifically. The basis for this assumption and alternatives to the linear model should be acknowledged.

      5.EXC-5 is presumed to act upstream of CDC-42 based on shared phenotypes and the known Rho GEF activity of its mammalian homologs. However, direct evidence for this is currently lacking. In future, the authors might test if depleting EXC-5 affects CDC-42 activation/GTP-loading by using CDC-42 biosensors that have been reported in the literature (e.g. Lazetic et al 2018).

      Minor comments:

      Figure 1, Figure 4, Figure S3, Figure S4 Blue color/CFP indicates the apical/luminal membrane or the apical region of the canal cytoplasm, not the actual lumen as the labels suggest. The lumen is a hollow cavity on the opposite side of the plasma membrane from these markers, and it is shown as white in the Figure 1A upper right cartoon.

      Figure 2, Figure S2 I'm not confident in the statistical analysis used here (Fisher's Exact test on two bins, <50% and >50% canal length), given that four length bins (not two) were defined. I recommend consulting a statistician.

      p.3 "Born during late embryogenesis..." Actually, the canal cell is born at ~270 minutes after first cleavage, which is in the first half of embryogenesis, not what I would call "late".

      Significance

      Polarized plasma membrane addition is critical for the development of epithelial tissues, so understanding the mechanisms that control this is of broad interest to many cell and developmental biologists. This study will be of particularly high interest to researchers working on PAR proteins, the exocyst, or single cell tube development.

      The results here add to the existing body of evidence for PAR-dependent recruitment of exocyst to expanding apical/luminal surfaces (e.g. Bryant et al 2010; Jones et al 2011, 2014; Armenti et al 2014) and to evidence for key functional distinctions between PAR-6 & PKC-3 vs. PAR-3 (e.g. Achilleos et al 2010; Jones et al 2014). The results here are more robust than in those prior studies and more clearly illustrate directionality due to the authors' acute depletion strategy, which avoids major tissue disruptions that could secondarily affect protein localization.

      expertise keywords: C. elegans, epithelia, tubulogenesis

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Abrams & Nance describes a precise investigation of the role of PAR proteins in the recruitment of the exocyst during and after the extension of the C. elegans excretory canal. State-of-the-art genetic techniques are used to acutely deplete proteins only in the targeted cell, and examine the localization of endogenously expressed markers. Experiments are well described and carefully quantified, with systematic statistical analysis. The manuscript is easy to follow and the bibliography is very good. Most conclusions are well supported.

      I only have a few minor questions or remarks:

      1) I am not entirely convinced by the presence of CDC-42 at the lumenal membrane (Fig3G); it seems to be more sub-lumenal than truly lumenal. It peaks well before PAR-6 (Fig3H), which itself seems slightly less apical than PAR-3 (Fig3F). Could you use super-resolution microscopy (compatible with endogenous expression levels) to more precisely localize CDC-42? Similar point for PAR-3 and PAR-6, which do not seem to colocalize completely - a longitudinal line scan along the lumenal membrane might provide the answer even without super-resolution; this could help explain why these two proteins do not have the same function. These suggestions are easy to do provided the authors can have access to super-resolution (Airyscan to name it; although other methods will be perfectly acceptable, I believe it is the simplest one).

      2) The same group has described a CDC-42 biosensor to detect its active form. It could be used here to precisely pinpoint where active CDC-42 is required: in the cytoplasm? At the lumenal membrane? Colocalizing with what other protein? This will require the expression of a transgene under an excretory cell specific promoter and a simple injection strategy, while helping to strengthen the description of the CDC-42 role.

      3) As the authors certainly know, there is a PAR-6 mutation which prevents its binding to CDC-42. They could express this construct in the excretory canal (a simple extrachromosomal array should be sufficient) to validate the direct interaction between these proteins in this cell.

      4) What is the lethality of ZIF-1-mediated depletion of the various factors under the exc promoter? Can homozygous strains be maintained? Authors just have to add a sentence in the Mat&Met section.

      Provided that the authors have access to an Airyscan, all the questions asked here can be answered in two months (one month for constructs, one month for injection and data analysis) at a very minor cost.

      Significance

      The reviewer has an expertise in cell polarity and membrane trafficking, using C. elegans as a model.


    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Abrams and Nance describes how the polarity proteins PAR-6 and PKC-3/aPKC promote lumen extension of the unicellular excretory canal in C. elegans. Using tissue-specific depletion methods they find that CDC-42 and the RhoGEF EXC-5/FGD are required for luminal localization of PAR-6, which recruits the exocyst complex required for lumen extension. Interestingly, they show that the ortholog of the mammalian exocyst receptor, PAR-3, is dispensable for luminal membrane extension. Overall, this is a well-written and interesting manuscript.

      Major comments

      1.Because depletion of PAR-3 in the canal causes milder defects than PAR-6 or CDC-42 the authors suggest that they cannot rule out the possibility that an alternative isoform of PAR-3 is expressed and buffering the defect. They should perform canal-specific RNAi-mediated depletion of the entire PAR-3 gene to determine if this is true.

      2.The authors suggest that GTP-loaded (activated) CDC-42 recruits PAR-6 to the luminal membrane. It would be nice if they could use a biosensor, such as the GBD-WSP-1 reagent from Buechner's lab to confirm that EXC-5 depletion also reduces activated CDC-42, as would be expected. This should be achievable since there is strong CDC-42 signal, even in the cytoplasm.

      3.Related to point 2, (i) does mutation of the CRIB domain of PAR-6 impair its recruitment to the luminal membrane, and (ii) does this mutant exacerbate canal defects when PAR-3 is depleted?

      4.The authors hypothesize that partial recruitment of PAR-6 by CDC-42 is sufficient for luminal membrane extension to explain the mild defects caused by PAR-3 depletion. Since depletion of PAR-6 and CDC-42 alone causes milder canal truncations the authors should co-deplete these proteins (as well as PAR-3 and CDC-42) to determine if there is an additive effect.

      5.In figure 2, the authors show that depletion of PKC-3 causes more severe canal truncations than PAR-6. Since these proteins function in the same complex what do they think is the reason for this difference? This point could be discussed more in the manuscript.

      6.Related to point 5, more experiments with PKC-3 should be done to determine if, for example, localization of SEC-10 is affected by PKC-3 depletion in the same way as by ablation of PAR-3, PAR-6 and CDC-42.

      Significance

      This manuscript builds off their previous work on the role of the exocyst in excretory canal extension and in our view represents an important advance that is relevant to biological tube development across phyla. Therefore, this work should be of interest to biologists studying tubulogenesis in many different model systems.

      My areas of expertise include model organism genetics, biological tube development, and biochemistry.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      We are grateful to Review Commons for the opportunity to get valuable comments on our manuscript “Trim39 regulates neuronal apoptosis by acting as a SUMO-targeted E3 ubiquitin-ligase for the transcription factor NFATc3”. We would like to acknowledge the very nice and constructive reviews that our manuscript received. We found all of the reviewer comments well founded and we are taking them into careful consideration in preparing the revised version. We are currently performing additional experiments to address the questions raised by the reviewers. We are not yet able to provide a revised version of the manuscript, but you will find below our response to the reviewers and our plan of revision. It is difficult to anticipate exactly how much time we will need to get the requested results and to prepare a complete revised version, as it will depend on whether we can work normally and whether we encounter technical problems. However, it should be possible within a few months.

      Reviewer #1

      **Summary:**

      Desagher and co-workers investigate the regulation of the NFAT family member NFATc3, a transcription factor in neurons with a pro-apoptotic role. They identify TRIM39 as a ubiquitin E3 ligase regulating NFATc3. They demonstrate that TRIM39 can bind and ubiquitinate NFATc3 in vitro and in cells. They identify a critical SUMO interaction motif in TRIM39, that is required for its interaction with NFATc3 and for its ability to ubiquitinate NFATc3. Moreover, mutating sumoylation sites in NFATc3 reduces the interaction with TRIM39 and reduces its ubiquitination. Silencing TRIM39 increases the protein levels of NFATc3 and its transcriptional activity, leading to apoptosis of neurons. TRIM17 modulates the TRIM39-NFATc3 axis. Combined, TRIM39 appears to be a SUMO-targeted ubiquitin ligase (STUbL) for NFATc3 in neurons.

      **Major points:**

      1.This manuscript contains two stories: the rather exciting story that TRIM39 is a STUbL for NFATc3 (as mentioned in the title) and the second, less exciting story: TRIM17 modulates the regulation of NFATc3 by TRIM39. These stories are now mixed in a confusing manner, disrupting the flow of the first story. It would be better to focus the current manuscript on the first story and strengthen it further, and develop the second story in a second manuscript.

      We understand that the reviewer is more interested in the part of our manuscript related to the characterization of Trim39 as a STUbL due to his/her field of expertise. However, the two other reviewers are also interested in the other parts of our work. Notably the third reviewer would like us to highlight the physiological importance of our findings. Indeed, the main goal of this article is to describe the mechanisms regulating the stability of the transcription factor NFATc3. Trim17 plays a role in this regulation by inhibiting Trim39. It is particularly important for understanding the impact of these mechanisms on neuronal apoptosis as Trim17 is induced in these conditions. As we want to reach a wide audience, we prefer not to focus our manuscript on the identification of a new STUbL. However, we agree with the reviewer that it would be very interesting to strengthen this part of our work and we are grateful for his/her suggestions.

      2.Whereas the cellular experiments to indicate that TRIM39 acts as a STUbL are properly carried out, the observed effects are not necessarily direct. Direct evidence that TRIM39 is indeed a STUbL for sumoylated NFATc3 needs to be obtained in vitro, using purified recombinant proteins. Does TRIM39 indeed preferentially ubiquitinate sumoylated NFATc3? Is ubiquitination reduced for non-sumoylated NFATc3? Is ubiquitination of sumoylated NFATc3 dependent on SIM3 of TRIM39? Do other SIMs in TRIM39 contribute?

      We agree with the reviewer that additional in vitro experiments using purified recombinant proteins would strengthen the characterization of Trim39 as a STUbL. In order to answer the specific questions of the reviewer, we propose to perform in vitro ubiquitination using different forms of GST-Trim39 (WT/mSIM3/mSIM1&2) following in vitro SUMOylation (or not) of NFATc3 produced by TnT (wheat germ) and purified by immunoprecipitation. Preliminary results using WT Trim39 show that indeed the in vitro ubiquitination of NFATc3 is improved by prior in vitro SUMOylation. We have to confirm these results and to test the SIM mutants of Trim39 in the same conditions.

      3.Rule out potential roles for other STUbLs by including control knockdowns of RNF4 and RNF111 and verify the sumoylation of NFATc3 and ubiquitination of wildtype and sumoylation-mutant NFATc3.

      Our data show that silencing of Trim39 strongly decreases the ubiquitination level of NFATc3 in Neuro2A cells, indicating that Trim39 plays a major role in this process. We agree that this does not exclude the possible involvement of other STUbLs in NFATc3 ubiquitination in this model, but their potential contribution would be limited. This point will be better addressed in the discussion.

      4.Figure 6B: use SUMO inhibitor ML-792 to demonstrate that ubiquitination of wildtype NFATc3 by TRIM39 is dependent on sumoylation.

      We thank the reviewer for suggesting this experiment, which can easily improve the strength of our demonstration. Our preliminary results indeed indicate that in vivo ubiquitination of NFATc3 by Trim39 is strongly decreased following treatment with the SUMO inhibitor ML-792. We have to confirm these results.

      **Minor points:**

      5.Figure 1A and B: demonstrate by immunoprecipitation and Western that the endogenous counterparts indeed interact.

      We are currently setting the conditions to immunoprecipitate endogenous NFATc3 and Trim39 in order to demonstrate that they indeed interact.

      6.Figure 1C and 1E: Quantify the PLA results properly and perform statistics.

      We will perform these quantifications and statistical analyses as requested.

      7.Figure 2B: Correct unequal loading of samples.

      We agree with the reviewer (as with reviewer #2) that the blots showing the total lysates of this experiment are confusing. As mentioned in the legend, some material has been lost during the TCA precipitation, resulting in unequal loading. However, these experiments were performed a very long time ago and we do not have the protein extracts anymore. We are currently trying to produce efficient shRNA-expressing lentiviruses to reproduce this experiment and provide a better figure.

      8.Figure 6B: proper statistics are needed here from at least three independent experiments.

      The reviewer is right. Statistics are needed to reinforce the significance of these results. We have quantified three independent experiments and made graphs and statistics that will be presented in the revised version of the manuscript. They better support our conclusion.

      Reviewer #1 (Significance (Required)):

      Humans have over 600 different ubiquitin E3s. Currently, RNF4 and RNF111 are the only known human SUMO-Targeted Ubiquitin Ligases (STUbLs). Here, the authors present evidence that the ubiquitin E3 ligase TRIM39 is a STUbL for sumoylated NFATc3. Identification of a new STUbL is an exciting finding for the ubiquitin and SUMO field and for the field of ubiquitin-like signal transduction in general, but needs to be strengthened as outlined above. My field of expertise is SUMO and ubiquitin signal transduction.

      Reviewer #2

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work. Here are some comments.

      **Major comments**

      -In Fig. 2B, the levels of material loaded are uneven, which complicates the interpretation.

      We agree with the reviewer (as with reviewer #1) that the blots showing the total lysates of this experiment are confusing. As mentioned in the legend, some material has been lost during the TCA precipitation, resulting in unequal loading. In the other experiments, we have the same problem or the background is too high. We are currently trying to produce efficient shRNA-expressing lentiviruses to reproduce this experiment and provide a better figure.

      However, it seems that the control shRNA also has an effect on NFATc3 ubiquitination, which should not be the case.

      It is true that, in the present figure, the ubiquitination signal is decreased in cells transduced with the control shRNA. However, this is likely due to reduced expression of transfected NFATc3 following lentiviral infection, as it can be seen on the western blot of total lysates.

      Also, by reducing ubiquitination by TRIM39, shouldn't you expect an increase in the levels of NFATc3, if this ubiquitination was driving degradation? The authors do not specify whether those cells were treated or not with proteasomal inhibitor.

      We agree that an increase in the protein level of NFATc3 is expected following silencing of Trim39. However, in the assay presented in Figure 2B, NFATc3 is transfected and the part of overexpressed NFATc3 that is ubiquitinated by endogenous Trim39 is certainly low. Therefore, silencing of Trim39 cannot have a visible impact on the total protein level of NFATc3.

      Indeed, cells were treated with proteasome inhibitor. It is mentioned in the legend of Figure 2A. To avoid repeating it in the legend of Figure 2B, we just wrote that, after 24h transfection, cells were treated as in A, which includes MG-132 treatment for 6h.

      Same applies in Figure 4B, where no reduction in NFATc3 is seen after including TRIM39 in the reaction (beyond the fact that it looks reduced because of the presence of ubiquitinated forms).

      In Figure 4B, the reaction of ubiquitination is performed in an acellular medium with purified recombinant proteins. Although NFATc3 is produced by in vitro transcription/translation in wheat germ extract, it is purified by immunoprecipitation before in vitro ubiquitination. Therefore, the reaction does not contain any proteasome and NFATc3 should not be degraded following its ubiquitination by TRIM39.

      -After the experiments in vitro shown in Fig. 2C, the authors conclude that the NFATc3 is a direct substrate of TRIM39. I think the authors used the right approach by using bacterially produced GST-TRIM39 for the ubiquitination reaction. However NFATc3 is produced by an in vitro transcription-translation system, which could in principle provide other contaminant proteins to the reaction. Did the authors try to use bacterially produced NFATc3? This might be difficult in the case of big proteins, in which case the authors could add some caution note in the text. Same applies in Figure 4B.

      The reviewer is right. It would have been preferable to use NFATc3 produced in bacteria. Indeed, we started with this approach. However, it was very difficult to get NFATc3 expressed in bacteria, and when we succeeded, most of the protein was degraded. We tried different protease inhibitor cocktails and we used a strain of bacteria (BL21-CodonPlus(DE3)-RP) that is mutated on the genes coding for the proteases Lon and OmpT and is further engineered to express tRNAs that are often limiting when expressing mammalian proteins. Unfortunately, this did not improve our production enough.

      We agree that, in principle, in vitro transcription-translation (TnT) systems can include contaminant proteins. However, we used wheat germ extract to produce NFATc3 by TnT. Moreover, we immunopurified NFATc3 from the TnT reaction prior to the ubiquitination reaction. The probability that proteins modifying NFATc3 are expressed in plants and are co-purified with NFATc3 is low. Nevertheless, we will discuss this point in the Results section of the revised version of the manuscript, when describing the results of Figures 2B and 4B.

      -In Fig. 6B, higher levels of ubiquitination in the different SUMOylation mutants are shown. Is this effect consistent? How can this be explained?

      We are grateful to the reviewer for pointing out this inconsistency in our manuscript. It will be corrected. Indeed, the values indicated in red in Figure 6B are confusing and are certainly not consistent. We calculated them by normalizing the intensity of the ubiquitination signal by the intensity of NFATc3 in total lysates, which seems to have introduced a bias. Variations in NFATc3 levels are probably responsible for the artificially higher levels of ubiquitination for different SUMOylation mutants after normalization. When quantifying three independent experiments, as requested by reviewer #1, we realized that results are much more consistent without normalization. Therefore, in the revised version of the manuscript, we will add a graph showing the average and standard deviation of three independent experiments quantified without normalization. We will also replace the experiment currently presented in Figure 6B by another one in which the levels of NFATc3 show lower variations in the total lysates.

      In addition, variations in the levels of NFATc3 are shown in the total lysate, despite the use of proteasomal inhibitors. How do the authors explain this effect?
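
      The bias described above can be illustrated with a small numerical sketch. All band intensities below are hypothetical values in arbitrary units, invented for illustration only (they are not measurements from the manuscript), and the variable names are assumptions. The sketch shows how dividing a ubiquitination signal by a variably recovered total-lysate signal can inflate the apparent ubiquitination of a mutant and increase its spread, whereas the unnormalized mean and standard deviation of three experiments remain consistent.

```python
from statistics import mean, stdev

# Hypothetical ubiquitination band intensities from three experiments (arbitrary units).
ub_signal = {"WT": [100, 95, 105], "mutant": [60, 55, 65]}

# Hypothetical NFATc3 intensities in total lysates; the mutant values vary strongly,
# as could happen with uneven recovery after precipitation (an assumption).
lysate_nfatc3 = {"WT": [1.0, 1.0, 1.0], "mutant": [0.5, 1.1, 0.4]}


def summarize(values):
    """Return (mean, sample standard deviation) of a list of replicate values."""
    return mean(values), stdev(values)


# Unnormalized summary: one (mean, SD) pair per condition.
raw = {name: summarize(vals) for name, vals in ub_signal.items()}

# Normalized summary: each ubiquitination value divided by its matching lysate value.
normalized = {
    name: summarize([u / l for u, l in zip(ub_signal[name], lysate_nfatc3[name])])
    for name in ub_signal
}

for name in ub_signal:
    print(name, "raw mean/SD:", raw[name], "normalized mean/SD:", normalized[name])
```

      With these made-up numbers, the mutant mean is below WT before normalization but above WT after it, purely because of noise in the denominator.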

      These variations in NFATc3 levels in the total lysates may be due to differential protein precipitation by TCA. That is why, in more recent experiments, we collected a portion of the homogenous cell suspension before lysis in the guanidinium buffer, to assess the expression level of transfected proteins (as presented in Figures 4A and 7E).

      It is true that treatment with proteasome inhibitor should attenuate differences in protein level due to different ubiquitination levels. However, cells are transfected for 24h and then treated with MG-132 for 6h before lysis. Proteasome inhibition cannot compensate for what occurred in the cells during the 24h transfection. It is added essentially to accumulate poly-ubiquitinated forms of NFATc3.

      Somehow, this is contradictory with the general message of SUMOylation-dependent ubiquitination.

      The reduced levels of SUMOylation mutants in total lysates may appear to be contradictory with SUMOylation-dependent ubiquitination. However, as mentioned above, this could be due to differential protein precipitation by TCA or to different transfection efficiencies. In contrast, the half-life measurement of WT and EallA mutant, which does not rely on initial expression levels, clearly shows a stabilization of the SUMOylation mutant. Moreover, the average of the three ubiquitination experiments is really convincing. Therefore, we believe that the data that will be presented in the revised manuscript will strongly support our hypothesis.

      -In Fig. 7E, it is not clear to me what the big bands above 130 kDa are after the nickel beads. Do they correspond to monoUb NFATc3 or to the unmodified protein that is sticky to the beads? Do the authors have side-by-side gels of the initial lysate next to the nickel beads eluates to show the increase in molecular weight?

      The big bands above 130 kDa among nickel bead-purified proteins in Figure 7A are unlikely to be unmodified NFATc3 sticking to the beads. Indeed, in the control condition, in which NFATc3 is overexpressed in the absence of His-ubiquitin, these bands are not visible. Therefore, they might be mono-ubiquitinated forms of NFATc3, or degradation products of poly-ubiquitinated NFATc3. We will correct the figure to clarify this point. Unfortunately, we do not have a gel with nickel bead eluates and total lysates side by side for this experiment.

      -Quantifications in some pictures (i.e. Figures 5A, 5B, 6B, 7) are shown in red above or below the bands. It is not clear whether the quantifications shown correspond to that single experiment or are the average of several experiments. In the first case, the numbers would not be very valuable. Authors could add quantification graphs with standard deviations or error bars to the experiments if they want to make the point of changes (significant or not) in the levels. Alternatively, indicate in the Figure legends whether the numbers correspond to the average of several experiments.

      These quantifications correspond to the representative experiments shown in the different figures. We will clarify this point in the Figure legends of the revised manuscript. We added these quantifications to normalize the amount of co-precipitated proteins by the amount of the precipitated partner (Fig 5A, 5B, 7B, 7C, 7D) which is not always precipitated with the same efficiency in the different conditions. We think that it should help the reader to assess the degree of interaction. We also added quantifications to Figure 7E to normalize the ubiquitination signal by the amount of NFATc3 expressed in the total lysate. However, we did not want to overload the figures by adding too many graphs.

      For Figure 6B, where TCA precipitation of total lysates created an inconsistency, we will provide a graph with the average and standard deviation of three independent experiments, as requested by reviewer #1.

      -In Fig. 8, the quantification of apoptotic nuclei has been done just based on the morphology after DAPI staining. Could you use an apoptosis marker (i.e. cleaved caspase Abs) to label the apoptotic cells?

      We have been using primary cultures of cerebellar granule neurons (CGN) as an in vitro model of neuronal apoptosis for many years. Nuclear condensation, visualized after DAPI staining, is very characteristic in these neurons and allows a reliable assessment of neuronal apoptosis. In a previous study (Desagher et al. JBC 2005), we have shown that the kinetics of apoptosis in CGN is the same whether we measure cytochrome c release, active caspase 3 or nuclear condensation (Fig 1b). We therefore believe that the counting of apoptotic nuclei is sufficient to support our conclusions, notably for transfection experiments in Figure 8A which would require a lot of work to be repeated with active caspase 3 staining. However, if we can produce efficient shRNA-expressing lentiviruses, we will reproduce the experiment presented in Figure 8B and we will perform a western blot using anti-active caspase 3 to confirm our conclusion.

      **Minor comments**

      -In Figs. 1 and 5, the red channel should be put in black and white, as it is much easier to see the signal. Not relevant to have DAPI alone in B&W (it does not hurt either), as it is well visible in the merge picture. Also, quantification of the PLA positive dots should be shown in Fig. 1.

      We thank the reviewer for these suggestions. We will modify the figures and we will quantify the PLA dots in Figure 1 as requested.

      -In Fig. 3C, is the difference in TRIM17 expression between empty plasmid and NFATc3 plasmid significant? If so, indicate it in the graph. The same in panels D, E, indicate all significant differences. Same in other Figures.

      No, the difference in Trim17 expression is not statistically significant between NFATc3 and empty plasmid although it clearly increases. However, we agree with the reviewer that more significant differences could be shown in the figures, particularly in Figure 3. Nonetheless, we will try not to overload the figures and will restrict ourselves to comparisons that make sense.

      -It would be nice to show a scheme on the location of SIMs in TRIM39 in relation to the other feature of the protein.

      We are grateful to the reviewer for this suggestion. We will be happy to add a scheme of Trim39 showing its different domains and the location of its SIMs in the revised Figure 7.

      -In Fig. 2 legend, "Note that in the presence of ubiquitin the unmodified form of WT GST-Trim39 is lower due to high Trim39 ubiquitination." Please change to "...in the presence of ubiquitin the levels of the unmodified form..."

      -In Fig. 7 legend, the phrases "The intensity of the bands ... " are not clear. Please rephrase.

      -In Fig. 8 legend, " P<0.001". Change to "* P<0.001".

      We thank the reviewer for pointing out typographical errors and awkward sentences in our manuscript. Changes will be made as requested.

      Reviewer #2 (Significance (Required)):

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work.

      Audience: researchers interested in proteostasis in general and in nervous system regulation

      My expertise: post-translational modifications

      Reviewer #3

      **Summary:**

      In this study, Shrivastava et al. elucidated the previously unknown function of TRIM39 in regulating the protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, where it plays a pro-apoptotic role. NFATs have been shown to be regulated by multiple mechanisms, including at the level of protein stability. In this study, the authors identify TRIM39 as the E3 ligase for NFATc3. Interestingly, TRIM39 recognizes the SUMOylated form of NFATc3 and the interaction facilitates its ubiquitylation and subsequent proteasomal degradation. They further showed that binding of TRIM39 to NFATc3 can also be regulated by TRIM17. Like TRIM39, TRIM17 is a RING-finger-containing protein; this group previously showed that it binds NFATc3, but that the interaction results in up- rather than down-regulation of NFATc3. In this study, they offer insight into this paradox: TRIM17 binds TRIM39 and thereby inhibits TRIM39-mediated ubiquitylation of NFATc3. Furthermore, they showed that NFATc3 transcriptionally activates TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. Hence, a TRIM17-TRIM39-NFATc3 signaling axis is proposed that modulates the protein stability, and thereby the activity, of NFATc3 in regulating the apoptosis of cerebellar granule neurons induced by KCl deprivation.

      The key conclusions are convincing. The data in general are of good quality, with many of the key interactions vigorously documented by reciprocal interaction analysis. For knockdown experiments, two independent shRNA sequences were used. However, some issues remain to be addressed:

      **Major comments:**

      1.Figure 1D - the authors should demonstrate the depletion of TRIM39 expression by shRNA in Neuro2A cells by Western blotting.

      We agree with the reviewer that it would be better to provide this control. Unfortunately, we have never been able to observe a convincing decrease in the protein level of Trim39, following knockdown, by Western blotting in Neuro2A cells. This is surprising because the decrease is clearly visible by immunofluorescence in Neuro2A cells, and by western blotting in neurons (see Figure 8C). It is possible that Neuro2A cells, but not neurons, express a protein that is non-specifically recognized by our best anti-Trim39 antibody in western blots and that migrates at the same size as Trim39, thus preventing detection of the depletion of Trim39. We will test additional anti-Trim39 antibodies to address this question.

      2.Figure 3 - the authors should show that the reduction in the basal level of endogenous NFATc3 upon TRIM39 overexpression is due to an effect on protein stability, by using CHX or another pulse-chase method.

      This is an important point and we have performed many experiments using cycloheximide to measure the half-life of NFATc3 in the presence or the absence of overexpressed Trim39. The results were neither consistent nor reproducible. This is certainly due to the fact that the half-life of endogenous NFATc3 is longer than that of overexpressed Trim39 and that cycloheximide inhibits the expression of both proteins. Therefore, we will perform pulse-chase experiments after metabolic labelling of cells with [35S]-Met. We are currently setting up the conditions to immunoprecipitate endogenous NFATc3 to be able to perform these experiments.

      3.Figure 3 - Does overexpression or knockdown of TRIM39 affect the levels of NFATc3 mRNA?

      The reviewer is right. It is important to verify that overexpression and knockdown of Trim39 do not modify the mRNA level of NFATc3. Therefore, we are currently measuring NFATc3 mRNA levels in all the experiments used to make the graphs of Figure 3. These results will be added to the revised version of the manuscript as supplemental data. Initial results show no significant change of NFATc3 mRNA levels in these experiments.

      4.Figure 6A - the authors should confirm that the multiple slower-migrating bands are SUMOylated forms of NFATc3 by demonstrating the presence of SUMO in these forms, or alternatively, perform a His-SUMO pull-down and probe for NFATc3.

      The reactions shown in Figure 6B have been performed in vitro, with purified recombinant proteins and with NFATc3 produced by in vitro transcription/translation. The wheat germ extract used to produce NFATc3 is unlikely to provide the material needed for post-translation modification of a mammalian protein. However, we agree that it would be better to confirm that slower migrating bands are indeed SUMOylated forms of NFATc3. We may hybridize the membranes with an anti-SUMO antibody but it would give a smear as the enzymes added to the reaction mix are themselves SUMOylated. Therefore, we will show an experiment in which the reaction mix has been incubated with and without SUMO. The results show no slower migrating bands in the absence of SUMO although all conditions were otherwise identical. This will be added to the revised Figure 6.

      5.Figure 7C - the quantification for mSIM1 does not seem to agree with the band intensity.

      Yes, we agree with the reviewer that the quantification (122%) does not seem to reflect the amount of SUMO-chains bound to GST-Trim39 mSIM1. This is due to the normalization of the SUMO signals by the intensity of GST-Trim39 bands. Indeed, it is difficult to control exactly how much recombinant protein is used. GST-Trim39 mSIM1 was slightly less abundant than the other GST-Trim39 proteins in this experiment, explaining why less SUMO-chains were eluted in this condition. The normalization is mentioned in the legend of Figure 7C.
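
      As an illustration of the normalization described above, the following minimal sketch shows how a lane with a weaker raw SUMO signal can still yield a normalized value above 100% when less GST-Trim39 was recovered in that lane. The band intensities and the helper name `normalized_percent` are hypothetical and are not taken from Figure 7C; they were chosen only so that the arithmetic is easy to follow.

```python
# Hypothetical band intensities (arbitrary units). These are NOT measured
# values from Figure 7C; they only illustrate the normalization arithmetic.

def normalized_percent(sumo_signal, gst_signal, ref_sumo, ref_gst):
    """Return the SUMO/GST ratio of a lane as a percentage of the reference lane's ratio."""
    return 100.0 * (sumo_signal / gst_signal) / (ref_sumo / ref_gst)

# Reference lane (e.g. wild-type GST-Trim39) and a lane with less GST protein.
wt = {"sumo": 1000.0, "gst": 500.0}
msim1 = {"sumo": 976.0, "gst": 400.0}

pct = normalized_percent(msim1["sumo"], msim1["gst"], wt["sumo"], wt["gst"])
print(f"raw SUMO signal: {msim1['sumo']:.0f} vs {wt['sumo']:.0f} (reference)")
print(f"normalized SUMO/GST: {pct:.0f}% of reference")
```

      In this made-up case the raw SUMO band is weaker than the reference, yet the SUMO/GST ratio is higher, which is the kind of apparent mismatch between band intensity and quantification discussed in the response.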

      6.TRIM17 reduces TRIM39/NFATc3 interaction and inhibits TRIM39 E3 activity, which results in stabilization of NFATc3. NFATc3 in turn transcriptionally induces TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. It would be good if the authors could discuss the possible existence of this feedback mechanism in a physiological context. Is the protein level of NFATc3, which should be of low abundance in the resting state, elevated by KCl deprivation? If so, can the authors discuss the possible signalling event(s) that lead to activation of NFATc3 upon KCl deprivation? For instance, does KCl deprivation cause de-SUMOylation of NFATc3?

      We thank the reviewer for these suggestions. Our preliminary results suggest that the protein level of NFATc3 is increased in neurons following KCl deprivation. We are currently performing additional experiments to confirm this result. If proved, this increase may be due to the transcriptional induction of Trim17 that should result in the stabilization of NFATc3 through the inhibition of Trim39. It may also be due to a possible deSUMOylation of NFATc3 following apoptosis induction, as suggested by the reviewer. To address the latter point, we are currently setting up PLA using anti-NFATc3 and anti-SUMO antibodies to assess the SUMOylation level of endogenous NFATc3 in neurons. If they are of good quality, we will add these data to Figure 8 and we will discuss the possible existence of feedback loops in neuronal apoptosis, as suggested by the reviewer.

      **Minor comments:**

      1.Line 294 - it should be "SUMOylation" instead of "SUMO".

      We thank the reviewer for pointing out this typographical error that will be corrected.

      2.Figure 8 - include a TRIM39/NFATc3 double knockdown to show that the increased neuronal apoptosis in cells with TRIM39 knocked down is due to elevation of NFATc3 rather than other target(s) of TRIM39.

      We agree that it would be interesting to test whether the increase in neuronal apoptosis following Trim39 silencing is mainly due to its effect on NFATc3. We will therefore perform double silencing of Trim39 and NFATc3 in neurons in order to address this point.

      3.The discussion may be shortened and revised to highlight the physiological importance of the findings linked to cerebellar granule neurons survival.

      As suggested by the reviewer, we will modify the discussion to better highlight the physiological implications of our data, particularly by discussing the results of the additional experiments we will conduct in neurons.

      Reviewer #3 (Significance (Required)):

      Prior to this study, the mechanism by which the protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, is regulated remained poorly understood. Shrivastava et al. have unravelled the interplay between ubiquitylation and SUMOylation involving TRIM39 and TRIM17, which has an important role in regulating the protein stability of NFATc3. The work is interesting and bears significance towards understanding how apoptosis could be finely controlled in cerebellar granule neurons. Furthermore, the study has also expanded the understanding of the role and regulation of the TRIM family of proteins. The senior author is an expert in this field and over the years, her group has contributed many key discoveries on the function of the TRIM family of E3 ubiquitin ligases and their critical ubiquitylation substrates in neuronal survival and its relevance to neuronal biology and diseases.

      The referee's field of expertise is mitochondrial apoptosis signalling. The referee is extensively involved in studying how the protein stability of regulators of apoptosis signalling is regulated by the ubiquitin-proteasome system (UPS) and how this regulation plays a role in physiology and disease.

      Key words: apoptosis, ubiquitylation, cell signaling, liver diseases

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this study, Shrivastava et al. elucidated the previously unknown function of TRIM39 in regulating the protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, where it plays a pro-apoptotic role. NFATs have been shown to be regulated by multiple mechanisms, including at the level of protein stability. In this study, the authors identify TRIM39 as the E3 ligase for NFATc3. Interestingly, TRIM39 recognizes the SUMOylated form of NFATc3 and the interaction facilitates its ubiquitylation and subsequent proteasomal degradation. They further showed that binding of TRIM39 to NFATc3 can also be regulated by TRIM17. Like TRIM39, TRIM17 is a RING-finger-containing protein; this group previously showed that it binds NFATc3, but that the interaction results in up- rather than down-regulation of NFATc3. In this study, they offer insight into this paradox: TRIM17 binds TRIM39 and thereby inhibits TRIM39-mediated ubiquitylation of NFATc3. Furthermore, they showed that NFATc3 transcriptionally activates TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. Hence, a TRIM17-TRIM39-NFATc3 signaling axis is proposed that modulates the protein stability, and thereby the activity, of NFATc3 in regulating the apoptosis of cerebellar granule neurons induced by KCl deprivation.

      The key conclusions are convincing. The data in general are of good quality, with many of the key interactions vigorously documented by reciprocal interaction analysis. For knockdown experiments, two independent shRNA sequences were used. However, some issues remain to be addressed:

      Major comments:

      1.Figure 1D - the authors should demonstrate the depletion of TRIM39 expression by shRNA in Neuro2A cells by Western blotting.

      2.Figure 3 - the authors should show that the reduction in the basal level of endogenous NFATc3 upon TRIM39 overexpression is due to an effect on protein stability, by using CHX or another pulse-chase method.

      3.Figure 3 - Does overexpression or knockdown of TRIM39 affect the levels of NFATc3 mRNA?

      4.Figure 6A - the authors should confirm that the multiple slower-migrating bands are SUMOylated forms of NFATc3 by demonstrating the presence of SUMO in these forms, or alternatively, perform a His-SUMO pull-down and probe for NFATc3.

      5.Figure 7C - the quantification for mSIM1 does not seem to agree with the band intensity.

      6.TRIM17 reduces TRIM39/NFATc3 interaction and inhibits TRIM39 E3 activity, which results in stabilization of NFATc3. NFATc3 in turn transcriptionally induces TRIM17 expression, thus forming a feedback loop between NFATc3 and TRIM17. It would be good if the authors could discuss the possible existence of this feedback mechanism in a physiological context. Is the protein level of NFATc3, which should be of low abundance in the resting state, elevated by KCl deprivation? If so, can the authors discuss the possible signalling event(s) that lead to activation of NFATc3 upon KCl deprivation? For instance, does KCl deprivation cause de-SUMOylation of NFATc3?

      Minor comments:

      1.Line 294 - it should be "SUMOylation" instead of "SUMO".

      2.Figure 8 - include a TRIM39/NFATc3 double knockdown to show that the increased neuronal apoptosis in cells with TRIM39 knocked down is due to elevation of NFATc3 rather than other target(s) of TRIM39.

      3.The discussion may be shortened and revised to highlight the physiological importance of the findings linked to cerebellar granule neurons survival.

      Significance

      Prior to this study, the mechanism by which the protein stability of NFATc3, the predominant member of the NFAT family of transcription factors in neurons, is regulated remained poorly understood. Shrivastava et al. have unravelled the interplay between ubiquitylation and SUMOylation involving TRIM39 and TRIM17, which has an important role in regulating the protein stability of NFATc3. The work is interesting and bears significance towards understanding how apoptosis could be finely controlled in cerebellar granule neurons. Furthermore, the study has also expanded the understanding of the role and regulation of the TRIM family of proteins. The senior author is an expert in this field and over the years, her group has contributed many key discoveries on the function of the TRIM family of E3 ubiquitin ligases and their critical ubiquitylation substrates in neuronal survival and its relevance to neuronal biology and diseases.

      The referee's field of expertise is mitochondrial apoptosis signalling. The referee is extensively involved in studying how the protein stability of regulators of apoptosis signalling is regulated by the ubiquitin-proteasome system (UPS) and how this regulation plays a role in physiology and disease.

      Key words: apoptosis, ubiquitylation, cell signaling, liver diseases

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system. The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work. Here are some comments.

      Major comments

      -In Fig. 2B, the levels of material loaded are uneven, which complicates the interpretation. However, it seems that the control shRNA also has an effect on NFATc3 ubiquitination, which should not be the case. Also, by reducing ubiquitination by TRIM39, shouldn't you expect an increase in the levels of NFATc3, if this ubiquitination was driving degradation? The authors do not specify whether those cells were treated or not with proteasomal inhibitor. The same applies to Figure 4B, where no reduction in NFATc3 is seen after including TRIM39 in the reaction (beyond the fact that it looks reduced because of the presence of ubiquitinated forms).

      -After the experiments in vitro shown in Fig. 2C, the authors conclude that NFATc3 is a direct substrate of TRIM39. I think the authors used the right approach by using bacterially produced GST-TRIM39 for the ubiquitination reaction. However, NFATc3 is produced by an in vitro transcription-translation system, which could in principle provide other contaminant proteins to the reaction. Did the authors try to use bacterially produced NFATc3? This might be difficult in the case of big proteins, in which case the authors could add a cautionary note in the text. The same applies to Figure 4B.

      -In Fig. 6B, higher levels of ubiquitination in the different SUMOylation mutants are shown. Is this effect consistent? How can this be explained? In addition, variations in the levels of NFATc3 are shown in the total lysate, despite the use of proteasomal inhibitors. How do the authors explain this effect? Somehow, this is contradictory with the general message of SUMOylation-dependent ubiquitination.

      -In Fig. 7E, it is not clear to me what the big bands above 130 kDa are after the nickel beads. Do they correspond to monoUb NFATc3 or to the unmodified protein that is sticky to the beads? Do the authors have side-by-side gels of the initial lysate next to the nickel beads eluates to show the increase in molecular weight?

      -Quantifications in some pictures (i.e. Figures 5A, 5B, 6B, 7) are shown in red above or below the bands. It is not clear whether the quantifications shown correspond to that single experiment or are the average of several experiments. In the first case, the numbers would not be very valuable. Authors could add quantification graphs with standard deviations or error bars to the experiments if they want to make the point of changes (significant or not) in the levels. Alternatively, indicate in the Figure legends whether the numbers correspond to the average of several experiments.

      -In Fig. 8, the quantification of apoptotic nuclei has been done just based on the morphology after DAPI staining. Could you use an apoptosis marker (i.e. cleaved caspase Abs) to label the apoptotic cells?

      Minor comments

      -In Figs. 1 and 5, the red channel should be put in black and white, as it is much easier to see the signal. Not relevant to have DAPI alone in B&W (it does not hurt either), as it is well visible in the merge picture. Also, quantification of the PLA positive dots should be shown in Fig. 1.

      -In Fig. 3C, is the difference in TRIM17 expression between empty plasmid and NFATc3 plasmid significant? If so, indicate it in the graph. The same in panels D, E, indicate all significant differences. Same in other Figures.

      -It would be nice to show a scheme on the location of SIMs in TRIM39 in relation to the other feature of the protein.

      -In Fig. 2 legend, "Note that in the presence of ubiquitin the unmodified form of WT GST-Trim39 is lower due to high Trim39 ubiquitination." Please change to "...in the presence of ubiquitin the levels of the unmodified form..."

      -In Fig. 7 legend, the phrases "The intensity of the bands ... " are not clear. Please rephrase.

      -In Fig. 8 legend, " P<0.001". Change to "* P<0.001".

      Significance

      In this manuscript, the authors analyze the effect of TRIM39, a ubiquitin E3 ligase, on NFATc3, a transcription factor that regulates apoptosis in the nervous system. The authors show that TRIM39 can promote the ubiquitination of NFATc3 and regulate its half-life. Furthermore, ubiquitination depends on the SUMOylation state of NFATc3, which suggests that TRIM39 could be a new example of SUMOylation-dependent ubiquitin ligase or STUbL. In addition, the authors show that TRIM17 interferes with TRIM39 ubiquitination, representing a new regulatory level for NFATc3 degradation. This has consequences on the regulation of apoptosis in cells derived from the nervous system.

      The authors show well-controlled, sound results for the most part. The manuscript is well written, and argumentation is convincing. Given the fact that only 2 STUbLs were previously characterized in mammals, the results are relevant and represent an advance in the field. Overall, this is a nice piece of work.

      Audience: researchers interested in proteostasis in general and in nervous system regulation

      My expertise: post-translational modifications

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Desagher and co-workers investigate the regulation of the NFAT family member NFATc3, a transcription factor in neurons with a pro-apoptotic role. They identify TRIM39 as a ubiquitin E3 ligase regulating NFATc3. They demonstrate that TRIM39 can bind and ubiquitinate NFATc3 in vitro and in cells. They identify a critical SUMO interaction motif in TRIM39, that is required for its interaction with NFATc3 and for its ability to ubiquitinate NFATc3. Moreover, mutating sumoylation sites in NFATc3 reduces the interaction with TRIM39 and reduces its ubiquitination. Silencing TRIM39 increases the protein levels of NFATc3 and its transcriptional activity, leading to apoptosis of neurons. TRIM17 modulates the TRIM39-NFATc3 axis. Combined, TRIM39 appears to be a SUMO-targeted ubiquitin ligase (STUbL) for NFATc3 in neurons.

      Major points:

      1.This manuscript contains two stories: the rather exciting story that TRIM39 is a STUbL for NFATc3 (as mentioned in the title) and the second, less exciting story that TRIM17 modulates the regulation of NFATc3 by TRIM39. These stories are now mixed in a confusing manner, disrupting the flow of the first story. It would be better to focus the current manuscript on the first story and strengthen it further, and to develop the second story in a second manuscript.

      2.Whereas the cellular experiments to indicate that TRIM39 acts as a STUbL are properly carried out, the observed effects are not necessarily direct. Direct evidence that TRIM39 is indeed a STUbL for sumoylated NFATc3 needs to be obtained in vitro, using purified recombinant proteins. Does TRIM39 indeed preferentially ubiquitinate sumoylated NFATc3? Is ubiquitination reduced for non-sumoylated NFATc3? Is ubiquitination of sumoylated NFATc3 dependent on SIM3 of TRIM39? Do other SIMs in TRIM39 contribute?

      3.Rule out potential roles for other STUbLs by including control knockdowns of RNF4 and RNF111 and verify the sumoylation of NFATc3 and ubiquitination of wildtype and sumoylation-mutant NFATc3.

      4.Figure 6B: use SUMO inhibitor ML-792 to demonstrate that ubiquitination of wildtype NFATc3 by TRIM39 is dependent on sumoylation.

      Minor points:

      5.Figure 1A and B: demonstrate by immunoprecipitation and Western that the endogenous counterparts indeed interact.

      6.Figure 1C and 1E: Quantify the PLA results properly and perform statistics.

      7.Figure 2B: Correct unequal loading of samples.

      8.Figure 6B: proper statistics are needed here from at least three independent experiments.

      Significance

      Humans have over 600 different ubiquitin E3s. Currently, RNF4 and RNF111 are the only known human SUMO-Targeted Ubiquitin Ligases (STUbLs). Here, the authors present evidence that the ubiquitin E3 ligase TRIM39 is a STUbL for sumoylated NFATc3. Identification of a new STUbL is an exciting finding for the ubiquitin and SUMO field and for the field of ubiquitin-like signal transduction in general, but needs to be strengthened as outlined above. My field of expertise is SUMO and ubiquitin signal transduction.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      **A. Summary:**

      In this modeling study, the authors devised a multicellular model to investigate how circadian clocks in different parts (organs) of plants coordinate their timing. The model uses a plausible mechanism to explain how having a different sensitivity to light leads to the different phases and periods of the circadian clock that are observed in different plant organs. The model allows for entrainment in Light-Dark (LD) cycles and then a release into constant-light (LL) environments. The model disentangles numerous factors that have confounded previous experiments. In one instance, the authors assigned different light sensitivities to the different organs (e.g., root tip, hypocotyl, etc.), which unambiguously shows that this one element alone - spatially differing sensitivity to light - is sufficient for recapitulating experimentally observed differences in periods and phases between plant organs. The model also recapitulates the spatial waves of gene expression within and between organs that experimentalists reported. At the sub-tissue level, the model-produced waves have similar patterns to the experimentally observed waves. This confirmation further validates the model. Having the cells share clock mRNA, from any of the clock component genes, produced the same experimentally observed spatial dynamics. The main conclusion of the study is that regional differences (e.g., between different organs) in light sensitivities, when combined with cell-to-cell sharing of clock-gene mRNAs, enable robust, yet flexible, circadian timing under noisy environmental cycles.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      **B. Specific points:**

      1.Lines 125-127: "To simulate the variability observed in single cell clock rhythms, we multiplied the level of each mRNA and protein by a time scaling parameter that was randomly selected from a normal distribution." - Why not add a white (Gaussian) noise term to these equations? How is multiplying by a random variable (for rescaling time) different from my proposal? Some explanation should be given in the text here.

      We opted for a time scaling approach as this generates between-cell period differences but avoids within-cell period differences. This is consistent with single cell experiments (S1 Fig; Gould et al., 2018, eLife). We will provide an explanation of this in the text.
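
      The distinction between the two approaches can be sketched with a toy example. The code below is only an illustration under stated assumptions: it uses a single phase variable per cell as a stand-in for the full clock model, and the function name `cycle_lengths`, the 24 h base period, the 5% spread of the scaling parameter and the noise amplitude are all hypothetical choices, not values from the manuscript. With a per-cell time-scaling factor, each cell has its own period but every cycle within a cell has the same length; with an additive white-noise term, successive cycles within the same cell differ in length.

```python
import math
import random

def cycle_lengths(time_scale, noise_sd, rng, n_cycles=5, dt=0.01, base_period=24.0):
    """Simulate one phase oscillator (Euler-Maruyama) and return the duration in
    hours of successive cycles.

    time_scale : per-cell factor that rescales time (drawn once per cell)
    noise_sd   : amplitude of additive white noise on the phase (0 = deterministic)
    """
    omega = time_scale * 2.0 * math.pi / base_period
    theta, t, last_crossing = 0.0, 0.0, 0.0
    target = 2.0 * math.pi  # phase at which the current cycle is complete
    lengths = []
    while len(lengths) < n_cycles:
        theta += omega * dt + noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if theta >= target:
            lengths.append(t - last_crossing)
            last_crossing = t
            target += 2.0 * math.pi
    return lengths

rng = random.Random(1)

# (a) Time scaling: one random factor per cell, no noise within the cell.
scaled_cells = [cycle_lengths(rng.gauss(1.0, 0.05), 0.0, rng) for _ in range(3)]

# (b) White noise: identical cells, but the phase is perturbed at every step.
noisy_cell = cycle_lengths(1.0, 0.05, rng)

for i, lengths in enumerate(scaled_cells):
    print(f"scaled cell {i}: " + ", ".join(f"{x:.2f}" for x in lengths))
print("noisy cell:    " + ", ".join(f"{x:.2f}" for x in noisy_cell))
```

      In this sketch the scaled cells differ from one another but each repeats the same cycle length, whereas the noisy cell shows cycle-to-cycle variation, which is the within-cell period variability that the time-scaling approach avoids.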

      2.Does the spatial network model simplify calculations by assuming separations of timescales (e.g., for equilibration in concentrations of mRNAs that diffuse between cells)? If so, it would be good to spell these out in the beginning of the Results section (where the model is described).

      We agree that a more detailed discussion of the model assumptions would be beneficial and we will provide this in the text.

      3.Lines 161-162: "....in a phase only model by local...." should be "....in a phase model only by local...."

      Thank you for your correction.

      4.Lines 188-190: The authors observed that qualitatively similar/indistinguishable behaviors arose regardless of which elements are varied (e.g., global versus local cell-cell coupling, setting light input to be equal in all regions of the seedling, etc.). Then they claim here that "...these results show that the assumptions of local cell-to-cell coupling and differential light sensitivity between regions are the key aspects of our model that allow a match to experimental data." - I don't see how this follows from the observation that almost any of the variations leads to the same behaviors in this section (spatial waves). Show the reasoning in the text here.

      We observed spatial waves with different local coupling regimes (4 v. 8 nearest neighbours). However, we did not observe spatial waves with global coupling (S10 Fig). This led us to conclude that local coupling is a key aspect. In addition, we do not observe waves when setting the light input to be equal in all regions of the seedling (S11 Fig). This confirms that local differences in light sensitivity are also required in our simulations to generate spatial waves. We will clarify these points with revisions to the text.

      5.Pgs. 9-10: Section on "Cell-to-cell coupling maintains global coordination under noisy light-dark cycles": The simulation results rigorously support the authors' main conclusion here, which is that local cell-to-cell coupling allows for global coordination under noisy LD cycles. But I'm missing an intuitive explanation (or just any explanation) for why this is. At the end of this section, the authors should provide some intuition or qualitative explanation for the observations that they produced using their model in this section.

      We will revise the text to provide an intuitive explanation of these results. The coupling decreases the within-region phase differences. Despite the between-regions phase differences persisting, this effect is sufficient to improve the overall global synchrony.
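
      The intuition can be made concrete with a small sketch. It assumes a Kuramoto-style order parameter as the measure of global synchrony, which may differ from the synchronisation index used in the manuscript; the two-region layout, the phase offset of 1 rad and the spreads are hypothetical values chosen for illustration.

```python
import cmath
import math


def sync_index(phases):
    """Kuramoto-style order parameter: 1 for identical phases, near 0 for
    phases spread evenly around the circle."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def two_region_phases(spread, offset=1.0):
    """Phases (radians) of three cells in each of two regions.

    `offset` is the between-region phase difference and `spread` the
    within-region deviation around each regional mean.
    """
    region_a = [-spread, 0.0, spread]
    region_b = [offset - spread, offset, offset + spread]
    return region_a + region_b


if __name__ == "__main__":
    weak = sync_index(two_region_phases(spread=0.8))    # little coupling
    strong = sync_index(two_region_phases(spread=0.1))  # strong coupling
    print(weak < strong)
```

      The between-region offset is the same in both cases; only the within-region spread shrinks, and that alone raises the global index.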

      6.Lines 261-262: Replace the present tenses with past tenses.

      Thank you for your correction.

      7.Is the main idea that cell-to-cell coupling allows for averaging of fluctuations, between organs or cells within the same organ, while allowing for coordination of the average quantities? Is this responsible for both the flexibility and robustness observed under noisy environmental cycles?

      The cell-to-cell coupling allows for the averaging of fluctuations between cells, and the regional flexibility arises from the different light sensitivities in each region. What was interesting to us was that under light-dark cycles the regional flexibility was not lost due to either the noise in the light or the averaging effect of the cell-to-cell coupling. We will revise the text to emphasize these points. Thank you for your prompts.

      8.Line 304: Is it really true that the mammalian circadian rhythm is centralized? Don't some parts of our bodies have different circadian clock (e.g., slight differences in phase) than some other parts of our bodies?

      There are indeed some small phase differences between parts of our bodies because the mammalian system, like the plant system, is imperfectly coupled. However, the mammalian system is considered more centralized because the suprachiasmatic nucleus in the brain receives the key entraining signal of light and then coordinates rhythms across the body (Bell-Pedersen et al., 2005, Nat Rev Gen; Brown & Azzi, 2013, Circadian Clocks). We will expand on these interesting points by adding a paragraph to the discussion.

      Reviewer #1 (Significance):

      **Overall assessment:**

      I enthusiastically recommend this work for publication after the authors address my comments below (please see "Specific points").

      The model's main strength is that the authors could vary each ingredient separately - light sensitivity of each cell/organ, which gene's mRNA diffuses between cells, cellular noise, local versus global cell-cell coupling, etc. Afterwards, the authors could determine which of these variations produces which experimentally observed behaviors. Another strength of the model is that it can reproduce not just one, but numerous, experimentally observed behaviors that are important for understanding circadian clocks in plants. Thus, the model is grounded in experimental truth and produces experimentally observed results. Crucially, since the authors could vary every single element in the model independently of the other elements, the authors are able to provide plausible explanations for why the experiments produced the results that they did (experimentally, a number of confounding factors prevented one from pinpointing which element produced which observation).

      Another strength is that the model is extendable by other researchers to investigate other plant physiologies in the future (e.g., circadian clock's influence on cell division). The authors highlight these future uses in the discussion section. Therefore, I believe that this work will be valuable to plant biologists, non-plant biologists who are interested in circadian clocks, and systems biologists in general.

      The manuscript is also well written and relatively easy to follow, even for non-plant biologists like myself.

      Thank you for the positive feedback - we are pleased that you find the manuscript of broad interest to a range of readers.

      Comment on Reviewer #2:

      I agree with his/her major criticism #3 (ELF4 long-distance movement). I find this to be a reasonable request. Fulfilling it would increase the paper's impact.

      Please see our response to reviewer #2.

      Comment on Reviewer #3:

      The reviewer's point (1) is a reasonable request.

      Regarding his/her point (2): This is also reasonable. I'd recommend his/her suggestion (a). In the end, I'd be interested to see how the authors respond to this (what function they choose to let adjacent cells be subjected to some correlated light-input intensity). I'd be happy with something simple such as + noise, where is a deterministic term that, for example, decreases exponentially as one moves away from some central cell. Basically, I'd let the authors decide how to implement this and accept their current implementation - no correlation in light-intensity between adjacent cells - as an extreme scenario, as this reviewer points out.

      Please see our response to reviewer #3.

      Reviewer #2 (Evidence, reproducibility and clarity):

      **Summary:**

      The manuscript presents an improved model of the circadian clock network that accounts for tissue-specific clock behavior, spatial differences in light sensitivity, and local coupling achieved through intercellular sharing of mRNA. In contrast to whole-plant or "phase-only" models, the authors' approach enables them to address the mechanism behind coupling and how the clock maintains regional synchrony in a noisy environment. Using 34 parameters to describe clock activity and applying the properties mentioned above, the authors demonstrate that their model can recapitulate the spatial waves in circadian gene expression observed and can simulate how the plant maintains local synchrony with regional differences in rhythms under noisy LD cycles. Spatial models that incorporate cell-type-specific sensitivities to environmental inputs and local coupling mechanisms will be most accurate for simulating clock activity under natural environments.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      *We have the following **major criticisms***

      1) When assigning light sensitivities in different regions of the plant, the authors assign a higher sensitivity value to the root tip (L=1.03) than they do to the other part of the root (L=0.90). We are curious why the root tip would have higher light sensitivity than the rest of the root. Is this based on experimental data (if so, please cite in this section or methods)? It seems that these L values were assigned simply to make sure they recapitulated the period differences observed in Fig. 2A. Are these values based on PhyB expression in those organs? Or perhaps based on cell density in those locations?

      We assign the light sensitivity to match observed experimental period differences across the plant (Fig 2A,B). This is based on previous experiments demonstrating that experimental period differences are dependent on light input through the light sensing gene PHYB (Greenwood et al., 2019, PLoS Bio; Nimmo et al., 2020, Physiologia Plantarum). For example, in WT seedlings, the root tip oscillates faster than the root, but this difference is lost in the phyb-9 mutant (Greenwood et al., 2019). Thus, we assume the root tip to be more sensitive to light than the roots.

      Further supporting this assumption, there is evidence that expression of phytochromes and cryptochromes are increased in the root tip relative to the root (e.g., Somers & Quail, 1995, Plant J; Bognar et al., 1999, PNAS; Toth et al., 2001, Plant Physiol), as the reviewer proposes. However, further experiments would be needed to verify that these differences in expression are what lead to the differences in clock timing. We will add a discussion of these experiments to the text.

      2) In the discussion of the test where they set the "light inputs to be equal" in all regions to simulate the phyb-9 mutant, could the authors please clarify whether that means they set the L light sensitivity value equal in all regions?

      This is indeed what we mean; we will rephrase the text for clarity.

      a. If they are referring to setting the L value equal to all regions, we suggest that this discussion be moved to the section about different light sensitivities instead of the local sharing of mRNA section.

      Thank you for your suggestion. We agree and will move this discussion.

      b. Additionally, is it possible to set the light sensitivity to zero for all parts of the plant? We think this would be more suitable to simulate the phyb-9 mutant phenotype.

      We thank the reviewer for this suggestion. We will include a simulation with light sensitivity set to zero in the revised manuscript, in addition to the existing simulations with light sensitivity set to 1.

      3) Based on the recent Chen et al. (2020) paper showing ELF4 long-distance movement, we think it would be of great interest for the authors to model ELF4 protein synthesis/translation as the coupling factor, in addition to the modeling using CCA1/LHY mRNA sharing. We understand you may be saving this analysis for a future modeling paper, but this addition to the paper could increase the impact of this paper.

      Thank you for the suggestion to improve our manuscript. We agree it will be of interest to model ELF4 protein as the local coupling factor. In the revision, we will simulate each clock protein (including ELF4) as the local coupling factor and compare.

      In addition, we will modify the coupling mechanism to simulate the long-distance transport of ELF4 proposed by Chen et al., 2020. Our preliminary simulations show that we can couple shoot rhythms to those in the root tip, but that this long-range coupling cannot on its own generate the spatial structure observed in experiments. We agree with the reviewers that this analysis and an associated discussion will further increase the impact of the paper.

      4) This model is able to simulate circadian rhythms under 12:12 LD cycles, which represents two days of the year: the equinoxes. We are curious if the model can simulate rhythms under short days and long days as well. We understand this analysis may be outside the scope of this paper and may require changing the values of the 34 parameters used but think it could be a useful addition here or in future work.

      We agree it would be interesting to observe the behavior of the model under different day lengths. We will include simulations under short and long days in the revision.
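
      As a minimal sketch of how different day lengths might be supplied to such a model, the light input can be written as a square wave with an adjustable photoperiod. The function name `light_input`, the 8 h and 16 h day lengths and the default `l_max` are assumptions made for illustration, not values from the manuscript.

```python
def light_input(t, day_length=12.0, l_max=1.0, cycle=24.0):
    """Idealized light-dark cycle: `l_max` during the first `day_length`
    hours of each `cycle`-hour day, 0 during the night."""
    if not 0.0 <= day_length <= cycle:
        raise ValueError("day_length must lie between 0 and cycle")
    return l_max if (t % cycle) < day_length else 0.0


if __name__ == "__main__":
    hours = range(0, 24, 2)
    for name, day_length in (("short", 8.0), ("equinox", 12.0), ("long", 16.0)):
        print(name, [light_input(h, day_length) for h in hours])
```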

      *And **minor criticisms** as follows*

      1) In the first paragraph of the results section, it would be helpful for the authors to reference Table S1 when they mention the 34 parameters used to model oscillator function

      We agree and we will implement this helpful suggestion.

      2) In the first paragraph of the section titled "Local flexibility persists under idealized and noisy LD cycles", it would be helpful for the authors to reference S12 Fig after the last sentence that starts "However, ELF4/LUX appeared more synchronized..."

      We agree and we will implement this helpful suggestion.

      3) In the first paragraph of the section titled "Cell-to-cell coupling maintains global communication under noisy light-dark cycles", the authors refer to a "Table 1" but I think they mean to refer to "Table S1".

      Thank you, we will implement this helpful suggestion.

      4) In Fig. 1, panel C is described as demonstrating the cell-to-cell coupling through the "level of CCA1/LHY". This phrasing is vague and we think could be improved to the "mRNA level of CCA1/LHY".

      We agree and will implement this helpful suggestion.

      Reviewer #2 (Significance (Required)):

      This work would be broadly interesting to other researchers studying cell-to-cell signaling and coupling of circadian rhythms in plants and other species where spatial waves of gene expression have been observed (i.e., mice and humans). Additionally, the computational modeling aspect of this work was easily interpretable for someone outside this expertise. Our expertise lies in plant circadian biology.

      We thank the reviewer for recognising the broad appeal of our work.

      Reviewer #3 (Evidence, reproducibility and clarity):

      **Summary:**

      The authors start by taking a previously published model of the plant circadian clock and implement five changes: 1) updating the network topology to reflect some recent experimental findings, 2) making a spatial model loosely based on a seedling template, 3) introducing coupling between cells based on shared levels of CCA1/LHY, 4) randomly rescaling time in each cell to induce inter-cell differences in period, and 5) including a light sensitivity that depends on the region considered.

      For a certain configuration of light sensitivities/intensities, the different periods of oscillations in each seedling region roughly match that of experiments. With a sufficiently high coupling between cells, the system can also generate spatial waves, which are also observed in the experimental system.

      With pulsed light inputs the spatial pattern is still produced. The authors then investigate the robustness to environmental noise by generating stochastic light signals and show that the global synchrony, as measured with a synchronisation index, increases with cell-to-cell coupling strength. The paper is overall well-written, and the background and details of the analysis are well presented.

      Thank you for your assessment of our work. We plan to make the following revisions based on your feedback.

      **Major comments:**

      For the first part of the paper, the output of the model is certainly the focus. There is virtually no discussion of the inferred parameters and how much confidence the authors have in their values.

      Thank you for this point. We will add discussion of the inferred parameters to the initial part of the results.

      My main issue with the paper is about the section with noisy light signals, which is included in the title and is ultimately one of the main themes of the article.

      Specifically, on line 224:

      "This decrease in cell-to-cell variation revealed an underlying spatial structure (Fig 4D, middle and right, and S13 Fig), comparable to that observed under idealized LD cycles (Fig 4B, middle and right, and S12 Fig)."

      Firstly, I don't feel these conclusions match with the data presented. Comparing figure 4D middle and right with figure 4B middle and right shows a clear and pronounced loss in spatial structure. In its current form, this statement has to change, but I believe there are at least two other major issues with this figure:

      We agree there are some differences in the spatial structure between idealized (Fig 4B) and noisy (Fig 4D) LD cycles. Preliminary simulations suggest that this is due to the way the noisy LD cycles are programmed.

      In the current implementation of noisy LD cycles, the maximum intensity of L, L_max, differs between each region, such that relative differences in light sensitivity between regions are maintained. This means that some phase differences between regions are maintained. However, as the reviewer correctly points out in point 1 below, due to the noise fluctuations, the average level of light is lower than under idealized LD cycles, and with considerable day-to-day variation. We believe this is why the spatial structure differs.

      Preliminary simulations suggest that if we normalize the mean light intensity such that the mean is equal between the two conditions (as the reviewer suggests in point 1 below), the spatial structure appears similar. We will present this analysis in the revision.

      1) The figure is clearly designed to invite a comparison between the noise-free light cycles on the left with the noisy cycles on the right. However, due to how the noisy light is simulated, the variance of light signal increases AND the average intensity of light decreases by 50%. When comparing the left and the right, we therefore don't know whether the changes are due to differences in the average signal or differences from the stochasticity. I think the authors should simulate a noisy light signal with the same mean intensity level as the deterministic signal.

      As discussed above, we agree that the average intensity of the light decreases due to the noise, and this complicates interpretation. We will simulate idealized and noisy light cycles with the same mean light level upon revision.
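
      A minimal sketch of mean matching is given below. It assumes, purely for illustration, that noise is applied by multiplying each time point of the idealized signal by an independent random factor in [0, 1); the actual noise model of the manuscript is not reproduced here. The function name `noisy_light` and the `match_mean` flag are hypothetical.

```python
import random


def noisy_light(ideal, rng, match_mean=False):
    """Return a noisy version of the idealized light signal `ideal`.

    Each time point is multiplied by a random factor in [0, 1), which lowers
    the average intensity. With `match_mean=True` the noisy signal is rescaled
    so that its time-averaged intensity equals that of the idealized signal.
    """
    noisy = [level * rng.random() for level in ideal]
    if match_mean:
        mean_ideal = sum(ideal) / len(ideal)
        mean_noisy = sum(noisy) / len(noisy)
        if mean_noisy > 0.0:
            scale = mean_ideal / mean_noisy
            noisy = [level * scale for level in noisy]
    return noisy


if __name__ == "__main__":
    ideal = ([1.0] * 12 + [0.0] * 12) * 10  # ten 12:12 LD cycles, hourly steps
    raw = noisy_light(ideal, random.Random(0))
    matched = noisy_light(ideal, random.Random(0), match_mean=True)
    for name, signal in (("ideal", ideal), ("raw", raw), ("matched", matched)):
        print(name, round(sum(signal) / len(signal), 3))
```

      With the mean fixed, any remaining difference between idealized and noisy simulations can be attributed to the fluctuations rather than to a lower average light level.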

      2) The noise model for the light doesn't seem realistic. On line 484 it says:

      "We made the simplifying assumption that each cell is exposed to an independent noisy LD cycle due to their unique positions in the environment. LD cycles were input to the molecular model through the parameter L".

      In fact, this could be considered as an incredibly complex signal, because for 800 cells it means drawing 800 random light signals. The implication is that two adjacent cells receive statistically independent light signals. Depending on chance, one cell might receive tropical levels of light while its neighbour experiences a cloudy day. This affects the interpretation and conclusions from figures 4 and 5. I propose two different ways of improving the simulation of the noisy light signal:

      a) In one extreme case, all cells receive the same noisy light signal, and in the other extreme, they all receive independent signals. You could consider a mixture model of light signals, where each cell receives \lambda L_global(t) + (1-\lambda) L_individual(t), where L_global(t) is a global light signal that is shared by all cells and L_individual(t) is a light signal unique to an individual cell. The mixing parameter \lambda controls how similar the light signal is between cells.

      b) Clearly the light signal will differ depending on the region, but there will be some spatial correlation. You could also consider methods of simulating light such that neighbouring cells receive correlated signals, although this might be difficult.

      Thank you for your proposals. We agree that our current implementation of noisy LD cycles represents an extreme scenario. Given that there is no environmental data at sufficient resolution to reliably evaluate which implementation is most realistic, we will explore different approaches based on your suggestions and present them in our revision.
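
      The mixture suggested in (a) can be sketched as follows. The formula is the reviewer's; the function names and the use of uniform random draws for both the global and the individual signal are assumptions made for illustration.

```python
import random


def mixed_light(l_global, l_individual, lam):
    """Per-time-point mixture lam * L_global(t) + (1 - lam) * L_individual(t)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between 0 and 1")
    return [lam * g + (1.0 - lam) * s for g, s in zip(l_global, l_individual)]


def cell_light_signals(n_cells, n_steps, lam, rng):
    """One light signal per cell: a shared global signal mixed with an
    independent individual signal (both drawn uniformly from [0, 1) here)."""
    l_global = [rng.random() for _ in range(n_steps)]
    signals = []
    for _ in range(n_cells):
        l_individual = [rng.random() for _ in range(n_steps)]
        signals.append(mixed_light(l_global, l_individual, lam))
    return signals


if __name__ == "__main__":
    rng = random.Random(1)
    for lam in (0.0, 0.5, 1.0):
        first, second = cell_light_signals(2, 4, lam, rng)
        print(lam, [round(a - b, 3) for a, b in zip(first, second)])
```

      Setting `lam = 1` recovers the case where all cells share one noisy signal, and `lam = 0` recovers fully independent signals, the extreme scenario of the current implementation.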

      Assuming that the problem with the mean signal is corrected, do you expect the average spatial pattern to be the same between figure 4 B and D with no coupling (J=0) (although an increase in the variance between cells)? Perhaps not (owing to nonlinearities in the system), but it would be interesting to comment.

      We agree that the decreased light intensity complicates interpretation of the spatial structure. Although in the current implementation relative light differences between regions are maintained, the spatial structure is altered because the mean intensities are lower. Preliminary simulations with the mean intensity fixed do result in spatial patterns more similar to that seen in Fig 4B, but with increased variance. Comprehensive simulations will be included in the revised manuscript.

      The different periods in the different regions of the seedling are caused by differences in light sensitivity, which the authors claim is justified from refs 12-15. An alternative hypothesis is that biochemical parameters such as degradation rates are different between regions. This is briefly alluded to in the introduction, but I think it would be interesting to discuss further. What would be the pros and cons of the two different mechanisms?

      We agree that an alternative hypothesis is that biochemical parameters such as degradation rates may differ between regions. Experimental evidence, however, lends more support to the light sensitivity hypothesis: for example, mutations in light signalling remove the spatial differences between regions. We agree, though, that this is an important point, and will add a paragraph to the discussion on the pros and cons of the two mechanisms.

      I understand that the authors used a pre-existing model, but I must say that I find the way that light is incorporated into the model a bit confusing.

      On line 345 it says:

      "L(t) represents the input light signal (L = 0, lights off; L > 0, lights on) and D(t) denotes a corresponding darkness input signal (D = 1, lights off; D = 0, lights on)."

      Surely the only thing that matters biophysically is the number of photons hitting the plant? Could you explain why the model needs to have a separate "darkness signal" compared to just a single light signal?

      A darkness signal has been introduced in many circadian clock models because degradation rates of the clock genes can depend upon the light or dark condition. We agree with the reviewer that we should explain this more clearly in the text.
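
      One way to read this, as a hedged sketch: because L(t) is an intensity that can take different positive values between regions, a separate binary indicator D(t) lets a rate switch between a light value and a dark value without scaling with the intensity itself. The rate constants `k_light` and `k_dark` and the function `degradation_rate` are hypothetical and do not correspond to specific terms of the manuscript's equations.

```python
def darkness_signal(light):
    """D = 1 when the lights are off (L = 0), D = 0 when they are on (L > 0)."""
    return 0.0 if light > 0.0 else 1.0


def degradation_rate(light, k_light, k_dark):
    """Hypothetical degradation rate that switches between a light and a
    dark value, independent of how bright the light is."""
    d = darkness_signal(light)
    return (1.0 - d) * k_light + d * k_dark


if __name__ == "__main__":
    for light in (0.0, 0.9, 1.03):  # lights off, then two region-specific intensities
        print(light, darkness_signal(light), degradation_rate(light, 0.2, 0.5))
```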

      In the model, the light intensity changes depending on the region. It might make more sense for interpretability if instead there is an additional light-sensitivity coefficient that depends on the region, because at the moment I'm not sure what units L(t) is supposed to take.

      Thank you for your suggestion. We will try to implement this approach.

      **Minor comments**

      Could you more explicitly describe a possible molecular mechanism through which the coupling acts?

      Thank you for your suggestion. We will more explicitly discuss likely transport mechanisms in the text.

      In Figure 1C it looks like different genes are coupling to different genes, so you may need to rearrange it.

      In our model, the level of CCA1/LHY is shared. Thus, CCA1/LHY from one cell can be considered to repress the expression of other interacting genes in the neighbouring cell.

      Line 103: "We found that regional differences persist even under LD cycles, but cell to-cell minimized differences between neighbor cells." Missing word.

      Thank you for your correction.

      Line 124: "The coupling strength was set to 2 (Methods)." This is meaningless in isolation, so it would be better to briefly explain what the coupling parameter is before mentioning its value.

      Thank you for your suggestion, we will describe the coupling function in more detail.

      Throughout the text, I think De Caluwe should be corrected to De Caluwé

      Thank you for your correction.

      Typo line 493

      Thank you for your correction.

      Code and data are not made available.

      Model code will be made available from our project GitLab page: https://gitlab.com/slcu/teamJL/greenwood_tokuda_etal_2020

      Output of analysis of experimental data and simulations will also be made available on the GitLab page.

      Reviewer #3 (Significance (Required)):

      The authors motivate the paper by highlighting that their proposed model improves on phase-based models in that it describes underlying molecular mechanisms.

      From an experimental side, it's interesting that a model is developed and directly compared with measured spatio-temporal waves of gene expression. From a theoretical side, the authors address questions relating to oscillations, multi-scale modelling and noise robustness that also generalise to other systems. I therefore expect that both experimental and theoretical audiences will be interested in the results.

      There are many possible additions and modifications that could be made to the model, and so the model and analysis could provide a platform for future research. However, I can't comment on whether there are similar pre-existing models of the plant circadian clock that contain both a molecular description of the circadian clock as well as a spatial scale.

      We appreciate the reviewer’s view that the work is interesting to both experimental and theoretical audiences.

      Comments on Review #1:

      The time is rescaled in each cell, meaning that each cell has a unique period, but the dynamics remain deterministic and hence the peak-to-peak times will be exactly the same for each cell. I imagine this isn't completely consistent with single-cell data (if available), where peak-to-peak times are very likely to be variable due to noisy gene expression. In a future paper it would be interesting to analyse the system using stochastic differential equations.

      Please see our response to reviewer #1.

      Comments on Review #2:

      I agree on the following two points:

      1) It would add value to discuss whether the different ranking of light sensitivities by organ matches any available experimental data.

      Please see our response to reviewer #2.

      2) As the Reviewers point out, there are many possibilities for testing the robustness of the system to light cues, including varying the length of the day. Although outside of the scope of this paper, I wonder if it's possible to find data from a light sensor measuring light intensity across an entire year? Plugging such data into the model and measuring how the amplitude and period changes would be really interesting, in my opinion.

      Thank you for your suggestion. We also see this as an interesting future direction.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors start by taking a previously published model of the plant circadian clock and implement five changes: 1) updating the network topology to reflect some recent experimental findings, 2) making a spatial model loosely based on a seedling template, 3) introducing coupling between cells based on shared levels of CCA1/LHY, 4) randomly rescaling time in each cell to induce inter-cell differences in period, and 5) including a light sensitivity that depends on the region considered.

      For a certain configuration of light sensitivities/intensities, the different periods of oscillations in each seedling region roughly match that of experiments. With a sufficiently high coupling between cells, the system can also generate spatial waves, which are also observed in the experimental system.

      With pulsed light inputs the spatial pattern is still produced. The authors then investigate the robustness to environmental noise by generating stochastic light signals and show that the global synchrony, as measured with a synchronisation index, increases with cell-to-cell coupling strength. The paper is overall well-written, and the background and details of the analysis are well presented.

      Major comments:

      For the first part of the paper, the output of the model is certainly the focus. There is virtually no discussion of the inferred parameters and how much confidence the authors have in their values.

      My main issue with the paper is about the section with noisy light signals, which is included in the title and is ultimately one of the main themes of the article.

      Specifically, on line 224:

      "This decrease in cell-to-cell variation revealed an underlying spatial structure (Fig 4D, middle and right, and S13 Fig), comparable to that observed under idealized LD cycles (Fig 4B, middle and right, and S12 Fig)."

      Firstly, I don't feel these conclusions match with the data presented. Comparing figure 4D middle and right with figure 4B middle and right shows a clear and pronounced loss in spatial structure. In its current form, this statement has to change, but I believe there are at least two other major issues with this figure:

      1) The figure is clearly designed to invite a comparison between the noise-free light cycles on the left with the noisy cycles on the right. However, due to how the noisy light is simulated, the variance of light signal increases AND the average intensity of light decreases by 50%. When comparing the left and the right, we therefore don't know whether the changes are due to differences in the average signal or differences from the stochasticity. I think the authors should simulate a noisy light signal with the same mean intensity level as the deterministic signal.

      2) The noise model for the light doesn't seem realistic. On line 484 it says:

      "We made the simplifying assumption that each cell is exposed to an independent noisy LD cycle due to their unique positions in the environment. LD cycles were input to the molecular model through the parameter L".

      In fact, this could be considered as an incredibly complex signal, because for 800 cells it means drawing 800 random light signals. The implication is that two adjacent cells receive statistically independent light signals. Depending on chance, one cell might receive tropical levels of light while its neighbour experiences a cloudy day. This affects the interpretation and conclusions from figures 4 and 5. I propose two different ways of improving the simulation of the noisy light signal:

      a) In one extreme case, all cells receive the same noisy light signal, and in the other extreme, they all receive independent signals. You could consider a mixture model of light signals, where each cell receives \lambda L_global(t) + (1-\lambda) L_individual(t), where L_global(t) is a global light signal that is shared by all cells and L_individual(t) is a light signal unique to an individual cell. The mixing parameter \lambda controls how similar the light signal is between cells.

      b) Clearly the light signal will differ depending on the region, but there will be some spatial correlation. You could also consider methods of simulating light such that neighbouring cells receive correlated signals, although this might be difficult.
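
      One simple way to obtain the spatial correlation mentioned in (b) is sketched below, assuming a one-dimensional chain of cells and a moving average over independent draws. Both the chain geometry and the averaging scheme are assumptions made for illustration; other constructions would serve equally well.

```python
import random


def correlated_light(n_cells, radius, rng):
    """Light intensity per cell on a 1D chain at a single time point.

    Each cell receives the average of independent uniform draws taken within
    `radius` cells of it, so neighbouring cells share most of their draws and
    receive similar intensities. radius = 0 gives independent signals; a
    radius spanning the whole chain gives one shared signal.
    """
    raw = [rng.random() for _ in range(n_cells)]
    smoothed = []
    for i in range(n_cells):
        lo, hi = max(0, i - radius), min(n_cells, i + radius + 1)
        window = raw[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    for radius in (0, 2, 20):
        values = correlated_light(20, radius, random.Random(2))
        print(radius, [round(v, 2) for v in values[:6]])
```

      Note that averaging also reduces the variance of each cell's signal, so a comparison across radii would need the signals rescaled to a common mean and variance.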

      Assuming that the problem with the mean signal is corrected, do you expect the average spatial pattern to be the same between figure 4 B and D with no coupling (J=0) (although an increase in the variance between cells)? Perhaps not (owing to nonlinearities in the system), but it would be interesting to comment.

      The different periods in the different regions of the seedling are caused by differences in light sensitivity, which the authors claim is justified from refs 12-15. An alternative hypothesis is that biochemical parameters such as degradation rates are different between regions. This is briefly alluded to in the introduction, but I think it would be interesting to discuss further. What would be the pros and cons of the two different mechanisms?

      I understand that the authors used a pre-existing model, but I must say that I find the way that light is incorporated into the model a bit confusing.

      On line 345 it says: "L(t) represents the input light signal (L = 0, lights off; L > 0, lights on) and D(t) denotes a corresponding darkness input signal (D = 1, lights off; D = 0, lights on)."

      Surely the only thing that matters biophysically is the number of photons hitting the plant? Could you explain why the model needs to have a separate "darkness signal" compared to just a single light signal?

      In the model, the light intensity changes depending on the region. It might make more sense for interpretability if instead there is an additional light-sensitivity coefficient that depends on the region, because at the moment I'm not sure what units L(t) is supposed to take.

      Minor comments

      Could you more explicitly describe a possible molecular mechanism through which the coupling acts?

      In Figure 1C it looks like different genes are coupling to different genes, so you may need to rearrange it.

      Line 103: "We found that regional differences persist even under LD cycles, but cell to-cell minimized differences between neighbor cells." Missing word.

      Line 124: "The coupling strength was set to 2 (Methods)." This is meaningless in isolation, so it would be better to briefly explain what the coupling parameter is before mentioning its value.

      Through the text, I think De Caluwe should be corrected to De Caluwé

      Typo line 493

      Code and data are not made available.

      Significance

      The authors motivate the paper by highlighting that their proposed model improves on phase-based models in that it describes underlying molecular mechanisms.

      From an experimental side, it's interesting that a model is developed and directly compared with measured spatio-temporal waves of gene expression. From a theoretical side, the authors address questions relating to oscillations, multi-scale modelling and noise robustness that also generalise to other systems. I therefore expect that both experimental and theoretical audiences will be interested in the results.

      There are many possible additions and modifications that could be made to the model, and so the model and analysis could provide a platform for future research. However, I can't comment on whether there are similar pre-existing models of the plant circadian clock that contain both a molecular description of the circadian clock as well as a spatial scale.

      REFEREE'S CROSS-COMMENTING

      Comments on Review #1:

      The time is rescaled in each cell, meaning that each cell has a unique period, but the dynamics remain deterministic and hence the peak-to-peak times will be exactly the same for each cell. I imagine this isn't completely consistent with single-cell data (if available), where peak-to-peak times are very likely to be variable due to noisy gene expression. In a future paper it would be interesting to analyse the system using stochastic differential equations.

      Comments on Review #2:

      I agree on the following two points:

      1) It would add value to discuss whether the different ranking of light sensitivities by organ matches any available experimental data.

      2) As the Reviewers point out, there are many possibilities for testing the robustness of the system to light cues, including varying the length of the day. Although outside of the scope of this paper, I wonder if it's possible to find data from a light sensor measuring light intensity across an entire year? Plugging such data into the model and measuring how the amplitude and period change would be really interesting, in my opinion.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The manuscript presents an improved model of the circadian clock network that accounts for tissue-specific clock behavior, spatial differences in light sensitivity, and local coupling achieved through intercellular sharing of mRNA. In contrast to whole-plant or "phase-only" models, the authors' approach enables them to address the mechanism behind coupling and how the clock maintains regional synchrony in a noisy environment. Using 34 parameters to describe clock activity and applying the properties mentioned above, the authors demonstrate that their model can recapitulate the spatial waves in circadian gene expression observed and can simulate how the plant maintains local synchrony with regional differences in rhythms under noisy LD cycles. Spatial models that incorporate cell-type-specific sensitivities to environmental inputs and local coupling mechanisms will be most accurate for simulating clock activity under natural environments.

      We have the following major criticisms:

      1) When assigning light sensitivities in different regions of the plant, the authors assign a higher sensitivity value to the root tip (L=1.03) than they do to the other part of the root (L=0.90). We are curious why the root tip would have higher light sensitivity than the rest of the root. Is this based on experimental data (if so, please cite in this section or methods)? It seems that these L values were assigned simply to make sure they recapitulated the period differences observed in Fig. 2A. Are these values based on PhyB expression in those organs? Or perhaps based on cell density in those locations?

      2) In the discussion of the test where they set the "light inputs to be equal" in all regions to simulate the phyb-9 mutant, could the authors please clarify whether that means they set the L light sensitivity value equal in all regions? a. If they are referring to setting the L value equal to all regions, we suggest that this discussion be moved to the section about different light sensitivities instead of the local sharing of mRNA section. b. Additionally, is it possible to set the light sensitivity to zero for all parts of the plant? We think this would be more suitable to simulate the phyb-9 mutant phenotype.

      3) Based on the recent Chen et al. (2020) paper showing ELF4 long-distance movement, we think it would be of great interest for the authors to model ELF4 protein synthesis/translation as the coupling factor, in addition to the modeling using CCA1/LHY mRNA sharing. We understand you may be saving this analysis for a future modeling paper, but this addition to the paper could increase the impact of this paper.

      4) This model is able to simulate circadian rhythms under 12:12 LD cycles, which represents two days of the year, the equinoxes. We are curious if the model can simulate rhythms under short days and long days as well. We understand this analysis may be outside the scope of this paper and may require changing the values of the 34 parameters used, but think it could be a useful addition here or in future work.

      And minor criticisms as follows

      1) In the first paragraph of the results section, it would be helpful for the authors to reference Table S1 when they mention the 34 parameters used to model oscillator function

      2) In the first paragraph of the section titled "Local flexibility persists under idealized and noisy LD cycles", it would be helpful for the authors to reference S12 Fig after the last sentence that starts "However, ELF4/LUX appeared more synchronized..."

      3) In the first paragraph of the section titled "Cell-to-cell coupling maintains global communication under noisy light-dark cycles", the authors refer to a "Table 1" but I think they mean to refer to Table S1.

      4) In Fig. 1, panel C is described as demonstrating the cell-to-cell coupling through the "level of CCA1/LHY". This phrasing is vague and we think could be improved to the "mRNA level of CCA1/LHY".

      Significance

      This work would be broadly interesting to other researchers studying cell-to-cell signaling and coupling of circadian rhythms in plants and other species where spatial waves of gene expression have been observed (i.e., mice and humans). Additionally, the computational modeling aspect of this work was easily interpretable for someone outside this expertise. Our expertise lies in plant circadian biology.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      A. Summary:

      In this modeling study, the authors devised a multicellular model to investigate how circadian clocks in different parts (organs) of plants coordinate their timing. The model uses a plausible mechanism to explain how having a different sensitivity to light leads to a different phase and period of the circadian clock, which is observed in different plant organs. The model allows for entrainment in Light-Dark (LD) cycles and then a release in always-light (LL) environments. The model disentangles numerous factors that have confounded previous experiments. In one instance, the authors assigned different light sensitivities to the different organs (e.g., root tip, hypocotyl, etc.), which unambiguously showed that this one element alone - spatially differing sensitivity to light - is sufficient for recapitulating experimentally observed differences in periods and phases between plant organs. The model also recapitulates the spatial waves of gene expression within and between organs that experimentalists reported. At the sub-tissue level, the model-produced waves have similar patterns to the experimentally observed waves. This confirmation further validates the model. Having the cells share clock mRNA from any of the clock component genes produced the same experimentally observed spatial dynamics. The main conclusion of the study is that regional differences (e.g., between different organs) in light sensitivities, when combined with cell-to-cell sharing of clock-gene mRNAs, enable a robust, yet flexible, circadian timing under noisy environmental cycles.

      B. Specific points:

      1.Lines 125-127: "To simulate the variability observed in single cell clock rhythms, we multiplied the level of each mRNA and protein by a time scaling parameter that was randomly selected from a normal distribution." - Why not add a white (Gaussian) noise term to these equations? How is multiplying by a random variable (for rescaling time) different from my proposal? Some explanation should be given in the text here.

      2.Does the spatial network model simplify calculations by assuming separations of timescales (e.g., for equilibration in concentrations of mRNAs that diffuse between cells)? If so, it would be good to spell these out in the beginning of the Results section (where the model is described).

      3.Lines 161-162: "....in a phase only model by local...." should be "....in a phase model only by local...."

      4.Lines 188-190: The authors observed that qualitatively similar/indistinguishable behaviors arose regardless of which elements are varied (e.g., global versus local cell-cell coupling, setting light input to be equal in all regions of the seedling, etc.). Then they claim here that "...these results show that the assumptions of local cell-to-cell coupling and differential light sensitivity between regions are the key aspects of our model that allow a match to experimental data." - I don't see how this follows from the observation that almost any of the variations lead to the same behaviors in this section (spatial waves). Show the reasoning in the text here.

      5.Pgs. 9 -10: Section on "Cell-to-cell coupling maintains global coordination under noisy light-dark cycles": The simulation results rigorously support the authors' main conclusion here, which is that local cell-to-cell coupling allows for global coordination under noisy LD cycles. But I'm missing an intuitive explanation (or just any explanation) for why this is. At the end of this section, the authors should provide some intuition or qualitative explanation for the observations that they produced using their model in this section.

      6.Lines 261-262: Replace the present tenses with past tenses.

      7.Is the main idea that cell-to-cell coupling allows for averaging of fluctuations, between organs or cells within the same organ, while allowing for coordination of the average quantities? Is this responsible for both the flexibility and robustness observed under noisy environmental cycles?

      8.Line 304: Is it really true that the mammalian circadian rhythm is centralized? Don't some parts of our bodies have different circadian clock (e.g., slight differences in phase) than some other parts of our bodies?
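
      The contrast raised in point 1 (rescaling time per cell versus adding a white-noise term) can be illustrated with a toy example. The sketch below assumes a single phase oscillator as a stand-in for the full clock model; the function names, the Euler-Maruyama step, and every parameter value are hypothetical and are not taken from the manuscript.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def cycle_intervals(tau=1.0, sigma=0.0, period=24.0, dt=0.01, t_end=240.0, seed=0):
    """Integrate d(theta) = tau * omega * dt + sigma * dW (Euler-Maruyama).

    tau   : per-cell time-scaling factor (changes the period deterministically)
    sigma : strength of an additive white-noise term (0 means no noise)
    Returns the times between successive first crossings of 2*pi*k.
    """
    rng = random.Random(seed)
    omega = TWO_PI / period
    theta, next_level, last_cross = 0.0, TWO_PI, 0.0
    intervals = []
    for step in range(1, int(round(t_end / dt)) + 1):
        theta += tau * omega * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if theta >= next_level:
            t = step * dt
            intervals.append(t - last_cross)
            last_cross = t
            next_level += TWO_PI
    return intervals


def spread(values):
    """Range (max minus min) of a list of cycle lengths."""
    return max(values) - min(values)
```

      In this toy setting, a cell with tau = 1.1 and sigma = 0 has a shorter period but identical cycle lengths (up to the step size dt), whereas tau = 1 with sigma > 0 gives cycle lengths that vary from one cycle to the next. Whether the same distinction carries over to the full model would need to be checked by the authors.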

      Significance

      Overall assessment:

      I enthusiastically recommend this work for publication after the authors address my comments below (please see "Specific points").

      The model's main strength is that the authors could vary each ingredient separately - light sensitivity of each cell/organ, which gene's mRNA diffuses between cells, cellular noise, local versus global cell-cell coupling, etc. Afterwards, the authors could determine which of these variations produces which experimentally observed behaviors. Another strength of the model is that it can reproduce not just one, but numerous, experimentally observed behaviors that are important for understanding circadian clocks in plants. Thus, the model is grounded in experimental truth and produces experimentally observed results. Crucially, since the authors could vary every single element in the model independently of the other elements, the authors are able to provide plausible explanations for why the experiments produced the results that they did (experimentally, a number of confounding factors prevented one from pinpointing to which element produced which observation).

      Another strength is that the model is extendable by other researchers to investigate other plant physiologies in the future (e.g., the circadian clock's influence on cell division). The authors highlight these future uses in the discussion section. Therefore, I believe that this work will be valuable to plant biologists, non-plant biologists who are interested in circadian clocks, and systems biologists in general.

      The manuscript is also well written and relatively easy to follow, even for non-plant biologists like myself.

      REFEREE'S CROSS-COMMENTING

      Comment on Reviewer #2:

      I agree with his/her major criticism #3 (ELF4 long-distance movement). I find this to be a reasonable request. Fulfilling it would increase the paper's impact.

      Comment on Reviewer #3:

      The reviewer's point (1) is a reasonable request. Regarding his/her point (2): This is also reasonable. I'd recommend his/her suggestion (a). In the end, I'd be interested to see how the authors respond to this (what function they choose to let adjacent cells be subjected to some correlated light-input intensity). I'd be happy with something simple such as <intensity> + noise, where <intensity> is a deterministic term that, for example, decreases exponentially as one moves away from some central cell. Basically, I'd let the authors decide how to implement this and accept their current implementation - no correlation in light-intensity between adjacent cells - as an extreme scenario, as this reviewer points out.
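
      One possible reading of the "<intensity> + noise" suggestion is sketched below. It assumes a one-dimensional row of cells, an exponential decay with a hypothetical length scale, and Gaussian noise clipped at zero; none of these choices or names come from the manuscript.

```python
import math
import random


def light_inputs(n_cells, centre, peak=1.0, length_scale=5.0, noise_sd=0.1, seed=0):
    """Light input per cell: deterministic <intensity> plus independent noise.

    <intensity> decays exponentially with distance (in cells) from `centre`,
    so neighbouring cells share similar mean inputs.
    """
    rng = random.Random(seed)
    inputs = []
    for cell in range(n_cells):
        mean = peak * math.exp(-abs(cell - centre) / length_scale)
        inputs.append(max(0.0, mean + rng.gauss(0.0, noise_sd)))  # no negative light
    return inputs
```

      Setting noise_sd to 0 recovers the purely deterministic profile, while a large noise_sd relative to peak approaches the uncorrelated extreme of the current implementation.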

    1. Reviewer #3:

      I found the question, approach and analysis provide a clever framework for understanding how vigilance changes over time. I believe this work will contribute greatly to the literature. However, I have one main concern in the interpretation of the patterns of results and the a priori assumptions that are made, but never explicitly discussed or justified.

      The introduction makes it clear that the authors acknowledge that there may be multiple sources of interference contributing to declining vigilance over time: the encoding of sensory information, appropriate responses to the stimuli, or a combination of both. In the introduction, it would help if the authors review how infrequent targets affect response patterns.

      In addition, it would help if the theoretical approach and assumptions of the authors were explicitly stated. On p. 23, lines 481-483: The connectivity analysis between the frontal and occipital areas as a way to get at the effect of vigilance is useful, but some consideration of the theoretical justification for this analysis should be added here. The a priori assumption surrounding this analysis should be acknowledged and discussed in the interpretation of the pattern of results (e.g., p. 32, line 658). Based on the analysis between frontal and occipital areas, we have to assume it's the sensory processing alone, but this does not preclude other influences. For instance, effects could also occur on response patterns. These considerations should be added as caveats to the interpretation and to avoid the impression of a confirmation bias.

    2. Reviewer #2:

      In the manuscript "Neural signatures of vigilance decrements predict behavioural errors before they occur", Karimi-Rouzbahani and colleagues present a study which used a multiple-object monitoring task in combination with magnetoencephalography (MEG) recordings in humans to investigate the neural coding and decoding-based connectivity of vigilance decrements. They found that increasing the rarity of targets led to weaker decoding accuracy for the crucial feature (distance to an object), and weaker decoding was also found for misses compared to correct responses. They also report a drop in decoding-based connectivity between frontal and occipital/parietal regions of interest for misses, and they could predict upcoming performance errors early during a trial based on accumulative decoding accuracy for the relevant target feature.

      This is an interesting study with a quite complex paradigm and a very interesting analysis approach. However, the logic of the approach and the results are rather difficult to unpack, and I am not convinced that it is always correct. My main issues are: Firstly, it is not clear what role eye fixations play here. Participants could freely scan the display, so the retinotopic representations would change depending on where the participants fixate, but at the same time the authors claim that eye position did not matter. Secondly, the display of the results is very dense, and it is not always clear whether decoding for a specific variable was above chance or not. The authors often focused on relative differences, making it difficult to fully understand the meaning of the full pattern of results. Thirdly, the connectivity analysis appears to be a correlation of decoding results between two regions of interest. The more parsimonious interpretation here is that information might have been represented across all channels at this time. Lastly, while this is methodologically interesting work, there is no convincing case made for what exactly the contribution of this study is for theories of vigilance. It seems that the findings can be reduced to that a lack of decodability of relevant target features from brain activity predicts that participants will miss the target. I have outlined my specific comments below.

      1) Methods, Page 11: The authors state that "We did not perform eye-blink artefact removal because it has been shown that blink artefacts are successfully ignored by multivariate classifiers as long as they are not systematically different between decoded conditions (Grootswagers et al., 2017)." I actually doubt that this is really true. Firstly, the cited paper makes a theoretical argument rather than showing this empirically. Secondly, even if this were true, the frequency of eye-related artefacts seems to be of crucial importance for a paradigm that involves moving stimuli (and no fixation). There could indeed be systematic differences between conditions that are then picked up by the classifier (i.e. if more eye-blinks are related to tiredness and in turn decreased vigilance). The authors should show that their results replicate if standard artefact removal is performed on the data.

      2) Relatedly, on page 16 the authors claim that "If the prediction from the MEG decoding was stronger than that of the eye tracking, it would mean that there was information in the neural signal over and above any artefact associated with eye movement." In my view, this statement is problematic: Firstly, such a result might only mean that prediction from MEG decoding is stronger than decoding from eye-movements, but not relate to "artefacts" in general, to which blinks would also count. Secondly, given that the signal underlying both analyses is entirely different (and the number of features), it is not valid to directly compare the results between these analyses.

      3) Results: The Bayes-factor plots in the decoding results figures are so cramped that it is very difficult to actually see the individual dots and to unpack all of this (e.g., Fig 3). I'm wondering whether this complexity could be somehow reduced, maybe by dividing the panels into separate figures? The two top panels in Figure 3B should also include the chance level as in A. It looks like the accuracy is very low for unattended trials, which is only true in comparison to attended trials, but (as also shown in Supplementary Figure 1) it was clearly also encoded in unattended trials, which is very important for interpreting the results.

      4) The section on informational brain connectivity already contains a fair bit of interpretation and discussion in relation to the literature (e.g., "Weaker connectivity between occipital and frontal areas could have led to the behavioural misses observed in this study [...]"). This should be avoided.

      5) Relatedly, if I understand the informational brain connectivity analysis correctly, the authors only show that frontal and occipital/parietal patterns of decoding results are correlated? This means, if one "region" allows for decoding the distance to the object, the other one does too. However, this alone does not equal connectivity. It could simply mean that patterns across the entire brain allow for decoding the same information. For example, it would not be surprising to find that both ROIs correlate more strongly for correct trials (i.e. the brain has obviously represented the relevant information) than for errors (i.e. the brain has failed to represent the information), without this necessarily being related to connectivity at all. The information might simply be spread-out across all channels. The authors show no evidence that only these two (arbitrarily selected) "regions" encode the information while others do not. In my view, to show evidence for meaningful connectivity, a) the spread of information should be limited to small sub-regions, and b) the decoding results in one "region" should predict the results in another region in time (as for DCM).

      6) Predicting miss trials: The implicit assumption here is that there is "less representation" for miss trials compared to correct trials (e.g., of distance to object). But even for miss trials, the representation is significantly above chance. However, maybe the lower accuracy for the miss trials resulted from on average more trials in which the target was not represented at all rather than a weaker representation across all trials. This would call into question the interpretation of a decline in coding. In other words, on a single trial, a representation might only be present (but could result in a miss for other reasons) or not present (which would be the case for many miss trials), and the lower averages for misses would then be the result of more trials in which the information was completely absent.

      7) Having said that, I am wondering whether the results of the subsequent analysis (predicting misses and correct responses before they occur) might be in conflict with my more pessimistic interpretation. If I understand this correctly, here the classifier predicts Distance to Object for each individual trial, and Fig 6B shows that while there is a clear difference between the correct and miss trials, the latter can still be predicted above chance level but never exceed the threshold? If this is true for all single trials, this would indeed speak for a weak but "unused" representation on miss trials. But for this the authors need to show how many of the miss trials per participant had a chance-level accuracy (i.e. might be truly unrepresented), and how many were above chance but did not exceed the threshold (i.e. might have been "less represented").

      8) In general, it is not clear to me how the brain decoding results were impacted by participants freely looking around on the screen. I am not convinced that decoding from the strongly reduced feature space of eye movements necessarily gives an answer. More detailed analyses of fixations and fixation duration on targets and distractors might indeed be strongly related to behaviour. What is decodable at a given time might just be driven by what participants are looking at.

      9) Discussion: The authors discuss their connectivity results in relation to previous studies on connectivity changes in mind wandering. However, given that the connectivity analysis here is questionable, I'm not sure these results can be meaningfully related.

      10) Overall, even if the issues above are addressed, the study only demonstrates that with less attention to the target, there is less evidence of representations of the relevant features of targets in the brain. The authors also find the expected decrements for rare targets and when participants do not actively monitor the targets. While this is interesting, in particular to directly show this in neural representations, I am not sure whether this is also a conceptually novel contribution to the field. It seems that these general effects are quite well-known from previous work (although demonstrated with different methods)? I am not sure how these findings actually contribute to "theories of vigilance", as claimed by the authors.

    3. Reviewer #1:

      Karimi-Rouzbahani and colleagues investigate vigilance and sustained monitoring, using a complex and intriguing task in which participants attend to multiple colored dots moving towards the center and occasionally make. They use computationally sophisticated multivariate analyses of MEG data to disentangle attentional factors in this task. The authors demonstrate that they can decode spatial location of the dot (left vs. right) as well as the spatial distance from the critical deflection location, and relate the multivariate decoding ability to features of the task. In addition, they develop methods that can predict errors by accumulating information from distance-based classifiers in the time window preceding behavioral responses. While I was intrigued by this paper, I had numerous questions about the details of their multivariate pattern analyses and the conclusions that they drew from them.

      1) One key finding was that while classifying the direction of the dots was modulated by attention, it was insensitive to many features that were captured by a classifier trained to decode the distance from the deflection. In some ways, I find this very surprising because both are spatial features that seem hard to separate. In addition, the procedures to decode direction vs distance were very different. Therefore, I wonder if there would still be a lack of an effect if the procedure used to train the direction classifier was more analogous or matched?

      2) The distance classifier was trained using only correct trials. Then in the testing stage, it was generalized to either correct or miss trials. While I understand the rationale for using correct trials, I wonder if decoding of error prediction is an artifact of the training sample, reflecting the fact that misses were not included in the training set?

      3) By accumulating classifiers across time, it looks like classifier prediction improves closer to deflection. However, this could also be due to the fact that the total amount of information provided to the classifier increased. I understand the rationale that additional information improves classification, but I wonder if that is because classifiers are relatively poor at distinguishing adjacent distances? Alternatively, perhaps there is a way to control for the total amount of information at different timepoints (e.g., by using a trailing window lag rather than accumulation), or contrast the classifier that derives from accumulating information with the classifier trained moment-by-moment?

      4) The relationship between the vigilance decrement and error prediction. Is vigilance decrement driving the error prediction? That is, if errors increase later on, and the signal goes down, then maybe the classifier is worse. Alternatively, maybe the classifier predictions do not necessarily monotonically decrease throughout the experiment. I wonder if the classifier is equally successful at predicting errors early and late?

      5) When decoding distance, one thing I found intriguing is that active decoding declines from early to late, even though performance does not decline (or even slightly improves from early to late). This discrepancy seems hard to explain. Is this decline in classification driven by differences in the total signal from early to late?

      6) I noted that classifier performance was extremely high almost immediately after trial onset. Does the classifier perform at chance before the trial onset, or does this reflect sustained but not stimulus-specific information?
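
      The control proposed in point 3 (a trailing window instead of accumulation) can be made concrete with a small sketch. It assumes a hypothetical list of per-time-point classifier scores and simple averaging; the manuscript's actual accumulation procedure is not reproduced here.

```python
def accumulated_evidence(scores):
    """Running mean over all time points so far (sample count grows with time)."""
    out, total = [], 0.0
    for count, score in enumerate(scores, start=1):
        total += score
        out.append(total / count)
    return out


def trailing_window_evidence(scores, window):
    """Mean over the most recent `window` time points (fixed count once warmed up)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

      Because the trailing window holds the number of contributing samples constant after the first `window` points, any remaining improvement towards the deflection would be harder to attribute to the sheer amount of data available to the classifier.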

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      This manuscript is under revision at eLife.

      Summary:

      Karimi-Rouzbahani and colleagues investigate vigilance and sustained monitoring, using a multiple-object monitoring task in combination with magnetoencephalography (MEG) recordings in humans to investigate the neural coding and decoding-based connectivity of vigilance decrements. Using computationally sophisticated multivariate analyses of the MEG data, they found that increasing the rarity of targets led to weaker decoding accuracy for the crucial feature (distance to an object), and weaker decoding was also found for misses compared to correct responses.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcription burst) dominates the variance. Therefore, looking at transcript counts may not be feasible to estimate single-cell circadian phase. However, the study is quite descriptive and ends up being a bit dissatisfying, so if the authors could improve this aspect by perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it would help quite a bit in this regard. The model selection/fitting itself was not really sufficient to compensate for this, as it stands.

      We thank the reviewer for appreciating the new smFISH data, the analyses performed, and the consequences regarding phase inference from single cell snapshots.

      The reviewer suggests “perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho)”, and we have thus added a new Results paragraph (lines 281-316) and two new Supp Figures 13 and 14 to directly address this point.

      Specifically, we have added a dynamic, stochastic model of the circadian clock in order to add mechanistic insight into the parameters of the preferred model M4. Concerning \rho, in the initial manuscript we suggested that the correlations of cell-specific burst sizes (described by the parameter \rho) in the preferred model M4 could result from the underlying network topology. To substantiate this claim, we have now added an analysis of a stochastic model of the clock that includes gene-gene interaction amongst the core-clock genes. The core-clock network involves variables (such as protein levels), parameters (such as mRNA/protein half-lives) and additional genes (such as Clock) that are not directly measurable in our experiments; offering a detailed mechanistic mathematical model for our data is therefore not realistic. We therefore developed a simplified mathematical model for the three measured genes to explore the underlying mechanisms that could control the parameter \rho, as the referee suggests. As a starting point, we used the circadian clock gene network topology for Nr1d1, Cry1 and Bmal1 as modelled in Relógio et al. (Relógio et al., 2011) (see new Supplementary Material). To keep the model close to the inference framework, we used oscillatory functions for the burst frequency while the transcription rate (and hence the burst size) for each gene is affected by the protein levels of the other genes in the network. Using stochastic simulations we show that, for particular configurations of feedback where the negative repression of Nr1d1 by CRY1 is high, the network can generate positive mRNA correlation between Bmal1/Cry1 mRNA and negative correlation between Nr1d1/Cry1 mRNA, as observed in our data (Figure 2C). Furthermore, using the same inference framework as for our data on the simulated mRNA distributions, the obtained \rho is positive for Bmal1/Cry1 and negative for Nr1d1/Cry1, which was also found for our data (Figure 3C). Even though the model is clearly a simplified representation of the clock, these simulations give credence to the scenario that the \rho parameter obtained from the data is a signature of the underlying network topology.

      While the emphasis of the paper is certainly on parameter inference of the single-cell RNA FISH data, we believe the addition of this dynamic model provides more mechanistic insight into the results of the model fitting and hence significantly more depth to the article.

      **Specific comments:**

      1. It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). It is probably technically challenging as the mRNAs are of low abundance. I think it may help if they adjust the contrast for the cytoplasm stain or just delineate the cell boundaries.

      Thank you for pointing this out. We agree that our rendering of the FISH images was not optimal and have now significantly improved it (see new Figures 1A and 2B). Considering the other reviewers’ comments related to the images, we have now 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      2. In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      We used negative binomial distributions to directly model the number of mRNA in the cells, which is a natural choice given that the raw smFISH data are integer counts. The model incorporates cell size dependencies in a unified framework that predicts the joint distribution of raw counts, which is why we showed raw counts in the main figure. That being said, as the referee suggests, it can be useful for exploratory purposes to see the relationship between the measured genes while regressing out the contribution of cell area, and we have now added this analysis as Supp Figure 9. On lines 156-161 we write:

      “To also estimate the correlation between genes while accounting for cell area, we regressed out the area for each gene and recalculated the correlation coefficients [37,38]. Since all genes are positively correlated with area (Fig. 2A), this processing shifted the correlations for both pairs of genes. Specifically, the correlation coefficients for the area-filtered mRNA counts decreased but remained positive for Bmal1/Cry1 and became more negative for Nr1d1/Cry1 (Supp Figure 9).”
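
      The area filtering described in this passage can be sketched as follows. This is a minimal illustration rather than the analysis code used for Supp Figure 9: it assumes an ordinary least-squares fit of counts against area, and the function names (`pearson`, `regress_out`) and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regress_out(counts, area):
    """Residuals of a least-squares fit counts ~ intercept + slope * area."""
    ma, mc = sum(area) / len(area), sum(counts) / len(counts)
    slope = (sum((a - ma) * (c - mc) for a, c in zip(area, counts))
             / sum((a - ma) ** 2 for a in area))
    return [c - (mc + slope * (a - ma)) for a, c in zip(area, counts)]


# Hypothetical toy data: both genes increase with area, but their
# area-independent fluctuations move in opposite directions.
area = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wiggle = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
gene_a = [2 * a + w for a, w in zip(area, wiggle)]
gene_b = [3 * a - w for a, w in zip(area, wiggle)]

raw_corr = pearson(gene_a, gene_b)
filtered_corr = pearson(regress_out(gene_a, area), regress_out(gene_b, area))
print(f"raw: {raw_corr:.2f}, area-filtered: {filtered_corr:.2f}")
```

      In this toy case the shared dependence on area makes the raw correlation positive, while the residuals are anti-correlated, which mirrors the direction of the shift reported for the area-filtered counts.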

      3. For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility for partial synchrony especially considering that the authors didn't show how effective the Dex treatment was to synchronize the circadian phase?

      Thank you for this comment. In fact, we didn't mean to fully eliminate the possibility of imperfect synchronization, but have tried our best to address it both experimentally and with modeling.

      Experimentally, in addition to the Dex treatment, we also compared with a condition in which we entrained the cells using temperature cycles, which is a standard in the field to achieve the best synchronization. We obtained a fold change of 2.1, which was in the range of previous studies (Saini et al., 2012) and was slightly higher than with Dex synchronisation (1.6). Given that the improvement was modest and that it was important for us to study the system under free-running conditions and not in an entrained state (i.e. phase locking, which distorts the free dynamics and noise characteristics of the oscillator), we used the Dex protocol.

      Model M3 was used as a computational approach to correct for the individual phases. In addition to the difficult optimisation landscape, the challenge with model M3 also resides in the difficulty of estimating an individual phase for each cell, as the two mRNA counts measured in each cell do not contain sufficient phase information. This could potentially be resolved by measuring more genes simultaneously, but that is beyond the scope of the present manuscript. We have added discussion of this to the text on lines 244-248:

      “Thus, it was apparently difficult to use model M3 to correct the individual phase for each cell, likely due to the fact that the two mRNA counts measured in each cell do not contain sufficient phase information, and that the global optimisation problem contains many local minima. This could potentially be improved by measuring more genes simultaneously.”

      We have also added a new Results section (lines 305-316) and Supp Figure 14 to show that imperfect synchrony alone cannot explain the correlation structure observed in our data. Indeed, if two genes have a similarly phased oscillation, the expression of the two genes will be positively correlated (as shown in the new Supp Figure 14). Similarly, when the oscillations are in anti-phase, negative correlations will be found. Given that Nr1d1 and Cry1 are closer in phase than Bmal1 and Cry1, one would expect that the correlation between Nr1d1 and Cry1 (once accounting for area) would be more positive than for Bmal1 and Cry1, which was not found in the data (area-corrected correlations shown in Supp Figure 9). It therefore seems unlikely that the observed correlations could be caused by imperfect synchrony alone. Together with our simulations of the gene network (described above), we therefore argue that gene-gene interactions are a more plausible mechanistic explanation of the correlations observed in our measured bivariate mRNA distributions.
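
      The phase argument above can be illustrated with a small sketch. It is not the analysis behind Supp Figure 14: it assumes noise-free sinusoidal mean expression with a 24 h period, and it takes the extreme case of fully desynchronised cells (phases spread evenly over one cycle). The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def phase_correlation(phase_lag, cell_phases):
    """Correlation across cells of two oscillating genes separated by phase_lag.

    Each cell sits at its own clock phase; gene 2 lags gene 1 by phase_lag (radians).
    """
    gene_1 = [math.cos(p) for p in cell_phases]
    gene_2 = [math.cos(p - phase_lag) for p in cell_phases]
    return pearson(gene_1, gene_2)


# Assumption: cells are spread evenly over the whole cycle (no synchrony at all).
n_cells = 360
cell_phases = [2 * math.pi * i / n_cells for i in range(n_cells)]

for lag_hours in (0, 4, 12):
    lag = 2 * math.pi * lag_hours / 24  # convert hours to radians, 24 h period
    print(lag_hours, round(phase_correlation(lag, cell_phases), 3))
```

      Under these assumptions the correlation equals the cosine of the phase lag, so a pair of genes that is closer in phase is expected to be more positively correlated if phase dispersion were the only source of correlation.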

      4. Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically adding degrees of freedom should improve fitting and make it easier for a model to fit, why not in this case? Can the authors provide some explanations/mechanisms.

      We believe there has been a misunderstanding regarding model M4. By adding parameters, model M4 is indeed easier to fit. There is even a problem of overfitting whereby the burst frequency becomes unrealistically high and the model effectively fits a Poisson distribution to each individual cell. To avoid this, we lock the burst frequency values to the posterior mean values from model M2. After describing model M4, we write (lines 260-265):

      “When all parameters are free, we noticed that the burst frequency can become unrealistically high due to a tendency to overfit to individual cells, and we therefore locked the burst frequency to the posterior mean values from model M2. The PSIS-LOO scores overall favoured model M4 (Fig. 3B), and the predicted joint probability density shows good similarity to the observed data (Fig. 3D) (all time points shown in Supp Figure 11).”

      Regarding the above comment in the reviewer’s summary on the contributing factors of model M4, we added a simple dynamical model that attempts to explain at least one possible mechanism for generating correlations in cell-specific bursting parameters (see above).

      5. The authors should include the number (range) of cells analyzed in the figure legends.

      We have now added the number of cells used at each time point to the legend of Figure 1D. To respond to Reviewer #2 we have also added details on the number of smFISH replicates used at each time point. The number of cells for each replicate is shown in Supp Figures 2-5.

      Reviewer #1 (Significance (Required)):

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanisms on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho) it should help improve the manuscript.

      We thank the reviewer for the suggestion to expand on the mechanistic interpretation, which we have followed. In addition, we would like to emphasise that a similar smFISH analysis of the core circadian oscillator has never been done, and we believe our data represent a significant contribution to the field. Moreover, our quite generic probabilistic inference framework for smFISH, which uses mixture models to describe intrinsic (transcriptional bursting) and extrinsic fluctuations, is also novel, and the code provided (written in the Stan probabilistic programming language) may find wide applicability.

      Concerning the mechanistic description, as described above, we added a stochastic, dynamic model of gene expression and propose that gene-gene interactions within the core-clock network topology represent a plausible mechanism for generating correlated burst parameters between genes, which are a feature of the preferred model M4 found during inference. We additionally added an explanatory figure to argue that, given the phase relationship between genes, imperfect synchronisation alone cannot explain the correlations that we observe between the pairs of genes. Together, this analysis provides more mechanistic insight into the underlying factors controlling the gene-gene relationships in our measured bivariate mRNA distributions.

      **Referees cross-commenting**

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However caution should be exercised regarding ref 26 as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are not probably buffered in general.

      Following the suggestion of Reviewer #3, we have expanded the Discussion to include the references cited (Shah & Tyagi, Raj and others).

      Previous work from our lab also nuances the conclusions of references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen them with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text to state explicitly that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors study experimentally and computationally the dynamic transcription of circadian clock genes over time in individual cells with single molecule RNA-FISH with the aim to understand how different noise sources contribute to single cell transcription variability and basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand biology.

      **Major comments:**

      This study has some major limitations that need to be addressed to test the model usefulness, to understand noise sources and to gain biological insights into circadian clocks.

      We thank this reviewer for the constructive feedback which enabled us to significantly strengthen the revised manuscript.

      The limitations are on the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replica experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are and if indeed differences between timepoints can be resolved if multiple biological replica experiments have been analyzed. To address this point at least three biological experiments needs to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading because several hundred cells have been measured which automatically makes the error small. The SEM just describes how well we can determine the mean from a distribution. Instead a mean and std from the biological replicas need to be plotted to show how experimental variability in experiments is resulting in the described expression pattern. This is similar to RNA-seq data or RT-PCR from multiple replica.

      We certainly agree that demonstrating reproducibility is important. Note that our smFISH data is from three independent cell culture dishes and microscopy slides, which included independent cell synchronization. This was described in the Methods but we agree that the data presentation was not showing the individual replicas, which we have now added. In Figure 1B, we now show the mean of each replicate for each time point. While the reviewer suggested displaying the mean and standard deviation across replicates, we show all data points at each time point to make it even more transparent. The mRNA distribution of each replicate is also shown in Supp Figures 2-5, together with individual quantification of mean, coefficient of variation and number of cells.

      In addition, to further demonstrate the reproducibility of the temporal patterns, we have performed an additional independent experiment on four time points. This experiment shows that the oscillatory patterns for Nr1d1 and Cry1 are clearly significant and reproducible (new Supp Figure 7). The combination of the replicates shown for the main experiment (Supp Figures 2-5) and the new replicate experiment (Supp Figure 7) shows that the oscillatory temporal patterns for the mean mRNA levels are robust and reproducible, and in fact similar to those found in bulk analyses (Ukai-Tadenuma et al., 2011; Hughes et al., 2009), which is expected.

      It is also not clear how good the cell segmentation works and how does cell segmentation influence the analysis. In figure 1A show the segmentation of the cell boundary together with the membrane stain.

      Thanks to this and other reviewers’ comments, we have now significantly improved the presentation of the FISH images. We have now 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals.

      We have also added Supp Figure 1 to show that the cell segmentation we used is reliable. In fact, as we had described, we used the sum Z-stack projections of the red channel (Wu et al., 2018), which we found provides the most accurate cell segmentation. We now show in Supp Figure 1 that the obtained segmentation shows convincing agreement with the cell autofluorescence.

      The authors use the RNA mean and RNA-FISH distributions and combine this data to build and compare different models. How do you know that the given data fulfils the central limit so that a model describing the mean is an adequate approach? To test this point, the authors should show through subsampling from the data and the model that indeed their data sets have enough cells to fulfil the central limit theorem.

      This comment reflects a misunderstanding of our approach, which we now try to explain better. In our inference framework we use a negative binomial (NB) distribution (and mixtures of NBs) to model the full distribution of mRNA counts, and our approach is therefore not based exclusively on the mean of the distribution. Model parameters are estimated by Bayesian inference and models are compared using PSIS-LOO (see below). The mixture model of NBs makes a few assumptions, which we had clearly stated. In fact it captures both bursty transcription (in the limit of short bursts as is biologically plausible, which yields the NB distribution), and cell-to-cell variability (extrinsic noise) captured by the mixture. The suitability of the NB to model bursty transcription is established (Raj et al., 2006), and it is parameterized by a mean and a dispersion coefficient, such that the CV of the distribution is the inverse of the burst frequency (Zoller et al., 2015). Therefore the mean is indeed an important parameter of the model, but we do not see the relationship with the CLT. The probabilistic inference used (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al., 2017, see below) is established and state-of-the-art for selecting models of the appropriate complexity, and we are not aware of a similar previous quantitative model for smFISH analysis.
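
      For readers less familiar with this parameterisation, the following sketch evaluates a negative binomial probability mass function specified by a mean and a dispersion (here called `burst_freq`) and checks its first two moments numerically. It is an illustration under the standard textbook parameterisation, in which the variance is mean + mean²/burst_freq, and is not the Stan code accompanying the manuscript; the function name and parameter values are hypothetical.

```python
import math


def nb_logpmf(k, mean, burst_freq):
    """log P(k) for a negative binomial with the given mean and dispersion.

    burst_freq plays the role of the NB dispersion parameter; large values
    approach a Poisson distribution with the same mean.
    """
    r = burst_freq
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mean))
            + k * math.log(mean / (r + mean)))


# Hypothetical parameter values, chosen only for illustration.
mean, burst_freq = 20.0, 2.0

# Sum far enough into the tail that the truncation error is negligible.
pmf = [math.exp(nb_logpmf(k, mean, burst_freq)) for k in range(3000)]
total = sum(pmf)
numeric_mean = sum(k * p for k, p in enumerate(pmf))
numeric_var = sum((k - numeric_mean) ** 2 * p for k, p in enumerate(pmf))

print(total, numeric_mean, numeric_var, mean + mean ** 2 / burst_freq)
```

      The distribution is fitted to every observed count, not only to its mean, which is why the sample mean alone does not determine the likelihood.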

      We have now added significantly more explanations both on the general approach as well as the methodological details in a fully-revised Methods section to avoid further misunderstanding.

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in terms of how fitting and model selection is performed. It is not clear how good the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot their fit errors as a function of model complexity.

      We fully agree that comparing different models using a model selection approach is a powerful methodology; in fact, it is arguably the most systematic way to approach modeling problems in quantitative biology. Model selection is an active research area and there have been significant developments recently. Here, we used a state-of-the-art and established Bayesian approach (PSIS-LOO: Pareto-Smoothed Importance Sampling Leave-One-Out, Vehtari et al., 2017), which is certainly rigorous and more objective than visual comparison. The PSIS-LOO is conceptually similar to other measures of model performance such as AIC or WAIC, and the entire field of model selection aims at establishing rigorous methods to assess the tradeoff between fit errors and model complexity. In PSIS-LOO, this is done by using Pareto-smoothed importance sampling to estimate the expected log pointwise predictive density for a new dataset using leave-one-out cross-validation. The PSIS-LOO is the currently recommended metric for measuring model performance in Bayesian analysis (Vehtari et al., 2017) and is considered superior to other approaches such as computations of Bayes factors since it is less sensitive to model priors (Gelman et al., 2013). The performance of the models as measured with PSIS-LOO is shown in Figure 3B. As already mentioned, we have added further details as to how the fitting and model selection is performed in a revised Methods section. We agree that visual comparison is useful to gain intuition and this is why we showed the bivariate distributions in Figure 3D and in Supp Figure 11.
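
      The quantity being estimated can be sketched in a few lines. The example below computes the leave-one-out expected log pointwise predictive density from a matrix of pointwise log-likelihoods using raw importance weights. Note the simplification: it omits the Pareto smoothing of the largest weights that gives PSIS-LOO its stability, so it illustrates the idea only and is not the procedure used for Figure 3B. Function names and the input values are hypothetical.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_loo_importance_sampling(log_lik):
    """Leave-one-out elpd from posterior draws, without Pareto smoothing.

    log_lik[s][i] is the log-likelihood of observation i under posterior draw s.
    For each observation, the importance weight of a draw is proportional to
    1 / p(y_i | theta_s), so the LOO predictive density is the harmonic mean
    of the pointwise likelihoods over draws.
    """
    n_obs = len(log_lik[0])
    pointwise = []
    for i in range(n_obs):
        negated = [-draw[i] for draw in log_lik]
        pointwise.append(-log_mean_exp(negated))
    return sum(pointwise), pointwise


# Hypothetical input: two posterior draws, two observations.
example = [[-1.0, -2.0],
           [-1.0, -3.0]]
elpd_total, elpd_pointwise = elpd_loo_importance_sampling(example)
print(elpd_total, elpd_pointwise)
```

      Higher values indicate better expected out-of-sample prediction; because the score is evaluated on held-out observations, additional parameters are rewarded only when they improve prediction.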

      Regarding the comment on “fit error”, note also that we probabilistically model the full mRNA distribution for each gene. In each cell, there is a likelihood score that measures the likelihood of observing the measured mRNA count given the modelled probability distribution. As our approach is based on this likelihood, the notion of “fitting error” needs to be replaced by the log likelihood (‘fitting error’ is mathematically equivalent to a log-likelihood when the noise model is Gaussian, which is not the case here).

      Another limitation is that the models have not been validated for example by using them to make predictions. One type of prediction could be to fit the model to one biological replica and then predict the other replica (cross validation). Another prediction would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      Thank you for this comment. As explained above, we used the state-of-the-art PSIS-LOO to measure the predictive performance of the models, which approximates the result of leave-one-out cross-validation using the full data set. To further assess the predictive capabilities of the model, we have now also added a “leave-replicate-out” cross-validation, as the reviewer suggests (new Supp Figure 12). The aim of our “leave-replicate-out” cross-validation was to test how well the predictions of each model generalise to independent cells that are not in the training set. To do this, we trained each model while omitting the data from one gene on a test slide. We then calculated the likelihood score of the test slide using the parameters from the training set, and repeated this for all slides. Similarly to the PSIS-LOO, the results of the leave-replicate-out cross-validation convincingly show that model M4 has the highest predictive performance. This is now described in the updated text on lines 265-271.
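
      The structure of such a leave-replicate-out loop can be sketched as follows. To keep the example short it substitutes a single-parameter Poisson model fitted by maximum likelihood for the Bayesian NB mixture models of the manuscript, and it treats each slide as a plain list of counts; all function names and data are hypothetical.

```python
import math


def poisson_loglik(counts, rate):
    """Log-likelihood of integer counts under a Poisson model with the given rate."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def leave_replicate_out(slides, fit, loglik):
    """Sum of held-out log-likelihoods, holding out one slide at a time.

    fit(training_counts) returns model parameters; loglik(counts, params)
    scores the held-out slide with those parameters.
    """
    score = 0.0
    for held_out in range(len(slides)):
        training = [k for j, slide in enumerate(slides) if j != held_out
                    for k in slide]
        params = fit(training)
        score += loglik(slides[held_out], params)
    return score


# Hypothetical mRNA counts from three independent slides.
slides = [[3, 5, 4, 6], [4, 4, 7, 5], [2, 6, 5, 5]]

# Candidate 1: Poisson rate estimated from the training slides.
fitted_score = leave_replicate_out(slides, lambda c: sum(c) / len(c), poisson_loglik)
# Candidate 2: a deliberately poor model that ignores the training data.
fixed_score = leave_replicate_out(slides, lambda c: 50.0, poisson_loglik)

print(fitted_score, fixed_score)
```

      The model with the higher held-out score generalises better to slides that were not used for fitting, which is the comparison reported in Supp Figure 12.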

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      As already mentioned, we used state-of-the-art algorithms to analyze prediction vs. complexity. With the above addition, we now have two methods of calculating the predictive performance of each model: the approximate leave-one-out score as measured with PSIS-LOO and the leave-replicate-out cross-validation. For each model, the PSIS-LOO score is plotted in Figure 3B and the leave-replicate-out cross-validation score is shown in Supp Figure 12.

      In the method section on models, a biological motivation must be presented to justify the different model assumption.

      Thank you for pointing out that the biological justification of the models needed to be expanded. In addition to the improved justifications already provided in the Results section, we have now updated the Methods section such that a biological motivation is included for each model.

      How do the models that fit the distributions describe the mean?

      As explained above, the inference is performed on the entire distributions, using a family of distributions (mixtures of NBs) which are parameterized in a biologically relevant manner (transcriptional bursting + extrinsic noise). The mean and variance of the distribution are now described on lines 585-586 in addition to Figure 3A.

      It is necessary to list model parameters for each of the models, their description, their parameter values, their parameter uncertainty and units of each parameter.

      Thank you, this has now been added as Supplementary Tables 2-5.

      It is not clear to me how the joint probability in figures 2,4, S2 and S4 have been used to fit the model.

      Again, the joint distributions are modeled using mixtures of NBs and the inference is performed on the entire dataset at once using a log-likelihood approach. This uses all the data at once, and it is embedded in a Bayesian model selection method. The way that the joint probability is used is now clarified in the revised Methods section and in the Results section (lines 208-214):

      “For both models M1 and M2, the likelihood of observing the data given the parameters of the model is evaluated using the model-specific NB distribution and the mRNA counts for both genes in each cell. This is performed for both Bmal1/Cry1 and Nr1d1/Cry1 pairs across all time points, and this likelihood is combined with model priors to define the posterior parameter distribution for each model (Methods). We applied Hamiltonian Monte Carlo sampling within the Stan probabilistic programming language to sample the posterior distribution and infer model parameters [40].”

      How do the models make sense in the context of the fact that human genes exist as a diploids?

      This is a good point, although note that the 3T3 cells are from mice and not humans. 3T3 cells are tetraploid, and it turns out that under the justified assumption that the bursts are short (Zoller et al., 2015; Suter et al., 2011), the number of alleles rescales the burst frequency, i.e. the effective (observed) burst frequency equals the number of alleles times the burst frequency per allele, but it does not change the shape of the distributions. On lines 580-582 we have now written: “Since 3T3 cells are tetraploid, and, again assuming that the bursts are short, the inferred burst frequency for tetraploid cells will be approximately four times that of a single allele.”
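
      The rescaling argument can be checked numerically. The sketch below assumes that the four alleles burst independently with identical parameters and that each allele's mRNA count is negative binomial; it then compares the distribution of the summed counts with a single NB whose burst frequency is four times larger. Function names and parameter values are hypothetical.

```python
import math


def nb_pmf(k, burst_freq, burst_size):
    """NB probability of k mRNAs, parameterised by burst frequency and mean burst size."""
    log_p = (math.lgamma(k + burst_freq) - math.lgamma(burst_freq)
             - math.lgamma(k + 1)
             - burst_freq * math.log1p(burst_size)
             + k * math.log(burst_size / (1.0 + burst_size)))
    return math.exp(log_p)


def convolve(p, q):
    """Distribution of the sum of two independent counts, kept up to len(p) - 1."""
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]


max_count = 60
freq_per_allele, burst_size = 0.5, 5.0  # hypothetical values
one_allele = [nb_pmf(k, freq_per_allele, burst_size) for k in range(max_count + 1)]

# Total count from four independent, identical alleles.
four_alleles = one_allele
for _ in range(3):
    four_alleles = convolve(four_alleles, one_allele)

# A single NB with four times the burst frequency and the same burst size.
rescaled = [nb_pmf(k, 4 * freq_per_allele, burst_size) for k in range(max_count + 1)]
max_diff = max(abs(a - b) for a, b in zip(four_alleles, rescaled))
print(max_diff)
```

      Under these assumptions the two distributions agree to numerical precision, so the observed counts remain NB-distributed and only the inferred burst frequency absorbs the allele number.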

      The variance decomposition is shortly described but no results are presented to show how this is done. This should be better explained.

      The variance decomposition we used is not a new result; in fact, we used the analytical results of Bowsher, C. G. & Swain, P. S. “Identifying sources of variation and the flow of information in biochemical networks” (PNAS, 2012). The mathematical proofs of the formula we use are contained within that reference; however, we have re-written this section to make it clearer to the reader (lines 688-718).

      **Minor comments:**

      In figure 3A, it is not clear to me what these different plots relate to the models. It is also not clear what are equations that describe each model.

      The Methods section has now been improved to show the full data-generating mechanism for each model, and each model has its own section title to make it easier to find. We have also improved the legend for Figure 3 to make the relationship to each model clearer.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Thank you for pointing this out, and we have now re-written the figure legend for Figure 3 to make the figure clearer.

      Reviewer #2 (Significance (Required)):

      This is an interesting and important topic with the potential to have general implication of how to model periodic single cell gene expression data and eventually better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I personally have done similar analysis and experiments in another system and biological context which has demonstrated the power of this approach if implemented rigorously. I am not an expert in circadian clocks in human cells.

      We thank the reviewer for appreciating the implications for the circadian and single-cell gene expression community. Note that, to our knowledge, modeling smFISH counts using mixtures of negative binomials combined with Bayesian model selection has not been done. The approach is highly relevant biologically (it combines intrinsic and extrinsic fluctuations in a rigorous way) and general, and its applicability extends far beyond the circadian oscillator. Therefore, this approach for quantitative smFISH data analysis also fills an important methodological gap.

      **Referees Cross commenting**

      Reviewer #1:

      I agree with the assessment that model fitting and model selection was not sufficient. But I disagreed that the data is enough. Although many cells and time points are analyzed, there is no evidence of how reproducible each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Regarding the point on reproducibility, we have made the following four changes:

      1. We have added an independent 4 time-point experiment to show that the oscillatory patterns of the distributions are reproducible (Supp Figure 7).
      2. In Figure 1 we now also show the mean of each replicate for the main experiment (Figure 1B).
      3. We also show the mRNA distributions of each replicate in Supp Figures 2-5.
      4. We have added the “leave-replicate-out” cross-validation to show that the performance of the preferred model generalises to independent slides that were not included in training (Supp Figure 12).

      In responding to Reviewer #1 regarding the modeling, we have now also added a simplified dynamical model of circadian clock expression to add mechanistic insight into our proposed models. Overall, we have significantly expanded the description of the model selection approaches to help readers who are less familiar with Bayesian model selection methods.

      Reviewer #3:

      Regarding the red background, my understanding is that this comes from the probe hybridization. This is maybe because the probe concentration has not been optimized or the number of probes per gene is low and the signal to noise is not so good. Or it could be auto fluorescent background. In this case a different fluorophore needs to be used to avoid this problem.

      Thank you for those comments, and we agree with all reviewers that the presentation of the images needed to be improved. It turned out that in Figure 1 we had shown the cell mask in red, so the background is clearly not related to probe concentration or autofluorescence. We have now removed the cell mask channel from the main images, which better highlights the smFISH signals. All smFISH images for Figures 1 and 2 have been much improved, and we’ve added a new Supp Figure 1 to show the performance of our cell segmentation.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this paper Nicholas et al image mRNAs encoding the key controllers of circadian rhythms, Rev-erba, Cry and Bmal1 in single cells over time. It was shown earlier that single cells exhibit circadian rhythms using reporter genes. A large number of studies have shown that transcription is an inherently stochastic process, which raises a question as to how single cells are able to achieve their rhythms on the face of this noise. Their results show that the number of mRNAs for the three genes exhibit the expected periodicity, but this periodicity is associated with significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription vs other sources of variation that are extrinsic to the genes. The results are interesting and experimental and modeling results are important (however this reviewer is not able to judge the veracity of mathematics that underlay the models).

      We thank this reviewer for appreciating the importance of our work.

      **Some of the concerns that arose are listed below:**

      1. The images show an annoying red background. If the red is HCS cell mask, it should be removed, and RNA presented on grey scale. This will make a better presentation. The red hue also appears in fig 2 b but here it is one of the RNA. I suggest in Fig 2 one RNA can be presented in green and the other in red, while the nuclei in blue.

      Thank you for this comment. We had indeed shown the cell mask in the red channel and have now removed it. Together with the other suggestions and comments from the reviewers, we implemented the following changes: 1) added the cell contours as requested; 2) used red/green for the smFISH signal in the pairs of genes; and 3) improved the contrast to make it easier to distinguish the RNA FISH signals. The presentation of the images is now much improved.

      2.This paper and a few others talk about the cell size contributing to the cell-to-cell variability in mRNA numbers. Where does it come from physically? One can imagine based on the cell cycle stage there could be more than two copies of the gene in a cell, which will yield more RNAs, but they say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      The referee is right that several studies observed empirically that larger cells show more mRNA molecules in smFISH experiments (Padovan et al., 2015; Kempe et al., 2015). In Padovan et al. (2015), the authors found that transcriptional burst size changes with cell volume and burst frequency with cell cycle. The main theory for transcription scaling with cell volume is to maintain transcript concentration. Using cell fusion experiments, they showed that cellular size can directly and globally affect gene expression by modulating transcription. Furthermore, they proposed that the mechanism underlying the global regulation integrates both DNA content and cellular volume to produce the appropriate amount of RNA for a cell of a given size, which is consistent with a model whereby a factor limiting for transcription is sequestered to the DNA. We used these results to propose a model whereby burst size scales with area, and we found an increase in predictive performance (compare M2 with M1 in Figure 3B). While our model selection supported the inclusion of cell area, the variance decomposition showed that the fraction of variance due to cell area ranged from 4.2% for Nr1d1 to 17.6% for Bmal1. We have now expanded the introduction to discuss this in more depth (lines 73-80) as requested.
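      To make the area-scaling argument concrete, the following is a minimal sketch and not the manuscript's model M2 or its inference code. It assumes a gamma-Poisson (negative binomial) description of bursty transcription in which the mean burst size is proportional to normalised cell area, and it applies the law of total variance to estimate the share of count variance attributable to area. The function names, the parameter values and the linear scaling are illustrative assumptions.

```python
import numpy as np


def simulate_counts(areas, burst_freq=2.0, burst_size_per_area=5.0, rng=None):
    """Draw one mRNA count per cell from a gamma-Poisson (negative binomial) model.

    Assumption (illustrative): mean burst size is proportional to the
    normalised cell area; burst_freq plays the role of the gamma shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    areas = np.asarray(areas, dtype=float)
    burst_size = burst_size_per_area * areas
    # Gamma-distributed rate followed by Poisson sampling gives a negative binomial.
    rate = rng.gamma(shape=burst_freq, scale=burst_size)
    return rng.poisson(rate)


def area_variance_fraction(areas, burst_freq=2.0, burst_size_per_area=5.0):
    """Fraction of total count variance explained by area (law of total variance)."""
    b = burst_size_per_area * np.asarray(areas, dtype=float)
    cond_mean = burst_freq * b              # E[m | area]
    cond_var = burst_freq * b * (1.0 + b)   # Var[m | area] for a negative binomial
    explained = np.var(cond_mean)           # Var(E[m | area])
    unexplained = np.mean(cond_var)         # E(Var[m | area])
    return explained / (explained + unexplained)


# Hypothetical usage with log-normally distributed normalised areas.
example_rng = np.random.default_rng(0)
example_areas = example_rng.lognormal(mean=0.0, sigma=0.3, size=5000)
example_counts = simulate_counts(example_areas, rng=example_rng)
example_fraction = area_variance_fraction(example_areas)
```

      Under these assumptions the area term can improve predictive performance while still accounting for only a modest part of the total variance, because the within-cell bursting term grows with the square of the burst size.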

      3.References 26 and 27 are cited for 10-80% of variance due to gene extrinsic sources. These references actually deny that there is a significant transcriptional noise in most genes. Again, stronger discussion is called for.

      As mentioned in the reply to Reviewer 1, previous work from our lab also nuances the conclusions of references 26 and 27. Specifically, buffering effects are expected to be highly gene-specific (3’UTR), and in fact we have not seen them with our unstable construct during live-cell imaging (Suter et al., 2011; Zoller et al., 2015). We have also added text to state explicitly that subsequent papers have nuanced the general claims in references 26 and 27. In the text we write (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38]. In the cells used here, the strong signature of transcriptional bursting and high intrinsic noise is consistent with live imaging of a Bmal1 transcriptional reporter in the same cell line under similar growth conditions, where intrinsic noise was estimated to be 4-times larger than extrinsic noise [23].”
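      For illustration only, the Fano factor named in the quoted passage can be computed directly from per-cell mRNA counts. This sketch is not taken from the manuscript's analysis code; the function name and the use of the sample variance are assumptions.

```python
import numpy as np


def fano_factor(counts):
    """Variance-to-mean ratio of counts; close to 1 for Poisson-distributed data."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean count is zero")
    # ddof=1 gives the unbiased sample variance.
    return counts.var(ddof=1) / mean
```

      Values well above 1 indicate overdispersion relative to a Poisson distribution, which is the signature of bursty transcription referred to above.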

      4.The results raise a very important question, whether and to what extent the transcriptional noise propagates to the next step of gene regulation and are there buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at protein level and the level of protein-protein complexes. Furthermore, they show that to the extent those vary, the chromatin intrinsically buffers against the fluctuations in numbers of transcription factors. Mention of these and other studies will enrich the paper.

      We have modified the Discussion section and now discuss these papers (and a few more). We thank the reviewer for the suggestions, which will help the reader to have a broader overview of noise buffering in gene expression and indeed enrich the paper.

      Reviewer #3 (Significance (Required)):

      Significance is high. Quality is high.

      **Referees Cross-Commenting**

      I agree with the comments made by other reviewers particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors are qualifying their results in the light of references 26 and 27. Perhaps now there is less of a need to do so.

      As mentioned above, we have added the following sentence citing the Hansen paper to make it clear to the reader that key conclusions of the references 26 and 27 are disputed (lines 335-342):

      “One explanation for the low intrinsic fluctuation in these studies is that transcriptional fluctuations are filtered by nuclear retention, though other reports suggest that Fano factors (variance/mean, a measure of overdispersion compared to the Poisson distribution) can be even larger in the cytoplasm than in the nucleus [38].”

      References

      Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. 2013. Bayesian Data Analysis, 3rd edn. CRC Press, London.

      Hughes ME, DiTacchio L, Hayes KR, Vollmers C, Pulivarthy S, Baggs JE, Panda S, Hogenesch JB. 2009. Harmonics of circadian gene transcription in mammals. PLoS Genet 5. doi:10.1371/journal.pgen.1000442

      Kempe H, Schwabe A, Cremazy F, Verschure PJ, Bruggeman FJ. 2015. The volumes and transcript counts of single cells reveal concentration homeostasis and capture biological noise. Mol Biol Cell 26:797–804. doi:10.1091/mbc.E14-08-1296

      Padovan-Merhar O, Nair GP, Biaesch AG, Mayer A, Scarfone S, Foley SW, Wu AR, Churchman LS, Singh A, Raj A. 2015. Single Mammalian Cells Compensate for Differences in Cellular Volume and DNA Copy Number through Independent Global Transcriptional Mechanisms. Mol Cell 58:339–352. doi:10.1016/j.molcel.2015.03.005

      Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. 2006. Stochastic mRNA synthesis in mammalian cells. PLoS Biol 4:e309. doi:10.1371/journal.pbio.0040309

      Relógio A, Westermark PO, Wallach T, Schellenberg K, Kramer A, Herzel H. 2011. Tuning the mammalian circadian clock: Robust synergy of two loops. PLoS Comput Biol 7:1–18. doi:10.1371/journal.pcbi.1002309

      Saini C, Morf J, Stratmann M, Gos P, Schibler U. 2012. Simulated body temperature rhythms reveal the phase-shifting behavior and plasticity of mammalian circadian oscillators. Genes Dev 26:567–580. doi:10.1101/gad.183251.111

      Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. 2011. Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics. Science 332:472–474. doi:10.1126/science.1198817

      Ukai-Tadenuma M, Yamada RG, Xu H, Ripperger JA, Liu AC, Ueda HR. 2011. Delay in feedback repression by cryptochrome 1 Is required for circadian clock function. Cell 144:268–281. doi:10.1016/j.cell.2010.12.019

      Vehtari A, Gelman A, Gabry J. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27:1413–1432. doi:10.1007/s11222-016-9696-4

      Wu C, Simonetti M, Rossell C, Mignardi M, Mirzazadeh R, Annaratone L, Marchiò C, Sapino A, Bienko M, Crosetto N, Nilsson M. 2018. RollFISH achieves robust quantification of single-molecule RNA biomarkers in paraffin-embedded tumor tissue samples. Commun Biol 1:1–8. doi:10.1038/s42003-018-0218-0

      Zoller B, Nicolas D, Molina N, Naef F. 2015. Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol Syst Biol 11:823. doi:10.15252/msb.20156257

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this paper Nicholas et al image mRNAs encoding the key controllers of circadian rhythms, Rev-erba, Cry and Bmal1 in single cells over time. It was shown earlier that single cells exhibit circadian rhythms using reporter genes. A large number of studies have shown that transcription is an inherently stochastic process, which raises a question as to how single cells are able to achieve their rhythms in the face of this noise. Their results show that the number of mRNAs for the three genes exhibit the expected periodicity, but this periodicity is associated with significant cell-to-cell variation. They also explore to what extent this variability derives from stochastic transcription vs other sources of variation that are extrinsic to the genes. The results are interesting and experimental and modeling results are important (however this reviewer is not able to judge the veracity of the mathematics that underlies the models).

      Some of the concerns that arose are listed below:

      1.The images show an annoying red background. If the red is HCS cell mask, it should be removed, and RNA presented on grey scale. This will make a better presentation. The red hue also appears in fig 2 b but here it is one of the RNA. I suggest in Fig 2 one RNA can be presented in green and the other in red, while the nuclei in blue.

      2.This paper and a few others talk about the cell size contributing to the cell-to-cell variability in mRNA numbers. Where does it come from physically? One can imagine based on the cell cycle stage there could be more than two copies of the gene in a cell, which will yield more RNAs, but they say that their cells don't have much cell cycle variability. Perhaps a clearer discussion is called for rather than just being polite to other investigators.

      3.References 26 and 27 are cited for 10-80% of variance due to gene extrinsic sources. These references actually deny that there is a significant transcriptional noise in most genes. Again, stronger discussion is called for.

      4.The results raise a very important question, whether and to what extent the transcriptional noise propagates to the next step of gene regulation and are there buffering mechanisms in the cell. For example, Raj et al, Variability in gene expression underlies incomplete penetrance, Nature 2010, show that alternative pathways serve to buffer the impact of gene expression noise. Similarly, Shah and Tyagi, Barriers to transmission of transcriptional noise in a c-fos c-jun pathway, Mol Syst Biol, 2013, show that variability in mRNA is buffered at protein level and the level of protein-protein complexes. Furthermore, they show that to the extent those vary, the chromatin intrinsically buffers against the fluctuations in numbers of transcription factors. Mention of these and other studies will enrich the paper.

      Significance

      Significance is high. Quality is high.

      Referees Cross-Commenting

      I agree with the comments made by other reviewers particularly about references 26 and 27. The major conclusions of reference 26 were questioned by Hansen et al 2018. At the bottom of page 7 the authors are qualifying their results in the light of references 26 and 27. Perhaps now there is less of a need to do so.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: The authors study experimentally and computationally the dynamic transcription of circadian clock genes over time in individual cells with single molecule RNA-FISH with the aim to understand how different noise sources contribute to single cell transcription variability and basic functions of circadian clocks. The authors integrate experiments with computational modeling to understand biology.

      Major comments:

      This study has some major limitations that need to be addressed to test the model usefulness, to understand noise sources and to gain biological insights into circadian clocks.

      The limitations are on the experiments, the computational implementation of the modeling and the integration of experiments with models.

      Although the experimental datasets contain several hundred cells per time point for multiple time points, only a single replicate experiment is presented. From the presented data it is not clear how reproducible these temporal patterns are and whether differences between timepoints could indeed be resolved if multiple biological replicate experiments were analyzed. To address this point, at least three biological replicate experiments need to be presented and analyzed for each of the genes. Plotting the SEM on the means in figure 1B is misleading because several hundred cells have been measured, which automatically makes the error small. The SEM just describes how well we can determine the mean from a distribution. Instead, the mean and std from the biological replicates need to be plotted to show how experimental variability results in the described expression pattern. This is similar to RNA-seq or RT-PCR data from multiple replicates.
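      The distinction drawn here between the within-experiment SEM and the spread across biological replicates can be illustrated with a short sketch. The function names are hypothetical and nothing below is taken from the manuscript; it only shows the two quantities being contrasted.

```python
import numpy as np


def sem_within(counts):
    """Standard error of the mean computed from the cells of one experiment.

    Shrinks roughly as 1/sqrt(number of cells), whatever the biological
    reproducibility of the experiment.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.std(ddof=1) / np.sqrt(counts.size)


def replicate_mean_and_sd(replicates):
    """Mean and standard deviation of the per-replicate mean counts."""
    means = np.array([np.mean(r) for r in replicates], dtype=float)
    return means.mean(), means.std(ddof=1)
```

      With several hundred cells per time point the first quantity is small by construction, whereas the second reflects experiment-to-experiment variability and requires at least two replicates to be defined.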

      It is also not clear how well the cell segmentation works and how it influences the analysis. In figure 1A, show the segmentation of the cell boundary together with the membrane stain.

      The authors use the RNA mean and RNA-FISH distributions and combine this data to build and compare different models. How do you know that the given data fulfils the central limit theorem so that a model describing the mean is an adequate approach? To test this point, the authors should show through subsampling from the data and the model that indeed their data sets have enough cells to fulfil the central limit theorem.

      A strength of the manuscript is that several competing and biologically meaningful models have been generated. However, the manuscript lacks rigor in terms of how fitting and model selection are performed. It is not clear how well the models fit the data. To address this point, the authors should visually compare the model fits to the data and plot their fit errors as a function of model complexity.

      Another limitation is that the models have not been validated, for example by using them to make predictions. One type of prediction could be to fit the model to one biological replicate and then predict the other replicate (cross validation). Another prediction would be to take the distribution fitted to the experimental data and then compare the model mean to the experimental mean.

      The results from fitting and prediction should be plotted as a function of model complexity. This kind of analysis will illustrate how model complexity is supported by the data.

      In the method section on models, a biological motivation must be presented to justify the different model assumptions.

      How do the models that fit the distributions describe the mean?

      It is necessary to list model parameters for each of the models, their description, their parameter values, their parameter uncertainty and units of each parameter.

      It is not clear to me how the joint probability in figures 2,4, S2 and S4 have been used to fit the model.

      How do the models make sense in the context of the fact that human genes exist as diploids?

      The variance decomposition is briefly described but no results are presented to show how this is done. This should be better explained.

      Minor comments:

      In figure 3A, it is not clear to me how these different plots relate to the models. It is also not clear which equations describe each model.

      The legends in figure 3 are not very informative. More details need to be presented to understand this figure.

      Significance

      This is an interesting and important topic with the potential to have general implication of how to model periodic single cell gene expression data and eventually better understand circadian clocks. This study will expand on other modeling studies of circadian clocks and has the potential to advance the field (PMCID: PMC7229691). I personally have done similar analysis and experiments in another system and biological context which has demonstrated the power of this approach if implemented rigorously. I am not an expert in circadian clocks in human cells.

      Referees Cross commenting

      Reviewer #1: I agree with the assessment that model fitting and model selection was not sufficient. But I disagreed that the data is enough. Although many cells and time points are analyzed, there is no evidence of how reproducible each mRNA distribution can be measured at each time point. I think reproducibility is key and will also help with the model fitting and identification.

      Reviewer #3: Regarding the red background, my understanding is that this comes from the probe hybridization. This is maybe because the probe concentration has not been optimized or the number of probes per gene is low and the signal to noise is not so good. Or it could be auto fluorescent background. In this case a different fluorophore needs to be used to avoid this problem.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes (Nr1d1, Cry1, Bmal1), and performed model selection/fitting to explain the observed mRNA distributions. They decomposed the mRNA variability into distinct sources, and showed that intrinsic noise (transcription burst) dominates the variance. Therefore, looking at transcript counts may not be feasible to estimate single-cell circadian phase. However, the study is quite descriptive and ends up being a bit dissatisfying, so if the authors could improve this aspect by perhaps analyzing a mechanism on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho), it would help quite a bit in this regard. The model selection/fitting itself was not really sufficient to compensate for this, as it stands.

      Specific comments:

      1.It is hard to distinguish the RNA FISH signals (Figure 1A, 2B). It is probably technically challenging as the mRNAs are of low abundance. I think it may help if they adjust the contrast for the cytoplasm stain or just delineate the cell boundaries.

      2.In Figure 2C, the authors showed gene-pair correlations with cells of all sizes. Could the authors do a size-dependent extrinsic-noise filtering (Padovan-Merhar, Dev. Cell, 2015; Hansen et al., 2018, Cell Systems) to better dissect the correlations?

      3.For fitting model M3, as the authors pointed out, there are many local minima. Is the fitting score truly sufficient to eliminate the possibility for partial synchrony especially considering that the authors didn't show how effective the Dex treatment was to synchronize the circadian phase?

      4.Regarding model M4, the authors added a cell-specific noise term without specifying the contributing factors. Typically adding degrees of freedom should improve fitting and make it easier for a model to fit, why not in this case? Can the authors provide some explanations/mechanisms.

      5.The authors should include the number (range) of cells analyzed in the figure legends.

      Significance

      Overall, we felt conflicted about the manuscript. On one hand, the authors generated and analyzed a great amount of single-cell RNA FISH data over time on circadian genes. On the other hand, the manuscript was a bit dissatisfying/descriptive. If the authors could provide and analyze some sort of mechanisms on cell-specific burst size (F5), gene-specific dependence on cell size (beta), or the positive/negative gene-pair correlations (rho) it should help improve the manuscript.

      Referees cross-commenting

      I agree with Reviewer #3 regarding expanding the discussion to include the Shah & Tyagi and Raj et al citations on buffering. However, caution should be exercised regarding ref 26 as it is quite controversial and subsequent analyses came to different conclusions (PMID: 30359620 and 30243562). The general consensus is that nuclear buffering of transcript noise (proposed in ref 26) is not a general phenomenon (ref 27 is specific to the calcium response pathway). In fact, the presence and evolution of specific pathways to buffer transcriptional noise, such as protein-protein mechanisms (Shah & Tyagi) or extended half-life proteins (Raj et al. and others), argues that transcript fluctuations are probably not buffered in general.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:** In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      **Major comments:**

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line so it does not control for specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain has been well studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field. My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The study by Dong et al., investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      **Major comments:**

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about the role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with the data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion on p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Dacapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to the increased number of cortex glia cells?

      **Minor comments:**

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      -Top of p.14 should be FigS6A-C.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology. My expertise is Drosophila neurobiology.

      Dear editor

      Below is our response to the reviewers’ comments and our experimental plan for addressing these concerns.

      Reviewer #1

      Major comments:

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      Our experimental data suggest that, accompanying glial niche disruption and downregulation of glia-derived signals, NBs are stalled in M phase (we detected an increase in the percentage of pH3+ NBs). As a consequence, fewer NBs are in G1 and S phase. Therefore, when we conducted a 15-min EdU incorporation, we observed a reduction in EdU incorporation. This NB phenotype (increase in pH3 index and decrease in EdU index) was also observed by Speder and Brand, 2018, when they induced glial niche impairment by inhibiting the PI3K signaling pathway (discussed in P7 of this ms).

      To address whether glial-Hh knockdown reduces the ability of NBs to produce progeny, we plan to carry out two experiments:

      • We will assess the total number of neurons in the CB by assessing Elav+ neurons.

      • We will conduct two EdU pulse-chase experiments. First, we will assess the total number of EdU+ neurons produced within a 4-hr time window (neurons marked with Elav); second, we will mark the NB lineage (with either nerfin-1-GFP or pros-GFP) and quantify the number of EdU+ neurons produced per lineage during a 4-hr time window.

      Together, these experiments should allow us to assess the consequence of glial-Hh knockdown on NB proliferation.

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      The number of NBs in the larval CNS is specified at the beginning of post-embryonic neurogenesis, when quiescent NBs re-enter the cell cycle (reviewed by Homem and Knoblich, 2012). Once NBs re-enter the cell cycle, the number of NBs remains constant. NBs undergo asymmetric division to produce one daughter NB and a GMC, which divides once to generate two neurons. With each round of NB division, the number of NBs remains constant. Therefore, changes in NB cell cycle speed do not alter the overall NB number, only the number of neurons produced.

      To clarify this, we will add a schematic depicting NB asymmetric division to Figure 1.
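      The bookkeeping behind this argument can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions (every NB division yields one NB and one GMC, every GMC has completed its single division into two neurons, and there is no cell death or NB loss); the function name is hypothetical.

```python
def lineage_after_divisions(n_neuroblasts, n_divisions):
    """Count NBs and neurons after each NB has divided n_divisions times.

    Assumptions (illustrative): strictly asymmetric NB divisions, each GMC
    divides once into two neurons, no cell death, no NB loss or gain.
    """
    if n_neuroblasts < 0 or n_divisions < 0:
        raise ValueError("counts must be non-negative")
    gmcs_produced = n_neuroblasts * n_divisions  # one GMC per NB division
    neurons = 2 * gmcs_produced                  # two neurons per GMC
    return {"neuroblasts": n_neuroblasts, "neurons": neurons}
```

      A slower cell cycle corresponds to fewer divisions in a fixed time window, which lowers the neuron count in this sketch while leaving the NB count unchanged.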

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control?

      EdU index is calculated as the number of EdU+ NBs normalised to control EdU+ NBs. The number of EdU+ NBs reflects the NBs that progress through S phase in a 15-min window relative to the control. A similar method was used in Kanai et al., 2018. This method would be invalid only if NB number varied between control and experimental data sets; however, the number of NBs in all our genetic manipulations is not significantly altered relative to their control. We present the quantification of some key manipulations in Reviewer_Figure 1A, B.

      As to why we normalise to control in each of these experiments: in vitro EdU incorporation relies on Click-iT chemistry, which is inherently variable due to incubation conditions. To overcome this, we always incubate control and experimental brains in the same tube and image them with the same confocal settings, and each experiment is normalised to its control done in parallel. We have now included Table 1, which contains all the raw data from these experiments.

      In the revised manuscript, we will clarify our methodology in greater detail in the Methods section, and we are happy to include Table 1 in the supplementary data.

      In many cases, the comparison is to the same w[1118] line so it does not control for specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets.

      We have used three different controls in our experiments, namely GAL4 or lexA >w1118, or UAS-mcherryRNAi, or UAS-luc. We detect no significant difference in terms of raw EdU+ NB numbers between the controls used in our experiments, as demonstrated below (Reviewer_Figure 1C). In our revised manuscript, we will include a sentence “As UAS-mcherryRNAi or UAS-luc are indistinguishable from the > w1118 control, we have used GAL4 driver > w1118 as control in place of UAS-luc in our results”.

      Reviewer_Figure 1. Total NB number and EdU+ NB number quantification

      A) Hh knockdown or overexpression in glia does not significantly alter NB number compared to control.
      B) htlACT overexpression in glia does not significantly alter NB number compared to control.
      C) EdU+ NB number is not significantly different between the controls GAL4 or lexA > w1118, UAS-mcherryRNAi and UAS-luc.
      P-values were obtained using Student's t-test in A and B and one-way ANOVA in C.

      Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      Glial number was quantified using Fiji 3D object counter and a plug-in called “DeadEasy Larval Glia” (Forero et al., 2012), in which the threshold of detection depends on the brightness of Repo staining in each experiment. We therefore present these data as fold-change, comparing control and experimental brains stained in the same tube, which also allows easy comparison between experiments. The raw data are presented in Table 2. NB number is counted manually and is therefore presented as raw counts.
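
      As an illustration of how a fold-change and an approximate uncertainty can be recovered from the raw values, the sketch below uses the Figure 4K rows of Table 2. The helper name `fold_change_with_sem` is hypothetical, and the first-order (delta-method) error propagation, which assumes the two groups are independent, is an assumption made for this example; it is not necessarily how the error bars in the manuscript were computed.

```python
import math

# Sketch: fold-change relative to the paired control, with a first-order
# (delta-method) SEM. The propagation formula is an assumption made for
# this example, not the manuscript's stated method.

def fold_change_with_sem(exp_mean, exp_sem, ctrl_mean, ctrl_sem):
    """Return (ratio of means, approximate SEM of the ratio)."""
    ratio = exp_mean / ctrl_mean
    relative_variance = (exp_sem / exp_mean) ** 2 + (ctrl_sem / ctrl_mean) ** 2
    return ratio, ratio * math.sqrt(relative_variance)


# Figure 4K rows of Table 2 (Repo+ glial number: mean, SEM).
ctrl_mean, ctrl_sem = 1165.0, 20.55   # NP2222-GAL4>w1118
htl_mean, htl_sem = 2325.0, 107.5     # NP2222-GAL4>htlACT

fc, fc_sem = fold_change_with_sem(htl_mean, htl_sem, ctrl_mean, ctrl_sem)
print(f"fold change: {fc:.2f} +/- {fc_sem:.2f}")
```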

      **Minor comments:**

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      We will correct this.

      Reviewer #1 (Significance (Required)):

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Major comments:**

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about the role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      With repo-GAL4>hhRNAi, the cortex glial niche enwrapping NBs is dramatically disrupted, which indirectly alters NB cell cycle progression, as indicated by an increase in pH3 index and a decrease in EdU index. From these two pieces of data, it is likely that NBs are stuck in M phase, resulting in fewer NBs in G1 and S phase that are able to incorporate EdU within a 15-min incubation window. We will firm up this data with the experiments proposed to address the concerns of Reviewer 1, Point 1.

      That both RNAi and overexpression of Hh with repo-GAL4 cause a reduction in NB EdU index is seemingly contradictory. However, it is consistent with a previous report by Speder and Brand, 2018, which showed that glial niche impairment induced by PI3K pathway inhibition causes a similar NB phenotype (an increase in pH3 index and a decrease in EdU incorporation). Furthermore, with repo-GAL4>htlDN, which caused a similar glial niche impairment (data not shown), we observed a similar phenotype (an increase in pH3 index and a slight decrease in EdU incorporation). Therefore, we concluded that the NB cell cycle progression defects are due to a general cortex glial niche disruption rather than a direct effect of Hh inhibition on NBs. We are happy to include the repo-GAL4>htlDN data in the supplementary data if required.

      With regard to the physiological role of Hh, we can only conclude from the data at hand that Hh is required for the development of the cortex glial niche, which is required to maintain NB activities. In terms of how glial niche impairment impedes NB cell cycle progression, we observed that without a proper niche chamber, NBs cluster together instead of residing in separate niches (Figure 2F-G). Therefore, it is possible that the localization of other cell types (i.e. GMCs and neurons) is also altered as a result of NB clustering, which can potentially affect the NB cell cycle. While these questions will be interesting to explore in the future, they are beyond the scope of this current study.

      In contrast, we robustly showed that Hh signals, when overexpressed in the glial niche, were capable of making contact with NBs (Figure 7C-C’) and triggering a slow-down of NB S-phase progression. Therefore, it is fair to conclude that “high levels of glial Hh expression restricts NB cell cycle progression”.

      In the revised manuscript, we will discuss these findings in greater detail.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      We thank the reviewer for picking this up. Our intention was to quantify the number of cortex glial cells in glial-specific htlACT, InRwt and EgfrACT manipulations. However, two reported cortex glial antibodies (PntP2 from Avet-Rochex et al., 2012 and SoxN described in Read, 2018) showed non-specific labelling of other cell types (Reviewer_Figure 2, arrows, neurons and NBs). As an alternative, we quantified the total glial cell number (Repo+) upon overexpression of htlACT, InRwt or EgfrACT using a cortex glial driver (NP2222-GAL4). We expect that the alterations in glial cell number would be primarily attributable to the cortex glial-specific gene manipulation. We agree that we should say that “overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of glial cells”.

      In the revised manuscript, we will clarify this in the results section.

      Reviewer_Figure 2: PntP2 staining in the larval CNS.

      A-B) Representative images showing that PntP2 antibody stains cortex glial cells (marked by NP2222-GAL4>mGFP, yellow arrows), NBs (white arrows) and neurons (blue arrows). B) is a zoomed-in view of A). Scale bar = 50 μm.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      We will correct this.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      We agree that FGF activation causes a dramatic increase in glial cell number, and would thus cause a relative increase in the levels of hh, fasn1 and lsd2. However, for RT-qPCR, the same amount of total RNA (1 μg) was extracted from control and repo-GAL4>htlACT samples and reverse transcribed into cDNA for qPCR. Therefore, the mRNA levels described in Figure 6F are already normalized to the total amount of input RNA.

      In the literature, hh, fasn1 and lsd2 have not been reported to be direct transcriptional targets of FGF signalling. However, lipid metabolism rewiring is well known as a hallmark of glioblastoma. For example, high levels of FASN have been linked with high-grade glioblastoma (Grube et al., 2014). Furthermore, FGF signalling has also been shown to modulate lipid metabolism and alter the transcription of the Lsd-2 homologue Plin2 in a mouse model (Ye et al., 2016).

      To determine whether hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling, we would first have to identify which TFs are altered in glia upon altered FGF signalling via cortex glia-specific RNA-seq, and then conduct DamID to identify their target genes. This would be interesting to follow up but is beyond the scope of the current study.

      We will add a section on this in the discussion section of the revised ms.

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia?

      In response to glial htlDN overexpression, we observed a significant reduction in total glial number and overall Hh expression. However, RT-qPCR showed that mRNA levels of hh, fasn1 or lsd-2 were not altered upon htlDN overexpression (Reviewer_Figure 3).

      This data will be included in the supplementary data in the revised ms.

      Reviewer_Figure 3. Glial htlDN overexpression doesn’t alter the expression of hh, fasn1 and lsd2. The mRNA levels of hh, fasn1 and lsd2 are normalized to the reference gene rpl32.
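
      For readers unfamiliar with reference-gene normalisation, the sketch below shows one common way to express a target transcript relative to a reference gene such as rpl32, the 2^-ΔΔCt calculation. This is an assumption for illustration only: the reply does not state which quantification method was used, the function name `relative_expression` is hypothetical, the Ct values are invented placeholders rather than measured data, and the formula assumes equal, near-100% amplification efficiency for both primer pairs.

```python
# Sketch of relative quantification by the 2^-ΔΔCt method.
# All Ct values below are hypothetical placeholders, not measured data.

def relative_expression(ct_target_exp, ct_ref_exp, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene versus control, normalised to a reference gene."""
    delta_exp = ct_target_exp - ct_ref_exp      # ΔCt in the experimental sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # ΔCt in the control sample
    return 2.0 ** -(delta_exp - delta_ctrl)     # 2^-ΔΔCt


# Hypothetical Ct values for a target gene and the reference gene (rpl32).
unchanged = relative_expression(24.0, 18.0, 24.0, 18.0)   # identical ΔCt in both samples
doubled = relative_expression(23.0, 18.0, 24.0, 18.0)     # target detected one cycle earlier

print(unchanged, doubled)
```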

      Continued: If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      To address this question, we plan to assess the expression levels of hh, fasn1 and lsd-2 using glia-specific expression of an inhibitor of PI3K (delta p60), which has been shown by Speder and Brand, 2018 to cause a reduction in cortex glial number. We will also ascertain whether Decapo overexpression causes cortex glial niche impairment. If so, we will also assess the expression levels of hh, fasn1 and lsd-2 in this setting.

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      We favour the model that excess Hh in the glial compartment “spills over” onto NBs, which are autonomously regulated by Hh and are therefore sensitive to this ligand, and thereby reduces NB proliferation. We can add this to the discussion.

      **Minor comments:**

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      These large lipid droplets are caused by lipid droplet fusion due to the use of detergent in this experiment. When we perform antibody staining together with lipid droplet staining, PBST detergent is required for the antibody staining to work; however, this creates the artefact of large lipid droplets through lipid droplet fusion. This has previously been reported by Bailey et al., 2015, and we have explained this on P19 of the Methods section.

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      We will include this in the revised ms.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      We will include this in the revised ms.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      We will include this in the revised ms.

      -Top of p.14 should be FigS6A-C.

      We will correct this.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      We will include this in the revised ms.

      Reviewer #2 (Significance (Required)):

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.








      Table 1. EdU+ NB numbers for each genotype described in each Figure

      | Figure | Genotype | EdU incubation time | Average EdU+ NB number | SEM | Number of samples |
      | --- | --- | --- | --- | --- | --- |
      | Figure 2J | repo-GAL4>w1118 | 15 min | 66.63 | 1.79 | 16 |
      | Figure 2J | repo-GAL4>UAS-hh | 15 min | 57.35 | 1.35 | 20 |
      | Figure 2K | NP2222-GAL4>w1118 | 15 min | 67.91 | 1.44 | 11 |
      | Figure 2K | NP2222-GAL4>UAS-hh | 15 min | 60.79 | 0.79 | 14 |
      | Figure 2P | dnab-GAL4>w1118 | 15 min | 70.5 | 1.44 | 12 |
      | Figure 2P | dnab-GAL4>ciACT | 15 min | 60.1 | 1.48 | 10 |
      | Figure S3C | repo-GAL4>dcr2; mcherryRi | 10 min | 57.42 | 0.63 | 12 |
      | Figure S3C | repo-GAL4>dcr2; hhRi43255 | 10 min | 48.56 | 2.65 | 9 |
      | Figure 3K | NP2222-GAL4>w1118 | The same dataset as Figure 2K | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh | | | | |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 57.44 | 1.41 | 16 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi34617 | 15 min | 63.36 | 1.34 | 14 |
      | Figure 3K | NP2222-GAL4>UAS-hh; mcherryRi | 15 min | 58.83 | 2.61 | 6 |
      | Figure 3K | NP2222-GAL4>UAS-hh; lsdRi32846 | 15 min | 64.5 | 1.2 | 14 |
      | Figure 5E | repo-GAL4>w1118 | 15 min | 71.6 | 1.28 | 15 |
      | Figure 5E | repo-GAL4>UAS-htlACT | 15 min | 56 | 1.59 | 14 |
      | Figure 5E | NP2222-GAL4>w1118 | 15 min | 70.2 | 1.58 | 10 |
      | Figure 5E | NP2222-GAL4>UAS-htlACT | 15 min | 54.75 | 1.24 | 16 |
      | Figure 6G | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT | | | | |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 60 | 1.24 | 7 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi43255 | 15 min | 67.17 | 1.13 | 12 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.29 | 1.79 | 14 |
      | Figure 6G | NP2222-GAL4>UAS-htlACT;hhRi25794 | 15 min | 68.55 | 1.68 | 11 |
      | Figure 6H | dnab-GAL4>mcherryRi | 10 min | 49.13 | 1.6 | 8 |
      | Figure 6H | dnab-GAL4>ciRi2125-R2 | 10 min | 56.54 | 1.27 | 13 |
      | Figure 6H | repo-lexA>w1118 | 15 min | 68.5 | 1.1 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT | 15 min | 55.7 | 2.15 | 10 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 52 | 1.58 | 30 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRiHMJ23860 | 15 min | 62.4 | 1.79 | 15 |
      | Figure 6H | repo-lexA>lexAop-htlACT; GFPRi | 15 min | 56.33 | 1.49 | 12 |
      | Figure 6H | repo-lexA>lexAop-htlACT; ciRi2125-R2 | 15 min | 62.86 | 1.81 | 7 |
      | Figure 6J | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT | | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.64 | 0.99 | 14 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R2 | 15 min | 65 | 2.41 | 9 |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;mcherryRi | The same dataset as Figure 6G control of NP2222-GAL4>UAS-htlACT;hhRi25794 | | | |
      | Figure 6J | NP2222-GAL4>UAS-htlACT;lsd2Rikk102269 | 15 min | 68.13 | 1.08 | 8 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.71 | 10 |
      | Figure S5H | NP2222-GAL4>fasn1Ri3523R6 | 15 min | 65.5 | 1.38 | 10 |
      | Figure S5H | NP2222-GAL4>mcherryRi | 15 min | 66.4 | 1.13 | 15 |
      | Figure S5H | NP2222-GAL4>lsd2Rikk102269 | 15 min | 64.2 | 0.94 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-luc | 15 min | 65 | 1.07 | 10 |
      | Figure S5H | NP2222-GAL4>UAS-lsd2 | 15 min | 64.9 | 1.51 | 10 |
      | Figure S5I | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT | | | | |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 57.93 | 0.9 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;fasn1Ri3523R6 | 15 min | 63.79 | 1.25 | 14 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 50.25 | 2.52 | 8 |
      | Figure S5I | NP2222-GAL4>UAS-htlACT;lsd2Ri32846 | 15 min | 59.3 | 1.2 | 10 |
      | Figure 7B | NP2222-GAL4>mcherryRi | 15 min | 65 | 0.93 | 10 |
      | Figure 7B | NP2222-GAL4>raspRi11495R2 | 15 min | 65.13 | 1.29 | 15 |
      | Figure 7B | NP2222-GAL4>w1118 | The same dataset as Figure 5E | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT | | | | |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 58.33 | 1.06 | 18 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R1 | 15 min | 63.95 | 1.05 | 21 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;mcherryRi | 15 min | 59.04 | 1.019 | 26 |
      | Figure 7B | NP2222-GAL4>UAS-htlACT;raspRi11495R2 | 15 min | 63.07 | 0.92 | 29 |
      | Figure 7D | NP2222-GAL4>w1118 | 15 min | 69.46 | 1.02 | 13 |
      | Figure 7D | NP2222-GAL4>UAS-hh.N.EGFP | 15 min | 52.25 | 1.9 | 12 |
      | Figure 7F | repo-GAL4>UAS-hh.N.EGFP;mcherryRi | 15 min | 54.4 | 1.18 | 15 |
      | Figure 7D | repo-GAL4>UAS-hh.N.EGFP;fasn1Ri3523R2 | 15 min | 65.69 | 1.43 | 13 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 59.17 | 1.18 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Cat.A | 15 min | 64 | 1.31 | 12 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-LacZ | 15 min | 53.6 | 2.32 | 10 |
      | Figure S6L | NP2222-GAL4>UAS-htlACT; UAS-Sod.1 | 15 min | 62.7 | 1.76 | 10 |

      Table 2. Raw data on glial number

      | Figure | Genotype | Average Repo+ glial number | SEM | Number of samples |
      | --- | --- | --- | --- | --- |
      | Figure 2D | repo-GAL4>dcr2; mcherryRi | 843 | 44.29 | 7 |
      | Figure 2D | repo-GAL4>dcr2; hhRi43255 | 666.5 | 46.77 | 8 |
      | Figure 4K | NP2222-GAL4>w1118 | 1165 | 20.55 | 10 |
      | Figure 4K | NP2222-GAL4>htlACT | 2325 | 107.5 | 10 |
      | Figure 4K | NP2222-GAL4>InRwt | 1189 | 85.92 | 10 |
      | Figure 4K | wrapper-GAL4>w1118 | 1305 | 51.78 | 7 |
      | Figure 4K | wrapper-GAL4>EgfrACT | 1192 | 38.16 | 12 |

      References:

      Avet-Rochex, A., Kaul, A.K., Gatt, A.P., McNeill, H., and Bateman, J.M. (2012). Concerted control of gliogenesis by InR/TOR and FGF signalling in the Drosophila post-embryonic brain. Development 139, 2763-2772.

      Bailey, A.P., Koster, G., Guillermier, C., Hirst, E.M., MacRae, J.I., Lechene, C.P., Postle, A.D., and Gould, A.P. (2015). Antioxidant Role for Lipid Droplets in a Stem Cell Niche of Drosophila. Cell 163, 340-353.

      Forero, M.G., Kato, K., and Hidalgo, A. (2012). Automatic cell counting in vivo in the larval nervous system of Drosophila. J Microsc 246, 202-212.

      Grube, S., Dunisch, P., Freitag, D., Klausnitzer, M., Sakr, Y., Walter, J., Kalff, R., and Ewald, C. (2014). Overexpression of fatty acid synthase in human gliomas correlates with the WHO tumor grade and inhibition with Orlistat reduces cell viability and triggers apoptosis. J Neurooncol 118, 277-287.

      Homem, C.C., and Knoblich, J.A. (2012). Drosophila neuroblasts: a model for stem cell biology. Development 139, 4297-4310.

      Kanai, M.I., Kim, M.J., Akiyama, T., Takemura, M., Wharton, K., O'Connor, M.B., and Nakato, H. (2018). Regulation of neuroblast proliferation by surface glia in the Drosophila larval brain. Sci Rep 8, 3730.

      Read, R.D. (2018). Pvr receptor tyrosine kinase signaling promotes post-embryonic morphogenesis, and survival of glia and neural progenitor cells in Drosophila. Development 145.

      Speder, P., and Brand, A.H. (2018). Systemic and local cues drive neural stem cell niche remodelling during neurogenesis in Drosophila. Elife 7.

      Ye, M., Lu, W., Wang, X., Wang, C., Abbruzzese, J.L., Liang, G., Li, X., and Luo, Y. (2016). FGF21-FGFR1 Coordinates Phospholipid Homeostasis, Lipid Droplet Function, and ER Stress in Obesity. Endocrinology 157, 4754-4769.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The study by Dong et al., investigates the role of Hedgehog in the glial niche during larval neurogenesis in Drosophila. The authors describe the expression of Hh in cortex glia and its association with lipid droplets. They show that Hh expression in cortex glia is required for cortex glial proliferation, cell autonomously, and for maintenance of the normal cell cycle in neuroblasts. They go on to use a well characterised Drosophila glioma model, activation of FGF signalling, to investigate the requirement for Hh during cortex glial overgrowth. They show that FGF-activated cortex glial overproliferation requires Hh for modulation of neuroblast cell cycle, although Hh does not regulate cortex glial proliferation in this context. Finally, they show that inhibition of lipid modification of Hh rescues the neuroblast proliferation cell cycle defect caused by FGF activation in cortex glia.

      Major comments:

      1.From the data presented in Fig. 2H-K and Fig. S3C, I am very confused about the role of Hh in the non-cell autonomous regulation of neuroblast cell cycle. Both RNAi and overexpression of Hh with Repo-Gal4 cause a reduction in the neuroblast EdU index (Fig. 2H-K and S3C). The authors conclude this section on p.7 saying "Together, our data suggests that high levels of glial Hh expression restricts NB cell cycle progression." This statement is not consistent with data. What is the normal physiological role of Hh if both decreased and increased levels of cortex glial Hh expression reduce neuroblast cell cycle? The discussion of p.15 does not clarify this issue. The model in Fig.7J relates to the role of Hh in the context of cortex glial FGF activation and does not illustrate the normal physiological role of Hh in the regulation of neuroblast cell cycle.

      2.P.8 "Analysis of the total glial cell number indicates overexpression of htlACT, but not InRwt or EgfrACT, led to an increase in the number of cortex glial cells (Figure 4E-G, I-K)." This statement is confusing as Repo staining was used to quantify total glial numbers (including perineural, sub-perineural and cortex glia) but these data are then taken to represent an increase specifically in cortex glia. This should be clarified.

      3.It should be mentioned on p.8 that the data in Fig.4A-K reproduce the findings of Avet-Rochex et al., 2012 and Read et al., 2009.

      4.Figure 6F. Presumably due to the increase in glia cell number and dramatic increase in glial cell volume, any gene that is specific to, or enriched in, cortex glia will have increased expression levels in RepoGal4>htlACT larval CNS. Can the authors provide evidence that the increase in the expression of these genes is specific to FGF transcriptional regulation and not just a relative increase in the levels of these genes due to an increase in cortex glia as proportion of total CNS volume? Is there any evidence that Hh, fasn1 and lsd2 are direct transcriptional targets of FGF signalling in glia?

      5.FGF signalling has been shown to be necessary and sufficient for cortex glial proliferation. So does knockdown of Htl, or expression of dominant negative Htl, cause a reduction in Hh, fasn1 and lsd2 expression in cortex glia? If so, how does reduction of cortex glial numbers independent of FGF signalling, using for example knockdown of String or expression of Decapo, affect the expression of Hh, fasn1 and lsd2 in cortex glia?

      6.Can the authors speculate on why and how increased levels of Hh in cortex glia, in the context of FGF activation, inhibit neuroblast cell cycle? Is this a physiological mechanism to limit neuroblast proliferation in the face of increased gliogenesis, or is it simply an indirect result of 'spillover' of excess Hh from cortex glia onto neuroblasts (which are autonomously regulated by Hh and so sensitive to this ligand) due to increased cortex glia cells?

      Minor comments:

      -Figure 1C' some lipid droplets are extremely large, is this consistent with previous literature?

      -Including a profile plot of relative fluorescence intensity in Figure 1C',F',H' to illustrate colocalization of lipidTOX and Hh, would be helpful.

      -Figure S3A,B quantify Hh protein level and CNS size phenotypes with Hh RNAi.

      -p.6 include data showing overexpression of Hh does not cause glial overgrowth.

      -Top of p.14 should be FigS6A-C.

      -Include quantification of glial overgrowth and lipid droplet phenotypes with HtlACT plus catalase and SOD1 overexpression (Fig. S6D-K).

      Significance

      This is a novel and very interesting study, well written and the data are very clearly presented. It builds on and adds to the emerging literature on the glial niche and its role in neural stem cell regulation. It will be of great interest to Drosophila neurobiologists but also to the broader field of neural stem cell biology.

      My expertise is Drosophila neurobiology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this study, the authors investigate the role of hedgehog signaling and lipid metabolism in the neural stem cell niche of the Drosophila larvae. They demonstrate that Hedgehog localizes to lipid droplets in glial cells and show that Hh is necessary but not sufficient for elaboration of glial membranes and normal rates of glial proliferation during development. In addition, they provide an extensive set of results in support of a model that FGF signaling functions upstream of lipid metabolism and hh in glial cells as well as a parallel ROS mediated pathway in glial cells to promote neuroblast proliferation. In general, the results provide strong support for the conclusions. Specifically, the approaches are sound, the images clearly demonstrate the phenotypes described, and the effects are quantified and tested for statistical significance.

      Major comments:

      1.Since Hh RNAi decreases the glial compartment (which slows NB proliferation) and increases the frequency of pH3+ NBs, it is unclear why it would decrease the number of EdU+ NBs (Fig. S3C).

      2.If overexpression of htl[ACT] slows the NB cell cycle (as evidenced by reduced pH3 and EdU positive cells), it is unclear why it does not reduce the number of NBs (Fig. 4L).

      3.What is the justification for presenting the EdU quantifications as an EdU index in which the experimental values are normalized to the average number of positive cells in the control? In many cases, the comparison is to the same w[1118] line so it does not control for specific genetic backgrounds and yet this method may be obscuring experimental variation present between datasets. Likewise, why is glial number presented as a fold-change but NB number is presented as raw counts (e.g. 2D vs S3E)?

      Minor comments:

      On the top of P.14, "Figure S7A-C" should probably be "Figure S6A-C"

      Significance

      The cell autonomous regulation of growth and proliferation of neuroblasts in the larval brain have been well-studied, but much less is known about the non-cell autonomous signals. This paper significantly moves forward knowledge in this area by describing multiple steps of a molecular mechanism for glial regulation of the neuroblast cell cycle. These findings would be of interest not only to the study of Drosophila neuroblasts, but also to the broader adult stem cell field.

      My expertise is in Drosophila stem cell biology and genetics.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank the three reviewers for providing valuable feedback on our original manuscript. A point-by-point response to all of these comments is provided below. [Note that figures are not added in-line because of text-only limitations.]

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/stains (as two separate panels) to microscopically screen cell viability. For automatic classification, a training/test set of 119 CRISPR perturbations (approximately 2 sgRNAs per gene) on 3 cancer cell lines was generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei, a set of morphological cell measurements was extracted from each perturbation (total 952 features). The nature of these features, spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as a web-based application/visualization and the supplementary method is well described.

      We thank the reviewer for their constructive comments and helpful feedback.

      There is one subtle point that is worth raising given this description: The images we use to measure the cell cycle and viability phenotypes (two different staining panels in the Cell Health assays) are not the same images we use to extract morphology measurements (Cell Painting assay). This lack of connection, which is based on a light wavelength limitation present in all microscopes that limits the number of stains in a single assay, prevents us from developing a method that analyzes the same cells across the three assays. This distinction will become important later in the review, and we have made specific changes in the manuscript to increase clarity.

      **Major concerns:**

      (1)The only fundamental argument of this manuscript not to apply state-of-the-art deep learning (DL) machine-learning (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, manual gating is the 'interpretability' of the predictions. However, performance, precision, scalability (by modern GPUs) with DL should clearly outperform 'manual' regression models. All recent machine vision benchmarks in microscopy confirm this, but also clearly shows 'real world' translational applications, e.g.

      https://www.nature.com/articles/s43018-020-0085-8,

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf,

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (We’ve copied a similar critique from the Significance section from Reviewer #1 in order to reduce redundancy) The author/co-authors have been instrumental/pioneered with their past work on cell-based image processing (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

      We agree that deep learning approaches are exciting; much of our laboratory’s work focuses on their application (see https://doi.org/10.1073/pnas.2001227117, https://doi.org/10.1038/s41592-019-0612-7; https://doi.org/10.1002/cyto.a.23863, https://doi.org/10.1109/CVPR.2018.00970), and we agree that they are likely to outperform simpler regression models trained using so-called hand-engineered features. We thank the reviewer for highlighting our failure to accurately and fully describe our rationale.

      We intentionally did not use deep learning for this problem given (a) data limitations and (b) the primary goal of the manuscript, which is to demonstrate feasibility.

      Data limitations. There is no mechanism to link the cells of the assays (Cell Health and Cell Painting) together, which greatly reduces the available sample size. In the two referenced manuscripts, which each propose an exciting approach, the dataset is much larger (~17,000 and ~1,000 images respectively). Our dataset is only 357 perturbations that can only be linked between assays at the perturbation level rather than a single-cell level. Therefore, a deep learning approach is likely to produce models that don’t generalize to other datasets. Furthermore, reviewer 3 commented in favor of the approach we presented: “Using elastic net regression models is well-suited to the problem due to the low number of observations.”

      Primary goal of the manuscript is to demonstrate feasibility. In addition, the primary goal of the manuscript is to add cell health annotations as functional readouts to perturbations. Our aim was to demonstrate feasibility of predicting cell health states, not to optimize performance. Optimizing performance would require collecting much more data, or developing new deep learning or data collection methods to account for the lack of matched single cell readouts.

      To make this rationale more clear and concise, we have made the following changes in the manuscript:

      In the first paragraph of page 3, we make some minor contextual updates (“To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations…”) and replaced “We used simple machine learning methods, which are relatively easy to interpret compared to deep learning” with:

      We used simple machine learning methods instead of a deep learning approach because of our limited sample size of 119 perturbations and the inability to increase the sample size by linking single cell measurements across assays.

      We have also amended the Conclusions section to emphasize our primary goal and note possible deep learning extensions as future directions. The Conclusions now reads:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (2)One aforementioned point of the methodology is cryptically/not described: Why it should be less expensive compared with other (which?) approaches (see introduction)?

      We thank the reviewer for bringing up this point. We believe that part of this confusion stems from a slight misunderstanding about how images from the three assays (two Cell Health and one Cell Painting) are collected. The Cell Health assays are two distinct panels of targeted reagents that are separately prepared as two physically distinct assays. The Cell Painting assay is already an established assay used by many labs and companies around the world to mark cell morphology in an unbiased and relatively cheap way. We are comparing the expenses between the two Cell Health assays vs. the Cell Painting assay.

      We believe that this misunderstanding likely results from our somewhat cryptic and inconsistent language when describing the Cell Health assays in the abstract and introduction. We’ve updated the third sentence of the abstract from “We developed two customized microscopy assays that use seven reagents to measure 70 specific cell health phenotypes...” to now read:

      We developed two customized microscopy assays, one using four targeted reagents and the other three targeted reagents, to collectively measure 70 specific cell health phenotypes including proliferation, apoptosis, reactive oxygen species (ROS), DNA damage, and cell cycle stage.

      For consistency, we have also updated the penultimate paragraph in the introduction to now read:

      To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assays “Cell Health”.

      With these clarifications in mind, we believe that the question of comparing monetary costs is more clear. We are comparing the costs of the targeted reagents in the two Cell Health assays to the unbiased reagents in the single Cell Painting assay. We’ve also modified the last two sentences in the first paragraph of the introduction to strengthen the connection between Cell Health assays, targeted reagents, and high cost:

      Cell health is normally assessed by eye or measured by specifically targeted reagents, which are either focused on a single Cell Health parameter (ATP assays) or multiple, in combination, via FACS-based or image-based analyses, which involves a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      (3)Generalizability and/or training data size is essential for any model-based classification, but not evaluated or validated in the current manuscript. The independent validation on a A549 cell line only data might be not sufficient/convincing.

      We separately address the two distinct points raised by the reviewer of 1) generalizability and 2) training data size:

      Generalizability. We agree that any model-based classification must demonstrate generalizability. For this reason, we have carefully assessed the generalizability of all 70 models in two contexts. First, we assessed model performance in a single held out test set (15% of all data). All results we report in the main text (e.g. Figure 2) report performance on this test set. We see high performance in many (but not all) models, and we observe much better model performance compared to a negative control baseline (New Supplementary Figure S5). High performance in the test set indicates that, for some cell health indicators, the models generalize well.

      Second, we also demonstrate that these models generalize to data from an entirely different experiment using a fundamentally different perturbation (CRISPR vs. drug compounds). We demonstrate generalizability to this external validation data in four different ways: 1) Validating a relatively simple model (“Number of Live Cells”) with an orthogonal viability readout from the PRISM assay (barcoding-based cell viability; updated Figure 4); 2) Demonstrating that proteasome inhibitors, which are known to produce reactive oxygen species, are predicted to do so; 3) Demonstrating that PLK inhibitors, which are known to reduce entry to G1, show a robust dose response in the "G1 Cell Count" model; and 4) Demonstrating that aurora kinase and tubulin inhibitors are predicted to induce high DNA damage (gH2AX) in G1 cells. These two drug classes are known to cause “mitotic slippage” and double stranded DNA breaks. The fourth example was added in response to a comment by reviewer 3.

      We’ve also added a series of enrichment tests, as described in the following new text:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p < 10^-15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors were enriched for high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p < 10^-15) (Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      The updated methods section describing our approach to assess generalizability and perform the enrichment tests now states:

      Assessing generalizability of cell health models applied to Drug Repurposing Hub data

      We used our cell health webapp (https://broad.io/cell-health-app) to identify compounds with high predictions for three models with high or intermediate performance: ROS, Number of G1 cells, and Number of gH2AX spots in G1 cells. For each model, we identified classes of compounds with consistently high scores, then tested for statistical enrichment: for proteasome inhibitors in the ROS model, PLK inhibitors in the Number of G1 cells model, and aurora kinase and tubulin inhibitors in the Number of gH2AX spots in G1 cells model. We used one-sided Fisher’s exact tests to quantify differences in expected proportions between high and low model predictions. For each case, we determined high and low predictions based on the 50% quantile threshold for each model independently.
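
      As a minimal illustration of the enrichment test described above (a sketch only, not the code used in the analysis), the following splits predictions at the 50% quantile and runs a one-sided Fisher's exact test on the resulting 2x2 table. The function names and the toy scores are hypothetical, and the test is written out directly from the hypergeometric distribution so that no external library is assumed.

```python
from math import comb


def split_by_median(scores):
    """Flag each prediction as high (True) if it exceeds the 50% quantile."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [s > median for s in scores]


def enrichment_table(scores, in_class):
    """Build the 2x2 table (a, b, c, d):
    a = in class & high, b = in class & low,
    c = not in class & high, d = not in class & low.
    """
    is_high = split_by_median(scores)
    pairs = list(zip(is_high, in_class))
    a = sum(1 for h, m in pairs if h and m)
    b = sum(1 for h, m in pairs if not h and m)
    c = sum(1 for h, m in pairs if h and not m)
    d = sum(1 for h, m in pairs if not h and not m)
    return a, b, c, d


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment of the class among
    high predictions. Returns (sample odds ratio, p-value)."""
    n_class = a + b
    n_high = a + c
    n_total = a + b + c + d
    # Hypergeometric probability of observing at least `a` class members
    # among the high predictions, with all margins held fixed.
    tail = sum(
        comb(n_class, k) * comb(n_total - n_class, n_high - k)
        for k in range(a, min(n_class, n_high) + 1)
    )
    p_value = tail / comb(n_total, n_high)
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds_ratio, p_value


# Toy data (hypothetical): model predictions and compound-class membership.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
in_class = [True, True, True, False, True, False, False, False]

table = enrichment_table(scores, in_class)
odds_ratio, p_value = fisher_exact_greater(*table)
print(table, odds_ratio, p_value)
```

      An odds ratio above 1 with a small p-value would indicate that the compound class is over-represented among high predictions; testing for depletion, as with the PLK inhibitors in the G1 model, would use the opposite tail.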

      We acknowledge that prospectively making predictions and measuring Cell Health readouts directly in a new experiment would be more convincing, but we note that our existing assessment of generalizability in an external experiment is already unusual in machine learning publications. Additionally and unfortunately, collecting a second validation dataset for this manuscript is not currently feasible given experiments backlogged from COVID.

      Training data size

      We also agree that a more comprehensive analysis on training data size would be an important indicator of model limitations. Therefore, we performed a sample titration analysis in which we randomly dropped samples from the training procedure, and tracked performance of the held out test set. We add the following figure, figure legend, and results text to describe and interpret the results.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We updated the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed an increasing number of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).

      Finally, the updated methods section describing our sample titration analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing amount of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.
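
      The titration loop described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code: a one-feature least-squares fit stands in for the elastic net models, the data are simulated, and all names (`fit_line`, `titrate`, and so on) are hypothetical.

```python
import random
from statistics import median


def fit_line(xs, ys):
    """Least-squares fit for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def r_squared(model, xs, ys):
    """Coefficient of determination of `model` on held-out data."""
    slope, intercept = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def titrate(train, test, n_drop_values, seeds):
    """For each number of dropped samples, retrain once per seed and
    return the median test-set R^2 across seeds."""
    test_x, test_y = zip(*test)
    results = {}
    for n_drop in n_drop_values:
        scores = []
        for seed in seeds:
            rng = random.Random(seed)
            # Randomly keep all but `n_drop` training samples.
            kept = rng.sample(train, len(train) - n_drop)
            xs, ys = zip(*kept)
            scores.append(r_squared(fit_line(xs, ys), test_x, test_y))
        results[n_drop] = median(scores)
    return results


# Simulated data (hypothetical): one feature with a noisy linear readout.
data_rng = random.Random(0)
data = [(i / 10, 2 * (i / 10) + data_rng.gauss(0, 1)) for i in range(120)]
data_rng.shuffle(data)
n_test = int(0.15 * len(data))  # fixed 15% holdout, never used for training
test, train = data[:n_test], data[n_test:]

results = titrate(train, test, n_drop_values=[0, 30, 60, 90], seeds=range(10))
print(results)
```

      Keeping the holdout set fixed while only the training set shrinks means that any change in the score reflects the amount of training data rather than a different evaluation set.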

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      **Minor concerns:**

      (1)Highest test performance comprises that precision is mainly driven by cell cycle/count and live status and could be probably derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right) and would argue for rigid feature selection across channels and features.

      We believe that clarifying the confusion between the two Cell Health assays we developed and the well-established Cell Painting assay addresses part of this concern. The DRAQ7 dye marks dead cells, and is measured in Cell Health. In other words, readouts from this reagent are what we aim to predict, not what we use for training. Indeed, DRAQ7-based phenotypes are among the top predicted models, which is a result we present in Supplementary Figure S7 - this figure uncovers which Cell Health phenotypes are more easily predicted by Cell Painting.

      The DNA granularity morphology measurements are collected from the Cell Painting assay and thus are available for training, and, as noted by the reviewer, encode a high proportion of signal in predicting the various cell health phenotypes. In our most common processing workflows for other projects, we do apply a rigid feature selection pipeline to all Cell Painting profiles before analysis, but we do not do this in this analysis since we were using a model with a sparsity-inducing penalty (elastic net).

      To directly answer the question of how channels and feature groups influence model performance, we’ve performed a systematic experiment removing different channel, compartment, and feature groups and retraining all models with the specific group dropped. We now include the following supplementary figure:

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      Additionally, we updated the results section to introduce and discuss this result:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And we add the following to the methods section:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.
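
      A minimal sketch of how the leave-one-class-out feature sets could be constructed is shown below. It assumes that feature names follow a `Compartment_FeatureGroup_..._Channel` pattern; the example feature names and the helper functions are hypothetical and are not taken from the analysis code. Retraining the 70 models would then happen once per yielded feature subset.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = (
    "Texture", "RadialDistribution", "Neighbors", "Intensity",
    "Granularity", "Correlation", "AreaShape",
)


def belongs_to(feature, kind, value):
    """Return True if `feature` is a member of the given feature class."""
    parts = feature.split("_")
    if kind == "compartment":
        return parts[0] == value
    if kind == "feature_group":
        return len(parts) > 1 and parts[1] == value
    if kind == "channel":
        # Channel names appear as whole tokens after the feature group.
        return value in parts[2:]
    raise ValueError(f"unknown feature class kind: {kind}")


def leave_one_class_out(features):
    """Yield (kind, value, remaining_features), once per feature class."""
    classes = (
        [("compartment", c) for c in COMPARTMENTS]
        + [("channel", c) for c in CHANNELS]
        + [("feature_group", g) for g in FEATURE_GROUPS]
    )
    for kind, value in classes:
        remaining = [f for f in features if not belongs_to(f, kind, value)]
        yield kind, value, remaining


# Illustrative feature names only (assumed naming pattern).
FEATURES = [
    "Nuclei_Texture_Contrast_DNA_5_0",
    "Cells_Intensity_MeanIntensity_Mito",
    "Cytoplasm_RadialDistribution_FracAtD_ER_1of4",
    "Cells_AreaShape_Area",
    "Cells_Neighbors_NumberOfNeighbors_Adjacent",
    "Nuclei_Granularity_1_RNA",
    "Cytoplasm_Correlation_Correlation_DNA_AGP",
]

for kind, value, remaining in leave_one_class_out(FEATURES):
    print(kind, value, len(remaining))
```

      Note that a feature such as a between-channel correlation belongs to more than one channel class under this scheme, so it is dropped whenever either of its channels is removed.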

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      (2)Any H2AX and 'polynuclear' would probably fail in any cell line with this size of training data.

      Indeed we would expect certain cell health phenotype models to fail if they had few hits and a relatively low variance of output values. This hit rate is directly associated with the phenotypes that the CRISPR perturbations induce, which is why we intentionally selected them to span multiple gene pathways in an attempt to maximize morphology diversity (see Supplementary Table S1).

      We did indeed observe that the polynuclear model had few hits in the training data and relatively poor performance. We did not expect this result, given that DNA stains are captured in the Cell Health and Cell Painting assays. We suspect the poor performance in this model is likely because so few cells were classified as polynuclear in our gating strategy, making it perhaps an inconsistently measured readout.

      By contrast, some gH2AX models did have relatively good performance. In the conclusion, we note that increased training data size using more perturbations is likely to improve model performance:

      The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

      (3)To what refers the 'weights' of the model in Fig. 1c?

      We thank the reviewer for pointing out that we never defined this term in the Figure 1 legend. We use “weights” to refer to the coefficients from the regression model. To make this more clear, we have updated the legend to now read: “Model coefficient weights” and the text in Figure 1C to now read “model weights”.

      Reviewer #1 (Significance (Required)):

      This manuscript is not advanced in the context of latest improvements/developments of cell-based microscopic classification. Rationale in the introduction and the conclusion are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted to this end?

      Having responded to these reviews, we believe that our primary motivation - to demonstrate proof-of-concept of predicting cell health phenotypes directly from Cell Painting data - is now much clearer. We provide below an updated introduction, which clarifies the rationale.

      Perturbing cells with specific genetic and chemical reagents in different environmental contexts impacts cells in various ways (Kitano, 2002). For example, certain perturbations impact cell health by stalling cells in specific cell cycle stages, increasing or decreasing proliferation rate, or inducing cell death via specific pathways (Markowetz, 2010; Szalai et al., 2019). Cell health is normally assessed by eye or measured by specifically targeted reagents, which are either focused on a single Cell Health parameter (ATP assays) or multiple, in combination, via FACS-based or image-based analyses, which involves a manual gating approach, complicated staining procedures, and significant reagent cost. These traditional approaches limit the ability to scale to large perturbation libraries such as candidate compounds in academic and pharmaceutical screening centers.

      Image-based profiling assays are increasingly being used to quantitatively study the morphological impact of chemical and genetic perturbations in various cell contexts (Caicedo et al., 2016; Scheeder et al., 2018). One unbiased assay, called Cell Painting, stains for various cellular compartments and organelles using non-specific and inexpensive reagents (Gustafsdottir et al., 2013). Cell Painting has been used to identify small-molecule mechanisms of action (MOA), study the impact of overexpressing cancer mutations, and discover new bioactive mechanisms, among many other applications (Caicedo et al., 2018; Christoforow et al., 2019; Hughes et al., 2020; Pahl and Sievers, 2019; Rohban et al., 2017; Simm et al., 2018; Wawer et al., 2014). Additionally, Cell Painting can predict mammalian toxicity levels for environmental chemicals (Nyffeler et al., 2020) and some of its derived morphology measurements are readily interpreted by cell biologists and relate to cell health (Bray et al., 2016). However, no single assay enables discovery of fine-grained cell health readouts.

      We hypothesized that we could predict many cell health readouts directly from the Cell Painting data, which is already available for hundreds of thousands of perturbations. This would enable the rapid and interpretable annotation of small molecules or genetic perturbations. To do this, we first developed two customized microscopy assays, which collectively report on 70 different cell health indicators via a total of seven reagents applied in two reagent panels. Collectively, we call these assay panels “Cell Health”.

      To demonstrate proof of concept, we collected a small pilot dataset of 119 CRISPR knockout perturbations in three different cell lines using Cell Painting and Cell Health. We used the Cell Painting morphology readouts to train 70 different regression models to predict each Cell Health indicator independently. We used simple machine learning methods instead of a deep learning approach because of our limited sample size and the inability to increase it by linking single cell measurements from both assays. We predicted certain readouts, such as the number of S phase cells, with high performance, while performance on other readouts, such as DNA damage in G2 phase cells, was low. We applied and validated these models on a separate set of existing Cell Painting images acquired from 1,571 compound perturbations measured across six different doses from the Drug Repurposing Hub project (Corsello et al., 2017). We provide all predictions in an intuitive web-based application at http://broad.io/cell-health-app, so that others can extend our work and explore cell health impacts of specific compounds.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      We thank the reviewer for their succinct summary of our goals and rationale for this manuscript, and for the constructive and valuable comments herein.

      **Major Issues:**

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set.

      We agree that the two assays we’re collectively calling “Cell Health” are indeed fairly complex - we use two different panels of multiplexed stains and a series of gating strategies to measure phenotypes in various cell subpopulations. However, the fundamental message in the manuscript is that we may no longer need to perform these complex assays if we get this information from the simpler Cell Painting assay.

      We agree that our machine learning approach to predict the various cell health phenotypes uses signals beyond nucleus-based stains. However, even if we are predicting just DAPI signals, this reinforces our argument that the specific stains in the Cell Health assays (which are commonly used in targeted experiments) are not necessary to measure specifically. Instead, in certain circumstances, a scientist should just use unbiased stains to capture their biology of interest, since the stains are cheaper at scale and one has access to much more information.

      It is also worth noting that the DNA damage phenotypes in specific cell subpopulations (e.g. DNA Damage in G1 cells) would not be possible to measure with high precision without EdU co-staining.

      However these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      (Reviewer 2 introduced a similar critique below, which we now move here) A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      We definitely agree that sample size is a limitation in this manuscript. Our primary goal with this paper was to demonstrate feasibility of the approach to predict the targeted Cell Health readouts using a simpler (and more affordable/scalable) assay in Cell Painting. The promising results we observed, especially given this sample size limitation, motivate collecting a larger dataset using more perturbations.

      Potentially confounded signal by low efficiency CRISPR knockouts is also an interesting topic. We do provide Supplementary Figure S8 to describe a subtle relationship that we observed regarding CRISPR infection efficiency. We also discuss this in the results as: “We observed overall better predictivity in ES2 cells, which had the highest CRISPR infection efficiency (Supplementary Figure 8), suggesting that stronger perturbations provide better information for training and that training on additional data should provide further benefit.”

      Additionally, we made a substantial effort to maximize CRISPR efficiency by independently optimizing lentivirus volumes for each sgRNA. In general, we observed that some cell lines are easier to edit with CRISPR, probably because of factors beyond Cas9 expression. However, we note that CRISPR is being used simply as a perturbation to elicit a variable morphology response. In other words, the type, efficacy, and even accuracy of perturbation do not matter as long as the perturbation satisfies two constraints: 1) it induces a morphology response for a sufficient number of perturbations, and 2) it is consistent between the two assays (Cell Health and Cell Painting). Our setup satisfies both constraints.

      However, this experiment (and data from the experiment) can be used in other contexts in which the CRISPR efficiency is extremely important. Therefore, we added three columns to Supplementary Table 1 providing the efficiency readouts for the three cell lines. (This information was already present in GitHub, but we moved it to a more obvious location in Supplementary Table 1). Code describing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/142

      In regards to the first sentence of this concern: “However these appear to give outputs that won’t be that useful” - indeed, we fully expected that many cell health readouts would be difficult to predict. In the original submission, we included the following explanation for potential sources of low performing models: “Performance differences might result from random technical variation, small sample sizes for training models, different number of cells in certain Cell Health subpopulations (e.g. mitosis or polynuclear cells), fewer cells collected in the viability panel (see methods), or the inability of Cell Painting reagents to capture certain phenotypes.”

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates.

      We thank the reviewer for pointing out this source of confusion and for providing an opportunity to improve the clarity of this section. Our major revisions here are as follows: 1) Introduce the Drug Repurposing Hub as an external dataset for validation; 2) Validate a high performing and simple model (number of live cells) by comparing model readout predictions from the Drug Repurposing Hub Cell Painting profiles against orthogonal PRISM viability readouts (in compounds with slightly different doses); 3) Validate three additional models: enrichment of proteasome inhibitors in the ROS model, enrichment of PLK inhibitors in the G1 cell count model, and enrichment of tubulin-destabilizing compounds in the Number of gH2AX spots in G1 cells model; 4) Display a global structure of Cell Health predictions in UMAP space for select models. Note that for the fourth point, we are using the UMAP gradients to observe patterns, and not to validate models.

      In order to encapsulate the updated flow, we’ve pasted below the entire Drug Repurposing Hub results/discussion section, which introduces two additional analyses and new text in response to various other reviewer comments. We feel that the updated section improves clarity and purpose.

      The updated section now reads:

      “Predictive models of cell health would be most useful if they could be trained once and successfully applied to data sets collected separately from the experiment used for training. Otherwise one could not annotate existing datasets that lack parallel Cell Health results, and Cell Health assays would have to be run alongside each new dataset. We therefore applied our trained models to a large, publicly-available Cell Painting dataset collected as part of the Drug Repurposing Hub project (Corsello et al., 2017). The data derive from A549 lung cancer cells treated with 1,571 compound perturbations measured in six doses.

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p < 10^-3; Figure 4B).

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p < 10^-15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10^-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).

      We applied uniform manifold approximation (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Lastly, we observed moderate technical artifacts in the Drug Repurposing Hub profiles, indicated by high DMSO profile dispersion in the Cell Painting UMAP space (Supplementary Figure S14C). This represents an opportunity to improve model predictions with new batch effect correction tools. Additionally, it is important to note that the expected performance of each Cell Health model can only be as good as the performance observed in the original test set (see Figure 2), and that all predictions require further experimental validation.”
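
      The enrichment results quoted above are reported as odds ratios with p values. As a minimal sketch of how such a statistic can be computed from a 2x2 table, the block below assumes a one-sided Fisher's exact test (the response does not name the test that was used) and uses made-up counts; the function names and the notion of a "high prediction" cutoff are illustrative only.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a: compounds in the class with a high prediction
    b: compounds in the class without a high prediction
    c: other compounds with a high prediction
    d: other compounds without a high prediction
    Raises ZeroDivisionError if b * c == 0.
    """
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p value: P(top-left count >= a).

    Uses the hypergeometric distribution with fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(total, col1)


# Hypothetical counts: 8 of 10 compounds in a class score high,
# versus 10 of 90 compounds outside the class.
toy_or = odds_ratio(8, 2, 10, 80)
toy_p = fisher_exact_greater(8, 2, 10, 80)
```

      An odds ratio far above 1 (as reported for the proteasome inhibitors) indicates enrichment among high predictions, and an odds ratio far below 1 (as reported for the PLK inhibitors and G1 cell count) indicates depletion.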

      Updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      Updated Supplementary Figure:

      Supplementary Figure S14: Applying a Uniform Manifold Approximation (UMAP) to Drug Repurposing Hub consensus profiles of 1,571 compounds across six doses. The models were not trained using the Drug Repurposing Hub data. (a) The point color represents the output of the Cell Health model trained to predict the number of cells in G1 phase (G1 cell count). (b) The same UMAP dimensions, but colored by the output of the Cell Health model trained to predict reactive oxygen species (ROS). (c) In the UMAP space, we highlight DMSO as a negative control, and Bortezomib and MG-132 as two positive controls (proteasome inhibitors) in the Drug Repurposing Hub set. We observe moderate batch effects in the negative control DMSO profiles, based on their spread in this visualization. The color represents the predicted number of live cells. The positive controls were acquired with a very high dose and are expected to result in a very low number of predicted live cells.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations?

      This was not surprising to us, and we thank the reviewer for asking the question. We have now added a new Supplementary Figure to present a UMAP with ground truth cell health measurements in the CRISPR dataset (pasted below). By adding the figure, we show how Cell Health predictions are expected to show gradients in UMAP space. In fact, for any lower-dimensional embedding that is able to preserve local neighborhoods of the high-dimensional space, we should expect all linear transformations of the input data (in the high-dimensional space) to vary smoothly across the lower-dimensional embedding. However, it is still informative to observe where the specific Cell Health phenotype predictions manifest in relation to global morphology structure. We add the following sentence in the Drug Repurposing Hub paragraph juxtaposed to the other UMAP gradient observations:

      We applied uniform manifold approximation (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Supplementary Figure S15: Applying a Uniform Manifold Approximation (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockdowns of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      We have moved the Spearman correlation figure (previously Supplementary Figure S10B) into a main figure, along with a complete restructuring of the results and discussion in the Drug Repurposing Hub section.

      We appreciate the careful observations and interpretations, and confirm the statistical test performed here is sound and the p value is correct (there is no need to account for multiple testing since there is only one test being applied, a test of correlation between two variables).

      We add this rationale to the “Comparing viability predictions to an orthogonal readout” methods section:

      We performed the non-parametric Spearman correlation test because 1) the doses were not aligned between the datasets we compared, and 2) it is possible that a strong nonlinear correlation exists between readouts from two fundamentally different ways to measure viability.
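
      To illustrate the rationale above, here is a minimal sketch of Spearman's rho computed as the Pearson correlation of average ranks. The data are toy values, not the PRISM or Drug Repurposing Hub readouts, and the function names are illustrative. The example shows that a monotonic but nonlinear relationship gives a rank correlation of 1 while the linear correlation stays below 1.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values: a monotonic but strongly nonlinear relationship.
predicted_live_cells = [1.0, 2.0, 3.0, 4.0, 5.0]
assay_viability = [v ** 3 for v in predicted_live_cells]
rho = spearman_rho(predicted_live_cells, assay_viability)
r = pearson(predicted_live_cells, assay_viability)
```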

      It is definitely valid to critique the scatter plot relationship to understand that the mean squared error is quite high (i.e. if two datasets had viability measurements using the two approaches, it would be wrong to assume that lower measurements in one assay automatically could be compared to lower measurements of the other assay). This level of variability would be lost if all we did was report the test statistic, which is the reason why we included the scatter plot as a figure.

      It may also be important to mention that the authors of the PRISM paper also noted high variation in their estimates (from Corsello et al https://doi.org/10.1038/s43018-019-0018-6): "At the level of individual compound dose–responses, we note that the PRISM Repurposing dataset tends to be somewhat noisier, with a higher standard error estimated from vehicle control measurements (Extended Data Fig. 5c and Extended Data Fig. 6a–c)."

      Nevertheless, we agree that the current way we report this p value is distracting and potentially misleading, depending on how the p value is interpreted. Therefore, we have updated the reporting of all p values to say that they are less than a predefined cutoff. The figure now states that p

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      We describe the manual gating strategies in detail in the methods section “Cell Health assay: Image analysis”. However, we agree that a description of measurement variability and experimental approach requires more detail, and we agree that the manuscript would benefit from a visual example of these gates. These improvements required us to rearrange Figure 1.

      With a goal of increasing reproducibility in the cell health assay, we’ve (1) moved example images of the Cell Health assay to Figure 1A; (2) moved the existing gating strategies drawing to Supplementary Figure 1; and (3) added real data examples of the manual gating strategy as a new Supplementary Figure 2. We show all updates below:

      Updated Figure 1:

      Figure 1. Data processing and modeling approach. (a) Example images and workflow from the Cell Health assays. We apply a series of manual gating strategies (see Methods) to isolate cell subpopulations and to generate cell health readouts for each perturbation. (top) In the “Cell Cycle” panel, in each nucleus we measure Hoechst, EdU, PH3, and gH2AX. (bottom) In the “Cell Viability” panel, we capture digital phase contrast images, measure Caspase 3/7, DRAQ7, and CellROX. (b) Example Cell Painting image across five channels, plus a merged representation across channels. The image is cropped from a larger image and shows ES2 cells. Below are the steps applied in an image-based profiling pipeline, after features have been extracted from each cell’s image. (c) Modeling approach where we fit 70 different regression models using CellProfiler features derived from Cell Painting images to predict Cell Health readouts.

      Updated Supplementary Figure S1:

      Supplementary Figure S1: Illustration of the gating strategy in the Cell Health assays. We extract 70 different readouts from the Cell Health imaging assay. The assay consists of two customized reagent panels, which use measurements from seven different targeted reagents and one channel based on digital phase contrast (DPC) imaging; shown are five toy examples to demonstrate that individual cells are isolated into subpopulations by various gating strategies to define the Cell Health readouts.

      Updated Supplementary Figure S2 (Example gating strategies):

      Supplementary Figure S2: Real data of manual gating in the Cell Health assays.

      For each cell line, we apply a series of manual gating strategies defined by various stain measurements in single cells to define cell subpopulations. (a) In the cell cycle panel, we first select cells that are useful for cell cycle analysis based on nucleus roundness and Hoechst intensity measurements. We also identify polyploid and “large not round” (polynuclear) cells. (b) We then subdivide the cells used for cell cycle to G1, G2, and S cells based on total Hoechst intensity (DNA content) and EdU incorporation signal intensity. (c) We use Hoechst and PH3 nucleus intensity to define mitotic cells. The points are colored by EdU intensity in the nucleus in both (b) and (c). (d) Example gating in the viability panel. We use DRAQ7 and CellEvent (Caspase 3/7) to distinguish alive and dead cells, and categorize early or late apoptosis. See Methods for more details about how the Cell Health measurements are made.

      We’ve also added the following to the methods section:

      Additionally, we set these gates for each cell subpopulation using a set of random wells from each cell line and experiment independently. We observed that the intensity measurements used to form the gates were consistent across wells and plates, and generally formed distinct cell subpopulation clusters. After using the random wells to set the gates, we used the Harmony microscope software to apply the gates to the remaining wells and plates.
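
      As a rough sketch of what applying fixed gates to every cell looks like in code, the block below assigns G1, S, and G2 labels from DNA content and EdU intensity. The rectangular gate shape, the threshold values, and the function names are hypothetical; the actual gates were drawn manually in Harmony and are not reproduced here.

```python
from collections import Counter


def assign_cell_cycle_phase(dna_content, edu_intensity, dna_gate, edu_gate):
    """Label one nucleus using two hypothetical rectangular gates.

    Cells above the EdU gate are called S phase; the remaining cells are
    split into G1 and G2 by DNA content (total Hoechst intensity).
    """
    if edu_intensity >= edu_gate:
        return "S"
    return "G1" if dna_content < dna_gate else "G2"


def count_phases(cells, dna_gate, edu_gate):
    """Apply the same gates to every (dna_content, edu_intensity) pair."""
    return Counter(
        assign_cell_cycle_phase(dna, edu, dna_gate, edu_gate) for dna, edu in cells
    )


# Hypothetical intensities (arbitrary units) and gate positions.
toy_cells = [(1.0, 0.1), (1.1, 0.2), (1.5, 0.9), (2.0, 0.1)]
phase_counts = count_phases(toy_cells, dna_gate=1.6, edu_gate=0.5)
```

      Because every readout depends on where such thresholds are placed, two analysts choosing different gate positions would obtain different subpopulation counts from the same wells.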

      In general, however, the need to clearly define this process further emphasizes a strength in our approach: There is great potential for inconsistencies when different humans draw gates. We aim to reduce these inconsistencies by predicting these readouts from Cell Painting images directly.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not in larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell line specific datasets independently and compare relative changes in predictive performance. Otherwise, is it possible that subtle or highly complex phenotypes simply cannot be detected by this method and more data will be unlikely to improve predictability in modest perturbations.

      We thank the reviewers for raising this possibility. To explore this, we performed a cell-line holdout analysis in which we retrained (and individually reoptimized) all 70 cell health models on every combination of two cell lines and predicted readouts from the held out third cell line.

      Despite there being fewer samples in the training set in the cell line holdout test compared to the original test set (66% vs. 85%) and the fact that each model had never seen the held out cell line before, many cell health phenotypes could still be predicted. We add the following results in a new Supplementary Figure:

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.
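
      The holdout scheme described in this legend can be sketched as follows. This is an illustration under simplifying assumptions: a single-feature least-squares line stands in for the actual regression models, the data are toy values, and all function names are hypothetical. It also shows why scores can fall below zero, since R^2 is negative whenever predictions are worse than predicting the mean of the held-out readouts.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def r_squared(y_true, y_pred):
    """Coefficient of determination; negative if worse than the mean."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def cell_line_holdout(samples):
    """samples: list of (cell_line, feature, readout) tuples.

    Trains on all cell lines except one and scores on the held-out line.
    Returns {held_out_cell_line: R^2}.
    """
    scores = {}
    for held_out in sorted({s[0] for s in samples}):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        slope, intercept = fit_line([s[1] for s in train], [s[2] for s in train])
        preds = [slope * s[1] + intercept for s in test]
        scores[held_out] = r_squared([s[2] for s in test], preds)
    return scores


# Toy data in which the readout is exactly twice the feature in every line.
toy_samples = [
    ("A549", 0.0, 0.0), ("A549", 1.0, 2.0), ("A549", 2.0, 4.0),
    ("ES2", 1.0, 2.0), ("ES2", 2.0, 4.0), ("ES2", 3.0, 6.0),
    ("HCC44", 0.0, 0.0), ("HCC44", 2.0, 4.0), ("HCC44", 4.0, 8.0),
]
holdout_scores = cell_line_holdout(toy_samples)
```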

      We’ve also performed a sample size titration analysis, which suggests that more data would indeed improve model performance. More data would also enable a deep learning approach, which is also likely to improve performance.

      Supplementary Figure S13: Dropping samples from training reduces test set model performance in high, mid, and low performing models. We determined model performance stratification by taking the top third, mid third, and bottom third of test set performance when using all data. We performed the sample titration analysis with 10 different random seeds and visualized the median test set performance for each model.

      We also update the results section to introduce and discuss this result:

      Lastly, we performed a sample size titration analysis in which we randomly removed a decreasing amount of samples from training. For the high and mid performing models, we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).

      And an updated methods section describing this analysis now reads:

      Machine learning robustness: Investigating the impact of sample size

      We performed an analysis in which we randomly dropped an increasing amount of samples from the training set before model training. After dropping the predefined number of samples, we retrained all 70 cell health models and assessed performance on the original holdout test set. We performed this procedure ten times with ten unique random seeds to mirror a more realistic scenario of new data collection and to reduce the impact of outlier samples on model training.
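
      A minimal sketch of this titration loop is shown below. The model is passed in as a callable so that the resampling logic stays separate from the fitting; the toy "model" here simply predicts the training mean and is scored by negative mean squared error, which is a stand-in and not the approach used for the 70 cell health models. All names and values are illustrative.

```python
import random
from statistics import mean, median


def titrate_training_size(train, test, fit_and_score, drop_fractions, seeds):
    """Return {drop_fraction: median test score across random seeds}.

    For each fraction, a random subset of the training samples is kept,
    the model is refit, and it is scored on the fixed test set.
    """
    results = {}
    for fraction in drop_fractions:
        n_keep = len(train) - round(fraction * len(train))
        scores = []
        for seed in seeds:
            subset = random.Random(seed).sample(train, n_keep)
            scores.append(fit_and_score(subset, test))
        results[fraction] = median(scores)
    return results


def mean_predictor_neg_mse(train, test):
    """Toy stand-in model: predict the training mean; score is negative MSE."""
    guess = mean(train)
    return -mean((value - guess) ** 2 for value in test)


toy_train = [float(i) for i in range(20)]
toy_test = [9.0, 10.0]
titration = titrate_training_size(
    toy_train,
    toy_test,
    mean_predictor_neg_mse,
    drop_fractions=[0.0, 0.5, 0.9],
    seeds=range(10),
)
```

      Taking the median across seeds, as described above, limits the influence of any single unlucky subset on the reported performance.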

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      We agree that an additional computational analysis including a systematic feature removal would be interesting and valuable. We’ve included this analysis as part of a new results subsection in which we assess where classification improvements are likely to come from by testing robustness of the ML models.

      Specifically, we’ve systematically removed individual features that belong to specific feature groups, channels, and compartments to determine how much their absence negatively affects model performance. The added supplementary figure is pasted below.

      Supplementary Figure S12: Systematically removing classes of features has little impact on most models’ performance. We retrained all 70 cell health models after dropping features associated with specific (a) feature groups, (b) channels, and (c) compartments. Each dot is one model (predictor), and the performance difference between the original model and the retrained model after dropping features is shown on the x axis. Any positive change indicates that the models got worse after dropping the feature group. (d) Individual model differences in performance after dropping features. Each dot is one class of features removed (as in a-c).

      We conclude that the majority of cell health models are robust to missing feature groups. Some models actually improve with a reduction in the feature space. Combined with the feature heatmap presented in Figure 3, these results tell us that a lot of the morphology signal is redundant across Cell Painting features.

      We add the following text to the results:

      We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that most models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between readouts and individual Cell Painting features.

      And the following to the methods:

      Machine learning robustness: Systematically removing feature classes

      We performed an analysis in which we systematically dropped features measured in specific compartments (Nuclei, Cells, and Cytoplasm), specific channels (RNA, Mito, ER, DNA, AGP), and specific feature groups (Texture, Radial Distribution, Neighbors, Intensity, Granularity, Correlation, Area Shape) and retrained all models. We omitted one feature class and then independently optimized all 70 cell health models as described in the Machine learning framework results section above. We repeated this procedure once per feature class.
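
      The feature-dropping step can be sketched as a filter on feature names. The block assumes CellProfiler-style names of the form Compartment_FeatureGroup_..._Channel; the example feature names and the helper functions are hypothetical and are not taken from the repository.

```python
COMPARTMENTS = ("Nuclei", "Cells", "Cytoplasm")
CHANNELS = ("RNA", "Mito", "ER", "DNA", "AGP")
FEATURE_GROUPS = (
    "Texture", "RadialDistribution", "Neighbors", "Intensity",
    "Granularity", "Correlation", "AreaShape",
)


def belongs_to(feature, feature_class):
    """Check whether a feature name belongs to a compartment, group, or channel."""
    parts = feature.split("_")
    if feature_class in COMPARTMENTS:
        return parts[0] == feature_class
    if feature_class in FEATURE_GROUPS:
        return len(parts) > 1 and parts[1] == feature_class
    if feature_class in CHANNELS:
        # Channel names appear after the compartment and feature group.
        return feature_class in parts[2:]
    raise ValueError(f"Unknown feature class: {feature_class}")


def drop_feature_class(features, feature_class):
    """Return the features that remain after removing one feature class."""
    return [f for f in features if not belongs_to(f, feature_class)]


# Hypothetical feature names for illustration.
toy_features = [
    "Nuclei_Intensity_MeanIntensity_DNA",
    "Cells_Texture_Contrast_RNA_5_0",
    "Cytoplasm_Correlation_Correlation_ER_Mito",
    "Cells_AreaShape_Area",
    "Nuclei_Granularity_1_AGP",
]
without_nuclei = drop_feature_class(toy_features, "Nuclei")
without_mito = drop_feature_class(toy_features, "Mito")
```

      Each reduced feature list would then be used to retrain the models, with the change in test performance compared against the models trained on all features.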

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      We thank the reviewers for their encouraging remarks. We believe that with the added robustness analyses and with increased clarity about the motivation behind the paper, we’ve successfully demonstrated a proof of concept for the approach to predict cell health phenotypes from Cell Painting images. We believe that we’ve provided sufficient evidence to a reader to demonstrate the benefits of the prediction approach. As well, given the additional details describing the Cell Health assay reproducibility, we believe that the paper also successfully introduces a new assay paradigm.

      Furthermore, while many of the cell health measurements are definitely weak (and unreliable), it is not fair to generalize all predictions as weak (especially given the sample size limitations).

      It is also worth noting that, under the current circumstances, separating the one dataset we have into a train/test set and validating the model in an external set is the best we could do; we do not have additional budget to run further wet lab experiments (which would also face a COVID backlog in our chemical screening group). We agree that additional datasets would benefit the field; our current data is now public, all of our future data will be public (to the extent possible), and we hope that others building on our work will make their data public too to address these questions.

      Lastly, in response to the “currently undefined reasons” comment, as well as other comments throughout, we’ve now included a new subsection in the Results/Discussion section to more directly answer some of the reasons why many models may have underperformed. Specifically, and as mentioned previously in this response, we perform three distinct robustness analyses: 1) Cell line holdout; 2) feature holdout; 3) sample size titration.

      Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited)

      Scale bars need to be included in all images, some are missing in S1

      We thank the reviewers for this suggestion. We have since substantially updated figure 1 and supplementary figure S1. We have also added a new supplementary figure S2 as an example of the manual gating strategies, and we have updated all scale bars appropriately. We’ve attached the specific figure updates in an earlier response.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      We have added these necessary details in their respective methods sections:

      We acquired all cell images using an Opera Phenix High Content Imaging Instrument (PerkinElmer) with a 20X water objective (a numerical aperture (NA) of 1.0), in confocal mode (a pinhole size of 50µm). The effective pixel size was 0.65µm/pixel. We acquired images in four channels using default excitation / emission combinations: for the blue channel (Hoechst) 405/435-480; for the green channel (Alexa 488 and CellEvent) 488/500-550; for the orange channel (Alexa 568 and CellRox Orange) 561/570-630 and for the far-red channel (Alexa 647 and DRAQ7) 640/650-760. We applied the Cell Health reagents for cell viability and for cell cycle in two separate plates.

      The legends for the different parts of Fig S10 are transposed which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      Wow! We thank the reviewers for pointing out this oversight, and for their careful attention to detail. This figure is now completely different after the restructuring of the Drug Repurposing Hub results/discussion section. The legends for all figures are now correct.

      EdU is defined after it is abbreviated in methods

      We thank the reviewers for noting this. We’ve now fixed where these acronyms are defined in the methods section and removed their definitions in later sections where redundant.

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      We thank the reviewers for pointing out these very important omissions. We have since rectified them in the methods section:

      We built a CellProfiler image analysis and illumination correction pipeline (version 2.2.0) to extract these image-based features (McQuin et al., 2018). We include the CellProfiler pipelines in our github repository.

      We developed and ran two distinct image analysis pipelines in Harmony software (version 4.1; PerkinElmer) for each of the Cell Health plates.

      We also add the CellProfiler pipelines to our GitHub repository. A pull request introducing this change can be viewed here: https://github.com/broadinstitute/cell-health/pull/149

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      These definitions are subjective by nature. Gating decisions will be different depending on the scientist performing the image analysis. We feel that the sentence: “We excluded outlier nuclei with unusually high intensity of Hoechst max” conveys this subjectivity well. One of the strengths of the proposed approach to predict cell health phenotypes directly from the Cell Painting images is the removal of gating subjectivity.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      We used the nucleus roundness measurements as calculated in PE Harmony to define the “cells selected for cell cycle” subpopulation in the first panel of the Cell Health assay. That is, this measurement was integral to the Cell Health assay itself. We believe that the addition of example gates (in Supplementary Figure S2) clears up this confusion.

      Reviewers:

      Jason Swedlow

      Melpi Platani

      Erin Diel

      Emil Rozbicki

      Reviewer #2 (Significance (Required)):

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interest to scientists involved with drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

      We would like to again express thanks to these reviewers for their careful read, very helpful comments, and encouraging remarks.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      We thank the reviewer for their helpful comments and encouragement!

      **Major comments:**

      Some of the health readouts are based on general morphology (cell and nucleus) which can be obtained based on cell painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well especially for HCC44 (R-square reaching zero). This is surprising as these data can be extracted from cell painting assays. Can the author elaborate on why this is the case?

      We agree that the performance of the live cell roundness and nucleus roundness models was unexpectedly low. One would expect that these shape features, as measured by PerkinElmer Harmony software, would be easily predicted from CellProfiler readouts from the Cell Painting assay.

      The roundness property was used in Harmony versions,

      2*sqrt(π)*sqrt(Area-BorderArea/2.0)/BorderArea-0.1)

      where Area is object area in pixels and BorderArea is border area in pixels (we thank Joe Trask, Olavi Ollikainen, Hartwig Preckel, and Kaupo Palo at PerkinElmer for this information.)

      No single feature in the CellProfiler readouts measures roundness directly; instead, CellProfiler will measure a combination of shape features that together could synthesize the idea of “roundness”. However, given that the elastic net approach is well-suited for this type of synthesis, it remains unclear why roundness is not predicted well.

      One possible explanation is that shape features are the measurements that differ most across cell lines, and they are measured precisely in both assays. Precise measurements, coupled with our training strategy of using all three cell lines together, might lead to poor performance in predicting certain cell-line intrinsic features.

      We tested this shape result directly (and, more generally, the other cell health features) in a “cell line holdout” analysis, which we describe in more detail in response to the next comment. This analysis tested how well models generalized to cell lines not encountered in the training process: we trained on every combination of two cell lines and applied the trained models to the third. We observed that cell line intrinsic features, like shape, are predicted poorly if a model was not trained using the cell line.
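      The splitting scheme behind such a cell line holdout can be sketched in a few lines. This is a minimal illustration only, not the code from the cell-health repository: the function names `cell_line_holdout_splits` and `r_squared` and the toy labels are hypothetical, and the model-fitting step (elastic net in the manuscript) is left out. The R-squared helper shows why scores can fall far below zero when a model transfers poorly.

```python
from statistics import mean


def cell_line_holdout_splits(cell_lines):
    """Yield (held_out, train_idx, test_idx) once per distinct cell line.

    Training indices cover every profile from the other cell lines;
    test indices cover only the held-out cell line.
    """
    for held_out in sorted(set(cell_lines)):
        train_idx = [i for i, c in enumerate(cell_lines) if c != held_out]
        test_idx = [i for i, c in enumerate(cell_lines) if c == held_out]
        yield held_out, train_idx, test_idx


def r_squared(y_true, y_pred):
    """Coefficient of determination.

    Equals 1 for perfect predictions, 0 when no better than predicting
    the mean of y_true, and is negative when worse than that.
    """
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-profile cell line labels (two profiles per line).
cell_lines = ["A549", "ES2", "HCC44", "A549", "ES2", "HCC44"]
splits = list(cell_line_holdout_splits(cell_lines))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

      In a real run, a model would be fit on the rows in `train_idx` and scored with `r_squared` on the rows in `test_idx`, once per held-out cell line.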

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models were trained for every cell line? Leave one out approach can be used to accommodate the scarcity of samples.

      We thank the reviewer for this important question. We also appreciate how different certain models behaved with certain cell lines. We would like to stress that the results presented here represent a small pilot study that is not meant to optimize model performance. Instead, the motivation of the manuscript is to demonstrate proof-of-concept of the approach to predict specific cell health phenotypes directly from Cell Painting images. We believe that the current results demonstrate positive proof, which warrants an expansion of data collection and an improvement of the classification methodology.

      Nevertheless, with our current data, we can answer an important question about the feasibility of signal transfer between cell lines. Therefore, we performed an additional “cell line holdout” analysis. We believe that the cell line holdout analysis tells us that signals can be transferred across contexts, but that any leading observations must be followed up with experiments performed directly in the cell line of interest. This signal transfer is diluted compared to the original test set performance, but it is also worth noting that the models presented in Supplementary Figure 11 (pasted below) were trained on only 66% of the data in the holdout cell line analysis and 85% of the data in the original analysis.

      Supplementary Figure S11: Results from a cell line holdout analysis. We trained and evaluated all 70 cell health models in three different scenarios using each combination of two cell lines to train, and the remaining cell line to evaluate. For example, we trained all 70 models using data from A549 and ES2 and evaluated performance in HCC44. We bin all cell health models into 14 different categories (see Supplementary Table S3 and https://github.com/broadinstitute/cell-health/6.ml-robustness for details about the categories and scores). We also provide the original test set (15% of the data, distributed evenly across all cell types) performance in the last row, as well as results after training with randomly permuted data. This cross-cell-type analysis yields worse performance overall. Nevertheless, despite the models never encountering certain cell lines, and having fewer training data points, many models still have predictive power across cell line contexts. Note that we truncated the y axis to remove extreme outliers far below -1. The raw scores are available on https://github.com/broadinstitute/cell-health.

      And we add the following text to the results section:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines.

      All software updates introducing this analysis can be viewed at https://github.com/broadinstitute/cell-health/pull/143

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      It is worth noting that we do also show generalizability in the Drug Repurposing Hub for two other models: ROS and G1 cell count. We show that proteasome inhibitors significantly induce high ROS and PLK inhibitors restrict entry to G1. We have also added enrichment tests demonstrating high statistical significance for these compound mechanisms.

      While we recognize that these two examples provide anecdotal evidence, they suggest the ability and power of the approach to assign phenotypes to Cell Painting images.

      Nevertheless, we thank the reviewer for bringing up this critical point and certainly appreciate the benefit of validating a gH2AX model. Therefore, we’ve added a similar analysis in which we demonstrate generalizability of the top performing gH2AX model: Number of gH2AX spots in G1 cells. We discuss these changes in an updated section:

      We also chose to validate three additional models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We observed that the two proteasome inhibitors (bortezomib and MG-132) in the Drug Repurposing Hub set yielded high ROS predictions (OR = 76.7; p -15) (Figure 4C). Proteasome inhibitors are known to induce ROS (Han and Park, 2010; Ling et al., 2003). As well, PLK inhibitors yielded low G1 cell counts (OR = 0.035; p = 3.9 x 10-8) (Figure 4C). The PLK inhibitor HMN-214 showed an appropriate dose response (Figure 4D). PLK inhibitors block mitotic progression, thus reducing entry into the G1 cell cycle phase (Lee et al., 2014). Lastly, we observed that aurora kinase and tubulin inhibitors yielded high Number of gH2AX spots in G1 cells predictions (OR = 11.3; p Figure 4E). In particular, we observed a strong dose response for the aurora kinase inhibitor barasertib (AZD1152) (Figure 4F). Aurora kinase and tubulin inhibitors cause prolonged mitotic arrest, which can lead to mitotic slippage, G1 arrest, DNA damage, and senescence (Orth et al. 2011; Cheng and Crasta 2017; Tsuda et al. 2017).
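      For readers unfamiliar with how an odds ratio and an enrichment p value of this kind arise, the sketch below works through a 2x2 contingency table. It assumes a one-sided Fisher's exact test, which is an assumption on our part: the text above does not name the test used. The function names and the toy counts are hypothetical and are not the study's data.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    a: in class, high prediction     b: in class, not high
    c: not in class, high            d: not in class, not high
    Undefined (division by zero) when b or c is zero.
    """
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p value for enrichment.

    Probability, under the hypergeometric null with fixed margins,
    of observing at least `a` counts in the top-left cell.
    """
    row1 = a + b          # profiles in the compound class
    col1 = a + c          # profiles with a high prediction
    n = a + b + c + d     # all profiles
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


# Hypothetical toy counts, chosen only to exercise the functions.
example_or = odds_ratio(3, 1, 1, 3)
example_p = fisher_exact_greater(3, 1, 1, 3)
print(example_or, example_p)
```

      An odds ratio above 1 indicates that the compound class is over-represented among high predictions, and an odds ratio below 1 (as for PLK inhibitors and G1 cell count) indicates depletion.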

      We also modify the abstract summarizing this result:

      For Cell Painting images from a set of 1,500+ compound perturbations across multiple doses, we validated predictions by orthogonal assay readouts, and by confirming mitotic arrest, ROS, and DNA damage phenotypes via PLK, proteasome, and aurora kinase/tubulin inhibition, respectively.

      And we add this analysis to an updated Figure 4:

      Figure 4: Validating Cell Health models applied to Cell Painting data from The Drug Repurposing Hub. The models were not trained using the Drug Repurposing Hub data. (a) The results of the dose alignment between the PRISM assay and the Drug Repurposing Hub data. This view indicates that there was not a one-to-one matching between perturbation doses. (b) Comparing viability estimates from the PRISM assay to the predicted number of live cells in the Drug Repurposing Hub. The PRISM assay estimates viability by measuring barcoded A549 cells after an incubation period. (c) Drug Repurposing Hub profiles stratified by G1 cell count and ROS predictions. Bortezomib and MG-132 are proteasome inhibitors and are used as positive controls in the Drug Repurposing Hub set; DMSO is a negative control. We also highlight all PLK inhibitors in the dataset. (d) HMN-214 is an example of a PLK inhibitor that shows strong dose response for G1 cell count predictions. (e) Tubulin and aurora kinase inhibitors are predicted to have high Number of gH2AX spots in G1 cells compared to other compounds and controls. (f) Barasertib (AZD1152) is an aurora kinase inhibitor that is predicted to have a strong dose response for Number of gH2AX spots in G1 cells predictions.

      All software updates required to update these figures can be viewed at https://github.com/broadinstitute/cell-health/pull/145

      It is also worth noting that collecting more data for this manuscript is not currently feasible given the number of projects backlogged because of COVID. Given that the motivation of the project is to demonstrate feasibility of the approach, we feel that our current training/testing machine learning framework, together with the application to Drug Repurposing Hub data, is sufficient.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      We thank the reviewer for bringing up this concern, and we agree that the advantages and limitations of the approach deserve more discussion. Indeed, we’ve added a full new results/discussion subsection directly testing many of the assumptions for why some models performed well and others didn’t. The new section introduces many model limitations:

      We performed a series of analyses to determine certain parameters and options that are likely to improve models in the future. First, we performed a “cell line holdout” analysis, in which we trained models on two of three cell lines and predicted cell health readouts on the held out cell line. We observed that certain models including those based on viability, S phase, early mitotic and death phenotypes could be moderately predicted in cell lines agnostic to training (Supplementary Figure 11). Not surprisingly, shape-based phenotypes could not be predicted in holdout cell lines, which emphasizes the limitations of transferring certain cell-line specific measurements across cell lines. We also performed a systematic feature removal analysis, in which we retrained cell health models after dropping features that are measured from specific groups, compartments, and channels. We observed that many models were robust to dropping entire feature classes during training (Supplementary Figure 12). This result demonstrates that many Cell Painting features are highly correlated, which might permit prediction “rescue” even if the directly implicated morphology features are not measured. Because of this, we urge caution when generating hypotheses regarding causal relationships between phenotypes and individual Cell Painting features. Lastly, we performed a sample size titration analysis in which we randomly removed a decreasing amount of samples from training. For the high and mid performing models we observed a consistent performance drop, suggesting that increasing sample size would result in better overall performance (Supplementary Figure 13).
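      The sample size titration described above amounts to retraining on progressively smaller random subsets of the training set. The sketch below shows only the subsetting step, under stated assumptions: the function name `titrate_training_sets`, the removal fractions, and the fixed seed are all hypothetical and are not taken from the cell-health repository.

```python
import random


def titrate_training_sets(train_idx, fractions_removed, seed=0):
    """Return {fraction_removed: sorted subset of train_idx}.

    For each fraction, that share of training samples is dropped
    uniformly at random; the test set is left untouched.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    subsets = {}
    for frac in fractions_removed:
        n_drop = round(len(train_idx) * frac)
        n_keep = len(train_idx) - n_drop
        subsets[frac] = sorted(rng.sample(train_idx, n_keep))
    return subsets


# Hypothetical example: 100 training profiles, three titration levels.
train_idx = list(range(100))
subsets = titrate_training_sets(train_idx, [0.0, 0.25, 0.5])
for frac, kept in subsets.items():
    print(f"removed {frac:.0%}: {len(kept)} training samples kept")
```

      Each subset would then be used to refit a model, and test set performance plotted against the number of training samples retained.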

      **Minor comments**

      Page 8: The authors visualize the predicted G1 cell count and ROS when overlayed on a UMAP based on cell painting data from Drug Repurposing Hub. How do these visualisations look if applied to the original CRISPR training data?

      We address this comment by adding a supplementary figure showing ground truth G1 cell count and ROS readouts.

      We applied uniform manifold approximation and projection (UMAP) to observe the underlying structure of the samples as captured by morphology data (McInnes et al., 2018). We observed that the UMAP space captures gradients in predicted G1 cell count (Supplementary Figure S14A) and in predicted ROS (Supplementary Figure S14B). We also observed similar gradients in the ground truth cell health readouts in the CRISPR Cell Painting profiles used for training cell health models (Supplementary Figure S15). Gradients in our data suggest that cell health phenotypes manifest in a continuum rather than in discrete states.

      Where Supplementary Figure 15 is pasted below:

      Supplementary Figure S15: Applying Uniform Manifold Approximation and Projection (UMAP) to the Cell Painting consensus profile data of CRISPR perturbations. UMAP coordinates visualized by (a) cell line, (b) ground truth G1 cell counts, and (c) ground truth ROS counts. (d) Visualizing the distribution of ground truth ROS compared against G1 cell count. The two outlier ES2 profiles are CRISPR knockdowns of GPX4, which is known to cause high ROS.

      We have also added the option to explore the CRISPR profile Cell Health ground truth in our shiny app https://broad.io/cell-health (screenshot pasted below)

      Modifications to the software introducing these changes can be viewed at https://github.com/broadinstitute/cell-health/pull/141.

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      We thank the reviewer for noting this. We agree that the clarity of this section could be improved. We have now completely reworked the final section of applying the cell health models to the Drug Repurposing Hub data.

      In particular, we’ve moved the PRISM data section to come first, as the simplest model to validate, and moved these results to Figure 4. We then describe validation for three other models: ROS, G1 cell count, and Number of gH2AX spots in G1 cells. We end with the UMAP discussion, which is the original second part of the last paragraph on page 8.

      The PRISM section now reads:

      We first chose a simple, high-performing model to validate. The number of live cells model captures the number of cells that are unstained by DRAQ7. We compared model predictions to orthogonal viability readouts from a third dataset: publicly available PRISM assay readouts, which count barcoded cells after an incubation period (Yu et al., 2016). Despite measuring perturbations with slightly different doses and being fundamentally different ways to count live cells (Figure 4A), the predictions correlated with the assay readout (Spearman's Rho = 0.35, p -3; Figure 4B).
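      Spearman's rho, used above to compare predictions with PRISM readouts, is the Pearson correlation of the rank-transformed values, which makes it insensitive to the differing scales of the two assays. The following is a minimal standard-library sketch of that definition; the function names and toy values are hypothetical, and it does not compute a p value.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: a monotone but nonlinear relationship.
predicted = [1, 2, 3, 4, 5]
measured = [1, 4, 9, 16, 25]
print(spearman_rho(predicted, measured))
```

      Because only ranks enter the calculation, any monotone relationship between predicted and measured viability gives a rho of 1 regardless of its shape.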

      Reviewer #3 (Significance (Required)):

      This approach can be of wide interest as it is easy to implement, cost-effective and lead to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

      We thank the reviewer for their enthusiasm and for this concluding idea. Indeed, we also feel that including a separate marker to indicate infected cells could lead to improved accuracy. We add this thought to the concluding section as a future direction. The full updated conclusion reads as follows:

      We have demonstrated feasibility that information in Cell Painting images can predict many different Cell Health indicators even when trained on a small dataset. The results motivate collecting larger datasets for training, with more perturbations and multiple cell lines. These new datasets would enable the development of more expressive models, based on deep learning, that can be applied to single cells. Including orthogonal imaging markers of CRISPR infection would also enable us to isolate cells with expected morphologies. More data and better models would improve the performance and generalizability of Cell Health models and enable annotation of new and existing large-scale Cell Painting datasets with important mechanisms of cell health and toxicity.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      The authors present a novel idea on predicting various cell health readouts based on a general set of markers and cell painting assay. The cell health readouts are based on more specific markers performed in different assays measuring cell proliferation and death. The authors suggest that such an approach can reduce the number of experiments needed. The paper is well written, and the figures are clear and comprehensive.

      Major comments:

      Some of the health readouts are based on general morphology (cell and nucleus) which can be obtained based on cell painting assay. Although some of these models perform well, it is surprising that the model of nuclear roundness did not perform very well especially for HCC44 (R-square reaching zero). This is surprising as these data can be extracted from cell painting assays. Can the author elaborate on why this is the case?

      Using elastic net regression models is well-suited to the problem due to the low number of observations. However, there is a significant difference between the performance of different cell lines. Does the performance of the models improve if different models were trained for every cell line? Leave one out approach can be used to accommodate the scarcity of samples.

      The authors chose to validate based on the number of live cells as it is one of the best models. However, this readout can be obtained using simple viability assays. It would be more convincing to validate on a more complex phenotype that can only be attained using imaging such as #gH2AX spots.

      The text would benefit from expanding the discussion to include the advantages and limitations of their approach.

      Minor comments

      Page 8: The authors visualize the predicted G1 cell count and ROS when overlayed on a UMAP based on cell painting data from Drug Repurposing Hub. How do these visualisations look if applied to the original CRISPR training data?

      The second part of the last paragraph on page 8 is confusing as it is not related to the first part using the PRISM data.

      Significance

      This approach can be of wide interest as it is easy to implement, cost-effective and lead to interpretable models. It would be interesting to see if the results improve when increasing the sample size. Another aspect that can be useful to investigate in the future is whether including a separate marker that indicates infected cells only in the more detailed assays would result in better accuracies.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      This report from Way et al describes a method of extending a very popular screening technology called Cell Painting developed by the Carpenter Lab. The authors are contending with an important issue and as such this paper potentially will be of great interest to the community. Cell Painting provides quantitative fingerprints of cell phenotypes in response to changes in the molecular or physiological status of cells. However the molecular basis or even the candidate pathways for those changes is not always clear. Here, the authors take specific markers of cell physiology, e.g., DNA damage, ROS production, cell cycle progression etc. and relate them to Cell Painting features. The authors are trying to address the issue that running many probes of cell physiology is expensive and time consuming and that identifying proxies for these assays using much simpler Cell Painting technologies would be a useful and potentially powerful approach. The overall goal is to develop some type of regression model that can link the state of cells (the "health") to Cell Painting fingerprints.

      The authors use three separate cell lines and CRISPR knockouts delivered through lentivirus that target 59 genes to establish a range of cell physiologies that they directly measure (the "Cell Health") and then relate to similar assays performed by Cell Painting. Ultimately they aim to use Cell Painting models to predict Cell Health.

      Major Issues:

      It appears that the phenotypes that are detected at a high enough level of significance (see Fig. 2), e.g DNA damage (gH2Ax), apoptosis (Caspase 3/7), dead cells, ROS (CellROX), etc. are probably most easily detected by simply monitoring DAPI signal in these screens. To detect many of the phenotypes, the authors have presented a fairly complex method of doing much simpler assays. The authors correctly highlight in Fig. 3 that the phenotypes they are detecting go beyond pure signals from DAPI. They report power in their models from Radial Distribution across many different components of the Cell Painting feature set. However these appear to give outputs that won't be that useful. It is hard to tell whether this is simply because they don't have enough images or whether their signal is confounded by using cell lines where the lentivirus CRISPR knockouts are working less efficiently.

      It seems misleading (or perhaps the explanation lacks clarity) to describe in the same paragraph the need to validate the model by applying it to new datasets, namely the Drug Repurposing Hub project, then describe gradients in cell health features across UMAP coordinates. Is it surprising that cell health phenotypes and gradients therein are present in a dataset describing cell health perturbations? The actual test of the model's performance is in the paragraph below, but the data associated with the Spearman correlation is hidden in Fig. S10b. The data is not convincing by eye, and the artifactually low p value suggests that proper statistical corrections were not applied.

      Fig 1A and associated methods are not sufficient information to describe the manual gating strategy and any variability found across iterations in these gates. Effort should be taken to quantify where these manual boundaries were set and why.

      A fundamental issue that the authors mention but do not address is the efficiency of the CRISPR KOs. The authors should measure the efficiency of representative guides and present these data to help support the interpretation of their models.

      The authors conclude that their results motivate further data acquisition and model training, and that this will improve model performance. This is only true if their lack of predictive power comes from the data volume itself, and not in larger problems of data quality, variability and the core assumptions of their method. The authors note the better predictability in ES2 cells, likely due to higher CRISPR efficiency and therefore stronger phenotypes. It is possible, as I believe the authors suggest, that the ES2 cells provide information that improves the predictive power of cells with poor infection efficiency. It is instead possible that only the ES2 cells with strong phenotypes yield predictive power, pulling the average of the dataset up. Authors could train the cell line specific datasets independently and compare relative changes in predictive performance. Otherwise, is it possible that subtle or highly complex phenotypes simply cannot be detected by this method and more data will be unlikely to improve predictability in modest perturbations.

      Although the authors argue that the Cell Painting assay is capturing complex health phenotypes using a variety of morphological features, there is a clear overweighting of a particular few (in fact two...). It would be interesting to systematically retrain with exclusion of particular features to determine if equalizing the weight across features changes performance. These are also notably the feature groups with the fewest features-- how many individual features within these feature groups are pulling all the weight?

      In summary there is a very interesting concept here, but for several possible, currently undefined reasons, the authors are reporting a very weak measurement. The authors allude to these limitations, but it would be great if the authors could address these issues and provide a stronger dataset.

      Minor issues: Authors should include representative images of their Cell Health assay in the main figures. A full figure of all labels and examples of manual gating should be included (S1 is too limited). Scale bars need to be included in all images; some are missing in S1.

      "20x water objective in confocal mode" is not a sufficient level of detail on image acquisition parameters especially considering the lack of representative images. At the very least, NA and if appropriate pinhole size should be reported. Similarly, "9 FOV per well" is not sufficient. Pixel size and FOV area/dimensions are necessary.

      The legends for the different parts of Fig S10 are transposed which makes the figure quite confusing. The authors should amend or clarify the language of "guide perturbation" and "guide profile".

      EdU is defined after it is abbreviated in methods

      The authors should address the following image processing reproducibility concerns:

      Segmentation and feature extraction parameters are not included in the Supplementary Information. Either attach the CellProfiler pipeline or add a table with parameters and settings used for each module.

      CellProfiler and Harmony versions are missing.

      Subpopulation definition (page 14) should be defined in a way that the algorithms (pipelines) could be reproduced, e.g.: "unusually high intensity of Hoechst max" requires a stricter definition.

      Why is the nucleus roundness calculated in PE Harmony and not in the CellProfiler pipeline itself?

      Reviewers: Jason Swedlow Melpi Platani Erin Diel Emil Rozbicki

      Significance

      Nature and Significance: This study aims to demonstrate how phenotypic studies using different markers can be combined and linked to deliver wider application and value.

      Relationship to Published Work: This study extends previous work from the same group and attempts a novel extension. The approach is a useful concept and potentially important.

      Audience: The method this paper proposes will be of interest to scientists involved with drug discovery and/or computational biology.

      Reviewer's Expertise: Cell Biology, Imaging, Imaging Informatics, Machine Learning, Computer Vision

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The submitted manuscript entitled 'Predicting cell health phenotypes using image-based morphology profiling' (RC-2020-00394) by Way et al. presents a set of seven dyes/staining (as two separate panels) to microscopically screen cell viability. For automatic classification a training/test set of 119 CRISPR (approximately 2 sgRNAs per gene) perturbations on 3 cancer cell lines were generated (lung A549, ovarian ES2, lung HCC44). After segmentation of cell nuclei a set of morphological cell measurements were extracted from each perturbation (total 952 features). The nature of these feature spanning cell cycle and viability phenotypes, enabled the authors to define 70 different phenotype classes, which are used to model a classifier by elastic linear regression. Specific definitions (cell cycle and ROS) were partly predicted/validated in an independent existing image data set (Drug Repurposing Hub project). The data is available as web-based application/visualization and the supplementary method is well described.

      Major concerns:

      (1) The only fundamental argument of this manuscript not to apply state-of-the-art deep learning (DL) machine-learning (mentioned in McCain et al. 2018), which does not require segmentation, feature extraction, abstraction, manual gating is the 'interpretability' of the predictions. However, performance, precision, scalability (by modern GPUs) with DL should clearly outperform 'manual' regression models. All recent machine vision benchmarks in microscopy confirm this, but also clearly shows 'real world' translational applications, e.g.

      https://www.nature.com/articles/s43018-020-0085-8,

      https://www.biorxiv.org/content/10.1101/2020.07.02.183814v1.full.pdf,

      In other words, the presented methodology is not compared to DL, and is not convincing in terms of interpretability benefits.

      (2) One aforementioned point of the methodology is cryptically/not described: Why should it be less expensive compared with other (which?) approaches (see introduction)?

      (3) Generalizability and/or training data size is essential for any model-based classification, but not evaluated or validated in the current manuscript. The independent validation on an A549 cell line only data might be not sufficient/convincing.

      Minor concerns:

      (1) Highest test performance comprises that precision is mainly driven by cell cycle/count and live status and could be probably derived from DRAQ7 (Fig. 2) and DNA granularity (Fig. 3, bottom right) and would argue for rigid feature selection across channels and features.

      (2) Any H2AX and 'polynuclear' would probably fail in any cell line with this size of training data.

      (3) To what do the 'weights' of the model in Fig. 1c refer?

      Significance

      This manuscript is not advanced in the context of the latest developments in cell-based microscopic classification. The rationale in the introduction and the conclusions are not linked (interpretability, generalizability, costs). It seems to be unfinished or unformatted in this respect.

      The author and co-authors were instrumental in pioneering cell-based image processing with their past work (CellProfiler software), but the presented methodology is simply outdated. Therefore, a revision towards a comparison and benchmarking with DL will also not help.

      Ref (DL with MIL): https://academic.oup.com/bioinformatics/article/32/12/i52/2288769

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are grateful to the referees for investing valuable time in reviewing our work, and for recognising the importance and utility. We thank them for their insightful and constructive comments that have helped us significantly improve the manuscript.

      Below, we provide a point-by-point response to all specific questions raised.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In order to improve SARS-CoV-2 diagnostics, Reijns et al. developed a multiplexed RT-qPCR protocol that allows simultaneous detection of two viral genes, one housekeeping gene as well as an external gene as an extraction control. Compared to running parallel assays to detect genes individually, the turnaround time is much shorter and reagents are saved. Furthermore, the presented data suggest that the assay is more sensitive than commercial kits. The authors also propose the detection of the human housekeeping gene as a measure of sample quality control. In principle, this work has potential, but the manuscript itself needs a better structure.

      **Major concerns:**

      The authors have used the Takara RT-qPCR kit for their study. Did the authors try other commercial kits?

      We have not assessed other commercial kits as the Takara reagent performed well, and has been easy to source. We expect that other one-step kits could be used if the need arose.

      When we initiated this work in March 2020, we selected the Takara One Step PrimeScript™ III RT-PCR Kit based on 1) the practical advantages of a one-step reaction mix, 2) published evidence of its successful use in SARS-CoV-2 detection (see below), 3) availability in sufficient quantities for testing at scale, and 4) affordability.

      (Published evidence: One of the first descriptions of an assay to detect SARS-CoV-2 [1] employed the Takara One Step PrimeScript™ III RT-PCR Kit, and this kit was later shown by others to perform as well as or better than Qiagen Quantifast Multiplex RT-PCR +R mastermix, ThermoFisher TaqPath 1-Step RT-qPCR MasterMix and ThermoFisher Taqman Fast Virus 1-step mastermix, when used to detect SARS-CoV-2 RNA from nose and throat swabs with N1, N2 or N gene assays [2].)

      Can the authors elaborate on the supply chain of the Takara kit?

      We have not had problems securing the Takara kit in sufficient quantities and in a timely fashion, and did so through the company’s Scotland and NE England representative. The managing director of Takara Bio Europe provided the following statement, as a clarification of the supply chain:

      “Takara Bio Inc. has worked on significantly increasing the production of one-step RT-qPCR reagents to cover worldwide needs for SARS-CoV-2 detection. The production of this kit is based in China under ISO13485 certification and the European stock is based in, and distributed, from Paris. Throughout this pandemic, Takara Bio Europe has supplied millions of reactions around Europe to COVID-19 testing labs, without encountering any shortages or significant shipping delays.”

      Could it cover population testing in case of shortages of other commercial kits?

      Yes, it could. The Takara kit is available in 4,000 and 20,000 reaction pack sizes and therefore could well be a useful option in case of shortages of other commercial kits. Indeed, one motivation for developing the multiplex assay was to ensure diagnostic testing resilience in the face of reagent shortages.

      For better comparison, is it possible to give information on which primers the commercial kits are based on?

      We contacted both ThermoFisher and Abbott to ask for more information on the primers and probes included in the TaqPath COVID‐19 Combo Kit (detects N, ORF1ab and S gene) and Abbott RealTime SARS‐CoV‐2 assay (detects RdRp and N gene). Unfortunately, we were informed that this information is proprietary. For clarity, we have included the following in the Materials and Methods section:

      “Primers and probes included in the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific, Cat. No. A47814) detect SARS-CoV-2 ORF1ab, N and S gene; those in the Abbott RealTime SARS-CoV-2 assay (Cat. No. 09N77-090) detect RdRp and N gene. Further details are not available, as this information is proprietary.”

      Also, explain better the primers used in this study. For example, the N1 and N2 primers are directed against different regions of the SARS-CoV-2 N gene.

      We thank the reviewer for encouraging us to better explain the primers we use for our own assays, and now provide more detailed information in a new Fig 1.

      The Results section needs a better structure, as the first two pages do not refer to any of the main figures. For example, in which figure or table can the reader find the data that are discussed in lines 83 to 87?

      We have now substantially re-structured the entire Results section, and include the data that was discussed in lines 83 to 87 of the original manuscript, in Fig 1D of the revised manuscript.

      Table S1, instead of current Table 1, could be moved to main figures as it contains the important finding that the multiplexed assay may be more sensitive than the commercial one.

      As suggested, we have moved Table S1 to the main display items (now Table 1), and moved the original Table 1 to the supplementary items (now Table S3).

      The authors identified some samples that scored negative in commercial assays but positive in their new assay. This is important; however, the possibility of detecting false positives should be addressed more thoroughly in a "Discussion" section.

      We thank the reviewer for highlighting this, and now discuss the issue of detecting false positives in more detail in the Discussion section of the revised manuscript:

      “RT-qPCR tests are molecular tests with high intrinsic accuracy, however false positive and false negative results can occur. The use of multiplex assays that detect multiple SARS-CoV-2 targets, such as those reported here, reduces the chance of both. Off-target reactivity is one possible cause of false positives, and although some have reported high false positive rates for the E gene assay [20, 22], this does not match our experience. In two patients, our N1E-RP and N2E-RP assays detected virus, albeit weakly, whereas commercial assays did not. As multiple SARS-CoV-2 targets were positive, these are likely true positive results and not due to off-target reactivity. False positives can also occur due to lab issues such as sample mislabelling, data entry errors, reagent contamination with target nucleic acids or contamination of primary specimens. However high standards of quality control at all stages of testing, and effective mitigation strategies should quickly identify problems. Additionally, sample re-test with an independent assay and/or patient re-sampling should also be effective measures to counter false positives, particularly in low pre-test probability situations such as mass screening.”
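
      The remark about low pre-test probability situations can be made concrete with Bayes' rule. The following is a minimal sketch and is not part of the manuscript or the rebuttal; the sensitivity and specificity values are hypothetical placeholders chosen only for illustration, not measured performance of any assay discussed here.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Return P(infected | positive test) via Bayes' rule.

    All three arguments are probabilities in [0, 1].
    """
    true_pos = sensitivity * prevalence                    # infected and test-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # uninfected but test-positive
    total_pos = true_pos + false_pos
    if total_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / total_pos


# Hypothetical values for illustration only (assumptions, not assay data).
SENSITIVITY = 0.95
SPECIFICITY = 0.999

if __name__ == "__main__":
    # Pre-test probability (prevalence) ranging from mass screening to symptomatic testing.
    for prevalence in (0.001, 0.01, 0.10):
        ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
        print(f"prevalence {prevalence:.1%}: PPV = {ppv:.1%}")
```

      With specificity held fixed below 100%, the positive predictive value falls as prevalence falls, which is why confirmatory re-testing of positives matters most in mass screening.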

      Figures 1 to 3 have different panels which seem