    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      We would like to thank the two reviewers for their valuable comments and suggestions for improvement. We addressed each reviewer’s comments individually and carefully revised the manuscript to incorporate new data and make the necessary clarifications.

      Overall we made the following major modifications:

      1. We investigated the relevance of BHRF1 expression in the context of EBV infection, in B cells and epithelial cells. We observed that EBV reactivation leads to MT hyperacetylation and subsequent mito-aggresome formation in both cell types. We generated an EBV+ B cell line deficient for BHRF1, which allowed us to demonstrate the involvement of BHRF1 in this phenotype. These results were added to Figures 2 and 3 and Figure 1 – S1 of the revised manuscript.
      2. We better characterized the mechanism leading to MT hyperacetylation, by demonstrating that BHRF1 colocalizes and interacts with the tubulin acetyltransferase ATAT1. These results were added to Figure 5 and Figure 5 – S2 in the revised manuscript.
      3. We generated stable ATG5-knockout HeLa cells. Using these autophagy-deficient cells, we demonstrated the involvement of autophagy in BHRF1-induced MT hyperacetylation and mito-aggresome formation. These results were added to Figure 8 of the revised manuscript.
      4. We compared the impact of BHRF1 with other mitophagy inducers on MT hyperacetylation, mitochondrial morphodynamics and the inhibition of IFN production, to demonstrate the specificity of the mechanism of action of BHRF1 (Figure 4 – S1).
      5. We demonstrated that MT hyperacetylation requires mitochondrial fission, using a Drp1-deficient HeLa cell line that we have previously described (Vilmen et al., 2020). This result was added to the revised version of the manuscript in Figure 3 – S2A. Moreover, we confirmed this result in the context of EBV infection (Figure 3 – S2B).

      Reviewer #1

      Reviewer #1 (Evidence, reproducibility and clarity)

      Major comments:

      1. In the presented manuscript the authors characterize mainly BHRF1 overexpression in HeLa cells. Does BHRF1 also block type I IFN responses by microtubule hyperacetylation in the context of EBV infection? Do alpha-tubulin K40A overexpressing B cells produce more type I IFN after EBV infection?

      In the revised version of the manuscript, we added several experiments to explore the BHRF1 phenotype during EBV infection, as requested by both reviewers. Since EBV infects both B cells and epithelial cells, we used two different approaches. In latently infected B cells derived from Burkitt lymphoma (Akata cells), we induced EBV reactivation by anti-IgG treatment. To explore the importance of BHRF1 in this cell type, we constructed a cell line knocked down for BHRF1 expression, using a lentivirus encoding an shRNA against BHRF1. In parallel, HEK293 cells harboring either the EBV WT or the EBV ΔBHRF1 genome were transfected with ZEBRA and Rta plasmids to induce the viral productive cycle in epithelial cells.

      We demonstrated that EBV infection induces MT hyperacetylation and subsequent mito-aggresome formation, both of which depend on autophagy. Moreover, this phenotype requires BHRF1 expression in both B cells and epithelial cells. We also observed that expression of alpha-tubulin K40A in EBV+ epithelial cells blocks the mito-aggresome formation induced by EBV reactivation. These results are now presented in Figures 2 and 3 of the revised manuscript.

      Regarding regulation of the IFN response during infection, several EBV-encoded proteins and non-coding RNAs have been described to interfere with the innate immune system. For example, BGLF4 and ZEBRA bind to IRF3 and IRF7, respectively, to block their nuclear activity (Hahn et al., 2005; Wang et al., 2009). Moreover, Rta expression decreases IRF3 and IRF7 mRNA levels (Bentz et al., 2010; Zhu et al., 2014). We therefore expect that studying the inhibitory role of BHRF1 on the IFN response in the context of EBV reactivation will be difficult, since the lack of BHRF1 could be compensated by the activity of other viral proteins acting on innate immunity.

      2. The authors document that the observed microtubule hyperacetylation is due to the acetyltransferase ATAT1. How does BHRF1 activate ATAT1? Is there any direct interaction?

      As requested by reviewer #1, we explored a possible interaction between BHRF1 and ATAT1. First, we observed by confocal microscopy that GFP-ATAT1 colocalized with BHRF1 in the juxtanuclear region of HeLa cells (Figure 5 – S2). Second, we demonstrated by two co-immunoprecipitation assays that BHRF1 binds to exogenous ATAT1 (Figures 5E and 5F). These new results have been added to the revised version of the manuscript and clarify the mechanism of action of BHRF1. To go further, we explored whether BHRF1 was able to stabilize ATAT1, because it was recently reported that p27, an autophagy inducer that modulates MT acetylation, binds to and stabilizes ATAT1 (Nowosad et al., 2021). However, BHRF1 expression does not affect ATAT1 expression levels (data not shown).

      3. Furthermore, the authors demonstrate with pharmacological autophagy inhibitors that autophagy is increased in a BHRF1-dependent and microtubule acetylation-independent manner but is required for microtubule hyperacetylation. How does autophagy stimulate ATAT1-dependent microtubule hyperacetylation? Is this dependency also observed with a more specific ATG silencing or knock-out?

      We generated a stable autophagy-deficient HeLa cell line knocked out for ATG5, using an ATG5 CRISPR/Cas9 construct delivered by a lentivirus. The absence of ATG5 expression and LC3 lipidation was verified by immunoblot (Figure 8B). We observed that BHRF1 was unable to increase MT acetylation in this autophagy-deficient cell line (Figure 8C), in accordance with the data reported in the original manuscript using spautin-1 or 3-MA treatment (Figure S5C in the original manuscript, now Figure 8A). Moreover, the lack of hyperacetylated MTs in BHRF1-expressing cells led to a dramatic reduction in mito-aggresome formation (Figures 8D and 8E). These new results demonstrate that autophagy is required for BHRF1-induced MT hyperacetylation.

      Minor comments:

      1. "Innate immunity" and "innate immune system", but not "innate immunity system" are in my opinion better wordings.

      We thank reviewer #1 for this useful comment. The term “innate immunity system” in the introduction section has been replaced by “innate immune system”. Elsewhere, we used “innate immunity”.

      2. The reader would benefit from a discussion on the role of type I IFNs during EBV infection and how important the authors think their new mechanism could be in this context.

      We thank the reviewer for this suggestion. However, we already discussed the different strategies developed by EBV to counteract induction of the IFN response in our previous study (Vilmen et al., 2020), suggesting the importance of IFN in the control of EBV infection. In the present study, we have focused the discussion on the role of mitophagy in the control of IFN production.

      Reviewer #1 (Significance):

      The significance of the described pathway for type I IFN production needs to be documented in the context of EBV infection.

      The revised version of the manuscript now explores the role of BHRF1 in the context of EBV infection. See above for details (major comment 1).

      Reviewer #2

      Reviewer #2 (Evidence, reproducibility and clarity)

      The work presented is a relatively straightforward cell biological dissection of a subset of the previously described functions of BHRF1, focusing on the mitochondrial aggregation phenotype. The approaches and analysis are performed in cell lines mainly using overexpression and some siRNA experiments and appear well done throughout.

      We thank reviewer #2 for this comment and would like to underline that the revised version of the manuscript now includes a study of BHRF1 in the context of infection in both B cells and epithelial cells, the generation of a stable EBV-positive B cell line knocked down for BHRF1 using an shRNA approach, and the generation of a stable autophagy-deficient cell line using CRISPR/Cas9 against ATG5.

      Reviewer #2 (Significance):

      The current study unpicks one of the phenotypes induced by BHRF1 overexpression: namely the previously reported mitochondrial aggregation phenotype. The findings that peri-nuclear mitochondrial aggregation is dependent on microtubules and retrograde motors are useful but could perhaps have been predicted. Overexpression of many proteins (or indeed chemical treatments) causing cellular and / or mitochondrial stress have been shown to cause mitochondrial perinuclear aggregation.

      To explore the specificity of BHRF1 activity on mito-aggresome formation, we investigated the impact on MTs of AMBRA1-ActA, a previously characterized mitophagy inducer (Strappazzon et al., 2015). We observed that expression of AMBRA1-ActA leads to mito-aggresome formation but, unlike BHRF1, does not modulate MT acetylation. This result was added to the revised version of the manuscript (Figure 4 - S1A and S1B). Moreover, chemical treatments with either oligomycin/antimycin or CCCP, which induce mitochondrial stress and mitophagy (Lazarou et al., 2015; Narendra et al., 2008), do not cause juxtanuclear mitochondrial aggregation (Figure 4 - S1C). We also observed that a hyperosmotic shock induced by NaCl leads to MT hyperacetylation (Figure 4 - S1D) but not to mito-aggresome formation (data not shown), suggesting that MT hyperacetylation per se is not sufficient to induce mitochondrial clustering. Altogether, these new results demonstrate the originality of the mechanism used by BHRF1 to induce mito-aggresome formation.

      The findings linking the process to altered tubulin acetylation are more novel and interesting and may add a new dimension to understanding of BHRF1 function. However what is lacking here is really advancing our understanding of how BHRF1 does this.

      We thank the reviewer for underlining the fact that regulation of mitochondrial morphodynamics by BHRF1 via MT hyperacetylation is novel and interesting.

      In the original version of the manuscript, we demonstrated that autophagy and ATAT1 are required for BHRF1-induced hyperacetylation. In the revised version, we show that BHRF1 interacts and colocalizes with ATAT1 (Figures 5E, 5F and Figure 5 – S2). Moreover, we demonstrated that MT hyperacetylation is involved in the localization of autophagosomes next to the nucleus, and thus close to the mito-aggresome. We have therefore better characterized the mechanism of action of BHRF1 in the revised manuscript.

      Although some downstream processes are identified in the current and previous study it still remains unclear what the exact underlying mechanisms are. Is BHRF1 doing this by disrupting mitochondrial function and making the organelles sick or by causing cellular stress indirectly leading to mitochondrial pathology? Previous studies have shown that cellular stress such as altered proteostasis can also cause stress-induced mitochondrial retrograde trafficking and aggregation. Is BHRF1 causing the same phenotype by generally stressing the cell and if it is more specifically through mitochondrial disruption what is the mechanism? As demonstrated by the authors in their previous work, BHRF1 does a number of things to cell signalling. Which of these are leading to a general disruption of cell signalling versus having specific effects on the cell or mitochondria still seems somewhat unclear.

      We previously reported that BHRF1 expression does not alter the mitochondrial membrane potential (Vilmen et al., 2020), contrary to treatment with oligomycin/antimycin (O/A) or CCCP. Moreover, we observed that these treatments do not induce mitochondrial clustering (Figure 4 – S1). Therefore, BHRF1 modulates mitochondrial dynamics in a specific and regulated manner.

      Our study clearly demonstrates that BHRF1 uses an original strategy to modulate the IFN response, through a regulated pathway of successive steps, from mitochondrial fission to mitophagy via MT hyperacetylation, rather than through “a general disruption of cell signalling”.

      It would be interesting to know whether the role of microtubule hyperacetylation and ATAT1 are more generally involved in other previously described processes of stress induced mitochondrial aggregation.

      In the revised version of the manuscript, we observed that AMBRA1-ActA does not change the level of MT acetylation, whereas it induces mito-aggresome formation. These data reinforce the originality of the BHRF1 mechanism.

      Currently while this is a nicely performed follow up study to their 2020 paper, the present study neither provides in depth mechanistic advance of BHRF1 function, nor a better understanding of the molecular steps in a more generally relevant pathway (e.g. mitophagy).

      We disagree with the reviewer’s comment. Indeed, in this new study, we uncovered and characterized a new mechanism of action for BHRF1 via ATAT1-dependent MT hyperacetylation. More generally, we reported for the first time that innate immunity can be regulated by the level of MT acetylation.

      In addition, all the experiments were performed in cell lines and rely on the overexpression of a viral protein. But this is a significant over-simplification of the viral pathological process. It therefore remains unclear how pathophysiologically relevant the findings are (e.g. to EBV pathology) without further extending this element of the work.

      To address this comment, we extended our results to the infectious context by adding several experiments performed in EBV-infected cell lines (see the response to reviewer #1 above for details). The phenotype observed after reactivation of the EBV productive cycle was the same as that observed upon ectopic BHRF1 expression. Moreover, we demonstrated that this phenotype is BHRF1-dependent. This suggests that BHRF1 contributes to EBV pathogenesis by participating in the control of innate immunity.

      An additional minor issue is the authors' naming of the process as Mito-aggresome formation. Although this might sound catchy it is somewhat unclear what the biological basis for this is. Aggresomes are defined structures that occur in cells during pathology and due to the peri-nuclear accumulation of misfolded protein. Since the process here is simply the description of aggregated mitochondria next to the nucleus but doesn't seem to have anything to do with protein misfolding it's really unclear how this labelling is helpful to the field. The process of perinuclear mitochondrial aggregation e.g. during mitochondrial stress or damage has been described many times before without the need for calling it a mito-aggresome. This term is likely to cause unhelpful confusion.

      We understand the comment of reviewer #2, but the term “mito-aggresome” has been used in other studies since 2010 to refer to a clustering of mitochondria next to the nucleus, similar to what we observed with BHRF1 (D’Acunzo et al., 2019; Lee et al., 2010; Springer and Kahle, 2011; Strappazzon et al., 2015; Van Humbeeck et al., 2011; Yang and Yang, 2011).

      However, we took into consideration the risk of confusion for readers by changing how we introduce the term “mito-aggresome” in the revised version of the manuscript (page 5, line 94).

      References

      Bentz GL, Liu R, Hahn AM, Shackelford J, Pagano JS. 2010. Epstein–Barr virus BRLF1 inhibits transcription of IRF3 and IRF7 and suppresses induction of interferon-β. Virology 402:121–128. doi:10.1016/j.virol.2010.03.014

      D’Acunzo P, Strappazzon F, Caruana I, Meneghetti G, Di Rita A, Simula L, Weber G, Del Bufalo F, Dalla Valle L, Campello S, Locatelli F, Cecconi F. 2019. Reversible induction of mitophagy by an optogenetic bimodular system. Nat Commun 10:1533. doi:10.1038/s41467-019-09487-1

      Hahn AM, Huye LE, Ning S, Webster-Cyriaque J, Pagano JS. 2005. Interferon regulatory factor 7 is negatively regulated by the Epstein-Barr virus immediate-early gene, BZLF-1. J Virol 79:10040–10052. doi:10.1128/JVI.79.15.10040-10052.2005

      Lazarou M, Sliter DA, Kane LA, Sarraf SA, Wang C, Burman JL, Sideris DP, Fogel AI, Youle RJ. 2015. The ubiquitin kinase PINK1 recruits autophagy receptors to induce mitophagy. Nature 524:309–314. doi:10.1038/nature14893

      Lee J-Y, Nagano Y, Taylor JP, Lim KL, Yao T-P. 2010. Disease-causing mutations in Parkin impair mitochondrial ubiquitination, aggregation, and HDAC6-dependent mitophagy. J Cell Biol 189:671–679. doi:10.1083/jcb.201001039

      Narendra DP, Tanaka A, Suen D-F, Youle RJ. 2008. Parkin is recruited selectively to impaired mitochondria and promotes their autophagy. J Cell Biol 183:795–803. doi:10.1083/jcb.200809125

      Nowosad A, Creff J, Jeannot P, Culerrier R, Codogno P, Manenti S, Nguyen L, Besson A. 2021. p27 controls autophagic vesicle trafficking in glucose-deprived cells via the regulation of ATAT1-mediated microtubule acetylation. Cell Death Dis 12:1–18. doi:10.1038/s41419-021-03759-9

      Springer W, Kahle PJ. 2011. Regulation of PINK1-Parkin-mediated mitophagy. Autophagy 7:266–278. doi:10.4161/auto.7.3.14348

      Strappazzon F, Nazio F, Corrado M, Cianfanelli V, Romagnoli A, Fimia GM, Campello S, Nardacci R, Piacentini M, Campanella M, Cecconi F. 2015. AMBRA1 is able to induce mitophagy via LC3 binding, regardless of PARKIN and p62/SQSTM1. Cell Death Differ 22:419–432. doi:10.1038/cdd.2014.139

      Van Humbeeck C, Cornelissen T, Hofkens H, Mandemakers W, Gevaert K, De Strooper B, Vandenberghe W. 2011. Parkin Interacts with Ambra1 to Induce Mitophagy. J Neurosci 31:10249–10261. doi:10.1523/JNEUROSCI.1917-11.2011

      Vilmen G, Glon D, Siracusano G, Lussignol M, Shao Z, Hernandez E, Perdiz D, Quignon F, Mouna L, Poüs C, Gruffat H, Maréchal V, Esclatine A. 2020. BHRF1, a BCL2 viral homolog, disturbs mitochondrial dynamics and stimulates mitophagy to dampen type I IFN induction. Autophagy 17:1296–1315. doi:10.1080/15548627.2020.1758416

      Wang J-T, Doong S-L, Teng S-C, Lee C-P, Tsai C-H, Chen M-R. 2009. Epstein-Barr Virus BGLF4 Kinase Suppresses the Interferon Regulatory Factor 3 Signaling Pathway. J Virol 83:1856–1869. doi:10.1128/JVI.01099-08

      Yang J-Y, Yang WY. 2011. Spatiotemporally controlled initiation of Parkin-mediated mitophagy within single cells. Autophagy 7:1230–1238. doi:10.4161/auto.7.10.16626

      Zhu L-H, Gao S, Jin R, Zhuang L-L, Jiang L, Qiu L-Z, Xu H-G, Zhou G-P. 2014. Repression of interferon regulatory factor 3 by the Epstein-Barr virus immediate-early protein Rta is mediated through E2F1 in HeLa cells. Mol Med Rep 9:1453–1459. doi:10.3892/mmr.2014.1957

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this study the authors continue on from previous recent work demonstrating that the Epstein Barr virus encoded protein BHRF1 causes a number of cellular effects including an impact on autophagy and disruption of mitochondrial dynamics including Drp1-dependent mitochondrial fragmentation and mitochondrial peri-nuclear aggregation followed by enhanced Parkin-dependent mitochondrial turnover (mitophagy). In the current study the authors further extend this work by showing that mitochondrial aggregation (as one might predict) is dependent on the microtubule network and coupling to retrograde motors. They also demonstrate that mitochondrial aggregation is dependent on ATAT1-dependent tubulin hyperacetylation.

      Overall the work presented is a relatively straightforward cell biological dissection of a subset of the previously described functions of BHRF1, focusing on the mitochondrial aggregation phenotype. The approaches and analysis are performed in cell lines mainly using overexpression and some siRNA experiments and appear well done throughout.

      Significance

      The current study unpicks one of the phenotypes induced by BHRF1 overexpression: namely the previously reported mitochondrial aggregation phenotype. The findings that peri-nuclear mitochondrial aggregation is dependent on microtubules and retrograde motors are useful but could perhaps have been predicted. Overexpression of many proteins (or indeed chemical treatments) causing cellular and / or mitochondrial stress have been shown to cause mitochondrial perinuclear aggregation. This process has often been previously reported to be dependent on retrograde (dynein-dependent) mitochondrial trafficking so finding the process is also required for BHRF1-dependent aggregation is a helpful addition but not in itself particularly impactful. The findings linking the process to altered tubulin acetylation are more novel and interesting and may add a new dimension to understanding of BHRF1 function. However what is lacking here is really advancing our understanding of how BHRF1 does this. Although some downstream processes are identified in the current and previous study it still remains unclear what the exact underlying mechanisms are. Is BHRF1 doing this by disrupting mitochondrial function and making the organelles sick or by causing cellular stress indirectly leading to mitochondrial pathology? Previous studies have shown that cellular stress such as altered proteostasis can also cause stress-induced mitochondrial retrograde trafficking and aggregation. Is BHRF1 causing the same phenotype by generally stressing the cell and if it is more specifically through mitochondrial disruption what is the mechanism? As demonstrated by the authors in their previous work, BHRF1 does a number of things to cell signalling. Which of these are leading to a general disruption of cell signalling versus having specific effects on the cell or mitochondria still seems somewhat unclear.

      It would be interesting to know whether the role of microtubule hyperacetylation and ATAT1 are more generally involved in other previously described processes of stress induced mitochondrial aggregation. Currently while this is a nicely performed follow up study to their 2020 paper, the present study neither provides in depth mechanistic advance of BHRF1 function, nor a better understanding of the molecular steps in a more generally relevant pathway (e.g. mitophagy).

      In addition, all the experiments were performed in cell lines and rely on the overexpression of a viral protein. But this is a significant over-simplification of the viral pathological process. It therefore remains unclear how pathophysiologically relevant the findings are (e.g. to EBV pathology) without further extending this element of the work.

      An additional minor issue is the authors' naming of the process as Mito-aggresome formation. Although this might sound catchy it is somewhat unclear what the biological basis for this is. Aggresomes are defined structures that occur in cells during pathology and due to the peri-nuclear accumulation of misfolded protein. Since the process here is simply the description of aggregated mitochondria next to the nucleus but doesn't seem to have anything to do with protein misfolding it's really unclear how this labelling is helpful to the field. The process of perinuclear mitochondrial aggregation e.g. during mitochondrial stress or damage has been described many times before without the need for calling it a mito-aggresome. This term is likely to cause unhelpful confusion.

      Referee Cross-commenting

      Reviewer 1 makes several good points.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Manuscript Nr.: RC-2021-00890 Glon et al., "Essential role of hyperacetylated microtubules in innate immunity escape orchestrated by the EBV-encoded BHRF1 protein"

      The authors demonstrate that overexpression of the early lytic Epstein Barr virus protein BHRF1 causes mitochondrial fission and aggregation of smaller mitochondria in the perinuclear area. This aggregation is dependent on microtubules that are hyperacetylated upon BHRF1 expression, and on dynein motors. Hyperacetylation is dependent on autophagy, but not required for BHRF1-induced autophagy. Expression of acetylation-insensitive tubulin abolishes mitochondrial aggregation, but not fission, upon BHRF1 expression. This mitochondrial aggregation is required for BHRF1-dependent inhibition of type I interferon (IFN) production and of IRF3 translocation into the nucleus. From these data the authors conclude that BHRF1 might compromise type I IFN production by microtubule acetylation-dependent mitochondrial aggregation in the perinuclear area.

      The presented study describes an interesting mechanism, but it remains unclear whether it occurs and what role it plays during EBV infection.

      Major comments:

      1. In the presented manuscript the authors characterize mainly BHRF1 overexpression in HeLa cells. Does BHRF1 also block type I IFN responses by microtubule hyperacetylation in the context of EBV infection? Do alpha-tubulin K40A overexpressing B cells produce more type I IFN after EBV infection?
      2. The authors document that the observed microtubule hyperacetylation is due to the acetyltransferase ATAT1. How does BHRF1 activate ATAT1? Is there any direct interaction?
      3. Furthermore, the authors demonstrate with pharmacological autophagy inhibitors that autophagy is increased in a BHRF1-dependent and microtubule acetylation-independent manner but is required for microtubule hyperacetylation. How does autophagy stimulate ATAT1-dependent microtubule hyperacetylation? Is this dependency also observed with a more specific ATG silencing or knock-out?

      Minor comments:

      1. "Innate immunity" and "innate immune system", but not "innate immunity system" are in my opinion better wordings.
      2. The reader would benefit from a discussion on the role of type I IFNs during EBV infection and how important the authors think their new mechanism could be in this context.

      Significance

      The significance of the described pathway for type I IFN production needs to be documented in the context of EBV infection.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewer’s comments

      We thank the three reviewers for their positive comments and constructive feedback. We have addressed the issues raised through additional experiments and text changes which have helped to improve the manuscript. Below, we address the specific points with detailed responses (reviewer comments are provided in italic).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Rodriguez-Lopez et al describes the analysis of long intergenic non-coding RNA (lincRNA) function in fission yeast using both deletion and overexpression methods. The manuscript is very well presented and provides a wealth of lincRNA functional information for the field. This work is an important advance as there is still very little known about the function of lincRNAs in both normal and other conditions. An impressive array of conditions were assessed here. With a large-scale analysis like this there is really not one specific conclusion. The authors conclude that lincRNAs exert their function in specific environmental or physiological conditions. This conclusion is not a novel conclusion, it has been proposed and shown before, but this manuscript provides the experimental proof of this concept on a large scale.

      The lincRNA knock-out library was assessed using a colony size screen, a colony viability screen and cell size and cell cycle analysis. Additionally, a lincRNA over-expression library was assessed by a colony size screen. These different functional analysis methods for lincRNAs were then carried out in a wide variety of conditions to provide a very large dataset for analysis. Overall, the presentation and analysis of the data was easy to follow and informative. Some points below could be addressed to improve the manuscript.

      There were 238 protein coding gene mutants assessed in parallel, to provide functional context, which was a very promising idea. But, unfortunately, the inclusion of 104 protein coding genes of unknown function restricted the use of the protein coding genes in the integrated analysis to connect lincRNAs to a known function using guilt by association.

      Reply: Yes, the unknown coding-gene mutants certainly did not help to provide functional context through guilt by association. These mutants were included to generate functional clues for the unknown proteins and to compare phenotype hits with unknown lincRNA mutants. Nevertheless, because the known coding-gene mutants included broadly cover all high-level biological processes (GO slim), we could make several useful functional inferences for certain lincRNAs, as discussed.

      The colony viability screen is not described well throughout the manuscript. Firstly, the use of phloxine B dye to determine cell viability needs to be described better when first introduced at the bottom of page 6. What exactly do this viability screen and the red colour intensity indicate? Please define what the different levels of red in a colony would indicate in terms of viability. I assume an increase in red colour indicates more dead cells? So it is confusing that later the output of this assay is described as giving a resistant/sensitive phenotype or higher/lower viability. How can you get a higher viability from an assay that should only detect lower viability? Shouldn't this assay range from viable (no, or low red, colour) to increasing amounts of red indicating increasingly less viability? Figure 4D is also confusing with the "red" and "white" annotations. These should be changed to "lower viability" and "viable" or "not viable" and "viable".

      Reply: The colony-viability screen is described in detail in our recent paper (Kamrad et al, eLife 2020). We have now better explained how phloxine B works to determine cell viability (p. 6). The reviewer’s assumption is correct: an increase in red colour indicates more dead cells. However, all phenotypes reported are relative to wild-type cells under the same condition. Many conditions lead to a general increase in cell death, but some mutants show a lower increase in cell death compared to wild-type cells. These mutants, therefore, have a higher viability than wild-type cells, i.e. they are more resistant than wild-type under the given condition. We have tried to clarify this in the text, including the legend of Fig. 4. We agree that the ‘red’ and ‘white’ annotations in Fig. 4D could be confusing. We have now changed these to ‘low viability’ and ‘high viability’. Again, this is relative to wild-type cells.

      How are you sure that when generating the 113 lincRNA ectopic over-expression constructs by PCR that the sequences you cloned are correct? Simply checking for "correct insert size", as stated in the methods, is not really good practice and these constructs should be fully sequenced to be sure they contain the correct sequence and that constructs have not had mutations introduced by the PCR used for cloning. Without such sequence confirmation one cannot be completely confident that the data produced is specific for a lincRNA over-expression. Additionally, a selection of strains with the overexpression constructs should be tested by qRT-PCR and compared to a non-over-expressing strain to confirm lincRNA overexpression.

      Reply: To minimize errors during PCR amplification, we used the high-fidelity Phusion DNA polymerase, which features a >50-fold lower error rate than Taq DNA polymerase. We had confirmed the insert sequences for the first 17 lincRNAs cloned using Sanger sequencing (but did not report this in the manuscript). We have now checked additional inserts of the overexpression plasmids by Sanger sequencing in 96-well plate format using a universal forward primer upstream of the cloning site. This high-throughput sequencing produced reliable sequence data for 80 inserts, including full insert sequences for 62 plasmids and the first ~900 bp of insert sequences for 18 plasmids. Of these, only the insert for SPNCRNA.601 showed a sequence error compared to the reference genome: a T-to-C transition at position 559. This mutation could reflect either an error that occurred during cloning or a natural sequence variant among yeast strains (lincRNA sequences are much more variable than coding sequences). So, in general, the PCR cloning accurately preserved the sequence information. We have added this information in the Methods (p. 27-28). Please note that lincRNAs depend much less on primary nucleotide sequence than mRNAs, and a few nucleotide changes are highly unlikely to interfere with lincRNA function.

      Minor comments:

      Page 4, lines 19-20 - "A substantial portion of lincRNAs are actively translated (Duncan and Mata, 2014), raising the possibility that some of them act as small proteins." This sentence does not make sense, lincRNAs can't "act as" small proteins, they can only "code for" small proteins. Wording needs to be changed here.

      Reply: We agree and have changed the wording as suggested.

      Figure 1A is a nice representation but what are the grey dots? Are they all ncRNAs including lincRNAs? This needs to be stated in the legend.

      Reply: The grey dots represent all non-coding RNAs across the three S. pombe chromosomes as described by Atkinson et al., 2018. This has now been clarified in the legend.

      How many lincRNAs are there in total in pombe and what percentage did you delete? These numbers should be stated in the text.

      Reply: There are 1189 lincRNAs and we mutated ~12.6% of them. These numbers are now stated at the end of the Introduction, page 5.

      It would be nice if Supplementary Figure 1 included concentrations or amounts of the conditions used. This info is buried in a Supplementary table and would be better placed here.

      Reply: Supplemental Fig. 1 provides a simple overview of the different conditions and drugs used. For most stresses and drugs, we used multiple different doses, so the figure would become cluttered if we indicated all these concentrations, detracting from the main message. Colleagues who are interested in the concentration ranges used for specific conditions can readily obtain this information from Supplemental Dataset 1. We have now added a statement in this respect to the legend of Supplemental Fig. 1.

      Page 6, last sentence. What is a "biological repeat"? Three distinct deletion strains (ie three different deletion strains made by CRISPR) or one deletion strain used three times?

      Reply: Biological repeat means that one deletion strain was assayed three times independently, each with at least two colonies (technical repeats). In most cases, we had two or more independently generated deletion strains for each lincRNA (using the same or different gRNAs), and we performed at least three biological repeats for each strain. The numbers of independent strains for each lincRNA are provided in Supplemental Dataset 1 (sheet: lincRNA_metadata, column: n_independent_ko_mutants). The total numbers of repeats carried out for each condition after QC filtering are available in Supplemental Dataset 2 (columns: observation_count). We have clarified this on p. 7, and the details are now provided in the Methods on p. 28-29 (deletion mutants) and p. 32 (overexpression mutants).

      There is no mention in the manuscript of how other researchers can get access to the deletion strains and over-expression plasmids.

      Reply: As is usual, all strains and plasmids will be readily available upon request.

      Reviewer #1 (Significance (Required)):

      The production of lincRNA deletion strains and overexpression plasmids, and their analysis under an impressive number of conditions, provides key resources and data for the ncRNA field. This work complements nicely the analysis of protein coding gene deletion strains and provides the tools and data for future mechanistic studies of individual lincRNAs. This work would be of interest to the growing audience of ncRNA researchers in both yeast and other systems.

      Field of expertise:

      Yeast deletion strain construction and analysis, RNA functional analysis

      **Referee Cross-commenting**

      Reviewer #3 makes an important point that the stability of each lincRNA overexpressed from a plasmid is not known, and therefore some lincRNAs may not be overexpressed as predicted. RT-qPCR would be required to assess lincRNA expression levels from the plasmids. It also appears that we both agree that it is important to determine the sequence of the cloned lincRNAs in the overexpression plasmids.

      Reply: See reply in response to Reviewer 3.

      Reviewer #3 also makes an important point in his review that where it is predicted that a lincRNA deletion influences an adjacent gene in cis then the expression of that gene should be tested.

      Reply: See reply in response to Reviewer 3.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The Rodriguez-Lopez manuscript from the Bahler lab presents the phenotypic and functional profiling of lincRNAs in fission yeast. This is the first large-scale, extensive work of this nature in this model organism, and it therefore nicely complements the well-documented examples of lincRNAs already reported in S. pombe.

      The work is very solid, using seamless genome deletion and overexpression followed by colony-based assays in response to a very wide set of conditions.

      **Major comments:**

      - Considering that this is a descriptive work by nature and that the experiments were properly conducted as far as I can judge, I don't have major issues with this paper.

      To me, the only thing that is missing is a gametogenesis assay, for two reasons: first, several reported cases of lincRNAs in pombe critically regulate meiosis, and second, many of the analysed lincRNAs are upregulated during meiosis. Figure 6B already points to three obvious candidates. I don't think it would take too much time to look at the deletion and OE in an h90 strain and see the effect on gametogenesis for the entire set, or at least for the 3 candidates from Figure 6.

      If the already broad set of lincRNAs implicated in meiosis were to grow, this would be further evidence that eukaryotic cell differentiation relies on non-coding RNAs even in simpler models.

      Reply: We agree that this is a meaningful analysis to add. We have now deleted the three unstudied lincRNA genes, along with the meiRNA gene, from the sub-cluster of Figure 6B in the homothallic h90 background (to allow self-mating). We have analysed meiosis and spore viability of these four deletion strains together with a wild-type h90 control strain. These experiments indicate that cell mating is normal in the deletion mutants, but meiotic progression is somewhat delayed in SPNCRNA.1154, SPNCRNA.1530 and, most strongly, meiRNA mutants (the latter has been reported before; reviewed by Yamashita 2019). Notably, we detected significant reductions in spore viability for all four deletion mutants compared to the control strain. These results point to roles of SPNCRNA.1154, SPNCRNA.1530, and SPNCRNA.335 in meiotic differentiation, as predicted by the clustering analyses. This is a nice addition to the manuscript. We now report these results on p. 23, with a new Supplemental Figure 10, and describe the experimental procedures in the Methods (p. 34-35).

      **Minor comments:**

      - A reference to the recent work of the Rougemaille lab on mamRNA is necessary

      Reply: Yes, we now cite this reference in the Introduction (p. 4).

      - A discussion of the possibility of performing large-scale genetic interaction screens (as done by Krogan for protein-coding genes) would add to the discussion of future plans.

      Reply: We have added a sentence about the potential of SGA screens in the Conclusions (p. 26).

      Reviewer #2 (Significance (Required)):

      The work unambiguously shows that most of the lincRNAs analyzed exert cellular functions in specific environmental or physiological contexts. This conclusion is critical because the biological relevance of this so-called « dark matter » is still debated despite a few well-established cases. This is an important addition to the field, and the deep phenotyping work already points to some directions for analysing some of these lincRNAs in the context of cell cycle progression, metabolism or meiosis.

      **Referee Cross-commenting**

      - I agree with the issues raised by referees 1 and 3, but I am concerned about the added value of RT-qPCR. First, this is a significant amount of work considering the large set of targets. Second, and more importantly, what you'll end up with is a fold change. What will be considered as overexpression? Which threshold? This is why I prefer a biological read-out (a phenotype): whatever the fold change, it tells us that there is an effect. It is very likely indeed that some targets are not overexpressed because of their rapid degradation. To me, this is the drawback of any large-scale study.

      - Also, looking at the expression of the adjacent gene in the case of a cis-effect is interesting, though this is likely condition-dependent (because most phenotypes appear in specific conditions). So, what would be the conclusion if there is no effect in classical rich media?

      - The sequence of the insert should be specified, I agree. Most likely, it is the sequence available from PomBase (this is what I understood), but that should be clarified.

      Reply: Yes, the sequences of the inserts are available from PomBase, and we provide the primer sequences used for cloning in the Supplemental Dataset 1. We have now clarified this in the Methods (p. 27).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work from the group of Jurg Bahler, the authors take advantage of the high throughput colony-based screen approach they recently developed (Kamrad et al, eLife 2020) to perform a functional profiling analysis on a subset of 150 lincRNAs in fission yeast. Using a seamless CRISPR/Cas9-based method, they created deletion mutants for 141 lincRNAs. In addition, the authors also generated strains ectopically overexpressing 113 lincRNAs from a plasmid (under the control of the strong and inducible nmt1 promoter).

      The viability and growth of all these mutants were then assessed across benign, nutrient, drug and stress conditions (149 conditions for the deletion mutants, 47 conditions for the overexpression). For the deletion mutants, the authors also assayed in parallel mutants of 238 protein-coding genes (PCGs) covering multiple biological processes and main GO classes.

      In benign conditions, deletion of 5 and 10 lincRNAs resulted in a reduced growth phenotype (rich and minimal medium, respectively). Morphological characterization by microscopy also revealed cell size defects for 6 lincRNA mutants (2 shorter, 4 longer). In addition, 27 mutants displayed phenotypes pointing to defects in the cell cycle.

      Remarkably, the nutrient/drug/stress conditions revealed more phenotypes, with 60 of the 141 lincRNA mutants showing a growth phenotype in at least one condition, and 25 mutants showing a different viability compared to the wild-type in at least one condition.

      Also remarkable is the observation that 102/113 lincRNA overexpression strains displayed a growth phenotype in at least one condition, with 14 lincRNAs showing phenotypes in more than 10 conditions.

      The clustering analyses performed by the authors also provide functional insight for some lincRNAs.

      Overall, this is an important study, well conducted and well presented. Together, the data described by the authors are convincing and highlight that most lincRNAs would function in very particular conditions, and that deletion/inactivation and overexpression are complementary approaches for the functional characterization of lncRNAs. This has been demonstrated here, in a very elegant manner.

      I think this manuscript will be acknowledged as a pioneer work in the field.

      **A. Major comments**

      - A.1. Are the key conclusions convincing?

      In my opinion, the key conclusions of this study are convincing.

      - A.2. Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      No. The authors are careful in their claims and conclusions.

      - A.3. Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      This study is based on systematic lincRNA deletion/overexpression.

      - For the deletion strains, I could not find any information about the control of the deletions. Are the authors sure that the targeted lincRNAs were indeed properly deleted?

      Reply: Yes, we had carefully checked the correctness of the deletions using several controls as described by Rodriguez-Lopez et al. 2017. All deletion strains were checked for missing open-reading frames by PCR. For 20 strains, we also sequenced across the deletion scars. We re-checked all strains by PCR after arraying them onto the 384 plates to ensure that no errors occurred during the process. We have now specified this in the Methods (p. 27).

      - For the overexpression, there is only a control of the insert size by PCR. Sanger sequencing would have been preferable to confirm that the targeted lincRNAs were properly cloned, without any mutation. In addition, the authors did not check that the lincRNAs were indeed overexpressed (at least in the benign conditions). Is the overexpression fold similar for all the lincRNAs? Do the 14 lincRNAs showing the most consistent phenotypes in at least 10 conditions display different expression levels than the other lincRNAs?

      - A.4. Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      - Validating the deletion strains requires genomic DNA extraction and then PCR. This is repetitive and tedious, but this control is important, I think. The time needed depends on the possibility of automating the process. I think this is feasible in this lab.

      - Controlling the insert sequence into the overexpression vector requires plasmid DNA (available as it was used for PCR) and one/several primer(s), depending on the insert size. The sequencing itself is usually done by platforms.

      - Analysing lincRNA overexpression at the RNA level requires yeast cultures, RNA extraction and then RT-qPCR. Again, the time needed depends on the possibility of automating the process.

      Reply: We have now checked most overexpression constructs by Sanger sequencing of the inserts, as described in response to Reviewer 1. Moreover, we have tested the overexpression levels for eight selected overexpression constructs using RT-qPCR analysis. These eight constructs cover the entire range of associated phenotype hits, including 3 lincRNAs with the highest number of phenotypes in 14 conditions, 3 with no phenotypes, and 2 with intermediate numbers of phenotypes. The RT-qPCR results show that the lincRNAs were 35- to 2200-fold overexpressed relative to the empty-vector control strain (which expresses the lincRNA at native levels). No clear pattern was evident between expression levels and phenotype hits; e.g. lincRNAs without phenotypes when overexpressed showed fold changes similar to that of a lincRNA showing 13 phenotypes. We present these results on p. 21/22 and in the new Supplemental Figure 9A, and describe the experiment in the Methods (p. 28).

      As pointed out by Reviewer 2, these fold changes in expression are actually of limited value compared to the phenotype read-outs. The important result is that we detected phenotypes for over 90% of the overexpression strains, indicating that overexpression generally worked. Given that this is a large-scale study, there might be some lincRNA constructs that are faulty or are not overexpressed. It would not be realistic or meaningful to test all constructs. Any follow-on studies focusing on a specific lincRNA will need to first validate the large-scale results, as is common practice.
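      The reply does not say how the RT-qPCR fold changes were calculated. One widely used approach is the comparative 2^(−ΔΔCt) method, sketched below. The sketch assumes a single reference gene and roughly 100% amplification efficiency. The function name and the Ct values are hypothetical illustrations, not the authors' protocol.

```python
def fold_change_ddct(ct_target_oe, ct_ref_oe, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the comparative 2^-ddCt method.

    Assumes ~100% PCR efficiency for both target and reference amplicons.
    'oe' = overexpression strain, 'ctrl' = empty-vector control strain.
    """
    # Normalise the lincRNA Ct to the reference gene within each strain.
    delta_oe = ct_target_oe - ct_ref_oe
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare normalised values between strains; each cycle is a 2-fold step.
    delta_delta = delta_oe - delta_ctrl
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the lincRNA crosses threshold 8 cycles earlier
# in the overexpression strain, with an unchanged reference gene.
print(fold_change_ddct(20.0, 18.0, 28.0, 18.0))  # prints 256.0
```

      Under the same assumptions, the reported 35- to 2200-fold range corresponds to ΔΔCt values of roughly −5.1 to −11.1 cycles (log2 35 ≈ 5.1, log2 2200 ≈ 11.1).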

      - A.5. Are the data and the methods presented in such a way that they can be reproduced?

      The methods are clearly and extensively explained. If necessary, the reader can find more details about the high-throughput colony-based screen approach in the original paper (Kamrad et al, eLife 2020); very interesting technical discussions can also be found in the reviewers' reports and in the authors' response published alongside.

      - A.6. Are the experiments adequately replicated and statistical analysis adequate?

      The experiments are replicated. However, I feel confused regarding the number of replicates used in each analysis.

      In the first part of the Results, it is mentioned that all colony-based phenotyping was performed in at least 3 independent replicates, with a median number of 9 repeats per lincRNA. In the Methods section, I read that for the high-throughput microscopy and flow cytometry for cell-size and cell-cycle phenotypes, over 80% of the 110 lincRNA mutants screened for cellular phenotypes were assayed in at least 2 independent biological repeats. For the overexpression, I read that each strain was represented by at least 12 colonies across 3 different plates and experiments were repeated at least 3 times. Each condition was assayed in three independent biological repeats, together with control EMM2 plates, resulting in at least 36 data points per strain per condition.

      Perhaps I missed something. If not, could the authors clarify this? In addition, I suggest indicating the number of replicates used for each lincRNA/condition/assay in Supplemental Dataset 2 (I could only find the information for the Flow Cytometry) and in Supplemental Dataset 6.

      Reply: For all colony-based phenotyping, we performed at least three biological repeats, meaning that the strains were assayed three times independently, each with at least two colonies (technical repeats). In most cases, we had two or more independently generated deletion strains for each lincRNA, and we performed at least three biological repeats for each strain (hence the higher median number of nine repeats per lincRNA). The numbers of independent deletion strains for each lincRNA are provided in Supplemental Dataset 1 (sheet: lincRNA_metadata, column: n_independent_ko_mutants). The total numbers of repeats carried out for each condition after QC filtering are available in Supplemental Dataset 2 (columns: observation_count). We have now clarified this on p. 6, and the details are provided in the Methods on p. 28-29 (for deletion mutants) and p. 32 (for overexpression mutants). For the high-throughput microscopy and flow cytometry experiments, we performed the repeats as described in the text.

      **B. Minor comments**

      - B.1. Specific experimental issues that are easily addressable.

      - The pattern of the SPNCRNA.1343 and SPNCRNA.989 mutants is consistent with the idea that these lincRNAs act in cis and that their deletion interferes with the expression of the adjacent tgp1 and atd1 genes, respectively. The authors could easily test by RT-qPCR or Northern Blot that the lincRNA deletion leads to the induction of the adjacent gene. Also, if the hypothesis of the authors is correct, the ectopic expression of these two lincRNAs in trans should not complement the phenotypes of the corresponding mutants. These experiments would reinforce the conclusion of the authors about the specific regulatory effect of the SPNCRNA.1343 and SPNCRNA.989 lincRNAs.

      Reply: It would actually not be as easy as suggested to obtain conclusive results in this respect. For SPNCRNA.1343 and its neighbour, tgp1, the mechanisms involved have already been shown in detail in several mechanistic studies (Ard et al., 2014; Ard and Allshire, 2016; Garg et al., 2018; Shah et al., 2014; 2014; Yague-Sanz et al., 2020). However, these studies required multiple precise genetic constructs and specialized approaches to interrogate the complex regulatory relationships between the overlapping transcripts, which can be both positive and negative. As correctly pointed out by Reviewer 2, we do not know the particular conditions in which any cis-regulatory interactions take place, so a negative result would not be conclusive. We have interrogated our RNA-seq data obtained under multiple genetic and environmental conditions (Atkinson et al. 2018) to analyse the regulatory relationship between SPNCRNA.1343 and tgp1 (studied before) as well as between SPNCRNA.989 and atd1 (proposed in our manuscript). Depending on the specific conditions, both of these gene pairs show positive or negative correlations in expression levels. So it is not possible to simply perform the suggested experiment and reach a clear conclusion.

      - Is there any possibility that some nutrient/drug/stress conditions interfere with the expression from the nmt1 promoter?

      Reply: This seems unlikely as this widely used promoter is known to be specifically regulated by thiamine. Consistent with this, we actually detected phenotypes for over 90% of the overexpression strains. But we cannot exclude the possibility that some conditions might interfere with nmt1 function.

      - Supplemental Figure 7 refers to unpublished data from Maria Rodriguez-Lopez. Is this still allowed?

      Reply: These are just control RNA-seq data from wild-type cells growing in rich medium. It does not seem that meaningful, but if required we could submit these data to the European Nucleotide Archive (ENA).

      - Supplemental Figure 8 shows drop assays to validate the growth phenotypes revealed by the screen for lincRNAs of clusters 1 and 3. As admitted by the authors in the text, in most cases the effects are quite difficult to see with the naked eye. Did the authors consider the possibility of using growth curves (for the lincRNAs/conditions they would like to highlight), which might be more appropriate to visualize weak effects?

      Reply: We have tried a few experiments in liquid medium using our BioLector microfermentor. However, the doses need to be substantially changed for liquid media (in which cells typically are more sensitive than on solid media). So the situation with the altered conditions would become too confusing and could not be used as a direct validation of our results from solid media.

      - B.2. Are prior studies referenced appropriately?

      Yes. The authors could have cited the work of Huber et al (2016) Cell Rep. (PMID: 27292640) as another pioneer study where systematic lncRNA deletion was performed, even if in this case, these were antisense lncRNAs.

      Reply: Agreed, we now cite this paper in the Introduction (p. 4).

      - B.3. Are the text and figures clear and accurate?

      Overall, I found the text and figures clear.

      Reviewer #3 (Significance (Required)):

      Eukaryotic genomes produce thousands of long non-coding RNAs, including lincRNAs, which are expressed from intergenic regions and do not overlap PCGs. Several lincRNAs have been extensively studied and characterized, showing that they function in different cellular processes, such as regulation of gene expression, chromatin modification, etc. However, besides these well-documented lincRNAs, the function of most lincRNAs remains elusive. In addition, under the standard growth conditions used in labs, many of them are expressed at very low levels, and in the few cases in which it has been tested, deletion and/or overexpression in trans often failed to result in a detectable phenotype.

      High throughput approaches for lncRNA functional profiling are currently emerging. The lab of Jurg Bahler recently developed a high throughput colony-based screen approach enabling them to quantitatively assay the growth and viability of fission yeast mutants under multiple conditions (Kamrad et al, eLife 2020). Here, they take advantage of this approach to characterize mutants of 150 lincRNAs in fission yeast, including not only deletion mutants generated using the CRISPR/Cas9 technology, but also overexpression mutants, tested in 149 and 47 growth conditions, respectively. This systematic approach allowed the authors to reveal specific phenotypes for a large fraction of the lincRNAs, emphasizing the fact that they are likely to be functional in particular nutrient/drug/stress conditions, acting in cis but also in trans.

      As I wrote in the summary above, I think that this study is important and constitutes a significant contribution in the lncRNA field.

      My field of expertise: long non-coding RNAs, yeast, genetics.

      **Referee Cross-commenting**

      I can see that reviewer #1 and I have raised the same concerns about the lack of insert sequencing for the overexpression plasmids, which is crucial to confirm that the correct lincRNAs were cloned and that no mutation has been introduced by the PCR. We are also both asking for RT-qPCR controls to show that the lincRNAs are indeed overexpressed. Again, this control is very important, as many long non-coding RNAs are rapidly degraded by the nuclear and/or cytoplasmic RNA decay machineries. So expressing a lincRNA from a plasmid, under the control of a strong promoter, does not guarantee increased RNA levels.

      I see that reviewer #2 is asking for a gametogenesis assay. I think it should be limited to the 3 lincRNAs which belong to the same sub-cluster as meiRNA.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      In this work from the group of Jurg Bahler, the authors take advantage of the high throughput colony-based screen approach they recently developed (Kamrad et al, eLife 2020) to perform a functional profiling analysis on a subset of 150 lincRNAs in fission yeast. Using a seamless CRISPR/Cas9-based method, they created deletion mutants for 141 lincRNAs. In addition, the authors also generated strains ectopically overexpressing 113 lincRNAs from a plasmid (under the control of the strong and inducible nmt1 promoter).

      The viability and growth of all these mutants were then assessed across benign, nutrient, drug and stress conditions (149 conditions for the deletion mutants, 47 conditions for the overexpression). For the deletion mutants, the authors also assayed in parallel mutants of 238 protein-coding genes (PCGs) covering multiple biological processes and main GO classes. In benign conditions, deletion of 5 and 10 lincRNAs resulted in a reduced growth phenotype (rich and minimal medium, respectively). Morphological characterization by microscopy also revealed cell size defects for 6 lincRNA mutants (2 shorter, 4 longer). In addition, 27 mutants displayed phenotypes pointing to defects in the cell cycle.

      Remarkably, the nutrient/drug/stress conditions revealed more phenotypes, with 60 of the 141 lincRNA mutants showing a growth phenotype in at least one condition, and 25 mutants showing a different viability compared to the wild-type in at least one condition. Also remarkable is the observation that 102/113 lincRNA overexpression strains displayed a growth phenotype in at least one condition, with 14 lincRNAs showing phenotypes in more than 10 conditions.

      The clustering analyses performed by the authors also provide functional insight for some lincRNAs. Overall, this is an important study, well conducted and well presented. Together, the data described by the authors are convincing and highlight that most lincRNAs would function in very particular conditions, and that deletion/inactivation and overexpression are complementary approaches for the functional characterization of lncRNAs. This has been demonstrated here, in a very elegant manner. I think this manuscript will be acknowledged as a pioneer work in the field.

      A. Major comments

      • A.1. Are the key conclusions convincing? In my opinion, the key conclusions of this study are convincing.
      • A.2. Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? No. The authors are careful in their claims and conclusions.
      • A.3. Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      This study is based on systematic lincRNA deletion/overexpression.

      • For the deletion strains, I could not find any information about the control of the deletions. Are the authors sure that the targeted lincRNAs were indeed properly deleted?
      • For the overexpression, there is only a control of the insert size by PCR. Sanger sequencing would have been preferable to confirm that the targeted lincRNAs were properly cloned, without any mutation. In addition, the authors did not check that the lincRNAs were indeed overexpressed (at least in the benign conditions). Is the overexpression fold similar for all the lincRNAs? Do the 14 lincRNAs showing the most consistent phenotypes in at least 10 conditions display different expression levels than the other lincRNAs?
      • A.4. Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.
      • Validating the deletion strains requires genomic DNA extraction and then PCR. This is repetitive and tedious, but this control is important, I think. The time needed depends on the possibility of automating the process. I think this is feasible in this lab.
      • Controlling the insert sequence into the overexpression vector requires plasmid DNA (available as it was used for PCR) and one/several primer(s), depending on the insert size. The sequencing itself is usually done by platforms.
      • Analysing lincRNA overexpression at the RNA level requires yeast cultures, RNA extraction and then RT-qPCR. Again, the time needed depends on the possibility of automating the process.
      • A.5. Are the data and the methods presented in such a way that they can be reproduced? The methods are clearly and extensively explained. If necessary, the reader can find more details about the high-throughput colony-based screen approach in the original paper (Kamrad et al, eLife 2020); very interesting technical discussions can also be found in the reviewers' reports and in the authors' response published alongside.
      • A.6. Are the experiments adequately replicated and statistical analysis adequate? The experiments are replicated. However, I feel confused regarding the number of replicates used in each analysis.

      In the first part of the Results, it is mentioned that all colony-based phenotyping was performed in at least 3 independent replicates, with a median number of 9 repeats per lincRNA. In the Methods section, I read that for the high-throughput microscopy and flow cytometry for cell-size and cell-cycle phenotypes, over 80% of the 110 lincRNA mutants screened for cellular phenotypes were assayed in at least 2 independent biological repeats. For the overexpression, I read that each strain was represented by at least 12 colonies across 3 different plates and experiments were repeated at least 3 times. Each condition was assayed in three independent biological repeats, together with control EMM2 plates, resulting in at least 36 data points per strain per condition.

      Perhaps I missed something. If not, could the authors clarify this? In addition, I suggest indicating the number of replicates used for each lincRNA/condition/assay in Supplemental Dataset 2 (I could only find this information for the flow cytometry) and in Supplemental Dataset 6.

      B. Minor comments

      • B.1. Specific experimental issues that are easily addressable.
      • The pattern of the SPNCRNA.1343 and SPNCRNA.989 mutants is consistent with the idea that these lincRNAs act in cis and that their deletion interferes with the expression of the adjacent tgp1 and atd1 genes, respectively. The authors could easily test by RT-qPCR or Northern blot whether the lincRNA deletion leads to induction of the adjacent gene. Also, if the authors' hypothesis is correct, ectopic expression of these two lincRNAs in trans should not complement the phenotypes of the corresponding mutants. These experiments would reinforce the authors' conclusion about the specific regulatory effect of the SPNCRNA.1343 and SPNCRNA.989 lincRNAs.
      • Is there any possibility that some nutrient/drug/stress conditions interfere with the expression from the nmt1 promoter?
      • Supplemental Figure 7 refers to unpublished data from Maria Rodriguez-Lopez. Is this still allowed?
      • Supplemental Figure 8 shows drop assays to validate the growth phenotypes revealed by the screen for lincRNAs of clusters 1 and 3. As acknowledged by the authors in the text, in most cases the effects are quite difficult to see with the naked eye. Did the authors consider using growth curves (for the lincRNAs/conditions they would like to highlight), which might be more appropriate to visualize weak effects?
      • B.2. Are prior studies referenced appropriately? Yes. The authors could have cited the work of Huber et al (2016) Cell Rep. (PMID: 27292640) as another pioneering study in which lncRNAs were systematically deleted, although in that case these were antisense lncRNAs.
      • B.3. Are the text and figures clear and accurate? Overall, I found the text and figures clear.

      Significance

      Eukaryotic genomes produce thousands of long non-coding RNAs, including lincRNAs, which are expressed from intergenic regions and do not overlap PCGs. Several lincRNAs have been extensively studied and characterized, showing that they function in different cellular processes, such as regulation of gene expression, chromatin modification, etc. However, besides these well-documented lincRNAs, the function of most lincRNAs remains elusive. In addition, under the standard growth conditions used in labs, many of them are expressed at very low levels, and in the few cases where it has been tested, deletion and/or overexpression in trans often failed to result in a detectable phenotype.

      High-throughput approaches for lncRNA functional profiling are currently emerging. The lab of Jurg Bahler recently developed a high-throughput colony-based screen approach enabling them to quantitatively assay the growth and viability of fission yeast mutants under multiple conditions (Kamrad et al, eLife 2020). Here, they take advantage of this approach to characterize mutants of 150 lincRNAs in fission yeast, including not only deletion mutants generated using CRISPR/Cas9 technology but also overexpression mutants, tested in 149 and 47 growth conditions, respectively. This systematic approach allowed the authors to reveal specific phenotypes for a large fraction of the lincRNAs, emphasizing that they are likely to be functional in particular nutrient/drug/stress conditions, acting in cis but also in trans. As I wrote in the summary above, I think that this study is important and constitutes a significant contribution to the lncRNA field.

      My field of expertise: long non-coding RNAs, yeast, genetics.

      Referee Cross-commenting

      I can see that reviewer #1 and I have raised the same concerns about the lack of insert sequencing for the overexpression plasmids, which is crucial to confirm that the correct lincRNAs were cloned and that no mutations were introduced by the PCR. We are also both asking for RT-qPCR controls to show that the lincRNAs are indeed overexpressed. Again, this control is very important, as many long non-coding RNAs are rapidly degraded by the nuclear and/or cytoplasmic RNA decay machineries. So expressing a lincRNA from a plasmid, under the control of a strong promoter, does not guarantee increased RNA levels.

      I see that reviewer #2 is asking for a gametogenesis assay. I think it should be limited to the 3 lincRNAs which belong to the same sub-cluster as meiRNA.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The Rodriguez-Lopez manuscript from the Bahler lab presents the phenotypic and functional profiling of lincRNAs in fission yeast. This is the first large-scale, extensive work of this nature in this model organism, and it therefore nicely complements the well-documented examples of lincRNAs already reported in S. pombe.

      The work is very solid, using seamless genome deletion and overexpression followed by colony-based assays in response to a very wide set of conditions.

      Major comments:

      • Considering that this work is descriptive by nature and that the experiments were properly conducted as far as I can judge, I don't have major issues with this paper. To me, the only thing missing is a gametogenesis assay, for two reasons: first, several reported lincRNAs in pombe critically regulate meiosis, and second, many of the analysed lincRNAs are upregulated during meiosis. Figure 6B already points to three obvious candidates. I don't think it would take too much time to look at the deletion and OE strains in an h90 background and examine the effect on gametogenesis for the entire set, or at least for the 3 candidates from Figure 6. If the already broad set of lincRNAs implicated in meiosis were to grow, this would be further evidence that eukaryotic cell differentiation relies on non-coding RNAs even in simpler models.

      Minor comments:

      • A reference to the recent work of the Rougemaille lab on mamRNA is necessary
      • A discussion of the possibility of performing large-scale genetic interaction searches (as done by Krogan for protein-coding genes) would strengthen the discussion of future plans.

      Significance

      The work unambiguously shows that most of the lincRNAs analyzed exert cellular functions in specific environmental or physiological contexts. This conclusion is critical because the biological relevance of this so-called « dark matter » is still debated despite a few well-established cases. This is an important addition to the field, and the deep phenotyping work already points to directions for analysing some of these lincRNAs in the context of cell cycle progression, metabolism or meiosis.

      Referee Cross-commenting

      • I agree with the issues raised by referees 1 and 3, but I am concerned about the added value of an RT-qPCR analysis. First, this is a significant amount of work considering the large set of targets. Second, and more importantly, what you'll end up with is a fold change. What will be considered overexpression? Which threshold? This is why I prefer a biological read-out (a phenotype): whatever the fold change, it tells us that there is an effect. It is indeed very likely that some targets are not overexpressed because of their rapid degradation. To me, this is the drawback of any large-scale study.
      • Also, looking at the expression of the adjacent gene in the case of a cis-effect is interesting, though this is likely condition-dependent (because most phenotypes appear in specific conditions). So, what would be the conclusion if there is no effect in classical rich media?
      • The sequence of the insert should be specified, I agree. Most likely, it is the sequence available from PomBase (this is what I understood), but that should indeed be clarified.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Rodriguez-Lopez et al describes the analysis of long intergenic non-coding RNA (lincRNA) function in fission yeast using both deletion and overexpression methods. The manuscript is very well presented and provides a wealth of lincRNA functional information for the field. This work is an important advance, as very little is still known about the function of lincRNAs in both normal and other conditions. An impressive array of conditions was assessed here. With a large-scale analysis like this, there is really not one specific conclusion. The authors conclude that lincRNAs exert their function in specific environmental or physiological conditions. This conclusion is not novel; it has been proposed and shown before, but this manuscript provides the experimental proof of this concept on a large scale.

      The lincRNA knock-out library was assessed using a colony size screen, a colony viability screen and cell size and cell cycle analysis. Additionally, a lincRNA overexpression library was assessed by a colony size screen. These different functional analysis methods for lincRNAs were then carried out in a wide variety of conditions to provide a very large dataset for analysis. Overall, the presentation and analysis of the data were easy to follow and informative. Some points below could be addressed to improve the manuscript.

      There were 238 protein-coding gene mutants assessed in parallel to provide functional context, which was a very promising idea. Unfortunately, however, the inclusion of 104 protein-coding genes of unknown function restricted the use of the protein-coding genes in the integrated analysis to connect lincRNAs to a known function using guilt by association.

      The colony viability screen is not described well throughout the manuscript. Firstly, the use of phloxine B dye to determine cell viability needs to be described better when first introduced at the bottom of page 6. What exactly is this viability screen, and what does the red colour intensity indicate? Please define what different levels of red in a colony indicate in terms of viability. I assume an increase in red colour indicates more dead cells? If so, it is confusing that the output of this assay is later described as giving a resistant/sensitive phenotype or higher/lower viability. How can you get a higher viability from an assay that should only detect lower viability? Shouldn't this assay range from viable (no or low red colour) to increasing amounts of red indicating increasingly lower viability? Figure 4D is also confusing with the "red" and "white" annotations. These should be changed to "lower viability" and "viable" or "not viable" and "viable".

      When generating the 113 lincRNA ectopic overexpression constructs by PCR, how can you be sure that the sequences you cloned are correct? Simply checking for "correct insert size", as stated in the methods, is not really good practice; these constructs should be fully sequenced to confirm that they contain the correct sequence and that no mutations were introduced by the PCR used for cloning. Without such sequence confirmation, one cannot be completely confident that the data produced are specific to lincRNA overexpression. Additionally, a selection of strains carrying the overexpression constructs should be tested by qRT-PCR and compared to a non-overexpressing strain to confirm lincRNA overexpression.

      Minor comments:

      Page 4, lines 19-20 - "A substantial portion of lincRNAs are actively translated (Duncan and Mata, 2014), raising the possibility that some of them act as small proteins." This sentence does not make sense: lincRNAs cannot "act as" small proteins, they can only "code for" small proteins. The wording needs to be changed here.

      Figure 1A is a nice representation but what are the grey dots? Are they all ncRNAs including lincRNAs? This needs to be stated in the legend.

      How many lincRNAs are there in total in pombe and what percentage did you delete? These numbers should be stated in the text.

      It would be nice if Supplementary Figure 1 included concentrations or amounts of the conditions used. This info is buried in a Supplementary table and would be better placed here.

      Page 6, last sentence. What is a "biological repeat"? Three distinct deletion strains (i.e. three different deletion strains made by CRISPR) or one deletion strain used three times?

      There is no mention in the manuscript of how other researchers can get access to the deletion strains and over-expression plasmids.

      Significance

      The production of lincRNA deletion strains and overexpression plasmids, and their analysis under an impressive number of conditions, provides key resources and data for the ncRNA field. This work complements nicely the analysis of protein coding gene deletion strains and provides the tools and data for future mechanistic studies of individual lincRNAs. This work would be of interest to the growing audience of ncRNA researchers in both yeast and other systems.

      Field of expertise: Yeast deletion strain construction and analysis, RNA functional analysis

      Referee Cross-commenting

      Reviewer #3 makes an important point that the stability of each lincRNA overexpressed from a plasmid is not known, and therefore some lincRNAs may not be overexpressed as predicted. RT-qPCR would be required to assess lincRNA expression levels from the plasmids. It also appears that we both agree that it is important to determine the sequence of the cloned lincRNAs in the overexpression plasmids.

      Reviewer #3 also makes an important point in his review that, where a lincRNA deletion is predicted to influence an adjacent gene in cis, the expression of that gene should be tested.

  2. Nov 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      We thank all reviewers for their thorough assessment and constructive comments.

      For clarity, their comments have been numbered.

      Reviewer #1

      Evidence, reproducibility and clarity:

      Summary:

      Acetylation/deacetylation controls the G1/S transition in budding yeast. The lysine acetyltransferase Esa1 is here shown to play a role, in part via acetylation of the nuclear pore complex basket component Nup60, which stimulates mRNA export.

      Major comments:

      1 • Figure 1C: The curve for esa1-ts in this figure and the curve in supplementary figure S2B are not similar: whereas the first shows 10% of cells budding after 60 minutes, it is about 50% after 60 min in S2B. Another helpful way of presenting the data could be the length of the G1 phase (from cytokinesis to budding) in WT, esa1-ts and gcn5delta cells over time.

      We thank the reviewer for pointing this out. Indeed, there is some day-to-day variability in the budding kinetics of the temperature-sensitive esa1 mutant, and the text referred to one individual experiment. Therefore, we have changed the text to better reflect the observed variability (p. 7) and added a graph (supplementary Figure S2C) including all individual replicates. This shows that in spite of small differences between experiments, esa1-ts cells always bud slower and less efficiently than wild-type cells. We note that the data cannot be shown in the way suggested (time from cytokinesis to budding, presumably from individual cells) because cells in these experiments were released from a G1 block (after cytokinesis), and samples from cell cultures were imaged at time intervals (and not single cells over time). Time-lapse data of single cells is shown in figure 2E.

      2 • What is the rationale for creating the Nup60-KN mutation? Does it prevent acetylation of Nup60, at least by GCN5 and/or esa1?

      The biophysical properties of asparagine resemble those of acetylated lysine. Therefore, the Nup60-KN mutant (lysine 467 to asparagine) is expected to mimic acetylation of Nup60 K467, which was found to be acetylated in earlier studies. Supporting the conclusion that Nup60-KN is indeed an acetyl-mimic, the nup60-KN mutation partially rescues the Start and mRNA export defects of Esa1-deficient cells. We have made the rationale for the Nup60-KN mutation clearer in the current version (p. 8).

      3 • Given the much stronger G1/S transition phenotype of the esa1-ts+GCN5 delta condition compared to esa1-ts, and given that GCN5 seems to strongly acetylate Nup60, I do not understand the sole focus on esa1 in this study. The fact that Nup60-KN cells do not undergo the G1/S transition under esa1-ts+GCN5 delta conditions in the experiments presented in Fig. S3 argues that esa1-mediated acetylation of Nup60 is only one, probably minor, aspect of the G1/S transition. This should be discussed in a more balanced way.

      We focus on Esa1 because this allows us to dissect the specific role of Nup60 acetylation and mRNA export during the G1/S transition. Of course, Esa1-dependent acetylation of Nup60 is not the only process controlling the G1/S transition, which is regulated at several levels. For example, the concentration of multiple Start activators and inhibitors scales differentially with cell size (PMID: 26390151, 32246903). In addition, daughter-specific factors inhibit Start through a pathway parallel to Nup60 deacetylation (Ace2/Ash1-dependent repression of Cln3 transcription; PMID: 19841732, 19841732). We discuss these studies in the current version (p. 17).

      As for the relative contribution of Esa1 and Gcn5 to the G1/S transition and mRNA export: both of these KATs have overlapping roles in promoting transcription, probably through distinct substrates (such as histone H2 for Gcn5, H4 for Esa1), and this may contribute to their role in Start. Consistent with this, deletion of GCN5 causes a minor delay in transcription of G1/S genes (Kishkevich, Sci. Rep 2019). On the other hand, gcn5 mutants have no detectable mRNA export defects, unlike esa1-ts (our Figure 3E). This suggests that whereas Gcn5 and Esa1 may have overlapping roles in transcription of G1/S genes, Esa1 is more specifically involved in mRNA export. The ability of Nup60-KN to rescue the esa1 single mutant but not the gcn5 esa1 double mutant is consistent with this view: the transcription defects in the double mutant may be so severe as to prevent Start even in the presence of Nup60-KN. We have modified the discussion to mention these points. In addition, we will investigate the transcription defects of esa1 and gcn5 single and double mutants to test this possibility and include the results in a revised version.

      4 • Suppl: Fig 2: I miss the hat1delta+gcn5delta condition.

      We will include the budding index of the hat1 gcn5 double mutant in a revised version.

      Minor comments:

      5 • Figure legend 2C "at least 200 cells were scored": please state number of replicates

      Figure 2C shows RT-qPCR data. The reviewer probably means figure 1C, which shows the budding index of one experiment comparing wild type, esa1, gcn5 and esa1 gcn5 strains. This experiment was repeated 3 times, as is now mentioned in the figure 1 legend.

      6 • Figure 2E: X axis "impor" should be corrected to "import"

      We have corrected this.

      7 • Would Mex67 and/or Mtr2 overexpression rescue the esa1-ts and esa1-ts+GCN5 delta phenotypes?

      We will include this experiment in a revised version.

      8 • Figure 4A: The daughter cells in the hos3delta condition seem smaller than in esa1-ts. Is this true, and can you comment on this? Is a premature onset of S phase observed here?

      Since Fig 4A features only wild type and hos3∆ cells, the reviewer is probably referring to esa1-ts cells shown in figure 4B. These two figure panels are not directly comparable: cells in 4A are freely cycling, whereas those in 4B were released from a mitotic arrest using nocodazole. The mitotic arrest was done in order to avoid potentially confounding effects due to inactivation of Esa1 during S phase. However, the arrest also causes daughter cells to grow larger, explaining the size differences pointed out by the reviewer. That being said, it is true that cell size and G1 duration are intimately linked, and thus the reviewer's question raises a relevant point. We previously showed that although hos3 daughter cells enter S phase prematurely, their size is not significantly different from wild type (Kumar et al., Figure 1d-g). Premature onset of S phase can lead to smaller cell size, but this is not the case for hos3 cells, probably due to the slightly faster growth rate of the hos3∆ mutant relative to wild type specifically during S/G2/M phases (Kumar et al., Supplementary Fig. 1b).

      9 • Figure 4D: The still images in figures 2E and 4D do not correspond with the quantitation. E.g. in Fig 2E the esa1-ts cell shows Whi5 export at t=81 min, which, according to the quantitation shown, is unusually late.

      We will modify Figures 2E and 4D in a revised version to include cells that export Whi5 at times closer to the median.

      10 • Figure 4B: it is not clear why for the quantitation a different representation is chosen as compared to 4A. It would be better to show the nuclear intensities of mother/daughter as in Figure 4A.

      The reason for the different representation between figures 4A and 4B is that 4A depicts freely cycling cells and in 4B, cells were released from a nocodazole-induced mitotic arrest (as mentioned in our response to point 8). A mitotic arrest perturbs M/D size asymmetries, as daughter cells (but not mothers) continue growing during the arrest, leading to larger nuclear size. In addition, esa1-ts daughters are smaller than wt daughters in this condition, further complicating M/D asymmetries. We thought that in this case, a better metric for protein association with the NPC is the fluorescence intensity relative to a nuclear pore component. We agree that using different types of graphs is confusing, and therefore we have removed M/D comparisons from figure 4A and now represent these data as in figure 4B: the intensity of Sac3 relative to Nup49. Finally, a good control for these experiments is the quantification of total protein levels, which we have added for Sac3. We have also removed Mtr2-GFP data until our analysis of Mtr2 total levels is complete. We hope this simplifies this figure.

      11 • Figure 4D: To strengthen these results, it would be good to perform this assay with esa1-ts Nup60-KN cells as in figure 2a. The release of Whi5-GFP is expected to behave similarly to the WT. This would establish that Nup60 acetylation is a prerequisite for Whi5 release.

      I’m afraid we don't understand this suggestion. Figure 4D shows time-lapse fluorescence microscopy of Whi5 nuclear export when Sac3 is recruited to the nuclear basket. Figure 2a shows western blots of Nup60 acetylation status. Therefore it is not clear how these two assays could be done in similar ways. Perhaps the reviewer refers to a different figure panel. The purpose of the suggested experiment, if we understand properly, is to test whether Nup60 acetylation is required for Whi5 export. This is the hypothesis tested in figure 2D: Whi5-GFP export is delayed in esa1-ts, and this delay is partially rescued in esa1-ts nup60-KN, which mimics acetylation. In fact, the advance in Whi5 export observed in Figure 4D upon Sac3 anchoring to NPC is similar to that observed in a nup60-KN (Figure 2E).

      12 • Page 13 "Finally, we tested whether Esa1 targets Sac3 to G1 nuclei": The effect of esa1 knockdown on Sac3 fits with the storyline and the effect esa1 imposes on mRNA export. However, the statement that esa1 targets Sac3, which is part of a larger complex, is misleading, given that you do not show proof of a direct interaction, e.g. by immunoprecipitation.

      We meant to say “we tested whether Esa1 function promotes the localisation of Sac3 to the nuclear basket”. We agree that it is unknown whether this involves direct interactions between Sac3 and Esa1. We have changed the text to make this point clearer.

      13 • Page 18: "Nevertheless, our findings suggest that mammalian nucleoporins may represent a novel category of substrates for KATs and for the multiprotein complexes in which these enzymes reside, with important roles in gene expression." Given the limited experimental evidence, this statement is too strong for my taste. Rather, indicate that this is a possibility that needs to be tested.

      We have changed the text as suggested.

      14 • Page 3: "Nuclear pores are macromolecular assemblies composed of approximately 30-50 different Nucleoporins": it is rather approximately 30 different nucleoporins in the species analyzed so far.

      We have corrected this as suggested.

      Significance:

      The concept of acetylation/deacetylation regulation of the G1/S transition in budding yeast is very appealing. The specific (and important) contribution of Esa1, especially in comparison to GCN5 and Hat1, remains unclear, as does its precise effect on Nup60. Clarifying this, with a more balanced presentation and discussion, would be of interest to the field.

      My research centers around NPC function.

      Audience: experts in the nuclear structure/function fields and cell cycle regulation.

      A more detailed characterisation of the specific roles of Esa1, Gcn5 and Hat1 in the G1/S transition and mRNA export will be included in a revised version, as mentioned in our response to point 3.

      Reviewer #2

      Evidence, reproducibility and clarity:

      In this manuscript, Gomar-Alba et al. follow up on previous work from the lab that showed that the KDAC Hos3 is targeted to the bud neck and daughter cell nuclear pore complexes in budding yeast where it slows cell cycle progression by influencing gene positioning and nucleo-cytoplasmic transport. Overall, the current manuscript describes a well-conducted study that dissects the role of acetylation and deacetylation on Nup60 during the cell cycle using genetics and microscopy. The authors conclusively identify Esa1 as counteracting Hos3 in the nucleus (Figure 1) and show that part of their effect on cell cycle progression and gene expression is mediated by acetylation of Nup60 at K467 (Figure 2). They also demonstrate that this leads to a differential localization of several mRNA export factors and suggest that deacetylation of Nup60 blocks mRNA export in daughter cells. Although this work is overall carefully done, the last conclusion is still somewhat speculative.

      I have a number of minor suggestions to improve the manuscript, but only one major concern, which revolves around the role of chromatin tethering to NPCs. The authors have shown in their previous paper that this plays a role for CLN2 and it is known that active GAL1 interacts with the nuclear periphery, but in the current manuscript this aspect is largely disregarded although I think it could play a major role in the observed mRNA export phenotypes. Therefore, I think some additional experiments and controls as well as additional analysis are required to substantiate especially the results shown in figure 5.

      Major points:

      1) Figure 2: The authors claim that the mechanism by which Nup60 acetylation promotes cell cycle progression is the enhancement of mRNA export through the NPC. In Figure 2, the authors look at the expression levels of four candidate mRNAs, all of which show disturbed expression in esa1-ts that is not rescued by the nup60-KN mutation; however, protein expression of one of these candidates (CLN2) is improved. In their previous paper, the same lab showed that the CLN2 gene is tethered to the NPC in daughter cells with deacetylated Nup60 and that this is relieved in a Nup60 K467N mutant. I think it would be important here to investigate the protein levels of additional candidates that are not regulated at the level of gene localization. Is it a general effect that protein expression is higher in the nup60KN mutant?

      We agree this is an important point. To establish if Nup60-KN regulates only genes that interact with the NPC (such as CLN2), the reviewer suggests determining the cell cycle levels of proteins encoded by other G1/S genes that do not bind NPCs. The main problem with this approach is that with the exception of CLN2, the nuclear localisation of the (about 200) G1/S regulon genes is not yet known. In addition, establishing connections between mRNA and protein levels during the first cell cycle is only possible for short-lived proteins such as Cln2. For instance, amongst the G1/S genes shown in Figure 2, Cdc21 and Rnr1 have protein half-lives of 10 and 4 h, much longer than the 90-minute yeast cell cycle (PMID 25466257). We think a more direct approach to investigate the connection between gene position and mRNA synthesis / export would be to directly visualise the localisation of single mRNAs upon perturbation of the Nup60 acetylation pathway, using single mRNA labeling techniques (smFISH or PP7). We aim to do this for CLN2 and also for GAL1 (see point 2d of this reviewer). We will attempt these experiments for a revised version of our paper.

      2) Figure 5: In figure 5, the authors investigate the expression of a different inducible RNA (GAL1) to test whether the observed effect on mRNA export is more general. Since this is a crucial point for generalizing the finding, this data needs to be presented in a more convincing manner.

      2a. GAL1 is known to be tethered to the NPC upon transcription. Whether this tethering is affected by the Nup60-KN mutant is unclear, but since Nup60 has been implicated in GAL1 tethering in the literature, this possibility is not unlikely. GAL1 therefore becomes a similar case to CLN2, where it is difficult to disentangle effects directly due to mRNA export from the effects of gene tethering on mRNA transcription and processing. Therefore, this experiment should be repeated with a system that is independent of gene tethering. For example, induction of the GAL promoter via a β-estradiol-inducible VP16 transactivator does not seem to induce tethering.

      This is an excellent idea. We are not aware of studies on the localisation of the GAL1 locus induced by a VP16 transactivator, but this was investigated for the HXK1 gene. This subtelomeric gene localises to NPCs in non-glucose carbon sources, and its localisation is perturbed by VP16 transactivation in glucose (PMID: 16760983). We will investigate whether the same is true for GAL1, and if so, perform the suggested experiments.

      2b. The activation kinetics in all mutants analyzed are very different from the wild type. Therefore, the quantification made in Figure 5C is difficult to interpret. It might be fairer to quantify the mutant strains at an earlier timepoint after activation, when the levels are similar to those in the wild-type strain, e.g. at around 250 min in the hos3d strain.

      This is a good point - indeed, persistent mother/daughter asymmetry in GAL1 expression in hos3 and nup60-KN mutants could be masked by saturated levels of GFP at late time points. An alternative way to test this is to determine the time of GAL1 induction in mother and daughter cells. We have done this in wild-type and hos3 mutant cells; our results indicate that GAL1 expression occurs first in wild-type mothers and later in their daughters, whereas it is almost simultaneous in nup60-KN mother/daughter mutant pairs (as shown for a single M-D pair in the new figure 5A). In a revised version, we will include data on GAL1 expression for M-D pairs at different times after galactose addition for cells in figures 5C and 5E.

      2c. Similarly, although the difference is less drastic, in figure 5E quantification should be done at a timepoint when the induction level is similar between DMSO- and rapamycin-treated samples before drawing conclusions about differences between mother and daughter cells.

      We agree. See our response to the previous point.

      2d. The major claim of the paper is that mRNA export is inhibited by Nup60 deacetylation. In this figure, the mRNA levels need to be quantified to validate that it is not transcription that is affecting expression.

      We agree. In addition to regulating mRNA export (as suggested by the effect of Sac3 anchoring to NPCs) Nup60 deacetylation may also inhibit GAL1 transcription (directly, and/or indirectly via disruption of Gal1-based transcriptional feedback; PMID 23150580). To directly assess the role of Nup60 acetylation in GAL1 transcription and mRNA export, it would be ideal to determine the levels of GAL1 mRNA in both the nucleus and the cytoplasm, using smFISH and/or PP7 tools, in wild type and in mutants of the Nup60 acetylation pathway as we proposed to do for CLN2 (see our response to point 1 of this reviewer). These or equivalent experiments will be included in a revised version.

      3) The manuscript investigates in detail the effects of a KN mutant, however, a non-acetylatable mutant is not investigated. Is such a mutant viable?

      We have obtained a Nup60-KR mutant, which is predicted to behave as a non-acetylatable mimic, and it is viable. We will describe its phenotype in a revised version.

      Minor comments:

      4) Figure 2E: Is the rescue really specific to daughter cells? The dynamic range in daughter cells is much higher because the timepoint of Whi5 export is slower and more heterogeneous. However, zooming in on the early timepoints after Whi5 import, before the 30 min mark at which 50% of the cells have exported Whi5, might reveal a significant increase in mother cells with a shortened time to S phase entry. I suggest that the authors test this possibility. The cells shown in the image panels also suggest that the acetyl mimic might shorten mother cell time to S phase entry. If this is not the case, the authors might want to show a different example cell. Interestingly, it appears from supplementary figure S5 that while Nup60 K647N partially rescues the export of Whi5, budding does not seem to differ from Nup60 wt. This appears to contradict the budding after alpha factor arrest shown in figure 2.

      We thank the reviewer for this suggestion. Indeed, zooming into the first 30 minutes shows a slight increase in the fraction of nup60-KN mother cells that export Whi5; however this change is not statistically significant when considering the entire cell population (p=0.6017, Mann-Whitney test). Therefore, we will replace the cell shown in figure 2E with a more representative example.

      As for figure S5, the reviewer is correct that in these experiments nup60-KN partially rescues Whi5 export (a marker of Start) but not budding (a downstream event), and this is indeed at variance with the experiment shown in figure 2B. Different experimental conditions may contribute to this apparent discrepancy: as noted in the text, the duration of G1 phase in cells synchronised with alpha factor is not directly comparable with that of freely cycling cells.

      5) Figure 3C: The authors use a truncated version of SAC3 for overexpression, since the full length is toxic (Figure S6A). I think it would be important to include this information in the main text.

      We agree, and have included this information in the main text.

      6) Figure 4B: Is there simply less Sac3 protein in the esa1-ts mutant? Although the authors address this question in figure S9, the very low expression levels of Sac3 may make this difficult to conclude from fluorescence quantification. A Western Blot would be an important control. The relative level of Sac3 still seems to be lower in esa1-ts daughter cells compared to mother cells, but no statistical test is shown.

      We are confident that the total Sac3-GFP levels are sufficient to make accurate comparisons, in both the nucleus and the entire cell. However, we will be happy to include western blot controls for Sac3 total levels in a revised version as the reviewer suggests. As for the levels of Sac3 in M vs D cells: Sac3 is indeed asymmetrically distributed in both wild-type and esa1-ts cells (p

      7) Analysis of mother daughter pairs (e.g. figure 5C): a paired t-test would be appropriate.

      We agree. Results do not change with this new analysis (in fact, p values are even lower for wild-type M-D pairs in figure 5C).

      8) Figure 5A: Can some representative mother-daughter pairs be shown as images for both wt and mutant in the timelapse? It is difficult to see in 5A whether there are any mother daughter pairs.

      We have modified the figure to include clearly identifiable mother-daughter pairs, as requested.

      9) Figure 4C: Please show image of localization of Sac3-GFP-FRB +/- rapamycin to the NPC.

      We have added this.

      Significance:

      This manuscript describes an important advance in understanding the role of non-histone protein modification on the regulation of cell cycle progression and gene expression. It is a logical follow-up on a previous paper from the lab (Kumar et al. 2018) and beautifully builds on this work. It is to my knowledge the first mechanistic description of regulation of nuclear pore complex function by a post-translational modification. This will therefore be a very interesting paper for anyone interested in nuclear pore complex regulation and biology, non-histone protein acetylation, asymmetric cell division, and cell cycle regulation.

      Reviewer #3

      Evidence, reproducibility and clarity:

      The preprint is dedicated to mRNA export and G1/S transition control in mother and daughter cells of budding yeast through acetylation/deacetylation of the nuclear pore component Nup60 (hsNup153). In particular, the authors found that Esa1 (hsTip60/KAT5) acetylates the basket nucleoporin Nup60, and this event promotes recruitment of mRNA export factors to the nuclear basket and export of polyA RNA to the cytosol. This export event promotes entry of cells into S phase; in particular, Nup60 is deacetylated by the histone deacetylase Hos3, which displaces mRNA export complexes from the NPC and inhibits Start specifically in daughter cells.

      The manuscript is a well-designed and well-written study.

      Please, see my major and minor suggestions below:

      Major comments:

      1. P4-5. "deacetylation of the nuclear basket nucleoporin Nup60 does not affect Whi5 nuclear accumulation". I was confused by this statement because, in the previous article Kumar et al., 2018, both the main text and the abstract contain the following phrase: "nuclear basket and central channel nucleoporins establish daughter-cell-specific nuclear accumulation of the transcriptional repressor Whi5.." Could you please address this discrepancy?

      Thank you for pointing this out. We should have written: “deacetylation of Nup60 does not strongly affect Whi5 nuclear accumulation”. The Kumar et al. paper shows that deacetylation of central channel nucleoporins (such as Nup49) is important to increase accumulation of Whi5 in daughter cells, whereas deacetylation of the basket nucleoporin Nup60 plays a relatively minor role (see Kumar et al, Figure 7c). We have corrected this in the main text.

      Fig.2A: In addition to increased Nup60 acetylation, I noticed an overall increased level of Nup60 after overexpression of Esa1 and Gcn5. Is it a statistically significant increase in the Nup60 level? It is not mentioned in the main text or figure legend. Does the acetylation level of Nup60 influence its stability?

      We don’t know if acetylation of Nup60 affects its stability, although it is an intriguing possibility. Although it’s true that Nup60 levels in the IP fraction seem to increase upon Esa1 and Gcn5 overexpression, nuclear levels of Nup60-mCherry are similar in wild-type, hos3∆ and nup60-KN (Supplementary Figure S11A). Therefore it is unlikely that changes in Nup60 acetylation affect its stability. We have added this information to the text.

      Authors determined the mRNA level of four representative genes in esa1-ts and esa1-ts nup60-KN cultures.

      3a. Do the authors know if Nup60-KN expression affects the perinuclear positioning of these transcripts?

      We did not investigate the localisation of individual transcripts in this study. However, as mentioned in our replies to reviewer 2, we propose to do so for the CLN2 and GAL1 mRNAs, in order to test directly the effect of Nup60 acetylation in the positioning of specific mRNAs.

      3b. I also suggest the authors investigate whether Nup60-KN affects other transcripts using an RNAseq approach. Nup60-KN might improve the transcription output of other transcripts, and it would be interesting to know if these transcripts share similar features.

      We agree that investigating the impact of Nup60 acetylation in mRNA synthesis genome-wide is an exciting challenge. We speculate that Nup60-KN is likely to have some effect in transcription, either directly or indirectly through perturbation of feedback regulatory loops caused by mRNA export defects (for instance, transcription of both CLN2 and GAL1 is regulated by positive feedback). However we think that these experiments are beyond the scope of our study, which is focused on mRNA export.

      3c. Do authors know if GAL1pr:HOS3-NLS expression affects specifically G1-dependent transcripts?

      Answering this question would require RNA sequencing experiments. As mentioned in the previous point, we think these are beyond the scope of our study. That being said, it is likely that the Hos3-Nup60 pathway downregulates gene expression during G1, because Nup60 deacetylation is largely restricted to this phase. Note that this is not the same as regulating expression of the G1/S regulon specifically, because Hos3 also regulates GAL1 expression (Figure 5). We mention this important point in the discussion (p. 17).

      3d. Another interesting question would be to define whether there is a group of transcripts that respond specifically to the status of Nup60 acetylation during the G1/S transition. Is it possible to make a ts-driven Nup60-KN expression system to turn it ON/OFF? However, this question is beyond the scope of this paper.

      Thank you for this interesting suggestion. The proposed experiment is technically possible (for example, expression of Nup60-KN could be induced in G1 using a GAL1 promoter, followed by RNA sequencing). We agree that this is beyond the scope of our paper but would like to explore the question in future studies.

      1. Fig.2D It is not mentioned that Cln2 is not cycling anymore upon Nup60-KN overexpression.

      The Cln2 protein peaks at 30 minutes in this experiment, and is degraded at approximately 120 minutes. This corresponds to the slow, incomplete G1/S transition wave of the esa1-ts nup60-KN mutant, as indicated in the budding index at the bottom of the panel. We added this in the figure 2 legend. Note that Nup60-KN is not overexpressed, since the KN mutation is inserted in the endogenous gene under the control of its native promoter.

      Fig.2E. Arrows indicating Whi5 export timing do not match the numbers in the main text. For example, yellow arrows indicate Whi5 export in the wt strain at 30 and 78 min, but 15 and 59 min are stated in the text. Also, do I understand correctly that Whi5-mCherry is not visible in the cytosol?

      See our reply to reviewer 2, point 4: we will replace the cell shown in figure 2E with a more representative example. As for Whi5-mCherry, it is visible in the cytoplasm but only weakly (since it is diluted into the larger cytoplasmic volume), and not at all in the images shown due to the overlay with the brightfield channel.

      Did the authors analyze where SAC3 and MTR2 are localized in hos3del, Nup60KN, and Esa-ts strains once their localization was affected in the nucleus? Is the overall Sac3 level affected in hos3del and Nup60KN strains?

      We have imaged the localisation of Sac3-GFP and Mtr2-GFP during the whole cycle using time-lapse microscopy. Our impression is that in wild type cells, their perinuclear levels increase during S phase in daughter cells, which mirrors the increase in Nup60 acetylation. In contrast, Sac3 and Mtr2 perinuclear levels seem more stable in hos3 and nup60-KN cells. We will include these analyses in a revised version. The total level of Sac3 is not affected, as shown in the updated figure 4; see our reply to reviewer 2, point 6.

      Fig4C. "Sac3-GFP-FRB partitioned equally to M and D nuclei, in the presence of Nup60-mCherry-FKBP and rapamycin (Figure 4C)." Sac3-GFP-FRB is slightly elevated in mother cells. Did you run a statistical test between the first and the third column on the box plot?

      Comparing the first and third columns in Fig 4C (Nup60 and Sac3 in control cells) shows that the mother cell accumulation is higher for Sac3 than for Nup60 (p

      P15. "GAL1 expression levels were higher in wild-type mother cells than in their daughter, and these differences were absent in cells lacking Hos3 or expressing Nup60KN". The GAL1-10 promoter contains information necessary and sufficient for recruitment to the nuclear periphery (PMID: 27489341). I wonder whether GAL1pr-driven transgenes of HOS3, spt10, hat1, etc., contain DNA sequences sufficient for targeting genes to the nuclear periphery, and whether these genes are asymmetrically expressed in mother and daughter cells because of the presence of GAL1pr.

      We agree that these genes may be expressed at different levels in mother and daughter cells. We don’t think this asymmetric expression affects our conclusions. Indeed, the phenotypes scored (growth on plates) apply to the population and not to individual cells. The one exception is figure 3D, in which mRNA nuclear accumulation is scored in single cells. In this case, it remains possible that some of the variability observed corresponds to differences between mothers and daughters. In this case, our measurements could under-estimate the effect of Hos3-NLS in inhibition of mRNA export. However, since we cannot differentiate M and D cells in this experiment, we prefer not to speculate on this possibility in the text.

      Minor comments:

      1. Supplementary Fig. S1: the cell viability assays would be easier to read if figures 1A, S1A and S1B had the same orientation.

      We have changed the figure as suggested.

      Could you please clarify the difference between HOS3-NLS and GAL1pr:HOS3-NLS in the text of figure legend? P.33

      We have fixed this (figure 1 legend).

      P6. I recommend adding the following sentence to help clarity of the text: "To understand how NPC acetylation regulates the G1/S transition (Start), we sought to identify the lysine acetyl-transferases (KATs) counteracting the activity of the Hos3 deacetylase. Hos3 displays asymmetric distribution between mother and daughter cells in wild type Saccharomyces cerevisiae. Overexpression of a version of Hos3 fused to a nuclear localization signal (GAL1pr-HOS3-NLS) leads to targeting of Hos3 to mother and daughter cell nuclei, deacetylation of nucleoporins, and inhibition of cell proliferation (Kumar et al, 2018)."

      We thank the reviewer for this suggestion. This has been added.

      P8. Misspelling: Though Nup60 acetylation

      This has been fixed.

      FigS7. Description of polyA distribution is missing for single gcn5del strain.

      Thank you for pointing this out. This has been added.

      Misspelling: We conclude that Esa1 and Nup60 acetylation promotes Start, at least in part, by targeting Sac3 to the nuclear basket, where it mediates mRNA export.

      This has been fixed.

      Significance

      The authors of this preprint review and attempt to resolve a fundamental and not well-studied question about NPC acetylation status and S phase entry. This work is a logical extension of their previously published work (PMID: 29531309). However, this study for the first time links the status of NPC acetylation to mRNA export through lysine acetyl transferases. It will be interesting to address this question in mammalian cells considering the interaction of basket nucleoporins with Tip60/KAT5 (PMID: 24302573).

      This work might be of interest to researchers investigating RNA export, transcription regulation, and nuclear pores.

      My fields of expertise are RNA export, nucleoporins, transcription regulation.

      I do not have expertise to evaluate yeast strains used in this study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      The preprint is dedicated to mRNA export and G1/S transition control in mother and daughter cells of budding yeast through acetylation/deacetylation of the nuclear pore component Nup60 (hsNup153). In particular, the authors found that Esa1 (hsTip60/KAT5) acetylates the basket nucleoporin Nup60, and this event promotes recruitment of mRNA export factors to the nuclear basket and export of polyA RNA to the cytosol. This export event promotes entry of cells into S phase; in particular, Nup60 is deacetylated by the histone deacetylase Hos3, which displaces mRNA export complexes from the NPC and inhibits Start specifically in daughter cells.

      The manuscript is a well-designed and well-written study.

      Please, see my major and minor suggestions below:

      Major comments:

      1. P4-5. "deacetylation of the nuclear basket nucleoporin Nup60 does not affect Whi5 nuclear accumulation". I was confused by this statement because, in the previous article Kumar et al., 2018, both the main text and the abstract contain the following phrase: "nuclear basket and central channel nucleoporins establish daughter-cell-specific nuclear accumulation of the transcriptional repressor Whi5.." Could you please address this discrepancy?
      2. Fig.2A: In addition to increased Nup60 acetylation, I noticed an overall increased level of Nup60 after overexpression of Esa1 and Gcn5. Is it a statistically significant increase in the Nup60 level? It is not mentioned in the main text or figure legend. Does the acetylation level of Nup60 influence its stability?
      3. The authors determined the mRNA level of four representative genes in esa1-ts and esa1-ts nup60-KN cultures. Do the authors know if Nup60-KN expression affects the perinuclear positioning of these transcripts? I also suggest the authors investigate whether Nup60-KN affects other transcripts using an RNAseq approach. Nup60-KN might improve the transcription output of other transcripts, and it would be interesting to know if these transcripts share similar features. Do the authors know if GAL1pr:HOS3-NLS expression specifically affects G1-dependent transcripts?

      Another interesting question would be to define whether there is a group of transcripts that respond specifically to the status of Nup60 acetylation during the G1/S transition. Is it possible to make a ts-driven Nup60-KN expression system to turn it ON/OFF? However, this question is beyond the scope of this paper.

      1. Fig.2D It is not mentioned that Cln2 is not cycling anymore upon Nup60-KN overexpression.
      2. Fig.2E. Arrows indicating Whi5 export timing do not match the numbers in the main text. For example, yellow arrows indicate Whi5 export in the wt strain at 30 and 78 min, but 15 and 59 min are stated in the text. Also, do I understand correctly that Whi5-mCherry is not visible in the cytosol?
      3. Did the authors analyze where SAC3 and MTR2 are localized in hos3del, Nup60KN, and Esa-ts strains once their localization was affected in the nucleus? Is the overall Sac3 level affected in hos3del and Nup60KN strains?
      4. Fig4C. "Sac3-GFP-FRB partitioned equally to M and D nuclei, in the presence of Nup60-mCherry-FKBP and rapamycin (Figure 4C)." Sac3-GFP-FRB is slightly elevated in mother cells. Did you run a statistical test between the first and the third column on the box plot?
      5. P15. "GAL1 expression levels were higher in wild-type mother cells than in their daughter, and these differences were absent in cells lacking Hos3 or expressing Nup60KN". The GAL1-10 promoter contains information necessary and sufficient for recruitment to the nuclear periphery (PMID: 27489341). I wonder whether GAL1pr-driven transgenes of HOS3, spt10, hat1, etc., contain DNA sequences sufficient for targeting genes to the nuclear periphery, and whether these genes are asymmetrically expressed in mother and daughter cells because of the presence of GAL1pr.

      Minor comments:

      1. Supplementary Fig. S1: the cell viability assays would be easier to read if figures 1A, S1A and S1B had the same orientation.
      2. Could you please clarify the difference between HOS3-NLS and GAL1pr:HOS3-NLS in the text of figure legend? P.33
      3. P6. I recommend adding the following sentence to help clarity of the text: "To understand how NPC acetylation regulates the G1/S transition (Start), we sought to identify the lysine acetyl-transferases (KATs) counteracting the activity of the Hos3 deacetylase. Hos3 displays asymmetric distribution between mother and daughter cells in wild type Saccharomyces cerevisiae. Overexpression of a version of Hos3 fused to a nuclear localization signal (GAL1pr-HOS3-NLS) leads to targeting of Hos3 to mother and daughter cell nuclei, deacetylation of nucleoporins, and inhibition of cell proliferation (Kumar et al, 2018)."
      4. P8. Misspelling: Though Nup60 acetylation
      5. FigS7. Description of polyA distribution is missing for single gcn5del strain.
      6. Misspelling: We conclude that Esa1 and Nup60 acetylation promotes Start, at least in part, by targeting Sac3 to the nuclear basket, where it mediates mRNA export.

      Significance

      The authors of this preprint review and attempt to resolve a fundamental and not well-studied question about NPC acetylation status and S phase entry. This work is a logical extension of their previously published work (PMID: 29531309). However, this study for the first time links the status of NPC acetylation to mRNA export through lysine acetyl transferases. It will be interesting to address this question in mammalian cells considering the interaction of basket nucleoporins with Tip60/KAT5 (PMID: 24302573).

      This work might be of interest to researchers investigating RNA export, transcription regulation, and nuclear pores.

      My fields of expertise are RNA export, nucleoporins, transcription regulation.

      I do not have expertise to evaluate yeast strains used in this study.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Gomar-Alba et al. follow up on previous work from the lab that showed that the KDAC Hos3 is targeted to the bud neck and daughter cell nuclear pore complexes in budding yeast where it slows cell cycle progression by influencing gene positioning and nucleo-cytoplasmic transport. Overall, the current manuscript describes a well-conducted study that dissects the role of acetylation and deacetylation on Nup60 during the cell cycle using genetics and microscopy. The authors conclusively identify Esa1 as counteracting Hos3 in the nucleus (Figure 1) and show that part of their effect on cell cycle progression and gene expression is mediated by acetylation of Nup60 at K467 (Figure 2). They also demonstrate that this leads to a differential localization of several mRNA export factors and suggest that deacetylation of Nup60 blocks mRNA export in daughter cells. Although this work is overall carefully done, the last conclusion is still somewhat speculative.

      I have a number of minor suggestions to improve the manuscript, but only one major concern, which revolves around the role of chromatin tethering to NPCs. The authors have shown in their previous paper that this plays a role for CLN2 and it is known that active GAL1 interacts with the nuclear periphery, but in the current manuscript this aspect is largely disregarded although I think it could play a major role in the observed mRNA export phenotypes. Therefore, I think some additional experiments and controls as well as additional analysis are required to substantiate especially the results shown in figure 5.

      Major points:

      1) Figure 2: The authors claim that the mechanism by which Nup60 acetylation promotes cell cycle progression is the enhancement of mRNA export through the NPC. In Figure 2, the authors look at the expression levels of four candidate mRNAs which all show disturbed expression in esa1-ts which is not rescued by the nup60-KN mutation, but expression of the protein of one of these candidates (CLN2) is improved. In their previous paper, the same lab has shown that the CLN2 gene is tethered to the NPC in daughter cells with deacetylated Nup60 and that this is relieved in a Nup60 K467N mutant. I think it would be important here to investigate the protein levels of additional candidates that are not regulated at the level of gene localization. Is it a general effect that protein expression is higher in the nup60KN mutant?

      2) Figure 5: In figure 5, the authors investigate the expression of a different inducible RNA (GAL1) to test whether the observed effect on mRNA export is more general. Since this is a crucial point for generalizing the finding, this data needs to be presented in a more convincing manner.

      a. GAL1 is known to be tethered to the NPC upon transcription. Whether this tethering is affected by the Nup60-KN mutant is unclear, but since Nup60 has been implicated in GAL1 tethering in the literature, this possibility is not unlikely. GAL1 therefore becomes a similar case to CLN2, where it is difficult to disentangle effects due directly to mRNA export from the effects of gene tethering on mRNA transcription and processing. Therefore, this experiment should be repeated with a system that is independent of gene tethering. For example, induction of the GAL promoter via a β-estradiol-inducible VP16 transactivator does not seem to induce tethering.

      b. The activation kinetics in all mutants analyzed are very different from those of the wildtype, so the quantification in Figure 5C is difficult to interpret. It might be fairer to quantify the mutant strains at an earlier timepoint after activation, when the levels are similar to those in the wildtype strain, e.g., at around 250 min in the hos3d strain.

      c. Similarly, although the difference is less drastic, in figure 5E quantification should be done at a timepoint when the induction level is similar between DMSO- and rapamycin-treated samples before drawing conclusions about differences between mother and daughter cells.

      d. The major claim of the paper is that mRNA export is inhibited by Nup60 deacetylation. In this figure, the mRNA levels need to be quantified to validate that it is not transcription that is affecting expression.

      3) The manuscript investigates in detail the effects of a KN mutant, however, a non-acetylatable mutant is not investigated. Is such a mutant viable?

      Minor comments:

      4) Figure 2E: Is the rescue really specific to daughter cells? The dynamic range in daughter cells is much higher because the timepoint of Whi5 export is slower and more heterogeneous. However, zooming in on the early timepoints after Whi5 import, before the 30 min mark at which 50% of the cells have exported Whi5, might reveal a significant increase in mother cells with a shortened time to S phase entry. I suggest that the authors test this possibility. The cells shown in the image panels also suggest that the acetyl mimic might shorten mother cell time to S phase entry. If this is not the case, the authors might want to show a different example cell. Interestingly, it appears from supplementary figure S5 that while Nup60 K647N partially rescues the export of Whi5, budding does not seem to differ from Nup60 wt. This appears to contradict the budding after alpha factor arrest shown in figure 2.

      5) Figure 3C: The authors use a truncated version of SAC3 for overexpression, since the full length is toxic (Figure S6A). I think it would be important to include this information in the main text.

      6) Figure 4B: Is there simply less Sac3 protein in the esa1-ts mutant? Although the authors address this question in figure S9, the very low expression levels of Sac3 may make this difficult to conclude from fluorescence quantification. A Western Blot would be an important control. The relative level of Sac3 still seems to be lower in esa1-ts daughter cells compared to mother cells, but no statistical test is shown.

      7) Analysis of mother daughter pairs (e.g. figure 5C): a paired t-test would be appropriate.

      8) Figure 5A: Can some representative mother-daughter pairs be shown as images for both wt and mutant in the timelapse? It is difficult to see in 5A whether there are any mother daughter pairs.

      9) Figure 4C: Please show image of localization of Sac3-GFP-FRB +/- rapamycin to the NPC.

      Significance

      This manuscript describes an important advance in understanding the role of non-histone protein modification on the regulation of cell cycle progression and gene expression. It is a logical follow-up on a previous paper from the lab (Kumar et al. 2018) and beautifully builds on this work. It is to my knowledge the first mechanistic description of regulation of nuclear pore complex function by a post-translational modification. This will therefore be a very interesting paper for anyone interested in nuclear pore complex regulation and biology, non-histone protein acetylation, asymmetric cell division, and cell cycle regulation.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Acetylation/deacetylation controls the G1/S transition in budding yeast. The lysine acetyl transferase Esa1 is shown here to play a role, in part via acetylation of the nuclear pore complex basket component Nup60, which stimulates mRNA export.

      Major comments:

      • Figure 1C: The curve for esa1-ts in this figure and the curve in supplementary figure S2B are not similar: the first shows 10% of cells budding after 60 minutes, whereas S2B shows about 50% after 60 min. Another helpful way of presenting the data could be the length of the G1 phase (from cytokinesis to budding) in WT, esa1-ts and gcn5delta cells over time.

      • What is the rationale for creating the Nup60-KN mutation? Does it prevent acetylation of Nup60, at least by GCN5 and/or esa1?

      • Given the much stronger G1/S transition phenotype of the esa1-ts+GCN5 delta condition compared with esa1-ts, and given that GCN5 seems to strongly acetylate Nup60, I do not understand the sole focus on esa1 in this study. The fact that Nup60-KN cells do not show G1/S transition under esa1-ts+GCN5 delta conditions in the experiments presented in Fig. S3 argues that esa1-mediated acetylation of Nup60 is only one, probably minor, aspect of the G1/S transition. This should be discussed in a more balanced way.

      • Suppl: Fig 2: I miss the hat1delta+gcn5delta condition.

      Minor comments:

      • Figure legend 2C "at least 200 cells were scored": please state number of replicates

      • Figure 2E: X axis "impor" should be corrected to "import"

      • Would Mex67 and/or Mtr2 overexpression rescue the esa1-ts and esa1-ts+GCN5 delta phenotypes?

      • Figure 4A: The daughter cells in the hos3delta condition seem smaller than in esa1-ts. Is this true, and can you comment on this? Is a premature onset of S phase observed here?

      • Figure 4D: The still images in figures 2E and 4D do not correspond to the quantitation. For example, in Fig 2E the esa1-ts cell shows Whi5 export at t=81 min, which is unusually late according to the quantitation shown.

      • Figure 4B: It is not clear why a different representation is chosen for the quantitation than in 4A. It would be better to show the nuclear intensities of mother and daughter as in Figure 4A.

      • Figure 4D: To strengthen these results, it would be good to perform this assay with esa1-ts Nup60-KN cells as in Figure 2A. The release of Whi5-GFP is expected to behave similarly to WT. This would confirm that Nup60 acetylation is a prerequisite for Whi5 release.

      • Page 13 "Finally, we tested whether Esa1 targets Sac3 to G1 nuclei": The effect of Esa1 knockdown on Sac3 fits with the story line and with the effect Esa1 imposes on mRNA export. However, stating that Esa1 targets Sac3, which is part of a larger complex, is misleading, given that no direct interaction is shown, e.g. by immunoprecipitation.

      • Page 18: "Nevertheless, our findings suggest that mammalian nucleoporins may represent a novel category of substrates for KATs and for the multiprotein complexes in which these enzymes reside, with important roles in gene expression." Given that there is little experimental evidence, this statement is too strong for my taste. Rather, indicate that this is a possibility that needs to be tested.

      • Page 3: "Nuclear pores are macromolecular assemblies composed of approximately 30-50 different nucleoporins": it is rather approximately 30 different nucleoporins in the species analyzed so far.

      Significance

      The concept of acetylation/deacetylation regulation of G1/S transition in budding yeast is very appealing. The specific (and important) contribution of Esa1, especially in comparison to GCN5 and Hat1 remains unclear as well as its precise effect on Nup60. Clarifying this, also in a more balanced way of presentation of discussion, would be of interest for the field.

      My research centers around NPC function.

      Audience: experts in the nuclear structure/function fields and cell cycle regulation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      I found this an exceptionally impressive manuscript. Studying the evolution of Y chromosomes has until recently been nearly impossible, and this research group has pioneered approaches that can yield reliable results in Drosophila. The study used an innovative heterochromatin-sensitive assembly pipeline on three D. simulans clade species, D. simulans, D. mauritiana and D. sechellia, which diverged less than 250 KYA, allowing comparisons with the group's previous results for the D. melanogaster Y.

      The study is both technically impressive and extremely interesting (a highly unusual combination). It includes a rich set of interesting results about these genome regions, and the results are discussed in a well-organised way, relating both to previous observations and to our understanding of the genetics and evolution of Y chromosomes, illuminating all these aspects. It is a rare pleasure to read such a study. I believe that this study will inspire, and be a model for, future work on these chromosomes. It shows how these difficult genome regions can be studied.

      Thank you for the positive evaluation of our paper. While we did not make any specific revisions in response to these comments, we did attempt to improve the writing.

      **Major comments:**

      The conclusions are convincing. The methods are explained unusually clearly, and the reasoning from the results is convincing. When appropriate, the caveats are clearly explained. The material is clearly organised and the questions studied are well related to the results. I had a few minor comments concerning the English. Even the figures (often a major obstacle to understanding) are very clear and helpful, with proper explanations. I have very rarely read such a good manuscript, and almost never (in a long career) found a manuscript that could be published without revision being necessary.

      Thank you for pointing out that there were minor concerns with the English. We have carefully gone through the manuscript and fixed some minor issues with the writing.

      The analysis found 58 exons missed in previous assemblies (as well as all previously known exons of the 11 canonical Y-linked genes, which are present in at least one copy across the group). FISH on mitotic chromosomes using probes for 12 Y-linked sequences was used to determine the centromere locations, to determine gene orders, and to relate them to the cytological chromosome bands, demonstrating changes in satellite distribution, gene order, and centromere positions among the Y chromosomes of the D. simulans clade species. It also confirmed previous results for the Y-linked ribosomal DNA genes, which are responsible for X-Y pairing in D. melanogaster males. Although 28S rDNA has been lost in D. simulans and D. sechellia (but not in D. mauritiana), the intergenic spacer (IGS) repeats between these repeats are retained on both sex chromosomes in all three species. Only sequencing can reliably reveal this, as their abundance is below the FISH detection level in D. sechellia. The copy numbers of the 11 canonical Y-linked genes vary between the species; some duplicates are expressed and have complete open reading frames, and may therefore be functional, but most include only a subset of exons, often with duplicated exons flanking the presumed functional gene copy. Mega-introns and Y-loops were found, as previously seen in Drosophila species, but this new study detects turnovers in the ~2 million years separating D. melanogaster and the D. simulans clade. In total, 49 independent duplications onto the Y chromosome were detected, including 8 not previously detected. At least half show no expression in testes or lack open reading frames, so they are probably pseudogenes. 
Testis-expressed genes may be especially likely to duplicate onto the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis, and indeed most of the new Y-linked genes in the studied clade have likely functions in chromatin modification, cell division, and sexual reproduction. The study discovered two new gene families that have undergone amplification on D. simulans clade Y chromosomes, reaching very high copy numbers (36-146). Both families appear to be functional protein-coding genes and show high expression. The paper describes intriguing results that illuminate Y chromosome evolution. First, the Lhk family arose by an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK), after which the autosomal copy lost its testis-specific exon via a deletion. In D. melanogaster, SRPK is essential for both male and female reproduction, so the relocation of the testis-specific isoform to the Y chromosome in the D. simulans clade suggests that the change may have been advantageous by resolving sexual antagonism. The paper presents convincing evidence that the Y copy evolved under positive selection, and that gene amplification may confer advantageous increased expression in males. The second amplified gene family is also potentially related to an interesting function. Both X-linked and Y-linked duplicates of a gene called Ssl, located on chromosome 2R, are found. In D. simulans, the X-linked copies were previously known and called CK2ßtes-like. In D. melanogaster, degenerated Y-linked copies are also found, with little or no expression, contrasting with the complete open reading frames and high testis expression in the D. simulans clade species, consistent with the possibility of an arms race between sex chromosome meiotic drive factors. 
Other interesting analyses document higher gene conversion rates than on the other chromosomes, and evidence that these Y chromosomes may differ in their DNA-repair mechanisms (preferentially using MMEJ instead of NHEJ), perhaps contributing to their high rates of intrachromosomal duplication and structural rearrangement. The authors relate this to evidence for turnover of Y-linked satellite sequences, with the discovery of five new Y-linked satellites, whose locations were validated using FISH. The study also documented enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome, together with turnover between the species.

      Reviewer #1 (Significance (Required)):

      As described above, the advances are both technical and conceptual for the field. The manuscript itself does an excellent job of placing the work in the context of the existing literature.

      • Anyone working on sex chromosomes and other non-recombining genome regions should be interested in the findings reported.

      • My field of expertise is the evolution of sex chromosomes, and the evolution of genome regions with suppressed recombination. I have experience in genomic analyses. I have less expertise in analyses of gene expression, but I understand enough about such approaches to evaluate the parts of this study that use them.

      Reviewer #2:

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript describes a thorough investigation of the Y chromosomes of three very closely related Drosophila species (D. simulans, D. sechellia, and D. mauritiana), which in turn are closely related to D. melanogaster. The D. melanogaster Y was analysed in a previous paper by the same group. The authors found an astonishing level of structural rearrangements (gene order, copy number, etc.), especially given the short divergence time among the three species (~250 thousand years). They also suggest an explanation for this fast evolution: the Y chromosome is haploid, and hence double-strand breaks cannot be repaired by homologous recombination. Instead, it must use the less precise mechanisms of NHEJ and MMEJ. They also provide circumstantial evidence that MMEJ (which is very prone to generating large rearrangements) is the preferred mechanism of repair. As far as I know this hypothesis is new, and it fits nicely with the fast structural evolution described by the authors. Finally, the authors describe two intriguing Y-linked gene families in D. simulans (Lhk and CK2ßtes-Y), one of them similar to the Stellate / Suppressor of Stellate system of D. melanogaster, which seems to be evolving as part of an X-Y meiotic drive arms race. Overall, it is a very nice piece of work. I have four criticisms that, in my opinion, should be addressed before acceptance.

      Thank you for your positive comments. We respond to your concerns point-by-point below.

      The suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ) should be better supported and explained. At line 387, the authors stated "The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes, but is not obvious in D. melanogaster (Fig 6B). However, because all D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 16 genes). ". Given that D. melanogaster has many Y-linked pseudogenes (described by the authors and by other researchers, and listed in Table S6), there seems to be no reason to use a sample size of 1 in this species.

      We only used pseudogenes with large alignable regions (>300 bp) to prevent the potential bias toward small indels and increase our confidence in indel calling. As a result, we excluded most of the duplicates on the D. melanogaster Y chromosome. We now include 5 additional D. melanogaster Y-linked indels in the manuscript, however, the majority of indels in this species (36/41) are still from the same gene.

      Furthermore, given that D. melanogaster is THE model organism, it is the species that most likely will provide information to assess the "preferential MMEJ" hypothesis proposed by the authors.

      A previous paper has shown that male flies deficient in MMEJ have a strong bias toward female offspring (McKee et al. 2000), suggesting that MMEJ is necessary for successfully producing Y-bearing sperm, consistent with our hypothesis. We agree with the reviewer that careful genetic and cytological experiments in D. melanogaster could further clarify the role of MMEJ in the repair of Y-linked mutations. Even more revealing would be experiments using the simulans clade species, where we hypothesize the MMEJ bias is even more pronounced on the Y chromosome. We believe, however, that these experiments are beyond the scope of this study and should merit their own papers.

      Still on the suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ): the Y chromosome is heterochromatic, haploid and non-recombining. In order to ascribe its mutational pattern to the haploid state (and the consequent impossibility of homologous recombination repair), the authors compared it to chromosome IV (the so-called "dot chromosome"). This may not be the best choice: while chr IV lacks recombination in wild-type flies, it is not typical heterochromatin. E.g., "results from genetic analyses, genomic studies, and biochemical investigations have revealed the dot chromosome to be unique, having a mixture of characteristics of euchromatin and of constitutive heterochromatin". Riddle and Elgin, FlyBook 2018 (https://doi.org/10.1534/genetics.118.301146). Given this, it seems appropriate to also compare the Y-linked pseudogenes with those from typical heterochromatin. In Drosophila, these are the regions around the centromeres ("centric heterochromatin"). There are pseudogenes there; e.g., the gene rolled is known to have partially duplicated exons.

      Thank you for the suggestion. We now include the data from pericentric heterochromatin and pseudogenes in the supplemental data (see Fig 7). Both data types support our conclusion that indels are larger only on Y chromosomes, which is consistent with the comparison between the dot chromosome and pericentric heterochromatin reported by Blumenstiel et al. 2002.

      In some passages of the ms there seems to be confusion between new genes and pseudogenes, which should be corrected. For example, in line 261: "Most new Y-linked genes in D. melanogaster and the D. simulans clade have presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7)". Which are these "new genes"? If they are those listed in Table S6 (as other passages of the text suggest), most if not all of them are pseudogenes. If they are pseudogenes, it is not appropriate to refer to them as "new genes". The same ambiguity is present in line 263: "Y-linked duplicates of genes with these functions may be selectively beneficial, but a duplication bias could also contribute to this enrichment (...)". Pseudogenes can be selectively beneficial, but only in very special cases (e.g., gene regulation). If the authors are suggesting this, they must state it openly and explain why. Pseudogenes are common in nearly all genomes, and should be clearly separated from genes (the latter as a shortcut for functional genes). The bar for "genes" is much higher than simple sequence similarity, including expression, evidence of purifying selection, etc., as the authors themselves applied for the two gene families they identified in D. simulans (Lhk and CK2ßtes-Y).

      Thank you for the suggestion. We now state our criteria for calling genes based on the expression and long CDS and correct the sentences that the reviewer refers to. The protein evolution rates of many Y-linked duplicates were surveyed in Tobler et al. 2017, who found that most are not under strong purifying selection. Our study supports this previous report. We think that protein evolution rate alone may not be a good indicator for functionality. Our current study does not focus on the potential function of these genes, and we think further population studies are required to get a solid conclusion. We changed the text to clarify this point: “Most new Y-linked duplications in D. melanogaster and the D. simulans clade are from genes with presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7), consistent with other Drosophila species [17, 77].” (p15 L281-284)

      The authors center their analysis on "11 canonical Y-linked genes conserved across the melanogaster group ". Why did they exclude the CG41561 gene, identified by Mahajan & Bachtrog (2017) in D. melanogaster? Given that most D. melanogaster Y-linked genes were acquired before the split from the D. simulans clade (Koerich et al Nature 2008), the same most likely is true for CG41561 (i.e., it would be Y-linked in the D. simulans clade). Indeed, computational analysis gave a strong signal of Y-linkage in D. yakuba (unpublished; I have not looked in the other species). If CG41561 is Y-linked in the simulans clade, it should be included in the present paper, for the only difference between it and the remaining "canonical genes" was that it was found later. Finally, the proper citation of the "11 canonical Y-linked genes" is Gepner and Hays PNAS 1993 and Carvalho, Koerich and Clark TIG 2009 (or the primary papers), instead of ref #55.

      Thank you for the suggestion. CG41561 is indeed a relatively young Y-linked gene because it’s not Y-linked in D. ananassae (Muller’s element E). We already have CG41561 in Table S6 and we think that it is reasonable to separate a young Y-linked gene from the others. We also fixed the reference as suggested (p5 L116).

      Other points/comments/suggestions:

      a) Possible reference mistake: line 88 "For example, 20-40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers [52, 53]." Reference #53 is a mouse study, not Drosophila.

      Thank you for pointing out this error; we fixed the reference (p4 L91).

      b) Possible reference mistake: line 208 "and the genes/introns that produce Y-loops differs among species [75]". Ref #75 is a paper on the D. pseudoobscura Y. Is this what the authors intended?

      Yes, our previous paper (ref 75) found that Y-loops do not originate from the kl-3, kl-5, and ORY genes in D. pseudoobscura because they don’t have large introns in this species.

      c) line 113. "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Table S1; [55])." Please show in Table S1 which exons were missing in the previous assemblies. I guess that most if not all of these missing exons are duplicate exons (and many are likely to be pseudogenes). If they indeed are duplicate exons, the authors should make it clear in the main text, e.g., "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, plus 58 duplicated exons missed in previous assemblies."

      Thank you for the suggestion. However, the 58 exons do not include the duplicated exons. We were similarly surprised by how much is missed if the Y chromosome is not assembled carefully. We now mark these exons in red in Table S1 to make this point clearer.

      d) line 116 "Based on the median male-to-female coverage [22], we assigned 13.7 to 18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb." The method (or a very similar one) was developed by Hall et al. BMC Genomics 2013, which should be cited in this context.

      e) line 118: "We evaluated our methods by comparing our assignments for every 10-kb window of assembled sequences to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Table S2)." The procedure is unclear. Why break the contigs into 10-kb intervals instead of treating each contig as a unit, assignable to Y, X or A? The latter is the usual procedure in the computational identification of suspected Y-linked contigs (Carvalho and Clark Gen Res 2013; Hall et al. BMC Genomics 2013). The only reason I can think of for analyzing the contigs piecewise is a suspicion of misassemblies. If this is the case, I think it would be better to explain it.

      Thank you for the suggestion. We did not break the contigs into 10kb intervals when we assigned the Y-linked contigs. As you suspect, our motivation for evaluating our methods and analyzing the contigs in 10kb intervals was to detect possible misassemblies. We rewrote the sentence to make this point clearer (p6 L129-132).
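      For readers less familiar with coverage-based assignment, the idea can be illustrated with a minimal sketch. It assumes per-base read depths from male and female short-read libraries mapped to one contig; the function name, pseudocount, and log2 cut-offs are hypothetical and are not taken from the authors' pipeline [22] or from Hall et al. 2013. Real data (e.g. Y-linked repeats shared with the X) would need more careful thresholds.

```python
import math
from statistics import median


def classify_contig(male_cov, female_cov, pseudocount=1.0,
                    y_min_log2=3.0, x_max_log2=-0.5):
    """Assign a contig to "Y", "X" or "autosome" from per-base read depths.

    Hypothetical cut-offs: log2(male/female) is expected to be about 0 for
    autosomes, about -1 for X-linked sequence (one X in males, two in
    females), and large and positive for Y-linked sequence (absent in females).
    """
    m = median(male_cov)
    f = median(female_cov)
    # The pseudocount avoids division by zero when female depth is 0,
    # which is the expectation for Y-linked sequence.
    log2_ratio = math.log2((m + pseudocount) / (f + pseudocount))
    if log2_ratio >= y_min_log2:
        return "Y"
    if log2_ratio <= x_max_log2:
        return "X"
    return "autosome"


print(classify_contig([18, 20, 22], [0, 0, 1]))     # Y
print(classify_contig([9, 10, 11], [19, 20, 21]))   # X
print(classify_contig([19, 20, 21], [20, 20, 20]))  # autosome
```

      Using the median rather than the mean makes the call robust to short high-copy repeats inside an otherwise single-copy contig.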

      f) Fig. 1. It may be interesting to put a version of Fig 1 in the SI containing only the genes and the lines connecting them among species, so we can better see the inversions etc. (like the cover of Genetics, based on the paper by Schaeffer et al 2008).

      Thank you for the suggestion. We would like to make a figure like that fantastic cover image you refer to, but the repetitive nature of the Y chromosome makes it difficult to illustrate rearrangements based on alignments at the contig level. We instead opted to update Figure 1 to better highlight the rearrangements, still based on the unique protein-coding genes which are supported by the FISH experiments.

      g) Table S6 (Y-linked pseudogenes). Several pseudogenes listed as new have been studied in detail before: vig2, Mocs2, Clbn, Bili (Carvalho et al PNAS 2015), Pka-R1, CG3618, Mst77F (Russell and Kaiser Genetics 1993; Krsticevic et al G3 2015). Note also that at least two are functional (the vig2 duplication and some Mst77 duplications).

      Thank you for the suggestion. We now include a column to indicate the potential function of Y-linked duplicates (see Table S6).

      h) line 421: "one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats." The birth of satellites from TEs has been observed before, and should be cited here. Dias et al GBE 6: 1302-1313, 2014.

      Thank you for the suggestion. We now include a sentence to cite this reference (p27 L467-468).

      i) Fig S2 shows that the coverage of PacBio reads is lower than expected on the Y chromosome. Any explanation? This has been noticed before in D. melanogaster, and tentatively attributed to the CsCl gradient used in the DNA purification (Carvalho et al GenRes 2016). However, it seems that the CsCl DNA purification method was not used in the simulans clade species (is this correct?). Please explain in the ms or in the SI. The issue is relevant because PacBio sequencing is widely believed to be unbiased with respect to DNA sequence composition (e.g., Ross et al Genome Biol 2013).

      Yes, we used Qiagen's Blood and Cell Culture DNA Midi Kit for DNA extraction. We suspect that the underrepresentation of Y-linked reads is driven by the presence of endoreplicated tissue in adults. Heterochromatin is underreplicated in endoreplicated cells, and thus there may simply be less heterochromatin in these tissues. Consistent with this idea, we find that all heterochromatin seems to be underrepresented in the reads, not just the Y chromosome (see Chakraborty et al. 2021; Flynn et al. 2020). We now include this discussion in the SI of our paper (see supplementary text p75).

      j) I may have missed it, but in which public repository have the assemblies been deposited?

      We link to the assemblies in Github (https://github.com/LarracuenteLab/simclade_Y) and they will also be in the Dryad Digital Repository (doi forthcoming).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Due to suppressed recombination, Y chromosomes have degenerated, undergone extensive structural rearrangements, and accumulated ampliconic gene families across species. The molecular processes and selective pressures guiding dynamic Y chromosome evolution are not well understood. In this study, Chang et al. generate updated Y assemblies of three closely related species in the D. simulans complex using long-read PacBio sequencing in combination with FISH. Despite the species having diverged only 250,000 years ago, the authors find structural rearrangements, two newly amplified gene families and evidence of positive selection across D. simulans. The authors also suggest that the high level of Y duplications and deletions may be mediated by MMEJ-biased repair.

      The authors generated a valuable resource for the study of Y chromosome evolution in Drosophila and describe Y chromosome evolution patterns found in previous Y chromosome sequencing studies, such as newly amplified genes, positive selection, and structural rearrangements. The authors' improvements to the Drosophila simulans clade Y chromosomes are commendable, as assembly of the highly repetitive Y chromosome sequences is challenging. However, the manuscript is largely descriptive, its claims are largely speculative, and it lacks a clear question. There are also a number of concerns with the text and figures (see concerns below). Overall, the manuscript would be significantly improved if the authors focused on a specific question as opposed to a survey of sequence features of the Y chromosome. For example, developing the idea that MMEJ is the primary mechanism for loss of Y chromosome sequence could be a nice new twist.

      Our aim is to discover and understand the many different factors and processes that shape the evolution of Y chromosome organization and function. Because these Y chromosomes were largely unassembled, we needed to first generate the sequence assembly before we could ask specific questions. We prefer not to focus the manuscript solely on one specific topic such as MMEJ repair, as our other observations and analyses may be interesting to a wide range of scientists studying topics other than mutation and DNA repair. We are therefore choosing to present the more comprehensive story about Y chromosome evolution that we included in our original manuscript.

      We also respectfully disagree with the comment that our paper is just a descriptive survey of Y chromosomal sequence features. On the contrary, we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. Specifically, we use molecular evolution and phylogenetic and comparative genomics approaches to show that multi-copy gene families experience rampant gene conversion and positive selection. We posit that one simulans clade-specific Y-linked gene family has undergone subfunctionalization, potentially resolving sexual conflict, and another may be involved in meiotic drive. We also use evolutionary genomic approaches to show that the distribution of Y-linked mutations indeed suggests that Y chromosomes disproportionately use MMEJ and we propose that this unique feature may shape the evolution of Y chromosome structural organization. This is, as far as we know, a novel hypothesis. We think that follow-up studies of either hypothesis merit different papers.

      **Major concerns:**

      1. Title: The authors use "unique structure" in the title, which is vague. Are not Y chromosomes, or any chromosome, "unique" in some manner? Also, are there not more evolutionary processes governing the rapid divergence of the Ys?

      Thank you for raising your concern. We believe that we are justified in referring to the Y chromosome as unique among all other chromosomes in its structural properties (e.g. the combination of its hemizygosity, abundant tandem repeats, large-scale rearrangements, and highly amplified testis-specific genes). Because there are many properties of Y chromosomes that we believe contribute to their rapid divergence, we opted for the general phrase ‘unique structure’ to capture all of these features. Many evolutionary processes likely shape the evolution of that unique structure (e.g. Muller’s Ratchet, background selection, Hill-Robertson effects; see Charlesworth and Charlesworth 2000 for a review), and these processes are well studied, especially on newly evolved sex chromosomes. Here our focus is on evolutionarily old Y chromosomes, which may have comparatively fewer targets of purifying selection and are more likely to be shaped by positive selection (Bachtrog 2008).

      p.2, lines 53-56: The authors claim that sexually antagonistic selection and regulatory evolution are causes of recombination suppression. Couldn't this statement be reversed? Recombination suppression via inversions or other rearrangements enables sexually antagonistic selection. This is a chicken-or-egg question, so the sentence should be revised to present both possibilities equally.

      Thank you for the suggestion. We think that it is unlikely that recombination suppression itself is beneficial, but for sexually antagonistic selection and regulatory evolution, recombination suppression can have short-term benefits. We rephrased this sentence to be agnostic about the direction (p2 L56).

      p.5, 118-120: Are the assemblies de novo or have they been guided based upon the D. melanogaster Y chromosome assembly? Please clarify how the authors evaluate their methods by comparing their Y-sequence assignments to known chromosomal locations.

      Thank you for the suggestion. We didn’t use D. melanogaster Y chromosome assembly to guide our assemblies. “All assemblies are generated de novo”, and thus we don’t think there is any potential bias. We first assigned Y-linked sequences using the presence of known Y-linked genes, and used this assignment to evaluate our methods. We now make the sentence clear (p5 L112).

      While the gene copy number estimates are accurate, PacBio-based genome assemblies are still not able to accurately assemble large segmental duplications (see the Evan Eichler laboratory's recent primate and human genome assemblies). A statement mentioning concerns about the accuracy of the underlying sequence and the genomic architecture shown should be included in the main text. FISH provides support for the location of the contigs, but not for the accuracy of the underlying genomic architecture.

      Thank you for the suggestion. We can’t validate all Y-linked regions. We did validate the larger structural features of the assembly and only discuss the results that we are confident in. We now include sentences to address this concern (p7 L150-152).

      The authors assigned Y-linked sequences based on median male-to-female coverage. Is this method feasible for assigning ampliconic sequence to the Y given the N50 of 0.6-1.2Mb? Are the authors potentially excluding novel Y-linked ampliconic sequence?

      We validated our methods to assign contigs to a chromosome by comparing 10-kb intervals to the contigs with known chromosomal location, including the Y chromosome. Our assignments have high (96, 98, and 99%) sensitivity and low (5, 0, and 3%) false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (see Table S2). Based on these results, we think that this method is reasonable for Y-linked contigs with N50 of 0.6-1.2Mb.
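      To make the two validation metrics concrete, here is a minimal sketch of how sensitivity and a false-positive rate could be computed over 10-kb windows with known chromosomal locations. The function name and the exact definitions are assumptions for illustration: the false-positive rate here is the fraction of non-Y windows called Y, and Table S2 may define it differently (for example, as the fraction of Y calls that are wrong).

```python
def evaluate_windows(assigned, known, target="Y"):
    """Return (sensitivity, false-positive rate) for `target` calls.

    Assumed definitions (may differ from those used for Table S2):
      sensitivity         = TP / (TP + FN), over windows known to be `target`
      false-positive rate = FP / (FP + TN), over windows known to lie elsewhere
    """
    if len(assigned) != len(known):
        raise ValueError("assigned and known must have the same length")
    tp = fn = fp = tn = 0
    for call, truth in zip(assigned, known):
        if truth == target:
            if call == target:
                tp += 1
            else:
                fn += 1
        elif call == target:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return sensitivity, fpr


# Toy example: 50 known-Y windows and 50 known-autosomal ("A") windows.
known = ["Y"] * 50 + ["A"] * 50
assigned = ["Y"] * 48 + ["A"] * 2 + ["Y"] * 1 + ["A"] * 49
print(evaluate_windows(assigned, known))  # (0.96, 0.02)
```

      Scoring fixed-size windows rather than whole contigs lets a single misassembled contig (part Y, part autosomal) show up as a mix of correct and incorrect calls instead of being hidden by one contig-level label.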

      We might have excluded some novel Y-linked sequences, since we assigned only ~15 Mb of a total of ~40 Mb of Y-linked sequence. We acknowledge this possibility and now include a sentence to address this concern (p31 L554-556).

      Where did the rDNA sequences go in D. simulans and D. sechellia? Can they be detected on another chromosome?

      Please see Fig S5 for detailed results. We found a few copies of rDNA on the contigs of autosomes. We assembled many copies of rDNA that can’t be confidently assigned to Y chromosomes. It’s possible that they might be located on other chromosomes. Based on our FISH data (Fig S4) and previous papers, most of these non-Y-linked rDNA copies should be on the X chromosome. However, in this study, we did not make a concerted effort to assign X-linked contigs.

      Figure 2B is hard to follow and it is unclear what additional value it provides to part A. Why is expression level of specific exons important?

      Exon duplication may be an important contributor to Y-linked gene evolution: most genes have duplications and our figure shows that at least some of these duplicates are expressed. The patterns we see indicate that duplication may play different roles in genes depending on their length. For example, the duplications involving short genes (e.g., ARY) may be functional and influence protein expression, whereas duplications involving large genes (e.g. kl-2) may not influence the overall protein expression level from this gene, although the expressed duplicated exons may play some other role. We revised a sentence in the main text and added a sentence to the figure 2 legend to make this point clearer.

      Figure 3 There are many introns that contain gaps, so it is unclear how confident one can be in intron length when there are gaps.

      Indeed, we are not confident about the length of introns with gaps. Therefore, we separated these introns and showed them in different colors.

      Figure 4: What are the authors using as a common ancestor in this figure to infer duplications in the initial branch?

      We used phylogenies to infer the origin of Y-linked duplicates. Any duplications that occurred before the divergence of the four species are placed on this initial branch. We also edited the legend to make this point clearer.

      p.15, paragraph 2: The authors describe a newly amplified gene, CK2Btes-Y, in D. simulans. In the first half of the paragraph the authors state that Y-linked copies are also found in D. melanogaster but have "degenerated and have little or no expression" and call them pseudogenes. Later in the paragraph, the authors state that the D. melanogaster Y-linked copies are Su(Ste), a source of piRNAs that are in conflict with X-linked Stellate. Lastly in the paragraph, the authors discuss Su(Ste) as a D. melanogaster homolog of CK2Btes-Y. The logic of defining CK2Btes-Y origins is confusing. Was CK2Btes-Y independently amplified on the D. simulans Y, or were CK2Btes-Y and Su(Ste) amplified in a common ancestor but independently diverged?

      The amplification of CK2Btes-Y and CK2Btes-like happened in the ancestor of D. melanogaster and D. simulans (Fig S11). However, both CK2Btes-Y and CK2Btes-like became pseudogenes in D. melanogaster (D. melanogaster CK2Btes-Y was named PCKR in a previous study). On the other hand, Ste and Su(Ste) are limited to D. melanogaster based on phylogenetic analyses (Fig 5A) and are chimeras of CK2Btes-like and NACBtes. The evolutionary history of this gene family has been detailed in other papers, except for the presence of CK2Btes-Y in the D. simulans complex, which we describe for the first time in this study. We now include a new figure (Figure 5B) showing a schematic of the inferred evolutionary history of the sex-linked Ssl/CK2ßtes paralogs.

      Figure 5: Is each FISH signal a different gene copy?

      Yes, based on our assemblies, Lhk-1 and Lhk-2 are mostly located on different contigs. Unfortunately, we are not able to design probes that can separate Lhk-1 from Lhk-2.

      The authors suggest DNA-repair on the Y chromosome is biased towards MMEJ based on indel size and microhomologies. Is there any evidence MMEJ is responsible for variable intron length in the canonical Y-linked genes or the amplification of new gene families? Since MMEJ is error-prone, it's a more tolerable repair mechanism in pseudogenes, so their findings might be biased. Rather than comparing pseudogenes to their parent genes, they should compare chrY pseudogenes to autosomal pseudogenes. Even more informative would be to track MMEJ on the dot chromosome, which is known not to recombine and is highly heterochromatic like the Y chromosome.

      We did compare chrY pseudogenes to autosomal pseudogenes in our study. We have also added new analyses to address related issues raised by reviewer 2, which overlap with your concern. We now include data from pericentric heterochromatin and pseudogenes (see Fig 7). Both data types support our conclusion that indel sizes are larger only on Y chromosomes. This is consistent with a report that the dot chromosome and pericentric heterochromatin have similar indel size distributions (Blumenstiel et al. 2002).
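      To make the microhomology argument concrete: MMEJ tends to leave deletions flanked by short identical sequences, so one common way to score a deletion is to count how many bases at the start of the deleted segment are repeated immediately after it. The sketch below illustrates that idea only; it is not the authors' method, the function names are hypothetical, deletions are assumed to be left-aligned (so shifting them left cannot reveal extra homology), and the 2-bp cut-off in `fraction_with_microhomology` is an arbitrary placeholder.

```python
def microhomology_length(ref, start, end):
    """Microhomology (bp) at a deletion of ref[start:end], 0-based and half-open.

    Counts how many bases at the 5' end of the deleted segment are repeated
    immediately 3' of the deletion. Assumes the deletion is left-aligned.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deletion coordinates must lie within the reference")
    k = 0
    # Stop at the deletion length, the end of the reference, or the first mismatch.
    while start + k < end and end + k < len(ref) and ref[start + k] == ref[end + k]:
        k += 1
    return k


def fraction_with_microhomology(ref, deletions, min_mh=2):
    """Fraction of (start, end) deletions with at least min_mh bp of microhomology.

    min_mh is a hypothetical placeholder threshold.
    """
    if not deletions:
        return 0.0
    hits = sum(microhomology_length(ref, s, e) >= min_mh for s, e in deletions)
    return hits / len(deletions)
```

      For example, deleting `GTCAAT` (positions 4 to 10) from `CCTAGTCAATGTCTTT` leaves the flanking `GTC` as a 3-bp microhomology.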

      Reviewer #3 (Significance (Required)):

      While it is a benefit to have much improved Y chromosome assemblies from the three D. simulans clade species, the gap in knowledge this manuscript is trying to address is unclear. The manuscript is almost entirely descriptive and the figures are difficult to follow.

      As stated above, we respectfully disagree with the comment that the manuscript is entirely descriptive, as we present thorough evolutionary analyses to test hypotheses about the forces shaping the evolution of Y chromosome organization and Y-linked genes. We have two guiding hypotheses about the importance of sexual antagonism and DNA repair pathways for Y chromosome evolution, and we conduct sequence analyses that support the hypotheses that sexual antagonism and MMEJ affect Y chromosome evolution.

      References cited in this response:

      Bachtrog D. The temporal dynamics of processes underlying Y chromosome degeneration. Genetics. 2008 Jul;179(3):1513-25. doi: 10.1534/genetics.107.084012. Epub 2008 Jun 18. PMID: 18562655; PMCID: PMC2475751.

      Blumenstiel JP, Hartl DL, Lozovsky ER. Patterns of insertion and deletion in contrasting chromatin domains. Mol Biol Evol. 2002 Dec;19(12):2211-25. doi: 10.1093/oxfordjournals.molbev.a004045.

      Chakraborty M, Chang CH, Khost DE, Vedanayagam J, Adrion JR, Liao Y, Montooth KL, Meiklejohn CD, Larracuente AM, Emerson JJ. Evolution of genome structure in the Drosophila simulans species complex. Genome Res. 2021 Mar;31(3):380-396. doi: 10.1101/gr.263442.120. Epub 2021 Feb 9. PMID: 33563718; PMCID: PMC7919458.

      Charlesworth B, Charlesworth D. The degeneration of Y chromosomes. Philos Trans R Soc Lond B Biol Sci. 2000 Nov 29;355(1403):1563-72. doi: 10.1098/rstb.2000.0717. PMID: 11127901; PMCID: PMC1692900.

      Flynn J, Long M, Wing RA, Clark AG. Evolutionary dynamics of abundant 7-bp satellites in the genome of Drosophila virilis. Mol Biol Evol. 2020 May;37(5):1362-75. doi: 10.1093/molbev/msaa010.

      McKee BD, et al. On the roles of heterochromatin and euchromatin in meiosis in Drosophila: mapping chromosomal pairing sites and testing candidate mutations for effects on X–Y nondisjunction and meiotic drive in male meiosis. Genetica. 2004;109:77-93.

      Tobler R, Nolte V, Schlötterer C. High rate of translocation-based gene birth on the Drosophila Y chromosome. Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11721-11726. doi: 10.1073/pnas.1706502114. Epub 2017 Oct 19. PMID: 29078298; PMCID: PMC5676891.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Due to suppressed recombination, Y chromosomes have degenerated, undergone extensive structural rearrangements, and accumulated ampliconic gene families across species. The molecular processes and selective pressures guiding dynamic Y chromosome evolution are not well understood. In this study, Chang et al. generate updated Y assemblies of three closely related species in the D. simulans complex using long-read PacBio sequencing in combination with FISH. Despite the species having diverged only 250,000 years ago, the authors find structural rearrangements, two newly amplified gene families and evidence of positive selection across the D. simulans clade. The authors also suggest the high level of Y duplications and deletions may be mediated by MMEJ-biased repair.

      The authors generated a valuable resource for the study of Y chromosome evolution in Drosophila and describe Y chromosome evolution patterns found in previous Y chromosome sequencing studies, such as newly amplified genes, positive selection, and structural rearrangements. The authors' improvements to the Drosophila simulans clade Y chromosomes are commendable, as assembly of highly repetitive Y chromosome sequences is challenging. However, the manuscript is largely descriptive, its claims are largely speculative, and it lacks a clear question. There are also a number of concerns with the text and figures (see concerns below). Overall, the manuscript would be significantly improved if the authors focused on a specific question as opposed to a survey of sequence features of the Y chromosome. For example, developing the idea that MMEJ is the primary mechanism for loss of Y chromosome sequence could be a nice new twist.

      Major concerns:

      1. Title: The authors use "unique structure" in the title, which is vague. Are not Y chromosomes, or any chromosome, "unique" in some manner? Also, are there not more evolutionary processes governing the rapid divergence of the Y chromosomes?
      2. p.2, line 53-56: The authors claim that sexually antagonistic selection and regulatory evolution are causes of recombination suppression. Couldn't this statement be reversed? Recombination suppression via inversions or other rearrangements enables sexually antagonistic selection. This is a chicken-or-egg question, so the statement should be revised to present both possibilities equally.
      3. p.5, 118-120: Are the assemblies de novo or have they been guided based upon the D. melanogaster Y chromosome assembly? Please clarify how the authors evaluate their methods by comparing their Y-sequence assignments to known chromosomal locations.
      4. While the gene copy number estimates are accurate, the PacBio-based genome assemblies are still not able to accurately assemble large segmental duplications (see Evan Eichler's laboratory's recent primate and human genome assemblies). A statement mentioning the concerns about accuracy of the underlying sequence and genomic architecture shown should be included in the main text. FISH provides support for the location of the contigs, but not for the accuracy of the underlying genomic architecture.
      5. The authors assigned Y-linked sequences based on median male-to-female coverage. Is this method feasible for assigning ampliconic sequence to the Y given the N50 of 0.6-1.2Mb? Are the authors potentially excluding novel Y-linked ampliconic sequence?
      6. Where did the rDNA sequences go in D. simulans and D. sechellia? Can they be detected on another chromosome?
      7. Figure 2B is hard to follow and it is unclear what additional value it provides to part A. Why is expression level of specific exons important?
      8. Figure 3 There are many introns that contain gaps, so it is unclear how confident one can be in intron length when there are gaps.
      9. Figure 4: What are the authors using as a common ancestor in this figure to infer duplications in the initial branch?
      10. p.15, paragraph 2: The authors describe a newly amplified gene, CK2Btes-Y, in D. simulans. In the first half of the paragraph the authors state that Y-linked copies are also found in D. melanogaster but have "degenerated and have little or no expression" and call them pseudogenes. Later in the paragraph, the authors state that the D. melanogaster Y-linked copies are Su(Ste), a source of piRNAs that are in conflict with X-linked Stellate. Lastly in the paragraph, the authors discuss Su(Ste) as a D. melanogaster homolog of CK2Btes-Y. The logic of defining CK2Btes-Y origins is confusing. Was CK2Btes-Y independently amplified on the D. simulans Y, or were CK2Btes-Y and Su(Ste) amplified in a common ancestor but independently diverged?
      11. Figure 5: Is each FISH signal a different gene copy?
      12. The authors suggest DNA-repair on the Y chromosome is biased towards MMEJ based on indel size and microhomologies. Is there any evidence MMEJ is responsible for variable intron length in the canonical Y-linked genes or the amplification of new gene families? Since MMEJ is error-prone, it's a more tolerable repair mechanism in pseudogenes, so their findings might be biased. Rather than comparing pseudogenes to their parent genes, they should compare chrY pseudogenes to autosomal pseudogenes. Even more informative would be to track MMEJ on the dot chromosome, which is known not to recombine and is highly heterochromatic like the Y chromosome.

      Significance

      While it is a benefit to have much improved Y chromosome assemblies from the three D. simulans clade species, the gap in knowledge this manuscript is trying to address is unclear. The manuscript is almost entirely descriptive and the figures are difficult to follow.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The manuscript describes a thorough investigation of the Y chromosomes of three very closely related Drosophila species (D. simulans, D. sechellia, and D. mauritiana), which in turn are closely related to D. melanogaster. The D. melanogaster Y was analysed in a previous paper by the same group. The authors found an astonishing level of structural rearrangement (gene order, copy number, etc.), especially given the short divergence time among the three species (~250 thousand years). They also suggest an explanation for this fast evolution: the Y chromosome is haploid, and hence double-strand breaks cannot be repaired by homologous recombination. Instead, they must be repaired by the less precise mechanisms of NHEJ and MMEJ. They also provide circumstantial evidence that MMEJ (which is very prone to generating large rearrangements) is the preferred mechanism of repair. As far as I know this hypothesis is new, and it fits nicely with the fast structural evolution described by the authors. Finally, the authors describe two intriguing Y-linked gene families in D. simulans (Lhk and CK2ßtes-Y), one of them similar to the Stellate / Suppressor of Stellate system of D. melanogaster, which seems to be evolving as part of an X-Y meiotic drive arms race. Overall, it is a very nice piece of work. I have four criticisms that, in my opinion, should be addressed before acceptance.

      The suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ) should be better supported and explained. At line 387, the authors stated "The pattern of excess large deletions is shared in the three D. simulans clade species Y chromosomes, but is not obvious in D. melanogaster (Fig 6B). However, because all D. melanogaster Y-linked indels in our analyses are from copies of a single pseudogene (CR43975), it is difficult to compare to the larger samples in the simulans clade species (duplicates from 16 genes)." Given that D. melanogaster has many Y-linked pseudogenes (described by the authors and by other researchers, and listed in Table S6), there seems to be no reason to use a sample size of 1 in this species. Furthermore, given that D. melanogaster is THE model organism, it is the species most likely to provide information to assess the "preferential MMEJ" hypothesis proposed by the authors.

      Still on the suggestion/conclusion that MMEJ is the preferential repair mechanism (over NHEJ): the Y chromosome is heterochromatic, haploid and non-recombining. In order to ascribe its mutational pattern to the haploid state (and the consequent impossibility of homologous recombination repair), the authors compared it to chromosome IV (the so-called "dot chromosome"). This may not be the best choice: while chr IV lacks recombination in wild type flies, it is not typical heterochromatin. E.g., " results from genetic analyses, genomic studies, and biochemical investigations have revealed the dot chromosome to be unique, having a mixture of characteristics of euchromatin and of constitutive heterochromatin". Riddle and Elgin, FlyBook 2018 (https://doi.org/10.1534/genetics.118.301146). Given this, it seems appropriate to also compare the Y-linked pseudogenes with those from typical heterochromatin. In Drosophila, these are the regions around the centromeres ("centric heterochromatin"). There are pseudogenes there; e.g., the gene rolled is known to have partially duplicated exons.

      In some passages of the ms there seems to be a confusion between new genes and pseudogenes, which should be corrected. For example, in line 261: "Most new Y-linked genes in D. melanogaster and the D. simulans clade have presumed functions in chromatin modification, cell division, and sexual reproduction (Table S7)". Who are these "new genes"? If they are those listed in Table S6 (as other passages of the text suggest), most if not all of them are pseudogenes. If they are pseudogenes, it is not appropriate to refer to them as "new genes". The same ambiguity is present in line 263: "Y-linked duplicates of genes with these functions may be selectively beneficial, but a duplication bias could also contribute to this enrichment (...)". Pseudogenes can be selectively beneficial, but only in very special cases (e.g., gene regulation). If the authors are suggesting this, they must state it openly and explain why. Pseudogenes are common in nearly all genomes, and should be clearly separated from genes (the latter as shorthand for functional genes). The bar for "genes" is much higher than simple sequence similarity, including expression, evidence of purifying selection, etc., as the authors themselves applied for the two gene families they identified in D. simulans (Lhk and CK2ßtes-Y).

      The authors center their analysis on "11 canonical Y-linked genes conserved across the melanogaster group". Why did they exclude the CG41561 gene, identified by Mahajan & Bachtrog (2017) in D. melanogaster? Given that most D. melanogaster Y-linked genes were acquired before the split from the D. simulans clade (Koerich et al Nature 2008), the same most likely is true for CG41561 (i.e., it would be Y-linked in the D. simulans clade). Indeed, computational analysis gave a strong signal of Y-linkage in D. yakuba (unpublished; I have not looked in the other species). If CG41561 is Y-linked in the simulans clade, it should be included in the present paper, since the only difference between it and the remaining "canonical genes" is that it was found later. Finally, the proper citation of the "11 canonical Y-linked genes" is Gepner and Hays PNAS 1993 and Carvalho, Koerich and Clark TIG 2009 (or the primary papers), instead of ref #55.

      Other points/comments/suggestions:

      a) Possible reference mistake: line 88 "For example, 20-40% of D. melanogaster Y-linked regulatory variation (YRV) comes from differences in ribosomal DNA (rDNA) copy numbers [52, 53]." reference #53 is a mouse study, not Drosophila.

      b) Possible reference mistake: line 208 "and the genes/introns that produce Y-loops differs among species [75]". ref #75 is a paper on the D. pseudoobscura Y. Is it what the authors intended?

      c) line 113. "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, including 58 exons missed in previous assemblies (Table S1; [55])." Please show in Table S1 which exons were missing in the previous assemblies. I guess that most if not all of these missing exons are duplicate exons (and many are likely to be pseudogenes). If they indeed are duplicate exons, the authors should make it clear in the main text, e.g., "We recovered all known exons of the 11 canonical Y-linked genes conserved across the melanogaster group, plus 58 duplicated exons missed in previous assemblies."

      d) line 116 "Based on the median male-to-female coverage [22], we assigned 13.7 to 18.9 Mb of Y-linked sequences per species with N50 ranging from 0.6 to 1.2 Mb." The method (or a very similar one) was developed by Hall et al BMC Genomics 2013, which should be cited in this context.

      e) line 118: "We evaluated our methods by comparing our assignments for every 10-kb window of assembled sequences to its known chromosomal location. Our assignments have 96, 98, and 99% sensitivity and 5, 0, and 3% false-positive rates in D. mauritiana, D. simulans, and D. sechellia, respectively (Table S2)." The procedure is unclear. Why break the contigs into 10-kb intervals, instead of treating each as a unit, assignable to Y, X or A? The latter is the usual procedure in computational identification of suspect Y-linked contigs (Carvalho and Clark Gen Res 2013; Hall et al BMC Genomics 2013). The only reason I can think of for analyzing the contigs piecewise is a suspicion of misassemblies. If this is the case, I think it is better to explain it.

      f) Fig. 1. It may be interesting to put a version of Fig 1 in the SI containing only the genes and the lines connecting them among species, so we can better see the inversions etc. (like the cover of Genetics, based on the paper by Schaeffer et al 2008).

      g) Table S6 (Y-linked pseudogenes). Several pseudogenes listed as new have been studied in detail before: vig2, Mocs2, Clbn, Bili (Carvalho et al PNAS 2015), Pka-R1, CG3618, Mst77F (Russell and Kaiser Genetics 1993; Krsticevic et al G3 2015). Note also that at least two are functional (the vig2 duplication and some Mst77 duplications).

      h) line 421: "one new satellite, (AAACAT)n, originated from a DM412B transposable element, which has three tandem copies of AAACAT in its long terminal repeats." The birth of satellites from TEs has been observed before, and should be cited here. Dias et al GBE 6: 1302-1313, 2014.

      i) Fig S2 shows that the coverage of PacBio reads is lower than expected on the Y chromosome. Any explanation? This has been noticed before in D. melanogaster, and tentatively attributed to the CsCl gradient used in the DNA purification (Carvalho et al GenRes 2016). However, it seems that the CsCl DNA purification method was not used in the simulans clade species (is this correct?). Please explain this in the ms or in the SI. The issue is relevant because PacBio sequencing is widely believed to be unbiased in relation to DNA sequence composition (e.g., Ross et al Genome Biol 2013).

      j) I may have missed it, but in which public repository have the assemblies been deposited?

      Significance

      see above.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      I found this an exceptionally impressive manuscript. The evolution of Y chromosomes has until recently been nearly impossible to study, and this research group have pioneered approaches that can yield reliable results in Drosophila. The study used an innovative heterochromatin-sensitive assembly pipeline on three D. simulans clade species, D. simulans, D. mauritiana and D. sechellia, which diverged less than 250 KYA, allowing comparisons with the group's previous results for the D. melanogaster Y.

      The study is both technically impressive and extremely interesting (a highly unusual combination). It includes a rich set of interesting results about these genome regions, and furthermore the results are discussed in a well-organised way, relating both to previous observations and to understanding of the genetics and evolution of Y chromosomes, illuminating all these aspects. It is a rare pleasure to read such a study. I believe that this study will inspire and be a model for future work on these chromosomes. It shows how these difficult genome regions can be studied.

      Major comments:

      The conclusions are convincing. The methods are explained unusually clearly, and the reasoning from the results is convincing. When appropriate, the caveats are clearly explained. The material is clearly organised and the questions studied are well related to the results. I had a few minor comments concerning the English. Even the figures (often a major obstacle to understanding) are very clear and helpful, with proper explanations. I have very rarely read such a good manuscript, and almost never (in a long career) found a manuscript that could be published without revision being necessary.

      The analysis found 58 exons missed in previous assemblies (as well as all previously known exons of the 11 canonical Y-linked genes, which are present in at least one copy across the group). FISH on mitotic chromosomes using probes for 12 Y-linked sequences was used to determine the centromere locations, and to determine gene orders and relate them to the cytological chromosome bands, demonstrating changes in satellite distribution, gene order, and centromere positions between the Y chromosomes of the D. simulans clade species. It also confirmed previous results for Y-linked ribosomal DNA (rDNA) genes, which are responsible for X-Y pairing in D. melanogaster males. Although 28S rDNA has been lost in D. simulans and D. sechellia (but not in D. mauritiana), the intergenic spacer (IGS) repeats between the rDNA units are retained on both sex chromosomes in all three species. Only sequencing can reliably reveal this, as their abundance is below the detection level of FISH in D. sechellia. The copy numbers of the 11 canonical Y-linked genes vary between the species, and some duplicates are expressed and have complete open reading frames, and may therefore be functional, but most include only a subset of exons, often with duplicated exons flanking the presumed functional gene copy. Mega-introns and Y-loops were found, as already seen in Drosophila species, but this new study detects turnovers in the ~2 million years separating D. melanogaster and the D. simulans clade. In total, 49 independent duplications onto the Y chromosome were detected, including 8 not previously detected. At least half show no expression in testes or lack open reading frames, so they are probably pseudogenes.
Testis-expressed genes may be especially likely to duplicate onto the Y chromosome due to its open chromatin structure and transcriptional activity during spermatogenesis, and indeed most of the new Y-linked genes in the clade studied have likely functions in chromatin modification, cell division, and sexual reproduction. The study discovered two new gene families that have undergone amplification on D. simulans clade Y chromosomes, reaching very high copy numbers (36-146). Both these families appear to be functional protein-coding genes and show high expression. The paper describes intriguing results that illuminate Y chromosome evolution. First, one of these families arose by an autosome-to-Y duplication of the sequence encoding the testis-specific isoform of the gene SR Protein Kinase (SRPK), after which the autosomal copy lost its testis-specific exon via a deletion. In D. melanogaster, SRPK is essential for both male and female reproduction, so the relocation of the testis-specific isoform to the Y chromosome in the D. simulans clade suggests that the change may have been advantageous by resolving sexual antagonism. The paper presents convincing evidence that the Y-linked copy evolved under positive selection, and that gene amplification may confer advantageous increased expression in males. The second amplified gene family is also potentially related to an interesting function. X-linked and Y-linked duplicates of a gene called Ssl, located on chromosome 2R, are both found. In D. simulans, the X-linked copies were previously known and called CK2ßtes-like. In D. melanogaster, degenerated Y-linked copies are also found, with little or no expression, contrasting with the complete open reading frames and high testis expression in the D. simulans clade species, consistent with the possibility of an arms race between sex chromosome meiotic drive factors.
Other interesting analyses document higher gene conversion rates than on the other chromosomes, and evidence that these Y chromosomes may differ in their DNA-repair mechanisms (preferentially using MMEJ instead of NHEJ), perhaps contributing to their high rates of intrachromosomal duplication and structural rearrangements. The authors relate this to evidence for turnover of Y-linked satellite sequences, with the discovery of five new Y-linked satellites, whose locations were validated using FISH. The study also documented enrichment of LTR retrotransposons on the D. simulans clade Y chromosomes relative to the rest of the genome, together with turnovers between the species.

      Significance

      As described above, the advances are both technical and conceptual for the field. The manuscript itself does an excellent job of placing the work in the context of the existing literature.

      • Anyone working on sex chromosomes and other non-recombining genome regions should be interested in the findings reported.

      • My field of expertise is the evolution of sex chromosomes, and the evolution of genome regions with suppressed recombination. I have experience of genomic analyses. I have less expertise in analyses of gene expression, but I understand enough about such approaches to evaluate the parts of this study that use them.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #4

      Evidence, reproducibility and clarity

      Summary

      Fusion and fission of the mitochondrial network is one of the hottest topics in mitochondrial biology in recent years. The process is obviously necessary to allow cells to control the quality of small individual organelles, which are degraded by autophagy or mitophagy if they are not working properly. Since a healthy mitochondrial network is essential for every cell in the body, the molecular players involved in these two processes are heavily investigated. In this paper, the authors investigate the role in vitro and in vivo of MTFP1, a protein that was named, and has been studied, because it seemed to be an important fission factor in cultured cells.

      Surprisingly and excitingly, the authors find that mitochondrial morphology and homeostasis are not affected by knocking out this protein in the mouse heart. Instead, they show that this protein is a critical regulator of mitochondrial inner membrane coupling via the adenine nucleotide translocase (ANT). Loss of MTFP1 leads to a decline in the mitochondrial membrane potential, leading to cell death, which finally results in dilated cardiomyopathy and early death of the animals. Therefore, this paper gives an important mitochondrial inner membrane protein a new role, which may become very important for understanding the opening of the large channel, the mitochondrial permeability transition pore (mPTP), which is responsible for some kinds of cell death in almost all cell types.

      Major comments:

      The conclusions are convincing. Additional experiments on the molecular nature of the interaction between MTFP1 and ANT could easily be proposed by a reviewer; however, this would open a completely new line of research and should not be requested at this stage. Data and methods are presented in a perfect way, typical for the Wai lab. Statistical analysis has been performed meticulously, and there is nothing to add here. I have read the paper very carefully, but cannot find many points which should be changed.

      Minor Points:

      I must admit I hate the title, but the authors are in good company using the "genetic argument", as many others do. Mitochondrial fission process controls energetic efficiency - that is correct, but it does not prevent inflammatory cardiomyopathy and heart failure in mice. It is intact mitochondria which prevent inflammatory cardiomyopathy and heart failure, and as long as we do not know what exactly MTFP1 does, this title is misleading, although it may be considered attractive for readers. I would reformulate that and mention the new role of this protein in coupling of the mitochondrial inner membrane potential, but I leave this to the authors, of course.

      P. 2, line 45: The loss of MTFP1 promotes ... (erase "the")

      P. 12, line 321: There is clearly no indication of mitochondrial elongation, but I do clearly see in these pictures a separation between the organelles in the mutant mice, in contrast to wild type, where mitochondria touch each other (Fig. 3c to d). If this is consistent, it should be mentioned. P. 12, line 324: I am not a true expert in fusion and fission, so wouldn't a blot showing all the OPA1 isoforms be necessary here?

      P. 13, line 341: The same argument is repeated in two consecutive sentences. I suggest writing here "Our data collectively indicate that MTFP1, unlike DRP1, is not an essential fission protein, contrary to its namesake, either in vitro or in vivo.".

      P. 13, line 349: "We sought to investigate..."

      Significance

      Understanding mitochondrial dynamics (fusion and fission) and bioenergetics (which some people have considered fully understood since the 1950s) is of utmost importance for biology and biomedicine. Since this paper gives a prominent protein, which the field believes to be a fission factor, a completely new role, it is a paper of high interest. As the authors state, these mice may help to understand the molecular function of the mitochondrial permeability transition pore (MPTP), which is still enigmatic but important in so many forms of cell death. The paper is therefore state of the art and at the frontline of cell biology, and the large mitochondrial community will be very interested to read it.

      I have been working on mitochondria for 35 years, starting with bioenergetics, switching then to mitochondrial biogenesis regulated by transcription of nuclear genes as well as the mitochondrial genome, followed by studying the consequences of mtDNA mutations, and now considering how mitochondrial dysfunction may be involved in the normal aging process. Therefore, I consider myself competent to critically judge the quality of this paper. I am not a molecular biologist; therefore, the molecular details of protein-protein interactions are not the focus of my interest. On the contrary, I feel that sometimes too much emphasis is laid on such molecular details, while the big question - in this case, how mitochondrial membrane potential is regulated - is not addressed at all.

      Referee Cross-commenting

      I guess we all suffer from reviewers of our own papers asking for more mechanistic insight. This paper unexpectedly shows a new role for MTFP1 - which is important for the mito community - and opens the door to more mechanistic studies on how it uncouples the mitos and leads to cell death via ANT and MPTP - which is important for a very broad community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Donnarumma et al. characterize cardiac-specific KO of Mitochondrial fission process 1 (MTFP1), a mysterious mitochondrial protein thought to be involved in mitochondrial inner membrane fission. They initially demonstrate that survival, cardiac function and respiration are diminished in the KO mouse and seek to find a mechanism. In MEF cells, surprisingly, they do not report any changes in fission, though mitochondrial morphology is altered. The authors then identify loss of MTFP1 as being damaging through exacerbation of cell death, possibly due to enhanced activation of the mitochondrial permeability transition. This is a beautiful and thorough paper. The data presented are of high quality and the conclusions are well supported by the figures. There was little to criticize in the manuscript!

      1. Although total mtDNA levels were no different, was there mtDNA release into the cytoplasm in Mtfp1 cKO? This is one possible mechanism to consider regarding the interferon response, as this would be a potent trigger for the innate immune response, as pointed out in the discussion and in PMC4409480.
      2. The authors show mitochondrial morphology in the pre-symptomatic period. What happens during DCM? Does this effect become exacerbated in the KO compared to WT?
      3. Given the cellular phenotypes seen in the ppif/Mtfp1 DKO cells, does this translate into a survival benefit in these animals? (If this data is easily available would recommend showing it, even if negative; but if entirely new crosses and 20-30 weeks of follow-up are required then it's fine to not address this question here).
      4. Methods (line 1281): It appears that only male mice were imaged from 10 to 34 weeks? Why only show one sex, especially as the authors note a difference in survival between males and females? Also, it is unclear why the data on female HF is relegated to the Supplement. This should be in the main manuscript side-by-side with the male data on the same scale to allow comparison of effect sizes on similar assays.

      Minor comments:
      5. Please change the title: "inflammatory cardiomyopathy" is a poorly defined term and would suggest myocarditis or inflammatory cell infiltrates, which are not shown in the manuscript. In addition, the only discussion of inflammation is through the innate immunity pathway in the RNA-seq data, with no real further follow-up.
      6. Line 39, Abstract: "ANT" needs to be in brackets/parenthesis
      7. Figure 1M: It would be good to see a higher magnification image showing fibrosis in the trichrome stain.
      8. Line 180, "gender" should more properly read "sex".
      9. At line 321, the authors state that there are no changes in mitochondrial elongation, however, Figure 3D seems to suggest that mitochondrial area is decreased in MKO cells. Is this an error or are the authors suggesting that the data in 3D is not significant? How was elongation measured?
      10. At line 335, the authors state that MTFP1 KO mitochondria were not protected from fragmentation, this is supported by the data in Figures 3G-H. However, to my eye, it appears that the mitochondria from the KO cells were far more fragmented in response to hydrogen peroxide. Is this data not significant?

      Significance

      This paper is novel in that it constitutes the first description of a mouse cardiac knockout of MTFP1, a poorly studied protein previously thought to be involved in mitochondrial fission. Previously MTFP1 has been described in knockdown cells (Aung et al. J Cell Mol Med. 2017 Dec; 21(12)) and the current paper builds upon this research. The current paper demonstrates that MTFP1 is important for cardiac function, but intriguingly, does not share the prior in vitro phenotypes related to mitochondrial fission, suggesting that it may have some other physiological function. Most of the methods shown are standard, though there are some quite novel machine learning-based analyses of imaging data. The paper is quite thorough and of relevance to a wide range of investigators interested in cardiac mitochondrial function, mitochondrial kinetics (fusion/fission), and cell death mechanisms more broadly. Our field of expertise is in cardiac mitochondrial function. The ML computational tools are very interesting, but these are not our expertise.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript entitled "Mitochondrial fission process 1 (MTFP1) controls bioenergetic efficiency and prevents inflammatory cardiomyopathy and heart failure in mice" by Donnarumma and collaborators investigates the role of MTFP1 in the heart in vivo. Mice with a cardiac-specific deletion of Mtfp1 were generated and fully characterized. Structure-function analyses were performed prior to and at the onset of the cardiomyopathy and show that homozygous Mtfp1 ko mice develop DCM progressing to heart failure and death in middle age, associated with increased fibrosis. RNAseq data revealed a severe impairment of metabolic processes including reduced oxidative phosphorylation, TCA cycle and mitochondrial gene expression. Mitochondrial respiration was significantly reduced in mitochondria isolated from Mtfp1 ko mice, while global mitochondrial proteins and activity of the Krebs cycle remained normal. Further assessment of a variety of processes revealed an increase in proton leak through ANT as the major contributor to the mitochondrial defects and cardiac dysfunction in Mtfp1 ko mice. Cardiomyocytes isolated from Mtfp1 ko mice were also more sensitive to stress-induced apoptosis and to mPTP opening. The major conclusion of the study is that contrary to previous reports documenting a role of MTFP1 in mitochondrial fission, MTFP1 does not regulate mitochondrial morphology but rather is essential for cardiac energy balance. This is substantiated by mass spectrometry experiments which identify mitochondrial proteins of the complex I/IV and proteins regulating mPTP as MTFP1 partners.

      Overall, this is an elegant study with an impressive amount of work performed in isolated mitochondria and in vivo before and at the onset of DCM. Results are important because they challenge previous findings that established a role of MTFP1 in mitochondrial fission and therefore reveal another function of MTFP1. To rigorously establish how MTFP1 regulates cardiac bioenergetics, additional experiments are needed and are listed below. In particular, the use of wildtype mice as control is concerning because some transgenic lines of Myh6-Cre+ develop DCM. Also, experiments addressing MTFP1 as an essential fission protein should be performed in adult ventricular myocytes isolated from Mtfp1 ko mice to show consistency with experiments performed in MEFs.

      Major comments:

      One major concern is with the control mice, which appear to be wildtype (Myh6-Cre+/+ Mtfp1 LoxP/LoxP). The proper control group should be Myh6-Cretg/+. This is important because some models of Myh6-Cre+ mice develop DCM including mitochondrial dysfunction (Buerger et al., J Card Failure 2006; Hall et al., Am J Physiol Heart Circ Physiol 2011). At a minimum, the most critical assays evaluating mitochondrial function should be performed using Myh6-Cre+ as control to verify that they do not develop pathological cardiac remodeling.

      The observation that Mtfp1ko mice show a complete loss of the protein by Western blot analysis is intriguing because it suggests that Mtfp1 is only expressed in ventricular myocytes and not in the other cells populating the heart. Can you please comment on this?

      Was Seahorse analysis from ventricular myocytes isolated from Mtfp1ko performed in parallel with the analysis in MEF and U2OS cells? This should be done to establish the cell specific defects observed in cardiac mitochondria lacking Mtfp1.

      Mitochondrial morphology under normal or stress conditions was assessed in MEFs, which have characteristics very distinct from those of primary cardiac cells. The experiment using oligomycin, rotenone and CCCP should be performed in ventricular myocytes isolated from Mtfp1 ko mice, to rigorously reach the conclusion that MTFP1 is not essential for mitochondrial fission.

      Related to that, is it possible that while total levels of mitochondrial fission and fusion proteins are similar in Mtfp1 ko and wt mice, their phosphorylated forms may be different?

      Figure 4: Cell death in Mtfp1 ko and control cardiomyocytes is measured using supervised ML-assisted high throughput live-cell imaging (Cretin et al., 2021). This result should be substantiated by additional apoptosis assays.

      Cell death assays are performed by treating cardiomyocytes isolated from Mtfp1 ko and wt mice with the cardiotoxic anthracycline doxorubicin (DOX). The DOX dose of 60 µM is extremely high. Can cell death be observed at lower concentrations of DOX?

      Minor comments:

      Line 349: there is a typo. Please replace "we sought investigate whether MTFP1 loss specifically..." with "we sought to investigate whether MTFP1 loss specifically..."

      Line 417: What the authors mean is that "the modest level of over-expression did not negatively impact cardiac function in vivo (Figure S5B-C)".

      Lines 490-500: this is a very long sentence. Please break it into 2 sentences to ease reading.

      Significance

      The role of MTFP1 has been investigated in isolated cells where conflicting results were reported in the literature. The in vivo role of MTFP1 in the heart is currently unknown. RNAseq and a panoply of approaches assessing mitochondrial structure and function, before symptomatic DCM occurs, provide important insights on early events causing the cardiomyopathy. This study is potentially conceptually innovative and could reveal a new role of MTFP1 in maintaining energy metabolism in the heart as well as in other organs.

      Referee Cross-commenting

      I have read the comments of the other 3 reviewers in detail. Like reviewers 3 and 4, I believe that the study is very well performed and provides new knowledge on the role of MTFP1 in cardiac energetics, assuming that the control mice do not develop DCM. I agree with the issues they identified. Regarding the issues raised by reviewer 1, especially concerning the lack of mechanistic insights, I actually thought that the full characterization of the Mtfp1 cko mouse model before and at the onset of the cardiomyopathy showing a strong cardiac phenotype, the RNAseq data showing alteration of metabolic genes, and the detailed experiments performed in isolated mitochondria and isolated cells, including rescue experiments, provide strong evidence that Mtfp1 regulates energy metabolism. That being said, I agree that direct causality could be better demonstrated by adding siRNA experiments to knock down Mtfp1 and see whether this recapitulates the adverse effects seen in Mtfp1 ko mice. This was attempted in MEFs and U2OS cells, which did not show the expected results. I would perform this experiment in cardiac cells, which are the relevant cell type for investigating underlying mechanisms. Adding causality experiments would strengthen the study even more.

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The study investigated the role of mitochondrial fission process 1 (MTFP1) on cardiac structure and function. MTFP1 deletion in the heart resulted in adult-onset dilated cardiomyopathy (DCM), reduced membrane potential, and increased non-phosphorylation-dependent respiration. MTFP1 deletion also increased the sensitivity to programmed cell death, which was accompanied by an opening of the mitochondrial permeability transition pore (mPTP) in vitro. Thus, the authors conclude that MTFP1 influences mitochondrial coupling and cell death sensitivity.

      I have the following concerns regarding the study and its main conclusions:

      Major concerns:

      1- While the study challenges previous reports regarding the role of MTFP1 in mitochondrial fission, the study is descriptive and does not provide any mechanistic insights delineating the impact of MTFP1 on cardiac energy metabolism and cell death.

      2- The significance of the RNA sequencing data is not clear, and the authors need to put these changes in context and explain how they fit into the study. It is also not clear why the authors chose to comment only on the changes in Nppa and Nppb levels.

      3- It is not clear how MTFP1 influences bioenergetic efficiency, and the authors do not provide any evidence to suggest that this might be the case.

      4- In Figure 2F, there is a decrease in the expression of ATP5A complex in the cMKO mitochondria, which could explain the changes in state 4 respiration and membrane potential. The authors need to delineate how MTFP1 could influence the activity of the ATP5 complex.

      5- In Figure S4, the authors should report the baseline measurements of LV function and structure pre-doxorubicin treatment to ensure no significant difference in these parameters occurred prior to the treatment protocol.

      6- How does MTFP1 modify PTP activity? More work is needed to characterize this effect.

      7- Co-immunoprecipitation data in figure S5 are confusing and have no clear significance. Therefore, the authors need to discuss the significance of these changes and how they might be relevant in the study context.

      Minor:

      • Line 222, "wholesale" > whole cell

      Significance

      Significance: This study challenges existing dogma, although the data are not strong enough to make this challenge convincing.

      Referee Cross-commenting

      I have read the comments of the other 3 reviewers, and I agree with their comments. This is an interesting study that, if adequately revised, would make an important contribution to the literature.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1: **Major comments:**

      The authors state that all the RNA and contaminating DNA was validated and verified with nanodrop and BioAnalyzer which is the correct and accepted approach. However, the following concerns arise with testing reaction efficiency and data analysis:

      Comment 1.

      For reaction efficiency, the standard curves for each reference gene and gene of interest target should be included in the supplemental data. A four-point standard curve is the bare minimum to assess reaction efficiency and raises concerns about the data quality. The unknown samples being tested should also be plotted on the corresponding standard curves to assess their efficiency.

      Response:

      We have indeed calculated primer efficiencies by serial dilution and performed a four-point standard curve wherever possible. In other cases, at least a three-point dilution curve was performed to assess primer efficiency. To obtain a more extensive range of Cq values in the standard curve, a ten-fold (1/10) serial dilution series was used, as indicated in the materials and methods section under the heading “Amplification Efficiencies”. This provided a range of 6.6 cycles (three-point dilution) and about 9.9 cycles (four-point dilution) for the primers tested. If the Cq values of the fourth dilution fell beyond the detection range of the machines (above 29 cycles) or close to the No-RT control, only the first three dilutions were taken into consideration. We have now included the standard curves for all the genes in the metadata/source data and updated the Figshare DOI. All sample Cq values were within these standard curves, as mentioned in the materials and methods section, and they have already been disclosed in the metadata files. The raw qPCR Cq output for all references and targets for both datasets can be retrieved from the data file on Figshare. Moreover, we will add a new sentence to the methods section clarifying the standard curve dilutions and data availability.
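The cycle spans quoted in this response follow from ten-fold dilutions: at 100% efficiency each dilution shifts Cq by log2(10) ≈ 3.32 cycles, so three points span about 6.64 cycles and four points about 9.97. The sketch below illustrates this together with the usual slope-based efficiency estimate, E = 10^(-1/slope) - 1; it is assumed, not confirmed by the source, that this matches the authors' own calculation, and the Cq values are hypothetical rather than the study's data.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(log10_input, cq):
    """Efficiency from a Cq vs log10(input) standard curve (1.0 means 100%)."""
    slope = fit_slope(log10_input, cq)
    return 10 ** (-1.0 / slope) - 1.0


def expected_cq_span(n_points, dilution_factor=10, efficiency=1.0):
    """Cq range covered by n_points of a serial dilution at a given efficiency."""
    return (n_points - 1) * math.log(dilution_factor) / math.log(1 + efficiency)


# Hypothetical four-point, ten-fold dilution series (illustrative values only)
log10_input = [0, -1, -2, -3]
cq = [18.0, 21.3, 24.7, 28.0]

E = amplification_efficiency(log10_input, cq)
print(f"efficiency = {E:.1%}, within 95-105%: {0.95 <= E <= 1.05}")
print(f"3-point span = {expected_cq_span(3):.2f} cycles")
print(f"4-point span = {expected_cq_span(4):.2f} cycles")
```

The 95-105% window in the check mirrors the acceptance range the authors state for primer validation elsewhere in this response.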

      Comment 2. The statement starting on line 510: "The WT experimental group was omitted from this analysis as it was used as the experimental calibrator for differential expression. The mean Fold Change of the WT group is always at 1 regardless of the gene/method in question and therefore it is redundant to test for statistical significance of the WT fold change levels across different methods for each gene." indicates that data analysis was not performed in a rigorous and generally accepted manner. Please check the analysis against that described in: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1

      The generally accepted methodology for relative, normalized qPCR data analysis is well described in Figure 5 of that article. qPCR statistical analysis should be performed on the log transformed expression results also well described in that paper.

      Response:

      We apologize for any lack of clarity on this line. We have always compared the Control group to the Test group when performing statistical analysis, as shown in Figure 3 and Figure 6. This is a fundamental point of any study and we have strictly adhered to it. The highlighted statement pertains to Supplementary Figures 1 & 2, where we compare the fold changes of the “Test” groups between qPCR and RNA-Seq in both datasets. Comparing the Control groups with one another between these methods is redundant, as the mean fold change of the control groups is always 1 because we are measuring relative expression. Thus, no meaningful statistical test can be performed on the control groups across RNA-Seq and qPCR, regardless of the method employed for testing.
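A minimal sketch of the 2^-ΔΔCt calculation (Livak and Schmittgen 2001) shows why the control group is fixed by construction: each sample is calibrated to the mean control ΔCt, so the control log2 fold changes average to exactly 0, which is a geometric-mean fold change of exactly 1. The Ct values and the single reference gene are hypothetical. Note that the arithmetic mean of the linear control fold changes can sit slightly above 1, which is one reason statistics are often run on log-transformed values.

```python
import math
from statistics import geometric_mean, mean


def delta_ct(target_ct, reference_ct):
    """Per-sample ΔCt = Ct(target) - Ct(reference)."""
    return [t - r for t, r in zip(target_ct, reference_ct)]


def fold_changes(sample_dct, control_dct):
    """2^-ΔΔCt per sample, using the mean control ΔCt as the calibrator."""
    calibrator = mean(control_dct)
    return [2 ** -(d - calibrator) for d in sample_dct]


# Hypothetical Ct values, three biological replicates per group
ctrl_dct = delta_ct([24.1, 24.4, 23.9], [18.0, 18.2, 17.9])
test_dct = delta_ct([22.0, 22.3, 21.8], [18.1, 18.0, 17.9])

fc_ctrl = fold_changes(ctrl_dct, ctrl_dct)
fc_test = fold_changes(test_dct, ctrl_dct)

# The control mean log2 fold change is 0 by construction (geometric mean = 1)
print("control mean log2 FC:", round(mean(math.log2(v) for v in fc_ctrl), 12))
print("control geometric-mean FC:", round(geometric_mean(fc_ctrl), 12))
print("control arithmetic-mean FC:", round(mean(fc_ctrl), 4))
print("test geometric-mean FC:", round(geometric_mean(fc_test), 3))
```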

      Furthermore, the use of the 2-ΔΔCt method for relative expression is in strict adherence to the initial papers describing this method (Livak and Schmittgen 2001, Schmittgen and Livak, 2008), which is again recapitulated in the article that you have cited. This can be seen in the metadata where the excel files that were used for calculating differential expression for all samples and datasets can be accessed. However, we would like to remark that we use more stringent criteria for primer validation (Efficiency between 95% and 105% as opposed to between 90% and 110% as mentioned in the paper). Moreover, the statistical testing and data representation prescribed in Figure 5 of the article that you have mentioned are not well founded for the following reasons:

      • We cannot perform parametric t-tests with low sample sizes. Furthermore, we cannot test for data normality using the few data points employed in standard qPCR assays. Thus, neither our qPCR assays nor the ones used in the mentioned article have enough samples to perform a t-test. Hence, we have used the non-parametric, rank-based Mann-Whitney test for testing statistical significance in our study, as it is more appropriate for such low sample sizes and distributions.
      • The article proposes data representation with the mean and SEM or 95% Confidence Intervals (CI). We would like to kindly remark that SEM and CI are sampling parameters that arise when we perform sampling of data points from a larger population. In our study, we have always shown all the data points (biological replicates) for each experimental group. Hence, we can only show the distribution around the mean with the standard deviations (SD) and not with SEM or CI. We have not performed any sampling whatsoever nor has the study mentioned by the reviewer.
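For groups as small as three or four biological replicates, the exact null distribution of the Mann-Whitney U statistic can be enumerated directly instead of approximated. The sketch below assumes no tied values and uses hypothetical fold changes; it is an illustration of the exact test, not the authors' analysis script.

```python
from itertools import combinations
from math import comb


def mann_whitney_u(x, y):
    """U for sample x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided Mann-Whitney p-value by enumerating every rank labelling.

    Assumes no tied values, so the null distribution is built from ranks 1..n.
    """
    n1, n = len(x), len(x) + len(y)
    centre = n1 * len(y) / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = 0
    for ranks in combinations(range(1, n + 1), n1):
        u = sum(ranks) - n1 * (n1 + 1) / 2  # U implied by this rank assignment
        if abs(u - centre) >= observed:
            extreme += 1
    return extreme / comb(n, n1)


# Hypothetical fold changes, three biological replicates per group
control = [1.0, 0.9, 1.1]
treated = [2.1, 2.5, 1.8]
print("U =", mann_whitney_u(treated, control))
print("exact two-sided p =", exact_two_sided_p(treated, control))
```

Because the null distribution has only C(n1 + n2, n1) equally likely arrangements, the smallest attainable two-sided p-value is 2/20 = 0.1 with three replicates per group and 2/70 ≈ 0.029 with four.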

        Comment 3. The authors used Normfinder to assess reference gene stability. Since Normfinder uses a particular algorithm for assessing stability, it is recommended to assess stability using a combination of these "stability calculators" including: GeNorm, NormFinder and BestKeeper. This is described in Table 1 of: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1. This will give a much more reliable perspective on the ranking of reference genes by their stability.

      Response:

      The method used in our study for reference gene validation is a combination of CV, NormFinder and statistical testing of raw expression profiles. In our previous study (Sundaram et al 2019), we categorically showed that using a combination of different existing methods such as geNorm, NormFinder and BestKeeper and comparing their ranks results in a suboptimal choice of reference genes. This is because geNorm ranks genes with similar expression patterns as stable even if they vary significantly among groups. BestKeeper calculates variation based on Cq values, which are exponents, while expression levels are calculated on the linear scale (2^-Cq). NormFinder stability scores are influenced by the presence of genes with significant overall variation. More evidence, backed up with data, can be found in our previous study, where we clearly showed that combining these methods and calculating an overall rank (as proposed in the article you mentioned) is not the best strategy. Hence, we devised the approach used in the present study, which has been previously validated and published (Sundaram et al 2019, PLoS ONE) and was designed taking into account the advantages and disadvantages of the different existing approaches.
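The point that Cq values are exponents can be illustrated by computing the coefficient of variation (CV) of candidate reference genes on both scales. This is a minimal sketch assuming 100% amplification efficiency, so relative quantity is taken as 2^-Cq; the gene names and Cq values are hypothetical, and it covers only the CV component, not NormFinder or the statistical testing that the authors' workflow also uses.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    return 100 * stdev(values) / mean(values)


def linear_expression(cq_values):
    """Relative quantity on the linear scale, assuming 100% efficiency: 2^-Cq."""
    return [2 ** -cq for cq in cq_values]


# Hypothetical Cq values for two candidate reference genes across six samples
candidates = {
    "GENE_A": [20.1, 20.3, 19.9, 20.2, 20.0, 20.1],
    "GENE_B": [22.0, 22.9, 21.4, 23.1, 21.8, 22.5],
}

for name, cqs in candidates.items():
    print(f"{name}: CV on Cq = {cv_percent(cqs):.1f}%, "
          f"CV on 2^-Cq = {cv_percent(linear_expression(cqs)):.1f}%")

# Rank candidates by linear-scale CV, most stable first
ranking = sorted(candidates, key=lambda g: cv_percent(linear_expression(candidates[g])))
print("most stable first:", ranking)
```

For GENE_B, a spread of 1.7 cycles looks like a CV under 3% on the Cq scale but corresponds to a CV of roughly 45% in linear expression.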

      Comment 4. Finally, since many currently studied targets for relative gene expression are expressed at low levels, it would be important to also examine three differentially expressed targets in the Cq range of 29 to 32. Yes, the variability will be higher, but these data will give a more realistic test of reference gene stability.

      Response:

      The target genes used in the study range from about 12 cycles to about 29 cycles (both datasets included; please refer to the source data/metadata). This falls well within the standard curves of all these genes, as mentioned earlier. The stability of the reference genes has been shown with absolute parameters such as the coefficient of variation and the NormFinder S scores (Tables 1, 2, 3 & 4). Although we are not opposed to adding more target genes, we fail to see how adding target genes with Cq values above 29 cycles would reflect on the stability of reference genes. The variability that would be observed is merely a reflection of the variability of target gene Cq values in the 29-32 range as they approach the detection limit of qPCR assays. The Cq values of the best reference genes would remain the same. Therefore, this exercise cannot test the “stability” of the reference genes but would only demonstrate the limit of qPCR detection (which is already well known). We would also like to remark that we used No-RT controls in our qPCR assays, which exhibit a signal (different dissociation peak) in this Cq range for some genes; this signal therefore does not arise from the cDNA. Hence, we do not consider values above 29 cycles to be reliable in our qPCR setup, and we switch to droplet digital PCR for such low-expressed genes in our studies.

      Reviewer #2: **Summary + Minor Comments** Reference gene selection is one of the most critical steps in gene expression analysis using qPCR. The authors compared data quality using references selected based on RNA-Seq or using a panel of often used reference genes. The manuscript is well prepared and easy to understand. Figures are nice and clear. I do not have major comments, but rather a few suggestions to make the manuscript more advanced. Since this is based on already available data, or a few more expression measurements could easily be added, I would suggest including the total RNA factor, some rRNA and mtRNA as potential references. It will be interesting to compare their stability and effect on the results of other targeted genes.

      In the discussion, the authors suggested that: "stable reference genes for qPCR data normalisation can be obtained from any random set of candidates provided the statistical approach of reference gene validation is sound and consistent". I do not think the word "random", used in many sentences, is appropriate. The panel of reference genes used in this study contains many known stable genes, and that does not look random to me. I would rephrase these sentences. Panels of reference genes (commercially available for human and mouse, and containing several of the genes used in this study) are usually composed of genes involved in various biological processes, to ensure that some of them will be stably expressed in experiments.

      Response:

      We understand the reviewer’s perspective on the use of the words “random reference genes”. We have replaced it with the words “conventional reference genes” throughout the manuscript.

      Regarding the addition of other RNA species as reference genes, we would like to clarify that we have used only protein coding transcripts (encoded by nuclear genes) as reference genes as all our target genes also belong to the same RNA category. This was done in accordance with the MIQE guidelines for qPCR data publication (Bustin et al 2009, DOI: 10.1373/clinchem.2008.112797) which states that rRNA should not be used for mRNA target gene normalization. This is because the vast majority of RNA from total RNA extraction is rRNA and only about 1% - 5% is mRNA. Thus, it is advisable to normalize mRNA targets with mRNA reference genes as it serves as a control for the extraction and RT PCR protocol. This argument can also be extended to other RNA species either in type or in origin (mtRNA). Regarding the total RNA factor, we have always used the same quantity of total RNA from all samples for RT-PCR as mentioned in the materials and methods section.

      Reviewer #3

      **Summary + Minor Comments**

      The aim of this study was to demonstrate that the statistical approach to determining the best reference genes from randomly selected "standard" reference genes might be more suitable than employing reference genes indicated by RNA-Seq.

      In a previous study they established a qPCR data normalization workflow after comparing several statistical approaches for the assessment of reference gene stability. In this study they apply this workflow to compare "random" reference genes with preselected reference genes based on RNA-Seq data. They test their hypothesis in two different experimental setups, varying sample material and methodology. After establishing the most "stable" reference genes, the suitability of these genes for normalization was tested by investigating their ability to normalize differential expression of target genes. These results were compared to one another and to fold changes computed from RNA-Seq. The results indicate that as stated in the title of the study, "RNA-Seq is not required to determine stable reference genes for qPCR normalization", since both approaches render similar results. Potential pitfalls when selecting genes from RNA-seq data are discussed and an integration of influencing factors is suggested.

      The key conclusions of the study are convincing and well supported by the experiments conducted, which are realistic in terms of time and resources. Data and methods are presented clearly and are reproducible. Experiments are adequately replicated and statistical analysis is adequate. The manuscript is well written, and the tables and figures provided are sound and support a better understanding of the presented results. Minor changes would be:

      Figures 1, 2, 3, 4, 5, 6: the panel letters are uppercase in the figures but lowercase in the figure legends; please make them consistent.

      p10 line 347: I understand what is meant with "using the NF as the reference gene"; however, stating again that the combined NF of the two most stable reference genes was used here would make it clearer.

      P11 line 355f: the first sentences here can be omitted, as they are already stated elsewhere.

      P30 line 777: The last sentence is not clear to me.

      Response:

      All minor concerns have been addressed in the revised manuscript as follows:

      1. Figures 1, 2, 3, 4, 5, 6: the panel letters are uppercase in the figures but lowercase in the figure legends; please make them consistent. – Has been modified
      2. p10 line 347: I understand what is meant, with "using the NF as the reference gene", however, stating again that the combined NF of the two most stable ref genes was used here, would make it clearer. – Has been modified
      3. P11 line 355f: the first sentences here can be omitted, as they are already stated elsewhere. – Have been removed
      4. P30 line 777: The last sentence is not clear to me.

        We wanted to say that our study addresses the biggest hurdle in performing reliable qPCR assays, namely the choice of good reference genes, and that this choice does not depend on RNA-Seq results. We have modified this sentence for better clarity.
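      For readers less familiar with the normalization factor (NF) discussed in the comment on p10 line 347, the sketch below shows one common way to build it, following the geNorm convention: convert Cq values to relative quantities and take the per-sample geometric mean across the chosen reference genes. It assumes a single shared amplification efficiency (100% by default, whereas real data often use gene-specific efficiencies); the gene names and Cq values are hypothetical, and this is not necessarily the exact procedure used in the manuscript.

```python
import math

def relative_quantities(cq_values, efficiency=2.0):
    """Relative quantity per sample, RQ = E ** (min(Cq) - Cq).

    efficiency is the amplification factor per cycle; 2.0 assumes
    perfect doubling (100% efficiency).
    """
    cq_min = min(cq_values)
    return [efficiency ** (cq_min - cq) for cq in cq_values]

def normalization_factor(ref_cqs, efficiency=2.0):
    """Per-sample NF: geometric mean of the reference genes' RQs.

    ref_cqs maps each reference gene name to its Cq values, one per sample,
    in the same sample order for every gene.
    """
    rq_lists = [relative_quantities(cqs, efficiency) for cqs in ref_cqs.values()]
    n_genes = len(rq_lists)
    return [math.prod(sample_rqs) ** (1.0 / n_genes) for sample_rqs in zip(*rq_lists)]

def normalized_expression(target_cqs, ref_cqs, efficiency=2.0):
    """Target RQ divided by the NF, sample by sample."""
    nf = normalization_factor(ref_cqs, efficiency)
    target_rq = relative_quantities(target_cqs, efficiency)
    return [rq / f for rq, f in zip(target_rq, nf)]

# Hypothetical Cq values for the two most stable reference genes in two samples.
refs = {"REF_A": [20.0, 21.0], "REF_B": [18.0, 19.0]}
print(normalized_expression([25.0, 25.0], refs))  # [1.0, 2.0]
```

      Using the geometric rather than the arithmetic mean keeps a single highly abundant reference gene from dominating the NF.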

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The aim of this study was to demonstrate that a statistical approach for selecting the best reference genes from randomly selected "standard" reference genes might be as effective as employing reference genes identified by RNA-Seq.

      In a previous study they established a qPCR data normalization workflow, after comparing several statistical approaches for the assessment of reference gene stability. In this study they apply this workflow to compare "random" reference genes with preselected reference genes based on RNA-Seq data. They test their hypothesis in two different experimental setups, varying sample material and methodology. After establishing the most "stable" reference genes, the suitability of these genes for normalization was put on trial by investigating their ability to normalize differential expression of target genes. These results were compared to one another and to fold changes computed from RNA-Seq. The results indicate that, as stated in the title of the study, "RNA-Seq is not required to determine stable reference genes for qPCR normalization", since both approaches render similar results. Potential pitfalls when selecting genes from RNA-Seq data are discussed and an integration of influencing factors is suggested.

      The key conclusions of the study are convincing and well supported by the experiments conducted, which are realistic in terms of time and resources. Data and methods are presented articulately and are reproducible. Experiments are adequately replicated and statistical analysis is adequate. The manuscript is well written, and the tables and figures provided are sound and support a better understanding of the presented results. Minor changes would be:

      Figures 1, 2, 3, 4, 5, 6: the panel letters are uppercase in the figures but lowercase in the figure legends; please make them consistent.

      p10 line 347: I understand what is meant with "using the NF as the reference gene"; however, stating again that the combined NF of the two most stable reference genes was used here would make it clearer.

      P11 line 355f: the first sentences here can be omitted, as they are already stated elsewhere.

      P30 line 777: The last sentence is not clear to me.

      Significance

      In recent years, the need for stable reference genes for the normalization of qPCR data has become more and more apparent, since it has been shown that selecting the most "popular" genes might not always lead to correct expression profiles: depending on the experimental setup, significant variation can occur. Numerous studies exist that validate potential reference genes, employing several well-established statistical approaches (geNorm, NormFinder, etc.) and, more recently, RNA-Seq data. RNA-Seq definitely requires more work and higher costs. Therefore, employing the "simpler" approach and obtaining the same results might be beneficial for scientists establishing a new qPCR protocol, in particular at a time when working cost-effectively is a prerequisite in most laboratories.

      The authors performed a thorough analysis of the two approaches compared in this study. By investigating two entirely different experimental setups with a similar outcome, they nicely substantiate their findings. Furthermore, by investigating differential expression of target genes for both experimental setups, they put their results to the test, convincingly corroborating their conclusions.

      This manuscript is well-written, experiments are thoroughly performed, the findings are convincing and it clearly is an important contribution for the scientific community.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Reference gene selection is one of the most critical steps in gene expression analysis using qPCR. The authors compared data quality using references selected based on RNA-Seq or using a panel of commonly used reference genes. The manuscript is well prepared and easy to understand. Figures are nice and clear. I do not have major comments, only a few suggestions to make the manuscript more advanced. Since this could rely on already available data, or a few more expression measurements could easily be added, I would suggest including the total RNA factor, some rRNAs and mtRNAs as potential references. It would be interesting to compare their stability and their effect on the results for the other target genes.

      In the Discussion, the authors suggest that "stable reference genes for qPCR data normalisation can be obtained from any random set of candidates provided the statistical approach of reference gene validation is sound and consistent". I do not think the word "random", used in many sentences, is appropriate. The panel of reference genes used in this study contains many known stable genes, which does not look random to me. I would rephrase these sentences. Panels of reference genes (commercially available for human and mouse, and containing several of the genes used in this study) are usually composed of genes involved in various biological processes, to ensure that some of them will be stably expressed in a given experiment.

      Significance

      Good reference gene selection is needed for most experiments in which the quantities and qualities of samples are not identical. Unfortunately, every experiment has its own stable and reliable reference genes. Validation can be time-consuming and expensive. RNA-Seq experiments covering a broad spectrum of biological samples are potentially a way to identify previously unknown stable genes faster, which could then be used for normalization in qPCR. The authors compared the effectiveness of reference genes selected based on RNA-Seq with that of a panel of potential reference genes. I like their comparison, but do not fully agree with the "random" selection.

      I am not aware of any other study comparing the quality of qPCR references from RNA-Seq or preselected genes. I think the manuscript will be appreciated by technically or methodologically oriented readers (gene expression area).

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      This article contrasts RNA-Seq and random selection for assessing reference genes for relative gene expression. The study was well conceived, with a solid experimental design.

      Major comments:

      The authors state that all the RNA, and the absence of contaminating DNA, was validated and verified with NanoDrop and Bioanalyzer, which is the correct and accepted approach. However, the following concerns arise regarding the testing of reaction efficiency and the data analysis:

      1. For reaction efficiency, the standard curves for each reference gene and each gene-of-interest target should be included in the supplemental data. A four-point standard curve is the bare minimum for assessing reaction efficiency, which raises concerns about data quality. The unknown samples being tested should also be plotted on the corresponding standard curves to assess their efficiency.
      2. The statement starting on line 510: "The WT experimental group was omitted from this analysis as it was used as the experimental calibrator for differential expression. The mean Fold Change of the WT group is always at 1 regardless of the gene/method in question and therefore it is redundant to test for statistical significance of the WT fold change levels across different methods for each gene." indicates that the data analysis was not performed in a rigorous and generally accepted manner. Please check the analysis against that described in: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1

      The generally accepted methodology for relative, normalized qPCR data analysis is well described in Figure 5 of that article. qPCR statistical analysis should be performed on the log-transformed expression results, as also well described in that paper.
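      The two analysis points above can be illustrated with a short sketch, assuming a simple ΔΔCq workflow: (1) amplification efficiency estimated from the slope of a standard curve, E = 10^(-1/slope) - 1, and (2) per-sample log2 fold changes relative to the mean of the WT calibrator, assuming 100% efficiency so that log2 FC = -ΔΔCq. On the log scale the individual WT samples scatter around 0 instead of all sitting at 1, so the calibrator group still carries variance that can enter a statistical test (the test itself is not shown). All data values are hypothetical.

```python
import math
import statistics

def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def amplification_efficiency(log10_input, cq):
    """Efficiency from a standard curve of Cq vs log10(input amount).

    A slope of about -3.32 corresponds to E = 1.0 (100%, perfect doubling).
    """
    slope, _ = linear_fit(log10_input, cq)
    return 10 ** (-1.0 / slope) - 1.0

def log2_fold_changes(delta_cq_samples, delta_cq_calibrator):
    """Per-sample log2 FC = mean dCq(calibrator) - dCq(sample), i.e. -ddCq.

    delta_cq = Cq(target) - Cq(reference or NF); assumes 100% efficiency.
    """
    calibrator_mean = statistics.fmean(delta_cq_calibrator)
    return [calibrator_mean - d for d in delta_cq_samples]

# Hypothetical five-point dilution series with perfect doubling.
dilutions = [0.0, 1.0, 2.0, 3.0, 4.0]
cqs = [30.0 - x / math.log10(2) for x in dilutions]
print(round(amplification_efficiency(dilutions, cqs), 3))  # 1.0

# Hypothetical dCq values: the WT calibrator compared against its own mean.
wt = [5.0, 5.5, 4.5]
print(log2_fold_changes(wt, wt))  # [0.0, -0.5, 0.5]
```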

      The authors used NormFinder to assess reference gene stability. Since NormFinder uses a particular algorithm for assessing stability, it is recommended to assess stability using a combination of these "stability calculators", including geNorm, NormFinder and BestKeeper. This is described in Table 1 of: https://www.cell.com/trends/biotechnology/fulltext/S0167-7799(18)30342-1. This will give a much more reliable perspective on the ranking of reference genes by their stability.
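      As a companion to the suggestion to combine stability calculators, the sketch below computes the geNorm stability measure M in a single pass: for each gene j, M_j is the mean, over all other genes k, of the standard deviation of log2(RQ_j / RQ_k) across samples, and a lower M means more stable expression. It omits geNorm's stepwise exclusion of the least stable gene, NormFinder and BestKeeper use different models that are not shown here, and the input quantities are hypothetical.

```python
import math
import statistics

def genorm_m(rq):
    """Single-pass geNorm M value for each gene.

    rq maps gene name -> relative quantities (one per sample, same order).
    Lower M means more stable expression across samples.
    """
    genes = list(rq)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(rq[j], rq[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))  # sample SD
        m_values[j] = statistics.fmean(pairwise_sd)
    return m_values

# Hypothetical data: G1 and G2 co-vary perfectly; G3 does not co-vary with them.
quantities = {"G1": [1.0, 2.0, 4.0], "G2": [2.0, 4.0, 8.0], "G3": [1.0, 1.0, 1.0]}
print(genorm_m(quantities))  # {'G1': 0.5, 'G2': 0.5, 'G3': 1.0}
```

      Because M rewards genes whose ratios to the other candidates are constant, co-regulated genes can look stable together; this is one reason the comment recommends cross-checking with other calculators.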

      Finally, since many currently studied targets for relative gene expression are expressed at low levels, it would be important to also examine three differentially expressed targets in the Cq range of 29 to 32. Yes, the variability will be higher, but these data will give a more realistic test of reference gene stability.

      Significance

      This article will be useful for all labs conducting gene expression experiments. It also uncovers additional contrasts between qPCR and RNA-Seq, which are helpful in choosing the appropriate technology for given experiments.

      Referee Cross-commenting

      I agree with the other reviewers' comments.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Hello, we wrote our review before seeing that you have special formatting requirements. We're just going to post our review in its entirety rather than rewrite it based on these suggestions. It encompasses the above content; it's just not formatted in the suggested order. We hope that's OK! **Full review:** This manuscript makes a strong case for the evolvability of multicellular size via selection for settling rate in the Ichthyosporea. The use of an experimental evolution framework to assess the evolvability of multicellular phenotypes, using sedimentation rate as a selective pressure, extends the previous work of others into a new domain within the holozoans, among the closest living relatives of animals. The natural, ecological significance of selection for sedimentation rate is a novel idea, and the connection between sedimentation rate and multicellular evolution in natural, as opposed to contrived experimental, circumstances is an interesting one. The results are striking and well supported, with laboratory evolution rapidly adjusting both the cellular composition and the multicellular phenotypes of the organisms involved, in ways that are well explained. This is an important result that moves the laboratory study of the evolution of multicellularity forward into a different branch of the tree of life, showing its broad applicability. Sequencing of evolved lines adds significantly to the completeness of the story. While the causal role of these mutations in producing the observed multicellular phenotypes is not demonstrated via manipulation or breeding, this is quite understandable in light of the unusual model organism and the observed homologies and roles of the genes involved.

      While this is largely clear from a reading, we believe the manuscript would benefit from a brief analysis of the numerical enrichment of genes with homologs involved in cytokinesis, cell membrane composition, and cell cycle control, relative to the null hypothesis of genes picked randomly from the genome. If this is beyond the scope of this research in an unusual model organism with many poorly annotated genes, then a slightly expanded verbal discussion of the potential roles of these genes' apparent functions in the evolution of multicellular clumping would be an appropriate substitute. We wholeheartedly recommend the publication of this manuscript with a number of minor revisions, which, while not affecting the main conclusions or points of the manuscript, will clarify important points, correct small errors, and point the reader to relevant literature and concepts.

      ANSWER: We would like to heartily thank the reviewers for their appreciation of our work.
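      The enrichment analysis suggested in this review is commonly framed as a one-sided hypergeometric test: given N annotated genes, K of which belong to a category of interest (for example, cytokinesis homologs), how likely is it that n genes drawn at random contain at least k from that category? The sketch below is a minimal standard-library version; the counts in the example are hypothetical, and in a genome with many poorly annotated genes K itself is uncertain.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(N=population, K=successes, n=draws)."""
    k = max(k, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms vanish automatically.
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

def expected_overlap(population, successes, draws):
    """Expected number of category genes among `draws` random genes."""
    return draws * successes / population

# Hypothetical: 10,000 genes, 200 in the category, 40 mutated, 5 of them in it.
print(expected_overlap(10_000, 200, 40))  # 0.8
p = hypergeom_sf(5, 10_000, 200, 40)
```

      When several categories are tested at once, the resulting p-values would normally be corrected for multiple testing.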

      **Major points:** none. **Minor points:** Line 79 - is sedimentation rate really invariably associated with multicellularization? Active swimming would seem to prevent this.

      ANSWER: We meant to refer to the fact that all published examples of the emergence of multicellularity from unicellular ancestors have been accompanied by an increased sedimentation rate. Active swimming alone would just increase the diffusion rate of cells and would not counteract the effects of increased size and density; such an active mechanism would also require directionality away from the tendency to sediment. A more passive mechanism, in which a genetic variant or cell cycle transition simultaneously decreases density while increasing cell size, leaving the net sedimentation rate the same as the ancestor's, is conceivable but has not been observed in the literature. We changed the text from “invariably” to “frequently” at line 80 to emphasize that this is an empirical observation.

      Line 164 - the precise phenotype in the evolution experiment being referred to is unclear without further context, with the ordering of paragraphs possibly needing a little work.

      ANSWER: We tightened and merged the two paragraphs; the sentence containing “this phenotype” was removed.

      Line 178 - is sorting them into three classes informative? Are there different mutations associated with these, or is it just visual clumping on the number line? Perhaps it is not a useful classification, but the existence of great variation is an important point to get across. A more useful classification might distinguish those that increase sedimentation through large density changes from those that do so exclusively by clumping.

      ANSWER: We agree with this argument and ultimately decided to remove the visual classification. We revised the text and figures accordingly.

      Line 254 - excess cellular density is referred to interchangeably with density, when these are very different figures. This continues in line 269, and in the figure legends of Figure 4.

      ANSWER: We fixed this.
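      The distinction between excess density and density can be made explicit with Stokes' law for a small sphere settling at low Reynolds number, v = 2 Δρ g r² / (9 μ), where Δρ = ρ_cell - ρ_medium is the excess density and μ the dynamic viscosity of the medium. The sketch below inverts this relation to estimate Δρ from a measured sedimentation velocity and radius, and only then adds the medium density to obtain an absolute density. It assumes spherical, non-interacting objects in SI units (clumps are not spheres), it is not necessarily the estimator described in the manuscript's Methods, and the example values are hypothetical.

```python
G = 9.81  # gravitational acceleration, m s^-2

def stokes_velocity(excess_density, radius, viscosity):
    """Terminal settling velocity (m/s) of a sphere under Stokes' law."""
    return 2.0 * excess_density * G * radius ** 2 / (9.0 * viscosity)

def excess_density_from_velocity(velocity, radius, viscosity):
    """Invert Stokes' law: excess density (kg/m^3) over the medium."""
    return 9.0 * viscosity * velocity / (2.0 * G * radius ** 2)

def absolute_density(velocity, radius, viscosity, medium_density):
    """Object density = medium density + excess density."""
    return medium_density + excess_density_from_velocity(velocity, radius, viscosity)

# Hypothetical object: radius 20 um, in a medium of viscosity 1.0e-3 Pa s
# and density 1025 kg/m^3, settling at 4.36e-5 m/s.
delta_rho = excess_density_from_velocity(4.36e-5, 20e-6, 1.0e-3)
rho = absolute_density(4.36e-5, 20e-6, 1.0e-3, 1025.0)
print(round(delta_rho, 1), round(rho, 1))  # 50.0 1075.0
```

      A 2% change in absolute density (1075 vs 1054 kg/m³, say) corresponds to a much larger relative change in excess density, which is why the two quantities should not be used interchangeably.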

      Line 341 - the role of the RCC1 homolog in other organisms could be expanded on in slightly more detail. Similarly, for the other mutations in this section known to affect cytokinesis, potential mechanisms by which they might affect clumping could be discussed, especially given the cell membrane results in the figures.

      ANSWER: We share the reviewer’s enthusiasm about some of these mutations. We, however, try to be very conservative about what each gene or protein could be doing. Indeed, the absence of genetic tools does not allow us to directly test the effect of each mutation. We added a couple of extra sentences about RCC1 as well as about cytokinetic proteins and their potential role in clumping phenotypes.

      Line 387 - awkward formatting or sentence structure, with dashes and commas.

      ANSWER: We fixed the sentence structure.

      Line 395 - this cellular process, or this evolutionary process of selection for faster settling?

      ANSWER: We revised this appropriately.

      Line 408 - per unit volume

      ANSWER: Fixed.

      Line 425 - the idea of clumpiness as ancestral is quickly put forward and dismissed within a single sentence. This could be explored in slightly more detail as an option, before concluding that what is clear is that the phenotype is easy to change.

      ANSWER: We agree that it would be interesting to pursue the ecological role and distribution of clumping and cell cycle phenotypes in other ichthyosporean species. We could propose alternative scenarios for which trait came or went first and test them by calculating the correlation of the presence or absence of the trait with the branch lengths and branching patterns of the phylogenetic trees we built from genome sequences. However, for our dataset, this would remain a fragile correlation based on five data points. We do not feel such speculation would be helpful for the text.

      However, because two reviewers made suggestions in this direction, we expanded the Discussion and annotated the tips of the species tree in Figure 5 with the traits of interest. The result shows that S. gastrica, S. tapetis and S. nootkatensis exhibit clumpiness as a trait. However, the data are not sufficient to resolve whether the traits are “derived” or “ancestral”.

      Line 437 - sedimentation as a highly variable trait, or a highly evolvable trait?

      ANSWER: Evolvable trait. We fixed it in the text.

      Figure 1G, 1H: We are fairly certain that the logarithmic scales of DNA content and coenocyte volume are mislabeled. The scale labeled log2 in the legend of 1G increases by factors of 2 rather than in unit steps. The axis is obviously logarithmic, and the log2 in the legend is superfluous and misleading. Similarly, in 1H a scale labeled as log10 goes from 1 to 30, which, read as log10 values, would correspond to a sphere approximately 100 kilometers wide. The numbers can remain, but the legend should remove the log10.

      ANSWER: Fixed. It is indeed a log scale. We made sure to remove the confusing log2 and log10 from the figure and legend.

      **General:** Were there any head-to-head competitions performed? We are not suggesting you need to, but it's a nice way to directly examine the fitness consequences of multicellularity, and it is commonly done in the field. If you have done this, it wasn't clear to us.

      ANSWER: We now include a previously performed fitness experiment in which the clumpy S01 and S03 lines were competed head-to-head against the Ancestor (AN). The results are shown in Figure 2E and Figure 2 – figure supplement 1D. They indicate that the fast-sedimenting clumpy phenotype is highly advantageous under our experimental evolution selection procedure, but deleterious in the absence of selection.
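      For context on the head-to-head competition, one conventional way to summarize such an experiment is a selection coefficient per generation, s = [ln(x_t / y_t) - ln(x_0 / y_0)] / t, where x and y are the counts (or frequencies) of the evolved line and the ancestor at the start and after t generations; s > 0 means the evolved line gains ground. This is a generic sketch, not necessarily the metric used for Figure 2E, and the counts below are hypothetical.

```python
import math

def selection_coefficient(evolved_start, ancestor_start, evolved_end, ancestor_end, generations):
    """Per-generation selection coefficient of the evolved line vs the ancestor."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    ratio_start = evolved_start / ancestor_start
    ratio_end = evolved_end / ancestor_end
    return (math.log(ratio_end) - math.log(ratio_start)) / generations

# Hypothetical 1:1 starts: with settling selection the evolved line rises to
# 80%, without selection it falls to 40%, both over 2 generations.
s_selected = selection_coefficient(50, 50, 80, 20, 2)
s_unselected = selection_coefficient(50, 50, 40, 60, 2)
print(round(s_selected, 3), round(s_unselected, 3))  # 0.693 -0.203
```

      Working with the log-ratio makes s independent of the starting mix, so competitions started at different ratios can be compared.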

      Reviewer #1 (Significance (Required)): see the above comments about writing the review before realizing there were specific formatting suggestions. I hope you understand us not wanting to re-write the review having already written it once.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): The present work adds to the growing literature on sedimentation rate as a major player in the evolution of multicellularity. Through rigorous experimentation, the authors convincingly show that they can select for increased sedimentation rate, and they identify two mechanisms underlying this increase: incomplete cellular separation leading to multicellular groups, and increases in cellular density. They also show surprising natural variation in sedimentation and argue that, along with similar evidence from other organisms, their findings cement the likely major role of sedimentation, and they go further by revealing the tight genetic control it is under.

      Reviewer #3 (Significance (Required)): This is a very significant study because it illuminates processes and underlying mechanisms that could have played a major role in the transition to multicellularity. The results will likely greatly influence conceptual and theoretical thinking and will foster additional empirical directions. My only quibble with the manuscript is that I wished for a bit more ecological context and grounding of the main findings: in that respect, both the abstract and the last paragraph of the discussion leave me wanting and occasionally puzzled. If maintaining buoyancy is such a strong selective pressure, and the variation in sedimentation rate is such a challenge to it, then explaining a bit more exactly why sedimentation would evolve, why so much variation would exist, etc., would be really helpful to the more naive reader. Just a bit further elaboration on selective pressures (even presumed or speculative ones) would help put the picture together.

      ANSWER: We would like to thank Reviewer #3 for his/her comments. We do believe that extensive ecological context is highly relevant. Throughout the manuscript, we strove to be conservative in the way we describe both our model system and its experimental and natural settings, perhaps to a fault, but we now offer an evolutionary model that tries to shed light on the phenotypic evolution of the various species through different routes (Fig. 5H). To elaborate on the rationale behind this strategy, we offer the following aspects:

      1. we are investigating a sizeable, but still very limited, number of six Sphaeroforma species. Therefore, we feel that explaining which trait may be considered ancestral based on the known species tree is speculative (we revised our Discussion in this regard and updated Figure 5A).
      2. our knowledge about the ecological niches of Sphaeroforma species is limited. We avoid extensive speculation; while inferring the potential ecological context is part of the scope of this study, we relied on an experimental approach to tackle our questions, rather than on ecological observation or computational modeling.
      3. throughout the text we aimed to avoid taking a strong stance on the “adaptiveness” of the traits we are measuring. This is because, depending on the model specification and parameters, ecological models could be built either for or against the idea that the cellular traits of size and density, and their effects on the higher-level trait of sedimentation rate, might be adaptive “in the wild”.

      We hope that future studies will be able to tackle the open questions on the ecology of ichthyosporeans, hopefully benefitting from the evolutionary insights inferred in this study.

      **A more minor point:** I remember seeing a talk by Will Ratcliff a while back in which he showed that in S. cerevisiae they also see the two mechanisms of increased sedimentation: increased cellular size and clumping. Yet, I didn't see a reference to that work in the context of the cell density mechanism discussion and wondered why.

      ANSWER: We believe we have cited the relevant papers from the Ratcliff lab. To be clear, we observed two separate physical mechanisms for fast sedimentation:


      1. by cell-clumping (increasing size),
      2. by increasing the number of nuclei per unit volume (increasing density).

      To our knowledge, the first mechanism was indeed observed in snowflake yeast (for which we referenced all relevant studies), whereas the second, which we believe might be specific to multinucleated cells, has not been measured, although it is a conceivable variable that mutations could affect in the organisms from these studies. We added a new model figure (Figure 5H) to hopefully get this message across better.


      Reviewer #4 (Evidence, reproducibility and clarity (Required)): In this study Dudin et al. explored the variability of sedimentation rates in members of the Sphaeroforma genus and found that sedimentation rates are very variable between different isolates as well as during the life cycle of each isolate. Following this observation, Dudin et al. evolved S. arctica under a regime favoring fast-settling objects. After a few hundred generations they observed that most lineages increased their sedimentation rate. Characterization of some of these evolved populations suggests two distinct mechanisms allowing fast sedimentation: cluster formation by non-separation of cells after cellularization, and an increase in object density. By sequencing the evolved lines, Dudin et al. were able to identify several mutations that have been under positive selection, some of which relate to mechanisms involved in cell separation and cellularization.

      ANSWER: We dearly thank Reviewer #4 for his/her time and efforts.

      **Major comments:**

      • Line 143, I don't understand how figure 1G shows that "nuclear division cycles were periodic...".

      ANSWER: From previously published results (Ondracka et al. 2018 & Dudin et al. 2021), we know that nuclear divisions in S. arctica are strictly synchronized and occur at defined time intervals. As can be seen in Figure 1G, DNA content doubles at a constant interval of about 9 hrs. This phenomenon is also clearly depicted in Figure 4F and Figure S4H. Together with the results shown in Figure 1F, these results demonstrate that division cycles are still periodic in our experimental setting and do not occur asynchronously, as no odd number of nuclei per cell was observed.
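      The roughly 9 h doubling interval described above can be read off a DNA-content time course by regressing log2(DNA content) on time: the slope is the number of doublings per hour and its reciprocal is the average doubling interval. The sketch assumes population-averaged measurements; because S. arctica nuclear divisions are synchronous, real data rise in steps, so a regression recovers only the mean interval. Times and values below are hypothetical.

```python
import math
import statistics

def doubling_interval(times_h, dna_content):
    """Mean doubling interval (h) from a least-squares fit of log2(DNA) vs time."""
    log2_dna = [math.log2(v) for v in dna_content]
    mt, my = statistics.fmean(times_h), statistics.fmean(log2_dna)
    stt = sum((t - mt) ** 2 for t in times_h)
    sty = sum((t - mt) * (y - my) for t, y in zip(times_h, log2_dna))
    doublings_per_hour = sty / stt
    return 1.0 / doublings_per_hour

# Hypothetical relative DNA content sampled every 9 h.
print(round(doubling_interval([0, 9, 18, 27], [1, 2, 4, 8]), 2))  # 9.0
```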

      • When characterizing the evolved lines, the authors display (and measure?) the size and the sedimentation rate separately, but don't directly compare them. If the statement that density plays a role in the sedimentation rate of S4 and S9 but not S1 is correct, then the correlation between size and sedimentation should be similar between AN and S1 and changed in S4 and S9. It would be nice to see these relationships and correlations.

      ANSWER: We do indeed measure the size and the sedimentation rate of each fast-settling mutant separately. This is shown in Figure 1C, where sedimentation rate is plotted against cell size for our dataset and the older Smayda (1973) data. Further, both measurements feed directly into the estimation of cellular density in Figures 4C and S4D (explained extensively in the Methods). The cellular density estimates show the correlations and relationships between S1 and AN, as well as between S4 and S9.
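      The relationship requested in this comment can be summarized by fitting log(v) against log(r) for each line. Under Stokes' law with a constant excess density, v ∝ r², so the expected slope is 2; a line whose larger objects are also denser would give a steeper slope, looser clumps a shallower one, and a uniform density difference between lines would shift the intercept rather than the slope. The sketch below assumes spherical objects and uses hypothetical data; it is one possible way to present the comparison, not the authors' analysis.

```python
import math
import statistics

def loglog_fit(radii, velocities):
    """Slope and Pearson r of log(velocity) vs log(radius)."""
    lx = [math.log(r) for r in radii]
    ly = [math.log(v) for v in velocities]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Hypothetical line with constant excess density: v = c * r**2.
radii = [10e-6, 15e-6, 20e-6, 30e-6]
velocities = [1.1e3 * r ** 2 for r in radii]
slope, r = loglog_fit(radii, velocities)
print(round(slope, 3), round(r, 3))  # 2.0 1.0
```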

      • Line 288: "surviving 780 generations of passaging for all 10 isolates" what data is this referring to?

      ANSWER: This refers to lab cultures of fast-settling mutants grown over tens of passages without any selection. These cultures maintained their clumping phenotypes even without constant selection, suggesting that the phenotypes are due to genetic changes. We are unsure how else to answer Reviewer #4, as these are the data we are referring to. We have, however, changed “surviving” to “persisting for” and hope this clarifies the sentence.

      • The weakest aspect of the paper is that there is neither a statistical argument (with a single anecdotal exception), from seeing the same genes or pathways mutated in parallel experiments, nor an experimental reconstruction that argues that any of the observed mutations were selected, as opposed to being neutral mutations that hitch-hiked with adaptive mutations. One strongly suspects that some of the observed mutations were selected, but from the available data it is impossible to know which were selected and which were hitch-hiking.

      ANSWER: We agree that our draft did not elaborate in depth on whether mutations were drivers or passengers, a point also raised by another reviewer. To be fair, however, there are several important considerations.

      First, and most importantly, we offer an unprecedented look into the genetic underpinnings of this novel model organism and demonstrate highly parallel phenotypic evolution in response to selection. The molecular genetic signal reflects this finding, with a skewed dN/dS ratio > 1. While the precise molecular changes are not as easy to interpret, molecular parallelism at the level of genes is not a prerequisite for directional selection in replicate lineages, especially given the complex genomic architecture of S. arctica.

      Second, although we did not emphasize this much, the results of our bioinformatic analyses are quite unique. We are dealing with a non-standard model organism with a highly intriguing placement in the tree of life, but with a large genome of >140 Mbp. This is 1-2 orders of magnitude larger than the genomes of other single-celled model systems used in evolution experiments, including E. coli or S. cerevisiae. Unlike the latter two, this organism’s genome contains extensive intergenic and intronic sequence, as well as a high amount of (simple sequence) duplication. Hence, analyzing the resequencing data was a major effort, and identifying the mutations took an extensive amount of time.

      Third, there are currently no genetic tools that would allow us to perform either molecular genetics or crosses with S. arctica. This will change in the future, and when it does, our comprehensive list of target genes will hopefully be valuable to the field and beyond.
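      For readers outside molecular evolution, the dN/dS ratio mentioned above compares nonsynonymous substitutions per nonsynonymous site with synonymous substitutions per synonymous site; values above 1 are usually read as a signature of positive selection. The sketch below computes the uncorrected ratio (pN/pS) with an optional Jukes-Cantor correction, which changes little at the low divergence typical of evolution experiments. Site counts would normally come from a counting method such as Nei-Gojobori; the numbers here are hypothetical, and this is not necessarily the pipeline used in the manuscript.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites, correct=False):
    """dN/dS from substitution and site counts.

    Returns math.inf when no synonymous substitutions were observed.
    """
    pn = nonsyn_subs / nonsyn_sites
    ps = syn_subs / syn_sites
    if correct:
        pn, ps = jukes_cantor(pn), jukes_cantor(ps)
    if ps == 0:
        return math.inf
    return pn / ps

# Hypothetical counts: 30 nonsynonymous changes over 750 nonsynonymous sites,
# 5 synonymous changes over 250 synonymous sites.
print(dn_ds(30, 750, 5, 250))  # 2.0
```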

      • Even if the authors knew which mutations were selected, it is not possible to say whether these mutations are directly advantageous in the settling regime; they could instead be due to adaptation to lab conditions, higher temperatures, etc. A control evolution experiment with no settling selection would be required to conclude that the mutants were selected for faster sedimentation.

      ANSWER: We agree that a “no-selection” control experiment would have been helpful for the molecular interpretation. However, the clumping phenotype has never been observed over many generations of passaging in any of the labs culturing these organisms, at different temperatures (we made sure to specify this in the text). As such, we argue that any adaptation to laboratory conditions must have happened before we conducted our selection experiment. Given that the molecular signals were unique (with one exception), we have reason to believe that the highly controlled nature of the experiment, with a constant environment throughout, at least did not bias the molecular signals toward extensive genetic parallelism.


      **Minor comments:**

      • Line 164, the authors write "this phenotype", it is unclear what phenotype is referred to as.

      ANSWER: Fixed.

      • Line 187: the authors use the word "radius" in the text, while using "perimeter" in the figure.

      ANSWER: Fixed.

      • Line 224: Is the use of the expression "incomplete detachment between daughter and mother cell" appropriate given that all cells emerge from a multinucleated cell?

      ANSWER: Fixed – “incomplete detachment between cells.”

      • Line 151, typo, the "with" should be removed.

      ANSWER: We believe the reviewer wanted to point out the “with” in line 251, which we fixed.

      • The intro about changes in ecology is nice but does not make sense given the rest of the paper, I would add it to the discussion.

      ANSWER: We beg to differ with Reviewer #4 here, as the water-column distribution of plankton in marine environments is one of the key aspects of our paper and a critical parameter in models of water-body ecology.

      • Line 399 "increase their cell size by increasing cell-cell adhesion post-cellularization" the first use of "cell" is misleading because the objects are now a collection of cells rather than a single cell.

      ANSWER: Fixed.

      Reviewer #4 (Significance (Required)): Most of the findings made in this study have been obtained in previous studies done with more genetically tractable organisms, however this is the first time that such experimental evolution was made on a unicellular non-model system organism closely related to animals. The significance of the work is reduced by the failure to produce evidence to answer two critical questions about the observed mutations: 1) were they selected during the experiment or did they hitch-hike with other selected mutations, and 2) if they were selected, were they selected because they led to faster sedimentation or some other aspect of the conditions in which they were passaged. It would take serious effort to perform additional experiments to address these questions and thus the authors are likely to be better off explaining that their work is unable to answer the questions and thus they are speculating about both the causality of the mutants and the nature of the advantage they conferred.


      ANSWER: We beg to differ with the reviewer’s argument.

      We believe that our study demonstrates heritable phenotypic changes for an evolvable, ecologically relevant trait, and their tight cellular regulation. We identify and carefully quantify how two cellular growth phenotypes, the nuclear division rate and cell size control, can vary heritably and independently of one another, and together directly shape variation in a critical ecological parameter of a marine organism. Therefore, in addition to the fact that the work was performed in an emerging model marine organism, this work provides fundamental “novel” insight into cellular trait evolution more generally.

      Our results do not depend upon knowing the exact genetic mutations or molecular mechanisms which have caused these phenotypic changes. Nor, as the reviewer implies, do we claim to have identified particular mutations that were selected, or their effects on particular cellular phenotypes. We do, however, provide a large amount of evidence that the changes are likely genetic. With our sequencing effort, we find a strong, statistically significant, molecular signal of adaptation in the lineages (dN/dS > 1), and we publish a curated list of affected genes which are potentially causative for the phenotypes we observe.

      Because we did not observe frequently recurrent mutations, unlike most directed-evolution studies (including those on cancer and antimicrobial resistance), our results suggest that there is a large mutational target size affecting the phenotype of interest, reflecting its potentially broad genetic and molecular control mechanisms. We view this as a great strength of the study, and consider the result in and of itself “novel”. Furthermore, we now use a statistical genetic approach to quantify the heritability of the traits, that is, the proportion of the phenotypic variance that is due to an individual’s inherited state (Figure 1 – figure supplement 1A). The results show that heritability exceeds 95% across phenotypes, and across the entire dataset H exceeded 99% of the total phenotypic variance (ANOVA F = 1118 on 252 and 735 DF, p = 0). This means that for a typical individual genotype in a given environment, we could predict its average phenotypic measurement with >97% accuracy.
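      For readers unfamiliar with the approach, one standard way to obtain a broad-sense heritability and an F statistic of this kind is a one-way ANOVA with genotype (lineage) as the factor, converting the mean squares into variance components. The sketch below assumes a balanced design (the same number of replicate measurements per lineage) measured in one shared environment; the function name is hypothetical, and the authors' actual model and software may differ.

```python
from statistics import mean


def broad_sense_heritability(groups):
    """Estimate H^2 = var_G / (var_G + var_E) from a balanced one-way ANOVA.

    `groups` maps each genotype (lineage) to its replicate measurements taken
    in a common environment. Assumes every genotype has the same number n >= 2
    of replicates (balanced design). Returns (H^2, F statistic).
    """
    values = list(groups.values())
    k = len(values)        # number of genotypes
    n = len(values[0])     # replicates per genotype
    if any(len(v) != n for v in values):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(x for v in values for x in v)
    # Between- and within-genotype sums of squares
    ss_between = n * sum((mean(v) - grand) ** 2 for v in values)
    ss_within = sum((x - mean(v)) ** 2 for v in values for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    f_stat = ms_between / ms_within
    # Expected mean squares: E[MS_between] = var_E + n * var_G; clamp negatives to 0
    var_g = max((ms_between - ms_within) / n, 0.0)
    var_e = ms_within
    return var_g / (var_g + var_e), f_stat
```

      With three well-separated lineages measured in triplicate, almost all of the variance is between lineages, so H^2 is close to 1; if all lineages share the same mean, the genetic component is estimated as zero.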

      The fact that we do not conclusively identify which particular mutations are causative does not obviate the overwhelming evidence that heritable changes occurred in our samples, leading to repeated phenotypic convergence affecting the trait of sedimentation rate. We believe these phenotypic changes, and our quantification of their magnitude, to be a “novel” and “significant” contribution to the literature on cellular trait evolution, ecology, and multicellularity.





    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this study, Dudin et al. explored the variability of sedimentation rates in members of the Sphaeroforma genus and found that sedimentation rates vary greatly between isolates as well as during the life cycle of each isolate. Following this observation, Dudin et al. evolved S. arctica under a regime favoring fast-settling objects. After a few hundred generations, they observed that most lineages had increased their sedimentation rate. Characterization of some of these evolved populations suggests two distinct mechanisms allowing fast sedimentation: cluster formation through non-separation of cells post-cellularization, and an increase in object density. By sequencing the evolved lines, Dudin et al. were able to identify several mutations that have been under positive selection, some of which relate to mechanisms involved in cell separation and cellularization.

      Major comments:

      • Line 143, I don't understand how figure 1G shows that "nuclear division cycles were periodic...".
      • When characterizing the evolved lines, the authors display (and measure?) the size and the sedimentation rate separately, but don't directly compare them. If density plays a role in the sedimentation rate of S4 and S9 but not of S1, then the correlation between size and sedimentation should be similar between AN and S1 and should change in S4 and S9. It would be nice to see these relationships and correlations.
      • Line 288: "surviving 780 generations of passaging for all 10 isolates" what data is this referring to?
      • The weakest aspect of the paper is that there is neither a statistical argument (with a single anecdotal exception), based on seeing the same genes or pathways mutated in parallel experiments, nor an experimental reconstruction that argues that any of the observed mutations were selected rather than being neutral mutations that hitch-hiked with adaptive mutations. One strongly suspects that some of the observed mutations were selected, but from the available data it is impossible to know which were selected and which were hitch-hiking.
      • Even if the authors knew which mutations were selected, it is not possible to say if the mutations that have been selected are directly advantageous in the settling regime, they could be due to adaptation to lab conditions and higher temperatures, etc. Having a control evolution experiment with no settling selection would be required to reach the conclusion that the mutants were selected for faster sedimentation.

      Minor comments:

      • Line 164, the authors write "this phenotype", it is unclear what phenotype is referred to as.
      • Line 187: the authors use the word "radius" in the text, while using "perimeter" in the figure.
      • Line 224: Is the use of the expression "incomplete detachment between daughter and mother cell" appropriate given that all cells emerge from a multinucleated cell?
      • Line 151, typo, the "with" should be removed.
      • The intro about changes in ecology is nice but does not make sense given the rest of the paper, I would add it to the discussion.
      • Line 399 "increase their cell size by increasing cell-cell adhesion post-cellularization" the first use of "cell" is misleading because the objects are now a collection of cells rather than a single cell.

      Significance

      Most of the findings made in this study have been obtained in previous studies done with more genetically tractable organisms, however this is the first time that such experimental evolution was made on a unicellular non-model system organism closely related to animals. The significance of the work is reduced by the failure to produce evidence to answer two critical questions about the observed mutations: 1) were they selected during the experiment or did they hitch-hike with other selected mutations, and 2) if they were selected, were they selected because they led to faster sedimentation or some other aspect of the conditions in which they were passaged. It would take serious effort to perform additional experiments to address these questions and thus the authors are likely to be better off explaining that their work is unable to answer the questions and thus they are speculating about both the causality of the mutants and the nature of the advantage they conferred.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The present work adds to the growing literature on sedimentation rate as a major player in the evolution of multicellularity. Via rigorous experimentation, the authors convincingly show that they can select for increased sedimentation rate and identify two mechanisms underlying this increase: incomplete cellular separation leading to multicellular groups, and increases in cellular density. They also show surprising natural variation in sedimentation and argue that, along with similar evidence from other organisms, their findings cement the likely major role of sedimentation and go further by revealing the tight genetic control that it is under.

      Significance

      This is a very significant study because it illuminates processes and underlying mechanisms that could have played a major role in the transition to multicellularity. Their results will likely greatly influence conceptual and theoretical thinking and will foster additional empirical directions. My only quibble with the manuscript is that I wished for a bit more ecological context and grounding of the main findings: in that respect, both the abstract and the last paragraph of the discussion leave me wanting and occasionally puzzled. If maintaining buoyancy is such a strong selective pressure and the variation in sedimentation rate is such a challenge to it, then I think explaining a bit more precisely why sedimentation would evolve, why so much variation would exist, etc., would be really helpful to the more naive reader. Just a bit further elaboration on selective pressures (even presumed ones, and even if speculative) would be helpful to put the picture together.

      A more minor point:

      I remember seeing a talk by Will Ratcliff a while back in which he showed that in S. cerevisiae they also see the two mechanisms of increased sedimentation: increased cellular size and clumping. Yet, I didn't see a reference to that work in the context of the cell density mechanism discussion and wondered why.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Hello, we wrote our review before seeing that you have special formatting requirements. We're just going to post our review in its entirety rather than rewrite it based on these suggestions. It encompasses the above content; it's just not formatted in the suggested order. We hope that's OK!

      Full review:

      This manuscript makes a strong case for the evolvability of multicellular size via selection for settling rate in the Ichthyosporea. The use of an experimental evolution framework to assess the evolvability of multicellular phenotypes, using sedimentation rate as a selective pressure, extends the previous work of others into a new domain within Holozoa, among the closest living relatives of animals. The natural, ecological significance of selection for sedimentation rate is a novel idea, and the connection between sedimentation rate and multicellular evolution in natural as opposed to contrived experimental circumstances is an interesting idea. The results are striking and well supported, with laboratory evolution rapidly adjusting both the cellular composition and the multicellular phenotypes of the organisms involved in ways that are well explained. This is an important result that brings the laboratory study of the evolution of multicellularity forward, into a different branch of the tree of life and showing its broad applicability.

      Sequencing of evolved lines adds significantly to the completeness of the story. While the causal role of these mutations in the production of the observed multicellular phenotypes is not demonstrated via manipulation or breeding, this is quite understandable in light of the unusual model organism and the observed homologies and roles of the genes involved. While this is largely clear from a reading, we believe the manuscript would benefit from a brief analysis of the numerical enrichment of genes with homologs involved in cytokinesis, cell membrane composition, and cell cycle control relative to the null hypothesis of genes picked randomly from the genome. If this is beyond the scope of this research in an unusual model organism with many poorly annotated genes, then a slightly expanded verbal discussion of the potential roles of the apparent functions of these genes in the evolution of multicellular clumping would be an appropriate substitute.
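      One simple way to carry out the enrichment analysis suggested above is a one-sided hypergeometric test: if the mutated genes were drawn at random from the genome, how likely is it to see at least the observed number of them in an annotated category (e.g. cytokinesis homologs)? The sketch below is an illustration with a hypothetical function name; it is not an analysis from the manuscript, and all counts would have to come from the authors' annotation. With SciPy, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, annotated, hits, hits_annotated):
    """One-sided p-value P(X >= hits_annotated) for category enrichment.

    genome_size:    total number of genes considered (M)
    annotated:      genes in the category of interest (n)
    hits:           mutated genes, treated as drawn without replacement (N)
    hits_annotated: mutated genes that fall in the category (k)
    """
    upper = min(hits, annotated)
    total = comb(genome_size, hits)
    # Sum the hypergeometric probabilities of k, k+1, ..., upper category hits
    tail = sum(comb(annotated, k) * comb(genome_size - annotated, hits - k)
               for k in range(hits_annotated, upper + 1))
    return tail / total
```

      Because several functional categories would typically be tested, the resulting p-values would also need a multiple-testing correction.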

      We wholeheartedly recommend the publication of this manuscript with a number of minor revisions, which while not affecting the main conclusions or points of the manuscript will clarify important points, adjust small errors, and point the reader at relevant literature and concepts.

      Major points:

      none.

      Minor points:

      Line 79 - is sedimentation rate really invariably associated with multicellularization? Active swimming would seem to prevent this.

      Line 164 - the precise phenotype in the evolution experiment being referred to is unclear without further context, with the ordering of paragraphs possibly needing a little work.

      Line 178 - is sorting them into three classes informative? Are there different mutations associated with these, or is it just visual clumping on the number line? Perhaps not a useful classification, but the existence of great variation is an important point to get across. A more useful classification might be those that increase sedimentation with large density changes versus exclusively by clumping.

      Line 254 - excess cellular density is referred to interchangeably with density, when these are very different figures. This continues in line 269, and in the figure legends of Figure 4.

      Line 341 - the role of the RCC1 homolog in other organisms could be expanded on in slightly more detail. Similarly, for other mutations in this same section known to affect cytokinesis, potential mechanisms for affecting clumping could be commented upon, especially given the cell membrane results in the figures.

      Line 387 - awkward formatting or sentence structure, with dashes and commas.

      Line 395 - this cellular process, or this evolutionary process of selection for faster settling?

      Line 408 - per unit volume

      Line 425 - the idea of clumpiness as ancestral is quickly put forward and dismissed within a single sentence. This could be explored in slightly more detail as an option, before concluding that what is clear is that the phenotype is easy to change.

      Line 437 - sedimentation as a highly variable trait, or a highly evolvable trait?

      Figure 1G, 1H: We are fairly certain that the logarithmic scale of DNA content and coenocyte volume are mislabeled. The scale that is labeled log2 in 1G in the legend goes up by factors of 2 rather than single digits. The axis is obviously logarithmic, and the log2 in the legend is superfluous and misleading. Similarly, in 1H a scale labeled as log10 goes from 1 to 30, which on a logarithmic scale would be a sphere approximately 100 kilometers wide. The numbers can remain, but the legend should remove the log10.

      General:

      Were there any head-to-head competitions performed? Not suggesting you need to, but it's a nice way to directly examine the fitness consequences of multicellularity, and is commonly done in the field. If you have done this, it wasn't clear to us.

      Significance

      See the above comments about writing the review before realizing there were specific formatting suggestions. I hope you understand us not wanting to re-write the review having already written it once.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      General Statements [optional]

      We thank the reviewers for their thoughtful, constructive, and highly actionable critique. The reviewers mentioned that “the experiments presented are well-designed, the methods well-implemented, and communication of the authors' findings is clear and concise”. We are happy to hear that “figure presentation and manuscript layout are top notch and... these data are easy to read and interpret”.

      We appreciate reviewers’ suggestions in improving the interpretability of the morphodynamic representation and address each of the Reviewers’ comments (typeset in blue) in the document below.

      Description of the planned revisions


      Reviewer # 1 (major points)

      * The Trajectory Feature Vectors (TFVs) are averaged over time - this seems to lose a lot of the salient information in the trajectories themselves, resulting in the low(ish) accuracy of the GMM. Could a Hidden Markov Model trained on the trajectories in state space help to identify/classify those trajectories that change their morphology/motion over time?

      Thanks for the suggestion. We did recognize that averaging will smooth the dynamics in each cell trajectory and reduce diversity of phenotypes. On the other hand, the temporal smoothing serves to reduce the noise, especially when the cells have reached steady state dynamics after being stimulated with pro- or anti-inflammatory cytokines. Our experiments were constructed to probe steady state dynamics and therefore we opted to use temporal smoothing.

      It is possible to identify rare transitions even with some temporal smoothing.

      In our analysis of rare transitions (Fig. 4C), we extracted long trajectories and split them into segments (10-15 frames, 1.5-2 hours). By applying the Gaussian Mixture Model (GMM) to each segment, we identified a sequence of states along the full trajectory, from which state transitions were identified.
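      The segment-based procedure described above can be sketched as follows. This is an illustration, not the authors' code: so that the example runs without dependencies, the fitted GMM is replaced by a hypothetical nearest-centroid `assign_state` function. With scikit-learn, a fitted `GaussianMixture` could be plugged in as `lambda v: int(gmm.predict([v])[0])`. Using non-overlapping segments and dropping a trailing partial segment are also assumptions.

```python
def segment_means(traj, seg_len):
    """Average per-frame feature vectors over consecutive, non-overlapping
    segments of `seg_len` frames. A trailing partial segment is dropped."""
    means = []
    for start in range(0, len(traj) - seg_len + 1, seg_len):
        seg = traj[start:start + seg_len]
        means.append([sum(col) / seg_len for col in zip(*seg)])
    return means


def nearest_centroid(centroids):
    """Stand-in for a fitted GMM: label a vector by its closest centroid."""
    def assign(v):
        return min(range(len(centroids)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])))
    return assign


def find_transitions(traj, seg_len, assign_state):
    """Return the per-segment state sequence and a list of
    (segment_index, previous_state, new_state) for every change of state."""
    states = [assign_state(v) for v in segment_means(traj, seg_len)]
    transitions = [(i, states[i - 1], states[i])
                   for i in range(1, len(states)) if states[i] != states[i - 1]]
    return states, transitions
```

      Averaging within a segment suppresses frame-level noise, but a transition that lasts only a few frames can be absorbed into its segment's mean, which is the resolution limit the reviewer points to.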

      During the revision, we will employ the Hidden Markov Model (HMM) to model state transitions in the latent shape space as suggested by the reviewer to detect rare transitions. Our expectation is that HMM will be able to identify more transition events due to its higher time resolution (frame instead of segment), though it may also be affected by unexpected imaging artifacts and noise.
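      For comparison, frame-level decoding with an HMM can be illustrated with the Viterbi algorithm. The sketch below rests on simplifying assumptions: one-dimensional features, Gaussian emissions, and fixed, hand-chosen parameters. In practice the emission and transition parameters would be fit to the latent-space trajectories (e.g. with Baum-Welch), and all names here are hypothetical.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a 1-D normal distribution."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi(obs, means, sigmas, log_start, log_trans):
    """Most likely hidden-state path for a 1-D observation sequence under a
    Gaussian-emission HMM. log_trans[i][j] = log P(state j at t+1 | state i at t)."""
    n_states = len(means)
    # score[j]: best log-probability of any path ending in state j at the current frame
    score = [log_start[j] + gaussian_logpdf(obs[0], means[j], sigmas[j])
             for j in range(n_states)]
    backptr = []  # backptr[t-1][j]: best predecessor of state j at frame t
    for x in obs[1:]:
        prev_best, new_score = [], []
        for j in range(n_states):
            i_best = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            prev_best.append(i_best)
            new_score.append(score[i_best] + log_trans[i_best][j]
                             + gaussian_logpdf(x, means[j], sigmas[j]))
        backptr.append(prev_best)
        score = new_score
    # Trace back from the best final state
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for prev_best in reversed(backptr):
        state = prev_best[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    stay, switch = math.log(0.9), math.log(0.1)
    obs = [0.1, -0.2, 0.0, 3.1, 2.9, 3.2, 0.05]
    path = viterbi(obs, means=[0.0, 3.0], sigmas=[0.5, 0.5],
                   log_start=[math.log(0.5)] * 2,
                   log_trans=[[stay, switch], [switch, stay]])
    print(path)  # [0, 0, 0, 1, 1, 1, 0]
```

      Even with a "sticky" transition matrix, the single final frame that clearly matches state 0 is decoded as a switch. Lowering the self-transition probability makes the decoder switch more readily, which increases sensitivity to short transitions but also to imaging noise.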

      Reviewer # 1 (minor points)

      Could the authors provide some example images showing interpolation of each PC using the generative decoder?

      Thanks for the suggestion. However, the discrete nature of the latent codebook of the VQ-VAE makes it challenging to use interpolation as a proxy for the usefulness of the learned representation. A possible link between interpolation ability and the usefulness of representations learned by autoencoders has been explored by Berthelot et al. As Berthelot et al. note, “We perform interpolation in the VQ-VAE by interpolating continuous latents, mapping them to their nearest codebook entries, and decoding the result. Assuming a sufficiently large codebook, a semantically “smooth” interpolation may be possible. On the lines task, we found that this procedure produced poor interpolations. Ultimately, many entries of the codebook were mapped to unrealistic datapoints, and the interpolations resembled those of the baseline autoencoder.”

      Reviewer # 2 (major points)

      -It's unclear what the effect of speed is on the final state determination. TFVs were composed of auto-encoder-based features (PCs from latent space) and speed of the cells. Would the states be very different without speed as part of the TFVs or with TFVs consisting only of speed features? Please quantify and discuss.

      Thanks for your comment. We agree that speed of the cell is a main factor that contributes to the clustering, though shape features (from VQ-VAE) do contribute (Fig. 3B, histograms) to discrimination of cell states. In the revision, we will perform the clustering analysis with only shape features and compare with current results of Fig. 4.

      Reviewer # 3 (major points)

      1. Temporal consistency regularization

      In the authors' framework, models are regularized to minimize the l2 norm between embeddings of adjacent timepoints.

      This approach is conceptually well-motivated, but could have some unintended effects.

      For instance, some cells may make a rapid state transition such that state(t-1) = A, state(t) = B, state(t+1) = A'.

      In these cases, a regularized model may best minimize the joint loss by returning an embedding at time t that interpolates between state A and A', rather than returning an embedding that reflects the true distinct state B.

      The work would be strengthened if the authors analyzed the impact of this regularization term on the detection of rapid state transitions that occur for only a few frames (e.g. when cells that exhibit filopodial motility "jump" in an actin/myosin contraction).

      This might be accomplished through experiments scanning different regularization hyperparameters on some of the authors' real data, fitting models on temporally downsampled versions of the real data where "slow" multi-timestep transitions now occur in a few timesteps, or perhaps using simulations where rapid state transitions are known to occur.

      Even if the regularization does have some negative impacts, it does not argue against the utility of the general approach, but it is important for users to understand the constraints on downstream applications.

      In our revision, we will evaluate the optimal matching loss for our dataset by training the model with a series of temporal matching loss weights. With this computational experiment, we will illustrate the trade-offs introduced by the relative strengths of matching and reconstruction losses.

      Our expectation is that with a very high matching loss weight, the embeddings (latent vectors) of the frames of the same trajectory will collapse regardless of morphology. For a relatively wide range of matching loss weights, rank relations between transition pairs ([A->B] + [B->A'] >> [A->A']) should be preserved, from which rare transitions can be robustly identified. In our experiments, most cells had reached steady-state morphodynamics when imaged, i.e., the matching loss between two adjacent frames arises primarily from variations in background/noise. Fast transitions are “rare” in our data. Numerically, fast transitions contribute little to the matching loss during training, and therefore the matching term does not strongly pull their latent representations together. In other words, if B is a morphologically different state from A/A', the model is driven more by the reconstruction loss, owing to the morphological difference, than by temporal smoothness across the three consecutive frames.
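      The trade-off can be made concrete with a deliberately simplified quadratic model. This is an assumption for illustration, not the VQ-VAE objective, which has a nonlinear decoder and a quantized codebook. Suppose the reconstruction term for frame t pulls its embedding z toward the embedding b that best reconstructs state B, while the matching term with weight λ pulls z toward the fixed neighbouring embeddings a (state A) and a' (state A'). Minimizing ‖z − b‖² + λ(‖z − a‖² + ‖z − a'‖²) gives z* = (b + λ(a + a'))/(1 + 2λ), so the transient state is preserved for small λ and collapses toward the mean of its neighbours as λ grows. The function name is hypothetical.

```python
def regularized_embedding(a, b, a_prime, lam):
    """Closed-form minimizer of ||z - b||^2 + lam * (||z - a||^2 + ||z - a_prime||^2).

    Toy model only: b stands in for the embedding favoured by the reconstruction
    loss at frame t; a and a_prime are the fixed embeddings of frames t-1 and t+1.
    """
    return [(bi + lam * (ai + ci)) / (1 + 2 * lam) for ai, bi, ci in zip(a, b, a_prime)]


if __name__ == "__main__":
    a, a_prime = [0.0, 0.0], [0.0, 0.0]  # neighbouring frames in state A / A'
    b = [1.0, 0.0]                       # transient state B
    for lam in (0.0, 0.1, 1.0, 10.0):
        z = regularized_embedding(a, b, a_prime, lam)
        # Fraction of the A-to-B separation that survives: 1 / (1 + 2 * lam)
        print(f"lam={lam:>5}: z={z}, retained={z[0]:.3f}")
```

      In this toy setting the transient state keeps a fraction 1/(1 + 2λ) of its separation from A, so scanning λ, as proposed for the revision, maps out how much of a short-lived transition survives regularization.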

      2. Baseline comparisons

      The authors evaluate their method by assessing the correlation of embedding PCs with heuristic features (Fig. 2C,D + supp.), variation of embedding PCs across cell treatment groups (Fig. 3), and qualitative interpretation of embedding trajectories.

      In the supplement, the authors compare their VQ-VAE approach to VAEs and AAEs and chose to use a VQ-VAE based on lower reconstruction error and higher PC/heuristic feature correlation.

      However, the authors do not compare their method to much simpler baseline approaches to this problem.

      Existing literature suggests that heuristic features of cell shape and motion (similar to those the authors use to evaluate the relevance of their embeddings) are sufficient to perform many of the same tasks a VQ-VAE is used for in this work.

      For instance, in Fig. 3 it appears that a simple analysis of cell centroid speed recovers much of same information as the complex VQ-VAE embeddings.

      In Fig. 2 - Supp. 6, it appears that after regressing out many heuristic features of cell geometry, the latent space largely explains cell non-autonomous information about the background environment, suggesting the heuristic features are largely sufficient.

      To demonstrate the usefulness of their deep modeling approach relative to simple baselines, the authors should compare against existing heuristics and embeddings of heuristics (e.g. PCA) using some of the tasks shown for the VQ-VAE (recovery of perturbation state, state transition detection, qualitative trajectory analysis, discrimination of cell types).

      Heuristics might include those already calculated here, or a more comprehensive set as cited in the Introduction.

      The authors may also consider comparing against baselines that do not include time information for some of their tasks (e.g. recovery of perturbation state could arguably be achieved with CNNs that are either ignorant of the timestep or use simple temporal conditioning, without trajectory information).

      If these features are sufficient for many of the same tasks performed in this work, the authors should provide a clear argument for readers as to why the unsupervised VQ-VAE approach may be preferable (e.g. ability to recover potentially unknown cell changes, for which no heuristic exists).

      The VQ-VAE doesn't need to be superior along every axis to hold merit, but the work would be strengthened if the authors could show clear superiority along some dimension.

      Thanks for your comments. We agree that through our exploration, specific heuristic features are found to be correlated with latent shape features. We did not start with heuristic features, but instead identified them after observing how cell morphology changes along the principal components of the latent shape space. Discovering the heuristic shape features that describe the variation in shape space, in our view, reinforces the value of self-supervised learning of complex cellular morphologies.

      We’d argue that the dynamorph pipeline complements heuristic approaches: it enables discovery of cell states through unbiased encoding and clustering, and the correlation of learned features with heuristic features enables interpretation of the cell state/data distribution more quantitatively than using either approach in isolation. Our argument is further reinforced by the related work (e.g., Zaritsky et al. and others mentioned in the introduction) on self-supervised learning of cell shape and interpretation of its latent space.

      More specifically, self-supervised learning with temporal matching generates unbiased and smooth encodings for cell morphologies, from which we identified the rank correlations between top PCs and certain geometric properties. However, this does not indicate that the set of heuristics chosen a priori will be equally descriptive of the shape distribution. For example, optical density of cells (phase) is a heuristic feature that has not been used in previous studies, which we recognized after sampling the PCs of shape space. Further identification of such correlations is by itself an interesting discovery enabled by self-supervised learning.
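      A common choice for rank correlations of this kind is Spearman's coefficient, i.e. the Pearson correlation of the two variables' ranks. A minimal, dependency-free sketch is given below. The inputs would be, for example, per-cell scores on one latent PC and one heuristic feature such as cell area; the function names are hypothetical, and `scipy.stats.spearmanr` computes the same statistic in practice.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

      Because it only uses ranks, the coefficient captures any monotonic relation between a PC and a geometric property, even a strongly nonlinear one.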

      In the current manuscript, we compared learned latent features (PCA on VQ-VAE latent embeddings) against a simple baseline (top PCs of raw images) and showed superior performance, which already illustrates the advantage of self-supervised learning in denoising data and extracting key diversities. In the revision, we will compare PCs of multiple heuristic features (e.g., cell size) with latent features to further strengthen the above point.

      Reviewer # 3 (minor points)

      For Fig. 4 - supp 1 -- isn't it expected that the GMM cluster of a vector can be predicted from the vector? The GMM clusters were derived from the vectors to begin with, so this seems like a bit of a circular analysis. If I'm missing something, this figure might benefit from more exposition.

      Thanks for your question. The original purpose of having this confusion matrix is to parallel Fig. 3 - supp 2, showing that GMM generated distinct cell states that describe population better than perturbation conditions. The confusion matrix itself is trivial, so we will evaluate how to make this point more precisely during the revision.

      For Fig. 4 - Supp 3, the authors should consider changing the "state" and "cluster" colors on the embedding projections so that they do not match. As presented, it appears as if the states and clusters were co-assayed and linked by some experimental label, when in fact the State 1::Cluster1, State 2::Cluster 2 relationship is just inferred.

      Thanks for your comment, we will change the color scheme for Fig. 4 - supp 3 to avoid confusion in the revision.

      * Description of the revisions that have already been incorporated in the transferred manuscript


      Reviewer # 1 (major points)

      * The temporal matching to enforce a smooth latent space representation is interesting. The authors mention that they mask out surrounding cells with a median pixel value. Have the authors considered using a pixel weighting in the reconstruction/matching loss to differentiate foreground/background? Also, does this affect detection of any fast (or indeed rare) transitions in the trajectories?

      Thanks for your comment and question. Yes, we indeed incorporated a pixel weighting strategy during training. In addition to masking out surrounding cells, we used a smoothed and enlarged version of individual cell's segmentation mask to emphasize accurate reconstruction of the center cell in each patch, and reduce the influence of the surrounding cells/artifacts/background fluctuations. Matching loss is computed from latent vectors, which will be indirectly affected by the pixel weighting as well.
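      The idea of weighting the reconstruction loss by an enlarged cell mask can be sketched as below. This is a simplified illustration, not the code at the linked commit: the mask is only enlarged (by a square dilation of hypothetical size `radius`), not smoothed, and pixels outside it receive a hypothetical constant `bg_weight`.

```python
def dilate(mask, radius):
    """Binary dilation of a 2-D 0/1 mask with a (2*radius+1)^2 square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def weighted_mse(pred, target, mask, radius=1, bg_weight=0.1):
    """Pixel-weighted MSE: weight 1 inside the enlarged cell mask and
    `bg_weight` elsewhere, normalised by the total weight."""
    grown = dilate(mask, radius)
    num = den = 0.0
    for r, row in enumerate(target):
        for c, t in enumerate(row):
            wgt = 1.0 if grown[r][c] else bg_weight
            num += wgt * (pred[r][c] - t) ** 2
            den += wgt
    return num / den
```

      With a small `bg_weight`, reconstruction errors on neighbouring cells or background fluctuations contribute little to the loss, so the encoder is pushed to spend its capacity on the centre cell.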

      More detailed description of the weighting strategy will be added to the methods section. The code for our weighting strategy can be found at: https://github.com/czbiohub/dynamorph/blob/b3321f4368002707fbe39d727bc5c23bd5e7e199/HiddenStateExtractor/vq_vae_supp.py#L287

      Reviewer # 1 (minor points)

      I was a little confused by the labels given to the PCs, as they seem to vary between figures. For example, in Fig. 2, PC1 and PC2 are Size and Peak Retardance, but in Fig. 3 they are referred to as Size and Cell Density (which could be interpreted as the number of cells per unit area). Could the authors clarify these in the captions?

      We have clarified the text to distinguish between cell density (population) and optical density (phase).

      The authors note that single-cell tracking is of vital importance. This should be elaborated upon. Also - could the VQ-VAE encodings be used to help track linking in cases of high density?

      We added a clearer reference to the Methods section that details the tracking procedure. We also clarified in the Discussion that the methods used for segmenting and tracking cells can be refined for high-density cultures. We rely on the tracks to compute the temporal matching loss that regularizes the VQ-VAE encodings (shape space) during training. For that reason, the encodings cannot be used to refine tracking in high-density populations.

      Reviewer # 2 (major points)

      -'Cell state' in the field of cell biology has been operationally defined in so many different ways and with so many different types of measurement data that 'cell state' is becoming a somewhat vacuous term. This is not only a problem of this paper but a challenge for the field. In this case, cells are clustered using a Gaussian mixture model on the first few principal components of the latent space coefficients as well as speed, both averaged across the frames of cell tracks. This is fine and descriptive, but it's unclear whether this definition of 'cell state' is easily applied to other datasets and how this definition can be operationalized for hypothesis generation and experimentation. For other datasets, e.g. other cell types and other processes, such as differentiation, where e.g. tracking and segmentation may be more difficult and images would look quite different, can one still apply the same approach towards describing cell states? One could state that this definition of cell state is very specific to the dataset and therefore not generally useful. How would the authors respond to such a statement?

      This is an excellent point. We agree that the meaning of a “cell state” or a “cell type” can depend on the context. A cell state can be rigorously described in terms of measurements of the cells. Recent developments in cell-probing techniques, including imaging modalities and single-cell genomics, keep adding to the growing list of features that can be measured. Time-lapse imaging is high dimensional and therefore admits multiple definitions of cell state. Our use of the terms ‘latent shape space’ and ‘trajectory feature vectors’ clarifies how we define the cell state. Given the increasingly wide use of live cell imaging for biological studies and drug discovery, both of these descriptors of cell state are valuable. In the current manuscript, we focus on a combination of morphodynamic features, including but not limited to cell shape, size, and speed. We use these features to cluster cells in an unbiased manner to detect morphodynamic “states” unique to this particular culture system. Our approach can be generalized to other cell culture systems, such as cell differentiation, where cell architecture evolves substantially.

      To clarify this point, we added the following text to the manuscript:

      Line 85: “The meaning of a "cell state" can vary with the physiological and methodological context. In this work, we refer to "morphodynamic states" as a combination of morphological and temporal features. From the trajectories of cells in the latent shape space, we identified transitions among morphodynamic states of single cells. The same approach enabled detection of transitions in the morphodynamic states of cells as a result of immunogenic perturbations.”

      In the discussion:

      Line 333: “Our work formalizes an analytical approach for data-driven discovery of morphodynamic cell states based on quantitative shape and motion descriptors. A cell state can be rigorously described in terms of measurements of the cells. Recent developments in measurement techniques, including imaging modalities and single-cell genomics, keep adding to the growing list of features that can be measured. Time-lapse imaging is high dimensional and therefore admits multiple definitions of a cell state.”

      -It's unclear to the reviewer whether the training data (unperturbed microglia) are close enough to the test data (perturbed microglia) for application of the trained model to the test data to make sense. The authors provide reconstruction loss numbers, but they are difficult to interpret. Can the authors create plots of the unperturbed and perturbed microglia cells in the latent space and show overlap, or in other ways show that training data and test data are close enough for this application?

      Thank you for pointing out the lack of clarity regarding the generalizability of the model. We trained the model on control, untreated microglia acquired during one experiment. We then applied it to a separate dataset, acquired during another experiment, that included perturbed and control microglia. The reconstructions shown in Fig. 2 are from the test dataset, which was not used during training. The quality of the reconstructions supports the view that the shape space of the training set is representative of the shape space of the larger test set. We will add a density plot to the supplementary figures showing the overlapping latent space distributions of unperturbed (training dataset) and perturbed (test dataset) microglia.

      We now include the revised sentence in the manuscript to clarify the results:

      Line 132: “Comparison of reconstructed shapes from the test set and training set, along with the analysis of the shape space described in the next section, shows that our self-supervised model trained on the training dataset generalized well between independent experiments. It can be used to compare cell state changes between control microglia and cells treated with multiple perturbations.”
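      One way to make the promised comparison is to fit PCA on the training latents only, project both datasets onto those axes, and compare the resulting distributions. The NumPy sketch below does this and reduces the overlap of the PC1 histograms to a single number. The function names, the bin count, and the synthetic data are assumptions for illustration, not the authors' analysis.

```python
import numpy as np

def fit_pca(train, n_components):
    """PCA fitted on training latents only (rows = cells, columns = latent features)."""
    mean = train.mean(axis=0)
    # Rows of vt are principal axes ordered by decreasing singular value
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(latents, mean, components):
    """Scores of any dataset on the training-set principal axes."""
    return (latents - mean) @ components.T

def histogram_overlap(a, b, bins=20):
    """Overlap of two 1D samples on shared bins: 1.0 = identical histograms, 0.0 = disjoint."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    pa = np.histogram(a, bins=edges)[0] / len(a)
    pb = np.histogram(b, bins=edges)[0] / len(b)
    return float(np.minimum(pa, pb).sum())

# Synthetic stand-ins for flattened latent vectors (hypothetical data)
rng = np.random.default_rng(0)
scales = np.ones(64)
scales[0] = 5.0                                   # one dominant direction
train = rng.normal(size=(500, 64)) * scales
test_similar = rng.normal(size=(500, 64)) * scales
test_shifted = rng.normal(size=(500, 64)) * scales
test_shifted[:, 0] += 25.0                        # shifted along the dominant axis

mean, comps = fit_pca(train, n_components=2)
pc1_train = project(train, mean, comps)[:, 0]
print(histogram_overlap(pc1_train, project(test_similar, mean, comps)[:, 0]))
print(histogram_overlap(pc1_train, project(test_shifted, mean, comps)[:, 0]))
```

      Fitting the axes on the training set alone matters here. If the axes were fitted on the pooled data, they could bend toward whatever distinguishes the test set.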

      -Only a small amount of intensity variation is explained: 17% using the first 4 PCs, which are mainly used in the analyses. This seems like a very low number. There is a lot of variation in the intensity images that is not explained by the autoencoder. The autoencoder seems to be doing a bad job. At the same time, the downstream analyses using the latent space are insightful and sensible. Can the authors provide more explanation?

      Thanks for your question. We would first like to clarify that the autoencoder (VQ-VAE) used in this work follows the design of the original reference, which does not compress the input heavily. Given the size of the latent space (16x16x16), it is understandable that the top 4 PCs capture a relatively small portion of the variance. Cell shape likely cannot be described with a few principal components because of a) the diversity of microglia morphology and b) the diversity of modalities used to train the model.
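      The share of variance captured by the top PCs depends on how many directions in the latent space carry variance. That is largely a separate question from how well the model reconstructs its input. The minimal NumPy sketch below uses synthetic stand-ins for 16x16x16 latent tensors; the data and sizes are hypothetical. It contrasts latents whose variance is spread over many directions with latents dominated by four directions.

```python
import numpy as np

def explained_variance_ratio(latents):
    """PCA explained-variance ratios for latent tensors.

    latents: array of shape (n_cells, ...), e.g. (n_cells, 16, 16, 16);
             each latent tensor is flattened into one feature vector.
    """
    x = np.asarray(latents, dtype=float).reshape(len(latents), -1)
    x = x - x.mean(axis=0)                      # center each feature
    s = np.linalg.svd(x, compute_uv=False)      # singular values, descending
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
n = 300
# Hypothetical latents with variance spread over many directions
iso = rng.normal(size=(n, 16, 16, 16))
# Hypothetical latents dominated by 4 strong directions plus weak noise
basis = rng.normal(size=(4, 16 * 16 * 16))
low_rank = (rng.normal(size=(n, 4)) * 10) @ basis + 0.1 * rng.normal(size=(n, 4096))
low_rank = low_rank.reshape(n, 16, 16, 16)

for name, z in [("spread", iso), ("low-rank", low_rank)]:
    ratio = explained_variance_ratio(z)
    print(f"{name}: top-4 PCs explain {ratio[:4].sum():.1%} of the variance")
```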

      We now include the following text in the manuscript: Line 158: “The high variance of the shape space of microglia can be due to more complex shapes of microglia, such as diversity of protrusions, sub-cellular structures and variations in cell optical density, location of nuclei in migrating cells, etc. As we mentioned above, the inclusion of several imaging channels (brightfield, phase, and retardance) increases the performance of the model, possibly by increasing the diversity of morphological information encoded in our input data.”

      As you note, the downstream analyses from the learned latent space are insightful, e.g., we do detect substantial changes in top PCs upon perturbations. This supports our view that the shape space of microglia as encoded by our data is intrinsically high dimensional and the transients in the shape space are informative.

      Reviewer # 2 (minor points)

      -The motivation for GMMs over k-means is unclear. K-means clustering leads to spatial separation between clusters (states), since all cells/tracks that are closest to their cluster mean are by definition further away from the means of other clusters. This is not the case with the more flexible GMMs; e.g. they allow one to have a smaller cluster (with small variance components) inside of a larger cluster (with large variance). The latter scenario seems undesirable for interpretation in terms of states.

      Thanks for your comments. The major reason for choosing GMMs over K-means clustering is that a GMM allows different prior distributions for different perturbations. In practice, K-means would generate clusters regardless of the perturbation conditions. A GMM enables a finer separation of states, which are very likely correlated with perturbations. We agree that GMMs have certain caveats, as you mention in the comment. In our analyses, we did not observe issues such as the ‘nesting of components’ that you described.
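      The contrast the reviewer draws shows up on a one-dimensional toy problem. K-means assigns each point to the nearest center. A Gaussian mixture fitted by expectation-maximization gives soft assignments and a separate variance for each component. This NumPy sketch is illustrative only. It does not reproduce the authors' clustering of trajectory feature vectors or their use of perturbation-specific priors, and the quantile initialization and synthetic data are assumptions.

```python
import numpy as np

def kmeans_1d(x, k=2, n_iter=50):
    """Lloyd's k-means in 1D: hard assignment to the nearest center."""
    centers = np.quantile(x, np.linspace(0.1, 0.9, k))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

def gmm_1d(x, k=2, n_iter=200):
    """EM for a 1D Gaussian mixture with a separate variance per component."""
    means = np.quantile(x, np.linspace(0.1, 0.9, k))
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log of weight * Normal(x | mean, variance) per point and component
        log_p = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)      # soft assignments; rows sum to 1
        # M-step: re-estimate weights, means, and per-component variances
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, resp

# Synthetic data: a narrow population and a broad one (hypothetical)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 500), rng.normal(5.0, 2.0, 500)])
centers, labels = kmeans_1d(x)
weights, means, variances, resp = gmm_1d(x)
print("k-means centers:", np.sort(centers))
print("GMM means:", means, "GMM std devs:", np.sqrt(variances))
```

      The data here combine a narrow and a broad population. The GMM can recover both the means and the unequal spreads. K-means places its boundary halfway between the two centers, so the broad population's lower tail is assigned to the narrow cluster.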

      -Related to the previous point, 'self-supervised' sounds nice, but it's still optimizing towards something, in this case explaining the variation in input intensity images. A lot of the variation in the intensity images may not be of interest for the biological investigation of shape and dynamics. Did the authors uncover that indeed some of the latent dimensions are encoding other aspects of the images which may be less related to the biology and more to image properties/artifacts/biases?

      We agree with your assessment. Precisely for the reasons you point out, we use temporal regularization to counter the dependence of the learned representation on non-biological variations in the data. Reviewer #3 also recognized this point. We have clarified this concept in the manuscript, including that not all latent features represent the biology of the cells; some represent features of the instrument and the experiment. We report this for the top few PCs of the latent representation and provide the code so that interested readers can discover what the other PCs report.

      -The original images are 3D (5 z-planes). The analyzed images were 2D. The reviewer missed how the authors went from 3D to 2D. And since cells are 3D, can the authors describe what they gained by going to 2D and what they potentially lost?

      We added additional text to the methods subsection describing the Dynamorph Pipeline (line 590):

      “The input data for both the semantic segmentation and VQ-VAE models are 2D images of computed phase and retardance, which measure integrated optical density and anisotropy across the depth of the cell. The raw data are 3-dimensional (5 z-slices acquired in multiple polarization channels). The 2D phase is computed from the full stack of brightfield images via deconvolution. The retardance is computed from an average of the intensities across the 5 z-slices. Subsequent model training is more tractable with 2D data than with 3D data, while still capturing the cell architecture across the depth.”

      Reviewer # 3 (major points)

      Cell state transition interpretation

      In line 278, the authors propose that the unbalanced nature of transitions such that p(1 -> 2) >> p(2 -> 1) must represent some difference in timescales across the transitions because "cell states should have reached equilibrium after several days in culture at the time of the imaging experiments".

      This logic is unclear to me for two reasons.

      * If the population obeys detailed balance (i.e. forward and reverse transitions occur with equal frequency), then observed transitions should be balanced over a reasonably long time window, even if individual transitions occur on different timescales.

      * The assumption that cell states are balanced after a few days in culture is at odds with a few different aspects of the biology. Cell density and nutrient availability are continually changing in the dish, so culture conditions are non-stationary. Imaging apparatuses also commonly impact the cell biology of imaged samples due to imperfect incubation, etc. (2 or 3)

      It seems likelier that these data represent an unbalanced transition due to the non-stationary nature of the culture system.

      Given the authors' emphasis on the value of measuring these transitions, the work would be strengthened by a more careful interpretation of these results, additional analysis details (e.g. how large are most state transitions? are these mostly small shifts "over the border" in state space, or large jumps?), and an attempt at biological interpretation of the observed phenomenon.

      The authors' RNA-seq data may be helpful in this latter regard.

      This is an excellent point. We agree that the cell culture conditions, including nutrient availability, the accumulating presence of metabolites, and imaging-induced changes, constantly introduce new variations to the system. To mitigate these dynamic changes, we maintained cells in culture for six days before starting the experiment. To avoid cell stimulation by freshly added nutrients and growth factors from the culture media, we consistently exchanged the media and performed cytokine treatments 24 hours before each imaging experiment. Before each imaging round, the cells were allowed to equilibrate to the environmental chamber for at least one hour. Despite these efforts, we agree with the reviewer that the conditions cannot be considered fully stationary. We removed the sentence “Given that cell states should have reached equilibrium after several days in culture at the time of the imaging experiments, these results suggest that the transitions from state 2 to state 1 occur at a different time scale (i.e., much slower)” and changed the text to reflect this point:

      Line 294:

      “In our analysis, transition events are very rare among cells treated with IFN beta, while the most frequent cell transitions were observed among cells treated with GBM supernatant. One possible explanation for this imbalance is that IFN-treated cells represent a single polarization axis. In contrast, the heterogeneous cell signaling milieu derived from cancer cells provides conflicting pro- and anti-inflammatory signals, instructing cells to transition between the states. Both directions of transitions were observed within the imaging period, but cells in state-1 are more likely to transition to state-2 than vice versa within the chosen time frame. This imbalance between the rates of state transitions correlates with the higher state-2/state-1 ratio in the GBM and control environments and may explain the longitudinal accumulation of cells in a more activated state under these culture conditions.”
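      The reviewer's detailed-balance argument can be checked directly from per-frame state labels. Consider a stationary, time-homogeneous two-state chain with occupancies π1 and π2. Stationarity requires π1·p(1 -> 2) = π2·p(2 -> 1). A large p(1 -> 2)/p(2 -> 1) ratio is therefore compatible with stationarity when state 2 is correspondingly more populated. The quantities that should match are the raw counts of 1 -> 2 and 2 -> 1 transitions. The minimal NumPy sketch below assumes each track is an integer array of state labels (0 for state-1, 1 for state-2). The function names and toy tracks are hypothetical.

```python
import numpy as np

def transition_counts(tracks, n_states=2):
    """C[i, j] = number of frame-to-frame steps from state i to state j, pooled over tracks.

    tracks: iterable of 1D integer arrays, one per cell track, holding the
            state label of the cell in each frame (0 = state-1, 1 = state-2 here).
    """
    counts = np.zeros((n_states, n_states), dtype=int)
    for labels in tracks:
        labels = np.asarray(labels, dtype=int)
        np.add.at(counts, (labels[:-1], labels[1:]), 1)
    return counts

def transition_probabilities(counts):
    """Row-normalized counts: estimated per-frame p(i -> j)."""
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros(counts.shape), where=rows > 0)

# Toy label sequences (hypothetical, not the authors' data)
tracks = [np.array([0, 0, 1, 1, 1]), np.array([1, 1, 0, 1])]
C = transition_counts(tracks)
P = transition_probabilities(C)
print(C)  # raw counts: compare C[0, 1] with C[1, 0] to check flux balance
print(P)  # conditional rates: p(1 -> 2) = P[0, 1], p(2 -> 1) = P[1, 0]
```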

      Single cell RNA-seq analysis

      The authors performed a very interesting experiment where they profiled the same cell population using both timelapse imaging and single cell RNA-seq.

      The authors argue that the global structure of the state space resolved by each modality is analogous, but this seems a bit of a stretch to me.

      The behavior state space is unimodal (bifurcated into two states by GMM clustering), while the mRNA-seq space has several distinct clusters.

      The argument that these states are analogous would be significantly strengthened by biological interpretation of the RNA-seq data.

      Do the mRNA profiles exhibit differentially expressed genes that might explain differences in behavior in the cell behavior states?

      The analyses in Fig. 4 - Supp 4 are suggestive that "State 1" contains interferon-responsive cells and not control cells, but broader conclusions don't appear well supported by current analyses.

      We agree with the reviewer’s comment that the analogy between molecular cell states defined with scRNAseq analysis and morphodynamic cell states defined with dynamorph needs to be clarified. In our current work, the correlative measurement of morphodynamics and transcriptome was exploratory and relied on population statistics measured with each modality. More detailed studies linking morphodynamic states to the single cell transcriptomics, such as Patch-Seq or laser microdissection, are needed to decisively link morphodynamics and molecular programs underlying these phenotypes.

      Single cell transcriptomics simultaneously measures thousands of mRNA species in individual cells. It can therefore provide a nuanced interpretation of the molecular states of each population, as seen in the more granular separation of sub-states in scRNAseq clustering. For example, Cluster 1-2 was defined by high expression of interferon response genes, and, predictably, this cluster was primarily derived from cells treated with IFNb. Interferon exposure induces morphological changes associated with increased cell perimeter, which reports ramification of the microglia plasma membrane (Aw et al., PMID: 33183319). Infections with neurotropic viruses, which lead to an interferon response, have also been shown to decrease the velocity and distance traveled by cultured microglia cells (Fekete et al., PMID: 30027450). These observations are in direct agreement with our morphodynamic analysis, which shows a higher proportion of cells in State 1, characterized by lower cell velocity. Interestingly, scRNAseq analysis also identified a population of cells with high expression of cell cycle genes (Cluster 1-3), which would also be predicted to have a slower speed and potentially a larger cell body. These results suggest that different molecular states may underlie very similar morphodynamic states.

      We now provide a revised statement to reflect the above.

      Line 290: “We further compared the detected morphodynamic states with scRNA measurements of the same cell populations. Interestingly, the separation of control and IFN cells into state-1 and state-2 parallels the clusters identified from the cell transcriptome. This suggests that correlative analysis of gene expression and morphodynamics can reveal molecular programs underlying these phenotypes. In our preliminary analysis, scRNAseq revealed a greater degree of granularity in each of the cell populations; for example, cluster 1 of the scRNAseq data separates into three additional subclusters. Cluster 1-2 was defined by high expression of interferon response genes, and, predictably, this cluster was primarily derived from cells treated with IFNb. Interferon exposure induces morphological changes associated with increased cell perimeter, which reports ramification of the microglia membrane (Aw et al., 2020). Infections with neurotropic viruses, which lead to an interferon response, have also been shown to decrease the velocity and distance traveled by cultured microglia cells (Fekete et al., 2018). These observations are in direct agreement with the higher proportion of cells in State 1, characterized by lower cell velocity. Interestingly, scRNAseq analysis also identified a population of cells with high expression of cell cycle genes (Cluster 1-3), which would also be predicted to have a slower speed and potentially a larger cell body. These results suggest that different molecular states may underlie very similar morphodynamic states. Correlative single-cell measurements of morphodynamic states and single-cell transcriptomics, such as Patch-Seq or laser microdissection, are needed to decisively link morphodynamics to the molecular programs underlying these phenotypes.”

      Reviewer # 3 (minor points)

      1. Check grammar. Some articles are missing and some subject-verb agreements are mismatched. e.g. line 624 "we regularized [the] latent space", line 713 "after both loss[es] achieved".

      Thanks for pointing this out; we have thoroughly checked this submission for grammar and typos.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      Here, the authors present Dynamorph, an unsupervised learning framework for timelapse cell microscopy data built on VQ-VAEs.

      The authors apply this method to the analysis of microglial cell behavior under a series of perturbation conditions.

      Methodologically, the primary contributions of this work are twofold. The first is a temporal consistency regularization penalty on the latent space of a VQ-VAE model, for application to timeseries data. The second is a "temporal feature vector"-ization procedure that summarizes complex temporal trajectories in a single low-dimensional vector for analysis. Biologically, the primary contribution is the demonstration that transmitted light microscopy can resolve microglial responses to different perturbogens and dynamic state transitions.

      Overall, the experiments presented are well-designed, the methods well-implemented, and communication of the authors' findings is clear and concise.

      However, there are unaddressed potential caveats to the proposed framework and the manuscript fails to compare the proposed method to any existing baselines, such that the particular strengths and weaknesses of the method are unclear to readers.

      Major Points

      1. Temporal consistency regularization

      In the authors' framework, models are regularized to minimize the l2 norm between embeddings of adjacent timepoints. This approach is conceptually well-motivated, but could have some unintended effects.

      For instance, some cells may make a rapid state transition such that state(t-1) = A, state(t) = B, state(t+1) = A'. In these cases, a regularized model may best minimize the joint loss by returning an embedding at time t that interpolates between state A and A', rather than returning an embedding that reflects the true distinct state B.

      The work would be strengthened if the authors analyzed the impact of this regularization term on the detection of rapid state transitions that occur for only a few frames (e.g. when cells that exhibit filopodial motility "jump" in an actin/myosin contraction). This might be accomplished through experiments scanning different regularization hyperparameters on some of the authors' real data, fitting models on temporally downsampled versions of the real data where "slow" multi-timestep transitions now occur in a few timesteps, or perhaps using simulations where rapid state transitions are known to occur.
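      The interpolation concern can be made concrete with a toy objective. Hold the neighbouring embeddings fixed and let only the embedding at time t vary: ||z - b||^2 + λ(||z - a_prev||^2 + ||a_next - z||^2). Here b is the embedding the unregularized model would assign to the transient state B. Setting the gradient to zero gives z = (b + λ(a_prev + a_next)) / (1 + 2λ). The distance from the A/A' midpoint to z therefore shrinks by a factor of 1 / (1 + 2λ). This is a simplification. In the actual model, the penalty acts through shared encoder weights rather than on free embeddings, and λ here is a hypothetical stand-in for the regularization weight.

```python
import numpy as np

def regularized_embedding(b, a_prev, a_next, lam):
    """Closed-form minimizer of
        ||z - b||^2 + lam * (||z - a_prev||^2 + ||a_next - z||^2)
    with a_prev and a_next held fixed (toy setting, not the full model).
    """
    b, a_prev, a_next = (np.asarray(v, dtype=float) for v in (b, a_prev, a_next))
    return (b + lam * (a_prev + a_next)) / (1.0 + 2.0 * lam)

A = np.array([0.0, 0.0])        # state at t-1
A_prime = np.array([0.2, 0.0])  # similar state at t+1
B = np.array([5.0, 5.0])        # distinct transient state at t
midpoint = (A + A_prime) / 2

for lam in [0.0, 0.1, 1.0, 10.0]:
    z = regularized_embedding(B, A, A_prime, lam)
    retained = np.linalg.norm(z - midpoint) / np.linalg.norm(B - midpoint)
    # retained equals 1 / (1 + 2 * lam): stronger smoothing pulls z toward A/A'
    print(f"lam={lam:5.1f}  retained fraction of the B excursion: {retained:.3f}")
```

      A scan of this kind, repeated with the real regularization weight and real short-lived transitions, is one way to measure how much of a brief excursion survives training.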

      Even if the regularization does have some negative impacts, it does not argue against the utility of the general approach, but it is important for users to understand the constraints on downstream applications.

      2. Baseline comparisons

      The authors evaluate their method by assessing the correlation of embedding PCs with heuristic features (Fig. 2C,D + supp.), variation of embedding PCs across cell treatment groups (Fig. 3), and qualitative interpretation of embedding trajectories. In the supplement, the authors compare their VQ-VAE approach to VAEs and AAEs and chose to use a VQ-VAE based on lower reconstruction error and higher PC/heuristic feature correlation.

      However, the authors do not compare their method to much simpler baseline approaches to this problem. Existing literature suggests that heuristic features of cell shape and motion (similar to those the authors use to evaluate the relevance of their embeddings) are sufficient to perform many of the same tasks a VQ-VAE is used for in this work. For instance, in Fig. 3 it appears that a simple analysis of cell centroid speed recovers much of the same information as the complex VQ-VAE embeddings. In Fig. 2 - Supp. 6, after many heuristic features of cell geometry are regressed out, the latent space appears largely to explain cell non-autonomous information about the background environment. This suggests the heuristic features are largely sufficient.

      To demonstrate the usefulness of their deep modeling approach relative to simple baselines, the authors should compare against existing heuristics and embeddings of heuristics (e.g. PCA). The comparison should use some of the tasks shown for the VQ-VAE (recovery of perturbation state, state transition detection, qualitative trajectory analysis, discrimination of cell types). Heuristics might include those already calculated here, or a more comprehensive set as cited in the Introduction. The authors may also consider comparing against baselines that don't include time information for some of their tasks. For example, recovery of perturbation state could arguably be achieved with CNNs that either ignore the timestep or use simple temporal conditioning, without trajectory information.
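      The sketch below illustrates the kind of heuristic baseline the reviewer has in mind. It computes a few per-track shape and motion features (mean area, mean centroid speed, net displacement) from binary segmentation masks. The feature set and function name are assumptions for illustration. Vectors like these could then be embedded with PCA or used for the same downstream tasks as the VQ-VAE features.

```python
import numpy as np

def heuristic_track_features(masks, frame_interval=1.0, pixel_size=1.0):
    """Per-track shape and motion heuristics from binary masks.

    masks: (T, H, W) boolean array, one non-empty segmentation mask of the
           same cell per frame.
    """
    masks = np.asarray(masks, dtype=bool)
    areas, centroids = [], []
    for m in masks:
        ys, xs = np.nonzero(m)
        areas.append(ys.size * pixel_size ** 2)
        centroids.append([ys.mean() * pixel_size, xs.mean() * pixel_size])
    centroids = np.array(centroids)
    steps = np.linalg.norm(np.diff(centroids, axis=0), axis=1)  # distance moved per frame
    return {
        "mean_area": float(np.mean(areas)),
        "mean_speed": float(steps.mean() / frame_interval) if steps.size else 0.0,
        "net_displacement": float(np.linalg.norm(centroids[-1] - centroids[0])),
    }

# Toy track: a 3x3 "cell" moving 2 pixels to the right per frame
masks = np.zeros((4, 20, 20), dtype=bool)
for t in range(4):
    masks[t, 5:8, 2 + 2 * t:5 + 2 * t] = True
print(heuristic_track_features(masks))
```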

      If these features are sufficient for many of the same tasks performed in this work, the authors should provide a clear argument for readers as to why the unsupervised VQ-VAE approach may be preferable (e.g. ability to recover potentially unknown cell changes, for which no heuristic exists). The VQ-VAE doesn't need to be superior along every axis to hold merit, but the work would be strengthened if the authors could show clear superiority along some dimension.

      3. Cell state transition interpretation

      In line 278, the authors propose that the unbalanced nature of transitions such that p(1 -> 2) >> p(2 -> 1) must represent some difference in timescales across the transitions because "cell states should have reached equilibrium after several days in culture at the time of the imaging experiments". This logic is unclear to me for two reasons.

      • If the population obeys detailed balance (i.e. forward and reverse transitions occur with equal frequency), then observed transitions should be balanced over a reasonably long time window, even if individual transitions occur on different timescales.
      • The assumption that cell states are balanced after a few days in culture is at odds with a few different aspects of the biology. Cell density and nutrient availability are continually changing in the dish, so culture conditions are non-stationary. Imaging apparatuses also commonly impact the cell biology of imaged samples due to imperfect incubation, etc.

      It seems likelier that these data represent an unbalanced transition due to the non-stationary nature of the culture system. Given the authors' emphasis on the value of measuring these transitions, the work would be strengthened by a more careful interpretation of these results, additional analysis details (e.g. how large are most state transitions? are these mostly small shifts "over the border" in state space, or large jumps?), and an attempt at biological interpretation of the observed phenomenon. The authors' RNA-seq data may be helpful in this latter regard.

      4. Single cell RNA-seq analysis

      The authors performed a very interesting experiment where they profiled the same cell population using both timelapse imaging and single cell RNA-seq. The authors argue that the global structure of the state space resolved by each modality is analogous, but this seems a bit of a stretch to me. The behavior state space is unimodal (bifurcated into two states by GMM clustering), while the mRNA-seq space has several distinct clusters.

      The argument that these states are analogous would be significantly strengthened by biological interpretation of the RNA-seq data. Do the mRNA profiles exhibit differentially expressed genes that might explain differences in behavior in the cell behavior states? The analyses in Fig. 4 - Supp 4 are suggestive that "State 1" contains interferon-responsive cells and not control cells, but broader conclusions don't appear well supported by current analyses.

      Minor Points

      1. Check grammar. Some articles are missing and some subject-verb agreements are mismatched. e.g. line 624 "we regularized [the] latent space", line 713 "after both loss[es] achieved".
      2. For Fig. 4 - supp 1 -- isn't it expected that the GMM cluster of a vector can be predicted from the vector? The GMM clusters were derived from the vectors to begin with, so this seems like a bit of a circular analysis. If I'm missing something, this figure might benefit from more exposition.
      3. For Fig. 4 - Supp 3, the authors should consider changing the "state" and "cluster" colors on the embedding projections so that they do not match. As presented, it appears as if the states and clusters were co-assayed and linked by some experimental label, when in fact the State 1::Cluster1, State 2::Cluster 2 relationship is just inferred.

      Positive comments

      1. Figure presentation and manuscript layout are top notch. Thanks to the authors for making these data easy to read and interpret.

      Significance

      See above.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Provide a short summary of the findings and key conclusions (including methodology and model system(s) where appropriate). Please place your comments about significance in section 2.

      -The authors describe Dynamorph, a deep-learning-based autoencoder that represents live cell microscopy image data of motile microglia in unperturbed and perturbed situations in an interpretable latent space. Using Dynamorph, the authors identify and describe 'morphodynamic' states of the microglia.

      Major comments:

      Are the key conclusions convincing?

      -Yes, the methodology, observations and conclusions are clearly explained and convincing.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      -'Cell state' in the field of cell biology has been operationally defined in so many different ways and with so many different types of measurement data that 'cell state' is becoming a somewhat vacuous term. This is not only a problem of this paper but a challenge for the field. In this case, cells are clustered using a Gaussian mixture model on the first few principal components of the latent space coefficients as well as speed, both averaged across the frames of cell tracks. This is fine and descriptive, but it's unclear whether this definition of 'cell state' is easily applied to other datasets and how this definition can be operationalized for hypothesis generation and experimentation. For other datasets, e.g. other cell types and other processes, such as differentiation, where e.g. tracking and segmentation may be more difficult and images would look quite different, can one still apply the same approach towards describing cell states? One could state that this definition of cell state is very specific to the dataset and therefore not generally useful. How would the authors respond to such a statement?
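      The cell-state definition summarized by the reviewer can be sketched as follows: latent-space PCs and speed, each averaged over a track. The function names, the choice of two PCs, and the binary label-agreement helper are hypothetical. The clustering step (e.g. a GMM) is omitted. Clustering the vectors built with and without `include_speed` and comparing the two labelings is one simple way to quantify how much speed contributes, a question the reviewer also raises later in this report.

```python
import numpy as np

def trajectory_feature_vector(pc_scores, centroids, n_pcs=2, include_speed=True,
                              frame_interval=1.0):
    """Summarize one cell track as a fixed-length vector.

    pc_scores: (T, K) latent-space PC scores of the cell in each frame.
    centroids: (T, 2) cell centroid in each frame.
    Returns the track-averaged first `n_pcs` PC scores, optionally followed
    by the track-averaged frame-to-frame speed.
    """
    pc_scores = np.asarray(pc_scores, dtype=float)
    features = list(pc_scores[:, :n_pcs].mean(axis=0))
    if include_speed:
        steps = np.linalg.norm(np.diff(np.asarray(centroids, dtype=float), axis=0), axis=1)
        features.append(steps.mean() / frame_interval if len(steps) else 0.0)
    return np.array(features)

def two_state_agreement(labels_a, labels_b):
    """Fraction of tracks given the same state by two 0/1 labelings,
    taking the better of the two possible label matchings."""
    same = float(np.mean(np.asarray(labels_a) == np.asarray(labels_b)))
    return max(same, 1.0 - same)

# One toy track: 2 frames, 3 PCs, centroid moves by (3, 4) pixels
pcs = [[1.0, 2.0, 9.0], [3.0, 4.0, 9.0]]
cents = [[0.0, 0.0], [3.0, 4.0]]
print(trajectory_feature_vector(pcs, cents))                       # PC means and speed
print(trajectory_feature_vector(pcs, cents, include_speed=False))  # PC means only
```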

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary to evaluate the paper as it is, and do not ask authors to open new lines of experimentation.

      -It's unclear to the reviewer whether the training data (unperturbed microglia) are close enough to the test data (perturbed microglia) for application of the trained model to the test data to make sense. The authors provide reconstruction loss numbers, but they are difficult to interpret. Can the authors create plots of the unperturbed and perturbed microglia cells in the latent space and show overlap, or in other ways show that training data and test data are close enough for this application?

      -It's unclear what effect speed has on the final state determination. TFVs were composed of autoencoder-based features (PCs from the latent space) and the speed of the cells. Would the states be very different without speed as part of the TFVs, or with TFVs consisting only of speed features? Please quantify and discuss.

      -Only a small amount of intensity variation is explained: 17% using the first 4 PCs, which are mainly used in the analyses. This seems like a very low number. There is a lot of variation in the intensity images that is not explained by the autoencoder. The autoencoder seems to be doing a bad job. At the same time, the downstream analyses using the latent space are insightful and sensible. Can the authors provide more explanation?

      -Related to the previous point, 'self-supervised' sounds nice, but the model is still optimizing towards something, in this case explaining the variation in the input intensity images. Much of that variation may not be of interest for the biological investigation of shape and dynamics. Did the authors find that some of the latent dimensions encode other aspects of the images that are less related to the biology and more to image properties, artifacts or biases?

      Are the suggested experiments realistic for the authors? It would help if you could add an estimated cost and time investment for substantial experiments.

      -These are computational experiments based on already existing data, results and code. It should be relatively straightforward to perform them, although careful analysis and interpretation will take time.

      Are the data and the methods presented in such a way that they can be reproduced?

      -The methods are described in sufficient detail. The complicated experimental and computational processes seem reproducible to a decent extent. The code is available in GitHub repositories. The reviewer did not attempt to reproduce the computational results, nor check whether the available data meet FAIR requirements.

      Are the experiments adequately replicated and statistical analysis adequate?

      -Yes, and there is a lot of useful supplementary material that helps with interpretation of the results.

      Minor comments: Specific experimental issues that are easily addressable.

      -The motivation for using GMMs rather than k-means is unclear. K-means clustering produces spatially separated clusters (states), since every cell/track is assigned to its closest cluster mean and is therefore, by definition, further away from the means of the other clusters. This is not the case with the more flexible GMMs; for example, they allow a smaller cluster (with small variance components) to sit inside a larger cluster (with large variance). The latter scenario seems undesirable for interpretation in terms of states.
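
      To illustrate this concern, a minimal Python sketch follows. It assumes two hand-picked one-dimensional Gaussian components (the weights, means and standard deviations are illustrative assumptions, not values from the manuscript) and compares the GMM assignment rule (highest weighted density, i.e. highest posterior responsibility) with the k-means assignment rule (nearest mean). Only the assignment step is shown; fitting the parameters, e.g. by expectation-maximization, is omitted.

```python
import math

# Hypothetical 1-D mixture: a broad "large" cluster and a narrow "small"
# cluster whose mean lies inside the broad one. Parameters are illustrative.
COMPONENTS = [
    {"name": "large", "weight": 0.7, "mean": 0.0, "sd": 5.0},
    {"name": "small", "weight": 0.3, "mean": 1.0, "sd": 0.3},
]


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def gmm_label(x: float) -> str:
    """Assign x to the component with the highest posterior responsibility."""
    scores = [c["weight"] * normal_pdf(x, c["mean"], c["sd"]) for c in COMPONENTS]
    return COMPONENTS[scores.index(max(scores))]["name"]


def kmeans_label(x: float) -> str:
    """Assign x to the nearest cluster mean, as the k-means assignment step does."""
    dists = [abs(x - c["mean"]) for c in COMPONENTS]
    return COMPONENTS[dists.index(min(dists))]["name"]


for x in (-1.0, 1.0, 3.0):
    print(x, "GMM:", gmm_label(x), "| k-means:", kmeans_label(x))
```

      With these assumed parameters, a point at 3.0 lies closer to the small cluster's mean, yet the GMM assigns it to the broad cluster, so the broad "state" surrounds the small one on both sides rather than forming a separate region.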

      -The original images are 3D (5 z-planes), whereas the analyzed images were 2D. The reviewer missed how the authors went from 3D to 2D. Since cells are 3D, can the authors describe what they gained by going to 2D and what they potentially lost?

      Are prior studies referenced appropriately?

      -Yes, citations are ample and relevant.

      Are the text and figures clear and accurate?

      -Yes, the figures are informative.

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      -No specific suggestions

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      -This is a technological/computational advance using a large integrative (experimental+computational) approach.

      Place the work in the context of the existing literature (provide references, where appropriate).

      -The authors have done an excellent job at this.

      State what audience might be interested in and influenced by the reported findings.

      -Cell biologists, brain researchers, computer vision computational biologists

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      -Cell biology, cancer biology, systems biology, machine learning, statistics, data integration

      -Brain biology aspects (the biological significance of the findings on morphodynamic microglial states) are difficult for the reviewer to assess

      Referee Cross-commenting

      Reviewer #1's comments look great and useful, and I think they are in line with mine. I think this manuscript would benefit from a reviewer who could comment on the biological significance. The review reports are skewed towards questions and remarks about the computational approach.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors use a combination of quantitative phase microscopy and machine learning to determine the state space of microglia cells. The key conclusions are that a VQ-VAE is able to capture a compact latent representation of cell morphology and, combined with motion features, can predict state changes in single-cell trajectories and discriminate between perturbations.

      Major comments:

      Overall - I very much enjoyed reading the manuscript. The work has been carefully performed and the results are interesting.

      • The temporal matching to enforce a smooth latent space representation is interesting. The authors mention that they mask out surrounding cells with a median pixel value. Have the authors considered using a pixel weighting in the reconstruction/matching loss to differentiate foreground/background? Also, does this affect detection of any fast (or indeed rare) transitions in the trajectories?
      • The Trajectory Feature Vectors (TFVs) are averaged over time - this seems to lose a lot of the salient information in the trajectories themselves, resulting in the low(ish) accuracy of the GMM. Could a Hidden Markov Model trained on the trajectories in state space help to identify/classify those trajectories that change their morphology/motion over time?
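
      The point about averaging can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: per-frame observations are binarized into two hypothetical feature bins, and the two-state model's start, transition and emission probabilities are set by hand rather than trained (in practice they would be estimated, e.g. with the Baum-Welch algorithm, and continuous TFV features would need continuous emissions). Viterbi decoding then recovers the most likely per-frame hidden-state path.

```python
import math
from typing import List, Sequence

# Hypothetical two-state model; all probabilities are hand-set assumptions.
START = [0.6, 0.4]                  # P(first hidden state)
TRANS = [[0.9, 0.1], [0.1, 0.9]]    # TRANS[prev][next]; states tend to persist
EMIT = [[0.8, 0.2], [0.2, 0.8]]     # EMIT[state][observed bin]


def viterbi(obs: Sequence[int], start, trans, emit) -> List[int]:
    """Most likely hidden-state path for a discrete observation sequence."""
    n = len(start)
    # Log-probability of the best path ending in each state at the current frame.
    scores = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores.append(scores[prev] + math.log(trans[prev][s]) + math.log(emit[s][o]))
            pointers.append(prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def n_switches(path: Sequence[int]) -> int:
    """Number of frame-to-frame state changes along a path."""
    return sum(a != b for a, b in zip(path, path[1:]))


traj_a = [0, 0, 0, 0, 1, 1, 1, 1]  # one sustained change
traj_b = [0, 1, 0, 1, 0, 1, 0, 1]  # frame-to-frame flicker

for name, traj in (("A", traj_a), ("B", traj_b)):
    path = viterbi(traj, START, TRANS, EMIT)
    print(name, "time average:", sum(traj) / len(traj), "switches:", n_switches(path))
```

      Both toy trajectories have the same time-averaged value, so an averaged feature vector cannot separate them. Under these assumed parameters, Viterbi decoding yields one sustained state switch for the first trajectory and none for the second, treating its flicker as noise.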

      Minor comments:

      • Could the authors provide some example images showing interpolation of each PC using the generative decoder?
      • I was a little confused by the labels given to the PCs, as they seem to vary between figures. For example, in Fig. 2, PC1 and PC2 are Size and Peak Retardance, but in Fig. 3 they are referred to as Size and Cell Density (which could be interpreted as the number of cells per unit area). Could the authors clarify these in the captions?
      • The authors note that single-cell tracking is of vital importance. This should be elaborated upon. Also - could the VQ-VAE encodings be used to help track linking in cases of high density?
      • I was pleased to see the full source code available!

      Significance

      Nature and significance:

      This is a significant, mostly technical piece of work that explores a complex new area of science: using ML and large datasets to gain insight into biological systems. There are significant challenges, not least the difficulty of interpreting ML models.

      Existing literature/context:

      There have been relatively few examples of using self-supervised learning to gain insight into these complex datasets. Much of the work has concentrated on learning morphological descriptors. The present work starts to introduce the time dimension more explicitly.

      Target Audience:

      Broadly applicable to those studying cell biology, microscopy and machine learning.

      My expertise:

      ML applied to microscopy data. Single cell tracking.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      The authors do not wish to provide a response at this time.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      The authors examine an important and challenging question in current biology: the role of RNA in the structural maintenance of nuclear and cytoplasmic membrane-less organelles, including stress granules, processing bodies, the nucleolus, Cajal bodies and nuclear speckles. Furthermore, the authors explored super-enhancer complexes involved in the regulation of gene expression. The authors used RNase L, an interferon-induced ribonuclease which, upon activation in the cytoplasm or targeting to the nucleus, degrades all RNAs within the cell. They then took a quantitative approach to analyze the effect of RNA degradation on the disassembly or reorganization of membrane-less organelles. Interestingly, the authors observed that RNAs present within nuclear organelles are susceptible to RNase L degradation, leading to the disassembly of these organelles. In contrast, super-enhancer-containing eRNAs are largely unaffected.

      Major concerns

      Many of the studied organelles are difficult to see in the figures. This reviewer therefore encourages the authors to present clearer, higher-magnification insets illustrating what is being quantified, and to show that quantification in the main figure next to the immunofluorescence images.

      The degradation of specific RNAs after induction of RNase L should be analyzed and quantified by qRT-PCR for several assemblies. This would corroborate the microscopy observations made on an individual-cell basis.

      The main issue concerns the connection between RNA and its role in the formation and structural integrity of nuclear organelles. There is consensus that these nuclear assemblies are built on specific nascent transcripts, which act as a nucleation scaffold; if synthesis of the specific RNA is impaired, the assemblies collapse. The authors should discuss this. It would be relevant to mention two experimental works on this topic, DOI: 10.1038/ncb2140 and DOI: 10.1038/ncb2157.

      The study is limited to observed macroscopic changes in the appearance of the assemblies. The authors must dig deeper and provide more conclusive results by colocalizing several components of these assemblies. It has been documented that visualization of a selective marker for a specific assembly is not enough to establish its functional state or the extent of its disassembly. For example, in Figure 4A the authors should more convincingly visualize the nascent 47/45S pre-rRNA transcript to demonstrate that the nucleolus is built on ongoing pre-rRNA synthesis, as reflected by the tripartite nucleolar substructure. The loss of the GC component after rRNA depletion should be better presented with NPM1 colocalization.

      In Figure 4C, D the authors use the term "coilin assemblies", which is confusing for a reader. After activation of RNase L, the Cajal body likely undergoes a structural rearrangement that cannot be established solely from the presence of rearranged coilin foci. The authors should colocalize these foci with at least one or two functional markers.

      Enhancer RNAs likely function in gene control rather than as nucleation elements for building nuclear assemblies. This should be discussed when explaining the observed differences between MED1 foci and the other assemblies.

      Significance

      Understanding the changes in nuclear and cellular organization that accompany and drive the formation and maintenance of cellular structures is an essential and not well-understood topic. Thus, this manuscript is relevant. However, the data presented in this paper are based on a limited approach, and their interpretation and presentation in particular could be substantially improved. Consequently, the conclusions are not convincingly supported by published data, and some open questions still need to be addressed. Specific criticisms are outlined above.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by Decker et al. "RNA is required for the maintenance of multiple cytoplasmic and nuclear membrane-less organelles" investigates the structural role of RNA in membraneless organelles. The authors show that degradation of RNA in transient or constitutive membraneless organelles alters the formation and structure of many, but not all, of the organelles studied. The main assay is activation of RNase L by dsRNA, which then degrades mRNA in the cell. The collected data lead the authors to highlight possible roles of RNA in membraneless organelle formation and to categorize the organelles: some rely more on RNA-RNA interactions, while others rely on protein-RNA or protein-protein interactions. The manuscript is well written and the data are sound.

      Major comments:

      The authors study the maintenance of organelles by RNA. For transient ones, like stress granules (SGs), it would be very interesting to see the formation and clearance kinetics with and without RNA, possibly also using a trigger other than dsRNA to induce their formation. The idea is that if RNA is needed for SG maintenance, the clearance kinetics in the presence of RNA would differ from those after RNA depletion.

      The experiments were done in cells. It is known that core components of the organelles can form granule-like structures in vitro without RNA. Showing that the presence of RNA improves their integrity in vitro would support the authors' claim, for example by studying SG maintenance with and without RNase L using the previously developed SG extraction protocol.

      Minor comments:

      In Figure 1A, it is not clear whether the smaller granules differ from SGs, as mentioned in the text; additional markers might make this clearer. Figures 3 and 4 require quantification.

      Significance

      This is a solid paper that advances our understanding of membraneless organelle formation and dynamics. This field is of high general interest for the broader scientific community. My expertise is in the field of membraneless organelles.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      In the paper entitled "RNA is required for the maintenance of multiple cytoplasmic and nuclear membrane-less organelles", Decker et al. set out to test the role of RNA in maintaining the integrity of a variety of biomolecular condensates. To do this, they assess how multiple assemblies in the cytoplasm and nucleus retain their structural integrity following RNase L activation. They identified many condensates that are solubilized, with their protein components redistributed, following RNase L activation and presumably subsequent RNA digestion. These experiments largely concur with previous findings from RNase A treatment. The implication is that RNA, rather than protein, is the essential organizing component for most of the tested condensates. The manuscript is well written, and the data are convincing. It is my judgement that this work is worthy of publication following a few additional experiments/clarifications.

      1. The authors identify condensates that are sensitive or refractory to RNase L. It would be good if the authors could more conclusively eliminate the possibility that the remaining condensates contain specific residual antiviral RNAs and that this is why they remain intact. Are any of these condensates enriched in antiviral RNAs such as IFN-beta following poly(I:C) treatment, as assessed by FISH, for example (PMID: 31494035)?
      2. Is there a particular protein feature (charge, IDR type, etc.) that is common to the solubilized versus the non-solubilized groups? What about the dissolved and the newly formed assemblies? A simple table comparing protein features in the three groups would suffice, with particular emphasis on RNA-binding domains (PMID: 32243832) and intrinsically disordered regions (PMID: 24773235).
      3. Demonstrate that the effect of RNase L treatment is reversible (e.g. by withdrawing poly(I:C), particularly for a protein that ends up in a novel assembly), or remove the word "maintenance" from the title.
      4. Control for the RNA dependence of the activity by trying to dissolve, with RNase L, a condensate that is not RNA-dependent or RNA-enriched. SPOP mutations (PMID: 30244836) might be interesting, as both SPOP and RNase L loss-of-function mutations (PMID: 11799394) are associated with prostate cancer.
      5. A caveat is that certain RNA-enriched regions of condensates may not be accessible to the RNase L protein. One way to address this might be to directly target the enzyme, via an inducible system (e.g. FKBP12/FK506), to a compartment deemed refractory to its activity (and inferred not to require RNA).
      6. Overall, this paper would be greatly enhanced by including a more extensive discussion on the basic biological implications for these findings. Why are some condensates RNA dependent? What function(s) are common to these condensates? How does disruption of this lead to disease?

      Significance

      This work addresses the neglected role of RNA in structuring condensates throughout the cell. Despite the prevalence of RNA in many condensates and the enrichment of RNA-binding proteins in condensates, our understanding of the structural roles RNA plays in their assembly is still highly limited, as most work has been protein/IDR-centric. This work seeks to systematically assess the RNA dependence of the assemblies.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      The authors do not wish to provide a response at this time.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      In mice, failures of meiosis during spermatogenesis can be rescued by injecting prophase I male chromosomes into oocytes, allowing them to undergo the two meiotic divisions within the oocyte together with the oocyte's own chromosomes. However, segregation is highly error-prone and rarely leads to a live birth when the resulting embryos are reimplanted into foster mothers. In this study, the authors show that segregation errors in meiosis I oocytes harboring both male and female chromosomes mainly affect the male chromosome set. Most errors are due to precocious segregation of sister chromatids in unpaired male chromosomes (univalents). A delay in the alignment of male chromosomes compared with female chromosomes was also observed. Halving the volume of the oocyte cytoplasm leads to a significant reduction in these errors and, hence, a significant increase in successful births after reimplantation. Excitingly, with this technique, live births were obtained from male mice with a spermatogenic arrest phenotype.

      Main points:

      1) The authors conclude that halving the oocyte size helps male meiosis I chromosomes segregate properly in the cytoplasm of meiosis I oocytes. It is also possible that the experimental procedure for removing half of the cytoplasm promotes proper segregation for some unknown reason. The authors should include a condition in which half of the cytoplasm is aspirated and then put back, so that the oocytes have the same volume as before but the cytoplasm has undergone the same treatment as in the halved oocytes. Also, increasing the cytoplasmic volume of the oocyte should not improve segregation of male chromosomes but make it worse; have the authors checked this?

      2) The authors mention that male chromosomes align with a delay compared with the female chromosomes. Does this delay depend on activation of error correction or of the spindle assembly checkpoint (SAC)? Is it possible that dilution of factors required for checkpoint control, and hence for proper chromosome segregation, is the reason for error-prone segregation in oocytes harboring twice the number of chromosomes? If so, have the authors stained for SAC proteins at the kinetochores? Perhaps slight overexpression of SAC proteins would be sufficient to rescue male meiotic divisions in the oocyte; have the authors tested this hypothesis?

      3) The authors state that male chromosomes have difficulty segregating in the huge cytoplasm of the oocyte. Perhaps the issue is not that the chromosomes come from a male pronucleus, but simply that twice the number of chromosomes must be segregated in the oocyte cytoplasm. How do male chromosomes behave in enucleated oocytes undergoing meiosis I? Conversely, if female chromosomes from another oocyte are injected into the recipient oocyte instead of male chromosomes, do they segregate correctly, or are the delay in chromosome alignment and the error rate comparable to those observed when the additional chromosome set comes from the male?

      4) In the rescue of mice with spermatogenic arrest, the authors find sex-chromosome aneuploidies in the offspring, but no autosomal ones. To the best of my knowledge, autosomal aneuploidies are not viable in the mouse, so this result does not indicate that sex chromosomes are the main source of aneuploidies. Nevertheless, it is attractive to speculate that aneuploidies mainly involve sex chromosomes, because the oocyte is not prepared to segregate a male sex-chromosome bivalent. The authors should determine, by FISH analysis after meiosis I, whether the segregation errors in meiosis I oocytes harboring the additional male chromosome set mainly concern the male sex chromosomes.

      Significance

      This study is very interesting, of high significance and very well executed. I think the study can go much further in terms of mechanistic insight, requiring only techniques and tools that the authors have at their disposal.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Previously, the team showed that a primary spermatocyte nucleus can undergo meiosis when transplanted into immature oocytes, and later obtained normal mice from the fertilized oocytes (Zygotes 1997, PMID: 9276513; PNAS 1998, PMID: 9576931). However, the efficiency was quite low (~1%) due to chromosome aberrations, which made the approach impractical for basic or clinical research applications. In this study, Ogonuki et al., building on a recent study showing that reducing the ooplasm ameliorates chromosome segregation errors during meiosis (Dev Cell 2017, PMID: 28486131), injected spermatocyte nuclei into half-sized GV oocytes and succeeded in obtaining live murine pups at a high rate (the birth rate improved from 1% with full-sized oocytes to 19% with half-sized oocytes). Further, through detailed observation with high-resolution 3D live imaging, the authors showed that the misalignment of paternal chromosomes could be ameliorated by reducing the volume of the ooplasm. Finally, the authors applied this technology to obtain live pups from azoospermic mice, suggesting potential applications in the treatment of human infertility.

      Major comments:

      This is a great study combining the expertise on both sperm and oocytes. The experiments are well designed and performed. The key conclusions are convincing.

      Line 228. The authors claimed that all the pups born following the injection of wild-type or mutant spermatocytes grew into fertile adults.

      Because the authors tested 3 males from wt spermatocytes (line 197), the above sentence should be rephrased.

      The authors found one XXY male among the three male mice from wt spermatocytes. Was the XYY male mouse fully fertile without XY/XYY mosaicism?

      How many females and males were obtained from wt spermatocytes?

      Minor comments:

      The authors clearly showed that the technique can be applied to rescue spermatogenic arrest. Readers would appreciate it if the authors also reported any unsuccessful cases.

      To prevent sex-chromosome aberrations, are there any potential markers for selecting the most developed spermatocytes?

      Significance

      One in six couples suffers from infertility, and 70-90% of male infertility cases are related to defects in spermatogenesis. Clinically, intracytoplasmic sperm injection is common, but it is not applicable to men who lack haploid germ cells. Injection of primary spermatocyte nuclei can produce pups, but the efficiency was poor (~1%, PNAS 1998, PMID: 9576931). In the present study, by using halved oocytes as recipients, the authors improved the efficiency from 1% to 19%. With this great improvement, they further obtained healthy, fertile offspring from male mice genetically lacking haploid cells. This approach opens a window for infertile patients suffering from spermatogenic arrest.

      The reviewer's field of expertise: knockout mice, male infertility, spermatogenesis, sperm function, fertilization.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Ogonuki et al. developed a new technique using primary spermatocyte-injected oocytes for offspring production. They examined chromosome segregation errors in biparental meiosis using spermatocyte-injected oocytes. They showed that artificially reducing the ooplasmic volume rescued highly error-prone chromosome segregation by preventing sister separation in biparental meiosis. Their live-imaging analysis demonstrated that erroneous chromosome segregation arose from univalent-like chromosomes followed by predivision of sister chromatids during prometaphase I in biparental meiosis. They showed that the birth rate was improved using halved oocytes. Furthermore, they showed that offspring could be produced using spermatocytes from azoospermic mice.

      Overall, the data are convincing and the manuscript addresses important questions. The data were produced at a high technical level. The presented data are sufficient to support the authors' conclusions and further provide significant insight into the production of offspring from azoospermic animals. Thus, the manuscript would be of interest to the field and deserves publication if the authors address the following minor concerns.

      Fig. 1A, line 117: This is an amazing experiment to set up biparental meiosis using spermatocyte nuclei. Since spermatocytes are at different stages of progression through meiotic prophase, some of them (late pachytene) should have formed crossovers, whereas others (before mid-pachytene) have yet to complete recombination. Thus, whether the donor paternal chromosomes form bivalents or univalents depends on the stage of the spermatocytes from which they derive. The authors should describe how spermatocytes were picked for injection and whether they used a particular stage of spermatocytes.

      Lines 159-160: The authors state that paternal chromosomes are susceptible to errors in ooplasm-hosted biparental meiosis. This is a nice demonstration tracing the origin of the separated chromatids. In the right graph of Fig. 2C, one to two paternal chromosomes showed misalignment. It is unclear whether premature separation is biased towards any particular paternal chromosome, e.g. XY. The authors should discuss this further.

      Lines 176-177: The authors state that most errors were preceded by premature separation of bivalent chromosomes into univalent-like structures. This implies that premature separation of bivalent chromosomes happens prior to anaphase onset. Does this depend on spindle force, or is cohesion intrinsically fragile in donor spermatocyte chromosomes? The authors should discuss this further.

      Fig. 3E: The authors depict univalent-like chromosomes in normal-sized oocytes undergoing predivision at anaphase. This is somewhat oversimplified, because Fig. 3B shows that a certain population exhibits nondisjunction. The model and its description should be corrected to fit the data presented. If sister segregation at anaphase is predominant, I wonder what happens to sister kinetochore mono-orientation and centromeric protection of sister cohesion in such univalent-like chromosomes. It would be nice to show the centromeric proteins MEIKIN and SGO2 on donor spermatocyte chromosomes versus oocyte chromosomes to examine centromeric cohesion. The authors should clarify this issue.

      Lines 294-296: What do the authors mean by the sentence "It is known that sex chromosomes are prepared to undergo meiosis later than autosomes."?

      Significance

      The manuscript will be of biological significance for the reproduction field, in two major respects: the authors addressed the mechanism of erroneous chromosome segregation in biparental meiosis, and they showed that biparental meiosis using spermatocyte-injected oocytes can be applied to produce offspring from azoospermic mice, which would have a great impact on the reproductive biology field. The data were produced with a high level of technical skill.

      Referee Cross-commenting

      I agree with main point 2 of Reviewer #3. It would be better to examine SAC proteins.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      1. General Statements [optional]

      The comments of the reviewers were highly insightful and enabled us to greatly improve the quality of our manuscript. We provided point-by-point responses to each of the reviewers’ comments. Revisions in the text are highlighted in yellow. We hope that the revisions in the manuscript and our accompanying responses will be sufficient to make our manuscript suitable for publication.

      2. Point-by-point description of the revisions

      Reviewer #1

      - The authors provide no rationale for using the PTI score to measure the protein-coding potential of transcripts. The only attempt to justify this measure is given in the methods: "The definition of PTI score is motivated by our hypothetical concept that translation of pPTI is limited by alternate competing sPTIs." (lines 426-427, page 20). What the PTI score measures is the dominance of the largest predicted ORF over the predicted ORFs, in terms of length. It is not clear why there would be competition for translation of putative ORFs for genuine protein-coding transcripts. An alternative hypothesis, briefly touched upon in the discussion (lines 318-320) is that translation of non-functional ORFs could give rise to the production of toxic proteins, in addition to being costly in terms of energy. The authors should provide the reasoning behind the PTI score and should explain the biological mechanisms that may underlie differences between coding and non-coding transcripts.

      Thank you for your comment. We previously identified a de novo gene, NCYM, and showed that its protein product has a biochemical function (Suenaga et al 2014; Suenaga et al 2020). However, NCYM was previously registered as a non-coding RNA in public databases, and one of the established predictors of protein-coding potential, the Coding Potential Assessment Tool (CPAT), assigned NCYM a coding probability of 0.022, labeling it as a noncoding RNA (new Supplementary Figure 1B). We therefore sought a new indicator of coding potential by comparing NCYM with a small subset of coding and non-coding RNAs to determine whether NCYM has sequence features that would justify registering it as a coding transcript (data not shown). We found that predicted ORFs other than the major ORF tend to be short in coding RNAs. In addition, upstream ORFs have been reported to inhibit translation of major ORFs (Calvo et al 2009). We therefore hypothesized that predicted ORFs may reduce translation of the major ORF and consequently became shorter in coding transcripts, including NCYM, during evolution. The term ORF usually implies an RNA sequence that is translated into an actual product; however, the biological significance of predicted ORFs that are not translated has been largely ignored and remains to be characterized. We therefore defined a PTI as an RNA sequence running from a start codon to a stop codon, without assuming that it yields a translated product. Thus, PTIs can be defined even in genuine non-coding RNAs. In coding transcripts, the major ORF is often the longest PTI (hereafter, the primary PTI or pPTI). To investigate the importance of the pPTI relative to the other PTIs (hereafter, secondary PTIs or sPTIs) in the evolution of coding genes, we defined the PTI score as the ratio of the pPTI length to the total length of all PTIs (Figure 1A–B), and we expected the PTI score to be high in coding transcripts.
This rationale for using the PTI score as a measure of protein-coding potential is now included in the revised manuscript (lines 92-115, pages 5-6).
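The PTI score defined above (pPTI length divided by the summed length of all PTIs) can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names are hypothetical, and the PTI-calling rules (forward strand only, ATG as the sole start codon, TAA/TAG/TGA as stops, nested in-frame ATGs not counted as separate PTIs, stretches without a stop codon discarded) are simplifying assumptions.

```python
# Minimal sketch of a PTI score, under the assumptions stated above.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_pti_lengths(seq: str) -> list[int]:
    """Return lengths (nt, stop codon included) of start-to-stop stretches in the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a PTI at the first ATG after the previous stop
            elif start is not None and codon in STOP_CODONS:
                lengths.append(i + 3 - start)  # close the PTI at the in-frame stop
                start = None
    return lengths


def pti_score(seq: str) -> float:
    """Length of the longest PTI (pPTI) divided by the summed length of all PTIs."""
    lengths = find_pti_lengths(seq)
    if not lengths:
        return 0.0
    return max(lengths) / sum(lengths)
```

For example, `pti_score("ATGAAATAGCCATGTGA")` finds a 9-nt PTI in frame 0 and a 6-nt PTI in frame 2, giving 9 / 15 = 0.6; a transcript whose longest PTI is accompanied by many long sPTIs scores lower.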

      To examine the biological mechanism underlying the difference between coding and noncoding RNAs, we investigated the relationship between translation and PTI scores. We used datasets of translated small proteins from the SmProt and sORFs.org databases. Based on ribosome profiling and mass spectrometry data, these databases include noncoding RNAs that encode small proteins (fewer than 100 residues) as well as mRNAs that contain small ORFs in addition to the major ORF. The SmProt database divides these small ORFs into three categories by location: upstream ORFs (uORFs) and downstream ORFs (dORFs) lie in the 5′ and 3′ UTRs, respectively, and small ORFs (sORFs) overlap the major ORF in a different reading frame (new Figure 2B). We first calculated PTI scores of lincRNAs encoding small proteins and found that their distribution was shifted toward higher PTI scores relative to that of all lincRNAs (new Figure 2A). Thus, lincRNA translation is associated with higher PTI scores. Next, we examined whether PTI scores were associated with the share of translation taken by the major ORF in coding RNAs. We calculated PTI scores of mRNAs with uORFs, sORFs, or dORFs and found that the distribution of these mRNAs was shifted toward lower PTI scores (new Figure 2C). Similar results were obtained with the sORFs.org dataset (Supplementary Figure 5). These data support the idea that the PTI score reflects the extent to which the major ORF dominates translation. These results are now included in the Results section of the revised manuscript (lines 241-271, pp 12-13).
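The location-based categories described above can be written as a small classifier. This is a hypothetical sketch assuming 0-based, half-open transcript coordinates; it treats any out-of-frame ORF that overlaps the major ORF as an sORF (including ORFs that start in the 5′ UTR and run into the coding region), which may not match SmProt's exact curation rules.

```python
from typing import Optional


def classify_small_orf(orf_start: int, orf_end: int,
                       cds_start: int, cds_end: int) -> Optional[str]:
    """Assign a small ORF to uORF / dORF / sORF relative to the major ORF (CDS).

    Coordinates are hypothetical 0-based, half-open transcript positions.
    Returns None for an ORF that overlaps the CDS in the same reading frame.
    """
    if orf_end <= cds_start:  # entirely within the 5' UTR
        return "uORF"
    if orf_start >= cds_end:  # entirely within the 3' UTR
        return "dORF"
    # Overlaps the major ORF: keep it only if it uses a different reading frame.
    if (orf_start - cds_start) % 3 != 0:
        return "sORF"
    return None
```

For a CDS spanning positions 50 to 200, an ORF at 0 to 30 is a uORF, one at 210 to 240 is a dORF, and one starting at 51 is an sORF because it is shifted by one nucleotide relative to the major ORF.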

      Translation of small proteins from noncoding RNAs may interfere with their noncoding functions through ribosome binding and subsequent translation. Conversely, translation of sPTIs in coding RNAs may inhibit translation of the major ORF through competition (Calvo et al 2009). At the same time, translation of such proteins may provide material for new functional proteins or regulatory mechanisms during evolution. Therefore, the rightward and leftward shifts in PTI score that we observed for noncoding and coding RNAs, respectively, seem to be slightly deleterious or slightly beneficial. As discussed further below, the overlap between the PTI score distributions of coding and noncoding transcripts was negatively correlated with the effective population size of the species. Therefore, as the nearly neutral theory predicts, mutations with such slightly deleterious or beneficial effects on the translation of coding and noncoding transcripts seem to be fixed by genetic drift in species with small effective population sizes, including humans (Kimura 1968, 1983; Ohta 1992). These observations indicate that PTI scores are related to the translation of PTIs, and their distributions suggest a mechanism for producing bifunctional RNAs that are simultaneously coding and noncoding. This discussion has been added to the revised manuscript (lines 487-503, pp 23-24).
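The nearly neutral argument can be made concrete with Kimura's standard diffusion approximation for the fixation probability of a new mutation, u = (1 − e^(−2s)) / (1 − e^(−4Ns)). The sketch below assumes a diploid population with Ne = N and an additive selection coefficient s; it illustrates the general principle and is not an analysis from the manuscript.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for a new mutation (initial frequency 1/(2N)).

    Assumes a diploid population with Ne = N and additive selection coefficient s:
        u = (1 - exp(-2s)) / (1 - exp(-4Ns))
    """
    if s == 0:
        return 1 / (2 * n)  # neutral limit
    x = 4 * n * s
    if x < -700:  # exp(-x) would overflow; u is effectively 0
        return 0.0
    # expm1 keeps precision for small s: u = expm1(-2s) / expm1(-4Ns)
    return math.expm1(-2 * s) / math.expm1(-x)


# Fixation probability of a slightly deleterious mutation relative to a neutral one.
for n in (100, 1_000, 10_000):
    ratio = fixation_probability(-0.001, n) / fixation_probability(0.0, n)
    print(f"N = {n:>6}: u / u_neutral = {ratio:.3g}")
```

Comparing each value with the neutral expectation 1/(2N) shows that a slightly deleterious mutation (s = −0.001) fixes at roughly 80% of the neutral rate when N = 100 but essentially never when N = 10,000, which is why such mutations accumulate mainly in species with small effective population sizes.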

      Calvo SE, Pagliarini DJ, Mootha VK. 2009. Upstream open reading frames cause widespread reduction in protein expression and are polymorphic among humans. Proc Natl Acad Sci U S A. 106(18):7507-12. PMID: 19372376; PMCID: PMC2669787. https://doi.org/10.1073/pnas.0810916106

      Kimura M. 1968. Evolutionary rate at the molecular level. Nature. 217(5129):624-6. PMID: 5637732. https://doi.org/10.1038/217624a0

      Kimura M. 1983. The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press. https://doi.org/10.1093/obo/9780199941728-0132

      Ohta T. 1992. The Nearly Neutral Theory of Molecular Evolution. Annu Rev Ecol Syst. 23:263-86.

      - The presence of ORFs in transcripts has long been used as a predictor of their protein-coding potential. For example, the ORF size and the ORF coverage are part of the set of predictors implemented in CPAT (Wang et al., 2013). The PTI score is necessarily related to these methods, yet no comparison is provided. If the PTI score is to be used as a measure to classify transcripts as coding or non-coding, its performance should be compared to other classifiers, including those that use the presence of ORFs as a predictor (e.g., CPAT) but not only (e.g., PhyloCSF, based on the pattern of sequence evolution).

      Thank you for your comment. As you noted, our reasons for using the PTI score were not clearly described in the original manuscript; they are now included in the Results section (lines 92-115, pages 5-6). As mentioned in our response to comment 1, CPAT did not predict NCYM to be a coding transcript (Supplementary Figure 1B). Furthermore, we intended this new concept as a means to identify the RNA sequence elements that determine protein-coding potential, not as a classifier of coding versus non-coding RNAs. Many studies have identified bifunctional RNAs that are simultaneously coding and noncoding (Li and Liu 2019; Huang Y et al. 2021). Moreover, neutrally evolving peptides are encoded by small ORFs of noncoding RNAs and may contribute to the evolutionary origin of new functional proteins (Ruiz-Orera et al. 2014). We therefore argue that such a dichotomous classification is often misleading, because it implicitly ignores ncRNAs that encode functional or nonfunctional small proteins. Such a classification also poses technical problems: training a classifier requires a dataset of genuine noncoding RNAs, yet it is difficult to define such RNAs without bias, for example toward particular cell or tissue types, including cancer or normal cells/tissues. Increasing evidence has shown that peptides are translated from known noncoding RNAs (Li and Liu 2019; Huang Y et al. 2021), and some of these peptides are specific to the cellular context (Dohka et al 2021). Therefore, we cannot be certain that RNAs classified as noncoding on the basis of ribosome profiling or mass spectrometry datasets are genuinely noncoding, because these datasets cover neither all cell/tissue types nor all physiological contexts.

      We agree that the PTI score should be compared with other indicators of coding potential, such as transcript length, ORF size, and ORF coverage. Because ORFs of fewer than 100 residues have been used to define noncoding RNAs, such RNAs necessarily have shorter ORFs than coding RNAs. We therefore calculated these indicators for noncoding RNAs that encode proteins, rather than for coding RNAs (new Supplementary Figure 4). The PTI score distribution shifted to the right for lincRNAs encoding small proteins, indicating that the PTI score is related to translation (new Figure 2C). In contrast, the distributions of transcript length, ORF size, and ORF coverage did not shift higher for noncoding RNAs encoding small proteins (new Supplementary Figure 4), although a slight shift toward higher ORF coverage was observed. We therefore argue that the PTI score is a better indicator of translation than transcript length, ORF size, or ORF coverage. These results are now included in the Results section of the revised manuscript (lines 241-255, page 12).

      - The authors compare the observed PTI score distributions with the PTI scores from random or shuffled sequences. They conclude that the PTI scores do not depend on transcript lengths but on transcript sequences (lines 122-123). However, this is not true for non-coding RNAs, for which the observed and randomized distributions are very similar. The relationship between transcript length and PTI scores should be analyzed into more detail. Are the annotated non-coding transcripts with high PTI scores particular in terms of length?

      We analyzed the length of high-PTI-score transcripts compared to all lncRNA transcripts. The average high-PTI-score with high coding potential (0.6 PTI score −29), consistent with the distribution of transcript length in lincRNAs translating small proteins (new Supplementary Figure 4C). Therefore, the high PTI scores are not simply due to the larger ORF size derived from longer transcript length, but also because of the occupancy of pPTI among all PTIs. The occupancy of pPTI can be estimated by ORF coverage or PTI score, and we can easily see that transcript length (the denominator of ORF coverage) correlates with the sum of the lengths of all PTIs (the denominator of the PTI score). Thus, we need to clarify which indicators have more biological significance in terms of gene evolution. Higher PTI scores in noncoding RNAs cause overlap of the coding and noncoding transcripts in eukaryotes, especially in multicellular eukaryotes (new Figure 4 and 5). The overlaps of PTI score distributions between coding and noncoding RNAs (Opti) were positively and negatively correlated with mutation rate and effective population size, respectively, and approximated by logarithmic or exponential relationships (new Figure 6). Because the inverse of the effective population size defines the strength of genetic drift relative to the strength of selection, the overlaps quantified by Opti seem to be derived from genetic drift. These results clearly suggest that the observed PTI score distribution of noncoding RNAs is not random. In contrast, ORF coverage (Ocov) showed a weaker relationship with mutation rates and effective population sizes (new Supplementary Figure 8 and 9). These results suggest that ORF coverage is less related to gene evolution than PTI score, with the weak relationship seemingly indirectly derived from the correlation with the PTI score. We have now included these results in the revised manuscript (lines 306-322, page 15).
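The difference between the two denominators can be seen in a toy comparison. The sketch below is hypothetical: it takes precomputed PTI lengths as input and uses a CPAT-style definition of ORF coverage (longest ORF length divided by transcript length). Two transcripts with the same length and the same longest ORF then share the same ORF coverage, yet differ in PTI score when one of them carries additional sPTIs.

```python
def coding_indicators(transcript_length: int, pti_lengths: list[int]) -> dict[str, float]:
    """Compare length-based indicators for one transcript (hypothetical helper).

    orf_size     : length of the longest PTI (the pPTI)
    orf_coverage : pPTI length / transcript length (CPAT-style ORF coverage)
    pti_score    : pPTI length / summed length of all PTIs
    """
    longest = max(pti_lengths, default=0)
    total = sum(pti_lengths)
    return {
        "transcript_length": float(transcript_length),
        "orf_size": float(longest),
        "orf_coverage": longest / transcript_length if transcript_length else 0.0,
        "pti_score": longest / total if total else 0.0,
    }


# Two hypothetical 1,000-nt transcripts with the same 300-nt longest PTI.
dominant = coding_indicators(1000, [300])
crowded = coding_indicators(1000, [300, 150, 150, 100])
```

Both transcripts have an ORF coverage of 0.3, but the PTI score is 1.0 for the first and about 0.43 for the second, because only the PTI score registers the competing sPTIs.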

      - The authors discuss in depth the correlation between PTI scores and PTI-based protein-coding potential measures (e.g., section "PTI scores correlate with protein coding potential in humans and mice", starting line 125; section "Relationship between the PTI score and protein-coding potential", starting line 243). Given that the protein-coding potential is directly derived from the PTI score distributions for coding and non-coding transcripts, it is not surprising that the two should be correlated. The significance of observing a linear or a sigmoid relationship is not clearly explained.

      As you noted, the protein-coding potential was directly derived from the PTI score distributions. Therefore, if the distribution for coding RNA shows a higher or lower PTI score compared to that of noncoding RNA, the protein-coding potential is expected to be positively or negatively correlated with the PTI score. If the distributions of coding and noncoding RNA significantly overlapped (Opti > 0.7), the protein-coding potential became constant and was not correlated with the PTI score (new Figure 7 and new Supplementary Figure 10). Thus, the PTI score is not always positively correlated with the protein-coding potential.
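Why complete overlap makes the protein-coding potential constant can be shown with a simplified model. The sketch below assumes, hypothetically, that the potential in each PTI-score bin is the normalized frequency of coding transcripts divided by the sum of the normalized coding and noncoding frequencies; the manuscript's exact binning and normalization may differ.

```python
from typing import Optional


def bin_fractions(scores: list[float], n_bins: int = 10) -> list[float]:
    """Fraction of PTI scores (in [0, 1]) falling in each equal-width bin."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1  # a score of 1.0 goes in the last bin
    return [c / len(scores) for c in counts]


def coding_potential_by_bin(coding: list[float], noncoding: list[float],
                            n_bins: int = 10) -> list[Optional[float]]:
    """Per-bin share of the coding distribution; None where neither class is present."""
    fc = bin_fractions(coding, n_bins)
    fn = bin_fractions(noncoding, n_bins)
    return [c / (c + n) if c + n > 0 else None for c, n in zip(fc, fn)]
```

When the two input distributions are identical, every populated bin returns 0.5, so the potential no longer varies with the PTI score; when the distributions are disjoint, the potential jumps from 0 to 1.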

      We had previously divided the species into three groups (sigmoidal, linear, and other) based on the intercept and slope of a linear approximation; however, given the goodness of fit of the linear approximation, there is no essential difference between the sigmoidal and linear groups. Therefore, in the revised text, we classify the species into two groups, linear and constant (new Figure 7 and Supplementary Figure 10). We have replaced the figures accordingly and added a new interpretation of the results to the revised manuscript (lines 341-353, pages 16-17).

      - The authors use the entire set of annotated coding and non-coding transcripts to assess the distribution of PTI scores and to define the protein-coding potential. Traditionally, for methods that aim to classify transcripts as coding or non-coding, this is done using "bona fide" coding and non-coding transcripts, which are used as training sets. The efficiency of the method can then be evaluated using a test set of transcripts. This aspect is lacking here and should be implemented.

      As noted in our response to your comment 2, we aimed to examine which RNA sequence elements determine a genuine coding RNA, not to build a classifier of coding and noncoding RNAs. Technically, “bona fide” coding and noncoding RNAs cannot be rigorously defined, because the training and test sets may contain unidentified bifunctional RNAs; more traditional approaches, by construction, exclude this possibility.

      - The comparisons among species are likely biased by the quality of lncRNA annotations in non-model organisms - cf. high variations among primates, which are likely driven by the annotation quality and depth.

      As noted in our response to comment 3, the variation in the PTI score distributions of lncRNAs is not random, and their overlap with the distribution for coding RNAs is negatively correlated with effective population size (new Figure 6). In addition, we found that tissue-specific expression of lncRNAs influences the PTI score distribution in multicellular eukaryotes (new Figure 8C and D and new Supplementary Figures 11 and 12). Therefore, the variation arises, at least in part, from the specificity of gene expression and is thus biologically meaningful. These results are now included in the revised manuscript (lines 383-402, pages 18-19).

      Based on these results, we consider the lncRNA annotations from the two major databases, Ensembl and RefSeq, to be sufficiently well curated for comparing PTI score distributions. In practice, no other database catalogs curated lncRNAs across so many species. Nevertheless, we expect that recent progress in whole-genome sequencing and transcriptome analysis of vertebrates will improve lncRNA annotation, including for non-model organisms, and will provide more suitable datasets for cross-species comparison.

      - The differences among bacteria, archaea and eukaryotes should be discussed into more depth. In bacteria, the genuine ORF is well defined by the presence of translation signals (e.g., Shine-Dalgarno sequence). Other factors are also at work in both prokaryotes and eukaryotes, including RNA secondary structures. The relationship between these factors and the PTI score should be discussed.

      The Shine–Dalgarno sequence in bacteria and the Kozak sequence in eukaryotes have been identified as important regulatory elements for ribosome binding, but these sequences are not essential for all coding RNAs, and their significance is not well characterized, especially for noncoding RNAs that are translated. A recent study sought to identify the determinants of ribosome binding to lncRNAs using 99 features, including the weight of each base at positions −6 to +1 relative to the start codon (a Kozak-like sequence) and RNA secondary structure (Zeng and Hamada 2018). It found that transcript length is a stronger predictor of ribosome binding in human lncRNAs than either of these features. Because the PTI score is a better indicator of lincRNA translation than transcript length (new Supplementary Figure 4C), we would argue that Kozak sequences and RNA secondary structures are not reliable indicators of ribosome binding to lncRNAs, and that their significance is likely limited to more specific transcript classes. Furthermore, Hata et al. recently showed that the Kozak sequence acts as a negative regulator of de novo gene birth in plants (Hata et al. 2021). Therefore, these sequence features seem to evolve after the birth of coding transcripts and are not generally involved in the origination of new coding genes from noncoding RNAs.

      Zeng C, Hamada M. 2018. Identifying sequence features that drive ribosomal association for lncRNAs BMC Genomics. 19(Suppl 10):906. PMID: 30598103; PMCID: PMC6311901. https://doi.org/10.1186/s12864-018-5275-8

      Hata T, Satoh S, Takada N, Matsuo M, Obokata J. 2021. Kozak sequence acts as a negative regulator of de novo transcription initiation of newborn coding sequences in the plant genome. Mol Biol Evol. 38:2791-2803. PMID: 33705557; PMCID: PMC8233501. https://doi.org/10.1093/molbev/msab069

      - From an evolutionary perspective, the effective population size (Ne) is also likely related to the "quality" of the ORFs. An analysis of Ne vs. the PTI score distributions would be an interesting addition to this manuscript.

      We appreciate this comment. We now include an analysis of the relationship between Ne and PTI scores, based on an indicator of the extent of overlap between the PTI score distributions of coding and noncoding transcripts. This overlap score was calculated from PTI scores or from ORF coverage and named Opti or Ocov, respectively. Opti was positively correlated with mutation rate (Up) and negatively correlated with effective population size (Ne) (new Figure 6A), suggesting that the overlap of PTI score distributions reflects slightly deleterious or beneficial mutations fixed in populations by genetic drift. Furthermore, from the relationship between Ne and Opti, we estimated the minimum effective population size to be approximately 1000, which is consistent with results from conservation biology (Frankham et al. 2014). Indeed, species at risk of extinction had significantly higher Opti than species at little risk of extinction (left panel, new Figure 6B). In addition, Opti was higher for species with decreasing population sizes than for those with stable population sizes (right panel, new Figure 6B). These results are now included in the revised manuscript (lines 323-332, pages 15-16).
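One simple way to quantify the overlap of two PTI score distributions is the overlapping coefficient of their normalized histograms (the summed bin-wise minimum), which ranges from 0 for disjoint to 1 for identical distributions. Whether Opti and Ocov are computed exactly this way is an assumption of this sketch, and the bin count is arbitrary.

```python
def normalized_histogram(scores: list[float], n_bins: int = 20) -> list[float]:
    """Fraction of scores in each of n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1  # a score of 1.0 goes in the last bin
    return [c / len(scores) for c in counts]


def overlap_coefficient(a: list[float], b: list[float], n_bins: int = 20) -> float:
    """Summed bin-wise minimum of two normalized histograms (0 = disjoint, 1 = identical)."""
    ha = normalized_histogram(a, n_bins)
    hb = normalized_histogram(b, n_bins)
    return sum(min(x, y) for x, y in zip(ha, hb))
```

Applied to the PTI scores of a species' coding and noncoding transcripts, a value near 1 would indicate the strongly overlapping distributions described for eukaryotes, and a value near 0 the well-separated distributions described for prokaryotes.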

      Frankham R, Bradshaw CJA. 2014. Genetics in conservation management: Revised recommendations for the 50/500 rules, Red List criteria and population viability analyses, Biological Conservation, 170:56-63, https://doi.org/10.1016/j.biocon.2013.12.036

      Reviewer #1 (Significance (Required)):

      This manuscript is lacking in novelty and is not well positioned in the field. If the aim of this work is to provide a method to classify transcripts as coding or noncoding, the authors should provide detailed comparisons with existing methods (see above). If the aim is to understand what defines a genuine protein-coding transcript, then the biological mechanisms should be better described and the comparisons among species and among functional categories of genes should be further developed. The idea of using the "dominance" of the largest ORF compared to the other predicted ORFs is interesting, and provides a new element compared to existing methods that rely exclusively on ORF length and ORF coverage. I would recommend that the authors develop this idea further and discuss the advantages of using the ORF dominance compared to just the ORF length or coverage.

      Thank you for your comment. To address it, we have revised the manuscript to state that our aim is to investigate what defines a genuine protein-coding transcript; pursuing this aim led us to find that the extent of overlap between the PTI score distributions of coding and noncoding transcripts is negatively correlated with effective population size. In addition, we have characterized the functional categories of high-PTI-score lncRNAs in mice (new Supplementary Tables 6 to 8) and C. elegans (new Supplementary Tables 9, 10, and 11). A comparison of ORF size and ORF coverage with the PTI score showed that the PTI score is a better indicator of lncRNA translation than these indicators and that it is relevant to molecular evolution, given the clear correlations of the PTI score distribution overlap with mutation rate and effective population size. These results and related descriptions are now included in the revised manuscript (lines 323-332, pages 15-16; lines 210-218, pages 10-11).

      **Referee Cross-commenting**

      I fully agree with Reviewer 2's remarks. In particular, adding ribosome profiling analyses is an excellent idea and could substantially improve the manuscript.

      We investigated the PTI scores in lncRNAs that are translated, using ribosome profiling data, and found that PTI scores correlated with translation (lines 241-271, pages 12-13). Thank you for this excellent suggestion.

      Reviewer 2

      **Major comments:**

      - some validation of their predictions of coding potential would be good to add. There are plenty of ribosome profiling experiments out there for some of the studied organisms (human, mouse, E. coli) that could be used to show that indeed some of the non-coding RNAs are misclassified and have ribosome density across the predicted open reading frames.

      Thank you for your comment. As noted in our response to Reviewer 1 above, we calculated the PTI scores of translated lncRNAs from the two databases and found that the PTI score correlates with translation of both coding and noncoding RNAs (new Figure 2 and new Supplementary Figures 4 and 5). As noted above, such translation seems to produce slightly deleterious/beneficial effects, thereby becoming fixed in species with smaller effective population sizes by genetic drift. These results and related discussion are now included in the revised manuscript (lines 241-271, pages 12-13; lines 323-332, pages 15-16; lines 487-503, page 23-24).

      - the manuscript is at times difficult to follow and the implication of the statements may not be immediately clear to the readers, particularly those without formal training in bioinformatic methods; even in the abstract. Some examples: "The relationship between the PTI score and protein-coding potential was sigmoidal in most eukaryotes; however, it was linear passing through the origin in three distinct eutherian lineages, including humans". Here it is not clear what this means (without reading the paper) - and even after reading the paper the importance of noting the sigmoidal vs linear relationship of PTI vs. protein-coding potential is unclear. I would encourage the authors to double-check that they provide a clear interpretation of their results, with readers unschooled in proper statistics in mind.

      Thank you for these comments. As we noted in response to comment 4 of Reviewer 1, considering the fit of the linear approximation, there was no essential difference between the sigmoidal and linear groups. Therefore, in the revised manuscript, we classify the species into two groups: linear and constant (new Figure 7 and Supplementary Figure 10). We also propose and diagram a new gene birth model to help readers understand our interpretations more easily (Figure 9). These results and discussion are now included in the revised manuscript (lines 341-353, pages 16-17; lines 514-538, pages 24-25).

      - For the definition of PTI and protein-coding potential the authors refer to the Materials and Methods. I would encourage to explain in plain terms in the results section 1.) how they decided on this particular formalization and 2.) explain clearly what this means.

      Thank you for your suggestion. We have included a concise definition in the revised text in plain terms (lines 107-115, page 5-6; lines 144-146, page 7).

      - The definition of protein coding potential for appears to be dependent on database classification of a transcript as either coding and non-coding. Particularly for organisms with complex transcriptomes, databases may not contain the proper information - what are the implications for their protein-coding potential score?

      Organisms with complex transcriptomes, such as multicellular organisms, present difficulties in classifying coding vs. noncoding transcripts because RNAs classified as noncoding based on proteomic data from a subset of cell types may encode functional proteins in other cell types for which proteomic data are not available. To examine whether cell types affect the PTI distribution of coding and noncoding transcripts, we analyzed transcriptomic data from five mammals (human, mouse, rat, macaque, and opossum) and found that the PTI score distributions were similar in most cell or tissue types for noncoding transcripts (new Figure 8C and Supplementary Figure 11). However, PTI score distributions for noncoding RNA in mature testes showed a rightward shift for all five species (new Figure 8C and Supplementary Figure 11).

      Furthermore, we found that tissue specificity of RNA expression was correlated with PTI score (new Figure 8D and new Supplementary Figures 12 and 13): in all five species, more tissue-specific expression was associated with higher PTI scores, and most of the tissue-specific expression occurred in mature testis. Therefore, the mature testis is a special tissue that expresses noncoding RNAs with high coding potential. These results support the hypothesis that the testis is a special organ for new gene origination (Kaessmann 2010). We have added these results and this discussion to the revised manuscript (lines 383-402, pages 18-19; lines 427-434, pages 20-21; lines 435-445, page 21).
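The reply does not state which tissue-specificity index was used. As one common example, the τ (tau) index can be sketched as follows; it ranges from 0 (uniform expression across tissues) to 1 (expression in a single tissue). Using tau here is an assumption, and expression values are often log-transformed before computing it, which this sketch leaves to the caller.

```python
def tau(expression: list[float]) -> float:
    """Tissue-specificity index tau: 0 = uniform, 1 = expressed in one tissue only."""
    if len(expression) < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("tau is undefined when the transcript is not expressed")
    # Each tissue contributes how far it falls below the peak tissue.
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)
```

A transcript detected only in mature testis among several tissues would score 1.0, whereas a transcript expressed equally in all tissues would score 0.0.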

      Kaessmann H. 2010. Origins, evolution, and phenotypic impact of new genes. Genome Res, 20:1313-26. Epub 2010 Jul 22. PMID: 20651121; PMCID: PMC2945180. https://doi.org/10.1101/gr.101386.109

      - The authors completely ignore plants - would it make sense to expand their analysis to this branch of the tree of life?

      We had already included the PTI score distributions for plants in Supplementary Figure 5 of the original manuscript (now Supplementary Figure 7). The revised manuscript also presents their overlap scores (Opti).

      Reviewer 2 (Significance (Required)):

      The manuscript presents an elegant way to predict protein-coding and non-coding RNAs, which may be very relevant to the study of organisms with complex transcriptomes. The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general. My expertise: systems biology of RNA binding proteins, transcriptomics, RNA biology.

      **Referee Cross-commenting**

      I fully agree with my co-reviewer regarding additional analyses to strengthen the manuscript.

      Thank you for these comments. We analyzed noncoding RNAs using ribosome profiling data and transcriptomes from different tissues. We found that high PTI scores correlated with translation of noncoding RNAs and that such high-PTI-score noncoding RNAs were specifically expressed in mature testes. Because effective population size was inversely correlated with the overlap of PTI score distributions, slightly deleterious or beneficial mutations in germ cells of mature testes may generate high-PTI-score noncoding RNAs that serve as candidates for new coding genes in the next generation. This idea is consistent with the hypothesis that new coding transcripts derive from noncoding transcripts expressed in spermatocytes and spermatids in mature testes. In addition, we found that human noncoding transcripts with high PTI scores tended to be involved in transcriptional regulation, and that MYCN target genes were significantly enriched among their genes of origin. A recent study showed that binding sites for transcription factors, including MYCN, are mutational hotspots in human spermatogonia (Kaiser et al. 2021). Therefore, the PTI score offers an opportunity to integrate the concept of gene birth with classical molecular evolutionary theory, thereby contributing to our understanding of evolution.

      Kaiser VB et al. 2021. Mutational bias in spermatogonia impacts the anatomy of regulatory sites in the human genome. Genome Res. Epub ahead of print. PMID: 34417209. https://doi.org/10.1101/gr.275407.121

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In the manuscript "Potentially translated sequences determine protein-coding potential of RNAs in cellular organisms" Suenaga and colleagues analyze the available transcriptomes from 100 prokaryotes and eukaryotes, as well as >100 viruses to understand whether transcripts tend to be translated or not. They develop a potentially translated island score (PTI) that combines the number and length of open reading frames in a transcript. From there they develop a protein-coding potential score that combines PTI with database information on coding and non-coding transcripts in various organisms and that in some sense predicts whether a transcript would fall in the coding or non-coding category. The main takeaway appears to be that in prokaryotes PTIs and protein coding potential strongly differentiates coding and non-coding transcripts, while in eukaryotes these differences appear to be more fluid. The manuscript presents an interesting bioinformatic analysis of coding properties across the phylogenetic field and may represent an interesting resource. The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general.

      Major comments:

      • some validation of their predictions of coding potential would be good to add. There are plenty of ribosome profiling experiments out there for some of the studied organisms (human, mouse, E. coli) that could be used to show that indeed some of the non-coding RNAs are misclassified and have ribosome density across the predicted open reading frames.
      • the manuscript is at times difficult to follow and the implication of the statements may not be immediately clear to the readers, particularly those without formal training in bioinformatic methods; even in the abstract. Some examples: "The relationship between the PTI score and protein-coding potential was sigmoidal in most eukaryotes; however, it was linear passing through the origin in three distinct eutherian lineages, including humans". Here it is not clear what this means (without reading the paper) - and even after reading the paper the importance of noting the sigmoidal vs linear relationship of PTI vs. protein-coding potential is unclear. I would encourage the authors to double-check that they provide a clear interpretation of their results, with readers unschooled in proper statistics in mind.
      • For the definition of PTI and protein-coding potential the authors refer to the Materials and Methods. I would encourage to explain in plain terms in the results section 1.) how they decided on this particular formalization and 2.) explain clearly what this means.
      • The definition of protein coding potential for appears to be dependent on database classification of a transcript as either coding and non-coding. Particularly for organisms with complex transcriptomes, databases may not contain the proper information - what are the implications for their protein-coding potential score?
      • The authors completely ignore plants - would it make sense to expand their analysis to this branch of the tree of life?

      Significance

      The manuscript presents an elegant way to predict protein-coding and non-coding RNAs, which may be very relevant to the study of organisms with complex transcriptomes.

      The audience for the manuscript at the moment may be more limited to scientists trained and working in the field of bioinformatics, but with some integration of transcriptomics and ribosome profiling data, as well as an effort to make the results accessible to scientists not trained in bioinformatics, this manuscript may be relevant and of interest to researchers working on the biology of long non-coding RNAs and translation in general.

      My expertise: systems biology of RNA binding proteins, transcriptomics, RNA biology.

      Referee Cross-commenting

      I fully agree with my co-reviewer regarding additional analyses to strengthen the manuscript.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary

      The manuscript submitted by Suenaga and co-authors presents a method to evaluate the protein-coding potential of transcripts. This method is based on an index that they name the PTI (potentially translated island) score, which represents the ratio between the length of the largest predicted ORF and the sum of all the predicted ORF lengths, for each transcript. The authors compare PTI score distributions between transcripts classified as protein-coding and as non-coding in public nucleotide databases, for a wide range of species, including bacteria, archaea, eukaryotes and viruses. They derive from this comparison a measure of the protein-coding potential of transcripts. To validate this approach, the authors evaluated the distributions of Ka/Ks values for transcripts annotated as coding or non-coding, in various classes of PTI-based protein-coding potential. The main finding of the manuscript stems from the comparison among species: the authors find that bacteria and archaea have narrow, non-overlapping PTI distributions for coding and non-coding transcripts, while eukaryotes have broader and more overlapping PTI distributions.
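      The ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: ORFs are called on the forward strand only, each ORF starts at the first ATG after the previous in-frame stop codon, ORFs shorter than a hypothetical `min_codons` cutoff are ignored, and a transcript with no ORF scores 0. The helper names `find_orfs` and `pti_score` are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return lengths (nt, stop codon included) of forward-strand ATG-to-stop ORFs.

    Assumption: within each frame an ORF starts at the first ATG after the
    previous stop codon, so nested in-frame ATGs are not counted separately.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length >= 3 * min_codons:
                    lengths.append(length)
                start = None
    return lengths


def pti_score(seq, min_codons=10):
    """Largest ORF length divided by the summed length of all ORFs."""
    lengths = find_orfs(seq, min_codons)
    if not lengths:
        return 0.0  # assumption: transcripts without ORFs score 0
    return max(lengths) / sum(lengths)
```

      Under this definition, a transcript dominated by one long ORF scores close to 1, while a transcript carrying k ORFs of similar length scores close to 1/k.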

      Major comments

      • The authors provide no rationale for using the PTI score to measure the protein-coding potential of transcripts. The only attempt to justify this measure is given in the methods: "The definition of PTI score is motivated by our hypothetical concept that translation of pPTI is limited by alternate competing sPTIs." (lines 426-427, page 20). What the PTI score measures is the dominance of the largest predicted ORF over the other predicted ORFs, in terms of length. It is not clear why there would be competition for translation of putative ORFs for genuine protein-coding transcripts. An alternative hypothesis, briefly touched upon in the discussion (lines 318-320), is that translation of non-functional ORFs could give rise to the production of toxic proteins, in addition to being costly in terms of energy. The authors should provide the reasoning behind the PTI score and should explain the biological mechanisms that may underlie differences between coding and non-coding transcripts.
      • The presence of ORFs in transcripts has long been used as a predictor of their protein-coding potential. For example, the ORF size and the ORF coverage are part of the set of predictors implemented in CPAT (Wang et al., 2013). The PTI score is necessarily related to these methods, yet no comparison is provided. If the PTI score is to be used as a measure to classify transcripts as coding or non-coding, its performance should be compared to other classifiers, including those that use the presence of ORFs as a predictor (e.g., CPAT) but not only (e.g., PhyloCSF, based on the pattern of sequence evolution).
      • The authors compare the observed PTI score distributions with the PTI scores from random or shuffled sequences. They conclude that the PTI scores do not depend on transcript lengths but on transcript sequences (lines 122-123). However, this is not true for non-coding RNAs, for which the observed and randomized distributions are very similar. The relationship between transcript length and PTI scores should be analyzed in more detail. Are the annotated non-coding transcripts with high PTI scores particular in terms of length?
      • The authors discuss in depth the correlation between PTI scores and PTI-based protein-coding potential measures (e.g., section "PTI scores correlate with protein-coding potential in humans and mice", starting line 125; section "Relationship between the PTI score and protein-coding potential", starting line 243). Given that the protein-coding potential is directly derived from the PTI score distributions for coding and non-coding transcripts, it is not surprising that the two should be correlated. The significance of observing a linear or a sigmoid relationship is not clearly explained.
      • The authors use the entire set of annotated coding and non-coding transcripts to assess the distribution of PTI scores and to define the protein-coding potential. Traditionally, for methods that aim to classify transcripts as coding or non-coding, this is done using "bona fide" coding and non-coding transcripts, which are used as training sets. The efficiency of the method can then be evaluated using a test set of transcripts. This aspect is lacking here and should be implemented.
      • The comparisons among species are likely biased by the quality of lncRNA annotations in non-model organisms - cf. the high variation among primates, which is likely driven by annotation quality and depth.
      • The differences among bacteria, archaea and eukaryotes should be discussed in more depth. In bacteria, the genuine ORF is well defined by the presence of translation signals (e.g., Shine-Dalgarno sequence). Other factors are also at work in both prokaryotes and eukaryotes, including RNA secondary structures. The relationship between these factors and the PTI score should be discussed.
      • From an evolutionary perspective, the effective population size (Ne) is also likely related to the "quality" of the ORFs. An analysis of Ne vs. the PTI score distributions would be an interesting addition to this manuscript.
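      The benchmarking requested in the comments above (comparison with other predictors, and evaluation on transcripts not used to set the cutoff) could be organized roughly as follows. This is a hedged sketch: scores for CPAT, PhyloCSF, or ORF coverage would come from running those tools separately, the balanced-accuracy cutoff rule is one reasonable choice among several, and `holdout_split`, `fit_threshold`, and `evaluate` are hypothetical helpers. Each item is assumed to be a `(score, is_coding)` pair for a bona fide transcript.

```python
import random


def roc_auc(coding_scores, noncoding_scores):
    """Mann-Whitney form of ROC AUC: P(random coding score > random non-coding score).

    Ties count as half a win; the O(n*m) loop is fine for a sketch only.
    """
    wins = 0.0
    for c in coding_scores:
        for n in noncoding_scores:
            wins += 1.0 if c > n else 0.5 if c == n else 0.0
    return wins / (len(coding_scores) * len(noncoding_scores))


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle items (e.g. (score, is_coding) pairs) and split off a test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


def fit_threshold(train):
    """Choose the score cutoff that maximizes balanced accuracy on training data."""
    coding = [s for s, y in train if y]
    noncoding = [s for s, y in train if not y]

    def balanced_accuracy(cut):
        sensitivity = sum(s >= cut for s in coding) / len(coding)
        specificity = sum(s < cut for s in noncoding) / len(noncoding)
        return (sensitivity + specificity) / 2

    # Candidate cutoffs are the observed training scores; max keeps the first best.
    return max(sorted({s for s, _ in train}), key=balanced_accuracy)


def evaluate(test, cut):
    """Held-out AUC (threshold-free) and accuracy at the training-set cutoff."""
    coding = [s for s, y in test if y]
    noncoding = [s for s, y in test if not y]
    accuracy = sum((s >= cut) == y for s, y in test) / len(test)
    return roc_auc(coding, noncoding), accuracy
```

      Running `evaluate` on the same held-out transcripts for each predictor gives directly comparable AUCs; because the AUC is threshold-free, it separates ranking quality from the choice of cutoff.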

      Significance

      This manuscript is lacking in novelty and is not well positioned in the field. If the aim of this work is to provide a method to classify transcripts as coding or non-coding, the authors should provide detailed comparisons with existing methods (see above). If the aim is to understand what defines a genuine protein-coding transcript, then the biological mechanisms should be better described and the comparisons among species and among functional categories of genes should be further developed. The idea of using the "dominance" of the largest ORF compared to the other predicted ORFs is interesting, and provides a new element compared to existing methods that rely exclusively on ORF length and ORF coverage. I would recommend that the authors develop this idea further and discuss the advantages of using the ORF dominance compared to just the ORF length or coverage.

      Referee Cross-commenting

      I fully agree with Reviewer 2's remarks. In particular, adding ribosome profiling analyses is an excellent idea and could substantially improve the manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2021-01041 Corresponding author(s): Gregory P. Way, PhD

      1. General Statements

      On behalf of the authors, I’d like to thank the Review Commons team for sending our manuscript out for review. I’d also like to thank the three anonymous reviewers for providing valuable feedback that will improve the clarity, focus, and analysis interpretation presented in our manuscript.

      To orient the editorial team, our paper provides two well-controlled innovations:

      We are the first to train variational autoencoders (VAEs) on classical image features extracted from Cell Painting images. VAEs are commonplace in, and have contributed major discoveries to, other biomedical data types (e.g. transcriptomics), but they have been underexplored in morphology data. In our paper, we trained and optimized three different VAE variants using Cell Painting readouts and compared these variants against shuffled data, against PCA (a linear dimensionality reduction algorithm commonly used as a VAE control), and against L1000 (mRNA) readouts from the same perturbations. We found that cell morphology VAEs train with different settings than gene expression data, and that they generate interpretable latent spaces that depend on the chosen VAE variant.

      We tested special VAE properties to predict polypharmacology cell states in a novel way. Polypharmacology is a major reason why drugs fail to reach the bedside. Off-target effects cause unintended toxicity, and lead to adverse clinical events. In our paper, we used VAE latent space arithmetic (LSA) to predict polypharmacology cell states; in other words, what cells might look like if we perturbed them with a compound that had two mechanisms of action (MOA). We compared our results to shuffled data, PCA, and to LSA performed with VAEs trained using L1000 readouts. We found that cell morphology and gene expression provide complementary information, and that we could predict some polypharmacology cell states robustly, while others were more difficult to predict.
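      As a rough illustration of latent space arithmetic, the sketch below averages the encoded profiles of each treatment group, combines them as A + B - control, and measures the L2 distance to the observed A ∩ B centroid. The additive formulation, the use of group means, and the function names are assumptions for illustration; the manuscript's exact procedure may differ, and in practice the prediction would also be decoded back to feature space with the trained VAE decoder (not shown).

```python
import numpy as np


def lsa_predict(z_a, z_b, z_control):
    """Predict the latent centroid of the A ∩ B combination.

    Assumed formulation: mean(A) + mean(B) - mean(control). Each input is an
    (n_samples, latent_dim) array of encoded profiles for one treatment group.
    """
    return z_a.mean(axis=0) + z_b.mean(axis=0) - z_control.mean(axis=0)


def latent_l2(prediction, z_ab):
    """L2 distance between the prediction and the observed A ∩ B centroid."""
    return float(np.linalg.norm(prediction - z_ab.mean(axis=0)))
```

      Subtracting the control centroid keeps the shared baseline from being counted twice, since both single-MOA centroids already contain it.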

      We found value in all of the reviewer comments. We intend to conduct all but four of the proposed analyses to supplement our aforementioned innovations.

      In the following revision plan, we include all reviewer comments exactly as they were written. The reviewers often had overlapping suggestions. In these cases, we grouped together similar reviewer comments and responded to them once.

      We include three sections: 1) A description of the revisions we plan to conduct in the near future; 2) A description of changes we have already made; and 3) A description and rationale of changes we will not pursue.

      Lastly, we would like to highlight that all reviewers provided positive feedback in their reviews. They discussed our paper as “conceptually and technically unique” and were positive about our methods section, stating that we did a “good job making everything available and reproducible”. Our methods section is complete, and we provide a fully reproducible and versioned GitHub repository. We will release a second version of our GitHub repository when we complete our revision plan, to keep the submitted and peer-reviewed versions clearly distinguishable.

      2. Description of the planned revisions

      2.1. Address UMAP interpretability to provide a deeper description of MOA performance

      Reviewer 1: Instead of using UMAP embedding, it would be better to compare reconstruction error or show a reconstructed image with the original image to claim that models reliably approximate the underlying morphology data.

      Reviewer 1: Rather than just stating that the VAE's did not span the original data distribution and saying beta-VAE performed best by eye, some simple metrics can be drawn to analyze the overlap in data for a more direct and quantified comparison. Researchers should also explain what part of the data is not being captured here. Some analysis of what the original uncaptured UMAP represents is important in understanding the limitations of the VAEs' capacity.

      Reviewer 2: The authors compare generation performance based on UMAP. In the UMAP space, data tend to cluster together even though they might be far from each other in the feature space. I would like to see more quantitive metrics on how well these methods capture morphology distributions. You can compute metrics like MMD distance, kullback leibler (KL), earthmoving distance, or a simple classifier trained on actual MoA classes tested on generated data.

      We agree with the reviewers that evaluating reconstruction loss in addition to providing the UMAP coordinates would improve understanding of VAE limitations and enable a better comparison of VAE performance. We will analyze reconstruction loss across models and include these data as a new supplementary figure, which will enable direct comparisons across models and across different MOAs.
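      A minimal sketch of the per-MOA reconstruction loss summary, assuming `x` holds the input feature profiles, `x_hat` the VAE reconstructions of the same rows, and `moa_labels` one MOA label per row. Mean squared error is used here as an illustrative loss and may not match each model's exact training objective; the function name is hypothetical.

```python
import numpy as np


def per_moa_reconstruction_loss(x, x_hat, moa_labels):
    """Average per-sample MSE within each MOA.

    x, x_hat: (n_samples, n_features) arrays of inputs and reconstructions.
    moa_labels: one MOA label per row.
    """
    per_sample = ((x - x_hat) ** 2).mean(axis=1)  # MSE of each profile
    labels = np.asarray(moa_labels)
    return {
        str(moa): float(per_sample[labels == moa].mean())
        for moa in np.unique(labels)
    }
```

      Computing this dictionary once per model (vanilla VAE, beta-VAE, MMD-VAE, PCA) yields a table that can be compared across models and MOAs.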

      We also agree that UMAP interpretation can be misleading. While currently state-of-the-art, UMAP has mathematical limitations that prevent interpretation of global data structures. However, there are emerging tools, including a new dimensionality reduction algorithm, called PaCMAP, which aims to preserve both local and global structure (Wang et al, 2021). We will explore this tool to determine, both mathematically and empirically, which of the two is more appropriate for our dataset by cross-referencing the visualization with our added supplementary figure describing per-MOA reconstruction loss.

      We would also like to emphasize that we trained our VAEs using CellProfiler readouts from Cell Painting images and not the raw Cell Painting images themselves. As this was one of our primary innovations, this detail is extremely important. Therefore, we have improved clarity and added emphasis to this point in the manuscript introduction and discussion (see section 3).

      2.2. More specific comparisons of MOA predictions to shuffled data and improved description of MOA label accuracy

      Reviewer 1: It is difficult to know the clear threshold for successful performance is on figures like Figure 7 and SFigure 9, but by and large, it appears that the majority of predicted combination MOAs were not successful. Without the ability to either A) adequately predict most all combinations from individual profiles that were used in training or B) an explanation prior to analysis of which combination will be able to predict, it is difficult to see this method being used since the combinatorial predictions are more likely not informative.

      Reviewer 1: The researchers justify the poor performance compared to shuffled data, by saying that A) MOA annotations are noisy and unreliable and B) they MOAs may only manifest in other modalities like what was seen in the L1000 vs morphology predictability. While these might be true, knowing this the researchers should make an effort to clean and de-noise their data and select MOAs that are well-known and reliable, as well as, selecting MOAs for which we have a known morphological or genetic reaction.

      Reviewer 3: Figure 6 is missing error bars (standard deviation of the L2 distance) and, as such, is hard to draw conclusions from.

      We thank the reviewers for raising this concern. We agree that it is critical, and we appreciate the opportunity to address it.

      All three of these comments relate to being unable to draw conclusions from our results when most A∩B predictions appear to have no difference from shuffled controls. Therefore, to address this comment, we will update our LSA evaluation to compare each MOA to a matched set of randomly shuffled data. Specifically, we identified a methodological flaw in how we display the shuffled data. We should be comparing specific MOA combinations to their corresponding shuffled results instead of comparing all to all, which artificially decreases performance when some polypharmacology predictions fail to recapitulate the ground truth cell states.
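      The matched comparison could be summarized as an empirical p-value per MOA combination: the fraction of shuffled-data runs for that same combination whose L2 distance to the ground truth is at least as small as the real model's. The sketch assumes the L2 distances have already been computed; the dictionary layout and function names are hypothetical.

```python
import numpy as np


def matched_null_pvalue(observed_l2, null_l2s):
    """Empirical p-value for one MOA combination.

    observed_l2: L2 distance between the real model's prediction and ground truth.
    null_l2s: L2 distances from shuffled-data runs for the same combination.
    Smaller distances are better, so we count null runs that do at least as well.
    """
    null = np.asarray(null_l2s, dtype=float)
    return float((np.sum(null <= observed_l2) + 1) / (null.size + 1))


def evaluate_combinations(observed, null):
    """observed: {combo: L2}; null: {combo: [L2, ...]} keyed by the same combos."""
    return {combo: matched_null_pvalue(d, null[combo]) for combo, d in observed.items()}
```

      The +1 in numerator and denominator keeps the p-value from being exactly zero when no shuffled run matches the real model.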

      We have connected with Paul Clemons, Senior Director of Computational Chemical Biology Research at the Broad Institute of MIT and Harvard, who informed us that the Drug Repurposing Hub annotations are among the best documented. Therefore, while we know that biological annotations are often incomplete, our original text overemphasized the amount of noise contributed by inaccurate labels. We therefore added the following sentence to the discussion to clarify this important point:

      “However, the Drug Repurposing Hub MOA annotations are among the most well-documented resources, so other factors like different dose concentration and non-additive effects contribute to weak LSA performance for some compound combinations (Corsello et al, 2017).”

      We will also update our supplementary figure to account for specific MOA shuffling and include additional text comparing Cell Painting and L1000 showing which MOAs perform best in which modality.

      2.3. More detailed evaluation of MOA performance across drug variance and drug classes

      Reviewer 1: With the small number of combinations that are successfully predicted, to build confidence in the performance, it would be necessary to explain the reason for the differences in performance. Further experimentations should be done looking into any relationship between the type of MOAs (and their features) and the resulting A|B predictability. Looking at Figure 7, the top-performing combinations are comprised entirely of inhibitor MOAs. If the noisiness of the data is a factor, there should be some measurable correlation between feature noisiness and variation and the resulting A|B predictability from LSA.

      We agree with the reviewer that further experimentation would be helpful to gain confidence in our LSA performance. We plan to perform two different analyses to address this question. First, we will compare profile reproducibility (median pairwise correlations among MOAs) to MOA predictability. This will help determine the relationship between MOA measurement variance and performance. Second, we will split MOAs by category (e.g. inhibitor, activator) and test whether there are significant performance differences between categories across VAE models in both L1000 and Cell Painting data. This will tell us whether there are trends in the types of MOAs we are able to predict. If there are, this would be useful knowledge, since it could suggest that certain types of MOAs are associated with a more consistent cell state.
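      A sketch of the first analysis, assuming each MOA's replicate-level profiles are stacked as rows of an array: reproducibility is the median Pearson correlation over all distinct replicate pairs, and its association with predictability across MOAs is summarized by a Spearman correlation (computed on ranks, assuming no ties). How predictability is scored (for example, negative L2 distance) is left as an assumption, and the function names are hypothetical.

```python
import numpy as np


def median_pairwise_correlation(profiles):
    """Median Pearson correlation over all distinct pairs of replicate rows."""
    corr = np.corrcoef(profiles)  # rows are treated as variables
    upper = corr[np.triu_indices_from(corr, k=1)]  # each pair counted once
    return float(np.median(upper))


def spearman(a, b):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

      Applying `median_pairwise_correlation` to each MOA and passing the resulting list, together with per-MOA predictability scores, to `spearman` gives one summary statistic per data modality.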

      2.4. Higher confidence in LSA overfitting assessment

      Reviewer 1: To show that the methodology works well on unseen data, researchers withheld the top 5 performing A|B MOAs (SFig 9) and showed they were still well predicted. This is not the most compelling demonstration since the data to be held out was selected with bias as the top-performing samples. It would be much more interesting to withhold an MOA that was near or only somewhat above the margin of acceptability and see how many holdouts affected the predictability of those more susceptible data points. From my best interpretation, the hold-out experiment also only held out the combination MOA groups from training. It would be better if single MOAs (for example A) which were a part of a combination of MOA (A|B) were also held out to see if predictability suffered as a result and if generalizability did extend to cells with unseen MOAs (not just cells which had already highly performing combinations of seen MOAs).

      We believe our original analysis was extremely compelling. Even when we removed the top MOAs from training, we were still able to capture their combination polypharmacology cell states through LSA. We find this similar to removing all pictures of sunglasses from an image corpus of human faces, but still being able to reliably infer pictures of people wearing sunglasses. Specifically, this tells us that our model is learning some fundamental data-generating function that our top-performing MOAs tap into, regardless of whether or not they are present in training.

      However, we agree with the reviewer that withholding intermediate-performing MOAs would also be informative, but for a separate reason. Unlike the best predicted MOAs, the intermediate MOAs are likely more susceptible to changes in the training data, so it would be interesting to determine if intermediate MOAs’ performance is a result of overfitting instead of truly learning aspects of the data generating function. We plan to perform this new analysis and add the results to Supplementary Figure 8 as a subpanel and add a full description of the approach to the appropriate methods subsection.

      2.5. Additional metrics to evaluate LSA predictions to provide more confident interpretation

      Reviewer 2: The predictions are evaluated using L2 distances, which I find not that informative. I would like to see other metrics (correlation or L1 or distribution distances in previous comments)

      We agree with the reviewer that using more than one metric would be helpful because a single metric often does not tell a complete story. We will add a panel to the LSA supplementary figure (Supplementary Figure 7) using Pearson correlation instead. While L2 distances tell us how close predictions are to the ground truth, Pearson correlations tell us how consistently, on average, we predict the direction of feature changes.
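      The two metrics capture different things, which a toy example makes concrete: a prediction that is exactly half the ground truth has perfect Pearson correlation (every feature moves in the right direction, proportionally) but a nonzero L2 distance (the magnitudes are off). The vectors below are illustrative, not data from the manuscript.

```python
import numpy as np


def l2_distance(pred, truth):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return float(np.linalg.norm(np.asarray(pred) - np.asarray(truth)))


def pearson(pred, truth):
    """Pearson correlation across features: sensitive to direction, not scale."""
    return float(np.corrcoef(pred, truth)[0, 1])


truth = np.array([1.0, -2.0, 3.0, -4.0])
pred = 0.5 * truth  # right direction for every feature, half the magnitude
# pearson(pred, truth) is ~1.0, while l2_distance(pred, truth) is sqrt(7.5), about 2.74
```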

      2.6. Adding a performance-driven feature level analysis to categorize per-feature modeling ability

      Reviewer 2: I would like to see feature-level analysis, which features are well predicted and which ones are more challenging to predict?

      We agree with the reviewer that feature level analysis would be interesting to study. We believe that understanding which features are easy and hard to model could give insight into why certain MOAs (which could be associated with more signal in certain Cell Painting features) are predicted better than others.

      However, we are concerned that it is difficult to have an objective measurement of which features are easier to model, because features that have less variation might be easier to model. So, we will analyze the correlation between individual feature reconstruction loss and feature variance across profiles. We will color-code the points to represent feature groups or channels. This analysis will not only demonstrate the relationship between feature variance and modeling ability, but also provide insight into the difficulty of modeling individual CellProfiler features.
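      A minimal sketch of the planned per-feature analysis, assuming `x` and `x_hat` are (samples × features) arrays of inputs and reconstructions. It returns per-feature mean squared error, per-feature variance, and their Pearson correlation; color-coding by feature group or channel would be a plotting step on top of these arrays and is omitted. The function name is hypothetical.

```python
import numpy as np


def feature_loss_vs_variance(x, x_hat):
    """Per-feature MSE, per-feature variance, and the Pearson r between them.

    x, x_hat: (n_samples, n_features) inputs and reconstructions.
    """
    loss = ((x - x_hat) ** 2).mean(axis=0)  # reconstruction MSE per feature
    variance = x.var(axis=0)                 # population variance per feature
    r = float(np.corrcoef(loss, variance)[0, 1])
    return loss, variance, r
```

      One caveat worth checking: a reconstruction that simply outputs each feature's mean has a per-feature MSE exactly equal to that feature's variance, so normalizing loss by variance helps separate genuinely hard features from merely high-variance ones.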

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      3.1. Documenting positive feedback as provided by the three reviewers

      Reviewer 1: With access to the dataset, the posted GitHub, and documentation in the paper, I believe that the experiments are reproducible.

      Reviewer 1: The experiments are adequately replicated statistically for conventions of deep learning.

      Reviewer 1: This paper proposes a conceptually and technically unique proposal in terms of application, taking existing technologies of VAEs and LSA and, and as far as I know, uses them in a novel area of application (predicting and simulating combination MOAs for compound treatments). If this work is shown to work more broadly and effectively, is seen through to it completion, and is eventually successfully implemented, it will help to evaluate the effects of drugs used in combination on gene expression and cell morphology. An audience in the realm of biological deep learning applications as well as an audience working in the compound and drug testing would be interested in the results of this paper. Authors successfully place their work within the context of existing literature, referencing the numerous VAE applications that they build off of and fit into the field of (Lafarge et al, 2018; Ternes et al, 2021, etc...), citing the applications of LSA in the computer vision community (Radford et al, 2015, Goldsborough et al, 2017), and discussing the biological context that they are working in (Chandrasekaran et al, 2021).

      Reviewer 2: The main novelty of the work is applying VAEs on cell painting data to predict drug perturbations. The final use case could be guiding experimental design by predicting unseen data. However, the authors do not show such an example and use case which is understandable due to the need for doing further experiments to validate computational results and maybe not the main focus of this paper. The authors did a good job of citing existing methods and relevant. The potential audience could be the computational biology and applied machine learning community.

      Reviewer 3: The manuscript is beautifully written in a crystal clear manner. The authors have made a visible effort towards making their work understandable. The methods section is clear and comprehensive. All experiments are rigorously conducted and the validation procedures are sound. The conclusions of the paper are convincing and most of them are well supported by the data. Both the data and the code required to reproduce this work are freely available. Overall, the article is of high quality and relevance to several scientific communities.

      We thank the reviewers for their encouraging remarks and overall positive sentiment. As early-career researchers, we feel empowered by these words.

      3.2. Moved Figure 2 to supplement and removed Figure 5

      Reviewer 1: Fig 2 is not informative so it can go to supplementary.

      Reviewer 2: I liked the paper's GitHub repo, the authors did a good job making everything available and reproducible. As a suggestion, you can move the learning curves in two the sup figures cause they might not be the most exciting piece of info for the non-technical reader.

      Reviewer 3: I would suggest removing Figure 5 (or moving it to the supplementary) as it revisits the content of Figure 1 and does not bring much extra information.

      We agree that Figure 2 might not be informative to a non-technical reader, so we have accepted this suggestion from Reviewers 1 and 2 and moved Figure 2 to the supplement.

      We agree with the reviewer and have removed Figure 5.

      3.3. Clarified our data source as CellProfiler readouts, not raw Cell Painting images

      Reviewer 1: In Fig 4, it would be useful to show a few sample representative images with respect to CellProfiler feature groups.

      Reviewer 1: Figure 6, what does it means original input space? Does it mean raw pixel image? As researchers extracted CellProfiler feature groups already, it would be interesting to compare mean L2 distance based on CellProfiler features so that whether VAE improves performance or not (compared to handcrafted features) as a baseline.

      Reviewer 3: While what "morphological readouts" concretely mean becomes clearer later on in the paper, it would be useful to give a couple of examples early on when introducing the considered datasets.

      We thank the reviewers for these suggestions, which highlight a common source of confusion that we need to resolve. We are working with CellProfiler readouts (features extracted using classical algorithms) of the Cell Painting images and not the images themselves. We have made several edits throughout the manuscript to improve clarity and remove this confusion, including in the introduction, where we clearly state our model input data:

      “Because of the success of VAEs on these various datasets, we sought to determine if VAEs could also be trained using cell morphology readouts (rather than directly on images), and further, to carry out arithmetic to predict novel treatment outcomes. We derive the cell morphology readouts using CellProfiler (McQuin et al, 2018), which measures the size, structure, texture, and intensity of cells, and use these readouts to train all models.”

      This decision comes with a tradeoff: CellProfiler readouts are more manageable than images, but some information may be lost. We discuss this important tradeoff more thoroughly in the discussion section:

      “We determined that VAEs can be trained on cell morphology readouts rather than directly using the cell images from which they were derived. This decision comes with various trade-offs. Compared to cell images, cell morphology readouts as extracted by image analysis tools (e.g. CellProfiler) are a more manageable data type; the data are smaller, easier to distribute, substantially less expensive to analyze and store, and faster to train (McQuin et al, 2018). However, it is likely some biological information is lost, because these tools might fail to measure all morphology signals. The so-called image-based profiling pipeline also loses information, by nature of aggregating inherently single-cell data to bulk consensus signatures (Caicedo et al, 2017).”

      3.4. Clarified future directions to infer cell health readouts from simulated polypharmacology cell states

      Reviewer 1: Authors also make the claim that they can infer toxicity and simulate the mechanism of how two compounds might react. This is a claim that would not be supported even if the method were able to successfully predict morphology or gene profiles. Drug interaction and toxicity are quite complex and goes beyond just morphology and expression. VAEs predicting a small set of features would not be able to capture information beyond the readouts, especially when dealing with potentially unseen compounds for which toxicity is not yet known. For example, two compounds might produce a morphology that appears similar to other safe compounds but has other factors that contribute to toxicity. Further, here they show no evidence of toxicity or interaction analysis.

      The reviewer is correct that such a claim is unsupported by our research. Our message was actually that inferring toxicity could be a potential future application of our work. Specifically, for example, we can apply orthogonal models of cell toxicity that we previously derived using other data (Way et al, 2021a) to our inferred polypharmacology cell states. We thank this reviewer for noticing our lack of clarity, and we have made changes in the discussion to make it clear that inferring toxicity is something we may do in the future and not something we demonstrate in this manuscript:

      “In the future, by predicting cell states of inferred polypharmacology, we can also infer toxicity using orthogonal models (e.g. Way et al. 2021) and simulate the mechanisms of how two compounds might interact.”

      3.5. Clarified our method of splitting data, and noting how a future analysis will answer overfitting extent

      Reviewer 2: Could authors outline detailed data splits? Which MoA are in train and which are held out from training? As I understood, there were samples from MoAs that were supposed to be predicted in the calculation of LSA? Generally, the predicted MoA should not be seen during training and not in LSA calculation.

      We now more explicitly detail how we split our data in the methods:

      “As input into our machine learning models, we split the data into an 80% training, 10% validation, and 10% test set, stratified by plate for Cell Painting and stratified by cell line for L1000. In effect, this procedure evenly distributes compounds and MOAs across data splits.”
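      The split described in the quoted methods text can be sketched as a within-stratum shuffle, using plate IDs as strata for Cell Painting or cell line IDs for L1000. This is a minimal illustration, not the repository's code; the function name is hypothetical and the rounding rule assigns any remainder to the test set. scikit-learn's `train_test_split` with its `stratify` argument, applied twice, is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(sample_ids, strata, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split samples into train/validation/test, shuffling within each stratum."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sid, stratum in zip(sample_ids, strata):
        groups[stratum].append(sid)  # e.g. stratum = plate ID or cell line
    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]  # remainder goes to test
    return train, val, test
```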

      We also thank the reviewer for this comment, because it expresses an important concern about making sure that we are not overfitting to the data. We have explained in the manuscript that, because of limited data, MOAs were repeated in training and LSA. However, we believe overfitting is not playing a large role in model performance. Through our hold-5-out experiment, we show that our models predict the same MOAs irrespective of whether they were in the training data, indicating that we did not overfit to the distribution of certain MOAs.

      Reviewer 1 also suggested that we do the hold 5 out experiment on A∩Bs that were barely predicted. After we do that, we will explicitly demonstrate the extent of overfitting.

      3.6. Introduced acronyms when they first appear in the manuscript

      Reviewer 3: The Kullback-Leibler divergence is properly introduced in the methods part, but not at all in the introduction (it directly appears as "the KL divergence"). To enhance readability, it would be better to fully spell it before using the acronym, and maybe give a one-sentence intuition of what it is about before pointing out to the methods part for more details.

      We thank the reviewer for bringing this to our attention. We have carefully reviewed the entire manuscript and now spell out every acronym at its first use.

      3.7. Fixed minor text changes

      Reviewer 3: In Figure 1, I would recommend changing "compression algorithms" to "dimension reduction algorithm" or "embedding algorithm". In a compression setting, I would expect the focus to be on the number of bits of information each method requires (or the dimension of the resulting embedding) to encode the data while guaranteeing a certain quality threshold. This is obviously not the case here, as the dimension of the embedding is fixed and the focus is on exploring how the embedding is constructed (e.g., how much it decorrelates the different features), which may be misleading.

      Reviewer 3: I recommend using "A ∩ B" or "A & B" or "(A, B)" to denote the combination of two independent modes of action A and B. The current notation "A | B" overloads the statistical "A given B", which appears in the VAE loss, and is therefore misleading.

      We agree with the reviewer and aim to minimize all potential sources of confusion. We have made this change in Figure 1.

      We also agree that our current notation can be confusing. We have replaced all instances of “A|B” with “A ∩ B”.

      3.8. Added hypothesis of MMD-VAE oscillations to supplementary figure legend

      Reviewer 3: Do the authors have a hypothesis of what may be causing MMD-VAE to oscillate during validation when data are shuffled? This seems to be the case on two of the three considered datasets (Figure 2 and SuppFigure 1) and is not observed for the other models. Including a few sentences on that in the text would be interesting.

      We believe a major reason is that the optimal MMD-VAE had a much higher regularization weight than the optimal β-VAE or vanilla VAE, which places greater emphasis on forming normally distributed latent distributions. Forcing the VAE to encode a shuffled distribution into a normally distributed latent distribution would be difficult to do consistently across different randomly shuffled data subsets, and might therefore cause oscillations in the training curve across epochs when the penalty for that term is high. As these observations may interest some readers, we have incorporated this explanation into the supplementary figure legend (where this figure is shown):

      “Forcing the VAE to consistently encode a shuffled distribution into a normally distributed latent distribution would be difficult, and therefore might cause oscillations in the training curve across epochs.”

      3.9. Explained our selection of VAE variants

      Reviewer 3: The different types of considered VAE and their differences are very clearly introduced. It may however be good to motivate a bit more the focus on beta-VAE and MMD-VAE among all the possible VAE models. This is partly done through examples in the second paragraph of page 2, but could be elaborated further.

      We thank the reviewer for their encouraging remarks. We have edited the manuscript’s introduction to explain why we chose these two variants out of all possible choices:

      “We trained vanilla-VAEs, β-VAEs, and MMD-VAEs only, and not other VAE variants and other generative model architectures, such as generative adversarial networks (GANs), because these VAE variants are known to facilitate latent space interpretability.”

      4. Description of analyses that authors prefer not to carry out

      4.1. We will not explore additional latent space dimensions in more detail, as this is out of scope

      Reviewer 1: As both reconstructed and simulated data did not span the full original data distribution, it might be better to look at reconstruction error and increase the dimension of latent space.

      We thank the reviewer for bringing up this important point. Our VAE loss function is the sum of the reconstruction error and a form of KL divergence. Specifically, the reviewer is suggesting that if we only minimize reconstruction error (or place more weight on reconstruction than on the KL divergence by lowering beta), a higher latent dimension would result in better overall reconstruction. This is true, but doing so would have negative consequences. While the UMAPs might then show the full data distribution, the UMAPs are not our focus; predicting polypharmacology through LSA is. We found that placing more weight on the reconstruction term leads to more feature entanglement, as indicated by lower performance when simulating data and by overlapping feature contributions across latent features. Because simulating data logically requires less disentanglement than performing LSA, LSA requires stronger regularization (and hence less emphasis on reconstruction) than the optimum we found for simulating data.

      Essentially, while the reviewer's suggestions would improve reconstruction and the UMAPs, they would likely worsen LSA performance, which is the main focus of the project. Moreover, increasing the latent dimension without changing beta would likely have little to no effect: because beta encourages disentanglement, the newly added dimensions would have little variation and encode little information that was not already captured.
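      To make the trade-off concrete, here is a minimal sketch of a β-VAE objective for a single profile, assuming a Gaussian encoder with diagonal covariance, a standard normal prior, and mean squared error as the reconstruction term (these choices and the function names are illustrative assumptions, not the authors' implementation). In this setting the KL divergence has a closed form, and raising `beta` increases the weight of the KL term relative to reconstruction; `beta=1` recovers the vanilla VAE objective.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)):
    0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(x, x_hat):
    """Mean squared reconstruction error (an assumed choice of reconstruction loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction term plus beta-weighted KL term."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


# Toy values for one profile and its 2-D latent encoding (hypothetical).
x = [0.2, -0.1, 0.5]
x_hat = [0.1, 0.0, 0.4]
mu, log_var = [1.0, 0.0], [0.0, 0.0]
print(beta_vae_loss(x, x_hat, mu, log_var, beta=1.0))
print(beta_vae_loss(x, x_hat, mu, log_var, beta=4.0))
```

      With `beta=4.0`, the same encoding incurs a KL penalty four times larger, which pushes training toward latent codes closer to the prior at the expense of reconstruction.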

      We have also previously explored varying the latent space dimensionality in a separate project (Way et al, 2020). We are very interested in this area of research in general, and any additional analyses (beyond hyperparameter optimization) deserve a much deeper dive than we can provide in this paper.

      Lastly, we intend to include a deeper description and analysis of reconstruction loss across models, datasets, and MOAs, as suggested by a previous reviewer comment (see section 2.1 above).

      4.2. We will not review Gaussian distribution assumptions of the VAE as we feel it is not informative

      Reviewer 1: Looking at SFigure 6, I am wondering whether the latent distribution actually follows a Gaussian distribution (an assumption of the VAE). It may show a skewed distribution, as some latent features show low contribution.

      This reviewer’s comment is interesting, but we do not believe it would change the findings of our study. Even if the latent dimensions were not normally distributed, this would not change much, because a Gaussian distribution is not the most critical property for performing LSA. LSA requires a disentangled latent code, and normally distributed latent features do not necessarily imply good disentanglement (see https://towardsdatascience.com/what-a-disentangled-net-we-weave-representation-learning-in-vaes-pt-1-9e5dbc205bd1).

      4.3. In this paper, we will not train or compare conditional VAEs nor cycle GANs

      Reviewer 2: While the authors provided a comparison between the vanilla VAE, MMD-VAE, and β-VAE, there are other methods capable of similar tasks (data simulation, counterfactual prediction). I would like to see a comparison with methods such as the conditional VAE (https://papers.nips.cc/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html; CVAE + MMD: https://academic.oup.com/bioinformatics/article/36/Supplement_2/i610/6055927?login=true) or cycle GANs (https://arxiv.org/abs/1703.10593).

      While such comparisons would be interesting, they are not the main focus of the manuscript, which is to benchmark the use of VAEs in cell morphology readouts and to predict polypharmacology.

      We think that a CVAE would not be appropriate for our study. In a CVAE, the encoder and decoder are both conditioned on some variable. Because we are predicting the cell states of different MOAs, it would make the most sense to condition on the MOA. However, because we use the MOA labels in our LSA experiment, conditioning on them would likely bias our results and would not be effective for MOAs outside the conditioning set.

      In a separate study in our lab, we have found cycle GANs extremely difficult to train on these data. This work is not yet published; once we better understand CycleGAN behavior on these data, a separate paper will be needed to compare performance and dissect model properties in much greater detail.

      Nevertheless, we have added citations to multi-modal approaches like cycle GANs (see section 4.4) as they will point a reader to useful resources for future directions.

      4.4. We will not be comparing with multi-modal integration, but we clarified our focus on Cell Painting VAE novelty and added multi-modal citations

      Reviewer 1: Researchers found that the optimal VAE architectures were very different between morphology and gene expression, suggesting that the lessons learned training gene expression VAEs might not necessarily translate to morphology. It would be interesting to compare the result with multimodal integration as a baseline (e.g., Seurat).

      Our focus in this paper was to train and benchmark different variational autoencoder (VAE) architectures using Cell Painting data and to demonstrate an important, unsolved application, predicting polypharmacology, that we show is now possible for a subset of compounds. Comparing Cell Painting VAE performance with L1000 VAE performance was a natural and useful extension, especially since our dataset contained equivalent drug perturbations. We feel that any extension to multi-modal data integration would distract from the Cell Painting VAE novelty and would require a much deeper dive, beyond the scope of our current manuscript.

      Additionally, there have been other, more in-depth and very recent multi-modal data integration efforts using the same or similar datasets (Caicedo et al, 2021; Haghighi et al, 2021). In a separate paper that we just recently submitted, we also dive much deeper to answer the question of how the two modalities complement one another in various ways and for various tasks (Way et al, 2021b). These two papers already provide a deeper and more informative exploration of Cell Painting and L1000 data integration.

      Therefore, because multi-modal data integration, while certainly interesting, would distract from the Cell Painting VAE novelty and is redundant with other recent publications, we feel it is beyond the scope of the current paper.

      Nevertheless, multi-modal data integration is important to mention, so we have added it to the discussion. Specifically, we discuss how multi-modal data integration might help with predicting polypharmacology in the future and include pertinent citations so that we, or other readers, can follow up. The new section reads:

      “Because we had access to the same perturbations with L1000 readouts, we were able to compare cell morphology and gene expression results. We found that both models capture complementary information when predicting polypharmacology, which is a similar observation to recent work comparing the different technologies’ information content (Way et al, 2021b). We did not explore multi-modal data integration in this project; this has been explored in more detail in other recent publications (Caicedo et al, 2021; Haghighi et al, 2021). However, using multi-modal data integration with models like CycleGAN or other style transfer algorithms might provide more confidence in our ability to predict polypharmacology in the future (Zhu et al, 2017).”

      5. References

      Caicedo JC, Cooper S, Heigwer F, Warchal S, Qiu P, Molnar C, Vasilevich AS, Barry JD, Bansal HS, Kraus O, et al (2017) Data-analysis strategies for image-based cell profiling. Nat Methods 14: 849–863

      Caicedo JC, Moshkov N, Becker T, Yang K, Horvath P, Dancik V, Wagner BK, Clemons PA, Singh S & Carpenter AE (2021) Predicting compound activity from phenotypic profiles and chemical structures. bioRxiv: 2020.12.15.422887

      Corsello SM, Bittker JA, Liu Z, Gould J, McCarren P, Hirschman JE, Johnston SE, Vrcic A, Wong B, Khan M, et al (2017) The Drug Repurposing Hub: a next-generation drug library and information resource. Nat Med 23: 405–408

      Haghighi M, Singh S, Caicedo J & Carpenter A (2021) High-Dimensional Gene Expression and Morphology Profiles of Cells across 28,000 Genetic and Chemical Perturbations. bioRxiv: 2021.09.08.459417

      McQuin C, Goodman A, Chernyshev V, Kamentsky L, Cimini BA, Karhohs KW, Doan M, Ding L, Rafelski SM, Thirstrup D, et al (2018) CellProfiler 3.0: Next-generation image processing for biology. PLoS Biol 16: e2005970

      Wang Y, Huang H, Rudin C & Shaposhnik Y (2021) Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. J Mach Learn Res 22: 1–73

      Way GP, Kost-Alimova M, Shibue T, Harrington WF, Gill S, Piccioni F, Becker T, Shafqat-Abbasi H, Hahn WC, Carpenter AE, et al (2021a) Predicting cell health phenotypes using image-based morphology profiling. Mol Biol Cell 32: 995–1005

      Way GP, Natoli T, Adeboye A, Litichevskiy L, Yang A, Lu X, Caicedo JC, Cimini BA, Karhohs K, Logan DJ, et al (2021b) Morphology and gene expression profiling provide complementary information for mapping cell state. bioRxiv: 2021.10.21.465335

      Way GP, Zietz M, Rubinetti V, Himmelstein DS & Greene CS (2020) Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol 21: 109

      Zhu J-Y, Park T, Isola P & Efros AA (2017) Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv: 1703.10593 [cs.CV]

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors explore the use of VAEs to learn low-dimensional representations of morphological features of cells. They demonstrate that the representations learned by the different VAE models accurately model the distribution of features in real data and can be complemented by other biological readouts such as gene expression. Additionally, the structure of the learned feature space appears to be sufficient to generate accurate predictions relying on latent space arithmetic, for instance allowing one to predict the morphology of samples subjected to two perturbations from the morphology of samples affected by each of these perturbations in isolation.

      Comments:

      The manuscript is beautifully written in a crystal clear manner. The authors have made a visible effort towards making their work understandable. The methods section is clear and comprehensive. All experiments are rigorously conducted and the validation procedures are sound. The conclusions of the paper are convincing and most of them are well supported by the data. Both the data and the code required to reproduce this work are freely available.

      Overall, the article is of high quality and relevance to several scientific communities. I only have a couple of minor comments that I think could help improve it further:

      • The Kullback-Leibler divergence is properly introduced in the methods section, but not at all in the introduction (it appears directly as "the KL divergence"). To enhance readability, it would be better to spell it out in full before using the acronym, and perhaps give a one-sentence intuition of what it is about before pointing to the methods section for more details.
      • While what "morphological readouts" concretely mean becomes clearer later on in the paper, it would be useful to give a couple of examples early on when introducing the considered datasets.
      • The different types of considered VAE and their differences are very clearly introduced. It may however be good to motivate a bit more the focus on beta-VAE and MMD-VAE among all the possible VAE models. This is partly done through examples in the second paragraph of page 2, but could be elaborated further.
      • In Figure 1, I would recommend changing "compression algorithms" to "dimension reduction algorithm" or "embedding algorithm". In a compression setting, I would expect the focus to be on the number of bits of information each method requires (or the dimension of the resulting embedding) to encode the data while guaranteeing a certain quality threshold. This is obviously not the case here, as the dimension of the embedding is fixed and the focus is on exploring how the embedding is constructed (e.g., how much it decorrelates the different features), which may be misleading.
      • Do the authors have a hypothesis of what may be causing MMD-VAE to oscillate during validation when data are shuffled? This seems to be the case on two of the three considered datasets (Figure 2 and SuppFigure 1) and is not observed for the other models. Including a few sentences on that in the text would be interesting.
      • I recommend using "A ∩ B" or "A & B" or "(A, B)" to denote the combination of two independent modes of action A and B. The current notation "A | B" overloads the statistical "A given B", which appears in the VAE loss, and is therefore misleading.
      • I would suggest removing Figure 5 (or moving it to the supplementary) as it revisits the content of Figure 1 and does not bring much extra information.
      • Figure 6 is missing error bars (standard deviation of the L2 distance) and, as such, is hard to draw conclusions from.

      Significance

      Nature and significance:

      This work does not hold new conceptual or technical contributions per se, as it focuses on showcasing the use of existing techniques established in other fields (e.g., latent space arithmetic in natural image processing) for biological data analysis. That said, popularizing successful methodologies beyond the scientific community where they were developed, as done in this work, is immensely valuable. As such, the approach presented in the paper is likely to inspire and enable many other studies and is therefore a significant contribution (especially so thanks to the code availability!).

      Comparison to existing published knowledge:

      While many published works use VAEs on biological data, I am not aware of existing ones that study the relative merit of the representations obtained with different VAE models, as done here, and explore their use in a generative setting with latent space arithmetic. As such, this work is novel and distinguishes itself from existing published knowledge.

      Audience:

      This work is likely to be of interest to life scientists with an enthusiasm for state-of-the-art data analysis techniques. Because the paper is clearly written and makes very few assumptions of prior expert knowledge, it is also likely to be a good entry point to the wider VAE/generative models literature for non-experts. I also believe that this manuscript can be of interest to computer scientists and machine learning researchers as it presents a concrete example of the use of published methods in the context of biological data analysis.

      My expertise:

      Computer vision and machine learning. I do not feel qualified to assess the clinical relevance of this work.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Ler et al. propose a series of VAE-based methods to predict compound polypharmacology for Cell Painting data. They first learn a latent space and try to answer the following counterfactual:

      How would the cell morphology or gene expression of a cell perturbed with drug A change if it were perturbed with drugs A and B (A+B), given that we have measurements for drug A and drug B? They address the problem by performing latent space arithmetic (LSA) and decoding the predicted morphology measurements. They first train different VAE models and compare training stability and simulation performance by sampling from the latent space. They further analyze the learned latent space to map (deconvolve) latent features back to the feature space. I like the application of LSA+VAE to Cell Painting datasets, which is the main novelty of the paper. However, I have some major comments and concerns:

      Major comments:

      While the authors provided a comparison between the vanilla VAE, MMD-VAE, and β-VAE, there are other methods capable of similar tasks (data simulation, counterfactual prediction). I would like to see a comparison with methods such as the conditional VAE (https://papers.nips.cc/paper/2015/hash/8d55a249e6baa5c06772297520da2051-Abstract.html; CVAE + MMD: https://academic.oup.com/bioinformatics/article/36/Supplement_2/i610/6055927?login=true) or cycle GANs (https://arxiv.org/abs/1703.10593). The authors compare generation performance based on UMAP. In UMAP space, data tend to cluster together even though they might be far from each other in the feature space. I would like to see more quantitative metrics of how well these methods capture morphology distributions. You can compute metrics such as the MMD distance, Kullback-Leibler (KL) divergence, earth mover's distance, or a simple classifier trained on actual MoA classes and tested on generated data.
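      A minimal sketch of two of the suggested distribution metrics, assuming profiles are plain lists of floats: a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel, and the one-dimensional earth mover's (Wasserstein-1) distance between two equally sized samples of a single feature. The kernel bandwidth `sigma=1.0` and the toy data are illustrative assumptions; in practice the bandwidth is often chosen with a median heuristic.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(p, q):
        return sum(rbf(a, b, sigma) for a in p for b in q) / (len(p) * len(q))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def wasserstein_1d(u, v):
    """Earth mover's distance between two equally sized 1-D samples:
    the mean absolute difference of their sorted values."""
    if len(u) != len(v):
        raise ValueError("this simple form assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


# Toy 2-D "profiles" (hypothetical values).
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
generated = [[0.1, 0.0], [0.9, 0.1], [0.0, 1.1]]
shifted = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mmd2_biased(real, generated) < mmd2_biased(real, shifted))  # True
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```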

      I find the L2 distances used to evaluate the predictions not very informative. I would like to see other metrics (correlation, L1, or the distribution distances mentioned in the previous comment). I would also like to see a feature-level analysis: which features are well predicted, and which are more challenging to predict?
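      A minimal sketch of the alternative evaluation metrics suggested here, together with a per-feature error ranking, assuming each profile is a list of floats aligned with a list of feature names (the feature names and values below are hypothetical). The standard deviation of the per-sample L2 distances is the kind of quantity that could also serve as error bars.

```python
import math
import statistics


def l1(a, b):
    """Manhattan distance between two profiles."""
    return sum(abs(u - v) for u, v in zip(a, b))


def l2(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def pearson(a, b):
    """Pearson correlation between two profiles (assumes non-constant inputs)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    norm = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return cov / norm


def per_feature_mae(predicted, observed, feature_names):
    """Mean absolute error per feature, sorted from best to worst predicted."""
    n = len(predicted)
    errors = {
        name: sum(abs(p[j] - o[j]) for p, o in zip(predicted, observed)) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(errors.items(), key=lambda item: item[1])


features = ["feature_a", "feature_b", "feature_c"]
predicted = [[0.1, 0.5, -0.2], [0.3, 0.4, 0.0]]
observed = [[0.0, 0.5, 0.2], [0.2, 0.4, -0.4]]

distances = [l2(p, o) for p, o in zip(predicted, observed)]
print(statistics.mean(distances), statistics.stdev(distances))
ranking = per_feature_mae(predicted, observed, features)
print(ranking)
```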

      • Could the authors outline the data splits in detail? Which MoAs are in the training set and which are held out from training? As I understood, samples from the MoAs that were to be predicted were used in the LSA calculation? Generally, the predicted MoA should not be seen during training or in the LSA calculation.

      Minor comments:

      I liked the paper's GitHub repo; the authors did a good job making everything available and reproducible. As a suggestion, you could move the learning curves to the supplementary figures, because they might not be the most exciting information for non-technical readers.

      Significance

      The main novelty of the work is applying VAEs to Cell Painting data to predict drug perturbations. The final use case could be guiding experimental design by predicting unseen data. However, the authors do not show such a use case, which is understandable, since validating the computational results would require further experiments and may not be the main focus of this paper.

      • The authors did a good job of citing existing methods and relevant
      • The potential audience could be the computational biology and applied machine learning community.
      • My expertise is in computational biology and machine learning.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Researchers used two primary data modalities (L1000 gene expression data and Cell Painting morphology features) for cells perturbed by a series of compounds, each with labeled (individual and combination) mechanisms of action. Using several VAEs and ML methods, they evaluated their ability to encode interpretable latent spaces (evaluated by subtracting ±3 standard deviations and checking the contribution of features to the latent space) and to adequately reconstruct the input data. Using the constructed latent spaces and labeled MOAs, the researchers performed latent space arithmetic to remove base DMSO features and add features of individual MOAs to produce the features of combination MOAs (evaluated by the significance of the difference from shuffled data). The researchers found that the MMD-VAE encoded the most information and that VAEs successfully simulated morphology and gene expression features. They found that the optimal VAE architectures were very different between morphology and gene expression. The researchers also found that VAEs were able to use individual MOA profiles to simulate some combination MOA profiles, with varied success.
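      A minimal sketch of the latent space arithmetic summarized above, assuming per-sample latent vectors are already available for DMSO, for each single MOA, and for the combination (all values below are toy numbers, and decoding back to feature space is omitted). Each single-MOA effect is measured relative to DMSO, so adding both effects to the DMSO baseline gives z_A + z_B - z_DMSO; the distance to the observed combination can then be compared with the same distance computed on shuffled data.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def lsa_predict(z_a, z_b, z_dmso):
    """Latent space arithmetic: (z_a - z_dmso) + (z_b - z_dmso) + z_dmso,
    i.e. both single-MOA effects added to the DMSO baseline once."""
    return [a + b - d for a, b, d in zip(z_a, z_b, z_dmso)]


def l2(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy 2-D latent codes (hypothetical values, not data from the study).
z_dmso = mean_vector([[0.0, 0.0], [0.2, 0.0]])
z_a = mean_vector([[1.0, 0.0], [1.2, 0.2]])
z_b = mean_vector([[0.1, 1.0], [0.1, 1.2]])
z_ab_observed = mean_vector([[1.2, 1.2], [1.2, 1.0]])

z_ab_predicted = lsa_predict(z_a, z_b, z_dmso)
print(z_ab_predicted, l2(z_ab_predicted, z_ab_observed))
```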

      Comments:

      • Researchers found that the optimal VAE architectures were very different between morphology and gene expression, suggesting that the lessons learned training gene expression VAEs might not necessarily translate to morphology. It would be interesting to compare the result with multimodal integration as a baseline (e.g., Seurat).

      -Instead of using UMAP embedding, it would be better to compare reconstruction error or show a reconstructed image with the original image to claim that models reliably approximate the underlying morphology data. As both reconstructed and simulated data did not span the full original data distribution, it might be better to look at reconstruction error and increase the dimension of latent space.

      -Fig 2 is not informative, so it can go to the supplementary material.

      -In Fig 4, it would be useful to show a few representative sample images with respect to CellProfiler feature groups.

      -Looking at SFigure 6, I am wondering whether the latent distribution actually follows a Gaussian distribution (an assumption of the VAE). It may show a skewed distribution, as some latent features show low contribution.

      -In Figure 6, what does "original input space" mean? Does it mean the raw pixel image? As the researchers already extracted CellProfiler feature groups, it would be interesting to compare the mean L2 distance based on CellProfiler features as a baseline, to see whether the VAE improves performance compared to handcrafted features.

      -It is difficult to know what the threshold for successful performance is in figures like Figure 7 and SFigure 9, but by and large, it appears that the majority of predicted combination MOAs were not successful. Without the ability either A) to adequately predict nearly all combinations from individual profiles that were used in training or B) to explain, prior to analysis, which combinations can be predicted, it is difficult to see this method being used, since the combinatorial predictions are more likely than not uninformative.
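      One way to make the success threshold explicit, sketched under the assumption that each LSA prediction's L2 error is compared with a set of errors obtained from shuffled data (the helper name and numbers are hypothetical, and this is not necessarily the procedure used in the manuscript): an empirical p-value is the fraction of shuffled-baseline errors that are at least as small as the observed error, with a +1 correction so that it is never exactly zero.

```python
def empirical_p_value(observed_error, shuffled_errors):
    """Fraction of shuffled-baseline errors <= the observed error,
    with the standard +1 correction (lower error is better)."""
    at_least_as_good = sum(e <= observed_error for e in shuffled_errors)
    return (at_least_as_good + 1) / (len(shuffled_errors) + 1)


# Hypothetical L2 errors from nine shuffled baselines.
shuffled = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.5, 1.95]
print(empirical_p_value(1.5, shuffled))  # 0.1
print(empirical_p_value(2.15, shuffled))
```

      With only nine shuffles the smallest attainable p-value is 0.1, so a meaningful threshold requires many more shuffled baselines.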

      -The authors also claim that they can infer toxicity and simulate the mechanism by which two compounds might interact. This claim would not be supported even if the method were able to successfully predict morphology or gene expression profiles. Drug interaction and toxicity are quite complex and go beyond morphology and expression. VAEs predicting a small set of features would not be able to capture information beyond the readouts, especially when dealing with potentially unseen compounds whose toxicity is not yet known. For example, two compounds might produce a morphology that appears similar to that of other safe compounds but have other factors that contribute to toxicity. Further, the authors show no evidence of toxicity or interaction analysis here.

      -The researchers justify the poor performance relative to shuffled data by saying that A) MOA annotations are noisy and unreliable and B) the MOAs may manifest only in other modalities, as seen in the L1000 vs. morphology predictability. While these might be true, knowing this, the researchers should make an effort to clean and de-noise their data and select MOAs that are well known and reliable, as well as MOAs for which there is a known morphological or genetic response.

      -Given the small number of combinations that are successfully predicted, building confidence in the performance requires explaining the reasons for the differences in performance. Further experiments should examine any relationship between the type of MOAs (and their features) and the resulting A|B predictability. Looking at Figure 7, the top-performing combinations consist entirely of inhibitor MOAs. If the noisiness of the data is a factor, there should be some measurable correlation between feature noisiness and variation and the resulting A|B predictability from LSA.

      -To show that the methodology works well on unseen data, the researchers withheld the top 5 performing A|B MOAs (SFig 9) and showed they were still well predicted. This is not the most compelling demonstration, since the held-out data were selected with bias as the top-performing samples. It would be much more interesting to withhold an MOA that was near or only somewhat above the margin of acceptability and see how holding it out affects the predictability of those more susceptible data points. From my best interpretation, the hold-out experiment also held out only the combination MOA groups from training. It would be better if single MOAs (for example A) that are part of a combination MOA (A|B) were also held out, to see whether predictability suffers as a result and whether generalizability extends to cells with unseen MOAs (not just cells with already highly performing combinations of seen MOAs).
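      A minimal sketch of the stricter hold-out proposed here, assuming each sample's MOA annotation is stored as a set of MOA names (a hypothetical data layout; `holdout_filter` is a hypothetical helper): samples annotated with the held-out combination are always removed from training, and with `drop_singles=True` every sample annotated with either constituent MOA is removed as well.

```python
def holdout_filter(samples, moa_a, moa_b, drop_singles=False):
    """Return training samples with the A+B combination held out.

    If drop_singles is True, any sample annotated with A or B is also
    removed, so neither constituent MOA is seen during training."""
    combo = {moa_a, moa_b}
    kept = []
    for sample in samples:
        moas = sample["moas"]
        if combo <= moas:
            continue  # combination sample: always held out
        if drop_singles and moas & combo:
            continue  # shares a constituent MOA with the combination
        kept.append(sample)
    return kept


samples = [
    {"id": 1, "moas": {"A"}},
    {"id": 2, "moas": {"B"}},
    {"id": 3, "moas": {"A", "B"}},
    {"id": 4, "moas": {"C"}},
    {"id": 5, "moas": {"A", "C"}},
]
print([s["id"] for s in holdout_filter(samples, "A", "B")])  # [1, 2, 4, 5]
print([s["id"] for s in holdout_filter(samples, "A", "B", drop_singles=True)])  # [4]
```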

      -Rather than just stating that the VAEs did not span the original data distribution and that the beta-VAE performed best by eye, the researchers could compute some simple metrics of data overlap for a more direct, quantified comparison. The researchers should also explain which part of the data is not being captured. Some analysis of what the uncaptured regions of the original UMAP represent is important for understanding the limitations of the VAEs' capacity.

      -My suggestions are realistic and feasible. The recommended tests and validations would cost no additional money (beyond researcher labor and re-training on the existing GPUs), as my recommendations are simply further analysis and training on the same data. Time would depend on the time required to train the VAE models, but since 2-layer VAEs are relatively small by deep learning standards, the time to train and analyze through existing pipelines should be minimal. This is confirmed by their GitHub code, where Jupyter notebooks show that models can be trained in a few minutes.

      -With access to the dataset, the posted GitHub, and documentation in the paper, I believe that the experiments are reproducible.

      -The experiments are adequately replicated statistically for conventions of deep learning.

      Significance

      My background of expertise is developing and applying deep learning and VAEs applied to single cell imaging and expression data. There is no part of this paper that I do not have sufficient expertise to evaluate.

      This paper makes a conceptually and technically unique proposal in terms of application: it takes the existing technologies of VAEs and LSA and, as far as I know, uses them in a novel area of application (predicting and simulating combination MOAs for compound treatments). If this approach is shown to work more broadly and effectively, is seen through to its completion, and is eventually implemented successfully, it will help to evaluate the effects of drugs used in combination on gene expression and cell morphology. An audience in biological deep learning applications, as well as an audience working in compound and drug testing, would be interested in the results of this paper. The authors successfully place their work within the context of existing literature, referencing the numerous VAE applications that they build on and fit into (Lafarge et al, 2018; Ternes et al, 2021, etc.), citing applications of LSA in the computer vision community (Radford et al, 2015; Goldsborough et al, 2017), and discussing the biological context in which they are working (Chandrasekaran et al, 2021).

  3. Oct 2021
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): **Summary:** The manuscript submitted by Djekidel et al. entitled "CovidExpress: an interactive portal for intuitive investigation on SARS-CoV-2 related transcriptomes" reports on a new web portal to search and analyze RNAseq data related to SARS-CoV-2 infections. The authors downloaded and reprocessed data from more than 40 different studies, which are available on the web portal along with all available metadata. The web portal allows users to perform numerous differential expression and gene set enrichment analyses on the data and provides publication-ready figures. Because of batch effects that could not be removed, the authors do not currently recommend analyzing data across studies. The authors conclude that the web portal is unique and will allow scientists to rapidly analyze gene expression signatures related to SARS-CoV-2 infections, with the potential to make new discoveries.

      **Major comments:** Based on the scientific literature, the web portal seems to be an unprecedented resource to search and analyze SARS-CoV-2-related RNAseq data and as such would certainly be a useful resource for the SARS-CoV-2 scientific community. By providing use cases, the authors argue that new discoveries are possible using their web portal. However, the section detailing the analyses the authors performed to generate new hypotheses about genes potentially relevant in SARS-CoV-2 infections is very difficult to follow and, without more guidance, very difficult to reproduce with the web portal. Unless more information is provided, it would require substantial expert knowledge in RNAseq data analysis. It also seems that the key candidate genes identified by their analyses have all been studied or identified as related to SARS-CoV-2 infections, so it is somewhat unclear whether new hypotheses can be generated by reanalyzing RNAseq datasets, especially because the authors currently do not recommend combining data from different studies. The manuscript would benefit from providing fewer use cases but, for each of them, more information on how the portal was used, which studies were used to generate them, and which findings were not described in the publications of those studies. Some observations in the manuscript are not substantiated by significance calculations (see below). In places, the English grammar should be improved.

      We thank the reviewer for the positive comments. We suppose the reviewer concluded that substantial expert knowledge in RNAseq data analysis was needed because video tutorials were lacking. We have now posted several video tutorials, and more will be added over time in response to user feedback. We believe this will help ease the reviewer’s concern.

      Regarding whether new hypotheses can be generated, we apologize if this was not clear. For all the case studies and our section “CovidExpress Reveals Insights and Potential Discoveries”, our portal provided information not reported in the original publications, as listed below:

      1. Case study #1: The original publication employed a multi-omics approach to find predictor genes distinguishing ICU from non-ICU patients. However, it is not obvious which genes were selected mainly because of their expression levels and which might have been selected because of other data they included (e.g., mass spectrometry data). Our portal allows users to quickly check the expression levels of these genes, which shows that SESN2 does not have strong expression differences.
      2. Case study #2: We replaced this case study with one on bacterial-susceptibility genes to show that such questions can be quickly asked and answered using our portal. Such an investigation has not been reported before.
      3. FURIN’s function has been well linked to SARS-CoV-2. However, all the reports we could find focused on the furin cleavage sites of SARS-CoV-2 or on whether FURIN is expressed in SARS-CoV-2-sensitive tissues. That SARS-CoV-2 infection can upregulate FURIN expression has never been reported before, and the study that published the data did not mention FURIN at all. We made this discovery simply by using the CovidExpress portal to find the differentially expressed genes and overlapping them with the literature-based gene list (Supplementary Table S2). We believe users can make more discoveries by selecting different data.
      4. A PubMed search for OASL AND "SARS-CoV-2" returns only 5 results, indicating that OASL is under-studied, and none of them indicated that OASL is upregulated both in SARS-CoV-2-infected human lung and in Rhinovirus-infected human nasal samples. We are not certain that we have correctly understood the reviewer’s suggestion of “fewer use cases.” Therefore, rather than removing any use cases, we provided more details to help users understand which discoveries not reported by the original studies we made using CovidExpress, and how we made them.

      Finally, the manuscript has undergone substantial scientific editing to improve the grammar.

      **Minor comments:** Page 6, last sentence: The statement in this sentence is very much what one would expect. It remains unclear whether the authors mean this as a result that validates the processing of the RNAseq data or as a new discovery. Please clarify.

      We apologize for the confusion. We intended this statement to be a result confirming what we had expected. We have now amended the text to make this point clearer.

      Figure 3A: The violin plots are so tiny that it is impossible to see any trends. It is also difficult to understand which categories should be compared with each other. If there is anything significant to observe, please add a statistical test and better guide the reader.

      We agree with the reviewer; therefore, we have removed this figure from the paper. The goal of this figure was to demonstrate how to use violin plots for exploratory analysis; however, in this case, the violin plot did not show a clear trend. By using more filtering and other plots (e.g., Figure 3B-C), we believe we now provide better insight.

      Figure 3C: A legend for the color scale is missing. The signal (I guess expression levels) for SESN2 seems very weak and the same between ICU and non-ICU samples. What is the basis for assigning this gene to the group of genes upregulated in ICU samples? Also, contrary to what the authors state on page 8, SESN2 does not seem to be highly expressed in ICU samples; however, without knowing what the colors represent (fold changes or absolute expression values?), this is somewhat speculative.

      We thank the reviewer for bringing this to our attention. We have now added a legend for the color scale in the revised figure. In Figures 3A-C, we are showcasing how an exploratory analysis can be performed using CovidExpress. As an example, we investigated the expression of the top 20 genes identified by the random forest classifier of Overmyer et al., 2021, as predictors of ICU and non-ICU cases. In the original Overmyer et al. paper, only the general performance metrics of the models are presented (Fig. 6c-g), but the authors do not show the expression patterns of the top predictors. Hence, we demonstrate how CovidExpress can be used to further investigate some questions not explored in the original paper. SESN2 was listed as a top predictor; however, its expression did not vary between ICU and non-ICU samples, as was also observed by the reviewer. We suspect SESN2 was a top predictor due to other data the Overmyer et al. paper included, such as mass spectrometry data. Our statement about SESN2 was not accurately reflected in the figure; therefore, we have rewritten this section to make it clearer.

      Page 9, first sentence: Please specify what you mean by "starting list". Furthermore, in this paragraph, how do your results compare with the results of the study that you reanalyze here?

      We thank the reviewer for the question. By “starting list,” we meant the top genes from the Overmyer et al., 2021, article as predictors of ICU and non-ICU cases. We have now rewritten this section to make it clearer. We did not expect our results to differ from their data. Our goal was to ask which of their top predictors (by multi-omics data) show a difference in gene expression. When we downloaded their TPM values from their GEO records, the values were very similar overall (see below).

      Figure 3F: Please add labels to your axes. Also, is there a particular reason why, in a correlation plot like this one, the x and y axes are not shown with the same range, and why does the y axis not start at 0?

      We thank the reviewer for this helpful comment. Our reasoning for presenting the figure in this way is that different genes can have very different expression levels but still be correlated. For example, if gene A is expressed at 1, 5, and 10 in samples 1, 2, and 3, while gene B is expressed at 100, 500, and 1000 in samples 1, 2, and 3, their ranges would be very different, but they would still be perfectly correlated (see panel A below). If we draw the x- and y-axes using the same range, this correlation will not be visually obvious (see panel B below).
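      The point about axis ranges can be checked numerically. The following is a minimal sketch using the toy values from the reply above and a hand-written Pearson correlation; the function name `pearson_r` is our own and is not part of the CovidExpress portal.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values from the reply: expression of gene A and gene B in samples 1-3.
gene_a = [1, 5, 10]
gene_b = [100, 500, 1000]

r = pearson_r(gene_a, gene_b)
# Gene B is exactly 100 x gene A, so r is 1 up to floating-point rounding,
# even though the two value ranges differ by two orders of magnitude.
print(f"Pearson r = {r:.6f}")
print(f"gene A spans {min(gene_a)}-{max(gene_a)}; gene B spans {min(gene_b)}-{max(gene_b)}")
```

      Because Pearson correlation is unchanged when either variable is multiplied by a positive constant, the correlation is perfect; on a shared 0-1000 axis, however, all three gene A points would sit within the first 1% of the x range.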

      This comparison is different from the correlation plots that compare the expression of one gene in different samples. We apologize for the confusion and to avoid misleading readers, we have enlarged the gene names in the Figure labels to ensure that readers notice their differences. We have also added an option to the correlation plot on our portal so that users can choose the optimal format (see below).

      Page 9, second-to-last sentence: It remains unclear which kind of analysis the authors intend to do here and what the starting question is. Please try to rewrite with fewer technical terms (i.e., what do you mean by "precalculated contrasts"?). In line with this, it remains unclear what Figure 3I is supposed to show. Please provide some more information for readers who are not RNAseq analysis experts.

      We thank the reviewer for this suggestion. To avoid any misleading claims, we followed Reviewer #2’s suggestion and replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171) to showcase how to identify experiments in which this gene signature is enriched or depleted. We also replaced the related figures and text with new results and rewrote this section to avoid using technical terms.
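      To make the idea of scoring a gene signature against "precalculated contrasts" concrete, the sketch below assumes that a contrast is a precomputed set of log2 fold changes for one comparison (e.g., infected vs. mock) within one study. It uses a deliberately simple rank-percentile score as a stand-in; it is not ssGSEA or the portal's actual enrichment method, and the function name, contrast names, gene names, and values are all hypothetical.

```python
def signature_shift(lfc, signature):
    """Simplified signature score for one precomputed contrast (illustrative only).

    lfc: dict gene -> log2 fold change for the contrast (needs >= 2 genes).
    signature: iterable of gene symbols (e.g., a filtered KEGG gene list).
    Genes are ranked from most down- to most upregulated and mapped to
    percentiles in [0, 1]; the score is the mean percentile of the signature
    genes minus 0.5. Positive: shifted toward upregulation (enriched);
    negative: shifted toward downregulation (depleted).
    """
    ranked = sorted(lfc, key=lambda g: lfc[g])
    n = len(ranked)
    percentile = {g: i / (n - 1) for i, g in enumerate(ranked)}
    hits = [percentile[g] for g in signature if g in percentile]
    if not hits:
        return 0.0
    return sum(hits) / len(hits) - 0.5


# Hypothetical contrasts and genes, for illustration only.
contrasts = {
    "contrast_1": {"G1": 3.0, "G2": 2.0, "G3": 0.0, "G4": -1.0, "G5": -2.0},
    "contrast_2": {"G1": -3.0, "G2": -2.0, "G3": 0.0, "G4": 1.0, "G5": 2.0},
}
signature = {"G1", "G2"}
for name, lfc in contrasts.items():
    print(name, signature_shift(lfc, signature))
```

      Scoring every contrast this way and sorting by the score gives a quick list of experiments in which a signature tends toward up- or downregulation; a real analysis would also need a significance estimate, which this sketch omits.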

      Figure 3J is somewhat confusing. Why is the mean expression range indicated from 0 to 1 and why are all genes apparently having a mean expression of 1?

      We thank the reviewer for this question. Because the levels of expression of different genes can vary greatly, in Figure 3J (new Figure 3A and 3I), we normalized the mean expression levels of the genes to their maximum values across groups to improve the visualization. We have now made this clearer in the figure, legend, and text.
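      The per-gene scaling described above can be sketched as follows, assuming the group means have already been computed. The gene names and values are hypothetical. The example also shows why every gene reaches 1 in at least one group: each gene's highest group mean is divided by itself.

```python
def normalize_to_max(group_means):
    """Scale each gene's per-group mean expression by its maximum across groups.

    group_means: dict mapping gene -> dict mapping group -> mean expression.
    Returns the same structure with values in [0, 1]; the group with the
    highest mean for a gene gets 1.0.
    """
    scaled = {}
    for gene, means in group_means.items():
        peak = max(means.values())
        # Guard against genes with zero expression in every group.
        scaled[gene] = {g: (m / peak if peak > 0 else 0.0) for g, m in means.items()}
    return scaled


# Hypothetical mean expression values (not taken from the paper).
example = {
    "GENE_X": {"ICU": 2.0, "non-ICU": 1.0},
    "GENE_Y": {"ICU": 300.0, "non-ICU": 600.0},
}
print(normalize_to_max(example))
# {'GENE_X': {'ICU': 1.0, 'non-ICU': 0.5}, 'GENE_Y': {'ICU': 0.5, 'non-ICU': 1.0}}
```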

      Page 10, lines 5-6: Are you referring to coagulation markers here or to general expression patterns? If the latter, how does this statement fit with the paragraph about analyzing expression patterns of coagulation markers? Please specify. In line with this, are the highlighted genes in Figure 3K coagulation markers? If not, what is their relevance to the point that one can use the portal to investigate the role of coagulation markers in SARS-CoV-2 infections?

      As mentioned above, to avoid any misleading claims, we followed Reviewer #2’s suggestion and replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171). This revision enables us to show how to identify experiments in which this gene signature is enriched or depleted. We have now replaced these figures and text with new results.

      The description of batch effects and of attempts to remove them from the studies on page 10 was somewhat surprising, as I would expect these kinds of results earlier in the results section, before the use cases of the data are described. You may consider changing the order of your results for a better flow.

      We apologize for the confusion. However, we want to make it clear that the analysis before page 10 did not involve “batch effect”; all analyses were performed within each study. Thus, it is not necessary to change the order in which the results are presented. Also, based on Reviewer #2’s comments, we did not accurately use the term “batch effect,” because “batch effects are purely due to technical differences.” We have now revised the corresponding text to make this point clearer.

      Page 11, second paragraph: Please briefly explain what the silhouette score is supposed to reflect and thus how Figure S4G should be interpreted. The difference between the two bars in Figure S4G is very marginal and thus does not seem to support the authors' statement that the ssGSEA scores-based projection is better, unless you perform a significance test or I have misunderstood. Please clarify.

      We thank the reviewer for this suggestion. We have now added an explanation of the silhouette score to the manuscript. Briefly, the silhouette score measures how well each sample's cluster is separated from the nearest other cluster. For a given sample i, let a(i) be the mean intra-cluster distance and b(i) be the mean distance to the nearest cluster. The silhouette score is then calculated as follows:

      $$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

      The silhouette score ranges between -1 and 1. A value near 1 means that the clusters are well separated, and a value near -1 means that the clusters are intermingled. Using a Wilcoxon rank test, we showed that using ssGSEA scores significantly improves the separability of global GTEx tissues (Figure S4G; p=8.75e-26).
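      The silhouette definition can be checked with a small, dependency-free sketch. It assumes Euclidean distance and uses invented 2-D coordinates; the distance metric and projection actually used for Figure S4G are not stated here, and library implementations such as scikit-learn's `silhouette_score` report the mean of the same per-sample quantity.

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)).

    points: list of coordinate tuples (e.g., samples in a 2-D projection).
    labels: cluster label per sample (e.g., tissue); needs >= 2 distinct labels.
    a(i): mean Euclidean distance to the other members of the sample's own cluster.
    b(i): smallest mean distance to the members of any other cluster.
    Samples alone in their cluster get 0 by convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical 2-D coordinates for four samples from two tissues.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_scores(pts, ["lung", "lung", "blood", "blood"])
mixed = silhouette_scores(pts, ["lung", "blood", "lung", "blood"])
print(sum(separated) / 4, sum(mixed) / 4)
```

      With the same coordinates, labelling the two spatial groups consistently gives a mean score close to 1, while interleaving the labels gives a negative mean, which matches the interpretation given above.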

      Page 11, third paragraph: To the best of my understanding, Figure 4B does not support the claim that samples clustered less by study cohort with the ssGSEA approach. Please quantify the effect and test for significance, or explain it better.

      We apologize for the confusion. We quantified the separability between cohorts (GSE ids) by using the silhouette score. In Figure S4H (panel A below), we show that the TPM-based PCA leads to more separation by studies than does the Covid contrast ssGSEA scores in which the separation between studies is less prominent (p-value=0.0045, paired Wilcoxon test).
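      The paired comparison described above can be sketched as follows, assuming the pairs are per-sample silhouette scores (computed with study labels) under the TPM-based PCA and under the ssGSEA-based projection. This is a self-contained signed-rank implementation with a normal approximation, written for illustration; we do not know which implementation (for example, SciPy's `scipy.stats.wilcoxon`) or options were used for the reported p-value, and the scores below are invented.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test using a normal approximation.

    Zero differences are dropped and tied |differences| receive average ranks.
    No tie or continuity correction is applied to the variance, so p-values
    can differ slightly from library implementations, especially for small n.
    Returns (W_plus, p_value), where W_plus is the rank sum of positive x - y.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| from smallest to largest, averaging ranks over ties (1-based).
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return w_plus, p_value


# Hypothetical per-sample silhouette scores (study labels) under two projections.
tpm_pca = [0.60, 0.50, 0.70, 0.40, 0.55, 0.65, 0.45, 0.50]
ssgsea = [0.20, 0.30, 0.10, 0.25, 0.15, 0.35, 0.20, 0.10]
w, p = wilcoxon_signed_rank(tpm_pca, ssgsea)
print(w, round(p, 4))
```

      A large W_plus means the first set of scores tends to be higher, i.e., in this toy example samples separate more by study under the TPM-based projection.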

      For the analyses described starting on page 12 it remains largely unclear whether they were conducted across studies or within studies and which studies were used. This section until the end of the results would especially benefit from providing more information on how the analyses were performed, either in the results or in the methods section.

      We apologize for the confusion. The goal of the analysis on page 12 and the corresponding Figure 4G was to identify genes whose expression increased in both SARS-CoV-2-infected lung and rhinovirus-infected nasal tissue. Hence, we performed a log2(fold-change) vs. log2(fold-change) comparison. The log2(fold-change) values were calculated independently for each study. Because we compared values using the same ranking metric, the cross-study comparison shown in Figure 4G was possible. We have now added more details to the Methods section to clarify this point.
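      Because each log2 fold change is computed within a single study, values from two studies can be placed on the same scatter plot without mixing raw expression values across studies. The sketch below illustrates that logic and the colour categories used in Figure 4G (both, SARS-CoV-2 only, Rhinovirus only). The pseudocount, the 2-fold threshold, and the function names are our assumptions; a real analysis would use a differential-expression model with FDR control rather than a plain ratio of means.

```python
import math


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of mean expression (infected vs. control) within ONE study."""
    mean_inf = sum(infected) / len(infected)
    mean_ctl = sum(control) / len(control)
    return math.log2((mean_inf + pseudocount) / (mean_ctl + pseudocount))


def classify(lfc_sars, lfc_rv, threshold=1.0):
    """Colour category for a gene given its log2FC in each study.

    threshold=1.0 means at least 2-fold upregulation. Significance (FDR) is
    ignored here for brevity; a real analysis would require it as well.
    """
    up_sars = lfc_sars >= threshold
    up_rv = lfc_rv >= threshold
    if up_sars and up_rv:
        return "both"
    if up_sars:
        return "SARS-CoV-2 only"
    if up_rv:
        return "Rhinovirus only"
    return "not up in either"


# Hypothetical expression values for one gene in two independent studies.
lfc_lung = log2_fold_change(infected=[15.0, 15.0], control=[3.0, 3.0])  # SARS-CoV-2 lung study
lfc_nasal = log2_fold_change(infected=[7.0, 7.0], control=[3.0, 3.0])  # Rhinovirus nasal study
print(lfc_lung, lfc_nasal, classify(lfc_lung, lfc_nasal))
```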

      Figures 4J and 4K are missing axis labels, and since we are looking at correlations, the figures could be redrawn using the same ranges on the x and y axes.

      We thank the reviewer for this suggestion. We have now added axis labels to the new figures. However, we have not used the same range on the x and y axes because they depict the expression levels of different genes. For example, if gene A is expressed at 1, 5, and 10 in samples 1, 2, and 3, while gene B is expressed at 100, 500, and 1000 in samples 1, 2, and 3, their ranges would be very different, but they would still be perfectly correlated (panel A below). If we draw the x and y axes using the same range, this correlation will not be visually obvious (panel B below).

      This comparison is different from the correlation plots that compare the expression of one gene in different samples. We apologize for the confusion and to avoid misleading readers, we have enlarged the gene names in Figure labels to ensure that readers notice they are different genes. We have also added an option to the correlation plot on our portal so that users can choose the optimal format (see below).

      Page 14, line 5: Is Figure 4G the right figure reference here? If so, it is unclear how Figure 4G supports the statement in this sentence. Please clarify.

      We apologize for the confusion. In Figure 4G, we labeled several important genes and used different colors to indicate whether each gene was regulated by SARS-CoV-2 only (purple), Rhinovirus only (black), or both (red). FURIN was significantly upregulated only by SARS-CoV-2. The data in Figure 4G were from GSE160435 (“SARS-CoV-2 infection of primary human lung epithelium for COVID-19 modeling and drug discovery”); that study used lung organoid alveolar type 2 (AT2) cells as the model. We think this confusion arose because we did not provide details about the GSE160435 study. We have now amended the manuscript to include these details in the Methods section. We also enlarged the gene labels in the figure to make them more visible. In the manuscript, we have changed “our results found FURIN gene was also upregulated in SARS-CoV-2–infected lung organoid alveolar type 2 cells (Figure 4G, Supplementary Table S3).” to “We found that FURIN was upregulated in SARS-CoV-2-infected lung organoid alveolar type 2 cells (Figure 4G, Supplementary Table S4) (Mulay, Konda et al., 2021), it has reported that TGF-β signaling could also regulates FURIN (Blanchette, Rivard et al., 2001). Our gene enrichment analysis also found TGF-β signaling enriched only for up-regulated genes in SARS-CoV-2-infected lung cells (FDR correct p=7.58E-05, Supplementary Table S4), these observations implicated a positive feedback mechanism only for SARS-CoV-2-infected lung but not RV-infected nasal cells.”

      The resolution of Figure 2 is too low; many details cannot be read. Please provide a higher-resolution figure.

      We apologize for the inconvenience. However, we did not expect readers to read the details in Figure 2, as it is only an overview of the CovidExpress portal. The aim is to give readers an impression of the functions that CovidExpress offers.

      Reviewer #1 (Significance (Required)):

      Providing a single platform for the analysis of SARS-CoV-2-related RNAseq data is certainly of high value to the scientific community. However, as the portal and manuscript are currently presented, scientists who are not RNAseq analysis specialists would need more guidance to understand and correctly use the functionalities of the portal. Unfortunately, because batch effects could not be removed from the studies, the authors correctly do not recommend combining data from different studies for analyses; however, this will likely also limit the potential of the resource to make new discoveries beyond what the original studies have already published. As indicated above, the authors could support their claim by comparing their findings with the published findings of the studies they reanalyzed. The portal is only of use to scientists studying SARS-CoV-2. I am not an expert in RNAseq data analysis and thus cannot comment on the technicalities, especially the processing of the RNAseq datasets.

      We thank the reviewer for the positive comments. We apologize for the confusion and acknowledge that we should not have described our effort using the term “batch effect.” As described by Reviewer #2 (and we agree), batch effect should be used only to indicate a purely technical difference in the same biological system; for example, differences between experiments performed on different days or by different lab personnel. Thus, CovidExpress cannot correct for “batch effect.” We hope the reviewer will appreciate that what we did was to correct for the effects caused by differences in software and parameters across studies. For example, with our approach, the DEGs from GSE155518 and GSE160435 (both primary lung alveolar AT2 cells, and both from Mulay et al., Cell Reports, 2021) were significantly correlated (panel A below; p = 1.36e-24, F-test). However, when we downloaded the TPM values from their GEO records, GSE155518 appeared to show a genome-wide decrease in expression in SARS-CoV-2–infected samples (panel B below). We suspect this is because their data processing also included the expression of the virus itself. Thus, using the processed data directly without carefully reviewing the method might lead to false hypotheses.
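      The suspected mechanism can be illustrated with the standard TPM formula. The sketch below uses invented counts and lengths and assumes, as suspected above but not confirmed, that viral reads were counted as an extra feature during normalization; `VIRAL_GENOME` and the gene names are hypothetical. Because TPM values sum to one million per sample, adding a highly expressed viral feature lowers every host gene's TPM even when its read count is unchanged.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise counts, then scale to 1e6 per sample."""
    rates = {feature: counts[feature] / (lengths_bp[feature] / 1000) for feature in counts}
    total = sum(rates.values())
    return {feature: rate / total * 1e6 for feature, rate in rates.items()}


# Invented counts for one infected sample; lengths in base pairs.
host_counts = {"GENE_1": 1000, "GENE_2": 2000}
host_lengths = {"GENE_1": 1000, "GENE_2": 2000}

host_only = tpm(host_counts, host_lengths)
# Same host counts, plus a roughly genome-sized viral feature with many reads.
with_virus = tpm(
    {**host_counts, "VIRAL_GENOME": 60000},
    {**host_lengths, "VIRAL_GENOME": 30000},
)
print(host_only["GENE_1"], with_virus["GENE_1"])
```

      In this toy case every host gene's TPM halves (500000.0 to 250000.0 for GENE_1), which would look like a genome-wide decrease in infected samples.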

      Finally, researchers can make new discoveries, such as our OASL and FURIN findings, by using the many other features that CovidExpress provides.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Djekidel and colleagues describe a web portal to explore several SARS-CoV-2 related datasets. The authors applied a uniform reprocessing pipeline to the diverse RNA-seq datasets and integrated them into a cellxgene-based interface. The major strengths of the manuscript are the scale of the compiled data, with over one thousand samples included, and the data portal itself, which has useful visualization and analysis functions, including GSEA and DEG analysis. My primary concerns with the study are centered on the analysis examples that are presented and their interpretation, as well as the user interface for the data portal. **Major Comments:**

      1. The literature analysis feels out of place and is not informative (Fig 1E), as the conclusions that can be drawn from literature mining are minimal. As evidence of this, the authors highlight that CRP is a top-studied "gene" and later voice their interest in how CRP is not a differentially expressed gene (pg6). This illustrates the problems with the literature-based analysis, since in the context of COVID-19, CRP is a common blood laboratory measurement that is used as a general marker of inflammation. CRP is transcribed almost exclusively in hepatocytes as an acute phase reactant (see the GTEx portal for a helpful reference), and would therefore not be expected to be found in the various datasets collected by the authors. The one exception might be liver RNA-seq samples from COVID-19 patients, but I do not think these are available in the current collection. I would therefore suggest removing the literature analysis parts from the manuscript.

      We thank the reviewer for sharing knowledge about CRP. As discussed in our manuscript, we agree that not all top genes from literature-based analysis were expected to be included in RNA-seq analysis. We apologize for the confusion, and we have amended our description to make this point clearer. However, we still believe that literature-based analyses are very useful in the following aspects:

      1. This type of analysis bridges the gap between data-driven research and hypothesis-driven research. For example, we found many genes in our meta-analysis, but it is not feasible to describe the functions of all of them. Thus, in Figure 1F, we color-coded genes in red if they also appeared as top genes in the literature-based analysis and read related manuscripts to build confidence that the meta-analysis is useful. Then we expanded our review to more top genes and found more interesting evidence (Supplementary Table S2, “TopGenesbyDifferentialAnalysis” tab).
      2. Literature-based analyses also reduce the time researchers spend prioritizing their investigations. For example, in our comparison of SARS-CoV-2–infected lung and Rhinovirus-infected nasal tissue, we found >2000 genes upregulated only in SARS-CoV-2–infected lung but not in Rhinovirus-infected nasal cells. It is not easy to derive a hypothesis from so many genes. When we overlapped the gene list with the literature-based analysis, FURIN emerged as the most well-studied gene, and we did not find any report mentioning that SARS-CoV-2 can regulate FURIN. This raised our interest and led to a suggested mechanism in which SARS-CoV-2 could evolve to induce FURIN expression and gain superior infectivity. FURIN’s upregulation is significant but not among the top genes, in terms of fold change (>2-fold change, FDR p th by fold change). Thus, without the literature-based analysis, this observation could easily have been overlooked.
      3. Such analyses help researchers to prime their hypotheses for novel findings. For example, in our comparison between SARS-CoV-2–infected lung and Rhinovirus-infected nasal tissues (Figure 4G, Supplementary Figure 5D and E), we found many upregulated genes, but OASL was not in our literature-based analysis, which indicated that it is under-studied and worth highlighting. We hope the reviewer will agree that we should retain the literature-based analysis in our paper. These analyses were not meant to be conclusive but rather a way to prioritize investigations. Finally, we removed CRP from Fig 1E and the main text to avoid confusion.
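      The overlap step described in points 1-3 can be sketched as follows. The function name, gene names, and paper counts are hypothetical, and the sketch assumes the differentially expressed genes have already been filtered for significance. Overlapping genes are ranked by how often they appear in the literature, which can surface a well-studied gene with a modest fold change (as with FURIN). Differentially expressed genes absent from the literature list are ranked by fold change as candidate under-studied genes (as with OASL).

```python
def prioritize(deg_lfc, literature_counts):
    """Split significant DEGs by whether they appear in a literature-derived list.

    deg_lfc: dict gene -> log2 fold change for genes already called significant.
    literature_counts: dict gene -> number of papers mentioning the gene.
    Returns (well_studied, under_studied): overlapping genes sorted by paper
    count (descending), and DEGs absent from the literature list sorted by
    log2 fold change (descending).
    """
    overlap = [g for g in deg_lfc if g in literature_counts]
    well_studied = sorted(overlap, key=lambda g: literature_counts[g], reverse=True)
    missing = [g for g in deg_lfc if g not in literature_counts]
    under_studied = sorted(missing, key=lambda g: deg_lfc[g], reverse=True)
    return well_studied, under_studied


# Hypothetical inputs for illustration only.
deg = {"GENE_A": 4.0, "GENE_B": 1.2, "GENE_C": 3.0}
literature = {"GENE_A": 10, "GENE_B": 250, "GENE_D": 900}
print(prioritize(deg, literature))
```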
      1. The data portal, implemented through cellxgene, is accessible for non-programmers to use. However, it is very easy to end up with an "Unexpected HTTP response 400, BAD REQUEST" error, with essentially no description of the cause of the error or how to rectify it. When this occurs (and in my experience it occurs very frequently), this also forces the user to refresh the page entirely, losing any progress they may have made. I see that the authors describe this error in their FAQ page, but their answer is not very intuitive and I was unsure of what they meant: "This happens because the samples you selected doesn't contain all "Group by" you want compare for each "Split by" group. You could confirm using the "Diff. groups" buttons.".

      We apologize for the confusion. This excellent point made by the reviewer required an improvement in the software engineering, which we have now completed. We have figured out how to avoid this error and have run thorough tests to ensure that it does not appear anymore. We also added a gitter chat channel to our landing page, so that users can report if they encounter this or other errors.

      I would therefore ask that the authors provide more detailed tutorials (ideally step-by-step) on common analyses that users will want to perform, hopefully minimizing the amount of frustration that users will encounter.

      We thank the reviewer for this suggestion. We have uploaded several video tutorials to our landing page and will gradually add more. We also added a gitter chat channel, so users can ask questions, report bugs, or suggest new studies to include in the portal.

      1. Selection of samples is not very quick or intuitive. If I wanted to select only the samples from one specific GEO accession, I had to resort to individually checking the boxes of the sample IDs that I wanted. If I instead selected the GEO accession under the samples source ID, then used the "Subset to currently selected samples" button, I invariably got the HTTP error 400 message. Of course, this may simply reflect my lack of familiarity with cellxgene; I would nevertheless encourage the authors to improve the FAQ to include a step-by-step example of how to do common analyses/procedures.

      We apologize for the confusion. To select an individual GEO accession, users can simply tick the box beside “Samples Source ID.”

      This clears all the boxes under “Samples Source ID,” so that you can select only the one you want. We have also uploaded video tutorials to help users learn how to navigate the portal.

      We apologize for the “HTTP error 400” messages. We found that, because of a back-end caching mechanism, once users encountered this message they would encounter it frequently afterwards. We have now improved the portal on the software-engineering side, and in our recent tests of the latest version, this error no longer appears. We also added a gitter chat channel on our landing page so that users can report this or other errors.

      1. The second case study, centered on coagulation genes, is misguided. Alteration of coagulation lab values in severe COVID-19 patients reflects the general inflammatory state of these patients and would not be expected to manifest at the transcriptional level in infected cells/tissues. Coagulation labs measure the functional status of the coagulation cascade, which is far removed from the direct transcription of the corresponding genes (proteolytic processing of clotting factors, etc.). As with CRP (see above comment), most clotting factors are transcribed almost exclusively in the liver (check the GTEx portal); I would not expect upregulation of coagulation factors in lung cell lines/organoids/cultures etc. after infection with SARS-CoV-2. I would recommend that the authors pick a different gene ontology set for a case study, as the current one focusing on coagulation is confusing in a pathophysiologic sense.

      We thank the reviewer for this suggestion. To avoid any misleading claims, we have replaced the coagulation gene list with a filtered gene list from the “Coronavirus disease - COVID-19” KEGG pathway (hsa05171) to showcase how to identify experiments in which this gene signature is enriched or depleted. We also replaced Figures 3G-J with new results.

      1. The separation into two large clusters (blood-derived samples vs. other tissues) is not surprising, and the authors' interpretation is confusing. The authors write that "the COVID-19 signature was not able to overcome the tissue specificity and that immune cells might respond to SARS-CoV-2 differently." This should be immediately obvious given the pathophysiology of COVID-19 infection; the cell types that are directly infected by SARS-CoV-2 will of course have a distinct response compared to the circulating blood cells of COVID-19 patients, which are responding by mounting an immune response. There is no reason to expect a priori that the DEGs in the directly infected lung cells would be similar to those of immune cells that are mounting a response against the virus.

      We thank the reviewer for these comments. We agree that it should be obvious that directly infected lung cells would differ from immune cells. However, this has never been shown in a large dataset. Also, it is not obvious whether all the other tissues would respond to SARS-CoV-2 differently. Thus, we believe it is important to present this overview. We have amended the description to deliver a clearer message: “This confirmed immune cells respond to SARS-CoV-2 differently from other tissues also suggested the response of most other tissues might sharing similar features.”

      1. The authors devote considerable space in the manuscript to exploring "batch effects" and trying to minimize them (pg10-11 Fig 4A-D, Fig S4). However, given that the compiled datasets are from entirely different experimental and biological systems (e.g. in vitro infection vs patient infection, different cell lines, timepoints after virus exposure, diverse tissues, varying disease severity), it is inappropriate to simply refer to all of these differences as "batch effects" alone. Usually, the term "batch effect" would refer to the same biological experiment/system (i.e. A549 cells infected with CoV vs control), but performed on different days or by different lab personnel - in other words, batch effects are purely due to technical differences. This term clearly does not apply when comparing samples from entirely different cell lines, or tissues, etc, and the authors should not keep describing these differences as batch effects that should be "corrected" out.

      We thank the reviewer for the insight. We apologize for the confusion caused by using the phrase “batch effect correction” to describe our approach. We agree that the difference between studies should not be referred to as a “batch effect correction” and have now amended the descriptions to avoid confusion.

      Indeed, the authors themselves state that the main point of their "batch effect correction" efforts is only for PCA visualization. I therefore feel this section contributes very little to the overall manuscript, especially given the authors' own recommendation that all analyses should be performed on individual datasets (which I certainly agree with). I assume that the authors were required to provide some sort of dimensional reduction projection for the cellxgene browser, but this is more a quirk in their choice of platform for the web portal. Thus, this section of the manuscript should be deemphasized.

      We thank the reviewer for these comments and again apologize for the confusion caused by our use of the term “batch effect correction” to describe our approach. However, we believe these parts of the paper should be retained for the following reasons:

      • In practice, samples can be mislabeled. PCA or simple clustering approaches are very useful for drawing researchers’ attention to such cases, so that they can check for possible sample mislabeling.
      • Even within a study, one sample can be an outlier due to low or unequal sample quality. Removing outliers would help boost the significance of real findings. Without our approach, it would be harder for users to notice and remove outliers from their investigations.
      • Finally, these efforts are useful for generating hypotheses. For example, although we collected a lot of data, it is not feasible for us to read all the details in all the manuscripts published. We observed a similarity between SARS-CoV-2–infected lung samples and Rhinovirus–infected nasal samples by exploring our portal’s capabilities (Figure 3E-F). Then we read the manuscripts in which those data were published and found that our discovery was consistent with the original studies’ results. We believe these efforts are essential to help researchers generate or refine their hypotheses. As we update the database with more samples, this approach will become increasingly powerful.
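      The outlier check described in the second bullet can be approximated in a few lines. The sketch below is an illustrative assumption, not the CovidExpress implementation: it flags a sample when its median Pearson correlation with the other samples in the same group falls below a cutoff (`min_r`, a hypothetical value), using expression vectors such as log-scaled TPM over a shared gene list.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_outliers(group, min_r=0.8):
    """Return IDs of samples whose median correlation with their group-mates is below min_r.

    group: dict of sample ID -> expression vector over the same genes.
    min_r: hypothetical cutoff; it should be tuned for the data at hand.
    """
    flagged = []
    for sid, profile in group.items():
        rs = [pearson(profile, other) for oid, other in group.items() if oid != sid]
        if rs and median(rs) < min_r:
            flagged.append(sid)
    return flagged


# Toy replicate group (made-up values): s4 behaves like a swapped or degraded sample.
replicates = {
    "s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "s2": [1.1, 2.0, 2.9, 4.2, 5.1],
    "s3": [0.9, 2.1, 3.1, 3.9, 4.8],
    "s4": [5.0, 4.0, 3.0, 2.0, 1.0],
}
print(flag_outliers(replicates))  # ['s4']
```

      Using the median rather than the mean keeps a single bad sample from dragging down the scores of its well-behaved group-mates.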
      1. Given the limitations of any combined multi-dataset analyses, one very useful feature would be to conduct "meta-analyses" across multiple datasets. For instance, it would be informative to find which genes are commonly DEGs in user-selected comparisons, calculated separately for each dataset and then cross-referenced across the relevant/user-selected datasets.

      We thank the reviewer for this comment. Indeed, we agree that “meta-analyses” are useful and have now compiled Supplementary Table S2 and Figure 1F to show the commonly regulated genes. Enabling user-selected comparisons across studies on our portal requires a carefully designed user interface; otherwise, results from the portal could easily be seriously misinterpreted. For example, GSE154613 includes DMSO, Drug, SARS-CoV-2, and DMSO+SARS-CoV-2 samples. If a user simply chose to compare SARS-CoV-2 versus Control, the comparison would be SARS-CoV-2 and DMSO+SARS-CoV-2 versus DMSO and Drug. Such functions need time to design and implement; therefore, we will consider this suggestion in the further development of our portal.
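      A minimal sketch of the per-dataset "meta-analysis" the reviewer describes, assuming each study has already been analyzed separately for one explicitly chosen contrast (for GSE154613, for example, SARS-CoV-2 vs DMSO rather than a pooled "SARS-CoV-2 vs Control" label). It uses simple vote counting across studies, keeping up- and downregulation apart; the function name, thresholds, and toy gene names are hypothetical.

```python
from collections import Counter


def common_degs(results, padj_cutoff=0.05, lfc_cutoff=1.0, min_studies=2):
    """Vote-count meta-analysis: in how many studies is each gene DE, per direction?

    results: {study_id: {gene: (log2_fold_change, adjusted_p)}}, where each inner
             table comes from one explicitly defined contrast within that study.
    Returns [((gene, direction), n_studies)] for genes reaching min_studies,
    most frequent first. Cutoffs are illustrative defaults, not recommendations.
    """
    votes = Counter()
    for table in results.values():
        for gene, (lfc, padj) in table.items():
            if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
                votes[(gene, "up" if lfc > 0 else "down")] += 1
    return [(key, n) for key, n in votes.most_common() if n >= min_studies]


# Toy per-study results (made-up numbers and gene names).
results = {
    "study_A": {"GENE_A": (2.5, 1e-6), "GENE_B": (1.2, 0.01), "GENE_C": (0.1, 0.9)},
    "study_B": {"GENE_A": (3.1, 1e-4), "GENE_B": (-1.5, 0.02)},
    "study_C": {"GENE_A": (1.8, 0.03)},
}
print(common_degs(results))  # [(('GENE_A', 'up'), 3)]
```

      Counting directions separately prevents a gene that goes up in one study and down in another from being reported as "commonly regulated".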

      **Minor comments:**

      1. Fig S1G, color legend should be added (I understand that these colors are the same from S1H).

      We thank the reviewer for the comment. We have now added information about the colors in the figure legend.

      1. Mouseover text for trackPlot on the data portal is incorrect (it says the heatmap text instead).

      We thank the reviewer for this comment. We have now corrected this bug.

      1. Abstract should be revised to describe only the 1093 final remaining RNA-seq samples after filtering/QC steps.

      We thank the reviewer for this comment. We have now amended the Abstract to include this information.

      1. Text in many figures is too small to be legible. I would suggest pt 6 font minimum for all figure text, including the various statistics in the figure panels.

      We thank the reviewer for this comment. We have now increased the font sizes and will provide high-resolution figures with the revised manuscript.

      1. Are the DE analyses in Fig 1F specifically limited to control vs SARS-CoV-2/COVID-19 comparisons? Many of the samples included in this study are from other respiratory infections (labeled "other" in Fig 1B).

      We thank the reviewer for the question. Figure 1F was not originally limited to control vs SARS-CoV-2/COVID-19 comparisons, because we thought that control vs virus, drug vs mock, or differences between time points would also be interesting. If we narrow the analysis to control vs SARS-CoV-2/COVID-19 contrasts only, Figure 1F would still look similar (see below), because the genes from those contrasts make up the largest share of the genes in the original graphic.

      In the end, we replaced Figure 1F to avoid confusion and added more details in the Methods.

      1. The word cloud format is not conducive for understanding or interpretation. It would be much more informative to simply have a barplot or similar to clearly indicate the relative "abundance" of a given gene among all 315 DE analyses.

      We thank the reviewer for this comment but respectfully disagree with this point. Visualizing the relative “abundance” of genes with word clouds is a relatively novel approach in computational biology. However, we believe that, in this case, it has certain advantages over traditional bar plots. The word cloud format highlights genes according to their importance, where “importance” means either a combined metric derived from the DEGs, as in Figure 1F, or the frequency with which genes are mentioned or discussed in the literature, as in Figure 1E. For this purpose, the exact values are unlikely to matter to most users/readers. By presenting a word cloud, we let readers easily discern the top genes and use them to explore their own data or the CovidExpress portal. Users who want to analyze the raw values can download a full list of all genes and gene sets (Supplementary Table S3) in GMT format from our landing page (section “CovidExpress Expression Data Download”). Also, when we visualized the gene ranks with bar plots as the reviewer suggested, the results were much harder to read (as shown in the bar graph below) than the raw data in the supplementary tables.
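      For users who prefer raw values, the sketch below reads a gene-set file and counts how many sets each gene appears in, a simple frequency measure that could back either a bar plot or a word cloud. It assumes the standard GMT layout (one set per line: name, description, then genes, all tab-separated); that the Supplementary Table S3 download follows exactly this layout is an assumption, and the frequency count is not the combined DEG metric used for Figure 1F.

```python
from collections import Counter


def parse_gmt(text):
    """Parse GMT text: one gene set per line, tab-separated as name, description, genes..."""
    gene_sets = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected at least a name and a description")
        name, _description, *genes = fields
        gene_sets[name] = [g.strip() for g in genes if g.strip()]
    return gene_sets


def gene_frequency(gene_sets):
    """Number of gene sets each gene appears in (duplicates within a set count once)."""
    return Counter(g for genes in gene_sets.values() for g in set(genes))


# Toy GMT content (hypothetical set and gene names).
gmt_text = "SET1\tdesc\tA\tB\nSET2\tdesc\tA\tC\n"
sets = parse_gmt(gmt_text)
print(gene_frequency(sets).most_common(1))  # [('A', 2)]
```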

      1. Claims of increased/decreased dataset separability should have statistical analysis on the silhouette score boxplots (Fig S4G-I).

      We thank the reviewer for the reminder. We have added statistical tests (Wilcoxon rank test) to the silhouette score boxplots in question.
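      The separability statistics can be sketched as follows, assuming per-sample silhouette scores are computed on PCA coordinates with each sample labeled by its source study, and that two sets of scores are compared with a two-sided Wilcoxon rank-sum (Mann-Whitney U) test under the normal approximation. The reply does not say which Wilcoxon variant was used; if the same samples are compared under two processing approaches, a paired signed-rank test would be the natural alternative. In practice, `scipy.stats.mannwhitneyu` offers exact and tie-corrected versions of the rank-sum test.

```python
import math
from statistics import mean


def silhouette_samples(points, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b) with Euclidean distance.

    a = mean distance to the other members of the sample's own cluster,
    b = smallest mean distance to the members of any other cluster.
    Samples in singleton clusters get 0 by convention. Needs >= 2 distinct labels.
    """
    clusters = set(labels)
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
               if lj == li and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(mean([math.dist(p, q) for q, lj in zip(points, labels) if lj == c])
                for c in clusters if c != li)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties receive average ranks; the variance is not tie-corrected, so p is approximate.
    Returns (U statistic for x, p-value).
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy example: two well-separated "studies" in a 2-D PCA projection.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_samples(pts, ["a", "a", "b", "b"]))

# Toy silhouette scores under two hypothetical processing approaches.
before = [0.8, 0.82, 0.85, 0.87, 0.9, 0.91, 0.93, 0.95]
after = [0.1, 0.12, 0.15, 0.2, 0.22, 0.25, 0.3, 0.33]
print(rank_sum_test(before, after))
```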

      1. Regarding Fig 4E-F - what are the key genes that contribute to PC1, and how do they relate to the DEGs in Fig 4G?

      We thank the reviewer for this question and apologize for the confusion. In Figure 4E-F, the PCAs were based on ssGSEA scores: each gene set, rather than each individual gene, has a score for every sample. Thus, the top contributors to PC1 were gene sets upregulated or downregulated in certain contrasts. On the portal’s landing page (“Clustering Results for Reviewing and Download” section), we provide detailed results for the top gene sets (for the ssGSEA approach) and genes (for the TPM approach) that contributed to the various PCs, so that users can download and further explore these data.
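      Because the PCA input here is a samples x gene-sets matrix of ssGSEA scores, the PC1 loadings, and hence the "top contributors", are gene sets rather than genes. The sketch below computes PC1 by power iteration on the covariance matrix and ranks features by absolute loading. It is a self-contained illustration with hypothetical gene-set names; a library SVD would normally be used instead, and the portal's own tables may be computed differently.

```python
import math
import random


def pc1_loadings(matrix, n_iter=500, seed=0):
    """Loadings of the first principal component via power iteration.

    matrix: rows = samples, columns = features (here, gene-set ssGSEA scores).
    Returns one loading per feature (unit length; the overall sign is arbitrary).
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)
    # A random positive start makes it very unlikely to be orthogonal to PC1.
    v = [rng.random() + 0.1 for _ in range(m)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # every feature is constant: there is no principal direction
        v = [x / norm for x in w]
    return v


def top_contributors(loadings, feature_names, k=3):
    """Feature names ranked by absolute PC1 loading."""
    ranked = sorted(zip(feature_names, loadings), key=lambda t: abs(t[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy scores: SET_A varies strongly across samples, SET_B barely, SET_C not at all.
names = ["SET_A", "SET_B", "SET_C"]
scores = [[float(i), 0.1 * (i % 2), 5.0] for i in range(10)]
print(top_contributors(pc1_loadings(scores), names, k=1))  # ['SET_A']
```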

      1. Statistics describing the relation between OASL And TNF/PPARGC1A should be included to justify the author's statements. This could be correlation, mutual information, regression, etc.

      We thank the reviewer for this suggestion, and we have updated Figures 4J-K to show the correlation values and the corresponding F-statistics. The Pearson correlation between OASL and TNF was significant (Pearson correlation = 0.75, p = 6.85e-72), whereas the correlation between OASL and PPARGC1A was weakly negative and did not reach conventional significance (Pearson correlation = -0.08, p = 0.12). These results support our statement to a certain degree, and we have updated the corresponding text in the manuscript.
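      For a single predictor, the Pearson correlation test and the regression F-test are equivalent: F = t² = r²(n - 2)/(1 - r²) on (1, n - 2) degrees of freedom, so reporting r with its p-value and reporting the F-statistic convey the same evidence. The sketch below checks this identity on toy data. It computes only r, t, and F; the p-value would come from the t (or F) distribution, for example via `scipy.stats.pearsonr`.

```python
import math


def correlation_and_f(x, y):
    """Pearson r plus the t and F statistics for the simple linear regression y ~ x.

    With one predictor, F = t**2 = r**2 * (n - 2) / (1 - r**2), df = (1, n - 2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    ss_reg = sxy ** 2 / sxx   # sum of squares explained by the fitted line
    ss_res = syy - ss_reg     # residual sum of squares
    f = ss_reg / (ss_res / (n - 2))
    return r, t, f


r, t, f = correlation_and_f([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3), round(t, 3), round(f, 3))  # 0.8 2.309 5.333
```

      The same arithmetic puts the effect sizes in perspective: r = -0.08 for OASL and PPARGC1A corresponds to r² ≈ 0.006, i.e. less than 1% of the variance explained.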

      1. There are several studies now that have performed scRNA-seq on the lung resident and peripheral immune cells of COVID-19 patients. To more definitively tie in their analyses in Fig 4J-K/Fig S5D-E (to affirm "its important role in the innate immune response in lungs"), the authors should assess whether OASL is upregulated in the lung macrophages of COVID-19 patients vs controls.

      We thank the reviewer for this suggestion. Indeed, Liao et al. recently reported that “BALFs of patients with severe/critical COVID-19 infection contained higher proportions of macrophages and neutrophils and lower proportions of mDCs, pDCs, and T cells than those with moderate infection” (Nature Medicine, 2020, https://doi.org/10.1038/s41591-020-0901-9). They further divided the macrophages into subclusters and reported the top enriched GO terms as “response to virus” (group 1), “type I interferon signaling pathway” (group 2), “neutrophil degranulation” (group 3), and “cytoplasmic translational initiation” (group 4). When we investigated their data, we found that OASL was a marker gene of both group 1 and group 2, indicating that OASL might respond to viral infection and contribute to type I interferon signaling. Furthermore, another dataset (Ren et al., Cell, 2021, https://dx.doi.org/10.1016%2Fj.cell.2021.01.053) showed several clusters in patients with severe COVID-19 (left panel below) that were enriched for OASL expression (right panel below).

      We have now added these observations to strengthen our hypothesis about the role of OASL.

      1. The visualization and analysis functions in the data portal appear to work reasonably well out of the box. However, the download buttons for plots did not work in my hands. I realized that a workaround is to right click -> "Save image as" (which then downloads a .svg file), but this is not ideal and should be fixed to improve usability. I had tested the data portal on both Firefox and Edge browsers, using a Windows 10 PC.

      We agree with the reviewer. Because of technical issues with the JavaScript figure plugin, the download feature does not work unless the figure is saved as a file on the server side. To avoid security issues, we try to minimize the generation of new files on the server; hence, we have disabled this feature for the moment. Users can still download high-resolution .svg figures by using right-click -> “Save image as.” This information is now included in the FAQ section on the portal’s landing page.

      Reviewer #2 (Significance (Required)): The data portal appears to have useful analysis and visualization features, and the data collection appears to be quite comprehensive. I would strongly encourage the authors to continue collecting datasets as they become available and to further improve the usability of the portal. As noted in the above comments, I think there is potential for their cellxgene-based browser to be useful to non-computational biologists, but at present, the data portal is not as simple to use as it should be. With further efforts toward developing step-by-step tutorials for common analysis/visualization tasks, more informative case studies, and the other revisions suggested above, this study could be a valuable resource for the community. Of note, this review is written from the perspective of a primarily wet-lab biologist with extensive bioinformatics experience but limited web development expertise.

      We thank the reviewer for the positive comments. We understand the importance of data updating. Our plan is to complete quarterly updates once this manuscript has been accepted or when 10 new studies have been either collected by us or suggested by users. This information is also now included in the FAQs on the portal’s landing page. We have also uploaded several tutorial videos to the landing page and will gradually add more. We also added a Gitter chat channel, so that users can ask questions, report bugs, or suggest new studies to add to the database.

      **Referee Cross-commenting**

      I agree with the comments of the other reviewers.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The ongoing COVID-19 pandemic is a big threat to human health. Researchers have conducted studies to explore the gene expression regulation of human cells responding to COVID-19 infection. A website that integrates those datasets and provides user-friendly tools for gene expression analysis is a valuable resource for the COVID-19 study community. The authors collected published RNASeq datasets and developed a database and an interactive portal for users to investigate the gene expression of SARS-CoV-2 related samples. This website would be of great value for the SARS-CoV-2 research community if the batch normalization problems are solved.

      **Major comments:**

      1) The major concern with CovidExpress is the batch effects from different studies. As the authors have shown and mentioned in their discussion, "For the current release, we strongly suggest investigators to perform gene expression comparison within individual study." This limits the usage of CovidExpress, because integrated analysis of multiple datasets from different studies is the key value and purpose of CovidExpress.

      We thank the reviewer for the comment. Reviewer #2 reminded us, and we agree, that differences between studies should not be considered “batch effects.” We apologize for the confusion. The GSEA function provided in the portal is not affected by differences between studies, because each pre-ranked gene list is based on a contrast within a single study. Although we cannot correct for the differences between studies, we did correct for the effects caused by differences in the software and parameters used. For example, in our approach, the DEGs from GSE155518 and GSE160435 (both studies of primary lung alveolar AT2 cells from Mulay et al., Cell Reports, 2021) were significantly correlated (panel A below; p = 1.36e-24, F-test). However, if we simply download the TPM values from their GEO records, GSE155518 appears to show a genome-wide decrease in expression in SARS-CoV-2–infected samples (panel B below). Such artifacts might lead to false hypotheses.
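      A genome-wide shift like the one in panel B can be screened for with a simple diagnostic: the median per-gene log2 ratio between groups, which usually sits near zero when most genes are not differentially expressed. The sketch below is a hypothetical check, not part of the CovidExpress pipeline; the input dictionaries and the pseudocount are assumptions, and a large offset is a prompt to review the processing steps, not proof of an artifact.

```python
import math
from statistics import median


def median_log2_ratio(control_tpm, infected_tpm, pseudocount=1.0):
    """Median per-gene log2(infected / control) over genes present in both groups.

    control_tpm, infected_tpm: dict of gene -> mean TPM in each group.
    Raises statistics.StatisticsError if the two dicts share no genes.
    """
    shared = control_tpm.keys() & infected_tpm.keys()
    ratios = [math.log2((infected_tpm[g] + pseudocount) / (control_tpm[g] + pseudocount))
              for g in shared]
    return median(ratios)


# Toy data (made-up values): every gene is halved in the "infected" group.
control = {"G1": 10.0, "G2": 20.0, "G3": 40.0}
infected = {g: v / 2 for g, v in control.items()}
print(median_log2_ratio(control, infected, pseudocount=0.0))  # -1.0
```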

      2) The authors should include experimental protocols as one key parameter in the description and in further integrated analysis of different datasets. As the authors showed, QuantSeq is a 3' RNA sequencing protocol. However, I am not convinced that simply excluding QuantSeq samples is the ideal solution for downstream integrated analysis, as QuantSeq has been shown to correlate quite well with standard RNASeq methods in gene quantification. It is interesting that 21.2% of samples were biased toward intronic reads. What protocol differences or experimental variations would explain the biases?

      We thank the reviewer for the comment and apologize for not being clearer. One of our main goals in reprocessing all samples was to correct for batch effects related to pipeline processing, that is, to reduce the effects introduced by the use of different software or parameters. QuantSeq and similar protocols are heavily biased toward the 3’ UTR; thus, the software and parameters used for standard RNA-seq data are not suitable for them. That said, we agree that downstream results from QuantSeq correlate well with those from RNA-seq (we observed a correlation of ~0.75 when comparing log2 fold changes from QuantSeq with those from RNA-seq). However, we are not convinced that QuantSeq always correlates well with RNA-seq in terms of the quantification of individual genes. For example, Jarvis et al. recently reported a correlation of only ~0.35 between QuantSeq and RNA-seq (https://doi.org/10.3389/fgene.2020.562445). In theory, the correlation would be weaker for genes with short 3’ UTRs. Thus, we will not include QuantSeq data in this portal. However, if we collect enough studies in the future, we will consider launching a separate portal just for QuantSeq, using a pipeline optimized for the protocol’s 3’ UTR bias.
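      How fold-change correlations can be high while per-gene quantification correlations are lower can be illustrated with a toy simulation: if a 3'-tag protocol scales each gene by its own capture bias, that bias cancels in the infected/control ratio but not in the abundance itself. Everything below (distributions, parameters, the bias model) is a made-up assumption for illustration, is not fitted to any dataset, and does not reproduce the ~0.75 or ~0.35 values quoted above.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_genes=2000, bias_sd=2.0, noise_sd=0.2, seed=1):
    """Toy model: 3'-tag signal = true abundance x per-gene bias x measurement noise.

    Returns (correlation of log2 abundances, correlation of log2 fold changes)
    between a hypothetical full-length protocol and a hypothetical 3'-tag protocol.
    """
    rng = random.Random(seed)

    def noisy(value):
        return value * rng.lognormvariate(0.0, noise_sd)

    full_ctrl, full_inf, tag_ctrl, tag_inf = [], [], [], []
    for _ in range(n_genes):
        ctrl = rng.lognormvariate(3.0, 1.5)        # true control abundance
        inf = ctrl * rng.lognormvariate(0.0, 0.7)  # true infected abundance
        bias = rng.lognormvariate(0.0, bias_sd)    # hypothetical per-gene 3'-tag capture bias
        full_ctrl.append(noisy(ctrl))
        full_inf.append(noisy(inf))
        tag_ctrl.append(noisy(ctrl * bias))
        tag_inf.append(noisy(inf * bias))

    abundance_r = pearson([math.log2(v) for v in full_ctrl],
                          [math.log2(v) for v in tag_ctrl])
    lfc_full = [math.log2(b / a) for a, b in zip(full_ctrl, full_inf)]
    lfc_tag = [math.log2(b / a) for a, b in zip(tag_ctrl, tag_inf)]
    return abundance_r, pearson(lfc_full, lfc_tag)


abundance_r, lfc_r = simulate()
print(f"abundance r = {abundance_r:.2f}, log2FC r = {lfc_r:.2f}")
```

      Setting `bias_sd=0.0` removes the gene-specific bias, which shows that the gap between the two correlations in this toy model comes from that term alone.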

      We believe that the 21.2% of samples biased toward intronic reads reflect differences in the kits used. For example, of the 162 samples with “BASE_INTRON (%)” > 30% (Supplementary Table S1) that passed QC, 76 were total RNA samples processed with the SMARTer kit and 36 were total RNA samples processed with the Trio kit. Given that we have 105 total RNA samples processed with the SMARTer kit and 38 processed with the Trio kit, the intron-biased fractions are 76/105 (~72%) for SMARTer and 36/38 (~95%) for Trio. We therefore conclude that the Trio kit was more biased toward introns, and that the SMARTer kit was also strongly biased. This finding is consistent with other reports of the bias of the SMARTer kit (Song et al., https://doi.org/10.1186/s12864-018-5066-2). Users can find these results in our Supplementary Table S1. We have also uploaded the protocol information to our portal.
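      The per-kit fractions above are simple tabulations over Supplementary Table S1. A sketch, assuming each sample row carries its kit label under a hypothetical `kit` column alongside the `BASE_INTRON (%)` column:

```python
from collections import defaultdict


def intron_biased_fraction(samples, threshold=30.0):
    """Per-kit count and fraction of samples whose intronic-base percentage exceeds threshold.

    samples: iterable of dicts; "BASE_INTRON (%)" is the column named in
    Supplementary Table S1, while "kit" is a hypothetical column name.
    Returns {kit: (n_biased, n_total, fraction)}.
    """
    counts = defaultdict(lambda: [0, 0])  # kit -> [n_biased, n_total]
    for sample in samples:
        tally = counts[sample["kit"]]
        tally[1] += 1
        if sample["BASE_INTRON (%)"] > threshold:
            tally[0] += 1
    return {kit: (b, n, b / n) for kit, (b, n) in counts.items()}


# Toy rows (made-up values).
rows = [
    {"kit": "K1", "BASE_INTRON (%)": 35.0},
    {"kit": "K1", "BASE_INTRON (%)": 10.0},
    {"kit": "K2", "BASE_INTRON (%)": 50.0},
]
print(intron_biased_fraction(rows))  # {'K1': (1, 2, 0.5), 'K2': (1, 1, 1.0)}
```

      With the counts given in the reply, the same tabulation yields 76/105 ≈ 0.72 for SMARTer and 36/38 ≈ 0.95 for Trio.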

      3) How do the authors plan to update and maintain CovidExpress?

      We thank the reviewer for this question. We understand the importance of data updating. Our plan is to update the database quarterly once this manuscript has been accepted or when 10 new studies have been collected by us or suggested by users. We have added this information to the FAQs on the portal’s landing page. We also understand the importance of maintaining the service for a reasonable period of time for research. Therefore, we will keep the server active for at least 2 years after the WHO announces that COVID-19 is no longer a global pandemic. We will also ensure that, even after we take down the server, scientists with programming skills will be able to set up local servers based on the data provided on CovidExpress.

      **Minor comments:** 1) Some text in the figures is not readable, for example, Fig. 2B, 2C, 2D, and 2E.

      We thank the reviewer for this comment. We have now increased the font sizes and provided high-resolution figures in the revised version.

      2) The authors could use videos to demonstrate how to use CovidExpress on the website, as they have shown in Fig. 3.

      We thank the reviewer for this suggestion. We have uploaded several video tutorials to the landing page and will gradually add more. We also added a Gitter chat channel so that users can ask questions, report bugs, or suggest new studies to include in the database.

      Reviewer #3 (Significance (Required)): The ongoing COVID-19 pandemic is a big threat to human health. Many molecular and cellular questions related to COVID-19 pathophysiology remain unclear, and many researchers have conducted studies to explore the gene expression regulation of human cells responding to COVID-19 infection. However, there is no database/website that integrates all RNASeq data and provides user-friendly tools for gene expression analysis for COVID-19 researchers. The authors collected the published RNASeq datasets and developed a database and an interactive portal, named CovidExpress, to allow users to investigate gene expression responses to COVID-19 infection. CovidExpress is a valuable resource for the COVID-19 study community once the batch normalization problems are solved. Users who come up with ideas about the regulation of the COVID-19 response could use the system to test their hypotheses without experience in bioinformatics and RNASeq data analysis. This will be more important when more RNASeq data from samples with different tissues, cell lines, and conditions are integrated into the database.

      We thank the reviewer for the positive comments. We apologize for the confusion and acknowledge that we should not have described our effort using the term “batch effect.” As described by Reviewer #2 (and we agree), batch effect should be used only to indicate a purely technical difference in the same biological system; for example, differences in experiments performed on different days or by different lab personnel. Thus, CovidExpress cannot correct for “batch effects.” We hope the reviewer will appreciate that what we did was to correct for the effects caused by differences in software and parameters across the studies. For example, in our approach, the DEGs from GSE155518 and GSE160435 (both from primary lung alveolar AT2 cells; Mulay et al., Cell Reports, 2021) were significantly correlated (panel A below; p = 1.36e-24, F-test). However, when we downloaded the TPM values from their GEO records, GSE155518 appeared to show a genome-wide decrease in expression in SARS-CoV-2–infected samples (panel B below).

      Thus, using processed data directly without carefully reviewing the methods might lead to false hypotheses. Finally, researchers can make new discoveries, such as our OASL and FURIN findings, by using the many other features that CovidExpress provides.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The ongoing COVID-19 pandemic is a big threat to human health. Researchers have conducted studies to explore the gene expression regulation of human cells responding to COVID-19 infection. A website that integrates those datasets and provides user-friendly tools for gene expression analysis is a valuable resource for the COVID-19 study community. The authors collected published RNASeq datasets and developed a database and an interactive portal for users to investigate the gene expression of SARS-CoV-2 related samples. This website would be of great value for the SARS-CoV-2 research community if the batch normalization problems are solved.

      Major comments:

      1) The major concern with CovidExpress is the batch effects from different studies. As the authors have shown and mentioned in their discussion, "For the current release, we strongly suggest investigators to perform gene expression comparison within individual study." This limits the usage of CovidExpress, because integrated analysis of multiple datasets from different studies is the key value and purpose of CovidExpress.

      2) The authors should include experimental protocols as one key parameter in the description and in further integrated analysis of different datasets. As the authors showed, QuantSeq is a 3' RNA sequencing protocol. However, I am not convinced that simply excluding QuantSeq samples is the ideal solution for downstream integrated analysis, as QuantSeq has been shown to correlate quite well with standard RNASeq methods in gene quantification. It is interesting that 21.2% of samples were biased toward intronic reads. What protocol differences or experimental variations would explain the biases?

      3) How do the authors plan to update and maintain CovidExpress?

      Minor comments:

      1) Some text in the figures is not readable, for example, Fig. 2B, 2C, 2D, and 2E.

      2) The authors could use videos to demonstrate how to use CovidExpress on the website, as they have shown in Fig. 3.

      Significance

      The ongoing COVID-19 pandemic is a big threat to human health. Many molecular and cellular questions related to COVID-19 pathophysiology remain unclear, and many researchers have conducted studies to explore the gene expression regulation of human cells responding to COVID-19 infection. However, there is no database/website that integrates all RNASeq data and provides user-friendly tools for gene expression analysis for COVID-19 researchers. The authors collected the published RNASeq datasets and developed a database and an interactive portal, named CovidExpress, to allow users to investigate gene expression responses to COVID-19 infection. CovidExpress is a valuable resource for the COVID-19 study community once the batch normalization problems are solved. Users who come up with ideas about the regulation of the COVID-19 response could use the system to test their hypotheses without experience in bioinformatics and RNASeq data analysis. This will be more important when more RNASeq data from samples with different tissues, cell lines, and conditions are integrated into the database.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Djekidel and colleagues describe a web portal to explore several SARS-CoV-2 related datasets. The authors applied a uniform reprocessing pipeline to the diverse RNA-seq datasets and integrated them into a cellxgene-based interface. The major strengths of the manuscript are the scale of the compiled data, with over one thousand samples included, and the data portal itself, which has useful visualization and analysis functions, including GSEA and DEG analysis. My primary concerns with the study are centered on the analysis examples that are presented and their interpretation, as well as the user interface for the data portal.

      Major Comments:

      1. The literature analysis feels out of place and is not informative (Fig 1E), as the conclusions that can be drawn from literature mining are minimal. As evidence of this, the authors highlight that CRP is a top-studied "gene" and later voice their interest in how CRP is not a differentially expressed gene (pg6). This illustrates the problems with the literature-based analysis, since in the context of COVID-19, CRP is a common blood laboratory measurement that is used as a general marker of inflammation. CRP is transcribed almost exclusively in hepatocytes as an acute phase reactant (see GTEx portal for helpful reference), and would therefore not be expected to be found in the various datasets collected by the authors. The one exception might be liver RNA-seq samples from COVID-19 patients, but I do not think these are available in the current collection. I would therefore suggest removing the literature analysis parts from the manuscript.
      2. The data portal, implemented through cellxgene, is accessible for non-programmers to use. However, it is very easy to end up with an "Unexpected HTTP response 400, BAD REQUEST" error, with essentially no description of the cause of the error or how to rectify it. When this occurs (and in my experience it occurs very frequently), this also forces the user to refresh the page entirely, losing any progress they may have made. I see that the authors describe this error in their FAQ page, but their answer is not very intuitive and I was unsure of what they meant: "This happens because the samples you selected doesn't contain all "Group by" you want compare for each "Split by" group. You could confirm using the "Diff. groups" buttons.".

      I would therefore ask that the authors provide more detailed tutorials (ideally step-by-step) on common analyses that users will want to perform, hopefully minimizing the amount of frustration that users will encounter.

      3. Selection of samples is not very quick or intuitive. If I wanted to select only the samples from one specific GEO accession, I had to resort to individually checking the boxes of the sample IDs that I wanted. If I instead selected the GEO accession under the samples source ID, then used the "Subset to currently selected samples" button, I invariably got the HTTP error 400 message. Of course, this may simply reflect my lack of familiarity with cellxgene; I would nevertheless encourage the authors to improve the FAQ to include a step-by-step example for how to do common analyses/procedures.
      4. The second case study, centered on coagulation genes, is misguided. Alteration of coagulation lab values in severe COVID-19 patients reflects the general inflammatory state of these patients, and would not be expected to manifest on the transcriptional level in infected cells/tissues. Coagulation labs measure the functional status of the coagulation cascade (proteolytic processing of clotting factors, etc.), which is far removed from the direct transcription of the corresponding genes. As with CRP (see above comment), most clotting factors are transcribed almost exclusively in the liver (check GTEx portal); I would not expect upregulation of coagulation factors in lung cell lines/organoids/cultures etc after infection with SARS-CoV-2. I would recommend that the authors pick a different gene ontology set for a case study, as the current one focusing on coagulation is confusing in a pathophysiologic sense.
      5. The two large clusters of blood-derived samples vs other tissues are not surprising and the authors' interpretation is confusing. The authors write that "the COVID-19 signature was not able to overcome the tissue specificity and that immune cells might respond to SARS-CoV-2 differently." This should be immediately obvious given the pathophysiology of COVID-19 infection; the cell types that are directly infected by SARS-CoV-2 will of course have a distinct response compared to the circulating blood cells of COVID-19 patients, which are responding by mounting an immune response. There is no reason to expect a priori that the DEGs in the directly infected lung cells would be similar to that of immune cells that are mounting a response against the virus.
      6. The authors devote considerable space in the manuscript to exploring "batch effects" and trying to minimize them (pg10-11 Fig 4A-D, Fig S4). However, given that the compiled datasets are from entirely different experimental and biological systems (e.g. in vitro infection vs patient infection, different cell lines, timepoints after virus exposure, diverse tissues, varying disease severity), it is inappropriate to simply refer to all of these differences as "batch effects" alone. Usually, the term "batch effect" would refer to the same biological experiment/system (i.e. A549 cells infected with CoV vs control), but performed on different days or by different lab personnel - in other words, batch effects are purely due to technical differences. This term clearly does not apply when comparing samples from entirely different cell lines, or tissues, etc, and the authors should not keep describing these differences as batch effects that should be "corrected" out.

      Indeed, the authors themselves state that the main point of their "batch effect correction" efforts is only for PCA visualization. I therefore feel this section contributes very little to the overall manuscript, especially given the authors' own recommendation that all analyses should be performed on individual datasets (which I certainly agree with). I assume that the authors were required to provide some sort of dimensional reduction projection for the cellxgene browser, but this is more a quirk in their choice of platform for the web portal. Thus, this section of the manuscript should be deemphasized.

      7. Given the limitations of any combined multi-dataset analyses, one very useful feature would be to conduct "meta-analyses" across multiple datasets. For instance, it would be informative to find which genes are commonly DEGs in user-selected comparisons, calculated separately for each dataset and then cross-referenced across the relevant/user-selected datasets.
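      The FAQ answer quoted in comment 2 appears to mean that a comparison fails when some "Split by" group lacks one of the "Group by" levels (for example, a study with infected samples but no controls). A pre-check along the following lines would show users which combinations are missing before they submit a request. This interpretation of the FAQ is an assumption, the function is not the portal's actual code, and the column names are hypothetical.

```python
from collections import defaultdict


def missing_design_cells(samples, group_by, split_by):
    """List (split value, group value) combinations that have no samples.

    samples: list of dicts of sample metadata.
    group_by: column whose values are compared (e.g. infection status).
    split_by: column defining the separate comparisons (e.g. source study).
    An empty result means every split group contains every group-by level.
    """
    all_groups = {s[group_by] for s in samples}
    present = defaultdict(set)
    for s in samples:
        present[s[split_by]].add(s[group_by])
    return sorted((split, g) for split, groups in present.items()
                  for g in all_groups - groups)


# Toy metadata (hypothetical columns and values).
meta = [
    {"study": "S1", "status": "Control"},
    {"study": "S1", "status": "Infected"},
    {"study": "S2", "status": "Infected"},
]
print(missing_design_cells(meta, "status", "study"))  # [('S2', 'Control')]
```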

      Minor comments:

      1. Fig S1G, color legend should be added (I understand that these colors are the same from S1H).
      2. Mouseover text for trackPlot on the data portal is incorrect (it says the heatmap text instead).
      3. Abstract should be revised to describe only the 1093 final remaining RNA-seq samples after filtering/QC steps.
      4. Text in many figures is too small to be legible. I would suggest pt 6 font minimum for all figure text, including the various statistics in the figure panels.
      5. Are the DE analyses in Fig 1F specifically limited to control vs SARS-CoV-2/COVID-19 comparisons? Many of the samples included in this study are from other respiratory infections (labeled "other" in Fig 1B).
      6. The word cloud format is not conducive to understanding or interpretation. It would be much more informative to simply use a barplot or similar to clearly indicate the relative "abundance" of a given gene among all 315 DE analyses.
      7. Claims of increased/decreased dataset separability should have statistical analysis on the silhouette score boxplots (Fig S4G-I).
      8. Regarding Fig 4E-F - what are the key genes that contribute to PC1, and how do they relate to the DEGs in Fig 4G?
      9. Statistics describing the relation between OASL and TNF/PPARGC1A should be included to justify the authors' statements. This could be correlation, mutual information, regression, etc.
      10. There are several studies now that have performed scRNA-seq on the lung resident and peripheral immune cells of COVID-19 patients. To more definitively tie in their analyses in Fig 4J-K/Fig S5D-E (to affirm "its important role in the innate immune response in lungs"), the authors should assess whether OASL is upregulated in the lung macrophages of COVID-19 patients vs controls.
      11. The visualization and analysis functions in the data portal appear to work reasonably well out of the box. However, the download buttons for plots did not work in my hands. I realized that a workaround is to right click -> "Save image as" (which then downloads a .svg file), but this is not ideal and should be fixed to improve usability. I had tested the data portal on both Firefox and Edge browsers, using a Windows 10 PC.
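      For minor comment 9, one simple statistic would be a rank (Spearman) correlation between OASL and TNF or PPARGC1A across samples. The sketch below computes Spearman's rho from scratch (average ranks for ties, then the Pearson correlation of the ranks); the expression values are hypothetical placeholders. In practice, a library routine such as `scipy.stats.spearmanr` also returns a p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample expression values, for illustration only.
oasl = [2.1, 3.4, 5.0, 6.2, 7.9]
tnf = [1.0, 1.8, 2.5, 2.4, 3.9]
print(spearman_rho(oasl, tnf))  # prints 0.9
```

      A rank-based statistic was chosen here because it does not assume a linear relation between the two genes' expression levels.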

      Significance

      The data portal appears to have useful analysis and visualization features, and the data collection appears to be quite comprehensive. I would strongly encourage the authors to continue collecting datasets as they become available and to further improve the usability of the portal. As noted in the above comments, I think there is potential for their cellxgene-based browser to be useful to non-computational biologists, but at present, the data portal is not as simple to use as it should be. With further efforts to develop step-by-step tutorials for common analysis/visualization tasks, more informative case studies, and the other revisions suggested above, this study could be a valuable resource for the community. Of note, this review is written from the perspective of a primarily wet-lab biologist with extensive bioinformatics experience but limited web development expertise.

      Referee Cross-commenting

      I agree with the comments of the other reviewers.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The manuscript submitted by Djekidel et al. entitled "CovidExpress: an interactive portal for intuitive investigation on SARS-CoV-2 related transcriptomes" reports on a new web portal to search and analyze RNAseq data related to SARS-CoV-2 infections. The authors downloaded and reprocessed data from more than 40 different studies, which are available on the web portal along with all available metadata. The web portal allows users to perform numerous differential expression and gene set enrichment analyses on the data and provides publication-ready figures. Because of batch effects that could not be removed, the authors do not currently recommend analyzing data across studies. The authors conclude that the web portal is unique and will allow scientists to rapidly analyze gene expression signatures related to SARS-CoV-2 infections, with the potential to make new discoveries.

      Major comments:

      Based on the scientific literature, the web portal seems to be an unprecedented resource to search and analyze SARS-CoV-2-related RNAseq data and as such would certainly be a useful resource for the SARS-CoV-2 scientific community. The authors argue, by providing use cases, that new discoveries are possible using their web portal. However, the section detailing the analyses the authors performed to generate new hypotheses about genes potentially relevant in SARS-CoV-2 infections is very difficult to follow and, without more guidance, very difficult to reproduce with the web portal; doing so would require substantial expert knowledge in RNAseq data analysis. It also seems that the key candidate genes identified by their analyses have all already been studied or linked to SARS-CoV-2 infections, so it is somewhat unclear whether new hypotheses can be generated by reanalyzing RNAseq datasets, especially because the authors currently do not recommend combining data from different studies. The manuscript would benefit from presenting fewer use cases but providing, for each of them, more information on how the portal was used, which studies were used to generate them, and which findings were not described in the publications of those studies. Some observations in the manuscript are not supported by significance calculations (see below). In places, the English grammar should be improved.

      Minor comments:

      Page 6, last sentence: The statement in this sentence is very much what one would expect. It remains unclear whether the authors mean it as a result validating the processing of the RNAseq data or as a new discovery. Please clarify.

      Figure 3A: The violin plots are so small that it is impossible to see any trends. It is also difficult to understand which categories should be compared with each other. If there is anything significant to observe, please add a statistical test and better guide the reader.

      Figure 3C: A legend for the color scale is missing. The signal (presumably expression levels) for SESN2 seems very weak and similar between ICU and non-ICU samples. What is the basis for assigning this gene to the group of genes upregulated in ICU samples? Also, contrary to what the authors state on page 8, SESN2 does not seem to be highly expressed in ICU samples; however, without knowing what the colors represent (fold changes or absolute expression values?), this is somewhat speculative.

      Page 9, first sentence: Please specify what you mean by "starting list". Furthermore, in this paragraph, how do your results compare with those of the study that you reanalyze here?

      Figure 3F: Please add axis labels. Also, is there a particular reason why, in a correlation plot like this one, the x and y axes are not shown with the same range, and why the y axis does not start at 0?

      Page 9, second-to-last sentence: It remains unclear which kind of analysis the authors intend to do here and what the starting question is. Please try to rewrite this with fewer technical terms (e.g., what do you mean by "precalculated contrasts"?). Relatedly, it remains unclear what Figure 3I is supposed to show. Please provide more information for readers who are not RNAseq analysis experts.

      Figure 3J is somewhat confusing. Why is the mean expression range indicated from 0 to 1, and why do all genes apparently have a mean expression of 1? Page 10, lines 5-6: are you referring to coagulation markers here or to general expression patterns? If the latter, how does this statement fit into the paragraph about analyzing expression patterns of coagulation markers? Please specify. Relatedly, are the genes highlighted in Figure 3K coagulation markers? If not, how are they relevant to the point that the portal can be used to investigate the role of coagulation markers in SARS-CoV-2 infections?

      The description of batch effects and of attempts to remove them on page 10 was somewhat surprising, as I would expect these results earlier in the results section, before the use cases of the data are described. You may consider changing the order of your results for a better flow. Page 11, second paragraph: please briefly explain what the silhouette score is supposed to reflect and thus how Figure S4G should be interpreted. The difference between the two bars in Figure S4G is very marginal and thus does not seem to support the authors' statement that the ssGSEA score-based projection is better, unless you perform a significance test or I have misunderstood. Please clarify.
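      For readers unfamiliar with the metric: the silhouette score of a sample compares its mean distance a to other samples with the same label (here, the same study) with its mean distance b to the samples of the nearest other label, as s = (b - a) / max(a, b). Values near 1 mean samples group tightly by study; values near 0 or below mean the studies overlap. A minimal sketch follows, using toy 2-D coordinates in place of real PCA projections (an assumption for illustration only).

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette s(i) = (b - a) / max(a, b).

    a: mean distance from sample i to the other samples with its label.
    b: smallest mean distance from sample i to the samples of any other label.
    Requires at least two distinct labels; singletons get 0 by convention.
    """
    groups = sorted(set(labels))
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dists) / len(dists)
            for dists in (
                [math.dist(p, q) for q, lab in zip(points, labels) if lab == other]
                for other in groups if other != own
            )
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Toy 2-D coordinates standing in for a projection; labels are study IDs.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
by_study = silhouette_scores(pts, ["study1", "study1", "study2", "study2"])
mixed = silhouette_scores(pts, ["study1", "study2", "study1", "study2"])
print(sum(by_study) / 4, sum(mixed) / 4)
```

      Because the score is defined per sample, the per-sample values from two projections could in principle be compared with a paired test rather than by eye.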

      Page 11, third paragraph: To the best of my understanding, Figure 4B does not support the claim that samples clustered less by study cohort using the ssGSEA approach. Please quantify the effect and test for significance, or explain it better.

      For the analyses described starting on page 12 it remains largely unclear whether they were conducted across studies or within studies and which studies were used. This section until the end of the results would especially benefit from providing more information on how the analyses were performed, either in the results or in the methods section.

      Figures 4J and 4K lack axis labels, and since they show correlations, they could be redrawn using the same ranges on the x and y axes.

      Page 14, line 5: Is Figure 4G the right figure reference here? If so, it is unclear how Figure 4G supports the statement in this sentence. Please clarify. Figure 2 has too low a resolution; many details cannot be read. Please provide a higher-resolution figure.

      Significance

      Providing a single platform for the analysis of SARS-CoV-2-related RNAseq data is certainly of high value to the scientific community. However, as the portal and manuscript are currently presented, scientists who are not RNAseq analysis specialists would need more guidance to understand and correctly use the functionalities of the portal. Unfortunately, because batch effects could not be removed from the studies, the authors correctly do not recommend combining data from different studies for analyses; however, this will likely also limit the potential of the resource to yield new discoveries beyond what the original studies have already published. As indicated above, the authors could support their claim by comparing their findings with those published in the studies they reanalyzed. The portal is only of use to scientists studying SARS-CoV-2. I am not an expert in RNAseq data analysis and thus cannot comment on the technicalities, especially the processing of the RNAseq datasets.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2021-01024

      Corresponding author(s): Martin Spiess

      1. Description of the planned revisions — point-by-point response


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Apart from the default constitutive pathway for protein secretion, some specialized cells (e.g., neuroendocrine cells, exocrine cells, peptidergic neurons and mast cells) exhibit an additional regulated secretory pathway, in which peptide hormones are stored in a highly concentrated, ordered manner inside the electron-opaque "dense core" of secretory granules for long periods until secretagogue-mediated burst release. Although a general sorting receptor for packaging hormones into secretory granules has not yet been identified, self-aggregation in the trans-Golgi network (TGN) is a property shared by peptide hormones and is a well-accepted potential sorting mechanism. Here the authors hypothesize that the small cysteine-containing disulphide loop (CC loop), which is abundant in several hormone precursors, acts as an aggregation mediator in the TGN for sorting into secretory granules. They tested the aggregation propensity of a misfolded reporter protein, NPΔ, in the endoplasmic reticulum (ER) after attaching the CC loop segments of different hormones; such a CC loop promotes the pathological ER aggregation of mutant provasopressin in diabetes insipidus. Immunofluorescence and immunogold electron microscopy revealed accumulation of aggregates in the ER when NPΔ fused to CC loops of different hormonal origin was transiently transfected into COS-1 fibroblasts and Neuro-2a neuroblastoma cells. The authors also showed that small disulphide loop-mediated functional aggregation in the TGN can sort a constitutively secreted protein, α1-protease inhibitor, into secretory granules. The rerouting capacity of the CC loops was tested in stably transfected AtT-20 cell lines by confirming their colocalization with CgA-positive secretory granules, by studying BaCl2-stimulated secretion, and by testing the granule-specific lubrol insolubility.

      **Major comments:**

      The study is highly impressive, and the results fully support the CC loop-mediated hormone sorting hypothesis. However, it would be valuable if the authors characterized the nature of the CC loop-mediated aggregates, as hormones are reported to be stored inside secretory granules as functional amyloids (Maji et al., 2009). The mechanistic basis of small disulfide loop-mediated aggregation is not explained in the paper. The authors may propose probable molecular reasons for CC loop-mediated aggregation to fully justify their hypothesis.

      Although the hypothesis and the experimental results are highly impressive, the authors may consider adding the following experiments.

      The authors replaced the CC loop with a proline/glycine repeat sequence (Pro1) as a negative control, which was previously reported to abolish aggregation as well. However, the authors may also delete the small loop-forming segment, CCv, entirely and check the behavior of the His-tagged neurophysin II (NPΔ) segment alone as an additional negative control.

      We plan to use a NP∆ construct completely lacking any N-terminal extension as a further negative control, as proposed by the reviewer.

      To examine the ultrastructure, the authors performed immunogold labeling with an anti-His antibody, which revealed ER aggregation mediated by the different CC loops. Since the amyloid-like fibrillar nature of ER aggregates formed by mutant provasopressin was previously reported (Beuret et al., 2017), the authors must check the nature of the CC loop-mediated ER aggregates with an amyloid-specific antibody.

      We will test staining of ER aggregates of our CC loop–NP∆ constructs with anti-amyloid antibodies. A caveat is that CC loops cannot form a classical cross-β structure (strict β-sheets) because of the ring closure, which is why we suggest their aggregation to be "amyloid-like". These structures may not be recognized by anti-amyloid antibodies.

      Since hormones are known to form reversible functional amyloids during their storage inside secretory granules, the authors may consider characterizing the nature of the aggregates formed by the CC loop-fused constitutive protein in the AtT-20 cell line by immunostaining, immunoprecipitation and dot blot assays using an amyloid-specific antibody.

      Endogenous AtT20 granules are expected to be positive for amyloid stains or antibodies anyway (if the size and mass of the granules are sufficient for detection; Maji et al. used pituitary tissue and purified granules).

      **Minor comments:**

      In the quantification study (Figure 2C), CCc and CCr showed similar proportions of cells with ER aggregates (around 40%). However, the authors state that all constructs except CCc produce a statistically significant increase in cells with aggregates compared to background. The authors must clarify this statement.

      CCc also increased, but in a statistically not significant manner (p = 0.08). We will change the sentence to: "It confirmed the ability of all constructs to produce an increase of cells with aggregates above background in COS-1 cells (Figure 2C), although not statistically significant for CCc (p = 0.08)."

      In the lubrol insolubility assay, the otherwise constitutively secreted protein A1Pimyc (negative control) showed 23% insolubility. The authors attributed this observation to trapping of the protein inside granule aggregates. However, the CCv- and CCa-fused proteins showed only a slight increase (around 30%), and only the CCc construct showed more than 40% insolubility. If trapping of a constitutive protein can result in 23% insolubility, the insolubility data for all constructs except CCc are not sufficient to claim that the aggregated protein is a secretory granule content. The authors must explain this.

      Lubrol insolubility is an empirical assay with high specificity for Golgi/post-Golgi forms, but with a relatively high background that we attribute to trapping. Interpretation is based on statistical analysis of several independent experiments. The assay supports the conclusions of the other assays from an independent angle.

      We present the data of the paired t-test
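      As a general illustration of the statistic mentioned in the reply: a paired t-test compares two conditions measured in the same independent experiments by asking whether the mean of the per-experiment differences d departs from zero, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The sketch computes only the statistic; the insolubility percentages are hypothetical placeholders, not the authors' data, and a p-value would come from the t distribution (e.g. `scipy.stats.ttest_rel`).

```python
import math
from statistics import mean, stdev


def paired_t(treated, control):
    """Paired t statistic and degrees of freedom for matched measurements.

    treated[k] and control[k] must come from the same independent experiment k.
    """
    if len(treated) != len(control) or len(treated) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical lubrol-insoluble fractions (%) from three paired experiments.
cc_fusion = [30.0, 32.0, 28.0]
a1pi_control = [23.0, 24.0, 22.0]
t_stat, df = paired_t(cc_fusion, a1pi_control)
print(round(t_stat, 3), df)  # prints 12.124 2
```

      Pairing by experiment removes day-to-day variation in the assay background, which is why a paired design suits an assay with a relatively high background.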

      The authors have satisfactorily referenced prior studies in the field. However, they may consider adding the following papers, as they are directly connected with the hypothesis. Sorting of the POMC hormone into secretory granules by a disulphide loop was previously studied (Cool et al., 1995). The N-terminal loop segment was also previously used to reroute a constitutive protein, chloramphenicol acetyltransferase (Tam et al., 1993). S. K. Maji and coworkers previously showed, using cyclic somatostatin as a model peptide, that the disulphide bond maintains a native, reversible functional amyloid structure relevant to hormone storage inside secretory granules, whereas disruption of the disulphide bond led to rapid, irreversible amyloid aggregation (Anoop et al., 2014).

      We will be happy to add these references (Anoop et al., 2014, is already discussed in the text).

      The authors must check grammar and may restructure a few sentences where the sentence construction seems complicated.

      We will go through the text to improve readability.

      Reviewer #1 (Significance (Required)):

      This manuscript makes a significant contribution to fundamental research knowledge of hormone sorting mechanisms. Although the constitutive and regulated secretory pathways have long been known, the exact sorting mechanism has not yet been elucidated. No common receptor has yet been identified for recruiting regulated secretory proteins into secretory granules.

      Aggregation in the TGN is a well-accepted mechanism for sorting. However, the factor that triggers aggregation is not yet known. This study puts forward a novel hypothesis: that small CC loops formed by intramolecular disulfide bonds in hormones may act as aggregation mediators. Since many regulated secretory proteins contain such a short disulphide loop, the hypothesis proposed in the manuscript is interesting.

      It has been established that the TGN is the last compartment common to both the regulated and constitutive pathways (Kelly, 1985). No sorting mechanism is required for the constitutive pathway, as it is the default route, whereas the regulated secretory pathway requires a specific sorting mechanism for efficient packaging into secretory granules. There are two popular hypotheses about protein sorting in the regulated secretory pathway: "sorting for entry" and "sorting for retention" (Blázquez and Shennan, 2000). In "sorting for entry", hormones destined for the regulated secretory pathway start to form aggregates in the TGN-specific environment, excluding other proteins destined for the constitutive pathway. Arvan and Castle proposed the second mechanism because some hormones, like proinsulin, are initially packaged together with lysosomal enzymes into immature secretory granules (ISG) (Arvan and Castle, 1998). Over time, the hormones aggregate and the lysosomal enzymes are removed from the ISG by small constitutive-like vesicles. Although aggregation is an essential sorting criterion in both mechanisms, the molecular events that lead to aggregation have not yet been elucidated. TGN-specific environmental conditions, including pH (around 6.5), divalent metal ions (Zn2+, Cu2+) and glycosaminoglycans (GAGs), have the potential to trigger aggregation (Dannies, 2012). Although each hormone has aggregation-prone regions in its amino acid sequence, there is no common amino acid sequence responsible for aggregation. In this manuscript, the authors point out an interesting observation: many hormones contain small disulfide loops that are exposed because they are located at the N or C terminus or close to a processing site. Based on this observation, they hypothesize that the CC loop may act as an aggregation driver for hormone sorting. In cells, CC constructs from different hormones successfully rerouted a constitutively secreted protein into the regulated pathway, which supports their novel hypothesis.

      However, the hypothesis raises some questions regarding the molecular mechanism of CC loop-mediated aggregation. Why does the CC loop promote aggregation? Do the amino acid sequence and size of the loop play a role in aggregation? The granular structures formed by the different CC loops shown in the manuscript differ in size and shape (Figures 2 and 3). What is the reason for the structural heterogeneity of the CC loop-mediated dense cores? Since the authors have shown CC loop-mediated aggregation in both functional and disease-associated contexts, a very important aspect to address would be the structure-function relationship of the aggregates. Since the authors rightly point out that not all hormones or prohormones contain a CC loop, another interesting question concerns the sorting mechanism of those without a CC loop. The best part of the study is that it tries to explain the well-established aggregation-mediated sorting mechanism from a new perspective, which leaves room for many questions to be addressed by further research.

      These are very valid questions, but beyond the scope of this study, in which we address the contribution of CC loops in a cellular context. This is a novel extension of published in vitro studies, in which a few CC loop proteins (vasopressin, oxytocin, somatostatin-14) have already been shown to enable amyloid(-like) aggregation in vitro.

      From this study, the audience will learn about the role of small disulphide loops in functional and disease-associated protein/peptide aggregation. The audience will also gain insight into the sorting mechanism of the regulated secretory pathway. Based on my expertise in protein aggregation related to human diseases and hormone storage, I consider this manuscript a fantastic addition to our understanding of secretory granule biogenesis, hormone storage and subsequent release.

      References:

      Maji, Samir K., et al. "Functional amyloids as natural storage of peptide hormones in pituitary secretory granules." Science 325.5938 (2009): 328-332.

      Beuret, Nicole, et al. "Amyloid-like aggregation of provasopressin in diabetes insipidus and secretory granule sorting." BMC Biology 15.1 (2017): 1-14.

      Cool, David R., et al. "Identification of the sorting signal motif within pro-opiomelanocortin for the regulated secretory pathway." Journal of Biological Chemistry 270.15 (1995): 8723-8729.

      Tam, W. W., K. I. Andreasson, and Y. Peng Loh. "The amino-terminal sequence of pro-opiomelanocortin directs intracellular targeting to the regulated secretory pathway." European Journal of Cell Biology 62.2 (1993): 294-306.

      Anoop, Arunagiri, et al. "Elucidating the Role of Disulfide Bond on Amyloid Formation and Fibril Reversibility of Somatostatin-14: Relevance to Its Storage and Secretion." Journal of Biological Chemistry 289.24 (2014): 16884-16903.

      Kelly, Regis B. "Pathways of protein secretion in eukaryotes." Science 230.4721 (1985): 25-32.

      Blázquez, Mercedes, and Kathleen I. Shennan. "Basic mechanisms of secretion: sorting into the regulated secretory pathway." Biochemistry and Cell Biology 78.3 (2000): 181-191.

      Arvan, Peter, and David Castle. "Sorting and storage during secretory granule biogenesis: looking backward and looking forward." Biochemical Journal 332.3 (1998): 593-610.

      Dannies, Priscilla S. "Prolactin and growth hormone aggregates in secretory granules: the need to understand the structure of the aggregate." Endocrine Reviews 33.2 (2012): 254-270.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This manuscript by Reck and colleagues aims to determine the importance of short disulfide loops for correct sorting to, and release from, secretory granules. They use hybrid secretory proteins in which sequences encoding disulfide loops from different hormones are cloned in frame with the same secretory peptide, and assess how the presence of the disulfide loop affects the ability of the protein to aggregate in the ER and to be sorted for secretion. By immunofluorescence analysis, they show that the presence of a disulfide loop increases the ability of the peptide hormone to form aggregates in the ER, and these observations are confirmed by immunogold EM. Importantly, aggregate formation is seen in both professional secretory (N-2a) and non-secretory (COS-1) cells. Using immunofluorescence and quantitative immunoblotting, they also show that the ability of the secretory proteins to aggregate coincides with increased localization to secretory granules and with increased release from cells in response to stimuli.

      The results from this study are interesting and suggest that small disulfide loops may be an important part of the cargo sorting mechanism in secretory cells, and perhaps also a cause of sorting defects in certain diseases. The study is overall well conducted and worthy of publication after revision.

      **Major comments:**

      1) It is unclear to me what the relationship between the CC-loop and amyloid is. They are not involved in the formation of fibrils and amyloid, yet the authors conclude that they support the amyloid hypothesis of granule biogenesis. This must be clarified.

      Maji et al. (2009) concluded in their Science paper that secretory granules of the pituitary are made of functional amyloids formed by the protein hormones themselves. Evidence for this is that many purified protein hormones formed fibrillar aggregates in vitro with amyloid characteristics. Among the hormones analyzed were 4 CC loop-containing ones: vasopressin, oxytocin, somatostatin-14 (these are just the CC loop segments of the respective precursors), and full-length prolactin (199 aa, containing an N- and a C-terminal CC loop). Amyloid formation of somatostatin-14 was further analyzed in vitro with and without the disulfide bond by Anoop et al. (2014). On the tissue level, it was only shown that granules are stained by amyloid dyes (Maji et al., 2009). Our own lab found that folding-deficient mutant forms of provasopressin formed fibrillar aggregates in vitro (Birk et al., 2009) and in the ER of expressing cells (Birk et al., 2009; Beuret et al., 2011). These ER aggregates likely represent mislocalized amyloid formation that normally happens at the TGN for granule sorting.

      In the present study, we therefore tested the role of different CC loops in cells with respect to (1) inducing ER aggregation of a folding-incompetent reporter and (2) inducing granule sorting of a folded constitutive cargo protein. Unfortunately, the ER aggregates were all very compact and did not reveal fibrillarity. However, secretory granules, which contain functional amyloids, similarly do not have a fibrillar appearance.

      In this study, we do not directly provide evidence for the amyloid (or rather amyloid-like) character of aggregation. The concept of granules consisting of functional amyloids of peptide hormones was the starting point for our analysis. Our results are in line with the functional amyloid hypothesis and thus provide first functional support for it.

      2) What is the actual function of the CC loops? The authors show that the loops promote aggregation of cargo proteins, yet the mechanism behind this is unclear. For example, would the proteins used in this study be able to aggregate in vitro (i.e., does the CC loop enable aggregation on its own) or do they require some co-factor/chaperone? It would also be good if the authors could clarify or explain why some CC loops cause aggregation and others do not.

      Maji et al. (2009) showed for 3 different CC loops (vasopressin, oxytocin and somatostatin-14) that they aggregate in an amyloid-like form in vitro in purified form in the absence of chaperones or other protein cofactors. Anoop et al. (2014) analyzed in vitro amyloid formation of somatostatin-14 with and without disulfide bond in more detail. The proposed function is aggregation of the hormone into secretory granules as functional amyloids, which is supported by the finding that secretory granules are positive for amyloids.

      In the present study, we tested a variety of CC loops for aggregation in cells rather than in vitro. Many proteins and peptides have been shown to be able to form amyloids in vitro. The hallmark of pathological or functional amyloids is that they are still able to do so in living cells despite the presence of chaperones, whose function is generally to prevent aggregation.

      We found all CC loops to have the ability to mediate ER aggregation and granule sorting, although to different extents. The differences are likely due to their intrinsic potency and/or the way they are presented by the reporter proteins, since we used the same rather short linkers.

      We plan to go through the manuscript text to make our points clearer.

      3) The MS data in table 2 is very confusing, since half of the data points are missing. It is also not clear what the numbers in the table represent and if they are from a single experiment or multiple. As it is presented now, and as I interpret it, these results do not give support to the conclusion that CC loops form disulfide bonds. Since this is an important conclusion from the paper, these experiments need to be clarified, repeated or a different experimental approach used.

      Thanks to this comment, we realize that Table II may have presented the result in a confusing way, giving the impression that a lot of data are missing, while in fact the values were measured to be 0. To improve it, we will write 0 instead of – to indicate that no signal could be detected for a particular peptide. In addition, we will move the missing results for CCpN-NP∆ into the figure legend to avoid confusion. In the legend, we will also note that the intensities detected by mass spectrometry differ strongly for different peptides. One experiment is shown, because the numbers for peak areas inherently differ between experiments. We will revise the text to make the experiment clearer.

      Proposed new Table II:

      Table II. Cysteines of CC loops are oxidized in secreted reporter fusion proteins.

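      | Construct / peptide form | Nonreduced + IAA | Reduced + IAA | Diagnostic peptide* |
      |---|---|---|---|
      | CCv disulf | 1637 | 10 | CYFQNCPR↓ |
      | CCv 2xmod | 0 | 696 | CYFQNCPR↓ |
      | CCa disulf | 4 | 0 | ↓CNTATCATQTGEDPQGDAAQK↓ |
      | CCa 2xmod | 0 | 23 | ↓CNTATCATQTGEDPQGDAAQK↓ |
      | CCc disulf | 6 | 0 | ↓CGNLSTCMLGTTGEDPQGDAAQK↓ |
      | CCc 2xmod | 0 | 32 | ↓CGNLSTCMLGTTGEDPQGDAAQK↓ |
      | CCr disulf | 570 | 152 | ↓CSRLYTACVYHK↓ |
      | CCr 2xmod | 0 | 246 | ↓CSRLYTACVYHK↓ |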

      CC loop fusion proteins with A1Pimyc were immunoprecipitated from the media of producing AtT20 cell lines and either reduced with TCEP or not before treatment with iodoacetic acid (+ IAA). Samples were analyzed by mass spectrometry for the expected peptide masses; peak areas normalized to the intensity of the peptide LQHLENELTHDIITK within A1Pi are shown in arbitrary units. It should be noted that intensities detected by mass spectrometry differ strongly by peptide. *CC loop sequences are shown in green with red cysteines, the N-terminal sequence of A1Pi in blue, and the linker sequence in black; ↓ marks protease cleavage sites. CCv-, CCa-, and CCc-NP∆-containing samples were digested with trypsin, and CCr- and CCpN-NP∆-containing samples with Lys-C. The peptides for CCpN-NP∆ (↓LPICPGGAARCQVTTGEDPQGDAAQK↓, disulfide-bonded or carbamidomethylated) could not be detected.

      4) As the authors state, it is well known that the concentration of proteins in the ER influences their ability to aggregate. In Figures 1 and 2, the authors use transient overexpression to assess the ability of different CC loops to induce aggregation in the ER. How were these results normalized to the expression levels of the proteins? In later experiments the authors instead use stable cell lines expressing similar amounts of the different proteins. However, in these cells there is no obvious aggregation in the ER (see Figure 4). It therefore remains unclear what role ER aggregation plays in sorting to granules.

      The ER aggregation experiments were not normalized for expression levels. Plasmids were identical except for the short CC loop segments and produced similar transfection efficiencies. Stable cell lines with useful expression levels of CC-NP∆ could not be obtained, most likely because expression of mutant proteins inhibits growth.

      To analyze granule sorting, we expressed CC fusion proteins with rapidly folding A1Pi as a reporter that does not accumulate in the ER. Stable cell lines were important to select clones with moderate and very similar expression levels.

      5) What is the basal secretion of the different proteins, i.e. how much goes through the constitutive secretory pathway and how much goes through the regulated secretory pathway? The authors should show the resting secretion (before BaCl2 addition) for all conditions tested instead of just the change relative to control (the way the data are presented now, it is not possible to tell whether BaCl2 stimulation actually causes an increased release of the peptides).

      In Fig. 5A and C, the experiment compares resting secretion (– lanes) with BaCl2-stimulated secretion (+ lanes). Stimulated secretion is calculated as the ratio resting secretion / stimulated secretion (after normalization for cell number and supernatant loading).

      6) Lastly, the importance of CC loops for the sorting of native peptides is unclear. The authors should test the importance of these loops for aggregation, sorting and secretion of a non-hybrid hormone with naturally occurring CC loops (and a mutated version lacking the loop). This is important, since it has so far only been shown that the loops can affect the secretion of hybrid hormones that are not biologically relevant.

      In our previous study (Beuret et al., 2017), we analyzed the segments contributing to ER aggregation of folding-incompetent provasopressin mutants and to granule sorting of folding-competent provasopressin mutants by self-aggregation at the TGN. We found that separate protein segments, vasopressin (=CCv) and the glycopeptide, contribute to aggregation at both locations. The present study follows up on the finding for vasopressin, expanding it to other CC loops found in peptide hormones. Our results show that CC loops in general have the ability to aggregate and to contribute to granule sorting.

      As exemplified by provasopressin, the CC loop may not be the only contributor. Preliminary experiments suggest the same for growth hormone. A detailed analysis of the aggregating sequences in one or more prohormones is clearly beyond the scope of our study.

      **Minor comments:**

      1) It is stated that the 2x CC-loop constructs showed a positive effect for CCv and CCr, but this is not evaluated statistically.

      We will add the statistics to the respective figures.

      2) Explain the abbreviation POMC

      We will add the full name to the text.

      3) Figure 6D. Paired Student's t-test is not appropriate for determining significance when data is not paired (unpaired t-tests used throughout the rest of the paper).

      Only in the Lubrol insolubility experiment did we find considerable shifts between experiments (particularly obvious for the yellow experiment). Instead of normalizing to the control construct, we therefore used the paired t-test. However, the unpaired t-test does not produce fundamentally different significance. If required, we will change the figure as suggested.

      Figure 6D using unpaired t-test: [Figure]
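      The reasoning behind choosing a paired test when experiments shift as a whole can be illustrated with a minimal Python sketch. The numbers below are hypothetical (not the Figure 6D data) and assume four experiments in which each experiment shifts control and test values by a common offset; the function names are illustrative. The paired statistic works on per-experiment differences, so the shared offset cancels, whereas the unpaired (Student's, pooled-variance) statistic counts the offset as noise.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic: mean per-experiment difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def unpaired_t(a, b):
    """Student's two-sample t statistic using a pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical Lubrol-insoluble fractions (arbitrary units) from four experiments:
# each experiment has its own baseline, but the test construct is always ~3 units higher.
control = [10.0, 20.0, 30.0, 40.0]
construct = [13.0, 22.0, 34.0, 43.0]

t_paired = paired_t(construct, control)
t_unpaired = unpaired_t(construct, control)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
# prints: paired t = 7.35, unpaired t = 0.33
```

      For p-values, `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` compute the same two statistics against t distributions with 3 and 6 degrees of freedom here. For the actual Figure 6D data, the response above reports that the choice of test does not change the significance fundamentally.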

      Reviewer #2 (Significance (Required)):

      The work in this paper builds on previous work from the same group and reinforces the notion that peptide aggregation is an important part of the sorting process that controls efficient delivery of certain proteins to nascent secretory granules, and suggests that short loops formed by disulfide bridges between closely apposed cysteine residues may be part of this sorting mechanism. The paper is of general cell biological interest, but perhaps of special interest to researchers working on professional secretory cells and mechanisms of secretory protein sorting and secretion. My own research focuses on stimulus-secretion coupling pathways in secretory cells, and we primarily use live cell imaging approaches to visualize different steps of secretory granule biogenesis and release.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Since the small disulfide loop of the nonapeptide vasopressin has previously been shown to play a role in the self-aggregation and secretory granule targeting of the vasopressin precursor (Beuret et al., 2017), and as several other peptide hormones contain small disulfide loops, Reck and colleagues investigate in this study the requirement for small disulfide loops from four additional peptide hormones in the self-aggregation and secretory granule targeting of their precursors. They studied the role of small disulfide loops in aggregation in the ER and the TGN of two cell lines, COS1 and Neuro-2a. Using confocal microscopy and TEM, aggregation was indeed observed, although to different extents depending on the cell line. When fused to a constitutively secreted reporter protein, these disulfide loops induced its sorting into secretory granules and increased stimulated secretion and Lubrol insolubility in endocrine AtT20 cells. All these results led the authors to hypothesize that small disulfide loops may act as a general device for peptide hormone aggregation and sorting, and therefore for secretory granule biogenesis.

      **Major comments:**

      The authors demonstrated the ability of small disulfide loops of peptide hormones to induce peptide precursor aggregation in the ER using confocal microscopy in COS1 and Neuro-2a cell lines, to a greater extent in COS1 cells. The authors should moderate this conclusion and include in their interpretation that the distinct results may be due to the distinct secretory phenotypes of these two cell lines: COS1 are epithelial cells with only a constitutive secretory pathway, while Neuro-2a as well as AtT20 cells also possess a regulated secretory pathway. Thus, the differences could be explained by the distinct molecular mechanisms involved in the formation of constitutive vesicles or secretory granules, and aggregation and/or sorting processes could therefore differ between the two cell types. We would also suggest removing the COS1-related results, to avoid hasty conclusions.

      As suggested, we will amend the text to point out that the two cell lines differ with respect to regulated secretion and to explain why they were used. COS-1 and Neuro-2a cells were previously used by Birk et al. (2009) to study ER aggregation of disease mutants of provasopressin. COS-1 cells were used because they are large, with an extensive ER suitable for immunofluorescence microscopy. Neuro-2a cells are of neuroendocrine origin and thus more comparable to the cell types in which ER aggregation of disease mutants of provasopressin or growth hormone was observed. However, the presence or absence of a regulated pathway is not relevant for the ER aggregation experiments, since the constitutive and regulated pathways diverge only at the TGN.

      The data and methods can be reproduced, and the experiments are adequately replicated, using appropriate statistical analysis.

      **Minor comments:**

      • Figure 3: to complete the TEM study, the concomitant use of an ER-specific antibody would definitively demonstrate that the small disulfide loop-containing aggregates are associated with the ER compartment.

      In our previous study (Birk et al., 2009), we performed double-immunogold staining for provasopressin mutants and calreticulin to confirm aggregation in the ER. This anti-calreticulin antibody is unfortunately no longer commercially available, and the other antibodies we tested were not suitable for immuno-EM. Instead, we colocalized PDI with CC-NP∆ constructs by immunofluorescence microscopy. The colocalization is so extensive that we believe EM confirmation to be unnecessary.

      • Throughout the abstract, introduction and discussion sections, the authors should avoid drawing conclusions about the role of small disulfide loops in secretory granule biogenesis, and should rather limit their conclusions to prohormone aggregation and targeting. Indeed, the present study did not demonstrate any direct molecular or physical link between disulfide loops and the TGN membrane that would drive secretory granule formation.

      Granule biogenesis involves a number of processes, including interaction of cargo components with the membrane and of the actomyosin complex with the forming buds, but also self-aggregation of cargo as functional amyloids. However, we will reword our statements in the Abstract to avoid the term "granule biogenesis".

      Reviewer #3 (Significance (Required)):

      • This study highlights small disulfide loops as novel signals for the self-aggregation and secretory granule sorting of prohormone precursors in cells with a regulated secretory pathway. These results help in understanding the molecular mechanism driving peptide hormone secretion, a physiological process that is crucial for interorgan communication and functional synchronization. Moreover, the authors' previous study revealed that the small disulfide loop of vasopressin is involved in the aggregation of toxic unfolded mutants in the ER (Beuret et al., 2017), which highlights the clinical potential of the work.
      • Audience that might be interested in and influenced by the reported findings: cell biologists interested in cell trafficking, peptide hormone secretion
      • My field of expertise: secretory granule biogenesis, hormone sorting, secretory cells, neurosecretion.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The manuscript has not yet been revised.

      3. Description of analyses that authors prefer not to carry out

      As indicated in the point-by-point response above, we consider additional analyses of in vitro aggregation with purified proteins to be beyond the scope of our study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      Since the small disulfide loop of the nonapeptide vasopressin has previously been shown to play a role in the self-aggregation and secretory granule targeting of the vasopressin precursor (Beuret et al., 2017), and as several other peptide hormones contain small disulfide loops, Reck and colleagues investigate in this study the requirement for small disulfide loops from four additional peptide hormones in the self-aggregation and secretory granule targeting of their precursors. They studied the role of small disulfide loops in aggregation in the ER and the TGN of two cell lines, COS1 and Neuro-2a. Using confocal microscopy and TEM, aggregation was indeed observed, although to different extents depending on the cell line. When fused to a constitutively secreted reporter protein, these disulfide loops induced its sorting into secretory granules and increased stimulated secretion and Lubrol insolubility in endocrine AtT20 cells. All these results led the authors to hypothesize that small disulfide loops may act as a general device for peptide hormone aggregation and sorting, and therefore for secretory granule biogenesis.

      Major comments:

      The authors demonstrated the ability of small disulfide loops of peptide hormones to induce peptide precursor aggregation in the ER using confocal microscopy in COS1 and Neuro-2a cell lines, to a greater extent in COS1 cells. The authors should moderate this conclusion and include in their interpretation that the distinct results may be due to the distinct secretory phenotypes of these two cell lines: COS1 are epithelial cells with only a constitutive secretory pathway, while Neuro-2a as well as AtT20 cells also possess a regulated secretory pathway. Thus, the differences could be explained by the distinct molecular mechanisms involved in the formation of constitutive vesicles or secretory granules, and aggregation and/or sorting processes could therefore differ between the two cell types. We would also suggest removing the COS1-related results, to avoid hasty conclusions.

      The data and methods can be reproduced, and the experiments are adequately replicated, using appropriate statistical analysis.

      Minor comments:

      • Figure 3: to complete the TEM study, the concomitant use of an ER-specific antibody would definitively demonstrate that the small disulfide loop-containing aggregates are associated with the ER compartment.
      • Throughout the abstract, introduction and discussion sections, the authors should avoid drawing conclusions about the role of small disulfide loops in secretory granule biogenesis, and should rather limit their conclusions to prohormone aggregation and targeting. Indeed, the present study did not demonstrate any direct molecular or physical link between disulfide loops and the TGN membrane that would drive secretory granule formation.

      Significance

      • This study highlights small disulfide loops as novel signals for the self-aggregation and secretory granule sorting of prohormone precursors in cells with a regulated secretory pathway. These results help in understanding the molecular mechanism driving peptide hormone secretion, a physiological process that is crucial for interorgan communication and functional synchronization. Moreover, the authors' previous study revealed that the small disulfide loop of vasopressin is involved in the aggregation of toxic unfolded mutants in the ER (Beuret et al., 2017), which highlights the clinical potential of the work.
      • Audience that might be interested in and influenced by the reported findings: cell biologists interested in cell trafficking, peptide hormone secretion
      • My field of expertise: secretory granule biogenesis, hormone sorting, secretory cells, neurosecretion.
    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This manuscript by Reck and colleagues aims to determine the importance of short disulfide loops for the correct sorting to, and release from, secretory granules. They use hybrid secretory proteins in which sequences encoding disulfide loops from different hormones are cloned in frame with the same secretory peptide, and assess how the presence of the disulfide loop affects the ability of the protein to aggregate in the ER and to be sorted for secretion. By immunofluorescence analysis they show that the presence of a disulfide loop increases the ability of the peptide hormone to form aggregates in the ER, and these observations are confirmed by immunogold-EM. Importantly, aggregate formation is seen both in professional secretory (N-2a) and non-secretory (COS-1) cells. Using imm