  1. Oct 2020
    1. Reviewer #3:

      In this interesting study, the authors explored the effect of five consecutive generations of a high-fat, high-sugar Western diet (WD) in mice on the metabolic performance of their offspring under a normal chow diet. It is very interesting to find that the chow-fed progeny of these multigenerational WD-fed males develop a "healthy" overweight phenotype (that is, without impaired glucose metabolism or fatty liver abnormalities) that persists for 4 subsequent generations. In parallel, the authors also performed zygotic sperm RNA injection using sperm RNAs from the WD-fed males (after both one generation and five generations of feeding) and showed that the sperm RNA indeed induces metabolic phenotypes in F1 offspring, and that some phenotypes persist to F2-F3 but none persist to F4, which differs from the phenotype induced by mating (which lasts 4 generations). The study is overall well performed, and the comprehensive examinations (especially of phenotypes) represent an advance for the mammalian epigenetic inheritance field. I have a few concerns and suggestions for further improvement.

      1) In the abstract, I strongly recommend that the authors clarify what a "healthy" overweight phenotype is, which in the current paper means normal glucose metabolism and no fatty liver. This will make the abstract more informative and precise. In fact, this is the major novel discovery in the phenotypic exploration, not only for its social-medical implications but also from the perspective of evolution. It looks like the five-generation WD-fed males have evolved a protective mechanism in glucose and liver fat metabolism that can be inherited by the offspring. The underlying mechanism is intriguing and worth exploring in the future using this model. More extensive discussion of the social-medical and evolutionary aspects could be included.

      2) Regarding the phenotype induced by sperm RNA injection, the description should be more precise, as the current description is not entirely consistent with the data presented. In Figure 4, some parameter changes persist to F2-F3, which already suggests transgenerational inheritance rather than merely intergenerational transmission. A more precise description would be that sperm RNAs can unequivocally induce an intergenerational phenotype, but may also induce some transgenerational features, although the effect is weaker than that induced by whole sperm. In fact, in a previous study using a mental-stress-induced model, sperm RNA injection also induced phenotypes in both the F1 and F2 generations (Nat Neurosci. 2014 May;17(5):667-9.).

      3) The sperm small RNA analysis part (Fig. S4) is relatively weak. The datasets generated are in fact quite valuable, as they include sperm from the control diet, first-generation WD and fifth-generation WD. This is an opportunity to explore the differences, especially between first-generation WD and fifth-generation WD, as no one has done this before. The current data analyses are crude and do not show these differences in an informative way. At a minimum, the authors need to provide the overall length distribution of each dataset with the annotation of the different types of small RNAs. The authors have shown some differences regarding miRNAs and tRNA-derived small RNAs (tsRNAs) in Fig. S4; it would be interesting to also look at the rRNA-derived small RNAs (rsRNAs), because rsRNAs have also been extensively found in both mouse and human sperm, and these sperm rsRNAs are sensitive to dietary changes (Nat Cell Biol. 2018 May;20(5):535-540; PLoS Biol. 2019 Dec 26;17(12):e3000559.), are closely associated with mammalian epigenetic inheritance and thus represent a component of the recently proposed sperm RNA code in epigenetic inheritance (Nat Rev Endocrinol. 2019 Aug;15(8):489-498). The reanalysis of the datasets could be done with SPORTS1.0 (Genomics Proteomics Bioinformatics. 2018 Apr;16(2):144-151.), which provides the annotation and analysis of miRNAs, tsRNAs, rsRNAs and piRNAs and has been used in the above-mentioned publications (Nat Cell Biol. 2018 May;20(5):535-540; PLoS Biol. 2019 Dec 26;17(12):e3000559).

    2. Reviewer #2:

      Raad et al. examined the effects of multigenerational paternal exposure to an obesogenic diet on epigenetic and metabolic alterations at somatic and germ cell levels. The experimental work addresses an important question. The finding that sperm mRNA and natural crosses have different effects on offspring metabolic states is intriguing. The major tissue of interest explored was WAT. Fat cell size, number and gene expression were reported. The intriguing thing about these data is that the sperm RNA microinjection did not fully recapitulate the effect across multiple generations - there is little explanation of potential mechanisms.

      There is no detailed coverage of the gene changes, small RNAs, piRNAs, etc. observed, or of the pathways implicated. This would be a welcome addition.

      As this is such a complex design, more overall schematics would be helpful.

      The number of mice per group ranges widely, and it is unclear how many matings this represents. The Fig 3 legend states that 4 WD1 and 9 WD5 males from different littermates were mated with CD females - again, this is unclear - do you mean from different litters? The numbers shown in panel A do not seem to agree with those in panels B and C.

      Figure 1 shows outcomes for WD 1,2,3,4,5 and largely focuses on gWAT. Gene expression changes are only briefly summarised. Only 1 CD generation is represented.

      It is unclear why mice were studied at the various ages, e.g. across data sets, the ages shown are 10 weeks, 12 weeks, 16 weeks and 18 weeks. Note that there are inconsistencies in figure formats and some details are missing, which makes it hard to understand what the authors found. Fig S3 and S5: no n values are given. The labels in S4 D and E are hard to follow.

      In several of the figures, it is not clear what the significance (*) is being compared to - is it always CD? E.g., Figure 3 and Figure 4.

      It appears that variability increases from WD1 to WD5, with larger ranges evident. Is this why n increases across generations? And is this a consistent observation across paternal studies of this kind?

      The effect of paternal WD on BW, GTT and adiposity is relatively larger in mice than in rats. Have the authors considered species differences?

      On page 10 the authors state that the diet used is not associated with hepatic steatosis, but I would have thought there was good evidence of this occurring in mice over the timeframe described here.

      It is surprising that there is no detailed coverage of the gene changes observed and the metabolic pathways implicated. The story is undersold.

    3. Reviewer #1:

      While this study focuses on an interesting hypothesis and attempts to address the molecular mechanisms at play, there are numerous flaws in the study design and the statistical tests that prevent firm conclusions from being drawn.

      1) In line 72, the authors state that "the average body weight of the WD-fed male mice increased gradually with multigenerational WD feeding"; however, the result of a test indicating a gradual increase is not reported. As described in the legend of Figure 1, the test performed examined differences in body weight between the control group and each individual generation, not between the generations themselves. Visually, it rather seems that body weight did not in fact increase gradually: for instance, a comparison of WD1 and WD3, or of WD2 and WD5, does not support the "gradual increase" in body weight that the authors are claiming.

      2) There is a lack of clarity in the methods with regard to the number of animals used in each generation, the number of founders, and what constitutes the control group. In the legend of Figure 1, it is stated that 5 males were used from WD2 onwards. However, the method section states "(...) 4 to 6 independent males of WD1 group". The reviewer assumes that the authors know how many animals were used in the WD1 group, and that the authors meant 4 to 6 animals per WD generation. However, if the details indicated in the legend of Figure 1 are accurate (5 fathers per group from WD2), how is it possible that 4 to 6 animals were used? The reviewer suggests clarifying this in the text, as well as in a more detailed experimental setup diagram stating the number of fathers in each generation, the number of offspring studied in each litter, and the total number of offspring studied for each generation.

      3) In Supplemental Figure 1I, the CD1 group appears to be composed of 7 individuals and the CD2 group of 10 individuals. This is not consistent with the numbers reported in Figure 1A (10 in CD1 and 13 in WD3) and Figure 1B (22 visible dots). It is thus difficult for the reviewer to trust that body weights were truly compared between all animals in CD1 and CD5. Regardless, the reviewer is intrigued by the authors' choice to study control animals only from the first generation (CD1) and the fifth generation (CD5), as they describe in the methods that, for the control group, they followed the same procedure as for the WD group, which should have led to the generation of control animals in all of the F1, F2, F3, F4 and F5 generations. The authors should clarify this, and if they indeed generated these animals, they should use body weight data from each generation of controls and compare them to the WD group of the respective generation (i.e. CD1 to WD1, CD2 to WD2, etc.). By having different sample sizes in the various groups, the authors are biasing the results of the statistical tests, as a group with a larger sample size is more likely to differ significantly than a group with a smaller sample size (as with CD (22 observations) and WD2 (12 observations) in Figure 1B, but also with the RNA-seq results). Along the same lines, more animals were studied in WD4 and WD5 than in WD1-3, which is likely biasing the statistical analysis. Again, if the study design described in the methods section is accurately reported, it implies that an average of 3 offspring per father were used in WD1-3, and 8-10 (a full litter) in WD4-5.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. David E James (The University of Sydney) served as the Reviewing Editor.

      Summary:

      In this manuscript, Raad and colleagues exposed male mice to a western diet before conception for 5 consecutive generations and measured body weight, adiposity and various metabolic markers in the offspring. Sequencing of small RNA in sperm from founders identified several differentially expressed tRF and miRNA species. Microinjection of RNAs recapitulated some, but not all, effects on body weight and metabolism. The authors report an aggravation of adiposity along generations and a phenotype that persists for 4 subsequent generations. Such persistence of phenotype was not observed in animals originating from microinjection of total RNAs, suggesting that other epigenetic mechanisms are at play in the persistence of the phenotype. Overall, the studies were considered to be of interest by the referees, but one major overarching problem they identified concerned the study design and the statistical analyses, which limited interpretation of the study. These issues need to be seriously addressed by the authors. These and other points are listed below.

    1. Reviewer #2:

      Overall, I think this is a creative study, with very interesting findings. A major weakness is that the interpretations seem a bit exaggerated and alternative interpretations not considered.

      Using a creative paradigm of perceptual filling-in, the authors show that increased attention (indexed by a reduction in alpha power over central-parietal locations, and supported by previous psychophysics studies) is associated with perceptual filling-in, and the phenomenal disappearance of targets. By tagging targets and surround with different frequencies, they show that SSVEP elicited by targets increases at the time of perceptual filling-in.

      These results suggest that SSVEP, thought to index the content of visual perception in previous binocular rivalry studies, can be dissociated from conscious perception in this paradigm, and instead reflect attention.

      While the results are interesting and novel, they are perhaps not as surprising as the authors present them to be. Given that previous studies have shown a clear connection between SSVEP and attention (e.g. Ref 14 cited by the authors), these results show that when attention and awareness are dissociated (as the last author has nicely demonstrated/argued previously), SSVEP goes with attention.

      These results do not demonstrate that all sensory-cortical activity goes along with attention instead of awareness, as the authors' abstract/significance statement/discussion suggest to be the case. E.g., in the abstract/significance statement, the authors only state "neural activity" or "neural response", instead of specifically SSVEP, which can be misleading. Similarly, regarding the discussion, it remains a possibility that other types of neural activity (e.g. spiking rate or recurrent activity) in sensory cortex correlate with the vividness of conscious experience, which would in principle be consistent with first-order or GNW theories.

      An analysis comment:

      In the discussion, the authors mention "As more targets disappeared and presumably drew attention, both the duration of their absence and strength of target SNR increased."

      The duration effect, shown in the SI, is not referenced in the main text as far as I could find. In Fig. 2, in addition to investigating SSVEP's relation with the number of disappeared targets, the authors could also test its relation with the duration of PFI.

    2. Reviewer #1:

      General assessment:

      In this paper, Davidson et al. characterize the neural correlates of visual disappearance during perceptual filling-in (PFI) using steady-state visual evoked potentials (SSVEPs). They show that target disappearance actually leads to an increase rather than to a decrease of the target SNR. This finding is potentially of importance. However, the current version of the manuscript does not provide enough details regarding the underlying assumptions and neural mechanisms. The results should also be better described, interpreted and compared to the existing literature. I list my most substantive concerns below.

      Substantive concerns:

      1) I was a bit frustrated to see that almost no discussion about the neural mechanisms underlying the results is provided. It seems important to better explain the cortical processes involved (e.g. the authors could compare more carefully their results with those obtained in macaque electrophysiology by De Weerd et al. 1995).

      To go further in this direction, one possibility would also be to analyse the SNRs at the intermodulation frequencies (I see in supplementary figure 3 that responses at F2-F1 = 5Hz are significantly above noise). This would make it possible to characterize and discuss the interactions between the neural responses corresponding to the processing of the targets and of the surround (see e.g. Appelbaum et al., 2008).
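
      As a minimal sketch of which frequencies such an analysis would cover: intermodulation terms are the sums and differences n*f1 + m*f2 of the two tagging frequencies, with both n and m nonzero. The tagging frequencies used below (15 and 20 Hz) are hypothetical placeholders chosen only so that F2-F1 = 5 Hz, and the function name is likewise an assumption, not something taken from the manuscript.

```python
def intermodulation_terms(f1, f2, max_order=3):
    """Return positive frequencies n*f1 + m*f2 with n, m nonzero and |n| + |m| <= max_order."""
    terms = {}
    for n in range(-max_order, max_order + 1):
        for m in range(-max_order, max_order + 1):
            # Skip pure harmonics (n or m equal to zero) and terms above the requested order.
            if n == 0 or m == 0 or abs(n) + abs(m) > max_order:
                continue
            freq = n * f1 + m * f2
            if freq > 0:
                terms[f"{n:+d}*f1 {m:+d}*f2"] = freq
    return terms


# Hypothetical tagging frequencies (Hz); only their difference of 5 Hz comes from the review.
f1, f2 = 15.0, 20.0
terms = intermodulation_terms(f1, f2)
for label, freq in sorted(terms.items(), key=lambda item: item[1]):
    print(f"{label}: {freq:g} Hz")
```

      The SNR at each listed frequency could then be computed in the same way as for the fundamentals, and its time course aligned to the button press.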

      2) When I read the whole manuscript, I had the feeling that the analysis of the SNR change latencies (which is currently described in the supplements) deserves to be documented in more detail and to appear in the main document. The finding that changes in background SNR precede changes in target SNR is an important result which clarifies the temporal sequence of neural activations. It would also be nice if the authors could determine when the SNR change corresponding to the intermodulation product (e.g. at F2-F1) appears (see my first point above).

      3) To better characterize the difference between the responses to PFI and to phenomenally matched disappearances (PMD), and to support the claim that target-SNR decreases rather than increases during PMD (l. 170), it would be great to show the target-SNR changes around button press (i.e. the equivalent of figure 2 b & e) for PMD.

      4) The target disappearance during PFI is associated with an increase of SNR and therefore, SSVEPs in this case do not reflect conscious perception. But does it necessarily imply that this target-SNR increase reflects attention instead? The authors base their interpretation on previous studies (Lou, 1999; De Weerd et al., 2006) where attending to target features increased PFI probability (which I think is not exactly equivalent to the PFI magnitude reported here) and also on the correlation they found between target-SNR and evoked alpha. However, this is indirect evidence, and in their experimental protocol attention was not directly manipulated (as e.g. in Morgan et al., 1996 or Müller et al., 2006). I would suggest being a little more cautious with this interpretation in the manuscript.

      5) Before this study, other groups looked at the dissociation between attention and perceptual awareness (among others, see e.g. Wyart & Tallon-Baudry, 2008; 2009; Koivisto et al., 2009; Norman et al., 2013). A deeper review of the existing literature on this topic (in the introduction and/or discussion) would help readers understand what is already known and would also provide leads for future investigations.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      This manuscript describes a human EEG study which aims at characterizing the neural correlates of visual disappearance during perceptual filling-in (PFI) using steady-state visual evoked potentials (SSVEP). The authors report that, in this paradigm, target disappearance leads to an increased rather than a decreased SNR of the target SSVEP. The authors interpret this "neural correlate of invisibility" as an empirical challenge for existing theories regarding the relationship between SSVEP and conscious perception. The two reviewers have found the study to be creative and its findings to be of potential importance for the field. However, they have also raised concerns regarding the interpretation of the findings proposed by the authors, which would require additional analyses to be supported by the data and a more extensive account of the existing literature on the relationship between the neural correlates of visual awareness and attention. There are also concerns regarding the number of subjects included in the analyses, which should be clarified. The paragraphs below describe the main concerns that have been discussed among reviewers and the reviewing editor.

    1. Reviewer #3:

      In this manuscript, Robinson et al. identified alternative first exon (AFE) switching events conserved between mouse and human following macrophage inflammation. Using short and long-read sequencing, the authors identified a few unannotated transcription initiation sites (TSS) that are specific to an inflammatory response. Among those, they centered on an unannotated TSS in the Aim2 gene that drives expression of a novel isoform regulated by an iron-responsive element in its 5′UTR.

      While previous work had documented crucial AFE switching events in many other biological contexts, Robinson et al. present here an interesting AFE switching event that can have potential implications for our understanding of the molecular regulation of the innate immune response. I would expect further progress on global mechanisms and biological relevance of these AFE switching events, as well as evidence that the AFEs are truly first exons/TSSs.

      Substantive concerns:

      1) Are the AFEs truly first exons/TSSs? While both short-read and long-read sequencing detected changes in alternative splicing choices, neither is an optimal methodology for analyzing first exons. Therefore, I suggest using a more specialized method to identify (and quantify) the usage of first exons more accurately. Globally, cap analysis of gene expression (CAGE) would be ideal. For validation of specific AFE changes, the qPCR technique has a few issues. First, it does not have nucleotide resolution, so the authors should not refer to TSSs if they used this technique for validation. Second, many downstream first exons are also used as internal exons in other isoforms. There is no direct technology here to specifically analyze first exons/TSSs. Also, RNA-sequencing technologies, depending on their depth, can definitely miss specific isoforms. Considering the low coverage at the 5' end of genes in RNA-seq analysis, this is particularly important for first exons. A qPCR would only analyze the well-known TSSs. Thus, 5'RACE or a similar technology should be performed to specifically assess the relative usage of AFEs.

      2) Global mechanism. The authors assumed that AFE switching is generated at the level of transcription initiation and looked for transcription factor binding and chromatin structure modifications in promoters. However, they did not rule out the possibility that the global switching effect reflects post-transcriptional regulation, such as differential mRNA stability. A transcription initiation measurement (e.g., 4SU metabolic labelling) is necessary to demonstrate that the changes in AFE usage are co-transcriptional. In addition, in terms of their ATAC-Seq analysis, the chromatin structure changes in promoters can be the cause or the consequence of transcription initiation. Thus, they should not be listed as one mechanism driving the expression of AFE events (line 145). Also, to demonstrate a mechanism based on transcription factor binding, more than 2 transcription factors should be considered. In any case, the expression patterns of the transcription factors considered are not clear. As a minor note, the bioinformatic analysis of the two promoter regions driving the isoforms of Aim2 (line 156) is not explained in the method section.

      3) Biological relevance. Could the authors evaluate whether the translation regulation of Aim2 based on its AFE switching is a more generalized phenomenon? Are there any global gene regulation changes triggered by the other genes with significant changes in AFE usage?

    2. Reviewer #2:

      This manuscript by Robinson et al. presents an interesting and timely analysis of a wealth of transcriptome data upon immune stimulation. The unique combination of long-read Oxford Nanopore and short-read Illumina high-throughput sequencing across both human and mouse samples presents an opportunity for many interesting inter-species immune response comparisons, as well as elucidation of full-length transcript information. This paper is well-written and has interesting validation and discussions regarding Aim2. My major concern is that the paper seems to narrow in on the characterization of Aim2 and class of RNA processing changes (alternative first exons) quite quickly without really delving into the rest of the data and how they arrived there. Below are my major/minor comments and suggestions:

      1) I would have liked the authors to provide more insight into how they homed in on first exon changes specifically, by discussing more of the other RNA processing changes they found. There is cursory mention in the text and figures of other alternative exon or splice site changes. Firstly, other studies (including those referenced by the authors) have found hundreds of RNA processing changes genome-wide upon immune stimulation - especially of cassette exons, alternative splice sites, and last exon/3'UTR changes. However, here the authors only find tens of changes (Fig 1B). Are they underpowered to identify changes, and can they do any sort of analysis to show that they are sufficiently powered (# of sequencing reads & junctions, complexity of reads, etc.)?

      2) Similarly, I would also be interested in seeing an analysis indicating whether the overlap of 50 AFE events between the long-read and short-read sequencing analyses is statistically significant. In particular, how many overlapping events would be expected given the difference in quantification power between the two methods? How many real AFE differences might the authors be missing because the long-read sequencing methods often do not have the power to identify them (i.e. lower expressed genes in one or the other condition, thus dropout of isoforms and perhaps fewer isoform differences for differentially expressed genes)?
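
      As a minimal sketch of one way to frame this, the overlap could be compared with a hypergeometric null, in which the two methods call events independently from a shared set of testable AFEs. Every count below except the overlap of 50 is a hypothetical placeholder, the function name is an assumption, and this null ignores the difference in quantification power raised above, so it illustrates the calculation rather than the appropriate final test.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items, K of which are hits."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts; only the overlap of 50 events is taken from the manuscript.
n_testable = 2000  # AFE events quantifiable by both methods (hypothetical)
n_short = 200      # events called significant in the short-read data (hypothetical)
n_long = 150       # events called significant in the long-read data (hypothetical)
overlap = 50       # events called by both methods

expected = n_short * n_long / n_testable
p_value = hypergeom_sf(overlap, n_testable, n_short, n_long)
print(f"overlap expected by chance: {expected:.1f}")
print(f"P(overlap >= {overlap}) under the null: {p_value:.3g}")
```

      Restricting the testable set to genes expressed above the long-read detection limit would be one way to account for the lower power of that method.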

      3) Second, for the non-AFE changes that they did find, there is very little discussion about what those changes might represent. Specifically: (a) how many changes are validated with long-read data?, (b) is there any insight into specific domains being included/changed, especially using the long-read data?, (c) how many of these non-AFE changes overlap between species? and (d) which types of genes show higher overlap between species and what are their characteristics (binding sites, etc.)? To my knowledge, this is the first study that is really designed to properly look at the conservation of splicing or RNA processing changes after immune activation, so I would love to see more analysis and discussion of this aspect genome-wide.

      4) The authors define significant splicing changes as those with a p-value <= 0.25 and |dPSI| >= 10. I'd like some more clarification on whether this is an adjusted p-value (BH, FDR, or some other multiple test-corrected p-value). Especially if this is adjusted, I find it surprising that the authors are choosing such a liberal statistical confidence level and that even with such a liberal threshold, they are only getting tens of significant events. I would like the authors to at least show these same trends across multiple p-value thresholds or with rank threshold analysis (top 5%, top 10%, top 20%) to show biological trends.
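
      As a minimal sketch of the threshold comparison requested here, assuming Benjamini-Hochberg adjustment is the intended correction (the manuscript does not say), events could be counted at several adjusted p-value cut-offs. The event list below is made up for illustration and the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical events as (raw p-value, dPSI in percent); not data from the manuscript.
events = [(0.001, 35.0), (0.008, -22.0), (0.039, 12.0),
          (0.041, 8.0), (0.20, -15.0), (0.60, 11.0)]
adj = benjamini_hochberg([p for p, _ in events])

for threshold in (0.05, 0.10, 0.25):
    n_sig = sum(1 for (_, dpsi), q in zip(events, adj)
                if q <= threshold and abs(dpsi) >= 10)
    print(f"adjusted p <= {threshold:.2f} and |dPSI| >= 10: {n_sig} events")
```

      Reporting the event-type proportions at each cut-off would show whether the predominance of AFE events is robust to the choice of threshold.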

      5) The authors introduce their long-read sequencing data by mentioning that they wanted to identify "additional splicing events that are not captured using short-read sequencing." They then go on to only talk about novel first exon events identified with the long-read sequencing data. Did they identify any other non-AFE events using the long-read data that could then be quantified with the short-read data? And second, how do they quantify confidence for novel AFE isoforms, when long-read data seems to have lots of issues with properly sequencing the terminal ends of transcripts (particularly the 5' end when polyA primed, as occurs in ONT DirectRNA sequencing)? They mention the use of ATAC-seq data to show putative promoter support, but mention at one point in their methods that ATAC regions within 10kb of AFEs are considered. This seems like it could be a rather large region to be sure that the ATAC peak is specific to a novel AFE - what is the average distance between AFEs? Finally, I would love to also see the incorporation of CAGE-seq data (or other 5' end data) to validate the specific AFE sites - which I believe the FANTOM consortium has across many human and mouse tissues.

    3. Reviewer #1:

      Our understanding of the transcriptomic impact of innate immune signaling remains incomplete. Here Robinson et al. use both long and short read RNA sequencing to gain further insight into LPS-induced changes to mRNA isoform expression in human and mouse macrophages. Their studies report the novel observation that the most common change in isoform expression is alternative use of the first exon. Such changes are indicative of transcriptional regulation, and are thus consistent with the known impact of innate immune signaling on activation of multiple transcription factors. Despite some minor concerns with details of the study, as enumerated below, this is a well-executed and important study that will be of interest and importance to many studying innate immunity, as well as those interested in gene regulation.

      Major comments:

      1) In some ways this is minor, but the authors should be careful to not describe alternative first exon use as alternative splicing. While a novel splice junction is created, mechanistically this is driven by changing transcriptional regulation, and then splicing occurs in the only pattern available to that TSS. In general this is described appropriately in the manuscript, but at a few points there is confusing terminology.

      2) An interesting and somewhat surprising point in the manuscript is that 50% of the AFE events don't show an overall change in gene expression. For Aim2, which does change, the authors show that the AFE change is due to activated use of the unannotated TSS in LPS-stimulated cells. For those genes for which AFE use doesn't correlate with a change in gene expression (e.g. Ncoa7, Rcan1, Ampd3 - Fig S3) is there still transcriptional activation of one TSS and transcriptional silencing of the other? In other words, is there coordinated regulation of the two TSSs to ensure overall message abundance doesn't change, or does activation of one TSS inherently shut off the other (more akin to splice site competition in traditional AS)?

      3) The data suggesting that an IRE regulates translation of the induced 5'UTR are compelling, but more work should be done to confirm this. Most importantly, the experiment in Figure 4J should be repeated with the deltaIRE version of the unannotated UTR. Also, is the IRE regulation controlled upon LPS stimulation, or does it depend only on the presence of the IRE element? In other words, what is the distribution of the annotated and unannotated isoforms in the polysome in the absence of LPS (i.e. repeat 4P without LPS)? Can the authors comment on whether the level of iron or the activity of IRP1/2 changes in LPS-stimulated cells?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy W Nilsen (Case Western Reserve University) served as the Reviewing Editor.

      Summary:

      There was significant enthusiasm for the work. However, it seems that considerable effort including additional experiments will be required to firm up the conclusions.

    1. Reviewer #2:

      The authors set out to investigate whether the cerebellum plays a domain-general and predictive role in speech perception. They leveraged the online platform, Neurosynth to conduct a meta-analysis of fMRI studies to compare the activation results between speech perception and speech production studies. They find that there are distinct as well as overlapping regions of perception- and production-related activity in the cerebellum, and that each of these regions has a distinct connectivity fingerprint with the cerebral cortex. They mined text data from thousands of studies in Neurosynth to determine which labels best explain these speech-perception and speech-production activity patterns. They find that cerebellar regions activated by speech-perception, speech production, and their overlap, are also associated with cognitive and motor processes beyond the domain of speech and language. On the basis of these results, they argue for a domain-general view of cerebellar processing.

      One of the most interesting findings in this paper is that speech-perception and speech-production tasks elicit both distinct and overlapping activity patterns in the cerebellum. It has long been known that the cerebellum is activated by speech processing; however, it has been less clear to what extent these two processes (perception and production) differ in their activation patterns. Importantly, the authors also show that these distinct and overlapping networks in the cerebellum display connectivity patterns with corresponding regions of the cerebral cortex. However, there are some major concerns.

      One of the central take-aways from this study is that prediction is a domain-general mechanism that supports speech perception in the cerebellum. The authors argue for domain-generality on the basis that regions activated by speech perception and production in the cerebellum are also activated by a wide range of non-speech tasks. However, I was a bit confused by this argument. It is my understanding that the same region of the cerebellum can be activated by many different tasks, and that each task will demand its own computational description. However, that does not necessarily provide evidence for domain-generality. What could point to domain-generality is a function/computation that explains the diverse set of computations required by the tasks. That speech-related regions of the cerebellum are also activated by a range of non-speech tasks does not (in my opinion) support a domain-general view of cerebellar processing.

      Another take-away from this study is that the cerebellum plays a predictive role in speech processing. Prediction is at the core of many theories of cerebellar function (e.g., internal models, error-based learning), but it is of course a very broad term that is not necessarily unique to the cerebellum. The authors hypothesize that, "if the cerebellum is involved in prediction during natural speech perception, there should be a greater amount of activity throughout the brain when the cerebellum is not active during this task". The authors compare two different sets of speech perception studies, those that report cerebellar activation and those that do not. They then compare the level of activation in cortex versus cerebellum for both of these study types. They find that cortical activation in the "no cerebellum" studies is increased relative to cortical activation in the "cerebellum active" studies. On the basis of these results, they infer that the cerebellum must be involved in prediction and that prediction results in metabolic savings (i.e. decreased activity in cortex). However, why did the speech perception tasks in the "no cerebellum" studies not activate the cerebellum? Did they not involve prediction in some capacity? There are likely other reasons, unrelated to the absence of cerebellar activation, for the increased cortical activation in the "no cerebellum" studies.

      It is also not clear to me why speech perception studies that involved passive sound and music perception were included. How are tones related to speech perception? It would have been helpful if the authors had shown consistency across the different modalities (i.e. speech, sounds, instrumental music, and tones). I'm also assuming that the speech production studies were not matched across these four groups. Couldn't differences in activity patterns arising between the two study types potentially be attributed to sounds, instrumental music, and tones present in the speech perception studies?

    2. Reviewer #1:

      I have very much enjoyed reading this piece of work, investigating the role of the cerebellum in non-motor functions using a meta-analysis and focusing especially on speech perception and predictive processing. I believe that this work is highly relevant to the field and will contribute considerably to the understanding of cerebellar functioning.

      I appreciate the careful description of the methods and the aim to challenge the hypotheses through additional testing. I have only a few major concerns, all of which I believe are addressable:

      1) From page 8, but mainly throughout the whole paper: I am concerned with the inclusion of 22.5% of instrumental music or tone studies. The paper's overall focus is on speech perception and production, and the authors always only refer to "speech" throughout the manuscript. Whereas the inclusion of speech sound perception studies can be easily justified, the inclusion of tone perception is a very different matter if the focus lies on speech, e.g. due to the varying complexity of the input signal.

      Although the authors address this issue in the limitation section, it weakens the overall impact of the findings (as they also state, but downplay). For consistency the authors should exclude tone processing studies from their analysis, as the role of the cerebellum in contributing to the processing of time and potential motor sequencing is widely discussed in the literature (see Gordon et al 2018, PLoS One, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6242316/ ). As I very much support the ideas presented in the paper, I believe a clear differentiation between perception of speech and perception of music is crucial for making a convincing argument regarding the role of the cerebellum in "passive" predictive language perception, if that is the focus of the paper. It would be interesting, however, to know whether the regions for perception differ when music studies are included compared to speech studies only. A separate analysis of the tone studies might not be feasible for 20 or so studies.

      Generally, the authors should refrain from setting the focus on "speech perception" when the paper clearly focuses on "speech and tone perception" (or more generally "non-motor auditory perception"), which is, by the way, not problematic at all, as the findings support a domain-general function of the cerebellum. In that case speech perception should not be mentioned singularly in the title. However, if the authors wish to make a statement on speech perception, then they should exclude the tone perception studies from the analysis.

      2) Relatedly, page 5, last sentence: whereas I do agree with this approach and appreciate the effort to test their own hypothesis, the approach does not test an alternative hypothesis: could the decrease in general cortical activation be linked to greater activity in a different region, other than the cerebellum? This should at least be discussed.

      3) Page 16/20: To test their hypothesis the authors compare the cortical activation of studies that report cerebellar activity and those that don't. If the cerebellum had this domain-general function in predictive processing, why would it not be active in some studies? Was there a systematic difference between the two sets of studies, and, as further argued, did those studies that did not activate the cerebellum indeed use speech in novel contexts? A further investigation of the difference between the two sets of studies would help to support the argument.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The importance of the cerebellum in cognition generally, and in speech processing more specifically, is a timely and interesting question, and meta-analysis is a helpful tool. The paper is clearly written. However, in the opinion of at least one reviewer and the Reviewing Editor, neither of the two stated aims of the paper was satisfactorily achieved.

      The stated aims were to demonstrate:

      1) "that the cerebellum plays a domain-general role in speech perception-that is, a role that is not inherently speech specific." However, just showing coactivation with other tasks does not indicate domain-generality, for a variety of reasons. First, this conclusion is not supported because of the computational specificity issue raised by Reviewer 2, and second, coactivation in brain imaging can be an artifact of the spatial resolution of BOLD, and of preprocessing -- it does not necessarily imply coactivation at a neural level.

      2) "that the domain-general role played by the cerebellum and its connections during speech perception is related to prediction." Two lines of evidence are offered for this: 1) the reverse inference that regions identified in the paper are associated in Neurosynth with the term 'prediction'; and 2) that there was more cortical activity when the cerebellum was inactive. However, the fact that accurate prediction should reduce activity does not mean that a reduction in activity signifies prediction.

    1. Reviewer #3:

      Otsuka et al. report the characterisation of three temperature-sensitive alleles of genes which prominently lead to overproliferation of cells in lateral root primordia. Interestingly, this phenotype, which is not underpinned by alteration of the auxin pattern, can be phenocopied by treatment with ROS and by interfering with the mitochondrial respiratory chain. This reveals that ROS modulate cell proliferation in the LR. The cloning and biochemical characterisation of the affected genes reveal that all three encode enzymes involved in mitochondrial RNA processing, and that the mutations perturb the production of certain components of the mitochondrial electron transport chain.

      This is an excellent manuscript that points to a new and very interesting link between primary metabolism and cell proliferation in lateral roots. It is remarkably well written and presented. The conclusions are fully supported by the data. As is the case for exciting new discoveries, they raise a lot of questions, and this manuscript is no exception. It would be very interesting for future work to uncover the nature of the molecular link between ROS and cell proliferation and why LRs are so sensitive to this. It would also be interesting to speculate whether the reported existence of a hypoxic environment in the centre of the LRP has to do with this.

      The one point on which I would like to hear some comments from the authors relates to the growth conditions used to reveal the phenotype at restrictive temperature. They mention that they use explant culture on RIM (characterised by high glucose and high 2.5µM IBA). What is the penetrance of the phenotype under standard conditions (1/2 MS, 1% sucrose, no additional auxin/IBA)?

    2. Reviewer #2:

      The manuscript by Otsuka and coworkers describes the mapping of the mutations in rrd1, rrd2 and rid4 causing the temperature-sensitive lateral root morphogenesis defects (fasciated LR meristem). Interestingly, the respective mutations all map to genes involved in mitochondrial mRNA processing, mRNA deadenylation, and mRNA editing. The authors propose that defective ROS homeostasis is causal to excessive cell proliferation in the lateral root primordia, and the associated fasciation phenotype. Overall the manuscript is well-written and convincing with respect to the characterization and mapping of the mutants, and the importance of RNA editing in mitochondria for the mutant phenotypes. I am not yet entirely convinced about the link between ROS production and the lateral root morphogenesis defects.

      1) The fasciated LR phenotype is reminiscent of mutants defective in coordination of LR emergence, such as CASP:shy2 (Vermeer et al), suggesting that defective signaling in the layers overlying the LR could be causal to the observed phenotype. However, the phenotyping presented in this manuscript does not allow this to be assessed. A detailed staging of LRPs would be required, and/or an analysis of the LRP developmental dynamics using a root bending assay.

      2) Furthermore, the expression domain analysis shows clear expression in LRPs. I also suspect expression of at least RID4-GFP in the layers overlying the LRP. However, the resolution of the picture and the interference of the bright PI counterstaining in Fig 2B preclude a thorough assessment of this.

      3) The colocalization analysis in Fig 2D and E is not very clear. The mitotracker signal is set a bit too weak, making it difficult to assess the distinction between the GFP signal and the overlapping (yellow) signal. This could be amended by using different LUTs (also, green/red combinations are not great for colorblind readers). Of note is the presence of a relatively large structure labeled by RRD1-GFP that does not colocalize with mitotracker, suggesting that the protein also localizes to another subcellular compartment. Therefore, colocalization should be addressed more quantitatively, also using additional organellar markers. Additionally, the mitochondrial localization could be further supported by western blot on purified mitochondria.

      4) The accumulation of polyadenylated transcripts in Fig 3D also seems to display temperature sensitivity in the WT. Why was this assay not done using quantitative PCR, which would allow for better appreciation of the temperature component?

      5) In contrast to the LR phenotyping as displayed in Fig 1, the LR phenotyping in Fig 4 is done in a completely different way. Why not use a uniform way to quantify? As currently done, the suppression of rrd1 by the ags1 mutation is not very convincing, as the rrd1 phenotype is nearly abolished in the Col-0 introgressed line (Fig 4B), suggesting that the rrd1 phenotype is sensitized in the Ler background.

      6) While the authors focus on the LR morphology phenotype in the mutants, there is also a prominent effect on primary root growth that is not described. However, this phenotype does not seem to be very ecotype-specific, and is rescued in the ags1 background. A small phenotypic characterization of the primary root phenotype could thus be beneficial for the manuscript, and its wider relevance for development.

      7) Fig 5: explain the arrowheads in B in the legend. Bar charts showing mean ± SD should be avoided when there are few data points, as in D and F (N=3 and 2). It is better to show the raw data. Loading controls are missing for Fig 5C and E.

      8) The section about ROS is based entirely on ROS-related pharmacology. However, ROS levels in the mutants were not assessed, making it difficult to use the pharmacological treatments to interpret the origin of the mutant phenotypes.

      9) What is the link to the temperature sensitivity? Are these mutants hypersensitive to ROS-inducing treatments?

      10) While the role of ROS in LR development is key to the proposed model, the authors did not introduce the state of the art on ROS in lateral and primary root development.

      11) In their model the authors might need to discuss whether or not ROS from the LRP could act as an intercellular coordinative developmental signal.

    3. Reviewer #1:

      This study continues research started by Professor Munetaka Sugiyama and his laboratory, who identified about 20 years ago very interesting temperature-dependent fasciation (TDF) mutants affected in lateral root primordium (LRP) morphogenesis. The authors identified and reported in this study the genes responsible for the mutant phenotypes of root redifferentiation defective 1 (rrd1), rrd2, and root initiation defective 4 (rid4). Intriguingly, all the genes are involved in RNA processing. Detailed analysis of the role of RRD2 and RID4 in mitochondrial mRNA editing and RRD1 in poly(A) degradation of mitochondrial mRNA makes this work a solid and substantial study. The fact that pharmacological treatments of wild type seedlings by mitochondrial electron transport inhibitors can phenocopy the fasciated LRP phenotype is really fine. Similarly, the experiments with paraquat and ascorbate are very interesting. The main conclusion of the work (that LRP morphogenesis is linked to mitochondrial RNA processing and mitochondrion-mediated ROS generation) is novel and significant. I think this is an important step forward in our understanding of LRP morphogenesis.

      I see only one main conceptual or interpretation problem.

      The authors conclude "that mitochondrial RNA processing is required for limiting cell division during early lateral root (LR) organogenesis" (line (L) 51). A similar statement appears on L101-103, where the authors postulate that TDF genes encode "negative regulators of proliferation that are important for the size restriction of the central zone during the formation of early stage LR primordia". Again, similar statements appear on L151-152, 344, and in the discussion section "Mitochondrial RNA processing is linked to the control of cell proliferation", especially where the authors refer to "the control of cell proliferation at the early stage".

      In my opinion, the above conclusions are arguable and cannot be accepted. To conclude that cell division is excessive, the number of anticlinal divisions must be estimated per founder cell (FC). This analysis has not been performed. The fact that at early stages LRPs are wider in the TDF mutants suggests that a greater number of FCs in the longitudinal plane participate in LRP formation. So, if this is correct, the mutations apparently affect control of lateral inhibition, and TDF genes are negative regulators of lateral inhibition. This question should be further investigated, but currently a more careful interpretation of the results is required. Also, if TDF genes encode "negative regulators of proliferation" then more frequent divisions would occur in the mutant. This question was not addressed either. If more frequent cell division is expected in early stage LRPs, this should result in the formation of smaller cells. According to Fig. 1D of this study and Figs. 1b and 3a of Otsuka and Sugiyama (2012), this is not the case. On the contrary, it seems that at the same developmental stage there are lower numbers of cells per unit of volume in the mutants compared to wild type. Another possible explanation of the TDF mutant phenotype, in addition to lateral inhibition, is abnormal establishment of stem cell identity or affected stem cell function. Therefore, the mechanistic explanation of the link between TDF gene action and the respective mutant phenotype is not satisfactory. The interpretation given can be corrected and carefully rephrased throughout the text.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      The reviewers were very enthusiastic about your work. They identified some shortcomings, but most of them could be addressed by text edits. The reviewers were less convinced about the envisioned link to reactive oxygen species (ROS). Ideally, you should consolidate this aspect by depicting the mis-regulated ROS in the mutant, and its restoration in the suppressor double mutants (e.g. by staining).

    1. Reviewer #3:

      The manuscript by Hutchings et al. describes several previously uncharacterised molecular interactions in the coats of COP-II vesicles by using reconstituted coats of yeast COP-II. They have improved the resolution of the inner coat to 4.7 Å by tomography and subtomogram averaging, revealing detailed interactions, including those made by the so-called L-loop not observed before. Analysis of the outer layer also led to new interesting discoveries. The Sec31 CTD was assigned in the map by comparing the WT and deletion mutant STA-generated density maps. It seems to stabilise the COP-II coats and further evidence from yeast deletion mutants and microsome budding reconstitution experiments suggests that this stabilisation is required in vitro. Furthermore, COP-II rods that cover the membrane tubules in a right-handed manner sometimes revealed an extra rod, which is not part of the canonical lattice, bound to them. The binding mode of these extra rods (which I refer to here as Y-shape) is different from the canonical two-fold symmetric vertex (X-shape). When the same binding mode is utilized on both sides of the extra rod (Y-Y) the rod seems to simply insert in the canonical lattice. However, when the Y-binding mode is utilized on one side of the rod and the X-binding mode on the other side, this leads to bridging different lattices together. This potentially contributes to increased flexibility in the outer coat, which may be required to adopt different membrane curvatures and shapes with different cargos. These observations build a picture where stabilising elements in both COP-II layers contribute to functional cargo transport. The paper makes significant novel findings that are described well. Technically the paper is excellent and the figures nicely support the text. I have minor suggestions that I think would improve the text and figures.

      L 108: "We collected .... tomograms". While the meaning is clear to a specialist, this may sound somewhat odd to a generic reader. Perhaps you could say "We acquired cryo-EM data of COP-II induced tubules as tilt series that were subsequently used to reconstruct 3D tomograms of the tubules."

      L 114: "we developed an unbiased, localisation-based approach". What is the part that was developed here? It seems that the inner layer particle coordinates were simply shifted to get starting points in the outer layer. Developing an approach sounds more substantial than this. Also, it's unclear what is unbiased about this approach. The whole point is that it's biased to certain regions (which is a good thing as it incorporates prior knowledge on the location of the structures).

      L 124: "The outer coat vertex was refined to a resolution of approximately ~12 A, revealing unprecedented detail of the molecular interactions between Sec31 molecules (Supplementary Fig 2A)". The map alone does not reveal molecular interactions; the main understanding comes from fitting of X-ray structures to the low-resolution map. Also "unprecedented detail" itself is somewhat problematic as the map of Noble et al (2013) of the Sec31 vertex is also at a nominal resolution of 12 Å. Furthermore, Supplementary Fig 2A does not reveal this "unprecedented detail"; it shows the resolution estimation by FSC. To clarify these points, you could say: "Fitting of the Sec31 atomic model to our reconstruction of the vertex at 12-Å resolution (Supplementary Fig 2A) revealed the molecular interactions between different copies of Sec31 in the membrane-assembled coat."

      L 150: Can the authors exclude the possibility that the difference is due to differences in data processing? E.g. how the maps’ amplitudes have been adjusted?

      L 172: "that wrap tubules either in a left- or right-handed manner". Don't they always do both on each tubule? Now this sentence could be interpreted to mean that some tubules have a left-handed coat and some a right-handed coat.

      L276: "The difference map" hasn't been introduced earlier but is referred to here as if it has been.

      L299: Can "Secondary structure predictions" denote a protein region "highly prone to protein binding"?

      L316: It's true that the detail in the map of the inner coat is unprecedented and the model presented in Figure 7 is partially based on that. But here "unprecedented resolution" sounds strange as this sentence refers to a schematic model and not a map.

      L325: "have 'compacted' during evolution" -> remove. It's enough to say it's more compact in humans and less compact in yeast as there could have been different adaptations in different organisms at this interface.

      L327: What exactly is meant by "sequence diversity or variability at this density"?

      L606-607: The description of this custom data processing approach is difficult to follow. Why is an in-plane flip needed and how is it used here?

      L627: "Z" here refers to the coordinate system of aligned particles, not that of the original tomogram. Perhaps just say "shifted 8 pixels further away from the membrane".

      L642-643: How can the "left-handed" and "right-handed" rods be separated here? These terms refer to the long-range organisation of the rods in the lattice; it's not clear how they were separated in the early alignments.

      Figure 2B. It's difficult to see the difference between dark and light pink colours.

      Figure 3C. These panels report the relative frequency of neighbouring vertices at each position; "intensity" does not seem to be the right measure for this. You could say that the colour bar indicates the "relative frequency of neighbouring vertices at each position" and add detail on how the values were scaled between 0 and 1. The same applies to SFigure 1E.

      Figure 4. The COP-II rods themselves are relatively straight, and they are not left-handed or right-handed. Here, it would be more accurate to say "architecture of COPII rods organised in a left-handed manner". (In the text the authors may of course define and then use this shorter expression if they so wish.) The top panel of 4B could have the title "left-handed" and the lower panel the title "right-handed" (for consistency and clarity).

    2. Reviewer #2:

      The manuscript describes new cryo-EM, biochemistry, and genetic data on the structure and function of the COPII coat. Several new discoveries are reported including the discovery of an extra density near the dimerization region of Sec13/31, and "extra rods" of Sec13/31 that also bind near the dimerization region. Additionally, they showed new interactions between the Sec31 C-terminal unstructured region and Sec23 that appear to bridge multiple Sec23 molecules. Finally, they increased the resolution of the Sec23/24 region of their structure compared to their previous studies and were able to resolve a previously unresolved L-loop in Sec23 that makes contact with Sar1. Most of their structural observations were nicely backed up with biochemical and genetic experiments which give confidence in their structural observations. Overall the paper is well-written and the conclusions justified. However, this is the third iteration of structure determination of the COPII coat on membrane with essentially the same preparation and methods. Each time, there has been an incremental increase in resolution and new discoveries, but the impact of the present study is deemed to be modest. The science is good and appropriate for a specialized journal. Areas of specific concern are described below.

      1) The abstract should be re-written with a better description of the work.

      2) Line 166 - "Surprisingly, this mutant was capable of tubulating GUVs". This experiment gets to one of the fundamental unknown questions in COPII vesiculation. It is not clear what components are driving the membrane remodeling and at what stages during vesicle formation. Isn't it possible that the tubulation activity the authors observe in vitro is not being driven at all by Sec13/31 but rather Sec23/24-Sar1? Their Sec31ΔCTD data supports this idea because it lacks a clear ordered outer coat despite making tubules. An interesting experiment would be to see if tubules form in the absence of all of Sec13/31 except the disordered domain of Sec31 that the authors suggest crosslinks adjacent Sec23/24s.

      3) Line 191 - "Inspecting cryo-tomograms of these tubules revealed no lozenge pattern for the outer coat" - this phrasing is vague. The reviewer thinks that what they mean is that there is a lack of order for the Sec13/31 layer. Please clarify.

      4) Line 198 - "unambiguously confirming this density corresponds to the CTD." This only confirms that it is the CTD if that were the only change and the Sec13/31 lattice still formed. Another possibility is that it is density from other Sec13/31 that only appears when the lattice is formed, such as the "extra rods". The reviewer agrees that their interpretation is indeed the most likely, but it is not unambiguous. The authors should consider cross-linking mass spectrometry.

      5) In the Sec31ΔCTD section, the authors should comment on why ΔCTD is so deleterious to oligomer organization in yeast when cages form so abundantly in preparations of human Sec13/31 ΔC (Paraan et al 2018).

      6) The data is good for the existence of the "extra rods", but their significance and importance are not clear. How can these extra densities be distinguished from packing artifacts due to imperfections in the helical symmetry?

      7) Figure 5 is very hard to interpret and should be redone. Panels B and C are particularly hard to interpret.

      8) The features present in Sec23/24 structure do not reflect the reported resolution of 4.7 Å. It seems that the resolution is overestimated.

      9) Lines 315/316 - "We have combined cryo-tomography with biochemical and genetic assays to obtain a complete picture of the assembled COPII coat at unprecedented resolution (Fig. 7)." Figure 7 is a schematic model/picture; the authors should reference a different figure or rephrase the sentence.

    3. Reviewer #1:

      Hutchings et al. report an updated cryo-electron tomography study of the yeast COP-II coat assembled around model membranes. The improved overall resolution and additional compositional states enabled the authors to identify new domains and interfaces, including what the authors hypothesize is a previously overlooked structural role for the SEC31 C-Terminal Domain (CTD). By perturbing a subset of these new features with mutants, the authors uncover some functional consequences pertaining to the flexibility or stability of COP-II assemblies.

      Overall, the structural and functional work appears reliable, but certain questions and comments should be addressed. This study provides a valuable refinement of our understanding of COP-II that I believe is well suited to a specialized, structure-focused journal.

      Major Comments: 1) The authors belabor the comparison between the yeast reconstruction of the outer coat vertex with prior work on the human outer coat vertex. Considering the modest resolution of both the yeast and human reconstructions, the transformative changes in cryo-EM camera technology since the publication of the human complex, and the differences in sample preparation (inclusion of the membrane, cylindrical versus spherical assemblies, presence of inner coat components), I did not find this comparison informative. The speculations about a changing interface over evolutionary time are unwarranted and would require a detailed comparison of co-evolutionary changes at this interface. The simpler explanation is that this is a flexible vertex, observed at low resolution in both studies, plus the samples are very different.

      2) As this is one of the major take-home messages of the paper, the presentation and discussion of the modeling and assignment of the SEC31-CTD could be clarified. First, it isn't clear from the figures or the movies if the connectivity makes sense. Where is the C-terminal end of the alpha-solenoid compared to this new domain? Can the authors plausibly account for the connectivity in terms of primary sequence? Please also include a side-by-side comparison of the SRA1 structure and the CTD homology model, along with some explanation of the quality of the model as measured by Modeller. Finally, even if the new density is the CTD, it isn't clear from the structure how this sub-stoichiometric and apparently flexible interaction enhances stability. The authors wrote that "when the [CTD] truncated form was the sole copy of Sec31 in yeast, cells were not viable, indicating that the novel interaction we detect is essential for COPII coat function." Maybe, but could this statement be a leap too far? Is the putative interaction essential, or is the CTD itself essential for reasons that remain to be fully determined?

      3) Are the extra rods discussed in Fig. 4 a curiosity of unclear functional significance? This reviewer is concerned that these extra rods could be an in vitro stoichiometry problem, rather than a functional property of COP-II.

      4) The clashscore for the PDB is quite high, and I am dubious about the reliability of refining sidechain positions with maps at this resolution. In addition to the Ramachandran statistics, I would like to see the Ramachandran plot as well as, for any residue-level claims, the density surrounding the modeled side chain (e.g. S742).

      Minor Comments:

      1) The authors wrote "To assess the relative positioning of the two coat layers, we analysed the localisation of inner coat subunits with respect to each outer coat vertex: for each aligned vertex particle, we superimposed the positions of all inner coat particles at close range, obtaining the average distribution of neighbouring inner coat subunits. From this 'neighbour plot' we did not detect any pattern, indicating random relative positions. This is consistent with a flexible linkage between the two layers that allows adaptation of the two lattices to different curvatures (Supplementary Fig 1E)." I do not understand this claim, since the pattern looks far from random and the underlying molecular interactions are not random. Please clarify.

      2) Related to major point #1, the authors wrote "We manually picked vertices and performed carefully controlled alignments." I do not know what it means to carefully control alignments, and fear this suggests human model bias.

      3) Why do some experiments use EDTA? I may be confused, but I was surprised to see that the budding reaction employed 1 mM GMPPNP and 2.5 mM EDTA (but no magnesium?). Also, for the budding reaction, please replace or expand upon "10% GUV (v/v)" with a mass or molar lipid-to-protein ratio.

      4) Please cite the AnchorMap procedure.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewers and Revision Plan

      We thank all three reviewers for their time and their comments on our manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability.

      The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis is summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruits the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      **Major comments:**

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al. 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      The referee is correct. The point here is to show the results we had using this approach for all four proteins under study. For this reason, we do not want to remove this data and prefer to show our results “warts-and-all”. We feel that the shortcomings of our approach are honestly presented and discussed in the manuscript. While only a fraction of chTOG was tagged, we should expect some co-removal after its induced mislocalization. Since we saw no change, we concluded that chTOG is auxiliary.

      The “second best” approach suggested (RNAi of chTOG) is problematic for two reasons. First, chTOG RNAi results in gross changes to spindle structure (multipolar spindles) and it is difficult to pick apart differences in protein partner localization that result from loss of chTOG from those resulting from changes in spindle structure. Second, the paper is about induced mislocalization as a method for determining protein complexes once a normal spindle has formed. So, removing chTOG prior to mitosis is not comparable. If we get the same or different result, does it confirm or conflict with the data we have? Nonetheless, given the discrepancy with our earlier work, we should investigate this further.

      To address this concern, we will stain endogenous clathrin, TACC3 and GTSE1 following chTOG RNAi and measure their relative levels at the spindle.

      Making the chTOG-FKBP-GFP cell line was difficult. As described in the paper, we only recovered heterozygous clones despite repeated attempts. Since submission, we have been made aware of an HCT116 chTOG-FKBP-GFP cell line that is reported to be homozygously tagged (Cherry et al. 2019 doi: 10.1002/glia.23628).

      A note about this cell line has been added to the paper (Results section, final sentence of 1st paragraph).

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extract loaded; dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage of reads that differ.

      We used a two-pronged approach for assessing relocalization of protein partners (staining vs transfected constructs). The staining approach is superior since endogenous proteins are examined, but it is limited by antibody specificity. The transfection approach overcomes this limitation but is in turn limited by effects of overexpression and tagging. Together the two approaches allow us, and anyone employing this method, to get a picture of protein complexes. We didn’t want to create the impression that one or other approach is confounded, but the referee is correct that this analysis would benefit from further work.

      Specifically, to address these concerns:

      • We will verify and use alternative chTOG antibodies to try to improve this dataset.
      • We will test the possibility that an untagged allele of GTSE1 remains. We will use western blotting and a summary of our genomic analysis will be added to the paper.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, there is no statistical test applied to this data that I can see. This is needed. How do the authors determine whether there is an "effect"?

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      **Minor issues:**

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al. 2020). The latter is the predominant transcript in the NCBI and Ensembl databases. It appears that the construct used by the authors omits the first 19 amino acids. I do not think using the truncated transcript affects the conclusions of the manuscript, but it could generate confusion when identifying residues based on the amino acid numbers of mutant constructs (Fig 6). This should be clarified somehow.

      We were aware of the longer transcript but were using the 720 residue form since it is the canonical sequence in Uniprot (https://www.uniprot.org/uniprot/Q9NYZ3). We did not know that the 739 form is the predominant transcript. We agree this is unlikely to affect our work but that the numbering may cause confusion.

      We have added a note to the Methods (Molecular Biology section) to accurately describe what we and Rondelet et al. have used.
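
      The numbering issue discussed above can be illustrated with a short sketch. It assumes, as the reviewer suggests, that the 720-residue form is simply the 739-residue form without its first 19 N-terminal residues; the function names are hypothetical and are not taken from the manuscript.

```python
from typing import Optional

# ASSUMPTION: the 720-residue form lacks only the first 19 N-terminal
# residues of the 739-residue form, so positions differ by a fixed offset.
LONG_LENGTH = 739
SHORT_LENGTH = 720
OFFSET = LONG_LENGTH - SHORT_LENGTH  # 19


def short_to_long(pos_720: int) -> int:
    """Map a residue number in the 720-residue form to the 739-residue form."""
    if not 1 <= pos_720 <= SHORT_LENGTH:
        raise ValueError(f"position {pos_720} is outside 1..{SHORT_LENGTH}")
    return pos_720 + OFFSET


def long_to_short(pos_739: int) -> Optional[int]:
    """Map a residue number in the 739-residue form to the 720-residue form.

    Returns None for the first 19 residues, which have no counterpart in the
    shorter form under the assumption stated above.
    """
    if not 1 <= pos_739 <= LONG_LENGTH:
        raise ValueError(f"position {pos_739} is outside 1..{LONG_LENGTH}")
    shifted = pos_739 - OFFSET
    return shifted if shifted >= 1 else None


if __name__ == "__main__":
    # Hypothetical position chosen only to show the conversion.
    print(short_to_long(100))
    print(long_to_short(10))
```

      Under this assumption, any residue number quoted for the 720-residue constructs corresponds to that number plus 19 in the longer form.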

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye in places. Please relabel this more clearly.

      Apologies for the error.

      We have relabeled Figure 6C,D and also made a similar alteration to Figure 5C.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Reviewer #1 (Significance (Required)):

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, does not localize to spindles, and does not produce a mitotic defect when lost. This is an important conclusion, as it would correct the literature and avoid future confusion.

      --

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of either TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper, in my opinion, is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of protein silencing efficiency and plasmid overexpression. This approach seems to alleviate these problems, except in the case of chTOG, whose tagging seems to be lethal in its homozygous variant.

      **Major comments:**

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig. 3) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is data on the consequences for spindle shape and morphology after knocksideways. I have noticed in images in both Figure 3 and Figure 4 that in some cases the distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a big variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Changes in spindle size, tubulin intensity and DNA/kinetochore misalignment upon knocksideways would help readers appreciate the findings of the paper, all the more so since the authors on more than one occasion find their motivation in cancer research and its relation to spindle stability. Some data connecting to this motivation would be of value. The experiments seem reproducible.

      The focus of the paper is on using the knocksideways methodology to understand a protein complex during mitosis, rather than looking at its function. We are not keen to do new experiments that are not part of the central message of the paper. However, the Reviewer is correct that we do already have a dataset that can be mined in the manner described.

      To address this point, we will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that may result from loss of complex members.

      **Minor comments:**

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read. Some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, I miss the number of cells (n); I can't see the number of individual arrows because of the size of the graphs.

      Our aim was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern which tells us that we went too far towards “data visualization”.

      To address this point, we will rework these figures.

      Reviewer #2 (Significance (Required)):

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines in which endogenous proteins can be delocalized rapidly will be of value to scientists working not only on mitosis but also, in the case of clathrin, on vesicle formation and trafficking, or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis it will surely help and speed up research concerning the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

      --

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      This paper analyses the chTOG/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and clathrin while GTSE1 and chTOG are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      **Major comments:**

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof-of-principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system.

      Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work.

      The authors state that gene targeting was validated using a combination of PCR, sequencing and Western blotting, but show only the results for westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here.

      Another issue is the penetrance of the phenotypes induced by rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea of whether this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus-rapamycin cohort, and this should be moved into the main figure.

      If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      The Reviewer raises two issues here.

      • PCR analysis should be shown. This issue was also partly raised by Reviewer 1. A summary of our PCR analysis was actually included in Table 1, since the analysis we did is pretty unwieldy. We agree though that presenting our evidence for homozygosity of the cell lines would be useful. To address this point, we will add more detail of the PCR and sequencing work done to validate these cell lines.
      • Does knocksideways happen in all cells? The answer to this depends on the transient expression of MitoTrap and sufficient application of rapamycin. We agree that this will be a useful piece of information to add to the manuscript. A related issue is whether knocksideways of complex members affects mitotic progression. We have established through other experiments that rapamycin application to wild-type cells alters mitotic progression, although application of Rapalog does not have this effect. Our plan to address these points is 1) to analyze the efficacy of knocksideways that readers can expect to achieve using these, or similar cells, and 2) analyze mitotic duration in rapalog-treated cells expressing a rapalog sensitive MitoTrap.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygously targeted clones, should be included. This can involve higher-resolution live-cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark MitoTrap.

      We don’t agree that such an analysis should be included. The focus of this paper is on using the knocksideways methodology to understand a protein complex during mitosis, and not looking at its function. There are several papers on the mitotic phenotypes of these genes probed using RNAi in different cellular systems (examples for chTOG: 10.1101/gad.245603; TACC3/clathrin: 10.1038/emboj.2011.15, 10.1242/jcs.075911, 10.1083/jcb.200911091, 10.1083/jcb.200911120; GTSE1: 10.1083/jcb.201606081). Moreover, our 2013 paper used knocksideways (with RNAi and overexpression) and has a detailed analysis of mitotic progression, microtubule stability, checkpoint activity and kinetochore motions (Cheeseman et al., 2013 doi: 10.1242/jcs.124834).

      New experiments that are not part of the central message of the paper and are unlikely to give new insight are not the best use of our revision efforts for this paper (especially during the pandemic). Having said this, Reviewer 2’s suggestion to use our existing dataset to investigate mitotic phenotypes, will largely answer Reviewer 3’s request.

      We will analyze spindle size parameters and also the intensity of tubulin. Our analysis will be limited to the short timeframe of our experiments, but it should reveal or refute any changes in spindle structure that result from the loss of complex members.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4, a convoluted way of demonstrating the change in localisation is shown, and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS by using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The cell image panels can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged.

      This type of statistical analysis should be uniformly done throughout the MS and also extended to Figure 7.

      The referee raises several issues here with our data presentation and statistical analysis.

      • Our aim in Figures 3 and 4 was to compress a lot of information into a small space, while still showing some example primary data. All reviewers raised the same concern about these figures, which tells us that we went too far towards “data visualization”. To address this point, we will rework Figures 3 and 4 to provide clearer data presentation.
      • The Reviewer’s comments about statistical analysis, however, are not sound. First, it is incorrect to state that simple t-tests can be applied (this is a form of p-hacking). Correction for multiple testing must be done on these datasets. Second, the reviewer arbitrarily states numbers for cells and experimental repeats without considering the effect size or, it seems, understanding the structure of the data that we have collected. Sample sizes are small but they are taken from many independent replicates. Third, and related to the previous point, the fixed and live cell data are structured differently, which means that a uniform data presentation is not possible. The live data has a paired design and each cell is an independent replicate (with replicates done over several trials). The fixed data is unpaired and we have taken measures from several experiments (independent replicates). The point about applying statistical tests to the data is also made by Reviewer 1 and we will use appropriate tests (NHST or estimation statistics) as we re-work the figures.
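
      To make the multiple-testing point concrete, the following is a minimal sketch of the Holm-Bonferroni procedure, one common way to correct a family of pairwise comparisons. It is an illustration only, not the analysis used in the manuscript: the function name is hypothetical and the p-values are invented for the example.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below the
        # adjusted value of any smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # HYPOTHETICAL raw p-values from four condition-versus-control t-tests.
    raw = [0.01, 0.04, 0.03, 0.20]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.2f} -> Holm-adjusted p = {q:.2f}")
```

      With these invented values, the comparison with a raw p-value of 0.04 has an adjusted p-value of 0.09, so it would no longer pass a 0.05 threshold once the four tests are treated as a family.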

      Reviewer #3 (Significance (Required)):

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical.

      The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant.

      Overall, if these results can be properly quantified I would recommend publication.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This paper analyses the chTOG/TACC3/clathrin/GTSE1 complex that crosslinks and stabilises microtubule bundles in the mitotic spindle. The authors have developed an elegant knocksideways approach to specifically analyse the effects of removing individual components of the complex from the spindle and study the effect this has on the other interactors. They report, based on these assays, that the core of the complex is formed by TACC3 and clathrin while GTSE1 and chTOG are auxiliary interactors. They also refute previous evidence that this complex also incorporates PIK3C2A. Overall, this is an interesting study that distinguishes itself predominantly by its methodology. However, some of the reported results need more thorough analysis to allow convincing conclusions.

      Major comments:

      1) The knocksideways method is the main highlight of this paper. Unlike previous studies by the PI, this time endogenous genes are tagged, which is a key advance and allows much better interpretation of the results. I am not sure why the authors have chosen HeLa cells as their model here, given the messed up genome of these cells. A non-transformed cell line would have been preferable, but as a proof-of-principle study, I think HeLa are acceptable, and I wouldn't expect the authors to repeat all the experiments in another system. Figures 1, 2 and S1 describe and validate this approach in some detail, but this will require some more work. The authors state that gene targeting was validated using a combination of PCR, sequencing and Western blotting, but show only the results for westerns. PCR analysis that demonstrates homozygous or heterozygous gene targeting should be shown here. Another issue is the penetrance of the phenotypes induced by rapamycin. The authors show nice data of the system working in individual cells but do not give us an idea of whether this happens in all cells. The localisation of the individual tagged genes should be quantified (ideally with line plots) in 50 randomly chosen mitotic cells with 3 repeats before and after rapamycin treatment. Moreover, the analysis of mitotic duration (Figure S1D) should be extended to include a plus-rapamycin cohort, and this should be moved into the main figure. If the system works only in a small proportion of cells, this should be clearly stated. I don't think this would prevent publication, but it is an important piece of information that is missing.

      2) Apart from a simple quantification of mitotic duration, I believe a more detailed mitotic phenotype analysis for each knocksideways gene, especially the homozygously targeted clones, should be included. This can involve higher-resolution live-cell imaging of mitotic progression with SiR-DNA and GFP-tubulin, using the dark MitoTrap.

      3) Overall, the quantitative analysis in Figures 3, 4 and 7 is not good enough and sometimes doesn't fully support the conclusions. In Figures 3 and 4, a convoluted way of demonstrating the change in localisation is shown, and this panel is so small that it is almost impossible to read. Also, there is no statistical analysis, and the sample size seems very small. At least 25 cells should be analysed here in 3 repeats. I would suggest unifying the quantification in the MS by using the line plots shown in Figures 5 and 6 to compare each protein before and after rapamycin addition. This is much easier to read and more convincing. The cell image panels can be moved to a supplement as they contain very little information. This would generate space to expand the size and depth of the quantitative analysis. Instead of ANOVA tests, I would recommend using a simple t-test comparing each condition to its relevant control since this is the only relevant comparison in the experiment. Statistical significance should be calculated for each experiment with sufficient sample size. It would also be better to show the individual data points from the three repeats in different colours so that the reproducibility between repeats can be judged. This type of statistical analysis should be uniformly done throughout the MS and also extended to Figure 7.

      Significance

      In my opinion, the most interesting aspect of the MS is the methodology. Based on this, publication is justified and will be of interest to a wider audience. That is why a more detailed analysis of the penetrance of this manipulation across the cell population will be critical. The application of this method to analyse the composition of the TACC3/Clathrin complex on the spindle is the main biological advance, and the novel information is rather limited but not unimportant. Overall, if these results can be properly quantified I would recommend publication.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this paper, the authors investigate the nature of interactions between members of the TACC3-chTOG-clathrin-GTSE1 complex on the mitotic spindle. By using a series of HeLa cell lines that they have created by CRISPR/Cas9 editing to enable spatial manipulation (knocksideways) of TACC3, chTOG, clathrin or GTSE1, they show that on spindle microtubules TACC3 and clathrin represent core complex members, whereas chTOG and GTSE1 bind to them respectively but not to each other. Additionally, the authors find that the protein PIK3C2A, which has been implicated in this complex previously, is in fact not a component of this complex in mitotic cells. The main advance of the paper in my opinion is the endogenous tagging of the proteins for knocksideways experiments, since former experiments depended on RNAi silencing and expression of tagged proteins from plasmids, which introduced issues of protein silencing efficiency and plasmid overexpression problems. This approach seems to alleviate these problems, except in the case of chTOG, for which homozygous tagging seems to be lethal.

      Major comments:

      I find the key conclusions regarding the localization of the components of the complex convincing. There are some issues regarding the specificity of antibodies in immunostaining experiments (Fig. 3) and the influence of mCherry-TACC3 expression on distorted localization of the complex prior to knocksideways. However, I think the general conclusion about which complex components (clathrin and TACC3) influence the localization of the other proteins in the complex (chTOG and GTSE1) stands. One thing that I miss from the paper is data on the consequences for spindle shape and morphology after knocksideways. I have noticed in images in both Figure 3 and Figure 4 that in some cases the distribution of the signal seems to influence the spindle morphology quite a bit. Also, in Figure 3 I have noticed what seems to me quite a big variation in spindle size in the tubulin signal in both untreated and rapamycin-treated cells. Since the authors have many of these images already, I believe it would be realistic, not costly and of additional value for the paper to provide more data on the consequences of the knocksideways experiments. Changes in spindle size, tubulin intensity and DNA/kinetochore misalignment upon knocksideways would help readers to better appreciate the findings of the paper. More so since the authors on more than one occasion find their motivation in the field of cancer research and the relation of spindle stability to it. Some data connecting to this motivation would be of value. Experiments seem reproducible.

      Minor comments:

      I have some problems with the clarity of Figures 3 and 4. In Figure 3, the plots on the right are a bit small and not easy to read. Some reorganization of the figure might be beneficial. In Figure 4, the plots on the right are also too small to be clear. Also, I miss the number of cells (n); I can't see the number of individual arrows because of the size of the graphs.

      Significance

      I find that the biggest significance of the paper is in the creation of new tools (cell lines) to study the localization of the proteins TACC3, chTOG, clathrin and GTSE1. Cell lines where endogenous proteins can be delocalized rapidly will be of value for scientists working not only on mitosis but also, in the case of clathrin, on vesicle formation and trafficking, or, in the case of GTSE1, on p53-dependent apoptosis. In the field of mitosis it will surely help and speed up research concerning the role of these proteins in spindle assembly and stability.

      Field of expertise: mitotic spindle

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Here Ryan et al. have used localization analysis following induced rapid relocalization of endogenous proteins to investigate the composition and recruitment hierarchy of a clathrin-TACC3-based spindle complex that is important for microtubule organization and stability. The authors generate different HeLa cell lines, each with one of four complex members (TACC3, CLTA, chTOG and GTSE1) endogenously tagged with FKBP-GFP via Cas9-mediated editing. This tag allows rapid recruitment to the mitochondria upon rapamycin addition ("knocksideways"). They ultimately quantify each of the 4 components' localization to the spindle following knocksideways of each component using fluorescently-tagged transfected constructs. The authors' interpretation of the results of this analysis is summarized in the last model figure, in which a core MT-binding complex of clathrin and TACC3 recruits the ancillary components GTSE1 and chTOG. In addition, the authors investigate the contribution of individual clathrin-binding LIDL motifs in GTSE1 to the recruitment of clathrin and GTSE1 to spindles. Their findings here largely agree with and confirm a recent report regarding the contribution of these motifs to GTSE1 recruitment to the spindle. They further analyzed GTSE1 fragments for interphase and mitotic microtubule localization, and identified a second region of GTSE1 required (but not sufficient) for spindle localization. Finally, the authors report that PIK3C2A is not part of this complex, contradicting (correcting) a previously published study.

      Major comments:

      1. The chTOG-FKBP-GFP cell line the authors generate has only a small fraction of chTOG tagged, and thus should not be used for any conclusions about protein localization dependency on chTOG. Because they were unable to construct a HeLa cell line with all copies tagged, the authors expect that the homozygous knock-in of chTOG-FKBP-GFP is lethal, and thus their experience is appropriate to report. However, the authors should not use this cell line alone to make statements about chTOG dependency. They would have to use similar localization analysis, but after another method to disrupt chTOG (as a second-best approach), such as RNAi. In fact, they have reported this in a previous publication (Booth et al 2011). However, the result was different. There, loss of chTOG resulted in reduced clathrin on spindles, suggesting it may stabilize or help recruit the complex. Alternatively, they could remove their chTOG data, but this would compromise the "comprehensive" nature of the work.

      2. The authors initially analyze complex member localization after knocksideways experiments by antibody staining, which has the advantage of analyzing endogenous proteins (versus the later transfected fluorescent constructs). Setting aside potential artefacts from fixation, this would seem to be a better method for controlled analysis to take advantage of their setup (short of generating stable cell lines with second proteins endogenously tagged in a second color - a huge undertaking). The authors conclude that antibody specificity problems confounded their analysis and explained unusual results. However, I think it is worth investing a little more effort to sort this out, rather than bringing doubt to the whole data set. Verifying and then using another antibody for chTOG localization would be informative. Of course, the negative control should not be their chTOG-FKBP-GFP line, as it does not relocalize most of chTOG.

      In the case of GTSE1, an alternative explanation to antibody specificity issues would be that the GTSE1-FKBP-GFP cell line is not in fact homozygously tagged. Given the low expression levels on the western blot provided, and the detection of GTSE1 on the spindle in the induced GTSE1-FKBP-GFP cell line (but not TACC3-FKBP-GFP), it seems plausible that an untagged copy remains. If there are multiple copies of GTSE1 in HeLa cells, one untagged copy could represent a small fraction of total GTSE1. This should thus be ruled out. GTSE1 clones should be analyzed with more protein extract loaded - dilutions of the extracts can determine the sensitivity of the blot to lower protein levels. In addition, sequencing of genomic DNA can reveal a small percentage with different reads.

      3. There is a lot of data contained in the small graphs summarizing quantification of localization in Figs 3 and 4. They would be more accessible to the reader if they were larger and/or an "example" of the chart with labels was present explaining it (essentially what is in the figure legends). Furthermore, there is no statistical test applied to these data that I can see. This is needed. How do the authors determine whether there is an "effect"?

      Minor issues:

      1. The GTSE1 constructs used for mutation and localization analysis are 720 amino acids long. A recent study analyzing similar mutations uses a 739 amino acid construct (Rondelet et al 2020). The latter is the predominant transcript in NCBI and Ensembl databases. It appears the construct used by the authors omits the first 19 a.a. I do not think using the truncated transcript affects conclusions of the manuscript, but it could generate confusion when identifying residues based on a.a. numbers of mutant constructs (Fig 6). This should be clarified.

      2. The labeling of constructs in Fig 6C/D is confusing, and appears shifted by eye in places. Please relabel this more clearly.

      The recommended new experimental data (analysis of complex member levels on spindles after full perturbation of spindle chTOG; new chTOG antibody stainings in the FKBP lines; reanalysis of GTSE1 DNA/protein in the GTSE1-FKBP line) should only require a new antibody/siRNA, plus a few weeks' time to repeat the analyses already in the paper with new reagents.

      Significance

      While multiple individual components of this complex have been previously characterized, the structure and nature of the complex formation and its recruitment to microtubules/spindles remains a complex problem that has yet to be solved.

      Overall this study represents a comprehensive localization-dependency analysis of the Clathrin-TACC3 based spindle complex using a consistent methodology. Although several of the conclusions of the findings echo previous reports, some of the previous literature is contradictory within itself as well as with the conclusions here. Analyzing all components with a single, rapid-perturbation technique thus has great value to present a clear data set, given that the experimental setup conditions and analysis are solid (a goal to which the majority of comments refer).

      Beyond the complex localization/recruitment analysis, two novel findings of this study that emerge are:

      a) GTSE1 contains a second, separate protein region, distinct from the clathrin-binding motifs, that is required for its localization to the spindle, and is most likely a microtubule-interaction site. This suggests that GTSE1 recruitment to the spindle is more complex than previously reported.

      b) PIK3C2A, which has been reported previously to be a stabilizing member of this complex, is in fact not a member, does not localize to spindles, and does not display a mitotic defect after loss. This is an important conclusion to make, as it would correct the literature and avoid future confusion.

    1. Reviewer #2:

      In this manuscript by de Rus Jacquet et al., the authors present an interesting study to detect changes in extracellular vesicles in astrocytes derived from human PD patient iPSCs carrying the LRRK2 G2019S mutation. Isogenic gene corrected iPSCs were used as controls in all experiments. The authors first performed RNA-Seq for global gene expression changes between G2019S and "WT" gene corrected astrocytes. GO analysis showed an upregulation of extracellular compartments (including exosome compartments) in LRRK2 astrocytes. Subsequent experiments focusing on extracellular vesicles (EVs) and multivesicular bodies (MVBs) showed specific differences in MVB area and the size of secreted EVs. Secreted EVs from G2019S astrocytes also contained more LRRK2 particles and G2019S EVs contained more phosphorylated aSyn particles. Co-culture of LRRK2 astrocytes with human dopamine neurons showed accumulation of CD63+ exosomes in neurites, compared to co-culture with WT astrocytes. Co-culture with LRRK2 astrocytes decreased viability of TH+ neurons and LRRK2 dendrites/neurites were also shorter. These co-culture findings were replicated using EV-enriched conditioned media. Finally, the authors showed that the trophic effect of astrocytes on neurons was due both to soluble factors released into the media, and to production and release of EVs. Overall, this is a well-written and systematically performed study. This reviewer has several comments as detailed below.

      1) Based on their data, authors conclude that astrocyte-to-neuron signaling and trophic support mediated by EVs is disrupted in LRRK2 G2019S astrocytes. Have authors measured the differences in trophic factors released by LRRK2 astrocytes in EVs and in conditioned media?

      2) Authors differentiate cells (astrocytes and neurons) from midbrain lineage NPCs. The data show convincing effects of the LRRK2 derived astrocytes on neurons, but one question is whether this is specific to dopaminergic cells. Would this genotype specific effect also be expected in other lineages, e.g. cortical neurons? Authors should discuss this point.

      3) Prior work has demonstrated reductions in neurite length in neurons derived from LRRK2 G2019S iPSCs (not specific to dopaminergic neurons in LRRK2 cells) (for example Reinhard et al 2013). It is curious that the LRRK2 G2019S mutation itself can cause such a phenotype in neuron mono-cultures, and, as shown in the current study, that LRRK2 G2019S astrocytes also induce a similar effect on WT neurons in co-culture. Can the authors expand on this point in the Discussion?

      4) Authors should provide data on % dopaminergic neurons generated in the cultures.

      5) p7. Authors refer to phosphorylated a-synuclein as accelerating PD pathogenesis, but the references cited do not show this. In fact, Gorbatyuk et al 2008 showed that overexpression of S129 with constitutive phosphorylation eliminated a-synuclein induced nigrostriatal degeneration. The Fujiwara et al 2002 reference showed the presence of phospho a-synuclein in Lewy bodies and neurites. Authors should revise their statement that phospho a-synuclein is associated with accelerated pathology.

      6) Please provide details on the number of iPSC lines used for these experiments.

      7) Clarify whether the WT neurons used for co-culture were derived from the isogenic human neurons?

    2. Reviewer #1:

      In this manuscript titled "The LRRK2 G2019S mutation alters astrocyte-to-neuron communication via extracellular vesicles and induces neuron atrophy in a human iPSC-derived model of Parkinson's disease", Jacquet and colleagues investigated the role of the Parkinsonism gene mutation LRRK2 G2019S in hiPSC-differentiated astrocytes. By isolating extracellular vesicles from ACM and examining astrocytes with various electron microscopy techniques, the authors found that LRRK2 G2019S affects the morphology and distribution of MVBs and the morphology of secreted EVs in hiPSC-differentiated astrocytes. Furthermore, the authors observed that astrocyte-derived EVs can be internalized by dopaminergic neurons and that such EVs support neuronal survival. However, LRRK2 G2019S EVs lost the ability to promote neuronal survival. This is an interesting study showing a non-cell autonomous contribution to dopaminergic neuron loss in PD.

      The proposed idea of how LRRK2 G2019S dysregulates EV-mediated astrocyte-to-neuron communication is novel and exciting. However, the authors present some conflicting data that are not addressed in the discussion: they first conclude that exosome biogenesis is upregulated based on RNAseq in G2019S vs WT astrocytes, but later show a decrease in the number of <120nm particles in G2019S mutants, suggesting a decrease in the classical exosome-sized vesicles secreted compared to WT. Lastly, their MVB images show fewer CD63 gold particles in G2019S compared to WT control (though this was not quantified). Do the authors suggest an increase or a decrease in exosome biogenesis in G2019S mutants? How do they reconcile these seemingly contradictory data? Several experiments, controls and additional analyses are needed to fully demonstrate the validity of the proposed mechanism.

      Major concerns:

      1) In Figure 1A, the authors demonstrate characterization of iPSC-derived astrocytes. Since there is no single unified and validated method for astrocyte differentiation, there is a need for more accurate characterization of iPSC-derived astrocytes. The authors should demonstrate the percentage of cells positive for astrocytic markers and prove that the obtained astrocytes are functional (able to promote synaptogenesis and take up glutamate). I would also recommend analyzing the iPSC-derived astrocyte cultures for expression of more specific astrocytic markers, such as GLT1 and SOX9, in addition to those which have been analyzed. Moreover, it is highly important to know the proportion of astrocytes derived from the LRRK2 G2019S line and its isogenic control in order to be able to compare their effect on neurons.

      2) In Figure 1, the authors found a significant upregulation of exosome components in astrocytes, demonstrating an important role of LRRK2 G2019S in the EV signaling pathway. In the discussion, the authors briefly mentioned 'sub-populations of CD63- EVs may be differentially secreted in mutant astrocytes'. Since the authors have obtained the RNA-seq data, it would be nice to dig deeper into the data and comment on potential EV sub-populations which can be differentially secreted. This information can be very beneficial for follow-up studies in the PD and LRRK2 field. Furthermore, the authors should assess the expression of Rab27a and CD82 in WT and LRRK2 G2019S astrocytes by western blots to verify the RT-qPCR data. Finally, the authors should present specifically which exosome biogenesis or secretion genes are altered, to provide further insight into the stage of exosome biogenesis that is affected (ESCRT0-3, VPS4, ALIX, etc).

      3) In Figure 2A and B, the data show that both WT and LRRK2 G2019S astrocytes produce MVBs and that MVBs in LRRK2 G2019S astrocytes are smaller than in WT astrocytes. In Figure 2E, the authors showed the abundance of CD63 localized within MVBs in WT astrocytes but did not show the CD63 localization in MVBs in G2019S astrocytes. However, it is important to show CD63 localization in MVBs in G2019S astrocytes to fully support the conclusion that CD63+ MVBs are present in LRRK2 G2019S astrocytes. In addition, CD44 is a marker for astrocyte-restricted precursor cells. Although CD44+ cells are committed to give rise to astrocytes, it is crucial to include another astrocyte marker to ensure these cells are indeed mature astrocytes. Relatedly, the authors should consider citing some of the MVB maturation literature to guide the readers.

      4) In Figure 3, it is impressive that the authors are able to image EVs using a cryo-EM approach and analyze their sizes. The authors also observed different shapes of EVs. Is there any shape difference between WT EVs and G2019S EVs? Is there a way that the authors could categorize these shapes and do a detailed analysis of EV shapes? Also, in Figure 3D, both WT EV and G2019S EV images should be presented side by side for comparison. Relatedly, the size frequencies of EVs presented suggest a difference in the types of EVs released. Interestingly, exosomes are classically known to range from ~50-120nm and this population is significantly decreased in G2019S compared to WT. What does this suggest?

      5) In Figure 3C, the SBI ELISA claims to quantify CD63+ vesicles; the authors should present more standardized particle quantification data (either by CD63 FACS for isolated EVs in WT vs G2019S or ZetaView/QNano particle tracking). The authors should also directly quantify the total number of EVs secreted in WT vs G2019S conditions (not only CD63+).

      6) In Figure 4, the authors quantify LRRK2+/CD63+ particles by imaging. Importantly, it appears that there are fewer CD63 "large gold" particles in MVBs of G2019S compared to control. This CD63 baseline quantification in MVBs of WT vs. G2019S should be presented in this figure. These data are not convincing and should be quantified by FACS in secreted EVs. Supplementary figure 3 should be brought into this figure.

      7) In Figure 5, using CD63 as an MVB marker is not the most accurate approach. ESCRT markers should be co-stained in these experiments to truly show MVB localization (CD63 can localize to MVBs but is known to have a wider distribution throughout the cell compared to TSG101 or other ESCRT complex proteins). Additionally, the authors must show their Supplemental Figure 3 ELISA quantification of p-aSyn in this main figure, and comment on why they conclude higher p-aSyn content in MVBs based on their IEM but then find no differences in aSyn in secreted EVs in WT vs. G2019S by ELISA.

      8) In figure 6, it is even more clear that there is a stark difference between the CD63 presence in/near MVBs between WT and G2019S conditions. Since the authors normalize several pieces of data to CD63 (MVB localization, LRRK2 co-localization, etc), it is critical to quantify the number of baseline CD63 gold particles in MVBs in WT vs G2019S.

      9) In Figure 7, the authors used the co-culture of astrocytes and neurons to assess astrocyte-derived EV uptake by dopaminergic neurons. Although 3D reconstitution of neurons and exosomes can be precise, the data may not be 100% clean. It would be better if the authors collected the EV-containing ACM fraction from WT astrocytes and G2019S astrocytes and then incubated dopaminergic neurons with this fraction. In this way, only dopaminergic neurons are in the culture and there will be no CD63-GFP-expressing astrocytes to contaminate the CD63-GFP signal in neurons.

      10) In Figure 9, the authors must show their ACM control. They show untreated, EV-free, and EV-rich ACM, but do not show unmanipulated ACM control.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The discussion between reviewers and editors centered on a few key points. First, all reviewers felt that it is of utmost importance that a justified and appropriate number of hiPSCs and their appropriate controls are utilized throughout. In particular, there is concern that G2019S-related phenotypes may be more variable than other presumed monogenic causes of disease, for example a low penetrance of disease causation associated with G2019S in people (e.g., 20% lifetime penetrance for PD) that may necessitate more lines analyzed than usual, and possibly lines from carriers of the mutation that appear resilient to disease. Studies in the past decade that use only one or a few lines of G2019S hiPSCs have generally failed to replicate in more than one laboratory, possibly due to low power. The reviewers were not sure how rigorous the study was in this regard. Second, the reviewers felt there was over-interpretation and speculation regarding the possible roles of differential trophic factors released by the astrocytes in EVs and conditioned media without many measures of specific trophic factors, or rescue experiments, to help define the mechanism. Third, the EV data are not broadly supported by NTA (like Zeta or nanosight) or quantitative measures fairly standard in the EV field. For example, the authors did not clearly quantify the total number of EVs secreted in WT vs. G2019S conditions, which would be a basic experiment needed to create interest in the study in the EV community.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We are grateful for the insightful, constructive and very positive reviews provided by the three reviewers. Please find responses to each of the reviewer comments below.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors study proteins localised to the apical end of the highly polarised parasites causing Toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether there is a conservation of the apical components in these distantly related parasites as well as in some even more distantly related organisms. This is an important question as the apical part comprises many proteins essential for invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate such as in T. gondii and less elaborate as in P. falciparum. The authors now show that there is a large conservation between the species in the protein makeup of the apical end. The experiments are well performed, displayed and discussed and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy.

      My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites but these are good candidates to rapidly investigate. But showing a functional impact might convince editors at certain journals.

      Authors’ response: The central aim of this study was to ask if the molecular composition of the conoid complex is conserved across Apicomplexa. Functional dissection of proteins is part of an exciting set of subsequent questions and studies that will now follow by us and others. However, careful and thorough phenotyping of gene disruptions is not trivial work, would be most informative to perform in both Toxoplasma and Plasmodium, and is therefore beyond the scope of this project. Regarding the two proteins suggested by this reviewer for follow-up work and the question of ‘essentiality’, that the proteins have not been lost during parasite selection through evolution is clear evidence of their relevance to the biology of Plasmodium.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      Authors’ response: while we initially thought that this change would be suitable, given that the subsequent part of the title is ‘reveals a cryptic conoid feature’ we think it is clearer and more logical to leave this title in its original form. The conoid complex includes the apical polar rings, and these are not considered to be cryptic or previously unrecognised, only the conoid. While our study confirms that there is conservation across all proteome components of the conoid complex, this is secondary to the primary question of this study.

      abstract: not sure about the use of the words instrument and substructures

      Authors’ response: we believe that the use of ‘instrument’ is an appropriate analogy of a tool and not different from the use of ‘machine’ and ‘machinery’ that is widely used in molecular and cellular biology. Similarly, ‘substructure’ acknowledges that within recognised structures, such as the conoid, there is further specific organisation such as the conoid base or apex.

      page 2 last lines: is tubulin monomeric or polymerized?

      Authors’ response: to specify the polymerized state of tubulin as mentioned here the text has been changed to ‘the presence of tubulin polymers’.

      page 3 name protein talked about in 9th line

      Authors’ response: we have now named this protein (RNG2) as suggested.

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      Authors’ response: We feel that it is more appropriate to leave the discussion of the Hu et al (2006) proteomics study, along with various subsequent approaches used in pursuit of discovering conoid-associated proteins, to the discussion as currently occurs. In the introduction we seek to efficiently inform the reader of the current state of knowledge that makes the value and nature of the questions that we have asked in this study apparent. But we do give full credit and evaluation of previous studies in the discussion which we think is the most appropriate place for this.

      first paragraph or results could go into introduction

      Authors’ response: The first paragraph of the Results contains specific detail of just one aspect of this study, the use of hyperLOPIT. This is relevant to the new analysis that we have made of the hyperLOPIT data in this study. We, therefore, believe that it is most appropriately presented here in the Results in association with the new analyses we described. Our aim is that the Introduction is succinct and serves the entire study.

      page 4: add reference after BioID

      Authors’ response: reference added as suggested

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      Authors’ response: It is unclear what this reviewer is requesting with respect to definitions of the conoid on this page. Nevertheless, we have now included a thorough definition of the conoid based on the original electron microscopy studies (fourth paragraph of the Introduction).

      With respect to the technique used to report on YFP-tagged SAS6 in the de Leon et al 2013 study, we now include a fuller description of this previous study as follows:

      ‘The fluorescence imaging used in the de Leon et al study was limited to lower resolution widefield microscopy. Immuno-TEM was also used, however, contrary to their conclusions, did show YFP presence throughout transverse and oblique sections of the conoid consistent with our detection of SAS6L throughout the conoid body.’

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Authors’ response: ‘phenocopied’ replaced, as suggested. Reference added after ookinete stage, as suggested.

      As requested, we have compiled available expression data for the Plasmodium proteins throughout the different zoite stages and will include these data as supplemental material in our subsequent revision.

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      Authors’ response: These are reasonable suggested analogies and we will introduce them in the subsequent revision.

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Authors’ response: This reviewer has not linked this comment to a specific statement on page 9; however, we are cautious not to equate a lack of observed growth defects in experimental scenarios with unimportant or irrelevant proteins. Maintenance, through natural selection and evolution, of the proteins of a structure indicates that they are selectively advantageous and of functional relevance. The two proteins in question are not expressed in the blood stage, so one wouldn’t expect their deletion to have consequence in this stage.

      Apart from classic TEM images also Cryo EM data is available for apex of merozoite and sporozoite. Worth to discuss?

      Authors’ response: Following this reviewer’s subsequent suggestion (below), we are now preparing, for the subsequent revision, a schematic of each of the Plasmodium zoite stages, and these draw on cryo-EM tomography data.

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Authors’ response: These two reports are excellent studies of the polarised development of the cell pellicle during ookinete formation and control of gliding initiation, but don’t specifically relate to the conoid complex structures that are the subject of our study. We, therefore, do not see a logical place to include discussion of these works.

      Reviewer #2 (Significance (Required)):

      The paper provides a conceptual advance over previous data as it shows clearly a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for these important apical structures of apicomplexan parasites and provides a good basis to dissect the molecular functions.

      Authors’ response: We appreciate this reviewer recognising this opportune point in time to more clearly define the terminology applied to these apical structures so that they can be more clearly and easily compared between taxa. We will use the suggested schematic figure (see comment below) that is now in preparation as a basis and guide for a refined nomenclature based on precedent in the literature.

      As it stands all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be and some evolutionary scientists will be interested in this study.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086), the first description of an apical proteome (Hu et al., PLoS Path 2006; 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x) and the discovery of apically located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; 10.7554/eLife.56635), to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance highlighting the similarities and differences

      Authors’ response: This is a good suggestion and we are presently preparing a schematic of all stages studied and supporting this with electron microscopy.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Authors’ response: In our validation of putative conoid-associated proteins identified by the hyperLOPIT+BioID approach we reporter-tagged 18 proteins to resolve their cellular location by microscopy. All 18 were verified as being located at the site of the conoid. So, by this measure there were no false positives. The veracity of the hyperLOPIT data was also confirmed across other cell compartments in our report where 62 proteins were reporter-tagged, from which there were no false positive assignments of cell location (Barylyuk et al., 2020, Cell Host & Microbe, in press, doi:10.1016/j.chom.2020.09.011; bioRxiv: https://doi.org/10.1101/2020.04.23.057125).

      Estimating false negatives is more difficult, but we know that these would occur, as for any mass spectrometry-based detection technique. However, we have not claimed to have been exhaustive, nor was this required to answer our central question: are there conserved conoid-associated proteins throughout Apicomplexa? To address this question, we required a good sample of proteins, and the methods that we have employed provided this.
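As an illustration of what "no false positives" among a finite validation sample can and cannot show, the following minimal sketch computes a one-sided exact (Clopper-Pearson) upper bound on the underlying false positive proportion when zero events are observed. It is not an analysis reported by the authors or requested by the reviewer; the function name `upper_bound_zero_events` and the 95% confidence level are assumptions chosen for the example, and the sample sizes (18 and 62) are the validation counts quoted in the response above.

```python
def upper_bound_zero_events(n: int, confidence: float = 0.95) -> float:
    """One-sided exact upper bound on a proportion when 0 of n trials are events.

    With zero observed events the Clopper-Pearson bound solves
    (1 - p) ** n = alpha, which gives p = 1 - alpha ** (1 / n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


# Validation sample sizes quoted in the response: 18 conoid candidates,
# 62 assignments across other compartments (hypothetical use of those counts).
for n in (18, 62):
    bound = upper_bound_zero_events(n)
    print(f"0 false positives in {n}: 95% upper bound = {bound:.3f}")
```

Under these assumptions, a sample of 18 with no false positives is compatible with an underlying false positive proportion of up to roughly 15%, and the bound tightens as the validated sample grows.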

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not.

      Authors’ response: Candidate proteins selected for validation by microscopy were not biased for any known likelihood of being associated with the conoid, other than our proteomics data that we were seeking to test. However, we did prioritise proteins with the following traits: 1) proteins with strong corresponding gene knockout fitness phenotypes from published studies, 2) proteins with some evidence of conserved functional domains, and 3) genes with orthologues found in Plasmodium spp. and other apicomplexans. These traits were chosen with future functional studies in mind where proteins might be more informative of conoid-related functions and relevance in other apicomplexans. All validated proteins, however, were otherwise uncharacterised and, therefore, were not knowingly biased for more likely conoid-association over others discovered by our proteomics approach. We now include the following statement.

      “All proteins selected for validation were previously uncharacterised and with no a priori reason to be identified as conoid-associated other than our proteomics data.”

      If so, how many proteins were localized to the conoid and how many were not?

      Authors’ response: As stated above, we observed no false positives from the sample of 18 protein locations verified by microscopy.

      Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al's 2006 paper, based on the number of peptides retrieved from the conoid enriched vs depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0
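The counts in the list above can be tallied with a short sketch. The peptide counts are transcribed directly from the list; the variable names and the "exclusive" and "at least 2-fold" groupings are assumptions made for illustration (the 2-fold threshold mirrors the wording used later in the authors' response), not categories defined by the reviewer.

```python
# Peptide counts transcribed from the list above as
# (conoid-enriched, conoid-depleted); None marks "not found".
hu_2006 = {
    "222350": (2, 0), "274120": (3, 0), "291880": (1, 0), "301420": (3, 1),
    "246720": (4, 0), "258090": (10, 0), "266630": (8, 1), "208340": (4, 2),
    "253600": (1, 0), "306350": None, "250840": (1, 0), "292120": None,
    "219070": None, "274160": None, "320030": (7, 1), "227000": (10, 0),
    "278780": None, "284620": None, "295420": (6, 0), "297180": (4, 0),
}

# Accessions with any peptide counts reported.
found = {acc: counts for acc, counts in hu_2006.items() if counts is not None}

# Detected only in the conoid-enriched fraction.
exclusive = [acc for acc, (enr, dep) in found.items() if dep == 0]

# Detected in both fractions, with at least 2-fold more peptides in the enriched one.
two_fold = [acc for acc, (enr, dep) in found.items() if dep > 0 and enr >= 2 * dep]

print(len(hu_2006), len(found), len(exclusive), len(two_fold))  # prints: 20 14 10 4
```

The tally agrees with the "14 out of 20" stated in the comment: 10 accessions were detected only in the conoid-enriched fraction and 4 in both fractions with at least a 2-fold excess in the enriched one.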

      Authors’ response: Proteomic methods and mass spectrometry have experienced revolutionary advances since this 2006 study was conducted. These include improvements in both sensitivity and quantitation accuracy. The Hu et al 2006 study provided an exciting first step towards conoid protein discovery. However, by their original estimation, at least 35% of their putative conoid-specific proteins were identifiable as false positives (e.g. ribosomal proteins) and this estimate could not account for the majority of uncharacterised proteins whose potential for false positive attribution to the conoid was untested. From almost 300 proteins, this study only validated four as associated with the conoid. The further proteins listed above were not validated as conoid proteins in the Hu et al study and, therefore, could not be distinguished from the many false positives reported in their work. In our Table 1, we have acknowledged the Hu et al study for the select proteins that they established as conoid proteins in their study.

      To further assess the utility of this 2006 conoid-enriched proteome we sorted the Hu et al detected proteins on our full hyperLOPIT assignments. Of the proteins that were reported by Hu et al as either exclusive to the conoid-enriched fraction or enriched by at least 2-fold over the conoid-depleted fraction, 15% were assigned to the apical 1 and 2 clusters (representing the relevant compartments to the conoid complex). Thus, according to the hyperLOPIT data these represent the true positives found in this study and 13 of these proteins were independently validated as conoid-associated by us. Significantly, however, 85% of the conoid-exclusive and conoid-enriched proteins from Hu et al (2006) were allocated to a non-apical location with 99% probability by hyperLOPIT, and, during our validation of 62 assignments we verified the alternative location of eight of these. False positives, therefore, greatly outnumbered true positives in this earlier dataset. This high rate of false positives in subcellular isolation proteomics is typical of the challenges that this method faces, and this was the rationale for and strength of the alternative hyperLOPIT approach. Given the overall relatively low level of conoid specificity in the earlier work we do not think that there is value in making specific protein-by-protein reference to it.

      Reviewer #3 (Significance (Required)):

      see above

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This manuscript details the further use of the hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) that the group has previously published using T. gondii tachyzoites by combining this with BioID and super-resolution microscopy in order to uncover new proteins that form part of a structurally known and functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also excitingly showed that not only is this structure found in all invasive forms of Plasmodium (using the P. berghei model) but there also is a different molecular make-up in the blood stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      **Major Comments:** No major comments

      **Minor Comments:**

      1)While both the introduction and discussion are well written and detailed they could both be a little more concise.

      Authors’ response: We take this as a style recommendation, but we note that the other reviewers commented on the text’s “eloquence” and that the introduction in particular was a “pleasure to read”. We take these comments as votes of confidence in the current form.

      2)Selection of the 5 new genes in Tg to be tagged (top pg 5): the selection criteria for these 5 were not clear.

      Authors’ response: Please see the same query, and response with modified text, made by Reviewer #3.

      This also leads to the second part of this question, where some genes appear to be missing from Table 1 and Table S1, specifically those found in both the SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. In Table 1 (where there are 23) there is no mention of 223790, 281650, 224700, and 293540, but they are in Table S1 (assuming these 4 were not selected for tagging in this study). Conversely, 216080 (AKMT) and 234250 (CIP1), which are marked in Table 1 as identified in both SAS6L and RNG2 BioID, are absent from Table S1 (where 25 are listed). Does this mean there are actually 27, or was the indication in Table 1 that 216080 (AKMT) and 234250 (CIP1) were identified in both SAS6L and RNG2 BioID a mistake?

      Authors’ response: This reviewer has overlooked that Table 1 reports on all currently known conoid associated proteins, including those not detected in the hyperLOPIT data but reported in the literature, whereas Table S1 is exclusively those proteins detected and assigned as ‘apical’ by hyperLOPIT. The reported BioID-detection for each protein is then made within this framework. Thus, the proteins that occur in only one or the other table do so because they don’t satisfy these two sets of criteria. We have rechecked the numbers reported in the text and they are correct.

      3)Table 1: There is the fitness score for Pf orthologues but no mention of fitness in Pb (the model used) from the PlasmoGEM screens. Considering the authors use the Pb model, it would be of interest to add this to the table.

      Authors’ response: The Plasmodium berghei PlasmoGEM gene disruption screen was much more limited in number than that for P. falciparum. Consequently, fitness scores were available for only two of the Plasmodium orthologues for which we have location data. We, therefore, thought it was of limited utility to include these data in Table 1, and these data are in the public domain should a reader seek them.

      4)Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      Authors’ response: Initially we did not make the separate transgenic cell lines for each protein with both the SAS6L and RNG2 markers. This was because one marker was usually sufficient to resolve the relative location of the protein of interest. However, given this reviewer’s comment and the potential for some extra information to be recovered by using both markers, we have now generated all cell lines necessary for this analysis. We are presently completing the imaging of these new cell lines and these data will be included in the subsequent revision.

      5)Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text)

      Authors’ response: see previous comment.

      6)Figure 5: It is a shame only 7 of the 9 Plasmodium orthologues were included in the super resolution as there are only 2 more to have the complete set.

      Authors’ response: Ideally, we would have been able to achieve this, but the restrictions imposed by the COVID-19 disruption to laboratory access and activities ultimately limited these analyses slightly. However, to answer the central question of whether there is conservation of the Toxoplasma conoid proteome in Plasmodium it was not necessary to perform super resolution imaging for all of these proteins. The major outcome of this study, therefore, is not affected by this.

      7)Figure 6: As with Figure 5 it would be better if more were included in the super-resolution images in this sporozoite stage.

      Authors’ response: Same response as above. Generation of sporozoites requires passage through the mosquito vector so this is even more resource-intensive than generation of ookinetes that can be differentiated in vitro from mouse-derived parasites. Again, the answers to the central questions posed by this study do not require these further, high resolution, data.

      8)Figure 7: This would be improved with at least a selection (or even all 6) to have the super-resolution images (possibly even with free merozoites)

      Authors’ response: We did apply 3D-SIM imaging to fixed merozoites; however, unlike ookinetes and sporozoites, the imaged fixed material was inferior to the live cell GFP imaging that we have included. This likely reflects the poorer fixation properties of Plasmodium merozoites, a challenge of these cell forms that is widely experienced by Plasmodium researchers. We do not have access to a 3D-SIM microscope within a containment laboratory necessary for handling viable parasites and, therefore, could not attempt to image live material with this instrument. Again, the answers to the central questions posed by this study do not require these further, high resolution, data.

      9)As there are numerous new proteins identified in 2 different parasites and with the composition of the conoid differing at different stages it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises

      Authors’ response: In response to this reviewer, and reviewer #2’s suggestion, we are now preparing schematic models of the apices of all of the relevant organism stages.

      Reviewer #4 (Significance (Required)):

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up and constitute this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in its molecular make-up within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combined cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript details the further use of the hyperplexed Localisation of Organelle Proteins by Isotope Tagging (hyperLOPIT) that the group has previously published using T. gondii tachyzoites by combining this with BioID and super-resolution microscopy in order to uncover new proteins that form part of a structurally known and functionally elusive conoid. The authors conclusively identified new proteins that localise to the conoid structure in T. gondii and also excitingly showed that not only is this structure found in all invasive forms of Plasmodium (using the P. berghei model) but there also is a different molecular make-up in the blood stage merozoites, which have a slightly reduced number of proteins (or possibly as yet unknown alternatives) compared to ookinete and sporozoite conoid structures. This study is scientifically sound and the conclusions reached are well supported by the results presented.

      Major Comments: No major comments

      Minor Comments:

      1)While both the introduction and discussion are well written and detailed they could both be a little more concise.

      2)Selection of the 5 new genes in Tg to be tagged (top pg 5): the selection criteria for these 5 were not clear. This also leads to the second part of this question, where some genes appear to be missing from Table 1 and Table S1, specifically those found in both the SAS6L and RNG2 BioID. It was mentioned that 25 were identified in both SAS6L and RNG2 BioID. In Table 1 (where there are 23) there is no mention of 223790, 281650, 224700, and 293540, but they are in Table S1 (assuming these 4 were not selected for tagging in this study). Conversely, 216080 (AKMT) and 234250 (CIP1), which are marked in Table 1 as identified in both SAS6L and RNG2 BioID, are absent from Table S1 (where 25 are listed). Does this mean there are actually 27, or was the indication in Table 1 that 216080 (AKMT) and 234250 (CIP1) were identified in both SAS6L and RNG2 BioID a mistake?

      3)Table 1: There is the fitness score for Pf orthologues but no mention of fitness in Pb (the model used) from the PlasmoGEM screens. Considering the authors use the Pb model, it would be of interest to add this to the table.

      4)Figure 2: The images for localisation with SAS6L for 291880 and 258090 appear to be missing.

      5)Figure 3: It is unclear why both SAS6L and RNG2 are not used for all localisations shown (this could be clarified in the text)

      6)Figure 5: It is a shame only 7 of the 9 Plasmodium orthologues were included in the super resolution as there are only 2 more to have the complete set.

      7)Figure 6: As with Figure 5 it would be better if more were included in the super-resolution images in this sporozoite stage.

      8)Figure 7: This would be improved with at least a selection (or even all 6) to have the super-resolution images (possibly even with free merozoites)

      9)As there are numerous new proteins identified in 2 different parasites and with the composition of the conoid differing at different stages it would be beneficial to have some sort of schematic model of the apical complex in Tg and Pb indicating where each new protein localises

      Significance

      The authors have combined expert mass spectrometry and super-resolution microscopy to identify new components of the conoid in Tg and added to the knowledge that will help to uncover the function of the structure. But perhaps the most significant is the conclusive identification of the conoid in all 3 invasive stages of the Plasmodium parasite. Until now it was widely accepted that the conoid was missing in Plasmodium, and to uncover multiple proteins that appear to make up and constitute this structure in Plasmodium is highly significant and clearly of interest to the apicomplexan field. Furthermore, the suggestion that the conoid differs in its molecular make-up within Plasmodium depending on stage is very intriguing and clearly of interest. This paper expertly combined cutting-edge proteomics and microscopy to identify the conoid in Plasmodium. This manuscript would have a broad readership in parasitology, proteomics, and cell biology.

      Our expertise is largely in molecular parasitology and microscopy

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this work, Koreny et al. characterized the localization of a new collection of conoid proteins in Toxoplasma gondii as well as in several different stages of Plasmodium berghei. The authors discovered that these proteins are located in several distinct substructures in Plasmodium and are expressed in a stage-specific manner. The data are of high quality, well‐organized, and well presented. The paper is well written. The introduction, in particular, was a pleasure to read. This reviewer (Ke Hu) does not have any new experiments to suggest.

      However, while the authors present LOPIT+BIOID as a powerful approach to identify conoid proteins, implying that it is more reliable than previously published approaches (see below), the manuscript includes no data to show what the false positive or false negative rate is with the current approach, nor any estimate of how many conoid proteins were missed entirely.

      Page 7: "Previous identification of conoid complex proteins used methods including subcellular enrichment, correlation of mRNA expression, and proximity tagging (BioID) (Hu et al. 2006; Long, Anthony, et al. 2017; Long, Brown, et al. 2017). Amongst these datasets many components have been identified, although often with a high false positive rate. We have found the hyperLOPIT strategy to be a powerful approach for enriching in proteins specific to the apex of the cell, and BioID has further refined identification of proteins specific to the conoid complex region."

      The authors should state whether the candidate proteins were chosen in an unbiased way or not. If so, how many proteins were localized to the conoid and how many were not? Related to this, the majority (14 out of 20) of the conoid proteins identified by LOPIT+BIOID in this paper were previously identified as conoid candidate proteins in Hu et al's 2006 paper, based on the number of peptides retrieved from the conoid enriched vs depleted fractions. Those data (see below) have been available from ToxoDB for many years and should be acknowledged.

      Accession# - conoid enriched : conoid depleted (from Hu et al. 2006)

      222350 - 2:0

      274120 - 3:0

      291880 - 1:0

      301420 - 3:1

      246720 - 4:0

      258090 - 10:0

      266630 - 8:1

      208340 - 4:2

      253600 - 1:0

      306350 - not found

      250840 - 1:0

      292120 - not found

      219070 - not found

      274160 - not found

      320030 - 7:1

      227000 - 10:0

      278780 - not found

      284620 - not found

      295420 - 6:0

      297180 - 4:0

      Significance

      see above

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors study proteins localised to the apical end of the highly polarised parasites causing toxoplasmosis and malaria. They find new proteins using BioID and examine the localisation of these along with recently identified proteins in the two different parasites. The key question they address is whether there is a conservation of the apical components in these distantly related parasites as well as in some even more distantly related organisms. This is an important question as the apical part comprises many essential proteins of invasion of host cells and shows a unique structure that defines the apicomplexans as a group. The apical structure can be highly elaborate such as in T. gondii and less elaborate as in P. falciparum. The authors now show that there is a large conservation between the species in the protein make-up of the apical end. The experiments are well performed, displayed and discussed and there is no doubt about the validity of the presented results. The text is eloquently written, if at times a bit wordy. My only main suggestion would be to possibly add data on gene disruption of the two candidates (0310700 and 1216300) that are not detected in blood stage parasites but in the insect stages. A deletion of these should be technically straightforward and would show whether the proteins are important to the parasite. Likely not all of the now many proteins are essential for the parasites but these are good candidates to rapidly investigate. But showing a functional impact might convince editors at certain journals.

      Other suggestions in chronological order (line numbers would have helped)

      title: maybe write 'conoid complex proteome'

      abstract: not sure about the use of the words instrument and substructures

      page 2 last lines: is tubulin monomeric or polymerized?

      page 3 name protein talked about in 9th line

      third paragraph: mention previous proteomics studies e.g. from Ke Hu (mentioned later in discussion)

      first paragraph or results could go into introduction

      page 4: add reference after BioID

      page 5: add definitions of the conoid; what technique was used to report YFP-SAS6?

      page 7: 'showed similar localisation' instead of 'phenocopied'?; add reference after ookinete stage; add expression levels from PlasmoDB to the Table 1 data at least for merozoites, ookinetes and sporozoites or add separate table for the 9 proteins in supplement

      Discussion: Maybe discuss that the conoid complex is a cytoskeletal structure and that the other cytoskeletons (actin, microtubules, subpellicular network) also differ between the species investigated in their composition and overall architecture

      page 9: at least two proteins could be deleted as they seem to not confer any growth defect on blood stages (see main comment)

      Apart from classic TEM images also Cryo EM data is available for apex of merozoite and sporozoite. Worth to discuss?

      Add and discuss the recent work from Curr Biol and EMBO J of the Yuan lab on ookinete formation?

      Significance

      The paper provides a conceptual advance over previous data as it shows clearly a high level of conservation of the protein components of the conoid complex. It could introduce a new terminology for these important apical structures of apicomplexan parasites and provides a good basis to dissect the molecular functions. As it stands all scientists investigating Plasmodium and Toxoplasma invasion of host cells will be highly interested in this study, most scientists researching apicomplexan organisms should be and some evolutionary scientists will be interested in this study.

      Key papers in the field are the discovery of the Toxoplasma conoid as a highly twisted microtubule-like structure (Hu et al., JCB 2002; doi: 10.1083/jcb.200112086), the first description of an apical proteome (Hu et al., PLoS Path 2006; 10.1371/journal.ppat.0020013), the description of a tilted arrangement of the rings in Plasmodium versus Toxoplasma (Kudryashev et al., Cell Microbiol 2012; doi: 10.1111/j.1462-5822.2012.01836.x) and the discovery of apically located proteins that are essential for conoid formation (Tosetti et al., eLife 2020; 10.7554/eLife.56635), to name a few.

      If intended for a broader audience, a cartoon of a conoid complex across the different species investigated and discussed here would help for visual guidance highlighting the similarities and differences

    1. Reviewer #2

      In this manuscript, the authors applied Gaussian Process regression to drug response data and attempted to utilize the estimates of uncertainty from these regressions to improve on drug response curve fitting and biomarker discovery. Their approach and application case is an interesting one that deserves further investment and attention. However, I have substantive concerns with the current manuscript draft and would recommend to the authors that these concerns be addressed.

      1) Figure 3 and the accompanying text section of the main document seem to be focused on characterizing estimation uncertainty, which appears to simply be the between-sample dispersion of the dose-response curve (or summary statistics thereof) from replicate runs. The main conclusion seems to be that drug compounds with partial responders are the ones with the greatest between-sample dispersions.

      What is missing from this Figure and accompanying text is a comparison of these results with analogous ones for the observation uncertainty to help readers understand why one approach may be preferred over the other.

      2) Figure 5A compares the posterior probability from the Bayesian test (presumably accounting for estimation uncertainty) against the q-value from an ANOVA test. The q-value should be the False Discovery Rate, which controls for the proportion of false positives. This does not seem to be directly comparable to a posterior probability. The authors should clarify why a comparison of proportion to posterior probability is reasonable.

      3) The authors do not appear to have demonstrated how estimation uncertainty can improve drug response curve fitting or biomarker discovery.

      For the former, the fitted curves using standard approaches appear similar to those fitted using GP regression, as the authors seemed to have focused on those curves where the two approaches are concordant and as the IC50 value differences appear minimal for those cases where IC50 is within the tested concentration range. The greatest differences are seen for those cases where IC50 values are outside the tested concentration ranges, but these cases were not in focus in the text. In addition, for these cases, it is unclear if relying on curve fits from GP regression makes sense because they are also the cases with the highest estimation uncertainty.

      For the latter, it appears that every significant biomarker identified using Bayesian posterior probability is also significant by ANOVA (using a standard q-value < 0.05 cutoff).
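
      To make the distinction in comment 2 concrete, the following is a minimal sketch of how q-values are commonly obtained from p-values with the Benjamini-Hochberg procedure. It is an illustration only: the review does not state which q-value procedure the authors used, and the function name `bh_qvalues` and the p-values are made up for this example.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (commonly reported as q-values).

    Assumption: this is one standard way to compute q-values; it is not
    taken from the manuscript under review.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for five biomarker-drug associations.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = bh_qvalues(pvals)
for p, q in zip(pvals, qvals):
    print(f"p = {p:.3f}  ->  q = {q:.5f}")
```

      A q-value computed this way depends on all the tests in the family and bounds the expected proportion of false positives among the calls made at that threshold, whereas a posterior probability is a statement about a single association under a specific prior and model. This is why the two axes of Figure 5A measure different things.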

    2. Reviewer #1

      The authors propose two related (though distinct) methods for the improvement of pharmacological screening analysis and related biomarker analyses. The first is a Gaussian process (GP) approach to dose-response curve fitting for the estimation of IC50, AUC, and related quantities. The goal of this method is to improve point and uncertainty estimates of these quantities through more flexible functional specification and outlier-robust error modeling. The second method is a hierarchical Bayesian approach to biomarker association analysis. This incorporates uncertainty estimates produced by the GP modeling with the aim of providing more sensitive association analyses with fewer false positives.

      The combination of methods presented has some potential. Flexible modeling of dose-response relationships and better estimation of uncertainty are interesting axes to wring more information out of large-scale screening datasets. There are a few areas to shore up in the paper to increase confidence in the empirical results and generalizability of the methods.

      1) There are a number of fixed parameters in the proposed methods, and the calibration procedure used to set these is unclear to me. For the GP models, there is a set of noise parameters for the Beta mixture, plus the length scales and variance parameter for the kernel. I'm not sure how one would generalize the GP methods to other screening datasets as a result of this ambiguity (e.g., how would one determine appropriate noise parameters?). For the hierarchical Bayesian biomarker association model, we have prior scale parameters related to both the effect size and variance parameters. The number of researcher degrees of freedom introduced by these tuned parameters also raises some concerns about the sensitivity of empirical results (e.g., 24 clinically established biomarkers and 6 novel) to these choices. It's not clear if we're seeing a corner case or a robust result. I think the work would benefit from both sensitivity analyses with respect to tuned parameters and guidance on or methods for their estimation. The latter is particularly important if other researchers hope to employ these methods in a different context.

      2) The proposed hierarchical Bayesian approach to biomarker association analysis is a reasonable start, but it was unclear to me whether changes in performance stemmed from correcting misspecification in original ANOVA or the use of uncertainty estimates. I suggest comparing results to a heteroskedasticity-robust estimator (e.g., HC3, see Long and Ervin, 2000), which would be valid under the stated model without the requirement for explicit uncertainty estimates or priors. The transformations and tuning applied to uncertainty estimates in this context also make generalization of the approach challenging. The need for the c (power) parameter suggests a potential misspecification or miscalibration at some point in the modeling chain. It would be useful to understand this misspecification better, particularly for researchers hoping to extend or reuse these methods.

      3) The GP method provides reasonable estimates of uncertainty, but it would be useful to see them compared to those from the sigmoid model (e.g., from the delta method). It wasn't clear to me how much of the difference in results is coming from incorporation of uncertainty estimates as opposed to changes in the point estimates.

      4) The handling of cases with IC50 beyond the maximum observed dose (extrapolating to 10x the maximum concentration) provided a reasonable starting point, but a few subtleties in the handling of corner cases remain unaddressed (e.g., GPs allow positive slope at right edge of range). It would be useful to provide a more general, systematic procedure to address these. Imposing monotonicity may not be the best path, but additional guidance for researchers applying these methods in other contexts would help.
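
      As a rough illustration of the heteroskedasticity-robust comparison suggested in comment 2, the sketch below computes the classical and HC3 standard errors for the slope of a simple linear regression of drug response on a single biomarker. It uses only the standard library; the function name `slope_standard_errors` and the data are hypothetical, and a real analysis would include the covariates of the original ANOVA.

```python
import math


def slope_standard_errors(x, y):
    """OLS slope with classical and HC3 standard errors.

    Simple regression y = a + b*x. HC3 rescales each residual by
    1 / (1 - h_i), where h_i is the leverage of observation i.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE assumes one common error variance for all samples.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)

    # HC3 sandwich estimator: no common-variance assumption.
    leverage = [1.0 / n + d * d / sxx for d in dx]
    meat = sum(d * d * (e / (1.0 - h)) ** 2
               for d, e, h in zip(dx, resid, leverage))
    se_hc3 = math.sqrt(meat) / sxx
    return slope, se_classical, se_hc3


# Hypothetical data: biomarker status (0 = wild type, 1 = mutant) and a
# response summary whose spread differs between the two groups.
status = [0, 0, 0, 0, 1, 1, 1, 1]
response = [0.50, 0.52, 0.48, 0.50, 0.20, 0.60, 0.10, 0.70]
slope, se_classical, se_hc3 = slope_standard_errors(status, response)
print(slope, se_classical, se_hc3)
```

      Comparing associations called with such a robust estimator against those from the hierarchical Bayesian model would help separate gains due to corrected variance misspecification from gains due to the explicit uncertainty estimates.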

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      This manuscript presents two statistical approaches to evaluating drug effect measurements and associations between biomarkers, for dose curve data. Measurements of these kinds are made in many contexts, and frequently reported without accounting well for measurement uncertainties. A statistical framework of this kind will be widely useful and should be frequently applied.

    1. Reviewer #3

      This manuscript reports the first description of a eukaryotic-like Bin/Amphiphysin/RVS (BAR) domain protein in a bacterium (Shewanella oneidensis MR-1), BdpA, with conserved roles in membrane curvature control during outer membrane vesicle (OMV) formation. Consistent with this, a BdpA-defective mutant had defects in the size and shape of redox-active membrane vesicles and formed outer membrane extensions (OMEs) lacking the characteristic tubular structure. Heterologous expression of the BdpA proteins in model (Escherichia coli) and non-model (Marinobacter atlanticus) bacteria hosts promoted OME formation. The authors propose BdpA as a new subclass of prokaryotic BAR proteins with eukaryotic-like roles in membrane curvature modulation. This is an interesting finding that could be strengthened with topological studies of BdpA in OMV/OME and quantitative analyses to validate the many qualitative microscopic observations.

      Numbered summary:

      1) To my knowledge this is the first description of a BAR-domain protein in a prokaryotic organism. But the role of prokaryotic proteins with amphipathic α-helical domains in membrane binding/curvature is not new. A review by Dworkin (reference 1) describes some of these structural homologs and their role in membrane binding and curvature control via their amphipathic domains (e.g., Bacillus subtilis SpoVM, which controls forespore membrane curvature during sporulation using its helical domain). This information is important in the introduction and could help with the phylogenetic analyses (comment 4 below).

      2) I am having a hard time reconciling the presence of a galactose-binding domain in BdpA and LPS sugar binding. This would suggest that the proteins coat the OMV rather than interacting with the periplasmic side of the outer membrane to promote OMV formation and release (which I somehow assume based on the role of some eukaryotic BARs). The lack of topological studies makes these models highly speculative and weakens some of the conclusions. The paper would be strengthened with the addition of topological studies in OMVs and OMEs.

      3) Many experiments rely on microscopic observations of cells, OMVs and OMEs to support conclusions based on (at most) semiquantitative data. These experiments require validation with methods that quantitatively determine critical variables such as OMV size and size distribution. Also note the microscopic methods are poorly described or not described at all in the methods section. Thus, it is not clear how many cells they examined microscopically and how many biological replicates (cultures) they used. The variability associated with this type of microscopic assessments makes sample size (number of cells, typically in the hundreds) and replication in independent cultures critical.

      4) Many of the branch points in the phylogenetic tree (Fig. 5) have very low confidence values. The authors did not provide the alignments so I could not evaluate the accuracy of the approach to offer suggestions for improvement. The predictive value of the tree may improve by including prokaryotic amphipathic helical domains such as those from SpoVM, MinD and FtsA. These issues are not as concerning in the tree presented in Fig. S6 although I note that this tree is supposed to show the distribution of "BdpA orthologs in other prokaryotes" but most of the branches are for eukaryotic proteins. I also note that the Methods section describes important results about the homology (or lack of homology) between BdpA and other prokaryotic and eukaryotic proteins. This information is more appropriate in the Results section.

      References:

      1) Dworkin, J. Cellular polarity in prokaryotic organisms. Cold Spring Harbor perspectives in biology 1, a003368-a003368, doi:10.1101/cshperspect.a003368 (2009).

      2) Gorby, Y. et al. Redox-reactive membrane vesicles produced by Shewanella. Geobiology 6, 232-241, doi:10.1111/j.1472-4669.2008.00158.x (2008).

    2. Reviewer #2

      Some Gram-negative bacteria, such as Shewanella oneidensis, produce outer membrane extensions (OME) that mediate electron transfer to extracellular substrates. Many of the players involved in the transfer of electrons via these nanowires have been discovered but the mechanisms of outer membrane remodeling have remained mysterious. Here, Phillips, Zacharoff, and colleagues, identify BdpA as a protein that stabilizes OMEs in Shewanella oneidensis and perhaps displays outer membrane remodeling activity in other bacterial species. Given its homology to eukaryotic BAR-domain proteins, the authors suggest that BdpA and its homologs define the first prokaryotic family of BAR proteins or pBARs.

      This work tackles a number of significant questions that span broad areas of microbiology and cell biology. First, it explores a critical area of bacterial cell biology: how do Gram-negative bacteria remodel their outer membranes? Second, it focuses on an underappreciated aspect of extracellular electron transfer, an activity widespread amongst bacteria with clear relevance to basic and applied fields. Finally, it provides a possible glimpse into the evolution of BAR-domain proteins, which play diverse cellular roles in eukaryotes. Despite the substantial advances presented here, I have some concerns which, if addressed, can lead to more certain conclusions about the cellular role of BdpA.

      1) I liked the comparative proteomics approach as a tool to identify unique OME components. I was surprised that the two fractions differed so much in their protein composition. Based on the materials and methods the OM and OME fractions were isolated from cells grown under very different conditions. Could this account for the large differences between these two fractions? Looking at the list of proteins enriched in either fraction is there any indication of significant contamination from other cellular fractions? What controls were used to ensure that the purification procedure was working effectively?

      2) The authors conclude that the OM vesicles are conductive. However, some controls are needed since other cellular components (such as OM fractions containing Mtr proteins) may have contaminated the OME fraction. Is the OME fraction "enriched" for this activity compared to just the OM fraction?

      3) Is BdpA really a BAR-domain protein? The authors use computational tools (such as BLAST and homology modelling) to posit that BdpA is a BAR-domain protein. This hypothesis is strengthened by the phenotype of mutants missing bdpA. While OMEs are not absent, their architecture is visibly altered, which may point to some instability in the membrane extensions. Significantly, BdpA is sufficient to induce OME-like structures when expressed in planktonic Shewanella cells, a condition during which OMEs are not normally produced. However, as the authors state, BdpA barely meets the cutoff (as set by the program used) for a BAR-domain protein. Furthermore, some of its homologs that share high levels of sequence identity don't pass the bar set by these computational methods. Thus, we cannot say that BdpA is actually a BAR-domain protein. Its effects on membrane stability could be indirect or the result of binding to outer membrane features in a manner distinct from other BAR proteins. Therefore, some biochemical corroboration of its activity on membranes or structural data are needed to confirm its relationship to eukaryotic BAR domain proteins. On a minor note, I would prefer "bacterial" rather than "prokaryotic" since the BdpA BAR-like domain is not found in archaea. Also, other groups have proposed that bacterial proteins contain BAR domains (for instance, Tanaka et al. in reference 28). How similar is BdpA to these proteins?

      4) Heterologous expression of BdpA in other bacteria provides one of the most compelling arguments for its central role in producing OMEs. However, the imaging data provided here (at least in my pdf) do not provide the clearest evidence for induction of OMEs in M. atlanticus and E. coli. This is especially the case with the E. coli images. The extended web of staining in 4c does not resemble the tubules seen in S. oneidensis. It would be great to have some electron microscopy data and/or higher resolution fluorescence images of these bacteria as corroborating evidence. Additionally, only a few cells are shown so some quantification of the proportion of cells with OMEs is needed.

      5) Other than the predicted signal peptide, does BdpA have any predicted features that indicate it is an outer membrane protein? The authors hypothesize that the putative Galactose-binding domain of BdpA mediates binding to LPS. However, it is also possible that it binds to peptidoglycan components. Therefore, independent data on localization of BdpA via microscopy or higher resolution biochemical fractionation would provide greater confidence that the protein is acting in the appropriate cellular location.

    3. Reviewer #1

      In the manuscript "A Prokaryotic Membrane Sculpting BAR Domain Protein" the authors describe the identification of the first bacterial membrane sculpting BAR domain protein, and the characterization of its function. In eukaryotes this type of protein is important for shaping membrane curvature. Here they identify a protein containing a BAR domain in the bacterium Shewanella oneidensis, which they name BdpA (BAR domain-like protein A). The authors show that BdpA is enriched in outer membrane vesicles (OMVs) and outer membrane extensions (OMEs), and that it regulates the size of OMVs and the shape of OMEs. They show this by characterizing and quantifying membrane vesicles and extensions, comparing WT with a BdpA mutant and the BdpA mutant with heterologous BdpA expression. They further show that heterologous expression of BdpA promotes OME formation in E. coli.

      In my opinion this paper provides solid support for the presence of these proteins in bacteria with an important function in membrane vesicles and membrane extensions.

      Minor Comments:

      1) In the introduction the authors summarize what is known about eukaryotic BAR proteins in terms of membrane localization and their role in membrane curvature and tubulation events. I think it is important to also provide a summary of what is known about the functional biological implications of these proteins in eukaryotes, namely whether the main function of BAR proteins in eukaryotes is always related to tubule formation or whether other functions are attributed to these proteins.

      2) Contrast and resolution in Figure 3, panel a, is weak making it difficult to see tubules described by the authors.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      In this manuscript the authors propose the identification of a novel protein involved in outer membrane remodelling, named BdpA (BAR domain-like protein A). According to the proposed model, BdpA has a conserved role in membrane curvature control during formation of outer membrane vesicles (OMVs) and of outer membrane extensions (OMEs) in Shewanella oneidensis. The authors also provide evidence that heterologous expression of BdpA promotes formation of OMEs in other bacteria (namely in E. coli), and that BdpA is sufficient to induce OME-like structures when expressed in conditions where OMEs are normally not formed. In eukaryotes, proteins containing BAR domains are important for shaping membrane curvature. Given the homology of BdpA to eukaryotic BAR-domain proteins, the authors suggest that BdpA and its homologs define the first prokaryotic family of BAR proteins or pBARs, with eukaryotic-like roles in membrane curvature modulation.

      Overall, the reviewers think that this is a very interesting study, and provided that further support is obtained to substantiate the proposed model the reviewers agree that the findings described here tackle a number of significant questions of broad interest. However, the reviewers also think that the evidence provided in this manuscript still does not fully support the conclusion that BdpA protein is involved in membrane curvature control as the eukaryotic proteins containing the BAR domains.

      We have compiled a list of comments that we hope will help the authors address the concerns of the reviewers to obtain stronger support for the function of BdpA.

      1) The reviewers are concerned that some of the conclusions are based on qualitative observations of microscopy analysis of OMVs and OMEs, and quantitative analyses are lacking to validate qualitative observations. As specified with examples in the list of minor points below, the reviewers propose that the data should be re-analyzed to obtain quantitative results. Specifically, a size distribution analysis could be applied to some microscopy data. Also note that the microscopy methods are poorly described, and as the calculation methods used are not fully available it is difficult to understand if the appropriate methods were used. Please specify how many cells were examined microscopically and how many biological replicates (cultures) were used in each experiment.

      2) Statistical analyses were not always the most accurate. In figure 2, an unpaired t-test was used for samples that have high variance; this approach may inflate the statistical difference between the strains. For figure 2, a histogram of the size distribution could be shown for each strain.

      3) The reviewers are concerned that the proteomic data is not clear enough to conclude that the BdpA protein is localized to or enriched in OMV/OME. Could the results be complemented with some other method to confirm BdpA localization? The reviewers are particularly concerned by the fact that a large number of proteins were identified in the OMV fraction. Could it be that some of the OMV/OME fractions were contaminated? What controls were used to ensure that the purification procedure was working effectively? Could the data be strengthened by some quality control analyses to determine how many of those proteins are actually predicted to localize to the outer membrane and periplasm? From the methods it seems that the culture conditions used to prepare the OM versus OMV were different; is this so? If yes, why were the culture conditions different? Could this affect protein expression? Please include the detailed growth conditions in the method section.

      4) The conclusion that BdpA is a BAR-domain protein is largely based on homology. The supplementary information file includes homology models that show striking similarity with eukaryotic BAR proteins. However, as the authors state, BdpA barely meets the cutoff for a BAR-domain protein. The results with the phenotype of the BdpA mutant, complementations and sufficiency data provide good support to the functional role of BdpA in membrane remodelling. However, the effect of BdpA on membrane stability could be indirect or the result of binding to outer membrane features in a manner distinct from other BAR proteins. Could these results be strengthened with some biochemical corroboration of its activity on membranes or structural data to confirm its relationship to eukaryotic BAR domain proteins?

      5) The reviewers propose that the paper would be strengthened with the addition of topological studies in OMVs and OMEs. The reviewers had problems in reconciling the presence of a galactose-binding domain in BdpA and LPS sugar binding. The authors hypothesize that the putative Galactose-binding domain of BdpA mediates binding to LPS. However, it is also possible that it binds to peptidoglycan components. This would suggest that the proteins interact with the periplasmic side of the outer membrane rather than coat the OMV to promote OMV formation and release (which one could assume based on the role of some eukaryotic BARs). The addition of topological studies (or some biochemical approach) could make these models less speculative, strengthening the conclusions.

      6) Heterologous expression of BdpA in other bacteria provides important compelling arguments for its central role in producing OMEs. However, the imaging data provided do not provide the clearest evidence for induction of OMEs in M. atlanticus and E. coli. This is especially the case with the E. coli images. The extended web of staining in 4c does not resemble the tubules seen in S. oneidensis. It would be great to have some electron microscopy data and/or higher resolution fluorescence images of these bacteria as corroborating evidence. Additionally, only a few cells are shown and quantification of the proportion of cells with OMEs is needed. Thus, as already discussed in point 1, quantitative analyses could improve this important point.
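
      Regarding the unpaired t-test in point 2, one common adjustment when the two groups have unequal variances is Welch's t-test, sketched below with the standard library only. This is an assumption about what a suitable alternative might look like, not a test named by the reviewers; the function name `welch_t` and the vesicle diameters are invented for illustration, and a rank-based test or a comparison of full size distributions may be more appropriate for skewed data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike the pooled-variance t-test, each group keeps its own
    variance estimate.
    """
    na, nb = len(a), len(b)
    va = variance(a) / na  # squared standard error of mean(a)
    vb = variance(b) / nb  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical vesicle diameters (nm) for two strains; the second
# group is deliberately much more variable than the first.
wild_type = [180.0, 190.0, 185.0, 195.0, 188.0]
mutant = [150.0, 320.0, 210.0, 400.0, 260.0]
t_stat, dof = welch_t(wild_type, mutant)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      The degrees of freedom shrink toward the size of the noisier group, which makes the test more conservative than a pooled-variance t-test when the variances differ strongly.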

    1. Reviewer #4

      This is an innovative and very interesting study reporting the correlation of extracted neural timescales and expression of NMDA and GABA_a receptor subunits amongst others.

      Comments:

      -definition of timescale is missing in the introduction. Fast and slow responding to sensory versus cue related information reflects a circular definition of timescales.

      -the results text says that the aperiodic component is interpreted as the timescale but not how the inference is made, i.e. what quantity is interpreted as the timescale.

      -it is difficult to keep track of which timescales are referred to when in the text, e.g. the authors start referring to neuronal timescales after having discussed ECOG based time scales and spike timescales. It seems important for cleanly separating the source of the timescale to denote them with a unique label depending on the source data that gives rise to them. Why not use a subscript for spike, epiduralECoG, subduralECoG, intracranialLFP, ... ?

      -the article seems to assume that mRNA expression for specific receptor subunits corresponds to the density of expression of those receptors. It seems important that this is made explicit (if correct) and that a reference is given that shows this relationship.

      -line 142 refers to "task-free ECoG recordings in macaques" but does not clarify where the data comes from. No reference is provided.

    2. Reviewer #3

      In this paper entitled 'Neuronal timescales are functionally dynamic and shaped by cortical microstructure', Gao et al. use open access databases to address two distinct questions: 1) the relationship between hierarchically organized variations in neuronal timescales and brain gene expression and 2) the effect of task and age on the neuronal timescales of a given cortical region.

      Overall, this is a well-designed study and the combination of open access databases is well organized and astutely exploited. In particular, I like the analysis that tests whether variations in gene expression still account for variations in neuronal timescales when the main gradient effect is regressed out. Below are my comments on the manuscript.

      1) For the non-specialist reader, the concept of neuronal timescales that is central to the paper should be defined more explicitly in the introduction ('neuronal timescales' appear in paragraph 3, while it gets defined in paragraphs 1 and 2).

      2) In figure 2B, some T1w/T2w values are above values of 2, which is not standard. Likewise, several outliers can be observed. This might have impacted the estimation of the regression slope. This slope currently matches the one from Burt et al. 2018, although the data point distribution is different.

      3) Figure 4B contradicts figure 2C, as the evidenced timescale hierarchy is different (comparing PC, PFC and OFC). Please explain.

      4) Figure 4B and 4C, please show actual data points and justify parametric tests.

      5) Figure 4C: how consistent is the increase in delay period timescales across areas within each subject? In other words, is this a general property of the brain, with task-related effects resulting in a non-specific adjustment in neuronal timescales, or are there regional differences in the reported increase? (You might want to exclude the PFC from the analysis to remove task-related effects.)

      6) The manuscript addresses two distinct aspects of neuronal timescales: their relationship to local microarchitecture and their dynamics as a function of task or age. Although there is obviously a strong inter-relationship between these two aspects, this deserves a more extensive discussion. For example, in relation with the previous point, if local microstructural properties predict neuronal timescales, why is it that timescale changes during the delay seem to be ubiquitous (or are they)? And why should such changes (that are overall in the same range) correlate with subject performance in the PFC but not in the other areas? How does this relate to the aging observations? Although this discussion is bound to be speculative, I think it is important in order to strengthen the link between these two independent avenues of the paper, and to enrich the discussion about the functional role of these dynamic changes in neuronal timescales.

      7) Given the described age-related effect, did the authors check that the different databases they used sampled from subjects with the same age distribution?

      8) Legend of figure 1 is not self-explanatory and a lot of the symbols and information plotted in the figures are not explained. Unfortunately, this information is also missing from the result section.

      9) Figs 3E and 3F are mislabeled as 4E and 4F.

      10) Generally speaking, given that the main text itself is very dense, figure legends should be more self-explanatory. Quite often, figure detail description and contextual information are missing both from the text and the figures. This also applies to the supplementary figures.

    3. Reviewer #2

      Overall, this is an interesting manuscript and a well-done study. The main finding is that neural timescales, as quantified through the decay of the power spectrum, vary over cortical regions and are correlated with genes that regulate ionic and structural properties of neurons. The findings aren't terribly surprising and the computational impact on cognition and aging remains unclear (other than showing differences), but the overall approach is novel and interesting.

      I have an overarching concern, which is that the manuscript is written to be dense yet terse, which makes it harder to read, particularly given the complexity of the analyses. It feels like it was written for a journal with extreme word limitations. The manuscript would be overall improved if the authors would "loosen their belt" and explain the findings and methods in more detail.

      What are "these" limitations on line 96?

      Figure 1e: how is r2=1 when the dots do not fall on the line?

      I'm confused about the description of the methods on page 5. For example, "we can estimate neuronal timescale from the 'characteristic frequency'" which implies a peak in the spectrum. Yet in the next sentence they write that they extract timescale from aperiodic components.
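
      One way to reconcile the two statements, offered here as an assumption rather than as the authors' stated method, is that the "characteristic frequency" is the knee of the aperiodic spectrum and not a peak. If the signal's autocorrelation decays exponentially with time constant tau, its power spectrum is Lorentzian and starts to roll off at the knee frequency f_k = 1 / (2 * pi * tau). The sketch below illustrates that relation; the function names and the 10 Hz knee are hypothetical.

```python
import math


def timescale_from_knee(knee_hz):
    """Decay time constant (s) implied by a knee frequency (Hz),
    assuming a Lorentzian aperiodic spectrum."""
    return 1.0 / (2.0 * math.pi * knee_hz)


def lorentzian_power(freq_hz, tau_s):
    """Normalized power of a process whose autocorrelation is
    exp(-|t| / tau); equals 1 at 0 Hz and has no spectral peak."""
    return 1.0 / (1.0 + (2.0 * math.pi * freq_hz * tau_s) ** 2)


knee = 10.0  # hypothetical knee frequency in Hz
tau = timescale_from_knee(knee)
print(f"tau = {tau * 1000:.2f} ms")
# At the knee, power has dropped to half of its low-frequency plateau.
print(lorentzian_power(knee, tau))
```

      Under this reading the spectrum decreases monotonically, so a timescale can be read from an aperiodic component without any oscillatory peak. Stating the assumed spectral form explicitly in the Methods would resolve the confusion.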

      Page 7: Are these markers also correlated with cell packing density? If so, it's possible that denser neural networks have longer timescales.

      Relatedly, how strongly inter-correlated are these genetic markers across the cortex? The authors mostly take a mass-univariate approach except for showing gene-PC1 in Figure 3a. There isn't enough information shown to evaluate whether the top PC is suitable, or whether this PC comprises many/all gene contributions or is driven by a small number, etc.

      I'm missing the modeling results. They appear as a schematic in figure 1 and are mentioned in the Methods section. Was this model actually used somewhere?

    4. Reviewer #1

      These findings are a significant advance in comparison to previous work like Murray et al. (2014) and Dotson & Gray (2018 - please cite here) in the sense that brain-wide hierarchy is considered, whereas previous work considered a smaller set of brain areas. Furthermore, several other interesting correlations are reported with timescales. Overall the analyses appear to be of very high quality, providing a standard for similar studies in the future, and the authors carefully considered problems that arise in correcting for dependent samples, which I applaud.

      Some of the claims need further discussion or refinement, in my opinion.

      1) The comparison shown in Figure 2 between spiking time-scale and ECOG time-scale might be problematic, in the sense that the spiking time-scales were taken from the Murray et al. (2014) paper where they were quantified with a different technique. My suggestion would be to quantify time-scales in the same manner as Murray, or maybe there is a convincing argument why this is not a problem.

      2) The correlations shown between transcriptomics and timescales need to be carefully considered. While the authors regress out T1w/T2w and analyze the residuals, T1w/T2w might be just one structural factor that changes with cortical hierarchy, and the regression assumes that the underlying relationships are linear. Hence, it is possible that timescales and gene profiles are correlated with structure but that there is no causal relationship between these genes and timescales. In this sense, the correlation of genes with hierarchy might also yield similar genetic profiles. It would be important to show the correlation of hierarchy with genetic profiles, to see whether this looks different from the correlations that are obtained with timescale.

      3) The authors use T1w/T2w as the measure for cortical hierarchy. This is a gradient-based perspective on cortical hierarchy. However, there are other perspectives on hierarchy that are not gradient-based, but are based on anatomical connectivity, e.g. as pursued by Kennedy and Van Essen (Vezoli et al., 2020, bioRxiv). This needs to be discussed.

      4) The paper does not consider oscillations, which is fine, but the reader is left wondering how oscillations affect these time-scales. Discussion on this aspect would be useful.

      5) Are the rho correlation values corrected for the expected value of the surrogate distribution? That is, are they significantly overestimated due to the dependent samples issue? In this case I would recommend reporting the corrected correlation values, rather than the raw correlation values.
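
One possible reading of the correction asked for in point 5 can be sketched as follows. This is an assumption about what "corrected" could mean, not the authors' or the reviewer's stated procedure: the observed rho is shifted by the mean of the surrogate (null) distribution, and a two-sided empirical p-value is computed around that mean. The function name `surrogate_corrected` is hypothetical, and the surrogate correlations are taken as given (generating autocorrelation-preserving surrogates is outside this sketch).

```python
from statistics import mean

def surrogate_corrected(observed_rho, surrogate_rhos):
    """Return (corrected rho, empirical two-sided p-value).

    corrected rho = observed rho minus the mean of the surrogate correlations.
    The p-value counts surrogates that deviate from the null mean at least as
    much as the observed value does, with the usual +1 correction.
    """
    null_mean = mean(surrogate_rhos)
    corrected = observed_rho - null_mean
    n_extreme = sum(1 for r in surrogate_rhos
                    if abs(r - null_mean) >= abs(corrected))
    p_value = (1 + n_extreme) / (1 + len(surrogate_rhos))
    return corrected, p_value

# Toy numbers (made up): a null distribution that is biased away from zero.
corrected, p = surrogate_corrected(0.6, [0.1, 0.2, 0.3])
```

With only three surrogates the smallest attainable p-value is 0.25, so in practice many more surrogates would be needed for a meaningful test.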

      6) The correlation performed in Figure 4D is a bit unclear to me. Are the different dots+lines participants, or is this a binned correlation? If it is a binned correlation, does that represent a problem for the correlation analysis?

      7) It would be useful in Figure 1/2 to show some examples of ECoG time-scales related to the actual underlying signals and PSDs, rather than just illustrating the technique on simulated data, so that the validity of the technique can be judged.

      8) In general it would be useful to report carefully the N's and the dataset that is used for each analysis, because it is easy to get lost in what is what as the authors analyze a huge number of datasets.

      9) The technique of removing spatial autocorrelations that influence the p-value appears to be sophisticated and well done. In case this analysis poses problems with other reviewers, I would recommend using a cross-validation prediction approach where a subset of subjects is used for training and the other subjects are used for testing.
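
The cross-validation idea in point 9 can be sketched as below, assuming a single predictor and a simple linear fit. The key property is that folds are formed over subjects, so no subject contributes to both training and testing. All names (`subject_folds`, `fit_line`, `cross_validated_predictions`) and the toy data are hypothetical.

```python
import random

def subject_folds(subject_ids, k, seed=0):
    """Yield (train_subjects, test_subjects); no subject is in both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    for i in range(k):
        test = set(subjects[i::k])
        yield set(subjects) - test, test

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def cross_validated_predictions(x, y, subject_ids, k=5, seed=0):
    """Predict each sample from a model fitted on other subjects only."""
    preds = [None] * len(x)
    for train, test in subject_folds(subject_ids, k, seed):
        idx = [i for i, s in enumerate(subject_ids) if s in train]
        slope, intercept = fit_line([x[i] for i in idx], [y[i] for i in idx])
        for i, s in enumerate(subject_ids):
            if s in test:
                preds[i] = slope * x[i] + intercept
    return preds

# Toy data (made up): 12 samples from 4 subjects, exactly linear.
x = [float(i) for i in range(12)]
y = [2 * v + 1 for v in x]
ids = ["s%d" % (i % 4) for i in range(12)]
preds = cross_validated_predictions(x, y, ids, k=2)
```

The held-out predictions can then be correlated with the observed values, which gives an out-of-sample estimate that does not rely on a parametric p-value.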

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary

      Gao et al. analyze how brain-wide timescales of ECoG signals vary across the cortical hierarchy and relate these timescales to several other aspects of structure, behavior and function. They report the following main findings: 1) Timescales increase with the cortical hierarchy. 2) Time-scales, after regressing out the hierarchical T1w/T2w structure variable, correlate significantly with several genes related to synaptic receptors and ion channels. 3) Time-scales increase with working memory task vs. baseline, and predict working memory performance across subjects. 4) Time-scales decrease with aging, in a region-specific way. These findings are a significant advance in comparison to previous work by considering brain-wide hierarchy at a high spatial and temporal resolution and relating them to behaviour and genetics.

    1. Reviewer #3

      The work by Barros et al. looks at the role of the Ribosome Quality Control pathway (RQC) in regulating the expression of endogenous messages containing polybasic sequences. Using ribosome profiling and western blotting, the authors show that proteins containing various types of polybasic sequences are not targeted by the RQC. The authors argue that one of the few endogenous RQC substrates, RQC1, is not regulated via the canonical RQC pathway, but by a Ltn1p-dependent post-transcriptional mechanism.

      The question of whether there are endogenous RQC substrates has previously been explored. With the exception of the few identified substrates, such as RQC1 (Brandman et al, 2012) and SDD1 (Matsuo et al., 2020), these studies largely concluded the RQC has a minimal regulatory role for endogenous messages, and is most likely protecting cells from damage and environmental stressors. This idea is further supported by the observation that the RQC is non-essential under standard growth conditions, but becomes synthetic lethal with translation inhibitors (Kostova et al, 2017, Choe et al, 2016). The work by Barros et al. comes to the same conclusions, and therefore it is unclear how this work contributes to the already established role of the RQC.

      The authors also explore the regulation of RQC1 by the RQC and argue that this gene is regulated by Ltn1p in an RQC-independent way. However, mechanistic understanding of the proposed regulation is lacking, and the data are largely inconsistent with the previously published observations by Brandman et al, 2012.

      Major points:

      1) The authors use the dataset published by Pop et al., 2014 for their 27-29 nt no drug ribosome profiling analysis. However, these no-drug samples have been reported to exhibit surprising heterogeneity, and similarities with CHX-pretreated samples (see Hussmann et al., 2015 for detailed analysis). It is unclear how this heterogeneity can affect the analysis in the current manuscript, and whether the authors were aware of these caveats. Have the authors used independent datasets to confirm their observations? Have they excluded replicas that show CHX-like characteristics, such as A-site occupancy bias similar to CHX pretreated samples?

      2) It is not clear what the purpose of the analysis presented in Fig 2 is, and how it is different from the modeling in the Park and Subramaniam 2019 paper? Are the authors using these parameters (TE, Kozak score, etc.) to show adaptations that minimize ribosome collisions?

      3) Fig 3 - some of the selected examples (Dbp3, Yro2, Nop58) lack sufficient coverage in the region of interest highlighted in the right column for the short and/or long footprints. Since the data are insufficient to make conclusions about ribosome stalling and queuing, these examples should be excluded from the analysis.

      4) Fig 4:

      -Does ASC1 deletion cause frameshifting? Since the TAP-tag is C-terminal, it is possible that it is now out of frame, and therefore undetectable. Is it possible for the authors to introduce the tag on the N-terminus, and follow simultaneously the stalled nascent polypeptide (upon LTN1 deletion), and the full length protein?

      -Is the putative stalling site of Dbp3 too close to the start codon to cause collisions?

      -Can the authors include a positive control, such as TAP-tagged Sdd1 to make sure their assay works and their strains and KOs behave as expected?

      5) Fig 5:

      -What is causing the inconsistency with the Brandman et al., 2012 data about RQC-dependent regulation of RQC1? In the original paper, Rqc1p has an N-terminal FLAG tag, so the authors primarily follow the stalled nascent polypeptide, whereas the current study focuses on the full length protein. Can the authors compare the same construct (FLAG-tagged Rqc1p) in their strains, so it is an "apples to apples" comparison?

      -Fig 5c bottom panel - the read coverage is too sparse to make a conclusion. This analysis should be removed.

      -5 d, e. The comparison between the GFP-12R-RFP stalling reporter and RQC1-TAP is not fair. The GFP construct reports on the fate of the stalled nascent polypeptide, whereas the RQC1-TAP looks at the full-length protein, and remains blind to the putative stalling product. Can the authors change the location of the tag, and repeat the experiment now looking at the stalled nascent polypeptide for RQC1? In addition, the signal in Fig. 5e looks saturated. Is it possible that no effect is observed simply because the TAP signal is out of the dynamic range for the assay?

      Minor Comments:

      1) The introduction presents an overly simplistic view of ribosome stalling, arguing that stalling can be caused by polybasic stretches. We now know that stalling is much more complex, and there are many other factors, including the presence of non-optimal codon pairs, that cause ribosome collisions. Although the authors discuss these factors in their discussion, they should also be emphasized in the introductory paragraph.

    2. Reviewer #2

      In this manuscript, Barros et al. examine published ribosome profiling data in an effort to identify possible targets for ribosome-quality-control (RQC) process in yeast. They found that although many of the obvious mRNA features, such as polybasic sequences, appear to stall the ribosome, they in fact are not targets of RQC. The authors then went on to confirm these observations by western-blot analysis of a few candidate genes and observe that deletion of the RQC factors Ltn1 and Asc1 has no effect on the levels of the full-length protein products. The authors conclude that RQC has little to no endogenous targets in yeast. While I have no doubt about the authors' conclusions and most of their analyses, I have major issues with the originality of the manuscript.

      1) The argument that RQC has little to no endogenous targets is not new. Many groups, including the authors' one, made the same arguments before. The authors recently published a paper in the Biochemical Journal "Influence of nascent polypeptide positive charges on translation dynamics". In particular, the analysis in that paper appears similar to the one carried out here. Furthermore, the Guydosh group made similar arguments in their recent paper (Meyden and Guydosh, Mol Cell).

      2) The authors conclude their abstract by stating that "our results suggest that RQC should not be regarded as a general regulatory pathway for gene expression". To the best of my knowledge, RQC has not been regarded as such and instead the consensus has been that the process is a quality control one (as the name suggests).

      3) The authors use LTN1 and ASC1 deletions to determine whether certain sequences are RQC targets or not. But for the ltn1D, instead of looking at the stabilized shorter products, the authors only looked at the full-length one. Ltn1 has no effect on readthrough on stalling sequences. A better deletion should have been that of HEL2.

    3. Reviewer #1

      In this manuscript the authors use existing high throughput data sets and perform some new experiments to explore in yeast potential physiological substrates of RQC. In a first step, they use bioinformatics to identify genes with features previously implicated in RQC (usually with reporter assays) including inhibitory codon pairs, poly-basic stretches, and poly-A tracts. With these genes in hand, they characterized various features of "translatability", using existing ribosome profiling data sets, and concluded that with the exception of the ICPs, that there were no strong signatures indicative of reduced ribosome density that might have evolved to deal with problematic ribosome queueing. The authors then looked at the RP data at higher resolution, looking for characteristic patterns of RPF distribution around the pausing site, and found that the striking patterns seen previously for Sdd1 (and for reporter analysis in D'Orazio et al. eLife) were not recapitulated for any of the top candidates in their list. In a final set of experiments, the authors took advantage of TAP-tagged variants of their proteins of interest and asked whether deletion of Asc1 or Ltn1 impacted protein levels - and found that there were no discernible effects (though validation with TAP-tagged Sdd1 is an important missing control). Importantly, expression of full length Rqc1 (previously argued to be a direct target of the RQC) was unaffected by RQC components including Asc1, Hel2 and Rqc2, but was strongly impacted by Ltn1. These data together argue for an RQC-independent role for Ltn1 in regulating Rqc1 expression.

      Overall, the manuscript was thought provoking for consideration of what might be natural targets of RQC, and in the end, one would conclude that natural targets of RQC are not encoded in the genome, but may instead be predominantly either prematurely polyadenylated mRNA substrates that escape nuclear QC, or instead, ubiquitous damaged mRNAs in the cell. In general, the discussion of the analysis of RP data indicated naivete about the identity of different RPF sizes and their relevance to mechanism (this could be corrected easily in a revised version). In the end, this manuscript brings important questions to the table, and provides some reasonable evidence to suggest that natural poly-basic stretches, including the one found in Rqc1, are not targets for the RQC under normal conditions. Moreover, the data support a non-canonical role for Ltn1 in regulating expression of Rqc1 which needs to be more fully explored. Importantly, however, what is critical to support the negative results surrounding Rqc1 is a demonstration of a role for RQC for Sdd1, around which the narrative is constructed (this gene exhibits characteristics by RP of being a target and is reported previously to be impacted by the relevant genes Asc1 etc.).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 4 of the manuscript.

      Summary:

      There were substantial concerns about the novelty of the study, the choice of RP libraries, coverage depth, and analysis of the ribosome profiling data. Previous studies have argued that there are very few endogenous targets (so far Rqc1 and Sdd1) of the RQC pathway, and that this is rather a QC pathway for damaged mRNAs. While we appreciate that your studies were inconsistent with these earlier studies, it will be critical for you to replicate those experiments, using protein tags that allow you to follow the fate of both the full length and truncated species. Additionally, it will be important to validate using your own approaches and reagents that Sdd1 is indeed a substrate for RQC, given that your data suggest that Rqc1 itself is not. Finally, the novel Ltn1-dependent, RQC-independent pathway proposed to regulate Rqc1 expression requires further mechanistic work.

    1. Reviewer #3

      Three distinct amyloid-based cell-death pathways in fungi have been reported. The authors of the current manuscript extend their previous work on the HELLP/SBP/PNT1 pathway in Chaetomium globosum and describe a similar system in P. anserina. It is shown that the amyloid signaling domain of PNT1 can form a prion in cells deleted of HELLP, which is otherwise activated by the prion to cause cell death. Using this artificial system, the authors test whether the related RHIM motif of the human RIP1 and RIP3 proteins can also form a prion in P. anserina and whether RHIM amyloids as well as other fungal amyloid-forming motifs can cross-seed PNT1.

      The experiments are well executed and explained but I have a few suggestions:

      1) Amyloid cross seeding is usually assayed in vitro using purified protein fragments. The artificial genetic system used here is certainly clever but the expression level of different proteins needs to be measured for better comparison of cross-seeding efficiencies.

      2) Page 16, line 333-334 and Fig 8: How were recipient strains sampled? How random was it? How many samples?

      3) Jargon/abbreviations. Page 19, line 405; Page 20, line 429: What are PAMPs, MAMPs, and PCD?

    2. Reviewer #2

      This work reports the discovery of an amyloid-based cell death signaling pathway in the filamentous fungus, Podospora anserina. This makes the third such pathway in this fungus. As for the others, the amyloid in this case has prion-like activity, is selectively nucleated by a cognate innate immunity sensor protein, and results in activation of the membrane-disrupting activity of the protein. They show that all three pathways operate orthogonally - that is without cross-seeding. In contrast, cross-seeding did occur between this pathway and the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them.

      Substantive concerns:

      1) The novelty of this finding is somewhat dampened by this group's prior demonstration of several of the major points of interest in previous papers. They had previously discovered and characterized the homologous pathway in a different fungus, and suggested an evolutionary link between fungal amyloid signalosomes and mammalian necroptosis using strong bioinformatic and structural evidence. In addition, they had shown that the two previously known amyloid signaling pathways in P. anserina operated orthogonally. Hence the major point of novelty, as reflected in the title, is the demonstration that this particular amyloid pathway can cross-seed the human necroptosis amyloids.

      2) Implications of "cross-seeding". The interspecific cross-seeding observed was modest; much lower than that for intraspecific templating between proteins of the same pathway. Specifically, it failed to induce a barrage, the puncta formed at different times, and colocalization was incomplete. More importantly, cross-seeding does not imply functional or evolutionary conservation. Consider the wide range of amyloid proteins that have been reported to cross-seed each other despite in some cases very different sequences, structures, and functions - for example the type-II diabetes peptide IAPP with the Alzheimer's peptide Aβ; the yeast prion protein Rnq1 with human Huntingtin; and the yeast prion Sup35 with human transthyretin. Although a direct comparison with the present data is not possible, these cross-seeding interactions appear comparably robust. The present demonstration of limited cross-seeding therefore seems not to add much additional support for an evolutionary relationship between necroptosis and fungal amyloid cell-death pathways.

      3) Rigor of the fusion experiments. In all cases, despite having generated and validated the use of RFP- and GFP-labeled proteins, all fusion experiments to examine cell death microscopically (using Evans Blue staining) were between two GFP-expressing strains. This is frustrating because it makes it impossible to know from the images alone which of the two proteins is expressed in which cells, and in which cases of mycelia crossing paths is fusion occurring. I must therefore rely entirely on the labels provided, but they sometimes appear implausible. For example, the lower fusion event demarcated in Fig. 3C left panel would have been expected to allow GFP levels to equilibrate across the point of contact; instead there remains a sharp transition in GFP intensity between the two mycelia (third panel) indicating the cytoplasm is not being shared at the time of the image. In Fig. S8 top row, there is no apparent relationship between cell death and HELLP-GFP; moreover, cell death is seen occurring in mycelia containing either punctate or diffuse GFP-RIP3. While I appreciate that Evans Blue fluorescence may overlap with that of RFP (which should be stated) and preclude its visualization without multispectral imaging capabilities that may not be available to the authors, alternative viability stains and fluorescent proteins could in principle have been used to avoid this problem.

      Minor Comments:

      1) The significance of these proteins forming "prions", as opposed to (merely) amyloids, should be articulated. This is important because prion-formation per se is irrelevant to the cell-level functions of the proteins, as nucleation of the amyloid state causes cell death and hence precludes their persistent/heritable propagation. Amyloid by nature is self-perpetuating at the molecular level and hence would seem to explain the properties of the protein. The discussion about possible exaptation of these pathways for allorecognition could be expanded or clarified in this regard.

      2) Colocalization between two proteins does not imply that one has templated the other to form amyloid, even when both are capable of forming amyloid independently (see https://doi.org/10.1073/pnas.0611158104 ).

      3) Statements of partial cross-seeding are supported by quantitation (Fig. 8). In contrast, the authors appear to use qualitative observations to support rather definitive statements about the "total absence of" (line 344) cross-seeding between other pathways.

      4) Fig. S9. "Note that induction of [Rhim] in transformants leads to growth alteration to varying extent ranging from sublethal phenotype to more or less stunted growth." Can the authors suggest an explanation for this heterogeneity? From my limited perspective, it suggests the existence of amyloid polymorphisms (i.e. a prion strain phenomenon), which is quite unexpected given the lack of polymorphism among known functional amyloids in contrast to rampant polymorphism among pathological amyloids. Hence the phenomenon could be interpreted as suggesting that amyloid is not an evolved/functional state for the PP motif. In any case the phenomenon is interesting and merits further discussion.

    3. Reviewer #1

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on the PNT1/HELLP NLR-based signalosome based on the amyloid signaling motif PP from Chaetomium globosum. The C-terminal domain of HELLP is shown to exist in either soluble or aggregated states based on fluorescence microscopy of tagged protein in vivo, termed the [pi] state, and to form amyloid in vitro. These distinct states can be propagated independently and induce conversion of full-length HELLP upon cytoplasmic mixing, which leads to cell death. The PNT1 N-terminal domain also forms foci in vivo and can seed conversion of HELLP, also leading to cell death. The C-terminal domain of C. globosum HELLP and the RHIM regions of mammalian RIP1 and RIP3, which both contain PP motifs, can cross-seed HELLP conversion to the aggregated state but the other known P. anserina prions [Het-s] and [phi] are unable to do so.

      Support for the model proposed is generally qualitative in nature, with multiple instances of data described but not presented, including the timing of conversion to the aggregated state, revision of the aggregated state in meiotic progeny, the frequencies of conversion and co-localization, and the correlations between growth and prion phenotype. For the data presented, replicates, frequency of observations, and variability are not reported. In addition, a mechanism is proposed to explain the toxicity associated with HELLP conversion to the aggregated state - membrane localization - but this model is not supported by robust data such as a marker for the membrane in the fluorescence images or a biochemical fractionation. Moreover, the absence of functional data, such as mutations that disrupt amyloid formation, leaves the model with only correlative observations to support it. Finally, observations on the C. globosum system decrease the novelty of the observations.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary

      Bardin and colleagues identify and characterize a third prion system in P. anserina based on a cognate innate immunity signalosome comprised of PNT1/HELLP. The authors demonstrate that the three prion pathways operate orthogonally without cross-seeding; however, the newly identified PNT1/HELLP prion can be cross-seeded by the putatively homologous human necroptosis pathway when it is reconstituted in P. anserina, which further supports an evolutionary relationship between them. The review has identified substantive concerns, which limit the novelty of the work and would require significant new studies to address the mechanistic gaps. These concerns include prior work revealing several major tenets including prion activity for PNT1/HELLP in C. globosum and evolutionary conservation to the mammalian necroptosis pathway and the absence of robust experimental support for cross-seeding, or the absence thereof, membrane disruption as the cause of incompatibility, and for the relationship among toxicity, growth, protein state, and protein interaction. Concerns were also raised about the data presented, or absent, in terms of replicates, frequency of observations, and variability.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Point-by-point response to reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline. The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes.

      We thank the reviewer for highlighting one of the key strengths of our manuscript, the fact that our analytical approach, using SHAP values, enables contributions of individual variants to be assessed in a genotype-specific manner. This approach provides for a sound, robust, and principled way of describing and understanding the fitness impact of one mutation in the context of (potentially many) others.

      The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states.

      We do indeed argue that machine learning is likely to play an increasing role in making sense of deep mutational scanning data. These scans provide high-resolution information on how fitness maps onto genotype, but the molecular underpinnings of this relationship often remain obscure. It is these “hidden” underpinnings, including the effects of specific mutations on RNA/protein folding, structures, and dynamics, that machine learning approaches can help elucidate.

      The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30 °C), because the lower temperature hinders conformational exchange. Although such cold sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Finding temperature-related fitness effects that are consistent with impaired conformational exchange was also gratifying for us and we thank the reviewer for highlighting this finding.

      The results would be more convincing if the authors directly measure the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      We appreciate that measurements of splicing activity for individual genotypes would complement and further strengthen our study. We will therefore aim to construct strains for a few key genotypes and assay self-splicing activity using RT-qPCR – an approach we previously used successfully to monitor splicing kinetics of self-splicing introns in yeast mitochondria (see Rudan et al. 2018 eLife 7:e35330). Specifically, we will quantify the fraction of spliced and unspliced transcripts using primers that span the exon-exon and the 3’ exon-intron junction, respectively (the 5’ intron-exon junction is genotypically diverse and would require genotype-specific primers). This will be done under non-selective (-kan) conditions, where the relative fraction of spliced and unspliced transcripts is a function of intrinsic splicing ability and not confounded by selection. We aim to include the master sequence, C2C21, G3C20 and its mirror genotype C3G20, U3 (which restores perfect complementarity in the master sequence), and G5 (inferred from the high-throughput experiment to make a strong negative contribution to fitness).
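
For illustration only, the quantification described above could be reduced to a fraction of spliced transcripts as in the sketch below. It assumes that both primer pairs amplify with the same per-cycle efficiency (2.0, i.e. perfect doubling, by default), so that relative abundance scales as efficiency to the power of minus Ct. The function name `fraction_spliced`, its parameters, and the Ct values are hypothetical and are not taken from the rebuttal.

```python
def fraction_spliced(ct_spliced, ct_unspliced, efficiency=2.0):
    """Fraction of spliced transcripts from two Ct values.

    Assumes abundance is proportional to efficiency ** (-Ct) for both amplicons
    (equal primer efficiency), which is an idealisation.
    """
    # spliced / (spliced + unspliced), rewritten to avoid very small numbers
    return 1.0 / (1.0 + efficiency ** (ct_spliced - ct_unspliced))

# Toy Ct values (made up): the spliced product appears two cycles earlier.
example = fraction_spliced(20.0, 22.0)
```

If the two primer pairs differ in efficiency, a standard-curve calibration would be needed before the two Ct values can be compared in this way.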

      In interpreting our results, we will consider different mechanisms of splicing failure, such as kinetic problems (slow dissociation of P1ex), misfolding of the intron core, reverse self-splicing, and the use of cryptic splice sites, which has previously been documented (see e.g. Woodson & Cech 1991 Biochemistry 30:2042-2050). We note, however, that a precise mechanistic dissection of the splicing defects of individual variants is not the purpose of this manuscript and we therefore do not aim to establish genotype-specific defects in great molecular detail.

      Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors can explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis but is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      In measuring splicing activity and its link to fitness for a subset of key variants (see point #4), we will include at least one mirror example such as C3G20/G3C20. In addition, we will highlight additional examples of this mirror asymmetry based on the results from our high-throughput screen.

      They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      This question (i.e. whether altered RNA stability rather than splicing efficiency explains differential KNT production and ultimately fitness) has previously been addressed by Guo & Cech (2002) when introducing the knt+intron reporter system. These authors found no difference in mRNA stability in constructs that displayed differential kanamycin resistance. To shore up this conclusion further, we will measure fitness (via colony counts, growth rate or more directly through competitive fitness assays) of the key variants for which we determine splicing activity (see point #4) and then correlate splicing and fitness.

      The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      The reviewer raises an interesting point that indeed deserves further discussion/explanation. The reviewer is right that, at first sight, high-resolution fitness landscapes like ours do not directly produce a physical (structural) model of the molecule under investigation. They connect genotype and fitness, but the molecular intermediate – a biophysical structure – is not explicit. However, over the last few years, it has become apparent that deep mutational scanning experiments can – both in principle and in practice – yield information that can be leveraged to infer such a physical model. In short, covariation in fitness between residues in a protein or bases in an RNA can be used as inputs for constraint-based modelling of physical interactions. Notably, Schmiedel & Lehner (2019, Nature Genetics 51: 1177-1186) recently demonstrated that deep mutational scanning data can be used in this manner to reconstruct secondary and tertiary protein structure with high accuracy. In principle, the same approach can be used to reconstruct RNA structures. This will require more extensive, molecule-wide fitness data, but our study points towards just this future, even for data collected from structural ensembles.

      When we stated in the original manuscript that deconvolution of the fitness landscape might help to reverse engineer structures, this ability to interpolate between genotype and fitness to reveal hidden biophysical/structural relationships is what we refer to. We will revise the manuscript to make this connection more explicit.

      The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      The capacity of any high-throughput sequencing-based DMS experiment to resolve intermediate phenotypes does indeed depend on a number of things. The reviewer highlights two of these: First, in screens where the phenotype is not binary (dead/alive) but fitness can be measured on a continuous scale, can we – and do we – capture phenotypes with intermediate fitness? What if only the fittest/most active variants survive? This is, ultimately, an empirical question, and one we can answer quite definitively: we do observe a large range of intermediate phenotypes, which – in our study – correspond to intermediate fold-change values. For each genotype, we can provide confidence limits and assess statistical significance. Table S1 provides this information. Our capacity to resolve these intermediate phenotypes is mainly based on three things. One is adequate sequencing depth, as highlighted by the reviewer. The second is the number of biological replicates (N=6) we analyse, which allows us to differentiate biological variability from noise for a large number of genotypes. This is an important aspect of DMS experiments that has often been overlooked (i.e. there are many other studies where only a single replicate is analysed and biological heterogeneity is not taken into account). With six replicates in hand, we can directly estimate variability (as done e.g. in our DESeq2 analysis) and quantify uncertainty so as to guard against overfitting. In our view, this is arguably more important than sequencing depth in deriving appropriate fitness estimates. Finally, we can resolve intermediate phenotypes because we keep the time lag between initial exposure to kanamycin and assaying genotype frequencies relatively short (overnight growth, see Methods). Our experiment is effectively a multi-genotype competition experiment, and we provide a snapshot across the genotype pool at a given time. 
If we had measured after several days of culture, genotypes with greater relative fitness would have spread further through the population, at the cost of less fit genotypes, many of which would likely have been eliminated. We kept measurement lag relatively short on purpose so that we could see a clear differential response to kanamycin while still being able to catch more than just a handful of the very fittest genotypes.
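
      As a rough illustration of how replicate counts translate into a fold-change estimate with confidence limits, consider the minimal Python sketch below. It is not the DESeq2 procedure referred to above: it only computes a per-replicate log2 ratio of genotype frequencies (+kan over -kan) and a Student-t interval across six replicates. The function names, the pseudocount, and the example counts are hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 5 degrees of freedom (6 replicates).
T_CRIT_DF5 = 2.571


def log2_fold_changes(plus_counts, minus_counts, plus_totals, minus_totals,
                      pseudocount=0.5):
    """Per-replicate log2 ratio of genotype frequency in +kan vs -kan pools.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genotypes that drop out of a pool.
    """
    lfcs = []
    for p, m, pt, mt in zip(plus_counts, minus_counts, plus_totals, minus_totals):
        freq_plus = (p + pseudocount) / pt
        freq_minus = (m + pseudocount) / mt
        lfcs.append(math.log2(freq_plus / freq_minus))
    return lfcs


def summarize(lfcs):
    """Mean log2 fold change and a 95% t-interval across exactly six replicates."""
    if len(lfcs) != 6:
        raise ValueError("this sketch hard-codes the t value for six replicates")
    mean = statistics.mean(lfcs)
    sem = statistics.stdev(lfcs) / math.sqrt(len(lfcs))
    half_width = T_CRIT_DF5 * sem
    return mean, (mean - half_width, mean + half_width)


# Hypothetical read counts for one genotype across six replicates.
plus = [210, 190, 205, 220, 180, 195]
minus = [100, 105, 95, 110, 90, 100]
totals = [100_000] * 6
mean_lfc, ci = summarize(log2_fold_changes(plus, minus, totals, totals))
```

      An interval that excludes zero would flag a genotype whose frequency shifts consistently under selection; a wide interval signals that replicate-to-replicate variability dominates, whatever the sequencing depth.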

      In light of the above, it will be apparent that there are no simple answers to the reviewer’s questions about required sequencing depth, levels of stringency, etc. The ability to assign differential fitness across a large population of genotypes hinges on multiple interrelated considerations (sequencing depth, complexity of the final & starting pool, number of replicates). In revising the manuscript, we will highlight some of the key considerations just discussed, bearing in mind that the manuscript cannot possibly discuss all possible pitfalls and requirements of deep mutational scanning experiments in great detail.

      Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      We agree with the reviewer that it would be exciting to link deep mutational scanning results more closely with observable patterns of RNA evolution. This is true both in relation to evolution of P1ex/group I introns specifically and evolution of dynamic RNA structures more generally. Regarding the latter, we note that selection against excess stability has previously been inferred for 5’ UTRs (see e.g. Gu et al. 2010 PLoS Comp Biol 6: e1000664), although our case is slightly different in that a helix still needs to form but be sufficiently unstable to enable swift dissociation. We also note that riboswitches might make for an excellent subject to study asymmetric constraint and selection against excess stability as they involve formation of competing helices (including participation of some but not all nucleotides in more than one helix), their structure/function is well understood, and many examples are known, providing opportunities for evolutionary analysis. We consider this outside the scope of the current study. We will, however, seek to analyse patterns of evolution in P1ex to establish whether they correspond in a meaningful way to the fitness trends we observe in the laboratory. To do so, we will analyse the distribution and evolutionary history of variants across orthologous introns in different Tetrahymena species/strains, with a focus on P1ex, P10 and the surrounding sequence. Fortunately for us, the 23S ribosomal RNA gene in which the intron is embedded has been used as a phylogenetic marker so that intron/exon sequence information is available for a reasonable number of species/strains (see Doerder 2018 J Eukaryot Microbiol 66:182-208). We will generate an alignment of these sequences and ask, for example, whether N2-N5 are subject to different constraints than N18-N21 mirroring our experimental findings. 
We have previously successfully quantified patterns of variation surrounding self-splicing introns in yeast mitochondria (Repar & Warnecke 2017 Genetics 205:1641-1648). Note here that extending this analysis beyond Tetrahymena is problematic. Specifically, the intron is absent from close relatives of Tetrahymena (Doerder 2018 J Eukaryot Microbiol 66:182-208) and P1-proximal structures of distant relatives are quite variable. In addition, we are looking at intronic regions that are not only adjacent to but also directly interact with exonic sequence. The exonic context in which the intron is embedded therefore matters but will be quite different for more distant group I introns. We therefore think that aligning and comparing distant orthologs has limited merit.
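
      One simple way to compare constraint between alignment positions is per-column Shannon entropy. The Python sketch below is only an illustration of that idea, not the analysis planned for the revision; the toy alignment and the function name are hypothetical, and a real analysis would also need to account for phylogenetic relatedness among the sequences.

```python
import math
from collections import Counter


def column_entropy(alignment, column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy alignment: column 0 is invariant, column 1 is variable.
toy_alignment = ["GA", "GA", "GG", "GG"]
entropies = [column_entropy(toy_alignment, i) for i in range(2)]
```

      Lower entropy at one set of positions than at another would be consistent with stronger constraint, although small alignments make such comparisons noisy.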

      The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethylsulfate that is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      We appreciate this point about false-friend acronyms. We will either find a different acronym or avoid it altogether.

      Reviewer #1 (Significance (Required)):

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

      We thank the reviewer for highlighting that our analytical approach showcases how deep mutational scanning data can be analysed in an unbiased and systematic manner to better understand the relationship between genotype, molecular phenotype (e.g. structure), and fitness. The reviewer also rightly points to specific results we obtain regarding temperature-related effects and metastability of P1ex/P10. However, we believe that the most important contribution of this work is a more general one, namely our proof-of-principle demonstration that deep mutational scanning data can capture multiple conformational states simultaneously, and that these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. To our knowledge this had not been explicitly demonstrated before and our work provides an important cornerstone for future studies looking to interpret mutational effects in either RNAs or proteins in the light of dynamic structures.

      In light of comments by reviewer #2 below, it is worth reiterating the proof-of-principle nature of this study. Many of the specific results we obtain (e.g. importance of avoiding excess stability in P1ex) are not revolutionary. Indeed, we would be worried if they were. We chose to investigate P1ex because substantial prior work exists that has furnished us with solid positive controls. This independent prior validation allows us to both have great confidence in the data we generate and demonstrate cogently that the two conformational states at the beginning and end of the splicing reaction are captured in the data.

      Finally, we believe our work, in covering a virtually complete genotype space, using multiple replicates to quantify uncertainty in fitness estimates, and using SHAP scores to interpret variant effects in genotype-specific context, sets a new high bar for this type of study and will provide valuable reference data and analytical recipes for future analyses.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      We thank the reviewer for testifying to the validity of our approach and the soundness of our conclusions. Regarding the complexity of the analysis, the reviewer is right in that – for the conclusion that the relative stabilities of two alternative helices are important for fitness – a simpler analysis would have sufficed. However, as elaborated in response to point #11 above, our objective here is not merely to draw specific conclusions about the relative stabilities of P1ex and P10, but more general: a) to demonstrate that a single fitness landscape can be deconvoluted to implicate multiple conformations in fitness defects and b) to provide a basic but powerful recipe for doing so in an unbiased, systematic manner using machine learning.

      We will strive to make the writing clearer so that readers can follow this reasoning and appreciate our analytical choices.

      Major Comments

      The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      We used the word “transient” to describe two alternative RNA structures formed during the life cycle of the intron. Both states (characterized by P1ex and P10 formation) are transient in as much as they disappear as splicing proceeds. In retrospect, we agree with the reviewer that this usage is too loose (given how the term is generally used in the literature) and might evoke the wrong connotations. We will therefore revise the manuscript to eliminate references to P1ex and P10 as transient states, but rather describe them as alternative conformations. Of course, the general point remains true: that deep mutational scanning data should in principle capture all fitness-relevant structural states even if these are transient (in the strict sense of the word).

      "Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      The reviewer is right in saying that a high-throughput pipeline could have been designed to monitor splicing of each genotype directly (rather than assaying fitness of the cell population that represents a particular genotype). We chose not to do so. One reason for this is that monitoring splicing directly would have necessitated design of a more complicated assay. This is because, to monitor splicing efficiency, one would have to monitor both pre-mRNA and mRNA for different genotypes. The former is straightforward (using primers that span the exon-intron junction) but the latter is not: successful splicing removes the genotype-specific information from the mRNA (that information being solely encoded in the intron). This is a solvable problem in principle. One might, for example, introduce barcodes of sufficient complexity in the mRNA that can be linked back to the intron genotype, but doing so would have introduced a further source of error and complicated analysis. We therefore opted for monitoring genotypic fitness by sequencing the plasmids from which the RNAs originate. This does mean that our measurements of fitness are not coupled to a specific molecular phenotype (such as splicing efficiency) – we presume (but are not entirely sure) this is what the reviewer refers to when talking about fitness being on an “arbitrary scale”. However, fitness derived in this manner has the advantage of providing information that does not start from a mechanistic preconception. We ask how a variant affects survival and reproduction of the cell without presuming a specific mechanism, and the results can therefore capture any mechanism, including those that we did not consider initially. The challenge then becomes to tease out possibly multiple mechanisms from unbiased data.

      We will tackle the reviewer’s final comment, regarding analysis of base-pair stability, below in response to one of the minor comments (point #20).

      Minor Comments

      The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      We will revise the entire manuscript to make the writing both clearer and more concise.

      Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired.

      Making explicit reference to “epistasis” is a considered choice. Framing results in terms of epistasis might be less familiar to readers grounded in RNA or protein biophysics/biochemistry, but is very much at the heart of thinking about the genotype-phenotype relationship from an evolutionary perspective, where global descriptions of epistasis are commonplace and usually provide the starting point for thinking about genotype-phenotype relationships, evolution and evolvability. So what seems unnecessarily obscure when seen through the lens of one field, is natural when considered in the context of another. Importantly, it is also the central approach adopted by many if not most prior deep mutational scanning studies (see e.g. Hayden et al. 2011; Pressman et al. 2019; Zhang et al. 2009; Li et al. 2016; Puchta et al. 2016; Domingo et al. 2018; Li and Zhang 2018; Weinreich et al. 2013; Lalić and Elena 2015; Bendixsen et al. 2017 as cited on page 3 of the manuscript) so we think this framing is helpful to compare our results to prior work.

      We expect that the readership will include many researchers interested in mapping genotype-phenotype-fitness relationships who will expect to see global analyses and descriptors of the type we present. We will, however, revise the manuscript to ensure that our description of the findings remains accessible to readers from other fields.

      More specifically, we note that the fact that mutations are not independent (i.e. that epistasis exists) might seem trivial given that P1ex is a base-paired helix. The magnitude and direction (“sign”) of epistasis, however, are not. In fact, as we describe, contrary to prior DMS on RNA helices, we find a lot of positive epistasis, reflecting, as we argue, selection against excess stability of P1ex to allow subsequent formation of P10.
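
      For readers less familiar with the term, the sign of pairwise epistasis can be made concrete with a short Python sketch. It assumes the common convention of comparing the double mutant's log fitness with the sum of the single-mutant effects (a multiplicative null model); this may differ in detail from the definition used in the manuscript, and the fitness values are hypothetical.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Log2-scale epistasis: observed double-mutant effect minus additive expectation.

    Positive values mean the double mutant is fitter than expected from the
    two single mutants; negative values mean it is less fit than expected.
    """
    observed = math.log2(f_ab / f_wt)
    expected = math.log2(f_a / f_wt) + math.log2(f_b / f_wt)
    return observed - expected


# Hypothetical example: each single mutant halves fitness, but the double
# mutant is as fit as the reference genotype.
eps = pairwise_epistasis(f_wt=1.0, f_a=0.5, f_b=0.5, f_ab=1.0)  # 2.0, positive
```

      Under this convention, a double mutant whose fitness equals the product of the single-mutant fitnesses (0.25 in the example) would show zero epistasis.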

      The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      Please see responses to points #11, #12, and #16 above for an elaboration of what we consider to be the main merits of this study and why providing broad measures of epistasis is a sensible choice.

      Figure 1C isn't necessary for the reader to understand the process.

      We are happy to follow editorial guidance as to whether this panel is superfluous and should be removed or is worth including.

      It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      This figure does indeed capture what the reviewer describes: genotype pools in +/-kan are least similar to each other, while 30/37°C are similar but distinct in the +kan condition and effectively indistinguishable in the -kan condition, in line with expectations. We agree with the reviewer that this information per se is something that would typically be found in a supplementary figure. However, we would advocate for retention of this panel in the main manuscript in this instance because of the way in which it was derived: using the Bray-Curtis dissimilarity index. To our knowledge, this is the first time that Bray-Curtis dissimilarity has been used to quantify, in a principled way, the similarity between genotype pools. Borrowed from the ecology literature, the index captures both richness (number of different species/genotypes in the ecosystem/genotype pool) and relative abundance to provide an integrated measure of genotype diversity. We believe that this measure will be useful for future studies and rather than relegating the figure to the supplement, we would aim to briefly highlight its methodological novelty.
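
      The Bray-Curtis index itself is simple to compute: it is the sum of absolute count differences between two pools divided by the sum of all counts in both pools. The Python sketch below illustrates this on hypothetical genotype counts; the function name and the toy pools are assumptions, and in practice counts would usually be normalised for sequencing depth first.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two genotype-count dictionaries.

    Returns 0 for identical pools and 1 for pools that share no genotypes.
    """
    genotypes = set(counts_a) | set(counts_b)
    numerator = sum(abs(counts_a.get(g, 0) - counts_b.get(g, 0)) for g in genotypes)
    denominator = sum(counts_a.get(g, 0) + counts_b.get(g, 0) for g in genotypes)
    if denominator == 0:
        raise ValueError("both pools are empty")
    return numerator / denominator


# Hypothetical toy pools: one genotype is lost and another enriched under selection.
minus_kan = {"ACGU": 100, "GGCC": 100, "AUAU": 100}
plus_kan = {"ACGU": 250, "GGCC": 50}

dissimilarity = bray_curtis(minus_kan, plus_kan)  # 300 / 600 = 0.5
```

      Because both the loss of genotypes and shifts in relative abundance enter the numerator, the index responds to changes in richness and in abundance at once.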


      Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      We agree with the reviewer that the axes in Figure 3A should be flipped and we will do so in the revised manuscript. We also agree that, when it comes to helical stability of P1ex, the simple expectation would be to see a peak at a certain stability with drop-offs either side, as intimated by Figure 4C. We further agree with the reviewer that Figure 4C is rather indirect and can be made more quantitative by considering helical stability across all genotypes directly. To this end, we will use one of the many tools available that allow prediction of helical stability from primary sequence (e.g. the efn2 program in RNAstructure, as used by Torgerson et al 2018 RNA, see point #24 below) and replace Figure 4C with a more quantitative fitness landscape based on these computations. To provide added confidence in the computations of helical stabilities from primary sequence in the context of our structure, we will also calculate helical stabilities from molecular dynamics simulations for the subset of genotypes we considered previously (Figure 4E/F) and see how inferred stabilities compare.
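
      The expectation of a fitness peak at intermediate stability can be checked with a simple binned profile once a predicted free energy is available for each genotype. The Python sketch below assumes the free energies have already been computed by an external tool; the function name, the bin width, and the toy records are hypothetical and are not taken from the study's data.

```python
import math
import statistics
from collections import defaultdict


def mean_fitness_by_stability(records, bin_width=2.0):
    """Average fitness within bins of predicted helix free energy.

    records: iterable of (delta_g_kcal_per_mol, fitness) pairs.
    Returns a dict mapping each bin's lower edge to the mean fitness in that bin.
    """
    bins = defaultdict(list)
    for delta_g, fitness in records:
        lower_edge = math.floor(delta_g / bin_width) * bin_width
        bins[lower_edge].append(fitness)
    return {edge: statistics.mean(values) for edge, values in sorted(bins.items())}


# Hypothetical (free energy, fitness) pairs with highest fitness at intermediate stability.
toy_records = [
    (-8.2, 0.3), (-7.5, 0.4),   # very stable helices
    (-5.1, 0.9), (-4.6, 1.0),   # intermediate stability
    (-1.3, 0.2), (-0.4, 0.1),   # weak helices
]
profile = mean_fitness_by_stability(toy_records)
best_bin = max(profile, key=profile.get)  # lower edge of the bin with highest mean fitness
```

      Genotypes that fall far from such a profile would be candidates for effects beyond helical stability, such as alternative base-pairing or tertiary interactions.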

      There appears to be a missing verb in the legend for figure 3A, second sentence.

      We will fix this error.

      Figure S5 appears to be redundant with Figure 1.

      At first glance, Figure S5 does indeed appear redundant with Figure 1 but it is not. Figure S5 shows the relevant sequence of the group I intron and bordering exons in its native context, i.e. when embedded in the 23S ribosomal RNA gene of Tetrahymena thermophila, whereas Figure 1 shows the genotype of the mutant intron embedded in knt. The sequences are different. We will revise the legend to Figure S5 to make this clearer.

      Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      We will expand Figure S6 to include all base pairs as suggested. We disagree that this is a better analysis compared to what appears in the main text. Rather, it provides a complementary, hypothesis-driven view whereas the analysis in the main text is more systematic and unbiased in approach.


      Reviewer #2 (Significance (Required)):

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      We thank the reviewer for highlighting the paper by Torgerson et al, of which – embarrassingly – we were not aware. We will make reference to this paper in a revised manuscript and highlight that riboswitches might be a good model system to further explore asymmetric constraint and selection against excess stability in an evolutionary context (also see our response to point #9 above).

      As highlighted earlier, we think the main conceptual impact of our work lies not in the description of helical stabilities. Rather, it lies in a) providing a rigorous proof-of-principle that deep mutational scanning can capture multiple conformational states simultaneously, and b) that, using an unbiased machine learning approach, these states can be deconvoluted from a single fitness landscape to attribute the fitness impact of individual mutations to specific RNA conformations. A shift in analytical strategy to “cut to the chase” and narrowly focus on helical stability would be misguided in this context, as we seek to provide not only insights into the data at hand but also lay out a sound and general recipe for analysing similar datasets in the future.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The manuscript by Soo et al probes the effect of mutations on the fitness of the Tetrahymena Group I self-splicing intron. They used high-throughput sequencing to simultaneously identify the effect of every possible sequence in a 4-bp helix. The approach is sound and the conclusions are generally supported. However, the analysis seems overly complicated given the dataset. Both the analysis and the accompanying writing make it difficult to understand what seems to be a fairly clear conclusion - that the relative stabilities of two alternative RNA helices are important for splicing.

      Major Comments

      1. The authors state that this method can identify the impact of transient conformational states. However, the two conformational states in this study are not transient - in fact they are associated with two distinct chemical steps of splicing and are quite stable. It may be that the effect of important transient states would be observed, but this study does not demonstrate that.

      2. "Fitness" ends up being on an arbitrary scale, which impairs some analysis. A similar high-throughput sequencing pipeline could have been used to directly monitor splicing of every mutant, though at this point that is outside the scope of this study. Even with the arbitrary units, it would be clearer if more time were spent comparing fitness to base-pair stability on an individual basis, rather than the broad analyses. (See minor comments for details.)

      Minor Comments

      1. The sentence in the abstract beginning "Using an in vivo report system..." is very difficult to comprehend. This is due both to the length of the sentence and the word usage. The final sentence of the abstract is similarly difficult. In general, the writing overemphasizes complexity at the cost of clarity.

      2. Analysis of results in terms of "epistasis" obscures what could be a straightforward observation. This is the same as saying that mutants are not independent, or that their energetic costs are not additive. This follows obviously from the observation that the nucleotides being mutated are base-paired. The novel information is the sensitivity of fitness to base pairing. This is best shown in an analysis like Figure 3A (see below), not broad measures of epistasis.

      3. Figure 1C isn't necessary for the reader to understand the process.

      4. It is unclear what figure 2C is showing. It appears that the replicates are similar to each other, that 30 deg C and 37 deg C are also similar, but that +/- Kan are different. This probably doesn't need a figure in the main text.

      5. Figure 3A could be the most informative part of the manuscript. However, predicted minimum free energy should be on the x-axis as the independent variable. The expectation then is that you would see a peak in fitness at some free energy, with fitness falling off both with increased and decreased stability. Furthermore, there should be more analysis along these lines. The authors should calculate helical stability for both P1ex and P10 for every mutant and compare with fitness. Mutations which affect both could also be separated out. Figure 4C comes the closest to this but views it only in terms of GC pairs; there is no reason not to quantify the energetic effects given that predictions of stability for helices is quite good. Deviations from a model invoking only helical stabilities would indicate another factor is involved (alternative base-pairing or tertiary structure, for example).

      6. There appears to be a missing verb in the legend for figure 3A, second sentence.

      7. Figure S5 appears to be redundant with Figure 1.

      8. Figure S6 is a better analysis than what appears in the main text, and could be expanded to all base pairs.

      Significance

      This manuscript largely focuses on the technical approach. The shift in analytic strategy described above would increase the conceptual impact. The conclusions are consistent with and fit in with recent uses of high-throughput sequencing to study RNA systems. For example Pitt & Ferré-D'Amaré, Science (2010) and Kobari et al, NAR (2015) describe fitness landscapes of the ligase and HDV ribozymes, respectively. Torgerson et al RNA (2018) make similar measurements on the glycine riboswitch, including a treatment of relative helix stability for two mutually exclusive conformations. The overall results are of interest to researchers in the field of noncoding RNA.

      Our expertise is in RNA biochemistry and biophysics. We are not qualified to evaluate the details of several of the computational pipelines described.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The authors constructed a virtually complete fitness landscape of the P1 extension region (4-base-paired helix) in the group I intron from Tetrahymena thermophila, using a kanamycin resistance reporter to evaluate the fold-change in fitness, which is related to self-splicing activity. This was a clever choice of system because it was known from earlier work that the P1 extension adopts two different conformations during self-splicing. The fitness of each variant was determined from the number of reads acquired from the sequencing data sets and analyzed through an extensive computational pipeline.

      The strength of the paper is that this machine learning approach can be used to calculate how individual variants contribute to the fitness landscape and assess the directions of epistasis across a large number of identified genotypes. The authors argue that machine learning more successfully models subtle effects that arise from interactions between RNA residues, and that the power to analyze deep mutational sequencing experiments can better rationalize fitness constraints arising from multiple conformational states. The results are mostly consistent with previous studies even though the authors collected the data in a more advanced and complicated way. They are also able to rationalize complex phenotypes - for example, the observed fitness defects are more prevalent under an unfavorable growth condition (30°C), because the lower temperature hinders conformational exchange. Although such cold sensitive effects are well known in RNA, it is gratifying that this can be captured in the fitness landscape.

      Despite these strengths, there are several weaknesses that should ideally be addressed before publication.

      1. The results would be more convincing if the authors directly measured the self-splicing activity of a few key variants, such as the C2C21 mutant, to determine whether these mutations alter the self-splicing mechanism of the Tte-119(C20A) master sequence in the way that they infer from their model. In interpreting their results, they may want to consider misfolding of the intron core (coupled to base pairing of P1) and reverse self-splicing. Reversibility in the hairpin ribozyme, for example, turned out to be the key for understanding the effects of certain mutations.

      2. Related to the point above, interesting conclusions regarding the relationships between base identity and epistasis that arise from metastability should be strengthened with additional examples. For example, the authors could explain why a reverse base-pairing variant (C3G20) exhibits negative epistasis that is not similar to that of the G3C20 construct. This would ideally use the data from the screen but also be validated by checking the self-splicing activity of a few individuals at low and high temperature.

      3. They should validate the screen by showing that kanamycin resistance does indeed correlate strictly with self-splicing activity, and not some other feature such as RNA turnover. (It would also not be a bad idea to check this in the cell, which can be done by primer extension or Northern blotting.)

      4. The benefit of the machine learning model is that it can extract signals that may be hard to detect otherwise. The downside is that it doesn't produce a physical model, as far as I am aware. The parameters are themselves not meaningful - except to the degree that trends in the fitness estimates can be explained after the fact. This is something that should ideally be explained more directly in the manuscript.

      5. The authors claim that by evaluating a large number of sequences at two conditions, they can capture variants with intermediate phenotypes (Fig. 1). This is not necessarily true. If the original screen allows only the most active variants to survive on kan+ medium, then the signature of intermediate phenotypes may not be encoded in the original data, and thus not retrievable even with sophisticated algorithms, which may also be prone to overfitting. At what limit of stringency will the screen fail to yield information about intermediate fitness? How deeply must one sequence to recover this information, especially if noisy or degraded? Some discussion of these effects would be helpful.

      6. Lastly, the evolvability of RNA is fascinating and there is much to learn. However, the authors don't discuss the implications of their findings for molecular evolution, although they throw the term around. It would be exciting if there is a trend in the fitness landscape that could help explain the trajectory of RNA evolution in nature.

      7. The authors use the abbreviation DMS for deep mutational scanning; the RNA structure field uses the reagent dimethyl sulfate, which is also abbreviated DMS. They may want to choose a different acronym or just avoid an acronym altogether.

      Significance

      As the importance of RNA structure for gene expression becomes more widely appreciated, interest in understanding the evolution of RNA structures is also increasing. Compared with the molecular evolution of proteins, evolution and fitness in RNA is far less understood, although the authors appropriately point to a number of recent studies on this topic. The main advance here is to use machine learning methods to analyze the results of a large genotypic screen, with the goal of more accurately capturing the fitness effects of sequences at varied distances from the parental sequence. The specific conclusions reached here such as the importance of metastability or the prominence of cold sensitive effects are not revolutionary, but the authors illustrate how such phenomena can be investigated more systematically and in more depth.

  2. Sep 2020
    1. Reviewer #3

      Introduction:

      1) For those not familiar with personality/trait constructs, harm avoidance should be defined.

      2) The authors unnecessarily make a distinction between emotion and cold cognition, or emotion and non-emotional perception. I don't think this distinction needs to be made and furthermore, the separation of emotion and cognition is a little antiquated in what we know about holistic processing of the brain.

      3) There is no mention of the amygdala or bed nucleus of the stria-terminalis in discussions of anxiety and especially in anticipation. Nor is there any mention of anticipatory or arousal components of anxiety.

      4) There are two competing points brought up in the introduction, regarding the pre-SMA: 1) that the pSMA is involved in time tracking and 2) that the pSMA is involved in threat related shock. This appears to be problematic due to the proposed hypotheses. Perhaps, the authors could adjust the hypotheses to illustrate why only time perception is a main effect hypothesis and time and anxiety are an interaction hypothesis.

      5) Hypothesis 5 is unclear, I assume the brain (neural changes) are being correlated with time estimation (behavioral index?), but it is unclear.

      Methods:

      1) Nim-Stim images need to be described in more detail in the methods and not just in the figure caption.

      2) The specifics of the experimental methods need to be clearer regarding the differences in stimulus duration. This is an important distinction between the two studies and not enough details are given. It should be clearly worded and not left up to the reader to try and interpret the table.

      3) Why did the number of shocks differ between participants? That seems like a confound for the neural interpretation. The authors need to explain.

      4) There is no mention of fMRI screening for Study 1.

      5) Why a power calculation for Study 2, but not 1?

      6) The methods section is written such that the amount of explanation between the two studies needs to be resolved. They are quite different, i.e., how many total shocks in Study 1? There are inconsistencies throughout.

      7) Why are different analysis methods used to examine behavioral effects? ANOVA vs. paired sample t test? Details like this need to be explained throughout the manuscript if the authors are trying to compare two data sets.

      8) It isn't clear or mentioned that Study 1 was a pilot study for Study 2 until the neuroimaging analysis section. This needs to be explained and more detail should be included much earlier in the manuscript.

      9) For information: Siemens Skyra and Prisma scanners have built in dummy scans at the beginning of sequences to allow for equilibration.

      10) The neuroimaging methods require much more detail, i.e., SPM version used, etc., etc.

      11) ROIs description needs more detail, i.e., a 10mm sphere. 10mm what? Radius, circumference? That's a huge ROI for subcortical regions.

    2. Reviewer #2

      The manuscript "Anxiety makes time pass quicker: neural correlates" outlines an interesting and potentially important set of experiments aimed at replicating a previously reported effect of distorted time perception while under threat of electric shock while adding fMRI measurement of brain activity during the task. The manuscript has multiple strengths, in my opinion, including the use of a cleverly designed paradigm coupled with sophisticated neuroimaging methods, pre-registered predictions and analysis plan, and a potentially informative mechanistic focus. The study is also well grounded in the literature and the manuscript well written. I have some concerns, however, with the current version of the manuscript. These concerns mostly center on the strength of evidence afforded by the current design and the interpretability of the design and results. I outline these concerns, point by point below.

      1) The choice to pre-register the predictions and analysis plan is laudable. For clarity, I believe the authors should indicate, up front, what aspects of the study were pre-registered rather than simply saying that it is pre-registered.

      2) There are potentially important differences between the study pre-registration and the reported hypotheses and analysis. Sticking rigidly to the pre-registration is certainly not necessary to benefit from a pre-registration but I believe all potentially substantive deviations from the pre-registration should be identified and explained in the manuscript for transparency. For example, the specific brain regions mentioned in Prediction 2 are not consistent between the manuscript and pre-registration.

      3) In the pre-registration, I didn't see Prediction 4 (interaction of time-related and anxiety-related neural processing) but this may be attributable to inconsistent wording between the pre-registration and manuscript.

      4) The pre-registration discusses planned hypotheses and analysis involving functional connectivity but I do not see this mentioned in the manuscript.

      5) Some description of why faces (versus anything else) were used as stimuli is needed for readers to understand the task.

      6) Related to point 5 above, it is reported that the durations of stimuli were randomized but I did not see a description of randomization of the face stimuli themselves. This is needed (if I didn't just miss it).

      7) The authors indicated that the study was powered to detect the effect of threat on (I assume) behavior. I would guess that this is one of the largest effects that could be tested for in this study. In fact, the study appears underpowered to detect anything but very large effects. This could explain why many effects tested were not found (especially the interactions). I believe this should be explicitly acknowledged as a limitation for readers to be able to appropriately evaluate the strength of evidence for the claims made.

      8) Given the short ITIs in the task, perhaps the effects attributed to anxiety caused by threat of shock are in actuality effects due to continued processing of the previous aversive shock. I know the authors said they regressed out the effect of shock from the brain measures but it is unclear how one would regress out the effects of processing of previous shocks. Perhaps this potential confound has been addressed in previous reports of this task but I think some brief attention to the issue here would help readers to evaluate the results.

      9) Given the fact that shocks always occurred during the ITI and never during the cue, readers may be left wondering if the participants were indeed anxious versus, e.g., distracted, during the temporal decision task since they technically are not even yet at risk of receiving a shock at that moment of the task. Some clarification of this point would be helpful.

      10) Related to and overlapping with some of the points above, I request that the authors add a statement to the paper confirming whether, for the experiment, they have reported all measures, conditions and data exclusions and how they determined their sample sizes. The authors should, of course, add any additional text to ensure the statement is accurate. This is the standard reviewer disclosure request endorsed by the Center for Open Science [see http://osf.io/hadz3 ]. I include it, where appropriate, in every review.

    3. Reviewer #1

      This manuscript reports a pair of studies investigating the neural correlates of the temporal underestimation that has been shown to accompany anxiety in previous studies. Hypotheses were pre-registered, including increased activation in the anterior cingulate during threat and that "threat-related bold signal changes will correlate with the threat related behavioural changes". The current work found threat-related activity in the anterior cingulate gyrus, and greater mid-cingulate activity for longer estimates of stimulus duration, with a trend toward overlap between these contrasts, which was subthreshold after correcting for multiple comparisons. In addition, activity associated with state anxiety and temporal estimation overlapped in the insula and putamen. The authors interpret these findings as consistent with the overloading hypothesis that vigilance during state anxiety and duration perception rely on overlapping areas, resulting in inaccurate duration perception during anxiety. However, these results should be interpreted with caution given that, as the authors note, there was no interaction between threat and perceived duration, and no correlation "between the underestimation of time during threat and either insula or midcingulate activation in the interaction contrast". Given the relatively small sample size, these null findings may have been the result of low power. Nevertheless, the current study will likely serve as a useful starting point for future work on this topic.

      Below are my comments on the manuscript:

      1) In the pre-registration, hypothesis 2 refers to the ACC and frontopolar areas, while in the manuscript I am not seeing the frontopolar areas. I know this region is particularly susceptible to dropout, so it is possible you were unable to adequately test this hypothesis – if so, this should be stated in the manuscript. In addition, the manuscript lists right IFG in the hypotheses, but I am not seeing results reported for this region.

      2) It would be good to explain why you chose to use 10 mm spheres centered on your ROIs, rather than using all voxels that met the p<.05 threshold in the clusters identified in Study 1.

      Minor comments:

      The abstract starts off talking about how anxiety can be adaptive; however, unless I missed something, the authors don't explicitly tie this thought into temporal underestimation. From the perspective of someone who is naive to the literature on temporal underestimation, it seems that causing temporal underestimation would be maladaptive, if it causes one to underestimate how long one has been worrying about something. I would suggest either making the relationship between these ideas more explicit in the text, or removing this first sentence or moving it to a less prominent spot.

      If there was a methodological reason for switching to a train of shocks (e.g., an expectation that it would elicit more anxiety) in Study 2, it may be helpful for future researchers to state it. If it was simply a matter of equipment available at the second site, then no changes are needed.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript reports a pair of studies investigating the neural correlates of the temporal underestimation that has been shown to accompany anxiety in previous studies. Hypotheses were pre-registered, including increased activation in the anterior cingulate during threat and that "threat-related bold signal changes will correlate with the threat related behavioural changes". The current work found threat-related activity in the anterior cingulate gyrus, and greater mid-cingulate activity for longer estimates of stimulus duration, with a trend toward overlap between these contrasts, which was subthreshold after correcting for multiple comparisons. In addition, activity associated with state anxiety and temporal estimation overlapped in the insula and putamen. The authors interpret these findings as consistent with the overloading hypothesis that vigilance during state anxiety and duration perception rely on overlapping areas, resulting in inaccurate duration perception during anxiety.

      The reviewers and I identified several strong points:

      1) The current study may serve as a useful starting point for future work.

      2) Interesting set of experiments aimed at replicating a previously reported effect of distorted time perception while under threat of electric shock while adding fMRI measurement of brain activity during the task.

      3) Clever paradigm.

      4) Pre-registered predictions and analytic plan.

      5) Grounded in the literature.

      Yet, on balance, there was consensus that the study provides only an incremental advance, largely owing to limitations of the approach.

      Major/general concerns are:

      1) Insufficient power. E.g.

      -"The authors indicated that the study was powered to detect the effect of threat on (I assume) behavior. I would guess that this is one of the largest effects that could be tested for in this study. In fact, the study appears underpowered to detect anything but very large effects. This could explain why many effects tested were not found (especially the interactions). I believe this should be explicitly acknowledged as a limitation for readers to be able to appropriately evaluate the strength of evidence for the claims made."

      -"The results should be interpreted with caution given that, as the authors note, there was no interaction between threat and perceived duration, and no correlation "between the underestimation of time during threat and either insula or midcingulate activation in the interaction contrast". Given the relatively small sample size, these null findings may have been the result of low power."

      2) Writing style. The reviewers found the lack of attention to polishing the manuscript distracting, e.g. "The methods section appears to be written by two different authors with major inconsistencies in style and phrasing"

      3) Missing details. Crucial methodological details are lacking or inconsistent, making it difficult to fully evaluate the approach

    1. Reviewer #3

      The authors provide a clear and effective response to the demand for robust real-time pose estimation software with closed-loop feedback capabilities. In addition, we appreciate the effort that the authors have put into making the software user-friendly and extensible. The paper is very well written and contains many tools for those in the field to effectively use.

      A small weakness is that the authors have demonstrated the LED flash latency but do not show an application such as optogenetic stimulation or behavioural manipulation using the system. Also, most of their benchmark numbers are based on videos and not camera streams, which does not fully address potential hardware issues. I believe the heavy dependence on recorded video rather than a live camera feed is something that should be checked to present accurate numbers.

      Their Kalman filter approach seems useful but the deviations in pose estimation prediction from the normal pose estimation are sometimes 30 px or more. People may make trade-offs between latency and accuracy when using this software. Another important factor for real-time tracking is the accuracy of the pose estimation, which determines whether the system is really useful in a true application.

      It would be nice to see a bit more validation of the software in a realistic live stream context. The quality of their code is quite high.

      1) The authors emphasize that their software enables "low-latency real-time pose estimation (within 15 ms, at >100 FPS)". Upon inspection of table 2, it appears that this range of latency and speed combinations is primarily achieved using 176x137 px images on Windows/Linux GPU based hardware, with corresponding FPS dropping to well below 100 for larger images in the DLCLive benchmarking tool on all platforms except for Windows. As the range in framerate/latency combinations appears to vary quite a bit between setups and frame sizes, we would suggest including a more realistic range for the latency and framerate in the abstract or at least mention the heavily down-sampled video used.

      2) In table 2, the mean and SD latency appear to be stable across modes, frame sizes, and GPU setups. However, there appears to be a notable spike in the latency range (14 ± 73) for the image acquisition to LED time on Windows computers that stands out from other latency figures. This latency range is concerning for the consistency of real-time feedback applications on a platform and at a frame size that is likely to be commonly used. Would the authors be able to explain a possible reason for this large SD?

      3) The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB2 vs. USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras.

      4) In Figure 3, the measurement of the latency from frame to LED is not very clear. DLC will always give a pose estimate even when the tongue does not appear in the image, so the LED will always turn on very quickly after the pose is obtained from the image.

      5) In "Real-time feedback based on posture", the Kalman filter approach to reduce latency through forward prediction is innovative and likely of use for rapid characterization of general behaviours. In Figure 8C, the deviation of pose predictions from non-forward predicted poses appears to follow the general trend of the trajectory but appears to deviate by as many as 50 pixels from the non-forward predicted poses. While this tolerance may be acceptable for general pose estimation, many closed-loop pose estimation implementations may focus on rapid and accurate feedback based on very small movements (e.g. small muscular movements). For example, movements differing in magnitude by a few pixels may distinguish spontaneous twitches from conditioned behaviours. Considering that the demonstrated setup achieves a mean image to LED latency of 82 ms without the Kalman filter, it appears that many users would have to make a large trade-off between accuracy and latency in order to use the system with a conventional webcam and reasonably priced setup. Although the methods discussed are state-of-the-art and impressive considering the hardware used, it may be helpful to include a discussion of how the Kalman filter approach may be improved in the future to improve pose estimation accuracy while maintaining low latency.

      6) The software is compared favourably to existing real-time tracking software in terms of latency (refs 12-14). The efficacy of the existing realtime pose estimation software has been validated on animal movements using closed-loop conditioning paradigms. If feasible, a demonstration of the software reinforcing an animal based on real-time pose estimation (e.g. a similar paradigm to that used in the DLG benchmark video) would provide useful context as to whether the pose estimation strategies discussed are effective in closed-loop experiments. In particular, this would be important to evaluate given the novel Kalman filter approach - which influences the accuracy of pose estimation. We list this closed loop experiment as optional given the pandemic conditions we face. In contrast to the live animal reinforcement experiment, we do feel that real world streaming video to output trigger latencies are required (pt #3).

    2. Reviewer #2

      Kane et al. introduce a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimises pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.

      The tools presented here are truly excellent and very exciting. In my view DLC has already started a revolution in the quantification of animal behaviour experiments and DeepLabCut-Live! is exactly what the community has been hoping for – to deploy the power of DLC in real-time to perform closed-loop experiments. I have very little doubt that the tools described in this manuscript and their future versions will be a mainstay of systems neuroscience very quickly and for years to come. Key to this is that the software is entirely open access and easy to deploy with inexpensive hardware. I commend and, as a DLC user, certainly thank the authors for their efforts. I have a couple of comments below on the manuscript itself, which the authors might want to consider. As for the software itself, all of the benchmarks look good and the case studies make a compelling case for its applicability in real life – and the beauty of it is that because it's open access, any issues and improvements needed will be quickly spotted by the community, and I expect duly addressed by the authors judging from their track record on DLC.

      Main comments:

      1) One important parameter that is not really discussed throughout the manuscript is the accuracy of pose estimation. I realize that this might be more of a discussion on DLC itself, but still, when relying on DLC to run closed-loop experiments this becomes a critical parameter. While offline we can just go back, re-train a new network and try again, in a real-time experiment, classification errors might be very costly. The manuscript would benefit from discussing these errors and how they can be best minimised. It would also be helpful to show rates for false positive and false negative classification errors for the networks and use-cases presented here, to highlight the main parameters that determine them and perhaps show how classification errors vary as a function of these parameters (e.g., do any of the procedures to decrease inference latency, such as decreasing image resolution or changing the type of network, affect classification accuracy?). Along the same lines, while the use of Kalman Filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also to the types of behaviour that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked and I feel that some discussion on this would be valuable.

      2) A related point is that some applications are likely to depend on the detection of many key-points and it is unclear how the number of key-points affects inference speed. For example, the 'light detection task' using AutoPilot uses a single key-point, how would the addition of more key-points affect performance in this particular configuration?

    3. Reviewer #1

      The authors present a new software suite enabling real-time markerless posture tracking - with the aim of making low-latency feedback in behavioral experiments possible. They demonstrate the software's capability on a variety of hardware and software platforms – including GPUs, CPUs, different operating systems, and the Bonsai data acquisition platform. Moreover, they demonstrate the real-time feedback capabilities of DeepLabCut-Live!.

      While there have been other methods that have been introduced recently that have incorporated real-time feedback on top of DeepLabCut, this software shows improved latency, has cross-platform capabilities, and is relatively easy to use. The software was thoroughly benchmarked (with one small exception that I'll outline below), and although I wasn't able to directly test it myself, I was easily able to download the code, and the documentation was sufficient for me to understand how it works. I have every confidence that this is a piece of software that will be extensively used by the field.

      My one comment is that it would have been good to have some analysis as to how the network accuracy (i.e., real space – not pixel space – error in tracking) scales with resolution, as the fundamental tracking trade-off isn't image size vs. speed, it's accuracy vs. speed. I wouldn't call this an essential revision, but I think that including these curves would greatly help potential users make important hardware and software decisions. Granted, this difference will alter depending on the network, but even getting a sense from the Dog and Mouse networks here would likely be sufficient to provide a general sense.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary

      This submission introduces a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimizes pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.

      All three reviewers agreed that this work is exciting, carefully done, and would be of interest to a wide community of researchers. There were, however, four points that the reviewers felt could be addressed to increase the scope and the influence of the work (enumerated below).

      1) The fundamental trade-off in tracking isn't image size vs. speed, but rather accuracy vs. speed. Thus, the reviewers felt that providing a measure of how the real space (i.e., not pixel space) accuracy of the tracking was affected by changing the image resolution would be very helpful to researchers wishing to design experiments that utilize this software.

      2) The manuscript would also benefit from including additional details about the Kalman filtering approach used here (as well as, potentially, further discussion about how it might be improved in future work). For instance, while the use of Kalman Filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also on the types of behavior that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked and the reviewers felt that some discussion on this point would be valuable.

      3) A general question that the reviewers had was how the number of key (tracked) points affects the latency. For example, the 'light detection task' using AutoPilot uses a single key-point, how would the addition of more key-points affect performance in this particular configuration? More fully understanding this relationship would be very helpful in guiding future experimental design using the system.

      4) The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras. If this is impossible to do at the moment, however, at minimum, adding a discussion stating that this type of demonstration is currently missing and outlining these potential challenges would be important.

    1. Reviewer #3

      This manuscript describes analysis and experiments designed to implicate CBX2 and CBX7 in breast carcinogenesis. Naturally, the analysis of existing data provides only correlative measures, and some of these are likely insignificant and driven by outliers (see specific points below). The experimental validation is done in two cell lines with a single siRNA, and data showing successful targeting of siRNA is lacking. The authors also claim direct regulation of mTORC by CBX2 and CBX7, but the evidence provided is weak. Overall the results are suggestive but do not provide conclusive evidence justifying the conclusions.

      Specific Points:

      The expression of CBX2/CBX7 correlates with breast cancer subtype, so all the predictive power may be in the subtype of cancer. Is there evidence that once standard prognostic methods are applied, CBX2 and/or CBX7 expression levels add to prediction? If not, it is not clear that these are drivers and not simply correlative markers of disease status.

      Figure 2 should include CBX2, CBX7, and other CBX RNA and protein levels to show that targeting was effective and specific. Multiple siRNAs should be used to demonstrate that it is not an off-target effect.

      Figure 3 correlations are extremely weak. Significance is driven by the large number of data points rather than by the strength of the correlation, and it is likely also driven by the few outliers on the left in each figure. If these are removed, the correlation is likely close to zero.
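      The concern above can be illustrated with a minimal sketch on synthetic data (not the manuscript's data). It assumes 2000 uncorrelated points plus five hypothetical outliers in the lower left, and uses a large-sample normal approximation for the p-value; all names and numbers are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via a normal approximation to the
    t statistic (an assumption that is reasonable only for large n)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


rng = random.Random(0)
# Synthetic, uncorrelated bulk of points (hypothetical, not from the paper).
bulk_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
bulk_y = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Five hypothetical outliers far to the lower left of the bulk.
out_x = [-8.0] * 5
out_y = [-8.0] * 5

r_without = pearson_r(bulk_x, bulk_y)
r_with = pearson_r(bulk_x + out_x, bulk_y + out_y)
p_with = approx_p_value(r_with, len(bulk_x) + len(out_x))

print(f"r without outliers: {r_without:+.3f}")
print(f"r with outliers:    {r_with:+.3f} (approx. p = {p_with:.2g})")
```

      In this toy setting a handful of extreme points is enough to turn a near-zero correlation into a small but nominally significant one, which is why re-running the Figure 3 analysis without the outliers would be informative.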

    2. Reviewer #2

      Saluja and colleagues present a study examining the contribution of the chromobox family of proteins, specifically CBX2/7, to metabolic reprogramming of breast cancer cells. Notably, little is known regarding CBX2/7's activity in metabolism. The manuscript is well written and clearly presented. The major findings are that CBX2 and 7 are related to metabolic reprogramming and have inverse roles in regulating anaerobic glycolysis. Through mining of several large datasets (TCGA/METABRIC), the investigators demonstrate that amplification and upregulation of CBX2 correlates with more aggressive tumors and with increased mTORC signaling. The authors directly demonstrate that siRNA knockdown of CBX2 leads to loss of glucose uptake and a reduction in ATP production. Conversely, loss of CBX7 increased glucose uptake, ATP production, cell number, and pS6 levels. There is a significant need to better define the contribution of CBX2 and CBX7 in breast cancer, which will shed light on breast cancer progression, metabolic reprogramming, and therapeutic response. The strengths of the study include the use of large, well-annotated datasets and a novel area of cross-talk between epigenetics and metabolism. However, there are concerns detailed below that need to be addressed:

      Major:

      1) Most of the research presented consists of correlative studies with little mechanistic insight. CBX2 and CBX7 are members of the polycomb repressive complex 1 (PRC1). Is the expression of CBX2 and CBX7 mutually exclusive? Related to figure 3, what is the mechanism by which loss of CBX2 expression decreases mTORC signaling? CBX2 and CBX7 proteins are not likely functioning alone. In CBX2High cell lines, the authors should investigate the impact of a PRC1 inhibitor in the context of anaerobic glycolysis to assess whether CBX2 is functioning independently of PRC1. Also, a discussion of the interplay between PRC1, PRC2, and metabolism should be included.

      2) The MTT and CellTiter-Glo therapeutic sensitivity assays need to be repeated using a non-metabolic readout. The major conclusion of the study is that CBX2 and CBX7 promote metabolic reprogramming; thus, using metabolic outputs (CellTiter-Glo - ATP production and MTT - mitochondrial respiration) for the chemotherapy assays is flawed.

      3) Only two cell lines were examined (MCF7 [ER/PR positive] and MDA-MB-231 [triple negative]), which is a study limitation. Why were these cell lines selected? Also, only pooled siRNAs for both CBX2 and CBX7 were used, so only loss-of-function responses were evaluated. Does overexpression of CBX2 in a CBX2-low cell line exacerbate anaerobic glycolysis and, conversely, does CBX7 overexpression in a CBX7-low cell line inhibit anaerobic glycolysis?

      4) Based on figure 6, the CBX2high lines are less responsive to Rapamycin suggesting that the cells are not dependent on CBX2-mediated upregulation of mTORC. Temsirolimus was also not detected as being significant, further highlighting that CBX2-activity on mTORC is not a critical pathway. Also, given the antagonistic effect of CBX7, what are the therapeutic vulnerabilities conveyed in CBX7high?

      5) The survival curves shown in Figure 5 differ substantially between the TCGA and METABRIC data. What is the possible explanation?

    3. Reviewer #1

      The manuscript entitled "CBX2 and CBX7 antagonistically regulate metabolic reprogramming in breast cancer" analyzed multi-omics data of breast cancer mainly from METABRIC and TCGA with the focus on the chromobox family member genes (CBXs). The authors showed the association of CBX2 and CBX7 expression levels with glycolysis in tumors and with mTOR signaling, especially the levels of phosphorylation of S6 protein in tumors. Knockdown of CBX2 and CBX7 in two breast cancer cell lines showed opposite effects on glycolysis, cell viability and growth. Previous studies reported that CBX2 and CBX7 have oncogenic and tumor-suppressive roles in breast cancers. Results from this study showed their involvement in regulation of glycolysis, as well as their association with the prognosis of disease-specific survival of breast cancers. While some of the findings about CBX2 and CBX7 are interesting, most of the results showed association and provided limited insights about how CBX2 and CBX7 regulate glycolysis and their contribution in breast cancer.

      Specific comments:

      1) The authors need to provide detailed methods of analysis, including glycolysis deregulation score, where to obtain the DNA methylation levels, etc.

      2) It is uncertain that it is acceptable practice to base/categorize breast cancer aggressiveness according to different subtypes (from LumA, LumB, Her2 to Basal) as shown in Figure 1D, Figure 4C, 4F.

      3) 2-DG experiments were only performed in MDA-MB-231 and MCF-con cells but not cells with CBX knockdown (Fig S3). It is therefore unclear whether the changes of cell viability, proliferation by CBX knockdown are due to the metabolic changes (Figure 2).

      4) Figure 3 showed the effects of CBX on pS6 levels in breast tumors. However, it is unclear whether this change contributes to the role of CBX2, CBX7 in glycolysis. The statement on page 6, line 1 "CBX2 and CBX7 exert their effects on breast cancer metabolism via modulation of mTORC1 signaling" is not established and is not supported by any data.

      5) Figure 5, since the % of CBX2 high/low and CBX7 high/low differ in different subtypes of breast cancers, it is suggested to analyze the association of CBX2, CBX7 expression with prognosis in different subtypes.

      6) Figure 6, please discuss why CBX2 high cells which supposedly have high mTOR activity showed higher resistance towards Rapamycin compared to CBX2 low cells. Also, whether CBX7 showed opposite effects of drug sensitivity towards the same group of compounds.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The manuscript has been reviewed by three experts in the field, including an expert in metabolism, one in breast cancer and one in bioinformatics. There are concerns that much of the data are correlative as opposed to mechanistic, and that the material thus falls short of increasing insights into the role of CBX2/7 in breast cancer. There is concern that the cell viability assays are actually readouts of metabolism, and that viability assays should be repeated using a non-metabolic readout such as trypan blue or calcein/EtBr stain. There are concerns about the possibility that the expression of CBX2 and 7 as markers of breast cancer subtype are actually driving the correlations seen. And there are concerns that only two cell lines are analyzed, and only a single siRNA. It is suggested that performing the metabolism assays in the presence of knockdown for CBX would better support the premise that there is a correlation between metabolism and proliferation, and that these are together regulated by CBX proteins. Finally, one of the reviewers requests more detail in the methods.

    1. Reviewer #2

      General assessment:

      This study uses innovative analytical tools to characterise movement-evoked patterns in the cortex and evaluate functional recovery after stroke. The authors employ a motor task wherein a sliding platform has to be pulled back by the mouse upon an acoustic cue to obtain a reward. Calcium-imaging cortical events are matched to force events. A propagation map is generated based on SPIKE-order: an asymmetric counter of threshold crossing coincidences between each and all pixels. Three propagation indicators are investigated: duration, angle and smoothness. These indicators show differences between healthy and stroke mice during the first and last three weeks of treatment.

      The proposed SPIKE-order algorithm is a promising analytical tool to characterize brain dynamics in a variety of cortical functional imaging data. The terms 'spike' or 'synfire' do not correspond to the neuronal processes, but are used analogously, referring to threshold crossings and to consistent spatiotemporal patterns of spike coincidence, respectively. This analysis is highly versatile, being scale- and parameter-free; the approach must therefore be empirically validated.
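      To make the idea of an asymmetric counter of threshold-crossing coincidences concrete, here is a deliberately simplified sketch. It is not the SPIKE-order algorithm used in the manuscript: it assumes one threshold-crossing time per pixel per event and a fixed coincidence window, both of which are illustrative assumptions, and the function names are hypothetical.

```python
def order_matrix(crossing_times, window):
    """Antisymmetric matrix d with d[i][j] = +1 if pixel i crosses the
    threshold before pixel j (within `window`), -1 if after, 0 otherwise."""
    n = len(crossing_times)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dt = crossing_times[j] - crossing_times[i]
            if i != j and dt != 0 and abs(dt) <= window:
                d[i][j] = 1 if dt > 0 else -1
    return d


def leader_to_follower(crossing_times, window):
    """Sort pixel indices from leaders (high row sum) to followers."""
    d = order_matrix(crossing_times, window)
    scores = [sum(row) for row in d]
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical threshold-crossing times (seconds) for four pixels.
times = [0.30, 0.10, 0.20, 0.40]
d = order_matrix(times, window=0.5)
ranking = leader_to_follower(times, window=0.5)
print(ranking)
```

      A propagation map in this simplified picture would assign each pixel its position in the leader-to-follower ranking; indicators such as duration or angle would then be derived from that map.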

      Major concerns:

      1) My main concern with the study is in the use of these indicators to track recovery after stroke. There is no control group that received stroke but did not perform the task during the acute phase. An increase in oxygenation in the area over time due to collateral irrigation may account for the reported effects. Without the appropriate control, the recovery in propagation indicators cannot be attributed to motor rehabilitation.

      2) Notably, there is no effect of training on changes to these indicators in healthy mice. Previous work by Makino et al. (2017) reported decreased duration of activity as learning progressed. Looking at spatial gradients of phase, Makino et al. also found a secondary activity flow at later stages of training. The authors should provide reasons for the absence of these changes in their indicators of duration and angle.

      3) There is no analysis on the frequency of action types to indicate behavioural recovery. This should be addressed in the discussion, but it may also suggest that these indicators have no relation to a longitudinal effect of the motor task.

      4) The key to the status codes is missing. There are 7 discrete statuses of the robotic slide in total, but only status 3 is described. Also, the schedule for the acoustic cue within status 3 is unclear.

      5) The nature of the reward for pulling is not specified. In the drawing in Figure 1, it looks like it could be water or sucrose. However, it is stated that mice are not water deprived. The difference in cortical activity between R and nR events is due solely to auditory cues and not voluntary action. It is important to know the nature of the reward to assess motivation and intention in the movement.

    2. Reviewer #1

      I enjoyed reading the paper by Cecchini et al. on using wide-field calcium imaging in mice to assess propagation of motor-related cortical network activity before and after focal photo-thrombotic stroke. The paper is well-written and relatively easy to follow because of the lengthy (perhaps even verbose and at times jargony) explanations of the methods and results. The authors are clearly experts in the field, having published on the topic of stroke recovery in recent years, and in the methods employed, especially in the analysis approach, which they recently developed (cf. Allegra Mascaro et al., 2019). They also cite many of the relevant papers in the field. After decades of stroke research documenting various aspects of molecular or anatomical changes in circuits after stroke, studies such as this one that focus on alterations in network activity are very important. The main technique used, single-photon calcium imaging through the skull of bulk signals on the cortical surface, is elegant in its simplicity and has clear advantages over similar wide-field imaging techniques using voltage sensors (which include sub-threshold activity not related to action potentials) or intrinsic signals (which depend on blood flow/volume and are hard to interpret in the context of stroke). The authors then use sophisticated quantitative approaches to analyze three aspects of the propagation of cortical network activity (duration, smoothness, and angle) and how they are affected by stroke and by two rehabilitative strategies. The main findings can be summarized as follows: 1) These three indicators are stable over time (4 weeks) in healthy mice; 2) After stroke, network events last longer and are more chaotic (lower smoothness); 3) A combination of motor training and silencing the healthy hemisphere after stroke drastically alters these three parameters.

      The main strengths of the paper, in my opinion, include the novelty of their analysis of wide-field calcium imaging in the context of stroke, especially when coupled with a rehabilitative strategy, and the results showing differences in propagation of activity between stroke and healthy controls. However, I have noted the following issues, some of which I consider serious.

      One problem I encountered is that the authors do not provide sufficient data on the impact of stroke, both in terms of size/location and its impact on function (motor pull task), or about the pharmacological silencing approach. Although they refer to their previous paper (Allegra Mascaro 2019), I could not find clear answers there either.

      My first recommendation is that the authors present data on the location and size of the infarcts they produced in each of the mice used in the present study. They should show at least a couple of histological examples of infarcts and, more importantly, a graph that plots infarct volume for all the individual mice (this could be in a suppl. figure), and ideally the location of the infarct with respect to the landmarks of M1. PT strokes can be quite variable, and one wonders whether some mice suffered large infarcts whereas in others they are negligible or may have missed M1 altogether.

      Second, they should clarify in a lot more detail what the behavioral deficits are after such a stroke, if any, not just as detected by the robot task but also with other behavior assays. In the Allegra Mascaro paper, the plots in Fig. 1D indicate that normal control mice have gradual reductions in peak amplitude and in slope of the force over 5 days of training (whereas stroke mice do not), but it's not clear whether this is statistically significant. Moreover, in the Results section of that paper, they claim the "amplitude and slope of the force task (...) were not significantly different across groups." I believe the authors need to show their behavior data for this new cohort of mice. In fact, if they can't find significant deficits in forelimb function with the pull task after PT stroke, then the authors should clearly state that their robot assay is insensitive (which would seriously undermine the significance of their findings.) The present manuscript states that the combined treatment promotes "a generalized recovery of the forelimb dexterity" (line 358), but this is not supported by any data provided. If the authors are unable to provide behavior data, any statements about the robot task should be modified, if not removed. Solely referring to their 2019 paper is not appropriate, since this is an entirely new group of animals. I'm very much hoping that the authors actually have these data on behavioral performance across time for all mice in the study, because they would be in a position to actually correlate changes in pulling (amplitude, slope) with network activity data and provide a more robust narrative. However, Fig. 6 indicates that the effects of Rehab were the same for all types of events (F vs. nF, Act vs. Pass, or RP vs. nRP), which suggests that there is probably no correlation between training and network activity.

      Third, regarding the BoNT/E experiments, neither the Allegra Mascaro 2019 paper nor this one provides any evidence that the procedure actually works as intended. The authors should do in vivo wide-field calcium imaging in a subset of mice in the injected hemisphere to show that spontaneous and motor-related cortical activations are eliminated in toxin-injected mice (or some ephys in slices at the very least), with appropriate controls of course, such as mice injected with vehicle or with denatured toxin. An important control that is currently missing is a BoNT/E alone group, without stroke (see comment #1 below).

      Lastly, I am concerned about the sample size they use for statistics. Although they discuss the numbers of mice in their power analysis, all the plots they show include many more individual points than the number of mice (what are those, FOVs? events?). The preferred sample size would be to use the number of mice. I believe the authors should show the data (and perform statistics) only for individual mice. Otherwise they need to justify why they didn't do stats with n= # mice.
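      As an illustration of the recommendation above, the following minimal sketch collapses repeated observations to one value per animal before any statistics are run, so that n equals the number of mice. The mouse identifiers and values are hypothetical and do not come from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(observations):
    """Collapse (mouse_id, value) observations to one mean per mouse."""
    by_mouse = defaultdict(list)
    for mouse_id, value in observations:
        by_mouse[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical event durations (seconds); several events per mouse.
events = [
    ("m1", 1.0), ("m1", 1.2), ("m1", 1.1),
    ("m2", 2.0), ("m2", 2.2),
    ("m3", 1.5),
]

summary = per_mouse_means(events)
print(f"{len(events)} events -> n = {len(summary)} mice")
```

      Group comparisons would then be run on the per-mouse values. A mixed-effects model with mouse as a random effect is a common alternative when event-level data need to be retained.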

      Other comments (not necessarily minor):

      1) I agree that the pattern of activity is different in the Rehab group (presumably an effect of silencing the contralesional, healthy hemisphere). But, since it is also very different from the pattern of propagation in healthy control mice (or pre-stroke baseline), it is possible that this is also a pathologic pattern, not necessarily reflecting a "new functional efficacy" (lines 358-9). The authors should comment on this possibility in the Discussion, namely that Rehab did not restore activity to a control pattern, but to a different pattern altogether. This will be easier once they analyze a BoNT/E control group in which mice are injected with BoNT but do not receive a stroke. This is a critical control that will allow the reader to determine whether the effects they see in the Rehab group reflect adaptive plasticity to restore functional connectivity, or simply disconnection from the silenced hemisphere.

      2) Regarding the standardized maps for cortical brain regions in Fig. 1, the authors should explain in more detail how the imaging fields of view (FOV) were superimposed and aligned to the contours; it is briefly described in terms of aligning to bregma and lambda, but more information would help if there is concern for animal-to-animal variability (being off by 3 pixels in any direction is >0.5 mm). In Fig. 1d it looks like the imaging field of view is actually quite caudal, with very little motor cortex included. Is this a typical representation or was there some variability from animal to animal in the location of the imaging FOV? I recommend that the authors provide the exact location of the imaging FOV rectangle for each animal and an outline of where the PT stroke was located in the same figure. I would also recommend redrawing the contours that demarcate brain regions in Fig. 1c and d so that they do not appear so thick.

      3) I was surprised that spatiotemporal dynamics of the calcium signals did not change with learning the task; the authors suggest this is because mice learn the task so quickly (lines 401-8). I wonder if, alternatively, the reason is that they don't learn at all (since they did not report significant differences across days in control mice in their 2019 paper) or that the task doesn't require learning. The robot task extends the forelimb into an uncomfortable position and the mice may simply reflexively pull it back into a more comfortable resting position.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers were both very enthusiastic about the novelty and potential application of the calcium imaging technique. However, some major issues were raised that dampened enthusiasm for the paper. Some of the key issues raised were that essential controls are missing, key measurements (behavioral) of post-stroke recovery are not provided, there are some questions about the statistics that were applied to the data, and the sample size used in the experiments was also an area of question.

    1. Reviewer #3

      Overview and general assessment:

      In further untangling the organisation of occipitotemporal cortex (OTC), this paper attempts to explain, using behavioural and categorical models, the graded representations of images of animal faces and bodies, and objects (plants), in OTC and the face, body, and object-selective regions within OTC. The data suggest two main results. One, the representations in OTC seem to be (independently) related to an animate-inanimate distinction, a face-body distinction, and a taxonomic distinction between the images. Two, the representations in the face and body selective regions in OTC are related to the face/body images' similarity to human face/body respectively as gauged with a behavioural experiment. This similarity to human face/body subsumes the variance in face/body-selective OTC related to the authors' model of taxonomic distinction. These observations are used to suggest that the graded responses to animal images in OTC reported by previous studies (termed the animacy continuum in some cases) might just be based on animal resemblance to human faces and bodies rather than on a taxonomy. The claims, if valid, are a major addition to the ongoing discussion about the nature and underlying principles of the organisation of object representations in high-level visual cortex.

      There might be a multitude of issues, outlined below, with the way the observations are used to support the authors' claims. Addressing those issues might help reveal if the claims are indeed supported by the data which would be crucial in deciding whether to publish the current version of this paper.

      Main concerns:

      On "OTC does not reflect taxonomy" (line 390): Observations in Figure 4 suggest that the variance in face/body-selective OTC explained by the taxonomy RDMs is for the most part a subset of the variance explained by the human face/body similarity RDMs. This observation is used to suggest that "there is no taxonomic organisation in OTC" (line 423). Wouldn't such a statement be valid only if the taxonomy RDM did not explain any variance in OTC? Couldn't the observation that the variance it explains is also explained by human-similarity imply that the human-similarity is partly based on taxonomy? Also, the positive and strong correlation between the human-similarity RDMs and CNN RDMs in Figure 6 suggests that the human-similarity judgements reflect visual feature differences. However, how would you distinguish between the variance in the human-similarity RDM described by visual feature differences and by a more semantic concept such as taxonomy? Without disentangling these visuo-semantic factors (as done in Proklova et al. 2016 and Thorat et al. 2019) how could we be sure that OTC does not reflect taxonomy?
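      The distinction between unique and shared variance raised here can be sketched with a two-predictor variance partition. The sketch below is not the authors' analysis: the vectors stand in for vectorised RDM entries and are entirely hypothetical, and the partition uses the standard two-predictor identity R^2 = (r1^2 + r2^2 - 2 r1 r2 r12) / (1 - r12^2).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partition_two_models(y, m1, m2):
    """Split the variance in y explained by m1 and m2 into unique and
    shared parts. Requires that m1 and m2 are not perfectly correlated."""
    r1, r2, r12 = pearson(y, m1), pearson(y, m2), pearson(m1, m2)
    total = (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)
    return {
        "unique_1": total - r2 * r2,
        "unique_2": total - r1 * r1,
        "shared": r1 * r1 + r2 * r2 - total,
        "total": total,
    }


# Hypothetical dissimilarities (toy numbers, not from the manuscript).
neural = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1]
human_similarity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
taxonomy = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # coarser version of the same ordering

parts = partition_two_models(neural, human_similarity, taxonomy)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

      In this toy case the taxonomy model has almost no unique variance, yet most of the explained variance is shared between the two models, so a near-zero unique contribution alone does not show that taxonomy is absent from the representation.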

      On "OTC does not represent object animacy" (line 434): Figure 2 suggests that the animacy RDM is related to the OTC RDMs, even after factoring out the face/body and taxonomy RDM contributions. The point raised in the above section also makes it harder to suggest that animacy (the semantic part) is not represented in OTC. While the studies mentioned in the discussion are part of the ongoing debate on whether animacy is indeed represented in OTC, such a definitive statement seems out of place in the discussion in this paper where the data do not seem to suggest the absence of animacy in OTC.

      On "Deep neural networks do not represent object animacy" (line 468): "trained DNNs plausibly do not represent either a taxonomic continuum or a categorical division between animate and inanimate objects" (lines 487-488). In Figure 5 there is a clear negative correlation with the animacy RDM for most of the CNNs i.e. a "categorical distinction". Other models are not factored out in Figure 5 to suggest that the animacy RDM contribution is not unique as the statement suggests. Also, the way the CNNs are trained, they are not fed explicit animacy information so whatever variance is related to animacy as quantified by the categorical/behavioural models suggests that those models might be capitalising on visual feature differences. As such, indeed, CNNs do not represent animacy – but then that is a trivial statement – it seems they do represent visual feature differences which can be associated with animacy.

      Minor comments:

      (lines 53-54) "These studies equate the idea of a continuous, graded organisation in OTC with the representation of a taxonomic hierarchy" This is false. For example, in Thorat et al. 2019 this equality was questioned by dissociating between an agency-based (which would be similar to taxonomy) hierarchy and a visual similarity hierarchy. The point about differential focus on faces or bodies for different animals is a valid point and requires further research to be elucidated.

      For the taxonomy model, is it appropriate that the assumed distance between the Mammal 1 class and the Mammal 2 class is the same as the one between the Mammal 2 class and the Birds class? Is this what we expect in OTC? In terms of Spearman correlations this assumption might be fine, but when the model contributions are partitioned using regression (e.g. in Figure 2) the emphasis does shift to the magnitude of the distances rather than the ranks of the distances. This assumption might run into a bigger problem when comparisons between the taxonomy model and human-similarity models are made. The human-similarity model seems to capture the differences with the Mammal 1 class which are collapsed into one measure in the taxonomy model. Might this difference underlie the observed results where the variance captured by the taxonomy model is subsumed by the variance explained by the human-similarity model?
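      The difference between rank-based and magnitude-based comparisons can be shown with a small sketch. It compares two hypothetical taxonomy models that order the classes identically but space them differently; the numbers are illustrative assumptions, not values from the manuscript.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical model distances: same ordering, different spacing.
equal_steps = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
unequal_steps = [1.0, 1.0, 2.0, 2.0, 6.0, 6.0]
neural = [0.2, 0.3, 0.5, 0.4, 0.9, 0.8]  # toy neural dissimilarities

rho_equal = spearman(equal_steps, neural)
rho_unequal = spearman(unequal_steps, neural)
r_equal = pearson(equal_steps, neural)
r_unequal = pearson(unequal_steps, neural)
print(f"Spearman: {rho_equal:.4f} vs {rho_unequal:.4f}")
print(f"Pearson:  {r_equal:.4f} vs {r_unequal:.4f}")
```

      The rank correlation is unchanged by the re-spacing, whereas the linear fit is not, so a regression-based partition depends on the distances assumed between classes in a way that a Spearman comparison does not.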

      Would it be possible to acquire confidence intervals for the independent and shared variance explained by the 3 models in Figure 2 (and elsewhere where there is a similar analysis)? That might help us understand if the individual contribution of, say, the animacy model to OTC is robust. In the same vein, it might be good to indicate the robustness of the differences between the correlations of the different models with L/V-OTC in the figures.

      (lines 181-182) "the taxonomic hierarchy is more apparent in VOTC-all, while the face-body division is also still clearly present" What is the significance of this distinction (also echoed in lines 222-223 after the face/body ROI analysis)?

      Across the animals, how correlated are the human-body similarity and human-face similarity RDMs? It seems that different sets of participants provided these two models. Is that the case? Are the correlations between the two models at the noise ceilings of each other? Is there any specificity of model type with ROI type, i.e. does the human-face similarity model correlate more with L/V-OTC face than with L/V-OTC body and vice versa for the human-body similarity model? Basically, how different are the two models?

      In Figure 4, what do the correlations of the mentioned models look like with L/V-OTC-object? While it is interesting to understand the graded responses in the face and body areas, it might be good to see if the human-face/body similarity models also explain the graded responses in the, arguably more general, object-selective ROIs. Of course, here the object-selective ROI would share a lot of voxels with the body and face selective ROIs and the results might be similar, but it might still make sense to add the object-selective ROI results as a supplemental figure to Figure 4. Also in Figure 1, it is clear that the 3 ROIs do not cover all of L/V-OTC. In making claims about the representations in OTC at large, would it be useful to also analyse L/V-OTC-all (or go further and get an anatomically-defined region) with the human face/body-similarity models?

      What is the value of the noise ceiling for VOTC-body in Figure 4B?

      Why might the animacy model be negatively correlated with the CNN layer RDMs?

    2. Reviewer #2

      The authors sought to reconcile three observations about the organisation of human high-level visual cortex: 1) the reliable presence of focal selective regions for particular categories (especially faces and bodies) 2) broader patterns of brain responses that distinguish animate and inanimate objects and 3) more recent findings pointing to organisation reflecting a taxonomic hierarchy describing the semantic relationships amongst different species. To this end, they conducted a well-designed and technically sophisticated fMRI study following a representational similarity approach, seeking to pull apart these factors via careful selection of stimuli and comparison of evoked BOLD activity with predicted patterns of (dis)similarity. This was complemented by an analysis comparing similarities of these models with the properties of the deeper layers of several deep neural networks trained to categorise images. The authors draw "deflationary" conclusions, to argue that models of OTC emphasising semantic taxonomy or animacy are unnecessarily complex, and that instead the most powerful organisational principle to account for extant findings is by reference to representations that are anchored specifically on the face and the body.

      1) In many ways, this study is designed as a response to a few specific previous papers on related topics, notably two by Connolly et al., and others by Sha et al and Thorat et al. One limitation of the paper is that it perhaps relies too much on knowledge of that previous work - for example, points about the "intuitive taxonomic hierarchy" that build on that work were not fully explicated in the Introduction and only became gradually clear through the ms. More seriously, I am concerned that the authors' conclusions depend on methodological differences with the other work. The authors focused their analyses on focal regions identified as face-, body-, or object-selective in localiser runs. Judging from Figure 1B, this generates a rather restricted set of regions that are then examined in detail with various RDM analyses. In comparison, some of the previous studies worked with much broader occipito-temporal regions of interest, and/or used searchlight methods to find regions with specific tuning properties without defining regions in advance. To put it more bluntly, the authors may have put their thumb on the scale: by focusing closely on regions that by selection are highly face or body selective, they have found that faces and bodies are key drivers of response patterns. So in this light I was confused by the section beginning at line 442 ("Based on this...") in which the authors seem to dismiss the possibility that animacy dimensions are captured over a broader spatial scale, but they have not measured responses at that scale in the present study. In sum: applied to wider regions of occipitotemporal cortex, the same approach might plausibly generate very different findings, complicating the authors' ultimate conclusions.

      2) I was not fully convinced by the inclusion of the DNN analyses. In contrast with the brain/behaviour work, this did not seem strongly hypothesis driven, but rather exploratory, and more revealing of DNN properties than answering the questions about human neuroanatomy that the authors set out in the introduction. Would this part of the study be better reported in more detail, in a different paper?

      3) Looking at Figure 1C - is it the case that each of these data-to-model comparisons is equally well-powered? The three models are not equally complex: the animacy and face-body models are binary, while the taxonomy model makes a more continuous prediction. Potentially, then, this sets a higher statistical bar for the taxonomy model than the others. That is, it is consistent with a narrower and more specific set of the space of possible results: the binary models essentially say "A should be larger than B" but the taxonomy model says "A should be larger than B, should be larger than C, etc.". If not taken into account, this difference might put the taxonomy model at an unfair disadvantage when compared directly against the other two.
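      The asymmetry described in this comment can be made concrete with a small sketch. It counts how many strict ordering predictions a two-level model and a graded model make over the cells of a model dissimilarity matrix. The item labels, the six-item design, and the function names below are hypothetical illustrations, not the stimuli or models used in the manuscript.

```python
from itertools import combinations


def model_rdm_values(labels):
    """Upper-triangle model dissimilarities: absolute label difference per item pair."""
    return [abs(a - b) for a, b in combinations(labels, 2)]


def ordinal_constraints(values):
    """Count pairs of RDM cells for which the model predicts a strict ordering."""
    return sum(1 for x, y in combinations(values, 2) if x != y)


# Hypothetical labels for six items (assumptions for illustration only).
binary_labels = [0, 0, 0, 1, 1, 1]  # two-category model
graded_labels = [1, 2, 3, 4, 5, 6]  # six-level graded hierarchy

binary_values = model_rdm_values(binary_labels)
graded_values = model_rdm_values(graded_labels)

print(len(set(binary_values)), ordinal_constraints(binary_values))
print(len(set(graded_values)), ordinal_constraints(graded_values))
```

      In this toy case the graded model commits to more distinct dissimilarity levels and more ordering predictions than the binary one, which is the sense in which it may face a higher statistical bar.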

      Minor Comments:

      The authors report a series of VOTC/LOTC "all" analyses, and also a series of analyses of the specific ROIs that compose these unified ROIs (e.g. face or body specific regions only). In that sense, these analyses are partly redundant to each other, rather than being independent tests. If I read this correctly, then this suggests that statistical corrections may be in order to account for this non-independence, and/or some tempering of conclusions that rely on these as being two distinct indexes of brain activity.

    3. Reviewer #1

      In this fMRI study, Ritchie et al. investigated the representation of animal faces and bodies in (human) face- and body-selective regions of OTC, testing whether animal representations reflect similarity to human faces and bodies (as rated by human observers) or a taxonomic hierarchy. Results show that similarity to humans best captures the representational similarity of animal faces and bodies in face- and body-selective regions.

      This is a well-conducted study that convincingly shows that animals' similarity to humans is important for understanding responses to animals in face- and body-selective regions. More generally, it suggests that previously observed selectivity to animals is (at least partly) driven by responses in known (human) face- and body-selective regions. These findings make a lot of sense in the context of earlier work. I was, however, a bit puzzled by the framing of the study and the interpretation of the results. I hope my comments are useful for revising the paper.

      Major comments:

      1) The study is framed around a couple of recent fMRI studies (most notably Sha et al., 2015 and Thorat et al., 2019) claiming that the animacy organization in visual cortex reflects a continuum rather than a dichotomy. The submitted study contrasts this claim with the alternative of a face-body division. The authors conclude that taking into account the face-body division explains away the proposed animacy continuum (here taken as taxonomic hierarchy) account. I had difficulty following this logic. There seem to be at least three separate questions here: 1) does the animacy organization reflect activity in face/body-selective regions, or are there animate-selective clusters that are different from known face- and body-selective regions? 2) assuming that animals activate known face- and body-selective regions, are responses in these regions organized along a human-similarity continuum? 3) what is the nature of this continuum - conceptual and/or visual? Could you clarify which questions your study addresses? See below for more explanation.

      2) One of the conclusions relates to the first question ("Our results provide support for the idea that OTC is not representing animacy per se, but simply faces and bodies as separate from other ecologically important categories of objects."). I am missing a review of previous work here: there is already strong evidence showing that the animacy organization is closely related to the face/body organization. For example, Kriegeskorte et al. (2008) showed that the animate-inanimate distinction is the top-level distinction in OTC, with the animate category consisting of face and body clusters (rather than human vs animal); see also Grill-Spector & Weiner (2014) for perhaps the leading account of how animacy and face/body selectivity may be hierarchically related. Furthermore, earlier work reported responses to animal faces and bodies in human face- and body-selective regions. For example, Kanwisher et al. (1999) found responses to animal faces "as might be expected given that animal faces share many features with human faces" and concluded: "Thus the response of the FFA is primarily driven by the presence of a face (whether human or animal), not by the presence of an animal or human per se.". Tong et al. (2000) reached similar conclusions. Similar findings were also reported for animal bodies in body-selective regions, with stronger responses to animal bodies (e.g. mammals) that are more similar to humans (Downing et al., 2001; Downing et al., 2006). Considering this literature (none of which is cited in the Introduction), it seems rather well established that the animacy organization is directly related to face/body selectivity, that animal faces/bodies activate human face-/body-selective regions, and that this activation depends on an animal's similarity to human faces/bodies. (More generally, visual similarity is well-known to be reflected in visual cortex activity, including in category-selective regions (e.g. work by Tim Andrews)). 
      It would be helpful if the current study were introduced in the context of this previous work so that it is clear what new insights the current study brings.

      3) Related to the second question, the current results provide convincing evidence for a human-similarity dimension. However, contrary to the claims of the paper, the continua proposed in Sha et al. and Thorat et al. would seem to predict a similar result, considering that these studies defined the animacy continuum in terms of an animal's similarity to humans: Sha et al.: "the degree to which animals share characteristics with the animate prototype-humans."; Thorat et al.: "the animacy organization reflects the degree to which animals share psychological characteristics with humans". To model this dimension, rather than assuming a 1-6 taxonomic hierarchy, participants could rate the animals' similarity to humans, as for example done in Thorat et al. You will likely find that these ratings correlate highly with the visual similarity ratings in the current study. The obvious problem is that animals that are similar to humans tend to share both conceptual and visual properties with humans. By the way: it would be relevant to discuss Contini et al. (2020) in the Introduction, as this paper similarly proposed a human-centric account.

      4) This brings us to the third question, whether "similarity to humans" is purely visual (i.e., image based) or whether conceptual similarity also contributes to explaining responses. Sha et al. could not address this question because their stimuli confounded the two dimensions. However, it was not clear to me that the submitted study can address this question any better, considering that the stimuli were not designed for distinguishing the two dimensions either: bodies/faces that are visually more similar to humans will belong to animals that are conceptually more similar to humans as well.

      5) The study is quite narrowly focused on debunking the taxonomy hierarchy supposedly proposed by previous studies. If this is the goal, you would need to stay close to these previous studies in terms of analyses and regions of interest. If not, it is hard to compare results across studies. For example, the abstract states that: "previous studies suggest this animacy organization reflects the representation of an intuitive taxonomic hierarchy, distinct from the presence of face- and body-selective areas in OTC." I'm not sure who made this claim, but if this was the claim that you want to test, wouldn't you need to look outside of face- and body-selective regions for this taxonomic hierarchy? Or if the study is a follow-up to Sha et al., then it would be useful to see their analyses repeated here, or at least present results in comparable ROIs. Alternatively, you could detach the research question from these studies and focus more on animal representations in face- and body-selective regions (after introducing what we know about these regions).

      Minor comments:

      1) The third paragraph of the Introduction mentions "these studies", but it is not clear which specific studies you refer to (the preceding paragraph cites many studies).

      2) Did you correct for multiple comparisons when comparing the models (e.g. p.10)?

      3) Could the human-similarity ratings partly reflect conceptual similarity? Might it not be hard for participants to distinguish purely visual properties from more conceptual properties? Perhaps the DNNs can be used to create an image-based human-similarity score?

      4) It was not entirely clear to me what the DNNs added to the study (which asks a question about human visual cortex). These are also not really introduced in the Introduction, and are only briefly mentioned in the Abstract. Was the idea to directly compare representations in DNNs to those in OTC?

      5) p.15: refers to Figures 6A and 6B instead of 4A and 4B

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that your paper reports a well-conducted study revealing several interesting results. However, they were ultimately not convinced that one of the main conclusions of the paper – the absence of an animal taxonomy – was sufficiently supported by the presented data, also considering the difference in analysis methods compared to previous studies. Furthermore, they noted that the reported results are somewhat incremental relative to earlier work reporting responses to animal faces/bodies in face-/body-selective regions.

    1. Reviewer #3

      PREreview of "Evolutionary transcriptomics implicates HAND2 in the origins of implantation and regulation of gestation length"

      Authored by Mirna Marinić et al. and posted on bioRxiv DOI: 10.1101/2020.06.15.152868

      Review authors in alphabetical order: Monica Granados, Katrina Murphy, Maria Sol Ruiz, Daniela Saderi

      This review is the result of a virtual, live-streamed journal club organized and hosted by PREreview and eLife. The discussion was joined by 17 people in total, including researchers from several regions of the world, the last preprint author, and the event organizing team.

      Overview and take-home message:

      In this preprint, Marinić et al. begin the beautiful exploration of gene involvement at the maternal-fetal interface of pregnancy evolution with a look at the importance of a known early-pregnancy gene, HAND2. The research team's findings shown through uterine models and a combination of cell, gene, and data analysis demonstrate HAND2's roles in supporting progesterone in placental mammals by down-regulating estrogen in time for implantation, and through IL15 signaling, where both the promotion of immune and placental cell migration as well as up-regulation of estrogen at the end of term for a healthy gestation length is noted. This important work also sheds some light on progesterone's role in non-placental mammal pregnancy where estrogen continues to be produced throughout the pregnancy. Although this work is an important addition to the field of pregnancy evolution, there are some points that need clarification and a few minor concerns that could be addressed in the next version. These are outlined below.

      Positive feedback:

      1) The selection of HAND2 as a hypothetical regulator of gestation was based on previous knowledge, but the authors supported this selection after an extensive phylogenetic analysis of genes expressed in the endometria of pregnant/gravid organisms from several Eutherian and non-Eutherian species.

      2) Several participants evaluated the results as encouraging for looking into other models such as organoids (as stated by the manuscript), and as a great start for a deeper understanding of pregnancy evolution via the study of gene expression.

      3) The potential implications of these results in the field of abnormalities in pregnancy/infertility were also mentioned as relevant.

      4) Definitely recommended for peer review because this is a great start for a deeper understanding of genes involved in the evolution of pregnancy!

      5) I think the fact that there could be a mechanism involved in HAND2 that ends gestation is really interesting.

      6) Cool to learn that HAND2 expression was specific to fibroblasts and the fibroblasts influence signaling in other cell types.

      7) A proposal of a new hypothesis based on "evolutionary" observations.

      8) Enjoyed learning from the author that a uterus is a counter-intuitive place with immune cells making up half the cells to allow for tolerance towards the pregnancy process.

      9) The methods section was quite detailed; including a GitHub repository and on page 17, a data availability statement for images, genes, and related data. I found the manuscript really interesting. Enjoyed it very much!

      10) In general, the manuscript was easy to follow and figures were logically arranged.

      Concerns:

      Areas that could use more clarification:

      1) It was helpful to hear from the author that the known HAND2 gene wasn't knocked out in mice, so it was an easy early pregnancy gene to start with.

      2) To reproduce the study, there were a couple of questions around the production of the conditioned media including, how long were the cells incubated in the media and what was the volume of the media used. Can more details be shared in the next version?

      3) Can you further explain why the opossum was used to measure the estrogen levels?

      4) Please explain why the researchers decided on the TPM=2 expression cut-off.

      -We heard from the author that genes with TPM less than 2 are functioning in the cell; this might be nice to add in the next version.

      5) Can you include your thoughts on why mammals have evolved this way? This might be a good addition to the discussion.

      6) I think that given the technical model limitations present in the study of the uterus, and in the study of different species, it would deserve some comments about limitations in order to highlight these great findings.

      7) The relationship between ESR1 and HAND2 is a little unclear. Is ESR1 expression correlated with HAND2 expression in all species studied?

      Acknowledgments: We thank all participants for attending the live-streamed preprint journal club. We are especially grateful for both the last author's contributions to the discussion and for those that engaged in providing constructive feedback.

      Below are the names of participants who wanted to be recognized publicly for their contribution to the discussion:

      Monica Granados | PREreview | Leadership Team | Ottawa, ON

      María Sol Ruiz | CONICET-University of Buenos Aires | Postdoctoral Researcher | Buenos Aires Argentina

      Katrina Murphy | PREreview | Project Manager | Portland, OR

    2. Reviewer #2

      The manuscript "Evolutionary transcriptomics implicates HAND2 in the origins of implantation and regulation of gestation length" by Marinić et al. uses an innovative expression dataset in an evolutionary framework to identify a set of transcripts whose endometrial expression emerged at the eutherian stem lineage. One of these is the transcription factor HAND2. Using both existing datasets and experimental data they build a model of the activity of HAND2 and its associated protein IL15 at the maternal-fetal interface and implicate the proteins in both the evolution and disorders of pregnancy. I highly recommend this manuscript. This work illustrates the utility of evolutionary analysis for elucidating functional mechanisms of complex disorders. The authors support their evolutionary analysis with a thorough characterization, including additional experimental data, of their hypothesized gene association. This work substantially contributes to our knowledge of the evolution and diseases of pregnancy.

      I have only two points of inquiry that I believe the authors should address in the manuscript:

      1) Of the 149 genes that unambiguously evolved endometrial expression why was only HAND2 analyzed? I am not suggesting that each gene be followed up with this level of rigor but would you hypothesize that each of the genes you identified play a role in eutherian reproduction? Or are there other major innovations that some of these genes may be associated with? How frequently would this pattern occur by chance?

      2) Figures 2F and 4F - there appears to be a gap in the data points during the third trimester (which looks like it says "thirdr"). Is there still a negative trend if each section is analyzed independently, as if they were independent datasets? In other words, could this linear trend be composed of two separate trends instead?

    3. Reviewer #1

      Parsing mechanisms of disease from the perspective of evolutionary biology is an interesting approach. This perspective may be particularly advantageous when focussing on the 'bigger picture' as it is perhaps less constrained by details that tend to preoccupy more conventional disease-focussed studies, such as clinical phenotyping, timing of biopsies, sample size, validation studies etc. In this study, Marinić and colleagues made use of a wealth of publicly available data sets to argue for a role of the HAND2-IL15 axis in endometrial cells in implantation and, more importantly, the onset of parturition. The observation that enhancer regions in both HAND2 and IL15 harbour SNPs associated with gestational length/preterm birth renders the study timely and compelling. However, to my knowledge, the impact of these SNPs on the expression of either gene is not known. Further, the lack of validation studies on clinical samples renders the proposed mechanism plausible but speculative, as acknowledged by the authors. There are several other issues that require clarification:

      1) Fig. 1C appears interesting but there is no comparator or controls. Without comparison, for example the histotrophic phase, it appears difficult to conclude that estrogen signaling genuinely persists during pregnancy in the opossum. pESR1 staining in the tissue section is ubiquitous with no evidence of nuclear localisation, raising concerns about antibody specificity. KI67 staining may be more informative?

      2) The authors used a large single-cell RNA-seq data set to map HAND2 expression at the human maternal-fetal interface in the first-trimester of pregnancy (Vento-Tormo et al. 2018). They demonstrate that HAND2 expression is confined to 3 maternal subsets, termed endometrial stromal fibroblast (ESF) 1 and 2 and decidual stromal cells (DSC). If I am not mistaken, in the Vento-Tormo paper, these populations of cells were labelled decidual stromal cells 1-3 (DS1-3), emphasizing that all these cells were decidualized, as expected in pregnancy. Vento-Tormo et al. further demonstrated that the differences in gene expression between DS subsets relate to their topography in the maternal tissue. Hence, it is confusing that the authors changed the terminology of these subsets, giving the erroneous impression of two undifferentiated ESF populations and a single DS/DSC population in pregnancy. By doing so, the inference seems to be that T-HESC, a telomerase-transformed endometrial stromal cell line used in functional studies, is a good model of ESF populations in vivo, which is doubtful.

      3) Fig. 2G. The authors state that 'We also used previously published gene expression datasets (see Methods) to explore if HAND2 was associated with disorders of pregnancy and found significant HAND2 dysregulation in the endometria of women with infertility (IF) and recurrent spontaneous abortion (RSA) compared to fertile controls' - This bold statement is based on microanalysis of merely 5 biopsies in each group. Considering the intrinsic temporo-spatial heterogeneity of the cycling endometrium, this sample size is grossly inadequate. The microarray study was published in 2011. In fact there are several more recent and more robust datasets available (e.g. 115 IF biopsies in GSE58144 and 20 RM biopsies in GPL11154). These comments also apply to Figure 4G.

      4) The authors also state 'HAND2 was not differentially expressed in ESFs or DSCs from women with preeclampsia (PE) compared to controls (Figure 2G).' It is unclear which dataset this was based on. The authors' claim seems to indicate that this was single-cell data? In any case, the sample size is again grossly inadequate to draw robust conclusions without further validation in a much larger cohort of samples.

      5) Figure 3. The authors decided to knock down HAND2 in T-HESC, a telomerase-transformed endometrial stromal cell line, and performed RNA-seq 48 h later. The cells were not decidualized or even treated with progesterone. Hence, the rationale for this experiment, and its relevance to the in vivo situation, is genuinely lost on me. See also comment regarding the renaming of DS subsets into ESF. In an undifferentiated state, these cells are not representative of gestational cells (with the possible exception that decidual senescence is characterised by progesterone resistance, i.e. re-activation of genes that are suppressed by progesterone). More importantly, as HAND2 is critical for the identity of these cells, perhaps knockdown triggers a stress response? For example, from the data presented in Supplementary Table 6 (it would be helpful to add gene names), one of the most strongly up-regulated genes upon HAND2 knockdown is BLCAP2 [Log2(FC): 10.2], which encodes a protein that reduces cell growth by stimulating apoptosis.

      6) The authors illustrated the importance of examining the right cellular state: HAND2 knockdown in T-HESC increases IL15 expression, whereas it is well established that HAND2 knockdown in decidual cells decreases IL15 expression. Further, IL15 is strongly induced upon decidualization, and previous studies on primary endometrial stromal cells demonstrated that IL15 secretion is undetectable in undifferentiated cells whereas it is abundantly secreted upon decidualization (PMID: 31965050). Thus, to be informative, the authors should repeat HAND2 KD in decidualizing T-HESC and measure IL15 secretion in both states, with and without HAND2 knockdown.

      7) Fig. 3B - it is unclear what is compared here: genes deregulated upon HAND2 knockdown in T-HESC versus knockdown of NR2F2, FOXO1 and GAT2 in decidualized primary cultures? If this is the case, the comparison is not informative as it involves two different cell states. It is surprising that FOSL2 was not included in this analysis.

      8) I do not understand the relevance of the experiments described in Figure 5 to the context of gestation length or preterm birth. Trophoblast invasion will have been completed in the 2nd trimester of pregnancy - what is the purpose/message of these experiments? What is the level of IL15 secreted by these cells? Again, the T-HESC appear not to be decidualized - so, what is the relevance to either the midluteal implantation window or gestation?

      9) What is the evolution of IL15 expression at the maternal-fetal interface? Does it parallel HAND2?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      Parsing mechanisms of disease from the perspective of evolutionary biology is a powerful approach. The manuscript by Marinić et al. uses an innovative expression dataset in an evolutionary framework to identify a set of transcripts whose endometrial expression emerged at the eutherian stem lineage. One of these is the transcription factor HAND2. Using both existing datasets and experimental data the authors build a model of the activity of HAND2 and its associated protein IL15 at the maternal-fetal interface and implicate the proteins in both the evolution and disorders of pregnancy. The work illustrates the utility of evolutionary analysis for elucidating functional mechanisms of complex disorders and substantially contributes to our knowledge of the evolution and diseases of pregnancy.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Dear reviewers,

      Thank you very much for your constructive and helpful remarks and suggestions!

      We marked the changes in the manuscript in yellow.

      Our replies to the specific points:

      Reviewer #1 In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991).

      As suggested, we added the reference to the introduction.

      Reviewer #1 The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      We agree that HCF173 is an obvious candidate, although its interaction could be mediated via additional proteins. Alice Barkan’s group has demonstrated that in maize HCF173 binds to the same region upstream of the translation initiation region (McDermott et al., 2019) where we detected a footprint (Supplemental Figure S11A-D). Furthermore, McDermott et al. showed that the binding sequence is conserved. We would like to analyze this question in more detail, but we currently have no approach available in the lab to specifically isolate psbA mRNA with its bound proteins for this analysis and therefore have to postpone the answer to this question to future studies.

      Reviewer #2: Important changes to make before full submission: 1) It is becoming clear that the translation efficiency (TE) is often not a calculation of translational output from specific mRNAs but in fact is better described as ribosome association. There can be many reasons for increased ribosome association, including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to demonstrate directly increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.

      We want to stress that we have chosen a condition that is well known to increase psbA translation in higher plants as shown in the literature with different methods (e.g. Chotewutmontri and Barkan, 2018; Schuster et al., 2020). The protein encoded by psbA, the D1 subunit of photosystem II, has an increased turnover in high light, i.e. a higher amount of D1 has to be produced to compensate for the increased degradation of photodamaged D1 (Mulo et al., 2012; Li et al., 2018).

      Although there is a lot of evidence in the literature for good correlation of translation efficiency as determined by ribosome profiling and protein synthesis, the reviewer raised a valid concern. Ribosome pausing or even ribosome stalling could also cause increased ribosome binding and thereby increased amounts of ribosome footprints. Therefore, we analyzed ribosome pausing in selected genes including psbA and rbcL. The pattern of ribosome pausing was very similar in low and high light (new Supplemental Figure 14), which rules out any ribosome stalling at specific sites or drastic changes in ribosome pausing. To analyze if there is increased ribosome pausing, we determined the fraction of footprints at pause sites compared to the total number of footprints. We used two different pause scores as cutoffs to determine pause sites. To include as many pausing events as possible, we used a pause score of 1, i.e. everything higher than the mean ribosome density per nucleotide of the corresponding coding region (Gawronski et al., 2018). This fraction was unaltered in low and high light (new Supplemental Figure 14). With a more stringent pause score of 20 (20 times higher ribosome density than the mean), an increase of ribosome pausing in high light was detected for psbA, whereas we did not find differences between high and low light for rbcL and psaA. However, this increase in pausing at the psbA mRNA is insufficient to explain the increase in the total amounts of ribosome footprints. Additional pause scores were tested; the value for the psbA fraction with a pause score of 20 included in Supplemental Figure S14 showed the largest difference.
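      The pause-site calculation described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function name, the strict "greater than" comparison at both cutoffs, and the toy footprint counts are assumptions made for the example.

```python
def pause_fraction(density, cutoff):
    """Fraction of footprints located at pause sites.

    density: per-nucleotide footprint counts for one coding region.
    cutoff:  pause-score cutoff; a position counts as a pause site when its
             density divided by the mean density of the region exceeds it
             (strict comparison is an assumption of this sketch).
    """
    total = sum(density)
    if total == 0:
        return 0.0
    mean = total / len(density)
    at_pause_sites = sum(d for d in density if d / mean > cutoff)
    return at_pause_sites / total


# Hypothetical counts for a 10-nucleotide region (illustration only).
toy_density = [1, 1, 1, 1, 1, 1, 1, 1, 1, 31]

permissive = pause_fraction(toy_density, 1)   # everything above the mean
stringent = pause_fraction(toy_density, 20)   # more than 20 times the mean
print(permissive, stringent)
```

      Comparing this fraction between low-light and high-light samples at a given cutoff corresponds to the comparison reported in the reply.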

      Reviewer #2: Strongly suggested additions to the manuscript to improve its significance before publication: 1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interact with the 5' UTR of psbA in a high-light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.

      See comment to reviewer 1. McDermott et al. (2019) describe HCF173 as relatively specific for psbA. Therefore, we do not assume that other genes with weak Shine-Dalgarno sequences are regulated via HCF173 but via different proteins using a similar molecular mechanism to influence the mRNA secondary structure at the translation initiation region.

      Reviewer #2: Strongly suggested additions to the manuscript to improve its significance before publication: 2) Mutational analysis of the RBP binding site, together with changes to the secondary structure around the start codon based on the new structure maps, to show the effects of these various changes on protein output would provide important new findings on how important RBP binding is, compared with the RNA secondary structure changes, for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.

      We agree with the reviewer that this would be a very interesting study. Unfortunately, it requires a larger collection of lines with mutated psbA sequences. Plastid transformation in Arabidopsis thaliana is still technically demanding and time consuming. Even in the case of Nicotiana tabacum, for which plastid transformation is well established, such a project would likely need several years. We therefore think that such a study is beyond the scope of the current manuscript.

      Reviewer #3 1. In this paper, the authors mention that DMS can modify four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from 4 nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that, under cell-free conditions, DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, Fig. 1B). In Lars B. Scharff's paper, which is cited by the authors, it is also mentioned that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and known RNA structure using G/U or all four nucleotides to show that DMS reactivity from G and U is also reliable enough to be used. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as that of A/C. I don't think G and U can be used to predict RSS.

      We agree with the reviewer that DMS reactivities at G/U are less reliable than those at A/C. This was shown by Mustoe et al. (2019) and by us for chloroplast rRNAs (Gawronski et al., 2020, Plants). We included a correlation of the known 16S rRNA secondary structure and the DMS reactivities at the different nucleotides (Supplemental Figure S5A) that demonstrates that the DMS reactivities at G/U actually contain information about rRNA secondary structure. This analysis demonstrated again that the reactivities at G/U are less reliable than at A/C. Therefore, we added an analysis of the more reliable A/C for comparison with the results for all four nucleotides (Figure 1D-F, 3C-F).

      Reviewer #3 2. Is the 5' UTR the only region that shows RSS changes? If not, how do RSS changes in other regions contribute to translation?

      Translation initiation in plastids is mainly influenced by the secondary structure of the translation initiation region, especially at the cis-elements required for the recognition of the start codon. In addition, we have analyzed several other regions, e.g. the coding regions, the coding regions without the sequences next to the start codon, the end of the coding region, and the complete 5’ UTR (Supplemental Figure S14). We added a more detailed analysis of the changes in secondary structure of the coding region of those genes we focus on (Supplemental Figure S16). This shows that the secondary structure changes of the complete coding region correlate negatively with translation efficiency (see also Supplemental Figure S14G). A similar observation was made in E. coli and was attributed to differences in translation initiation, which are mainly influenced by the secondary structure of the translation initiation region (Mustoe et al., 2018).

      Reviewer #3 3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      The parameters for the RNA secondary structure predictions in Figure 2 are not identical (see Figure legend). For all structure predictions, the DMS reactivities were used as constraints, but only for the high light structure was the sequence of the RNA binding protein’s footprint forced to be single-stranded. These structure predictions are included to illustrate the mRNA structures in the presence and absence of an RNA binding protein. These structures are based on the observation that the two halves of the stem loop structure have different DMS reactivities in response to high light. The sequence including the protein footprint has lower DMS reactivities in both low and high light. This is in agreement with both a double-stranded sequence and a protein-bound sequence. In contrast, the other half of the stem loop, the sequence including the cis-elements of the translation initiation region, has increased DMS reactivities in high light, indicating that it is single-stranded. This suggests that there is protein binding in high light preventing the formation of the inhibitory stem loop.

      Reviewer #3 4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should show a translation change under high light conditions. I suggest that the authors label the dot representing psbA.

      Thank you very much for this suggestion! We marked psbA in the correlation plots (Supplemental Figure S12). The changes in the transcript levels are really minor, whereas for some genes the translation efficiency changes (see Figure 4 and Supplemental Figure S13).

      Reviewer #3 5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      The different plant material was chosen because of the different requirements during probing. In this context, we would like to point out that observing the same changes in the translation initiation region in response to high light in different developmental stages is a stronger confirmation than observing the same response at the same developmental stage. This indicates that the response is not specific for a developmental stage.

      Reviewer #3 6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes after 0.5 h of high light. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label in Fig. S12 several DEGs whose expression changes under high light treatment.

      Supplemental Figure S12 contains only plastid-encoded RNAs, whereas Huang et al. (2019) focused on nuclear-encoded mRNAs. We clarified the figure legend of Supplemental Figure S12 by adding “of the plastid-encoded genes”. The values for the individual genes can be seen in Supplemental Figure S13.

      Reviewer #3 7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method for the MNase footprint. Does it work for RNA footprinting?

      The methods used are described under “Ribosome profiling (Ribo-seq)” and “Processing of Ribo-seq and RNA-seq reads” in Material and Methods. The approach was very similar to the one used for ribosome profiling, with the difference that smaller read lengths were also included in the analysis (18-40 nt instead of 28-40 nt). We did this because many plastid RNA binding proteins have footprints that are smaller than a ribosomal footprint. The described footprint is the only one detected near the translation initiation region of psbA. Binding of HCF173 was detected by the Barkan group in the same region using a RIP-Seq analysis combined with RNase I digestion (McDermott et al., 2019), which confirms that our approach is working. We added a reference to the method section in the results part to clarify which approach was chosen.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      RNA can fold into secondary and tertiary structures through base-pairing. RNA structure plays a crucial role in gene function and regulation, including transcription, processing, translation and decay. Plants acclimate to fluctuating light conditions to optimize photosynthesis and minimize photodamage. Translational regulation is known to be one strategy of this acclimation. It has been reported that translation of psbA, encoding the D1 reaction center protein of Photosystem II, is increased under high light conditions. The light-controlled psbA translation has been intensively studied and was suggested to be related to redox/thiol signals, the ATP status, and certain proteins. In this manuscript, Gawroński et al. explored the possible link between RNA secondary structure and translational efficiency. They adopted DMS-MaPseq and SHAPE-seq methods to profile the RNA secondary structure in the 5' UTR of psbA under low light and high light conditions. The results showed that the DMS and SHAPE reactivities of the Shine-Dalgarno (SD) sequence, start codon and as-SD region are higher under high light than under the low light control, indicating that the psbA translation initiation region becomes more single-stranded and accessible under high light. MNase digestion and DMS reactivity analysis suggested that protein binding might cause the change in RNA secondary structure of the psbA translation initiation region. In addition, the authors probed the RNA secondary structure of the translation initiation region of rbcL, which encodes the large subunit of Rubisco, and found no change in the RNA structure of rbcL, although the translation of rbcL is also increased under high light. To address the question of why RNA structure changes are related to high light-dependent translational activation of psbA but not rbcL, plastome-wide translational efficiency and RNA structure were analyzed. The results showed a significant correlation between RNA secondary structure changes and translational efficiency changes in the chloroplast-encoded mRNAs with weak SDs (such as psbA), but not in those with strong SDs (such as rbcL).

      The light-dependent translational activation of psbA is critical for maintaining photosynthetic homeostasis. Also, the molecular mechanism of RSS's impact on translation is still elusive. The topic of this study is therefore very important. However, this study only describes the phenomenon of RNA secondary structure changes in the translation initiation region and does not give further evidence to validate the effect of these changes on the translational activation of psbA under high light. Besides, the evidence that protein binding causes the RNA structure changes is weak and unclear. In addition, there is much room for improvement in this work.

      1. In this paper, the authors mention that DMS can modify all four nucleotides under alkaline conditions. Because the chloroplast is slightly alkaline, the authors use DMS reactivity from all four nucleotides to model RNA secondary structure. Kevin Weeks's paper shows that under cell-free conditions DMS has very limited ability to modify single-stranded G and U compared to A and C (Anthony M. Mustoe et al., 2019, PNAS 116: 24574, Fig. 1B). Lars B. Scharff's paper, which is cited by the authors, also mentions that A and C are more reliable for modelling RNA secondary structure. The authors might need to calculate the correlation between the DMS data and a known RNA structure using G/U or all four nucleotides to show that DMS reactivity at G and U is also reliable. Also, in Fig. S3B, the reproducibility of G/U between replicates is not as good as that of A/C. I don't think G and U can be used to predict RSS.

      2. Is the 5' UTR the only region that shows RSS changes? If not, how do RSS changes in other regions contribute to translation?

      3. In Fig. 2A and 2B, the DMS reactivities seem very similar under low light and high light. Why did the authors obtain significantly different RNA secondary structures? Are the parameters for low light and high light the same when modelling RNA structure?

      4. In Fig. S12, the correlation between HL and LL in Ribo-seq and RNA-seq is high, which means no significant changes upon light change. In this paper, psbA should show a translation change under high light conditions. I suggest that the authors label the dot representing psbA.

      5. I suggest using plants at the same stage for DMS-MaPseq and SHAPE probing.

      6. In Huang's paper (Jianyan Huang et al., 2019, Cell Reports 29: 4186-4199), there are many differentially expressed genes after 0.5 h of high light. However, in the RNA-seq data here, the correlation between high light and low light conditions is very high (Fig. S12). Why? Also, it would be nice if the authors could label in Fig. S12 several DEGs whose expression changes under high light treatment.

      7. For the MNase footprint method, is the as-SD region the only region that shows enrichment under high light conditions? Besides, please provide the detailed method for the MNase footprint. Does it work for RNA footprinting?

      Significance

      see above

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This study uses multiple high-throughput sequencing approaches to probe the secondary structure of the chloroplastic psbA mRNA during low and high light treatments. The authors are able to demonstrate a shift in secondary structure around the start codon of this mRNA in response to the high light treatment as compared to low light conditions. This structural shift is also accompanied by an RBP binding event that may also be involved in regulating translation from this mRNA in response to high light. I think this study is very interesting and timely. However, I think determining the relative contributions of the secondary structure and RBP binding changes to potential increases in protein output from this mRNA in response to high light would improve this manuscript. I also think directly looking at protein levels through a straightforward Western blot to show increased psbA protein in response to high light treatment is an important addition to this study. I outline my few suggested experimental additions for this manuscript below.

      Important changes to make before full submission:

      1) It is becoming clear that translation efficiency (TE) is often not a measure of translational output from specific mRNAs but is better described as ribosome association. There can be many reasons for increased ribosome association, including ribosome stalling and increased translational engagement. It would be good for the authors to add a simple Western blot to directly demonstrate increased protein output from psbA during high light as compared to low light treatments. This figure could be added to Figure S1.
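
      To make the distinction concrete: TE is commonly computed as the ratio of ribosome footprint abundance to transcript abundance for a gene, so it reports ribosome occupancy per mRNA and not the amount of protein made. The following is a minimal sketch under that assumption. The function names and the read densities are hypothetical and are not taken from the manuscript.

```python
import math

def translation_efficiency(ribo_density, rna_density):
    """Ribosome footprint density divided by mRNA density for one gene."""
    if rna_density <= 0:
        raise ValueError("RNA density must be positive")
    return ribo_density / rna_density

def log2_te_change(te_high_light, te_low_light):
    """log2 fold change of TE between two conditions."""
    return math.log2(te_high_light / te_low_light)

# Hypothetical normalized densities for one gene (illustrative values only).
te_ll = translation_efficiency(ribo_density=200.0, rna_density=100.0)
te_hl = translation_efficiency(ribo_density=420.0, rna_density=105.0)
te_shift = log2_te_change(te_hl, te_ll)
print(te_ll, te_hl, te_shift)
```

      A stalled ribosome adds to the numerator just as an elongating one does, which is why a direct protein measurement is a useful complement to TE.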

      Strongly suggested additions to the manuscript to improve its significance before publication

      1) Identifying the RNA-binding protein(s) (likely HCF173, which may be in a complex with other proteins) that interact with the 5' UTR of psbA in a high light-dependent manner would increase the significance of this study. Finding that this protein binds to other plastid transcripts with weak Shine-Dalgarno sequences would also be a nice addition to this study.

      2) Mutational analysis of the RBP binding site, together with changes to the secondary structure around the start codon based on the new structure maps, to show the effects of these changes on protein output would provide important new findings on how important RBP binding is, compared to the RNA secondary structure changes, for regulating protein output from psbA. It could also allow the demonstration of the dependence or independence of these two features in regulating translation from chloroplast mRNAs.

      Significance

      This study definitely focuses on a research topic that is currently of interest and highly timely.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      This manuscript addresses the regulation of chloroplast translation, an important topic in chloroplast biology. The authors show that specific changes in the secondary structure of the 5'UTR of the psbA mRNA involving the Shine-Dalgarno sequence and the AUG initiation codon can be correlated with changes in translational efficiency during a low light to high light shift. Based on indirect evidence they propose that this may be caused by binding of specific proteins to this region. They also show that this correlation appears to be valid to some extent for other mRNAs with a weak SD sequence. The technical quality of this manuscript is excellent and the manuscript is clearly written.

      Additional remarks

      In the Introduction the authors need to cite earlier work in Chlamydomonas which first showed that binding of specific proteins to the psbA 5'UTR is correlated with increased translation in the light (Danon et al. 1991). The paper could be improved by testing for protein binding to the footprint region in high vs low light. An obvious candidate is HCF173.

      Significance

      This work provides valuable new insights into the molecular mechanisms involving the psbA 5'UTR in the initiation of chloroplast translation.

      This work will be of interest to a wide audience interested in the mechanisms of translational regulation.

      My expertise is in chloroplast biogenesis and in assembly and regulation of the photosynthetic apparatus

    1. Reviewer #3

      This work presents a method to analyze integrated mutation and transcript data to identify mutations in individual genes that drive similar and divergent transcriptional signatures. Overall the work appears novel and provides potential insights that could generate hypotheses worthy of further study. The work is limited in that confirmation is done only for a set of mutations on GATA3 with existing drug sensitivity cell line data. It would be helpful to have an indication that more than a single result from the large study provides validated insights.

      Concerns:

      1) While the approach is nicely detailed, one critical aspect remains unclear. An AUC is generated for each prediction of mutation from transcriptional signature based on cross-validation. I could not deduce from the following statement exactly how this was done, given the introduction here of a mean score: "We measured a classifier's ability to identify a transcriptomic signature for its assigned task using the area under the receiver operating characteristic curve metric (AUC) calculated using samples' mean scores across ten iterations of four-fold cross-validation."

      2) The claim "These results are striking in that predicting the presence of a rarer type of mutation should, everything else being equal, be more difficult owing to decreased statistical power" is really applicable to a hypothesis test, so it is not immediately obvious that it applies in the case of cross-validation generating an AUC.

      3) The claim that a Spearman correlation of AUCs between methods is a validation of robustness is difficult to accept. Note that if you uniformly subtracted 0.5 from every AUC, the result would give a Spearman correlation of 1 with the original data, but it would not be a very robust result. Why is Pearson correlation not used?

      4) It is clear that many classifiers were actually run, and it would be helpful to have the number actually summarized. This ties into the concern with only a single validation in drug sensitivity data, since there may be false discoveries given a large number of classifiers.
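
      Concerns 1 and 3 can be illustrated with a short sketch. The first part shows one possible reading of the quoted sentence: each sample's held-out score is averaged over ten repeats of four-fold cross-validation, and a single AUC is computed from those means. The second part checks the invariance that concern 3 relies on. Everything here is an assumption made for illustration: the one-feature "classifier", the data, and the function names are hypothetical, and this is not the authors' pipeline.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_cv_scores(x, y, n_repeats=10, n_folds=4, seed=0):
    """Average each sample's held-out score over repeated k-fold CV.

    Toy scorer (an assumption): feature value minus the midpoint of the
    two class means estimated on the training folds. Folds are not stratified.
    """
    rng = random.Random(seed)
    idx = list(range(len(x)))
    held_out = [[] for _ in x]
    for _ in range(n_repeats):
        rng.shuffle(idx)
        for k in range(n_folds):
            fold = idx[k::n_folds]
            test = set(fold)
            train = [i for i in idx if i not in test]
            mu1 = mean(x[i] for i in train if y[i] == 1)
            mu0 = mean(x[i] for i in train if y[i] == 0)
            midpoint = (mu0 + mu1) / 2
            for i in fold:
                held_out[i].append(x[i] - midpoint)
    return [mean(s) for s in held_out]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def spearman(a, b):
    """Pearson correlation of ranks (no tie handling; inputs here have no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(a), ranks(b))

# Part 1: hypothetical one-feature data, 20 wild-type and 20 mutant samples.
x = [i / 10 for i in range(20)] + [1.0 + i / 10 for i in range(20)]
y = [0] * 20 + [1] * 20
scores = mean_cv_scores(x, y)
cv_auc = auc(scores, y)
print("AUC from mean held-out scores:", round(cv_auc, 3))

# Part 2: hypothetical AUCs from one method, then two distorted copies.
aucs_a = [0.55, 0.62, 0.71, 0.80, 0.93]
shifted = [a - 0.5 for a in aucs_a]   # uniform subtraction of 0.5
warped = [a ** 8 for a in aucs_a]     # monotone but nonlinear distortion
print(spearman(aucs_a, shifted), pearson(aucs_a, shifted))
print(spearman(aucs_a, warped), pearson(aucs_a, warped))
```

      In this sketch the Spearman coefficient stays at 1 for both distorted copies, because it depends only on rank order. A constant shift also leaves the Pearson coefficient at 1, so the two coefficients only separate under the nonlinear distortion. Comparing the AUC values themselves, for example through their differences, would be needed to detect a uniform offset.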

    2. Reviewer #2

      In this study, authors analyzed the association between types of somatic mutations and the downstream effects on the transcriptome using data obtained from many large tumor data consortia such as METABRIC, TCGA etc. Subsequently, authors systematically show functional relevance using CCLE data.

      Concerns:

      Using the tumor profiling data from various consortia, several groups have shown these associations using different statistical methodologies (PMID:21555372, PMID: 26436532, PMID: 27127206 and thereon). In that light, results described in this study are correlational and some are obvious. It is not clearly described what transcriptional programs are impacted by mutation subgroups and how distinct they are from other tumor types with similar mutation subgroups. Also, it is not clear if these distinct mutation subgroups carry any clinical significance such as outcomes. Furthermore, transcriptional programs are also under regulation by DNA methylation and its role in defining the transcriptional program under the influence of mutation subgroups is not described.

      Specific Concerns:

      1) What data normalization and batch correction methods were applied to the expression data from TCGA, METABRIC and other datasets?

      2) What clustering methods were applied for the subsequent UMAP projection?

      3) Although an association between mutation subgroups and expression is described, it is not clear whether an expression profile of a group of genes was found in the analysis. If so, the functional significance of those co-regulated genes is not described.

      4) Page 35 (lines 781-782): What is the biological and statistical rationale for removing neighborhood genes? There is a significant neighborhood effect in certain cancers such as ccRCC, where 3p is significant for tumorigenesis and progression.

      5) The statistical methods and the reasons for their application to the data are not well described. Moreover, the methods are not described in a linear order from the start, which leads to confusion. The multiple-testing correction sections, although mentioned, are vague.

      6) Earlier studies have shown concordance between RNA-Seq and microarrays. In that context (page 16, lines 348-351), why do the authors assume differences exist between these platforms?

      7) The manuscript is long and difficult to read, with emphasis on some obvious things. It can be shortened for easier reading.

    3. Reviewer #1

      While this is an important area, the organization and results presentation render this current form of the manuscript unacceptable. Some specific challenges are described below.

      1) Throughout the manuscript, the authors report AUC on the training set as the primary metric of assessment and to compare models between genes. However, these performance metrics are more valid for cross-validation and may be sensitive to the differences in sample size introduced by the number of mutations. The authors would be better served by using the permutation-based statistic they develop later in the results throughout to report results.

      2) The authors develop a permutation based statistic to assess performance in a manner that controls for sample size presented as part of the results and relegate most of its description to the supplemental methods. This is a critical part of evaluation that should appear in the main manuscript and used for all results presented in the manuscript. This is of particular importance for the comparison between TCGA and METABRIC performance, which have different sample sizes.

      3) Several hypotheses about the function of specific mutations or mutational groupings are made throughout the manuscript based solely on the AUC prediction values. These appear speculative and could be better grounded in results by evaluating the function of the genes in the transcriptional programs that underlie the prediction (e.g., using feature importance scores to determine specific genes associated with the classifier).

      4) It is unclear why specific genes are selected for presentation in the manuscript. These appear cherry picked to describe well performing genes and do not do a comprehensive presentation of the performance of the algorithm, particularly in the first subsection of results "Subgrouping classifiers uncover alteration divergence in a breast cancer cohort" and "Subgrouping classifier output reveals the structure of downstream effects within cancer genes." The latter section particularly includes a substantial amount of biological description of function based solely on performance that is not grounded in the results presented.

      5) The definition of "subgroupings" is not clearly described. It is not possible to follow as written how the 7598 groupings are determined and how these are used in the machine learning framework. This needs to be significantly clarified.

      6) It is unclear why HER2 amplifications are a focus of analysis for Luminal A subtype breast cancer samples, which are by definition HER2-.

      7) An expanded presentation of the results of relative classification accuracy by gene and cancer type would be useful for evaluating the further impact of cancer type on performance to determine the role of the biology in mediating mutations. In particular, it would be useful to evaluate whether cancers with different cell type composition (e.g., large fibroblast content in mesenchymal HPV- HNSCC tumors) impact the results of the classifier. A similar comparison would be useful between in vivo tumors and in vitro cancer types from the gene expression profiles in CCLE.

      8) The GitHub links for the software presented in this paper do not work.
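
      Concerns 1 and 2 turn on comparing an observed AUC with what chance alone would produce at the same class sizes. The manuscript's own permutation statistic is not reproduced here. The sketch below shows a generic label-permutation test instead, with hypothetical scores and labels (3 mutant samples among 12), as one way such a size-aware comparison can be set up.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_pvalue(scores, labels, n_perm=2000, seed=0):
    """Fraction of label shufflings whose AUC is at least the observed AUC.

    Shuffling keeps the number of positives fixed, so the null distribution
    reflects the same class sizes as the real task.
    """
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical classifier scores and mutation labels (illustrative only).
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.25, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
observed_auc, p_value = permutation_pvalue(scores, labels)
print(round(observed_auc, 3), p_value)
```

      Because the null distribution widens as the number of mutated samples shrinks, the same AUC can be strong evidence in a large cohort and weak evidence in a small one, which a raw AUC comparison between TCGA and METABRIC would not show.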

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers are in agreement that the authors present an innovative classifier framework to predict mutational status and subgroups based upon transcriptional profiles. They perform a comprehensive analysis across cancer subtypes to assess context-dependence of mutations and link these classifiers to cell line data to further predict therapeutic outcomes. Overall the work appears novel and provides potential insights that could generate hypotheses worthy of further study. While this is an important area, the work is limited in several ways. These include numerous issues with the statistical methods used, lack of clarity as to whether the results were significant, potential concern about cherry-picking results, and the need to consider alternative factors contributing to the reported relationships, coupled with weaknesses in the organization and presentation of the results.

    1. Reviewer #3

      General assessment of the work: The authors report a study of the mesial temporal lobe (MTL), particularly focusing on structural/functional changes related to transition regions from six-layered isocortex to three-layered allocortex. This group uses their expertise in image processing techniques to define the anatomical regions of the mesial temporal lobe transition from isocortex to allocortex using the BigBrain high-resolution histological reconstruction. Using this single high-resolution histological image, they show intensity changes which correlate with the isocortex/allocortex transition. They then coregister this high-resolution reconstruction to rs-fMRI and define effective connectivity within the mesiotemporal lobe. Finally, they show variation in rs-fMRI global patterns in relation to the iso-to-allocortical axis, as well as the mesial temporal a/p axis.

      Substantive concerns:

      This is an interesting study which shows novel relationships between mesial temporal structures and whole brain functional organization. As the authors point out, the novel part of the study involves defining cytoarchitectural regions, and correlating these changes with both local and global function as defined by BOLD fMRI. This is a novel study examining the iso-allocortical transitions within the MTL, and correlating them with local and global rs-fMRI changes. As the authors state, the global rs-fMRI findings related to the anterior-posterior axis of the MTL are not new, but add complementary findings in comparison to the iso-allocortical transition findings. Given this, I will focus my comments on the use of the BigBrain image, and definition of the MTL transitions for use in defining regions in the rs-fMRI images.

      1) With the BigBrain data, only the right hippocampus was used for segmentation, due to a rip in the histopathological sections of entorhinal cortex on the left. It is therefore assumed that the right MTL segmentations were inverted and also used for the left MTL rs-fMRI analysis. If this is the case, it should be more clearly stated in the methods. Also, discussion should be added to the possible implications for results, both in respect to replicating the histological intensity findings (which could be tested in two hippocampi if both right and left were processed) and the known structural differences between the right and left hippocampi.

      2) I had concerns that using the higher resolution BigBrain image as a template for the 8 nodes in the MTL for the much lower resolution rs-fMRI images would be problematic for signal to noise ratio. However, the authors have convincingly shown consistent findings when controlling for signal to noise ratios.

      3) The authors mention (and reference) the correlation of histopathological cellular staining intensities with cellular densities and soma size in the methods section. Given the centrality of this concept to their findings of the BigBrain data, some addition to the discussion about this concept and the underlying evidence for correlation of staining intensity and cellular densities and soma size would be helpful.

    2. Reviewer #2

      This paper does a very good job of underscoring the importance of characterizing the structural organization of the cortex at a deep level in order to inform functional organization. The authors present an exciting and innovative method of bridging post-mortem cytoarchitecture with in vivo functional MRI, allowing for a powerful and compelling investigation of MTL micro-architecture. This work has important implications for how information transfer occurs through macro-structural and more local brain circuits. The two major findings regarding the allo-iso and the anterior-posterior gradient are supported by the previous literature, but so far characterization of this organization in humans in vivo has been somewhat limited. Most of my suggestions below are regarding points that could be clarified or methods that were unclear.

      1) Was there an a-priori prediction regarding the "multi-demand" network? This part of the narrative seemed to come out of the blue and could use more background.

      2) Some of the methods are not fully described and are hard to understand. For example, the surface models that are used to sample and model the properties of the microstructure at different cortical depths could be described in more detail. I was also having trouble understanding two things about the "confluence" or "intersection" between the allocentric and isocentric cortices. I was left wondering if the intersection is defined as a plane in surface space, demarcating the separation between hippocampus and entorhinal cortex? Is the confluence/intersection defined based on the manual hippocampal subfields (i.e. medial boundary of the subiculum) or is it defined some other way using the surface profiles/features? Finally, how is geodesic "distance" computed? I would suggest adding a figure to give an overview of these aspects of the methods.

      3) Related to the point above, I get the impression that this data shows there is no strict boundary between the allo and iso-cortex but rather that there is a somewhat smooth gradient. This point could be made more clear in the abstract and discussion. What implications does this particular finding have for theories of MTL subregion function?

      4) When r-values are reported to differ for different gradients (e.g. iso versus allo) it is important to test for a significant difference in the slopes (e.g. Fisher r-to-z transform or similar) to know if the relationships are statistically different from one another.

      5) This paper builds nicely on other work by DeKraker and colleagues (2019) that has analyzed the microstructural properties of the hippocampus. I think the readers of this paper would appreciate a brief description of how this investigation is similar/different from that work. For example, are the "features" identified here largely overlapping with those identified by DeKraker, and if not, how do they differ here?

      6) In the effective connectivity analysis of the MTL, how is variability of the MTL anatomy taken into account? For example, the fusiform and parahippocampal regions of interest will contain highly variable anatomical structures across subjects (e.g. different folding patterns of the collateral sulcus). Given that the focus on anatomical specificity is a major strength of this paper, I would be curious to know how anatomical variability/specificity is accounted for when the data is morphed into MNI152 volume space.

      7) I was unsure which analyses were replicated in the Human Connectome Project (HCP) dataset. It is stated that the isocortical functional gradients were re-generated within the HCP cohort and that results were "highly similar" (p. 18) to the original dataset. Was this similarity formally tested?
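
      The comparison suggested in point 4 can be sketched as follows. This is a minimal version of the Fisher r-to-z test and it assumes the two correlations come from independent samples. If both are computed on the same vertices or subjects, a test for dependent correlations would be the better choice. The r-values and sample sizes below are hypothetical.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return z, p

# Hypothetical correlations for two gradients (illustrative values only).
z_stat, p_val = fisher_z_test(r1=0.8, n1=53, r2=0.2, n2=53)
print(round(z_stat, 2), p_val)
```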

    3. Reviewer #1

      Thank you for inviting me to review this manuscript by Paquola and colleagues, in which the authors used a combination of high-resolution anatomical data, machine learning, spectral DCM and resting functional connectivity measures to interrogate the relationship between structural and functional gradients of organization within the mesial temporal lobe.

      The study is broken into four related sections. In the first section, the authors analysed vertices within a set of mesial temporal lobe structures using a random-forest algorithm, which identified a set of microstructural profiles across the structure. They then interrogated these profiles for evidence of an iso-to-allocortical axis, which is a principle known to characterise the transition from 6-layered isocortex (in entorhinal cortex) to 3-layered allocortex (in the hippocampal formation). The authors found evidence consistent with this transition in the BigBrain data, particularly with respect to the skewness of the distribution of thickness across the layers.

      In the second section, the authors use Spectral DCM on resting state data from a group of 40 individuals. They then relate the results of the spectral DCM model to the gradients identified using structural anatomy. This section was well-motivated and conducted.

      In the third section, the authors compare the structural gradient to resting state functional connectivity with vertices within the cerebral cortex. The results here were quite compelling, showing a dissociation between the iso- and allo-cortical poles in the MTL in which the iso-cortex was correlated with fluctuations in the lateral dorsal attention and frontoparietal networks, whereas the allo-cortical pole was correlated with vertices in the default mode and medial occipital regions.

      In the final section, the authors conducted a number of checks of their analysis, including an SNR test to ensure that the temporal lobes (a notorious site for MRI signal dropout) were adequate, and a substantial replication analysis. They should be commended for these steps, and also for making their code freely available.

      Comments:

      1) Section 1: I wonder whether the manuscript might benefit from the unpacking of the random forest results. Is there an intuitive way to characterize skewness that may benefit the reader - such as a particularly uneven spread of thickness distributed across the layers? And is this finding something that we might expect, given the hypothesized gradient of iso-to-allocortex in the MTL?

      2) Section 1: Along these lines, is it fair to single out an individual measure from the random-forest regression as being the most salient? From my understanding (which might be mistaken), the weights on a particular variable in a regression need to be viewed in context of the performance of the whole model.

      3) Section 2: One minor comment is that it might be helpful for the reader if the "in" and "out" effective connectivity directions were incorporated into the matrix in Figure 2A.

      4) Section 2: I wasn't sure that I followed the logic of the experiment in which the authors split the MTL data into thirds to test for the consistency of their results. Was each of these sufficiently powered to allow for direct comparison with the main effect? Did the boundaries between these models cut across known regional areas? Perhaps a different way to achieve the same ends would be to use bootstrapping in order to provide a confidence interval around the relationship between structure and function?
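
      The bootstrapping idea raised in this point can be sketched as a percentile bootstrap of a Pearson correlation. This is a minimal illustration under stated assumptions: the structure-function relationship is summarised by a single correlation over paired observations, observations are resampled independently (which ignores spatial autocorrelation between neighbouring vertices), and all names and data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the correlation of x and y."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample paired observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        try:
            estimates.append(pearson_r(xs, ys))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with zero variance
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical paired values: gradient position and a connectivity measure.
    structure = [float(i) for i in range(40)]
    function = [i + ((i * 7) % 5 - 2) for i in range(40)]
    print(pearson_r(structure, function), bootstrap_ci(structure, function))
```

      Unlike the split into thirds, this uses all of the data for every estimate, so the width of the interval directly reflects the uncertainty in the main effect.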

      5) Section 3: Did the authors hypothesize the iso- vs. allo-cortical relationship to resting state networks a priori, or was it discovered upon exploration of the data? Either is fine, in my opinion, but I think it would benefit the reader to have these results placed in the context of the known literature.

      6) Section 3: Do the authors expect that the patterns identified in the MTL will relate to subcortical gradients identified in other structures, such as the cerebellum (Guell et al., 2018), thalamus (Müller et al., 2020), and basal ganglia (Stanley et al., 2019)? See also Tian et al., 2020 for general subcortical gradients.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All three reviewers saw great merit in your work and were enthusiastic about its potential. Nonetheless, each reviewer raised several substantive concerns. Broadly speaking, we see the essential revisions as (1) providing additional clarity with respect to methods, (2) further unpacking of some of the results, as well as conducting a few targeted statistical analyses (i.e., to test for differences in slopes), and (3) clearer positioning of the current work as it relates to the existing literature.

    1. Reviewer #2:

      General Assessment:

      The role of visual experience with faces in the formation of face-specific neural "modules" is tested in a deep convolutional neural network model of object recognition, AlexNet. A modified version of the ILSVRC-2012 training dataset was constructed by removing all images with primate faces, removing remaining categories with fewer than 640 images, and re-training the deprived network: d-Alexnet. d-Alexnet was compared to pre-trained Alexnet on classification performance, quality of fit to fMRI data, strength of face-selectivity, representational similarity, and learned receptive field properties. The authors argue that face-selectivity is significantly reduced, but not eliminated, with the deprivation, and that this reduction is consistent with an interpretation that d-Alexnet represents faces more similarly to objects than Alexnet. While this work is well-motivated and timely, there are substantial issues in the conceptual approach, the methods used, clarity of the results, and most importantly, the strength of the conclusions.

      Major Concerns:

      1) The validity of these results is uncertain due to a) insufficient reproducibility within this work and b) fragile definitions of face-selectivity.

      a) Given that small changes in weight initialization or training procedure can have a large effect on learned representations (see Mehrer et al. 2020, https://www.biorxiv.org/content/10.1101/2020.01.08.898288v1.abstract ), the authors must demonstrate that their results hold across multiple initializations of each network type. Several key results hinge on the number and identity of "face-selective" channels (Figure 2, 3c-e) and only a single instance of each model type is used. In particular, the result that 2/256 channels are "selective" in d-Alexnet compared to 4/256 in Alexnet is likely sensitive to small variations in the methods, including the choice of evaluation stimuli and the initialization of the weights. If the models were re-trained, could the ratio be 4 channels to 4 channels, 0 channels to 2 channels, or some other result? With only a single instance of each model and such a small (and potentially unstable) number of face-selective channels in each model, I am not convinced that these results support the claims made.

      SUGGESTION: Report results averaged across multiple initializations of each model to demonstrate robustness. Statistical tests should be conducted across models (as if they were individual subjects) to demonstrate the significance of any effects found.

      b) The definition of "selectivity" is potentially fragile and may not hold when tested with more standard evaluation sets. In the primate face-selectivity literature, functional localizers are used to compare face responses to non-face responses. These localizers have much stronger controls over low-level features than the stimuli used to evaluate selectivity in this work. I am especially concerned that the faces (from FITW) differ from non-face objects (from Caltech-256) in low-level properties such as image resolution, pose, background, contrast, luminance, and more. Furthermore, selectivity is typically defined in the field as a continuous quantity (e.g., t-contrast, d-prime, face-selectivity-index) and is not often assessed in a binary fashion by the number of units significantly more responsive to faces than the second-best category. Many of these continuous metrics also incorporate variance in responses as well as the mean of responses. Thus, the designation of channels as "selective" or "not-selective" in this work based on mean responses to only 2 of the 205 categories (L101) prevents the reader from understanding how the distribution of face-selectivity shifted under the deprivation, which is one of the primary claims. Instead, we only see the number of selective channels after a binary cutoff, which may be sensitive to initialization and to the stimulus set used to evaluate selectivity.

      SUGGESTION: Compute selectivity using evaluation sets in which faces are better matched to non-face objects. Report the distribution of selectivity for each channel before and after deprivation.
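
      The two suggestions under concern 1 can be sketched together: compute a continuous selectivity value per channel (here d-prime and a face-selectivity index), summarise it per trained model instance, and then compare the two network types across initializations, treating each instance as a subject. This is a minimal illustration; the formulas are common conventions rather than the authors' method, the exact permutation test is one possible choice of test, and all names and numbers are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def d_prime(face, nonface):
    """Continuous selectivity: mean difference scaled by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(face) + variance(nonface)) / 2.0)
    return (mean(face) - mean(nonface)) / pooled_sd


def face_selectivity_index(face, nonface):
    """(F - N) / (F + N); assumes non-negative mean responses, e.g. post-ReLU activations."""
    f, n = mean(face), mean(nonface)
    return (f - n) / (f + n)


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Each value is one summary statistic per trained model instance,
    so model instances play the role of subjects.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / n_splits


if __name__ == "__main__":
    # Hypothetical responses of one channel to face and non-face images.
    face_resp = [2.0, 3.0, 4.0]
    nonface_resp = [0.0, 1.0, 2.0]
    print(d_prime(face_resp, nonface_resp))
    print(face_selectivity_index(face_resp, nonface_resp))

    # Hypothetical per-model summaries (e.g. maximum d-prime) for five seeds each.
    full_training = [4, 5, 6, 5, 4]
    face_deprived = [1, 2, 1, 0, 2]
    print(exact_permutation_p(full_training, face_deprived))
```

      Plotting the full per-channel distribution of such values for each model instance, rather than counting channels above a cutoff, would show how selectivity shifts under deprivation.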

      2) Because one model in the comparison is pre-trained and the other is trained from scratch, there is the possibility that all of the differences between the models are due to differences in the training that are independent from the content of the training images.

      a) In the regression analysis, is it the case that non-selective channels also show differences in R2? For example, if the d-Alexnet is worse on the training task (d-ImageNet) than Alexnet, we expect a general reduction in its ability to explain neural responses (see e.g. Yamins et al., 2014). The claims that face-selectivity is specifically impaired in d-Alexnet need to be supported by demonstration that non-selective channels are equally good (or poor) fits to vertices in face-selective regions. Furthermore, the authors do not demonstrate that face-selective channels are better than non-selective channels in either model type, which is useful context for understanding whether the correspondence between face-selective channels and face-selective brain regions is meaningful.

      SUGGESTION: report non-selective channel fits to the same vertices for each model type and compare to face-selective channel fits.

      b) L366: the authors write that "the d-Alexnet was initialized with values drawn from a uniform distribution". This is not standard practice; in fact, the kernel weights in the original AlexNet model were initialized from a Gaussian distribution. To make comparisons to the non-deprived model, the authors need to also retrain the non-deprived model to account for the potential confounds between their training/initialization procedure and that used in the pre-training.

      SUGGESTION: re-train the non-deprived AlexNet in-house, then compare that model to d-AlexNet.

      3) A major conceptual issue is in the definition of a "face module". Despite "face module" in the title, a working definition of "face module" is not clearly provided in the manuscript. Context clues suggest that the authors may consider any face-specific process evidence of a "face module", but the experiments performed indicate that a specific set of criteria were explored: selectivity for faces, different representations for faces and non-face objects, holistic processing, etc. Especially given that the results of this work indicate some residual face-selectivity, a clear definition of "face module" - grounded in the existing literature - is needed to evaluate the claims provided.

      SUGGESTION: clearly define what the "face module" is in the brain, then explain what the corresponding evidence for a "face module" would be in the DCNN.

      4) A number of analyses are not well-motivated or are lacking in detail

      a) The analysis of the "empirical receptive field" is lacking in detail and motivation, and the color-scale is both nonlinear and missing a label. Specific questions:

      i) How should this result be compared to data in primate face-selective regions?

      ii) Is this result a trivial consequence of the difference in number of activated units (panel D)?

      iii) What are the units of the colormap?

      iv) Why are only two channels shown for AlexNet if 4 channels are face-selective?

      v) Is the extent of the empirical receptive field quantified?

      vi) How should the reader think about empirical receptive fields in a weight-shared convolutional architecture?

      b) The evaluation of the face-inversion test is poorly motivated. The face-inversion effect indicates that human subjects are better at remembering upright faces than inverted faces. However, the analysis performed here evaluates the magnitude of the response of face-selective channels. If anything, a classification task is needed to compare to the human task, because the "face inversion effect" cited is not simply that face-selective units respond more strongly to upright than inverted faces, but that the activation of the units supports differences in classification between upright and inverted faces.

      SUGGESTION: At minimum, justify 1) why the magnitude of channel response is a good measure of the face inversion effect or 2) remove the claim that the models do/don't exhibit the behavioral effect.

    2. Nancy Kanwisher (Reviewer #1):

      Xu et al. use deep nets to ask whether face selectivity, and face discrimination performance, can arise in a network that has never seen faces. By painstakingly removing all faces from the training set, and comparing Alexnet trained with and without faces, they claim to find, first, that the face-deprived network does not have deficits in face categorization or discrimination (relative to the same network trained with faces), second that the face-deprived network showed some face-selectivity, and third that face deprivation reduced face selectivity. They conclude that "domain-specificity may evolve from non-specific experience without genetic predisposition, and is further fine-tuned by domain-specific experience."

      I love the question and the general strategy behind this study, and indeed we have long discussed doing something much like this in my lab, and we presented a preliminary result of this kind at VSS years ago (https://jov.arvojournals.org/article.aspx?articleid=2433862 ). It is a great use of deep nets to ask what kinds of structures can in principle arise with different kinds of training diets. Xu et al. are also to be congratulated for the huge effort they went to in curating a data set of stimuli with no faces; they are correct that no current algorithm is adequate for this, so it required a huge amount of labor-intensive human effort.

      Nonetheless, despite my enthusiasm for the question, the general logic of the study, and the major effort to create the training set, I do have a few significant concerns about the paper:

      1) The biggest problem in the paper in my view is that although regular Alexnet saw faces in the training set, it was not trained on face discrimination, and its performance on this task is very low (66%). That is above chance but very much lower than a network that is actually trained on face discrimination. In our studies, which are typical of this literature, we find that when Alexnet is trained on the VGG-Face dataset identification of novel faces is around 85% correct (top-1). So to say that the face-deprived network performed no differently from the face-experienced network on a face discrimination task, while true, is misleading, because really this reflects the fact that neither was trained on face discrimination and both do pretty badly. And perhaps more importantly, for faces humans have learned, their typical face recognition accuracy would be way higher than 66% correct. So, the face-deprived network really does very badly compared to a real face-trained network, or to humans, and does not represent a strong case of preserved face discrimination despite lack of face experience. Instead, it reflects the kind of face recognition performance one would expect from an object recognition system or a prosopagnosic patient: above chance but not very accurate. Thus, I think the behavioral data show not preservation of face perception abilities in a network trained without faces, but low performance at face discrimination, much like a network that has seen faces but not been trained to discriminate them.

      2) The claim that "face-selective channels already emerged in the d-AlexNet" is similarly overstated in my view, given that only two such units were found and the selectivity of the one we are shown (on the right in Figure 2a) is weak. Although the authors concede that the selectivity of these two units is lower than found in Alexnet trained with faces, that understates the case, as Figure 2a shows. The analysis in Figure 2b, correlating responses of face-selective channels from Alexnet to natural movies, with brain responses to the same movies, is interesting but doesn't tell us what we most need to know. Several public data sets include the magnitude of response of FFA and OFA to a set of 50-100 images, and I would find it more useful to compare those to the response of Alexnet face units to the same images.

      A small point: Only human and primate faces were removed from the dataset, but I would think other animal faces (e.g. cats and dogs) should produce some relevant training. Certainly face-selective regions in the human brain respond strongly to animal faces, as several studies have shown. This might be worth considering in the discussion when potential reasons for the emergence of face-selective channels are discussed (lines 229-236).

      For the reasons above, I don't think the results of this study strongly support the conclusion that "the visual experience of faces was not necessary for an intelligent system to develop a face-selective module". At least the "face-specific module" so claimed is a far cry from the human face processing system in both neurally measured selectivity and behavioral performance.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript. Thomas Serre served as the Reviewing Editor.

      Summary:

      In general, the reviewers and I agreed that the study had strengths, including the question being asked and the general strategy used. We also thought that it was a great use of deep nets to ask what kinds of structures can in principle arise with different kinds of visual training diets. The authors should also be commended for the huge effort that went into curating ImageNet to remove images containing faces, which required a huge amount of labor-intensive human effort.

      At the same time, as you will see, the reviewers found a number of shortcomings in your study. Most of them could be addressed with (a lot of) additional work but, unfortunately, one issue raised seems impossible to convincingly address. Specifically, the accuracy of both the face-deprived network and the control network for face discrimination is far below that of both comparable networks specifically trained for face discrimination and most likely human observers (although this was not tested). Hence, the study does not represent a strong case of preserved face discrimination despite lack of face experience. To paraphrase the reviewer: "Instead, it reflects the kind of face recognition performance one would expect from an object recognition system or a prosopagnosic patient: above chance but not very accurate. Thus, I think the behavioral data show not preservation of face perception abilities in a network trained without faces, but low performance at face discrimination, much like a network that has seen faces but not been trained to discriminate them."

    1. Reviewer #3:

      In this manuscript, Carvajal and coworkers prepared a recombinant LUBAC complex, composed of the full-length HOIP, HOIL-1L, and SHARPIN subunits, and analyzed its 3D structure by electron microscopy. This is the first report to show that the LUBAC complex has an elongated, asymmetric crescent-like structure, although it is low resolution. Moreover, the authors examined the intra- and inter-domain associations by cross-linking mass spectrometry, and investigated the oxyester-linked heterotypic branched ubiquitin chains produced through the E3 activity of HOIL-1L. These results are novel and intriguing; but unfortunately, this study has not provided detailed clarifications of the LUBAC structure and catalysis.

      Major comments:

      1) How about the EM structure from peaks I and III in Suppl. Fig. 1A? Peak I eluted in a higher molecular weight fraction than that of thyroglobulin (670 kDa). Is it possible to form a LUBAC complex consisting of trimers with 1:1:1 stoichiometry between the HOIP, HOIL-1L, and SHARPIN subunits? Peak III predominately includes HOIL-1L and SHARPIN, but lacks HOIP. Therefore, it seems possible to estimate the subunit organization in the 3D structure. Please clarify whether the 3D structure shown in Fig. 2B represents monomers or dimers with 1:1:1 stoichiometry between the HOIP, HOIL-1L, and SHARPIN subunits.

      2) On pages 7-8: The authors emphasize the interaction of the RBR domains of HOIP and HOIL-1L, based on their XL-MS analysis, and speculate that LUBAC may have a single catalytic center. However, since multiple interactions in-between LUBAC domains are detected (Figs. 3B-E), the authors need to explain why they focused on this particular interaction. It will be interesting to analyze the effect of E2 or E2~Ub.

      3) In Fig. 4B, why could the mixed LUBAC subunits generate a linear chain, but not an oxyester-linked branched Ub4? Does it form a high molecular weight complex in gel filtration? Please indicate the anti-ubiquitin blot in Figs. 4B and 4C to clarify the doublet migration in M1-Ub3.

      4) In Figs. 4E and 5A, it is interesting that Cezanne and vOTU could cleave ester-linked branched Ub4, although the molecular bases of these reactions were not revealed. Are the NH2OH-sensitive His-Ub3 and Ub2 generated by LUBAC, as shown in Fig. 5B, cleavable by Cezanne and vOTU? Please indicate that the Ub2 remaining after the OTULIN-treatment (Fig. 4E) is sensitive to NH2OH or not.

      5) Why did the NH2OH-treatment in Figs. 5F and 6C cause a drastic decrease in the linear ubiquitin level? The previous PNAS paper from Cohen's group showed a partial reduction in the molecular weight of the Ub chain bound to IRAK and Myd88 after NH2OH-treatment. In contrast, the current data seem to indicate that most of the LUBAC-generated ubiquitin chains were composed of an ester-linked Ub chain, but not a linear chain. Please indicate the lower molecular weight region of the immunoblot. It is surprising that GST-NEMO(250-412) almost non-specifically captured a variety of Ub chains. How about employing GST-NEMO-UBAN alone or M1-TUBE to specifically pull-down the linear polyubiquitin-containing chains?

      6) On page 11, 2nd paragraph, although the authors described that "the restriction analyses showed that the ubiquitin chains assembled by LUBAC contained non-linear di- and tri-ubiquitin chains", the di-ubiquitin is barely detectable in Fig. 6B.

      7) At the bottom of page 12, the authors mentioned that "LUBAC with HOIL-1L T203A,R210A assembled ubiquitin chains more efficiently than WT-LUBAC, but less efficiently than HOIL-1L C460". However, in Fig. 6E, LUBAC with HOIL-1L T203A,R210A seems to have the most powerful E3 activity. Moreover, it is not clear if the partial impairment of branching activity is due to HOIL-1L T203A,R210A, since the upper band of Ub4 has a good signal. Therefore, the authors should reconsider the scheme shown in Fig. 7. The NH2OH-sensitive upper band of Ub3 did not react with an anti-linear ubiquitin antibody, in contrast to the pan-ubiquitin antibody. These results suggested that the upper band of Ub3 consists of two ester-linked branched ubiquitins on single ubiquitin. Does it bind HOIL-1L NZF? If not, then HOIL-1L NZF apparently does not contribute to the ester-linked branched ubiquitination activity of LUBAC.

    2. Reviewer #2:

      The manuscript by Carvajal et al. describes a study on the LUBAC complex. They build upon the striking and highly significant discovery that the HOIL-1 protein is an active ubiquitin E3 ligase with non-lysine esterification activity. This discovery was initially demonstrated by Kelsall et al. As the original findings by Kelsall et al. were quite unexpected, and in part contrary to a study from the Iwai lab, the findings presented here corroborating the former study are of great importance for the field.

      Testament to the challenges of structure determination of the LUBAC complex, few structural insights have been obtained despite its discovery over 10 years ago. Carvajal et al. report an insect-based expression and purification system for preparing recombinant LUBAC and present a low-resolution structure of the LUBAC complex consisting of sharpin, HOIL and HOIP at 1:1:1 stoichiometry. The structure is supported by mass photometry and most informatively, crosslinking mass spectrometry. However, the low resolution of the negative stain EM LUBAC structure does not allow placement of the individual subunits but does reveal an asymmetric elongated dumbbell shape. Complementary XL-MS data suggests the catalytic RBR modules from HOIP and HOIL-1 are in proximity. They build upon the work of Kelsall et al. by demonstrating that HOIL-1 retains its esterification activity when part of the LUBAC complex. This is notable as it allows prior LUBAC-associated function to be implicated with non-lysine ubiquitination. The manuscript implies that a major function of HOIL-1 esterification activity is to introduce ester branch points within linear Ub chains, and this is observed within cells after TNF stimulation. Intriguingly, at the end of the manuscript they propose that HOIP and HOIL-1 might undergo ubiquitin relay, reminiscent of that reported for MYCBP2 by the Virdee lab.

      Overall the manuscript is an important contribution. Some additional experiments should be carried out. Furthermore, the manuscript in its current form affords only a modest advance over the Kelsall et al. study. Additional experiments should also be carried out to address this as stated below.

      1) The grey unannotated regions (Figure 3) in sharpin, HOIL and even HOIP to a degree demonstrate anomalously promiscuous crosslinking. Could the authors comment and perhaps add some discussion to the paper? Does this suggest these unannotated regions are highly dynamic? Might this relate to the difficulty in solving higher resolution structures?

      2) Thr12 and Thr55 were identified as potential ester linkage sites within polyUb species. However, their mutation did not abolish formation of the hydroxylamine sensitive bands. The authors should state the observed ubiquitin sequence coverage in the MS experiment. Which regions were not covered?

      3) To confirm that the residual oligomeric Ub species after OTULIN treatment are exclusively ester-linked, a subsequent hydroxylamine treatment step should be performed.

      4) The authors hypothesise that a key function of the HOIL-1 esterification activity is to form heterotypic chains. Whilst this might be the case, the alternative hypothesis that HOIL-1 primes substrates via an ester linkage, which are then linearly extended by HOIP, is also equally valid. Particularly as multiple substrates have been reported to be modified with linear chains yet HOIP appears to be tailored to modify a Ub substrate exclusively. The authors should discuss this alternative hypothesis and also how and why both systems might be important.

      5) Perhaps in further support of substrates being the most abundant ester linked species, NEMO enriched linear chains from TNF treated cells show a much more pronounced collapse compared to the ester-linked Ub-Ub linkages produced in vitro in the absence of substrate. It would greatly strengthen the paper if they could add a recombinant substrate to the in vitro reaction (e.g. IRAK1/2 or MyD88). I am not sure about the feasibility of this.

      6) Finally, the suggestion that HOIP-HOIL Ub relay might be at play is exciting and implies that E3-mediated Ub relay might be a prevalent process. In principle it should be possible to test this by impairing E2 binding to the RING1 domain in HOIL in the LUBAC complex. A steric mutation (e.g. X to Arg) would be a more elegant approach than mutation of the zinc coordinating cysteine. If relay is at play then the LUBAC should still be able to form ester linkages.

    3. Reviewer #1:

      Carvajal et al. provide a novel mechanistic insight to the function of HOIL-1L in the formation of heterotypic ubiquitin chains in the context of the full LUBAC complex. This expands on recent work suggesting HOIL-1L has the intrinsic ability to form oxyester-type linkages on its own, and nicely describes the phenomenon in the context of LUBAC both in vitro and in cells. Initial descriptions of the preparation of pure and stoichiometric LUBAC complex are clear and will be of utility to the field. The authors use negative stain EM to structurally characterize the complex, but conformational flexibility prevented the generation of a reliable 3D model for de novo modelling or docking of known components. The organization of the complex is also described by XL-MS, which enabled the authors to suggest that the RBR domains of HOIP and HOIL-1L lie in proximity, along with the NZF domain of HOIL-1L, in a putative catalytic center. Visualization of a unique triUb or tetraUb conjugate is analyzed with gel-based assays to assess determinants associated with its formation or destruction. The unusual species are formed only in the presence of co-purified LUBAC containing catalytically active HOIL-1L, but without requirement for the previously suggested T12 acceptor residue within Ubiquitin. Further, the heterotypic chains are removed by treatment with hydroxylamine (a nucleophilic acceptor of oxyester-linked Ub) or treatment with Cezanne (a deubiquitinase with K11 linkage specificity) but not OTULIN, a deubiquitinase specific for Met1 linkages. The work is given cellular context by induction of LUBAC activity in response to TNF signaling in lysates of MEFs with wild-type or mutant HOIL-1L. Indeed, more hydroxylamine-sensitive Ubiquitin chains are formed (and immunoprecipitated by the Linear-chain binding NEMO construct) in the wild-type but not HOIL-1L catalytic mutant MEFs upon TNF stimulation.

      This clearly written and well-organized manuscript presents new insights into LUBAC assembly and its formation of heterotypic chains. While it is unfortunate that the seemingly well-behaved, monodisperse, stoichiometric complex could not be further structurally characterized, the biochemical characterization of heterotypic Ub formation is thorough and the study constitutes an impactful advance in our understanding of polyubiquitin formation, non-traditional chain linkages, and the LUBAC.

      My primary criticism is centered on the 3D structure presented - what does it really contribute to the study? The 2D analyses demonstrate the substantial flexibility of the complex, and projections generated from the 3D structure only marginally match the selected projections shown in Figure 2. If EM analyses are meant to support the biochemical reconstitution of the active LUBAC complex, then the 2D class averages are more than sufficient. Based on the 2D data, and the fact that there are many class averages that are not recapitulated by 2D projections (and vice versa) it is highly unlikely that the purified complex is consistent with a single 3D structure. If the authors were able to use negative stain of complexes, where individual subunits contained identifiable tags (e.g. GFP, MBP), to localize subunits and corroborate the XL-MS, perhaps a 3D model would be appropriate, but as it stands, I don't see the utility of the 3D density.

      One other issue has to do with the 2D XL-MS plots. I've always found these plots to be particularly uncompelling representations of 3D structures. In particular, Circos plots such as Figure 3B are difficult to interpret. Is it possible to "weight" the quantity or confidence of observed crosslinks, such that the reader's attention would be drawn to the most important and obvious linkages? This could be accomplished by using different line widths, color shade, or the presentation of multiple plots at distinct cutoff values. Further, the pair-wise domain representation similarly gives the impression that a single domain (or even single residue) is caught crosslinking to almost every part of the opposing protein (a straight line in the plot which contains many dots) in several instances. This could similarly benefit from thresholding or a more cautious description. Can it truly be inferred that the red RBRs and green NZF of HOIP and HOIL-1L are forming a catalytic center, when grey linker-regions are over-represented in the plot? It may also be visually more appealing to make non-domain grey regions significantly smaller in thickness than known domains or even just a linking line, in all representations 3A-3E and 6D.

      I do not review anonymously, and I applaud the authors for publicly sharing their submitted manuscript on the bioRxiv preprint server. This practice enables others to benefit from findings presented in this research, as well as providing the authors with feedback from the community prior to completion of formal peer review. A postdoc in my lab, Randy Watson, helped me with this review.

      -Gabe Lander

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript by Carvajal et al. provides novel insights into HOIL-1L's activity within the LUBAC complex in synthesizing heterotypic, branched ubiquitin chains through oxyester-bond formation. The authors successfully produced and isolated recombinant LUBAC, containing full length HOIL-1L, HOIP, and SHARPIN, and although the intrinsic flexibility prevented a higher-resolution 3D-structure determination, negative-stain EM combined with crosslinking mass spec revealed important new information about the architecture of this complex. Based on the observed spatial proximity of HOIL-1L's and HOIP's catalytic RBR domains, the authors propose an intriguing ubiquitin relay mechanism between these E3 ligases in LUBAC.

      The reviewers agreed that this work represents an important contribution to the field, as it corroborates and extends previous findings of HOIL-1L's non-lysine esterification activity. However, the advance and impact could be improved by some additional experiments to further strengthen the mechanistic conclusions.

    1. Reviewer #3:

      General assessment:

      The manuscript "Calponin-Homology Domain mediated bending of membrane associated actin filaments" by Palani et al. investigates the role of truncated versions of IQGAP1 (from yeast and humans) in forming ring-like structures on supported lipid bilayers in an in vitro TIRF assay. This reviewer is still confused by the mechanism that the "curly" truncation uses to bend actin filaments; placing this new "curved actin" generating mechanism in context with the mechanisms for generating actin rings in other settings could help the reader understand this advance with more clarity. The authors mention several physiological contexts where the formation of actin rings might apply (associated with mitochondria, in axons, and during cell division in the actomyosin ring) but do not follow up with experiments addressing these specific ringed structures, examining instead non-specific cortical actin rings in several cell types. While this work has strong potential and is very intriguing, additional support/clarification is required to back the claims made by the authors.

      Numbered summary of substantive concerns:

      1) The visual components of this work are striking. However, the accompanying quantification is somewhat confusing. Throughout the text, mean values are listed for various parameters beyond those shown in the figures, and it would improve the flow of the manuscript and aid the reader if these were represented as panels in each figure. Further, at least 3 FOVs from independent experiments should be analyzed for all analyses; however, it appears that a single FOV was measured in several figures (i.e. Figure 3 sup 1; Figure 3 sup 2). Other experiments also have relatively low "n" (i.e. 6 filaments measured for the analysis in Figure 2 sup 2). Do these N values have enough statistical power to support these conclusions?

      2) In the movies provided it looks like many of the "rings" are formed away from the coverslip and "fall" down into the TIRF field. Are these movies the most representative of ring formation for these versions of IQGAP? A comparison to actin filaments "alone" but with the lipids might ease this concern.

      3) Are the two IQGAP1 truncations dimers or monomers? Based on sequence alone it appears the dimerization domain is lacking from these constructs, but the SNAP-labeled images in Figure 2 have bright puncta and dimmer filament-like structures. The addition of a model or further clarification on how this arrangement of labeled IQGAP leads to ring formation would aid the reader.

      4) From the image presented in Figure 4, the "rings" from the human IQGAP1 truncation look substantially different from those of the yeast version: they are much larger (about 5x) and, while "curvy", not exactly tight rings like those I can see in the yeast examples. Yet the quantification as presented looks very similar. Is there a different optimal lipid content between mammalian or yeast lipids? Is the longer unstructured region in the mammalian isoform contributing to the difference?

      5) The authors should provide an explanation in the body of the manuscript of what "curly" constructs are being used in mammalian cells. From the methods it looks like the yeast truncations are being expressed. This should be compared to the mammalian version. Additionally, are the cellular rings a similar size to those observed in vitro (perhaps from the example in mammalian cells they are, but not for the yeast?). Additionally, this work would really sing if the in vitro rings were linked to a specific population(s) of cellular actin rings - what is the nature of the cortical rings analyzed by the authors? Are these actin-associated mitochondria? Where is IQGAP1 during cell division?

    2. Reviewer #2:

      In this manuscript, Palani and coworkers investigate the structural effects of binding of a fragment of the IQGAP family of proteins, called "curly", to actin filaments. When tethered to a supported lipid bilayer, curly induces curvature in actin filaments, ultimately giving rise to ring-shaped filament structures. Filament decoration by tropomyosin increases the propensity of ring formation, and introduction of myosin II filaments induces constriction.

      This manuscript presents novel and intriguing insights into the mechanisms that regulate the formation of cytoskeletal structures with curved geometries. The manuscript is well written, and the experiments are logically described. As such, this paper is sure to be of interest to a broad audience.

      Below are a few suggestions I would like to see addressed:

      1) What is the magnitude of curly's affinity for actin filaments? How does this compare to the binding affinity of the isolated CH domain?

      2) Given that curly is proposed to contain two actin-binding sites, has this protein ever been observed to bundle filaments? Also, do multiple filaments ever become incorporated into the same ring?

      3) How does the counter-clockwise direction of curvature of the actin rings compare to the helical pitch of the actin filament? In other words, are the actin subunits being wound tighter around the filament's long axis or are they being loosened?

      4) The authors compare the structural effects of curly binding to those produced by cofilin. Cofilin binding has been reported to alter the twist of actin filaments. Is this what is proposed to happen for curly-bound filaments as well?

      5) At the bottom of page 3, the authors state that: "Importantly, the uni-directional bending supports the hypothesis that the binding site of curly with actin filaments defines an orientation, and the propagation of a curved trajectory once established indicates a cooperative process."

      Cooperativity implies that a process becomes easier once it is started. Do the authors have evidence that it becomes easier to bend the filament along its length once the first binding/bending event occurs? Or is it possible that the additive effect of multiple filament bending events eventually generates a ring-like shape?

      6) It is unclear to me how the model of the myosin II-bound actin ring in Figure 3 Supplement 4 Part E illustrates a possible mechanism for myosin-induced constriction of the actin ring. If I am interpreting the schematic correctly, the authors indicate that ring constriction occurs via the application of force in the upward direction to the inner portion of the filament on the left side of the ring, and in the downward direction to both the inner and outer parts of the filament on the right side of the ring. However, it is my understanding that pulling simultaneously on the outer and the inner parts of the filament on the right side of the ring would not stimulate constriction. I believe one would have to pull on only one of those outer and inner segments at a time to slide them along each other and constrict the ring.

      If I am misunderstanding the schematic, can the authors correct me by expanding on their proposed mechanism?

      7) How constrained are the motions of Rng2 in S. pombe? Once Rng2 localizes to cytokinetic nodes, do the nodes move around enough to be mimicked by tethering curly to the supported lipid bilayer?

      8) The reference to the Tebbs and Pollard paper has an incorrect author listing in the References.

      9) The filament on the left in Figure 1A has a left-handed helical twist and should be corrected. The same is true for the filaments in Figure 3 Supplement 2, and Figure 3 Supplement 3.

    3. Reviewer #1:

      The IQGAP family proteins interact with actin, and contribute e.g. to the formation of cytokinetic rings. Here, Palani et al. provide evidence that the N-terminal fragments of these proteins, composed of a CH domain and 'unstructured region', contain two separate actin-binding sites and can bend actin filaments into rings. This activity requires anchoring of the IQGAP fragment, which they named 'Curly', on the surface of a membrane. Moreover, they demonstrate that actin filament bending by Curly can be enhanced by addition of tropomyosin, and that myosin II can contract these actin rings.

      Major comments:

      1) The authors discuss on pages 1-2 how full-length Curly and its various deletion constructs bind actin filaments. However, actin-binding was not properly tested for any of the constructs used in this study. Thus, the authors should carry out proper actin filament co-sedimentation assays for all constructs. The assays should be performed with a constant concentration of Curly while varying the actin concentration (from 0 µM to e.g. 8 µM) to obtain binding curves, and hence to be able to compare the F-actin affinities of different constructs.

      2) The cell biology data presented in Fig. 4 and Fig. 4 - figure supplement 2 are not particularly convincing. The authors should thus perform a careful quantification of F-actin curvature and 'actin ring frequency' in cells transfected with plasmids expressing (i) EGFP, (ii) EGFP-Curly, and (iii) an EGFP-Curly mutant defective in ring formation. Because EGFP-Curly most likely does not associate with the plasma membrane in cells, it is somewhat confusing how it could still induce the formation of actin rings. Thus, the authors may observe much more robust actin ring formation in cells if they use a membrane-anchored Curly-EGFP instead of soluble EGFP-Curly.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This manuscript reports the structural effects of a fragment of the IQGAP family proteins, called "curly", on actin filaments. When tethered to a supported lipid bilayer, Curly induces curvature in actin filaments, ultimately giving rise to ring-shaped filament structures. Moreover, this study demonstrates that filament decoration by tropomyosin increases the propensity of ring formation, and introduction of myosin II filaments induces constriction of actin rings.

      The findings presented in this manuscript are potentially very important. However, in some cases the results are somewhat preliminary and lack essential controls. Thus, additional experiments and data analysis are required to strengthen the study.

    1. Reviewer #2:

      This study investigated gene expression profiles related to diabetic retinopathy by using several strategies. First, the authors tested differential gene expression associated with response to glucose by comparing lymphoblastoid cell lines (LCLs) between cases (with retinopathy) and controls (without retinopathy) with type 1 diabetes. Second, they identified significant eQTLs from gene expression analysis and public gene expression databases and then tested significant eSNPs by the meta-analysis GWAS using independent cohorts. Furthermore, they confirmed the expression of one gene, FLCN, to be a mediator of diabetic retinopathy by the Mendelian Randomization method. The aims of the study are clear and the paper is well organized. However, the following points should be addressed.

      Comments:

      1) It is confusing that the authors used different selection criteria for gene identifications. In Results (Line 472), they identified 19 differential response genes (P <0.05) between retinopathy cases and controls. However, they have selected the top 103 genes with P<0.01 (Results, Line 494) for further investigation. The reason for this is unclear. I assume that the FLCN gene is in the top 103 gene set but not in the above 19 gene set. Explanations are needed for including specific genes for different analysis purposes.

      2) The authors selected LCLs from individuals of 3 groups, non-diabetes (nDM), type 1 diabetes without retinopathy (nDR) and type 1 diabetes with proliferative diabetic retinopathy (PDR). I didn't see much benefit of utilizing nDM samples in the analysis. Although both gene expression and GSEA methods were conducted, the results were not relevant to diabetic retinopathy. What is the purpose of including these samples?

      3) Similarly, it is not clear what the purpose of using the gene set enrichment analysis (GSEA) was. My understanding is that the authors performed most analyses to identify genetic components by gene-based or SNP-based methods in the manuscript.

      4) The authors tested gene expression profile and associations using data from type 1 diabetic retinopathy. However, for the confirmation with UK BioBank (UKBB) data, they included all samples with both type 1 and type 2 diabetes. Did you perform the analysis stratified by the type of diabetes? Do you have any explanations of possible differences?

    2. Reviewer #1:

      This paper is based on the analysis of a blood cell line of 22 subjects from three different groups in relation to diabetic eye disease. It includes first a transcriptome analysis based on microarrays. Then the studies are mainly based on bioinformatics analyses with GWAS meta-analysis and GTEx data extraction. The in silico study is followed by a so-called validation in the UK biobank.

      The overall strategy is sound and the paper well written. It remains that the whole paper and its conclusions are based on a very small number of samples and not supported by strong experimental data about causality. This reviewer is surprised that the title only focused on "Mendelian randomization", which is an overstatement of this gene expression study. In addition, stating that MR "identifies folliculin expression as a mediator of diabetic retinopathy" is also an overstatement for this reviewer (the mediator effect is not shown). Overall, the small group of studied subjects presents huge differences in duration of diabetes and glucose control, the 2 main risk factors for retinopathy. How can you differentiate the biological effects of long term high glucose and their impact on retinopathy? In other words is it possible to change the title to "Mendelian randomization identifies folliculin expression as a mediator of long term uncontrolled diabetes"?

      Based on the transcriptome analysis this reviewer is afraid that the conclusion "This finding suggests that chronic glucose exposure depresses cellular immune responsiveness and may explain in part the increased risk of infection found in patients with diabetes" is not based on evidence, as the authors selected transcripts of their choice and also because causality is not shown. "Individuals with diabetic retinopathy exhibit a differential transcriptional response to glucose". Note that the level of association shown (especially for PDGF) is somewhat marginal. "Genes with differential response to glucose are implicated in the pathogenesis of diabetic retinopathy." This part is the most intriguing and original but it is based on expression in many tissues and thus the title is also overstated: it shows some kind of association but certainly not that these 103 genes "are implicated" in retinopathy.

      "Folliculin (FLCN) is a putative diabetic retinopathy disease gene" this part is also interesting (and includes some in vivo experiments) but this reviewer wants to stress that the original whole genome gene expression study did not detect FLCN as differentially expressed in the blood cells of the patients with retinopathy. Why?

      It is also noteworthy that to this reviewer's knowledge no GWAS found SNPs near FLCN associated with diabetes or complications. This is worrying.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study investigates gene expression profiling related to diabetic retinopathy using several strategies including differential gene expression associated with response to glucose by comparing lymphoblastoid cell lines (LCLs) between cases (with retinopathy) and controls (without retinopathy) with type 1 diabetes. The study identified significant eQTLs from gene expression analysis and public gene expression databases and then tested significant eSNPs by the meta-analysis GWAS using independent cohorts. The expression of one gene, FLCN, was confirmed to be a mediator of diabetic retinopathy by the Mendelian Randomization method.

    1. Reviewer #3:

      In their manuscript "Characterization of the dynamic resting state of a ligand-gated ion channel by cryo-electron microscopy and simulations", Rovšnik et al. describe a structural study of the GLIC ion channel under 3 pH conditions combining cryo-electron microscopy and molecular dynamics simulations. Their aim is to shed light on the resting state (neutral pH) structure of this ion channel, that has previously been described by a crystallographic study with intriguing observations. Although the authors do not really say so explicitly, it seems their interpretations of the new data largely confirm the conclusions of that previous work. This is a major point that needs to be made explicit: does their study confirm (and to what extent) the one by Delarue (ref [27]) and how similar are the structures. Here a comparison of the pH7 cryo-EM and x-ray density maps could be a welcome analysis. The important related question is: what new information (in terms of the ion channel function etc., not in terms of structure determination methodology) do we learn from this study compared to ref [27]? This should also be made more explicit and be implemented by taking into account intrinsic uncertainties in the study (see next paragraph).

      One concern - quite honestly raised by the authors themselves - is to what extent the cryo-EM maps obtained at pH 3 and pH 5 may represent the expected functional state, or incorporate some artefactual conformational substates, as they seem to lack a few key features of an open/active state that would be expected under these conditions. For the pH 7 state as well, it cannot be excluded that the observed conformation bears some traits of desensitized or intermediate states, as is mentioned in the present manuscript. These overall uncertainties are somehow convoluted with the interpretation and analysis of the data, and in the present version of the manuscript it needs to be made clear much earlier that most of the interpretations only hold/make sense if one assumes certain hypotheses (e.g. that the pH 7 structure is a resting one and not any of the other possibilities), which otherwise is perfectly fine.

      The last major concern about the manuscript concerns the computer simulations. The protonation states adopted to represent activating or resting simulations are not explicitly given in the paper, nor are the choices discussed and justified in any way, whereas this seems to be a rather controversial issue for the simulation of this particular pH-gated channel as literature attests, and obviously a central one with respect to the questions studied in the present work. Also, are there indications in the cryo-EM derived structures on specific protonation states (e.g. two acidic side chains very close by may indicate at least one is deprotonated, etc.)? The next issue that has not been mentioned, but seems quite critical to assess whether activating simulations actually go the right way, is about the wetting/dewetting of the channel pore. Are they stably water-filled in any of the simulations? This is one of the metrics actually used in ref. [21] and a few of which have been adopted for the analysis in Fig. 5 of this paper. A more detailed comparison with that computational work seems rather commendable, as well as probing more of the metrics that are employed there. Also, the discussion of Fig. 5 results should be extended, as it is not clear how to interpret this important figure. Why were the simulations ordered as they are? And how consistent are the observed trends for ECD radius, twist and upper spread?

    2. Reviewer #2:

      This article reports 3 new structures by cryo-EM of a bacterial pentameric ligand-gated ion channel (pLGIC) known as GLIC, in its resting form, at 3 different pH: pH 7, pH 5 and pH 3. The resolution ranges from 4.1 Å for the first one to 3.4-3.6 Å for the last two. Since GLIC is gated by protons, one should see at least two different forms, resting and active, at the various pHs. The main results are the following:

      1) The structure at pH 7 is in a resting state and is highly flexible

      2) It becomes much less flexible at pH 5 or pH 3, but the pore remains closed

      3) All three structures were obtained in detergent (not in nano-discs)

      In itself, this is a valuable article with a lot of new interesting information. However, I suggest considering the following 4 points to improve the manuscript. In a nutshell, I see 3 main points in the analysis of the structures that should be addressed, plus a methodological issue.

      1) The fact that GLIC at pH 7 in its resting form is highly flexible was already known before this study and has been extensively documented in the article that describes the x-ray structure at 4.4 Å (Sauguet et al., 2014, Ref. 27) because the asymmetric unit of the crystal contains in fact 4 different pentamers in different conformations. This should be better discussed in the article, in particular in relation to Figure 4 of Ref 27, where the dynamical nature of the resting state is clearly mentioned.

      2) While the analysis of differences between GLIC structures at 3 different pH is well conducted, there is no detailed comparison with the other crystal structures of the same ion channel GLIC, which are listed in the manuscript (p. 2, line 27 to p. 3 line 6): the crystal structures of the resting state, the activated state, a locally-closed state and a possible desensitized state. One should expect at least a panel in a principal Figure of a detailed comparison between these structures. To understand the differences between the 3 structures presented here (pH 7, pH 5 and pH 3) and other known structures of GLIC, a projection of these 3 structures on various 2D maps should be presented using relevant variables (RMSD are rather useless here), along with representative structures of all other known forms of GLIC: the open form (4HFI), the 4 structures in 4NPQ and the locally closed form in 3TLT. See B. Lev et al, PNAS 2017 for such variables, in Figure 4 and 5 (ECD radius, beta expansion, M2-M1(-) distance, ECD twist).

      3) While it is surprising to observe that the pH 3 structure is still in a resting form, it is possible to interpret this as the left side of the minimalist reaction path of the allosteric transition that looks like this:

      pH 7   closed <-> open
               ^         ^
               |         |
               v         v
      pH 4   closed <-> open

      However, the reaction path of the gating transition is unlikely to be this simple. The dynamics of the gating transition in GLIC has been extensively studied in B. Lev et al., PNAS 2017 by long MD simulations and the string method. Unfortunately, this article is not cited in the present work, nor is there any detailed comparison of its conclusions with the proposed pathway presented in Figure 6A. In particular, Lev et al. insist on the role of the salt-bridge D32-R192, that gets broken to form another salt bridge D32-K248 in the open form. Do the 3 new GLIC structures solved in this new work confirm the importance of this salt bridge in driving the transition or not? In p. 6 the authors analyze specifically the conformations of the side-chain K248 but do not mention this possibility.

      4) Methodology (p. 10): The paper reports both a new and interesting method to refine models in cryo-EM maps using MD simulations with adaptive constraints and the resulting refined models. But the validation of the method itself on well documented test cases is missing (unless I missed something). In other words, there is some sort of a circular argument here: a new method is presented that allows good sampling and flexibility in the refinement under experimental constraints, but the justification is simply the output of the method, namely fitted (and flexible) models. While it is possible that the new method is superior to other extant and validated methods in speed, is it as accurate - or more?

      Specific comment on the Figures:

      Figure 1: The structure at pH 3 has (overall) a slightly higher local resolution than at pH 5. Any comment?

      Figure 2: Does K248 make a salt bridge with D122 (Panel B)?

      Figure 4: RMSDs do not bring a lot of information. Could the authors map their structures, along with all other known GLIC structures, on 2D maps with essential parameters such as ECD twist angle, M2-M1(-) distances as in Figure 4 and Figure 5 in Lev et al., PNAS, 2017?

      Figure 5: Again, RMSD plots, and their distributions, do not bring a lot of information. Also,

      1) Which pentamer has been used for the pH 7 X-ray form? (there are 4 of them in the asymmetric unit). Would the result be different with a different pentamer?

      2) I strongly oppose the names of the so-called pH5 and pH3 cryo Activating forms: they are not Activating, but merely the same structures with different sets of electrostatic charges. This is misleading, the reader might think it is an experimental structure (cryo). Best if the words Resting and Activating are changed to Deprotonated and Protonated, respectively.

      Figure 6: Panel A should be compared and discussed with Figures 3 & 4 in Sauguet et al., PNAS 2014, as well as with the Discussion in Lev et al., PNAS 2017.

    3. Reviewer #1:

      This manuscript reports cryo EM structures of the GLIC channel under resting (high pH), partially (pH 5) and fully (pH 3) activating conditions. The structures reveal some features that were not so well resolved in previous X-ray structures and use simulations to suggest a dynamic structure at high pH, indicative of an ensemble of resting state conformations, compared to a more compact and well-defined structure under activating conditions. This idea is not entirely new, however, as it was a conclusion of the resting state X-ray structure paper of Delarue and co-workers (ref.27). The study also sees changing structural elements that might imply roles in gating, such as with loop F and interactions of E243, though also suggested in past X-ray structures. It is surprising that all structures, including under maximally activating conditions, are completely closed, and the explanation for this is not compelling. Another surprising outcome is that the distributions from simulations of the resting state at high pH based on the new cryo EM structure are so different to those obtained using the past X-ray structure, and there are indications of lack of convergence of these simulations.

      The findings and discussion of Delarue and co-workers in Ref27 could be more prominent, including in the introductory statement, which could be cited along with refs 11,14,15 as a solved resting state, and not just described as being of low resolution. I refer to Fig.3c of ref.27 which conveys the idea of the diverse resting state distribution in that paper.

      In regards to the "relative novelty" of the methods used for MD fitting to cryo EM data, it is not obvious how different the approach is to standard MDFF flexible fitting strategies. Although there is brief mention in the discussion section, it is not clear from the introduction and methods how novel the approach is. I do suggest, however, that it does not make sense to refine the structure with simulations of GLIC in a POPC lipid bilayer, when the cryo EM involved detergent solubilised particles. Fitting MD should have been done in micelles as it is not appropriate to refine in a different environment to which it was solved.

      The authors claim higher RMSD for pH 7, but fig.4A suggests divergence of simulations in 1 µs. It seems the simulations would need to run longer to reach an equilibrium distribution. It is curious that such divergence is not evident in high pH X-ray structure simulations in the same figure. Does this suggest the cryo EM structure at high pH is unstable? Is this increasing RMSD spread uniformly or due to changes in particular parts of the protein during MD? I note that subsequent analysis, such as fig.5, revealing no maximum in the distribution for ECD bloom compared to X-ray simulations at high pH, may be due to not yet converging on an equilibrium for the resting state (and pre-equilibration period not being excluded).

      Despite the pH 7 cryo EM simulations likely being not yet equilibrated, leading to some uncertainty about the meaning of the distributions in Fig.5, it is clear that low pH leads to a more tightly bound ECD bloom range than pH 7 in that figure. Although the effects of pH are similar between cryo EM and X-ray starting structures, why is the peak in Fig5b ECD twist also so different for pH 7? This also could be an artefact of lack of equilibration. Differences are also noted at low pH.

      Fig5c is striking. It suggests cryo EM at low pH has failed to capture an open pore, whereas X-ray was able to capture an enlarged pore radius. The authors write that this was initially surprising, having all low pH structures closed, but consistent with past X-ray with one structure partially closed. But here all structures look completely closed, whereas a fairly even mix of open and closed TMDs may have been anticipated at low pH, at worst. The possible artefact due to interaction with the glow-discharged cryo EM grid could be better explained for the reader. On page 16, the authors say the closed pores do not look like they would expect for a desensitised state. This also needs a better explanation with more specifics. They then suggest it may be because at low pH the pore can flicker and the open pore has a high free energy. Why is the open state expected to be high free energy at low pH? Doesn't the pH50 of 5 suggest the equilibrium is shifted to the open channel (lower free energy) by pH 3, as also suggested by previous free energy analysis in ref.21? While fig.6 is used to illustrate a reduction of number of closed states to the left with lowered pH, "priming" the protein for gating, again it does not make sense to me that at low pH the free energy of the open state on the right is higher than the closed state on the left.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript reports cryo-EM structures of the pentameric ligand-gated ion channel GLIC at pH 7, 5 and 3. The reviewers have appreciated several aspects of the manuscript, which combines experiment with simulation to describe the GLIC channel's resting state. However, concerns have been raised. The reviewers have questioned what has been gained in addition to previous work on the structure and mobility of the resting state (Sauguet et al. PNAS 2014; ref.27), not described in this manuscript. How do the new structures compare to past X-ray structures/density maps? The reviewers raise questions about the functional states found. In particular, while rigidification at pH 5 or 3 is interesting, normally it should switch to the open state, especially at pH 3, and why this has not occurred is not explained well. Several concerns have been raised about the simulations and what is learned. This includes protonation state choices (not discussed or justified), why flexible fitting was conducted in a bilayer instead of a micelle (which may impact regions of the map less well defined), and have the simulations converged? The reviewers note lack of informative analysis, leaving us in the dark as to the functional states visited. It has been suggested that analysis in collective variable space would be needed, such as defined in Ref.21 (not discussed in this manuscript), so that the reader can observe if structural features change, despite maintaining an apparent resting conformation (e.g. does the D32-R192 salt bridge break; does the pore wet/dewet)?

    1. Reviewer #3:

      The manuscript by Karamanlis and Gollisch examines the responses of mouse retinal ganglion cells (RGCs) to natural stimuli. The primary conclusion of the manuscript is that spatial integration of stimuli within the receptive field is nonlinear. This nonlinear integration is consistent with "local signal rectification". This results in a set of RGCs that are sensitive to spatial contrast within the RF. The Authors also note the presence of cells that are suppressed by contrast and cells that prefer uniform stimulation of the RF. To reach these conclusions the authors use multi-electrode array recordings from isolated mouse retina. Spatial RFs are estimated using white noise stimuli, which are then used to generate a null-model for linear spatial summation. They compare predictions of this null-model to the responses of the same RGCs to briefly flashed natural images. The authors find some RGCs that are consistent with this null model and many that are not consistent. The authors correlate deviations from linear spatial summation to deviations revealed by contrast reversing gratings. They also used a mixed-contrast, flashed-checkerboard paradigm to map the contrast tuning and rectification of RF subunits. Finally, the authors show that some of these results track with functionally distinct RGC types such as direction-selective and "IRS" RGCs.

      The data and analyses presented in this manuscript are high quality. However, I think the study is largely consistent with many previous studies that demonstrate nonlinear spatial integration among RGCs in the mammalian (including mouse) retina. I think the Authors view the use of natural stimuli as a major departure from previous work, but I'm not convinced of this for two reasons. First, I don't see a compelling reason to think that results using contrast reversing gratings or other 'textured stimuli' (e.g. Schwartz et al Nat Neuro 2012) would fail to generalize to flashed natural scenes. Second, the implicit claim here is that a 200ms flashed natural scene interleaved with an 800ms gray screen is a natural stimulus. I think this assumes a lot about the space-time separability of the RF mechanisms, and these assumptions are not well justified.

      Major Concerns:

      1) I think the introduction of the manuscript is building a straw man argument, suggesting that many (or most) scientists think the retina is predominantly linear. A pubmed search of 'retinal ganglion cell' and 'nonlinear' produced more than 300 studies. Specifying subunit nonlinearity produces 28 studies. The discovery of subunit nonlinearities is roughly 50 years old and many manuscripts demonstrate Y-like receptive fields are more common across RGC types than X-like receptive fields.

      2) The authors seem to be arguing that the spatial nonlinearities engaged by the contrast reversing gratings are not the same as those engaged by their natural scenes (Figure 3). However, I think the authors are assuming too much that the spatial and temporal components of the RFs are separable. The flashed natural scenes are interleaved with relatively long gray screens. The contrast reversing gratings are reversed in a square-wave fashion with no interleaved gray screen. These distinct spatiotemporal dynamics in the stimuli seem likely to explain the difference. This would also seem likely to explain why the flashed checkerboards in Figure 4 produced results more correlated to flashed scenes in Figure 1. In summary, I don't see a strong reason to think the authors are observing anything other than subunit rectification of the sort described by Hochstein and Shapley in the 1970s and followed up in many subsequent studies.

      3) It is not clear to this reviewer that flashed natural images interleaved with a gray screen are qualitatively more natural than white noise, sinusoidal gratings, or square-wave gratings.

      4) The null-model constructed by the authors in Figure 1 assumes the RF follows a specific functional form (e.g. Gaussian). However, many studies show that individual RFs frequently exhibit strong deviations from a Gaussian RF. To what extent are the deviations from the null model produced by deviations from linear summation or just linear mechanisms that deviate from the specific parametric form imposed by the model?

      5) It was unclear how the authors rule out the contribution of differences in (nonlinear) temporal integration to the effects in this study. In general, RGC RFs are not space-time separable, and it seems that the analyses in the manuscript assume they are.

      6) This study overlaps significantly with Cao, Merwine and Grzywacz (2011), 'Dependence of retinal Ganglion cell's responses on local textures of natural scenes', Journal of Vision. This article is not cited here, but in my view, the major conclusions are similar.

      7) In my experience, the strength of subunit rectification can be labile during ex vivo experiments. What controls have the authors performed to ensure the effect they are studying remains stable over the duration of their recordings?

    2. Reviewer #2:

      Summary:

      Understanding how retinal ganglion cells respond to natural stimuli is a central but daunting question, which retinal neurophysiologists have begun to tackle recently. Here Karamanlis and Gollisch perform large-scale multi-electrode recordings in the mouse retina and demonstrate that the responses of many ganglion cells cannot be predicted by standard linear-nonlinear models (L-LN). They go on to test a variety of clever artificial stimuli that emphasize and allow for the quantification of the non-linear aspects of RGC responses and convincingly demonstrate that non-linear processing is associated with sensitivity to fine spatial contrasts (subunits) and local rectification. While these aspects of RGC receptive fields have been previously described, demonstrating their applicability to natural vision is a significant advancement.

      Major Comments:

      My first main concern is with the way the paper is written. It does not highlight the significant advancements but rather emphasizes what is already known from other studies. For example, many of the conclusions of non-linear spatial integration & signal rectification arising in bipolar cells have been well described previously. By contrast, novel aspects like the sensitivity of reversal gratings being unrelated to LN model performance for natural scenes should be explained in more detail. The authors should more clearly state the major advancements that are being made here beyond what has already been shown previously (e.g. Turner and Rieke, 2016).

      Second, the authors never include non-linear subunits in their model to demonstrate improved performance. Testing models with filters that incorporate rectification and convexity as experimentally determined will enable them to show their utility more convincingly. Without this, the reader is left with the conclusion that there are RGCs that exhibit non-linear or linear spatial integration (already known) and that non-linear integrators cause LN models to perform poorly with natural images (Turner and Rieke, 2016).

      Third, I'm not sure how 'natural' their natural images are, given static images are flashed over the cell intermittently. While such stimuli might simulate some sort of saccadic eye movements, whether this is relevant for mouse vision is not clear. Would linear models be more predictive for responses to natural movies? Some discussion on this issue would be helpful.
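
      The difference between the two model classes discussed in the second comment above can be shown with a toy example. This is only a sketch under strong simplifying assumptions (two equally weighted subunits, a threshold-linear output, hypothetical function names) and is not the authors' model. A linear receptive field sums contrast before any nonlinearity, so opposite contrasts cancel; a subunit model rectifies each local signal before summation, so the same image still drives the cell.

```python
def linear_response(contrasts, weights):
    """LN-style response: weighted linear sum, then output rectification."""
    return max(0.0, sum(w * c for w, c in zip(weights, contrasts)))


def subunit_response(contrasts, weights):
    """Subunit-style response: rectify each local signal, then sum."""
    return sum(w * max(0.0, c) for w, c in zip(weights, contrasts))


weights = [0.5, 0.5]
uniform = [0.5, 0.5]     # same contrast over both subunits
textured = [0.5, -0.5]   # zero mean contrast, high spatial contrast

print(linear_response(uniform, weights), subunit_response(uniform, weights))
print(linear_response(textured, weights), subunit_response(textured, weights))
```

      Both models agree for the uniform stimulus and differ only for the textured one, which is why images with high spatial contrast are the informative cases when comparing them.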

    3. Reviewer #1:

      This paper investigates how retinal ganglion cells integrate inputs across space, with a focus on natural images. Nonlinear spatial integration is a well-studied property of ganglion cells, but it has been largely characterized using grating stimuli. A few studies have extended this to look at spatial integration in the context of natural images, but we certainly lack a comprehensive treatment of that issue. The current paper has a number of strengths - notably using a number of complementary stimuli and analysis tools to study a large population of ganglion cells and linking properties of responses to artificial stimuli with those to natural stimuli. It also has a few weaknesses (some detailed carefully in the paper) - such as the inability to identify ganglion cell types (aside from a few), and to pinpoint specific circuit mechanisms. These are limitations of the techniques used. This is not a request as much as setting the context of the contribution of the paper. Generally the paper was in good shape, and the data supported the conclusions well. I do think there are a number of issues that could be strengthened. Those are listed below in rough order of importance.

      Statistical correlations in natural scenes:

      A number of analyses in the paper rely on estimating the spatial contrast from an image and comparing the dependence of various measures of the cells' responses on spatial contrast. A danger in this analysis is that spatial contrast is likely correlated with many other statistical properties of the image, so attributing a given response property to spatial contrast has some potential confounds. This issue should be discussed as a possible caveat, unless the authors can rule it out. The paper, accurately, describes the results in terms of correlations (and not causal relationships), but some discussion of the complexity of natural image statistics would be helpful.

      Comparison of grating and natural scene spatial scale:

      The section starting around line 233 was confusing for several reasons. First, this section starts by measuring the spatial scale associated with the grating responses, and then comparing that to LN model performance for natural inputs. It's not clear why the spatial scale is the relevant aspect of the responses to gratings. Indeed, the next paragraph provides a measure of the relative sensitivity of the nonlinear and linear response components (via a comparison of F1 and F2 responses). It would be helpful to include some initial text to motivate the different measures of the grating responses and to anticipate that you will look at both spatial scale and sensitivity. A related issue that bears more directly on the scientific conclusions comes up later in the blurring experiments. The issue is whether it is valid to directly compare the apparent spatial scale of nonlinear responses to images (estimated via blurring) with that of the grating responses. Natural images should have much higher power at low spatial frequencies, and this may strongly impact the spatial scale identified with the blurring experiments.
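
      For readers unfamiliar with the F1/F2 comparison mentioned above, the following sketch shows one common way to compute it. It assumes a firing-rate trace sampled at uniform intervals over an integer number of stimulus periods; the function names and the synthetic traces are hypothetical and are not taken from the manuscript. F1 is the response amplitude at the grating reversal frequency, and F2 is the amplitude at twice that frequency; a frequency-doubled response (large F2/F1) is the classical signature of nonlinear spatial integration.

```python
import math


def fourier_amplitude(rate, dt, freq):
    """Amplitude of the component at `freq` (Hz) in a uniformly sampled trace."""
    n = len(rate)
    re = sum(r * math.cos(2 * math.pi * freq * k * dt) for k, r in enumerate(rate))
    im = sum(r * math.sin(2 * math.pi * freq * k * dt) for k, r in enumerate(rate))
    return 2.0 * math.hypot(re, im) / n


def f2_f1_ratio(rate, dt, stim_freq):
    """Ratio of second-harmonic to first-harmonic response amplitude."""
    f1 = fourier_amplitude(rate, dt, stim_freq)
    f2 = fourier_amplitude(rate, dt, 2 * stim_freq)
    return f2 / f1 if f1 > 0 else math.inf


dt, stim_freq = 0.001, 2.0  # 1 ms bins, 2 Hz reversal, 1 s of data (2 full periods)
t = [k * dt for k in range(1000)]
linear_like = [10 + 5 * math.sin(2 * math.pi * stim_freq * x) for x in t]
doubled = [10 + 5 * abs(math.sin(2 * math.pi * stim_freq * x)) for x in t]

print(f2_f1_ratio(linear_like, dt, stim_freq))  # close to 0
print(f2_f1_ratio(doubled, dt, stim_freq))      # very large
```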

      Clustering of orientation-selective cells:

      An interesting suggestion in the paper is that the orientation-selective cells can be divided into two groups that differ in their spatial integration properties. Do these groups represent different orientations, as suggested in the text? That seems a simple piece of information to add. Related to this, I would suggest moving Figure S4 into the main text.

      Presentation of checkerboard stimuli and results:

      The checkerboard analysis, particularly how it isolates properties of spatial integration, could get introduced more thoroughly for a reader unfamiliar with it. A related issue is how well the chosen isoresponse contour captures structure in the full distribution of responses. In some cases that looks pretty good, but in others it is less clear. Could you add a supplementary figure or something similar that characterizes how consistent the isoresponse contours are for different response levels?

      Drift in responses over time:

      Some of the rasters - e.g. the bottom left in Figure 1C - show considerable drift over time. It is important that this drift not be interpreted as a failure of the LN model and hence indicative of nonlinear spatial integration. Can you test for drift like this across cells, and exclude any that seem potentially problematic? More generally, some assurance that the variability in the responses for a given generator signal value is real variability across images is needed.
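
      One simple way to screen for the kind of drift described above is to fit a straight line to the per-trial spike counts for a repeated stimulus. The sketch below is an illustration only: the function names, the example counts, and any exclusion threshold one might apply are assumptions, not part of the manuscript's analysis.

```python
def trial_drift_slope(counts):
    """Least-squares slope of spike count versus trial index."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two trials")
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (c - y_mean) for i, c in enumerate(counts))
    return sxy / sxx


def relative_drift(counts):
    """Fitted change from first to last trial, relative to the mean count."""
    mean_count = sum(counts) / len(counts)
    if mean_count == 0:
        raise ValueError("mean spike count is zero")
    return trial_drift_slope(counts) * (len(counts) - 1) / mean_count


# Hypothetical per-trial spike counts for one image.
stable = [10, 11, 9, 10, 10, 11, 9, 10]
drifting = [20, 18, 16, 14, 12, 10, 8, 6]

print(relative_drift(stable))    # small in magnitude
print(relative_drift(drifting))  # strongly negative
```

      Cells whose relative drift exceeds a chosen threshold could then be flagged for inspection before model performance is evaluated.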

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      All of the reviewers expressed concerns about the advance that the work described in the paper represents. These issues were a focus of the consultation among the reviewers. The main concern is that the work needs to go beyond demonstrating that some ganglion cells exhibit nonlinear integration for naturalistic inputs - as that point is quite well established in the literature. The comparison between natural stimuli and gratings could help in this regard, but several issues confound that comparison (e.g. differences in dynamics of the two types of stimuli). These concerns are detailed in the individual reviews below.

    1. Reviewer #3:

      The authors study the effect of confinement on the alignment of REF cells confined within circular micropatterned islands. They observed that the cells are aligned perpendicularly to the boundary after 48h, contrary to other elongated cells such as NIH-3T3. After testing several subclones of that cell line, they identified cell contractility and cell-cell adhesion as factors that affect the organization of the cells in the circular patterns. They confirmed this using drugs that affect contractility and disrupt cell adhesion. Then they compared their results to a continuum model and to a Voronoi model.

      The science is interesting. Many cell types are elongated and do align with their neighbors. The fact that these cells align perpendicularly to a boundary is curious and deserves to be studied in depth. Three similar papers from the Roux group came out on arXiv. They should be discussed in the manuscript and cited.

      It is not clear what "condensation" process the authors are referring to or how it is related to the boundary alignment of the REF cells. Please read the work of Trepat et al. on active dewetting published in 2018. I do not know what the authors mean by "tendency". Sometimes it condenses, sometimes it does not? It is not a scientific term. I would advise choosing different wording to explain the results. Condensation is the first word in the title of the manuscript, yet it appears for the first time in the text on page 18 and is poorly defined. It is never well explained, and the two terms, condensation and tendency, always come up together, as if the authors themselves do not know what to call what they are observing.

      There is a lot of data, analysis and model, but it is very confused, not well organized and poorly presented, which prevents me from judging the quality of the interpretations. The authors chose to show all the analysis they could do in the figures, and therefore there is no clear take-home message. Are all those plots necessary?

      It was a very difficult paper to read. Terms like nematic or symmetry are often misused. Such words have a very specific meaning, in particular for liquid crystal physicists, who are one of the target audiences for this paper. The figures are not clear. They contain too much information and not enough at the same time. There are too many graphs, and I don't know which ones are important. Please plot the two cell types in the same graph instead of showing one graph per cell type. At the same time, there is often not enough information to understand what the authors are plotting and what the take-away message is.

      Below, I have specific comments about the text, not so much about the science. Again, I found it very hard to read and understand, hence I am not able to judge the quality of the research at that point.

      Specific comments about the figures: Fig 3: What is the unit of the heat maps? Please add a fluorescence image and the average for the second row. For the plot, please specify in the label: "normalized mean intensity" of what?

      I do not understand Fig 4. The caption just reads out the labels of the plots; it tells me neither the results nor their relevance. There is no information in the caption; please revise.

      What is the main result of Fig 5? The title could not be more vague: "Voronoi cell modeling predicts REF 2c cell behaviors in circular pattern.". Please give specific titles to your figures that help the reader understand the take-away message. Please change the contrast of Fig. 5A. All the disks look black to me. I have difficulty trusting the statistical analysis. The top right plot of Fig 5C looks totally flat to me. Why is there sometimes a statistical analysis and sometimes not? (%B 1,2,4 and %C1 have no statistical analysis.)

      The same criticisms apply to Figure 6.

      About the abstract: The terminology is vague and confusing, which makes me think that the authors have not fully characterized the connection between their experiments and the physics of liquid crystals. Examples: "to form nematic symmetry", "to form a new type of symmetry", "new symmetry?". Changing the boundary condition does not mean you are changing the symmetry of the liquid crystal.

      Strong adhesive interaction... MDCK also have strong adhesive interactions, therefore the comparison is not adequate, please revise.

      What does "condensation tendency" mean? What does "prestrech" in the last sentence of the abstract mean? Is the tissue under stretch? There is no reference to stretch in the abstract before that.

      Comments about the introduction: The introduction is scattered and very confusing, as it mixes results from a broad range of model systems. For example, in 4 successive sentences, we have: adipocytes, then fish, then reconstituted asters, then back to muscle cells. This looks like a laundry list. Same thing in the next paragraph: neural crest cells, mesenchymal stem cells, chondrocytes. At this point, it is not clear which cell types the authors are studying and why they are relevant to all the others cited in the introduction.

      Cell condensation is not "unique" to their cell types. MDA-MB-231 cells also do that (ref: Trepat et al, Active wetting of epithelial tissues, 2018).

      "to robustly self-organize in polarized organization", please rephrase

      "mechanical variable have been used to describe the mechanical behavior of a cell monolayer", please rephrase, this is way too vague. What are you trying to say?

      Why epithelial-like? Why not just epithelial? Are these cells different?

      What does "presented cytoskeleton" mean?

      3T3 cells are not incompressible. No cell types are. They divide all the time.

      You can have radial alignment in a nematic liquid crystal; it is called homeotropic anchoring. It has nothing to do with the symmetry of the liquid crystalline units.

      Condensation driven by chemotaxis? I have never heard of that. See again Trepat et al 2018. The cells are confined in a similar circular island, and there is no chemotaxis.

      References were not properly cited. As an example ref [11] does not talk about the effect of confinement at all.

      About the methods: Manual tracking is passé. There are robust methods to track cells automatically. You are already segmenting the tissue, so why not track the cells automatically this way?

      "The average speed for each cell was calculated as the total migration length of each cell divided by the total time". So if I track the cells for long enough and they diffuse randomly, the average speed is 0? Does not really make sense. For how long were the cells tracked? Are all the trajectories the same length?
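
      The two possible readings of the quoted definition can be made concrete with a short sketch. It assumes that each track is a list of (x, y) positions sampled at equal time intervals; the function names are hypothetical. If "total migration length" means the summed path length, the resulting mean speed stays positive even for a cell that returns to its starting point; if it means the net displacement, the result approaches zero for a randomly moving cell and depends on how long the cell was tracked.

```python
import math


def path_length(track):
    """Summed length of all steps along the track."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def mean_speed(track, total_time):
    """Path length divided by total tracking time."""
    return path_length(track) / total_time


def net_velocity_magnitude(track, total_time):
    """Straight-line displacement between first and last position, per unit time."""
    return math.dist(track[0], track[-1]) / total_time


# A cell that moves back and forth and ends where it started (4 time units).
back_and_forth = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)]

print(mean_speed(back_and_forth, 4))              # 1.0
print(net_velocity_magnitude(back_and_forth, 4))  # 0.0
```

      Because the two quantities can differ this strongly, the methods should state which one was used, together with the tracking duration and sampling interval.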

      About the model: What other types of stress were neglected in the model and why? Especially, if you are trying to model a nematic liquid crystal, why not take into account the nematic elastic stress?

      Why nematic-like? This is confusing as is much of the terminology used in this manuscript.

    2. Reviewer #2:

      This article reports the radial alignment of rat embryonic fibroblasts at the periphery of circular confinement patterns. The authors establish experimentally that contractility, adhesion and a stiffness gradient are necessary to obtain this alignment. They further devise continuum and discrete models, with only two free parameters, to describe the mechanical origin of such cellular arrangements.

      The article is an interesting contribution to the field, with the discussion and conclusion well supported by the experimental data. It is further well written, with a good logic.

      1) The authors should explain (e.g., in an appendix) how they solve Eqs.(7-9) and how they run their Voronoi simulations (or indicate which solver/package they use if those already exist).

      2) A movie showing the formation of the radially aligned cell pattern would be a good addition, even if the dynamics are not discussed in the article. The x,y,t axes should be labelled (with units) in Fig.1-Supp.1.

      3) p.17 l.3, "stiffnesses" instead of "substrates"?

      4) p.20 l.7, the authors should better explain how Fig.1-Supp.4 supports a homogeneous isotropic contractility.

      5) The authors should show some of the images used to extract actin fibers structure (or are these shown in Fig.3?). Is Fig.4-Supp.1 obtained for REF 2c?

      6) p.24 l3, the authors may comment on how stiffness anisotropy could be incorporated in their model to explain inner cells' circumferential alignment. The author should plot the structure parameter (k_h) vs radial distance instead of giving a table (Fig.4-Supp.1 and Fig.6-Supp.1); they should use the same origin (the center of the circle) for the radial distance in the ring experiments (x-axis in Fig.6B and Fig.6-Supp.1A vs x-axis in Fig.7 and Fig.7-Supp.1) to facilitate comparisons.

      7) The authors should clarify what they mean by "clear boundary junctions" (p.18 l.9) when describing Fig.2D, which is challenging to discern.

      8) In Fig.4, are the authors showing the strain or the stretch ratio? It would help to start the y-axis at 0 in Figs.4A-B. At which distance are the radial strain and stress evaluated in Figs.4C-D? Are the pre-stretch ratio and stiffness gradient challenging to evaluate from the experiments (p.20 l.4)? Can the authors comment on the values needed for these model parameters to see radial alignment in the simulations? Are they realistic when compared to the experimental data?

    3. Reviewer #1:

      The manuscript by Xie et al combines an impressive array of experimental and modeling approaches to study cell morphological changes due to stiffness heterogeneities and contractility.

      1) The assumption of a purely elastic process needs substantiation. Fig. 1A shows a dramatic increase in the number of REF2c cells from 24 to 48 hours, suggesting that cells are proliferating. This, together with continuous remodeling of cell-cell contacts, would result in deformations that dissipate elastic energy. Neither modeling approach accounts for this. It would be important for authors to incorporate these behaviors, or to provide evidence that cell proliferation and remodeling are unimportant, and similar between the three cell populations being compared.

      2) The assumption that contractility is uniform needs to be substantiated. Work cited (Tambe et al) shows on the contrary that collective cell behaviors exhibit highly heterogeneous active stresses. Experimentally, there are a few potential ways to test this. Authors could use the stiffer (1 MPa) micro post cultures, which recreate radial alignment seen on micropatterned PDMS islands, and compute force variations from post deflection. Alternatively, authors could perform short time lapse experiments to measure deformations following treatment with blebbistatin or Y27632. Yet another option would be to perform staining for contractile proteins such as phospho-myosin light chain, GTP-bound RhoA, or others, to confirm they are uniformly distributed despite the heterogeneity of F-actin (although this reviewer is skeptical that such experiments would reveal uniform contractility when F-actin is nonuniform). Finally, if no experimental support is possible, then authors could turn to model simulations to test whether spatial heterogeneities in contractility alter the overall behavior of the system (although, again, this reviewer is skeptical that such simulations would suggest the heterogeneity of contraction is unimportant). In addition to either modeling or experimental support for the assumption that contractility is uniform, authors should provide examples from the literature on related systems that support this assumption.

      3) The importance of a stiffness gradient in the cell population is one of the key aspects of this work. However, evidence for the existence of such a gradient is provided only by staining for F-actin, which is insufficient. While F-actin is indeed a key cytoskeletal component in defining the stiffness of cells, the link between intensity of staining and stiffness needs to be proven. Only a single reference is provided, which focused on one specific cancer cell line and the role of stress fibers - a specific configuration of F-actin together with myosin - in stiffening the cell. Moreover, given that F-actin interacts with nonmuscle myosin to form the key contractile machinery of most cell types, heterogeneity in F-actin likely implies heterogeneity in contractility as well. There are also concerns with the measurement of F-actin abundance, including need for statistics on the spatial distribution, and to normalize per cell to reflect variations in F-actin as opposed to simply variations in cell density, which are also present (Fig. 1A). Finally, the F-actin gradient is only shown and quantified when intensities are summed over many samples. It would be important to demonstrate a significant gradient within individual samples, and how it varies across samples.

      4) Greater integration between modeling and experiment would strengthen the manuscript. This is particularly true of the continuum model, where it is nontrivial to relate strain and stress to cell shape changes, given that cell shape is not simply an affine elastic deformation owing to stresses acting on it, but instead a response to stresses integrated with cell autonomous behaviors. There is a large body of literature on the alignment of cells relative to the direction of applied static or dynamic stretch. This mechano-responsivity that dictates cell shape is not considered in the present study. Even without considering these complicating cell behaviors, it is not clear how the magnitude of stress or strain relate to the change in cell shape. In addition, authors would ideally make use of the models to pinpoint what underlies the distinct polarization phenotypes between REF2c, REF11, and 3T3 cell types.

      5) The importance of cell-cell adhesion is another crux of the story, pointing to differences underlying the various polarization phenotypes. However, the only experimental support for this is via treatment with a calcium chelator, EGTA. Only one reference is provided for this method (#35, Chen et al), yet Chen et al appear not to have used EGTA at all, and instead disrupted E-Cadherin using neutralizing antibodies. This is a much more specific and direct approach that the authors of the present study should consider in place of EGTA. In the absence of this or similarly targeted approaches (RNAi, etc), the authors should include control experiments that demonstrate this rather broad perturbation does not alter contractility or cell-substrate interactions. This could be done, at least in part, by using the traction force measurement system the authors have devised. It is particularly important to do so given the importance of calcium for cytoskeletal contraction via calmodulin. A second experiment authors could supplement this with is pharmacologic inhibition of calcium-dependent contractility, with the hope/expectation that calmodulin-mediated contractility does not predominate in this system. Even with these experiments, however, authors need to provide support from published work that this method of disrupting cell-cell adhesion is well established.

      6) The system is quite artificial with respect to in vivo conditions in most contexts. This on its own is not a limitation, as such approaches can still be used to reveal fundamental insights into the mechanisms of cell behaviors and interactions, employing approaches that are not feasible in vivo. However, it is important to tie the specific behaviors and outcomes of this study directly to events of developmental, physiologic, or pathologic importance. While authors do broadly invoke these as motivations for the work, the true impact of the findings is not fully realized without more direct links. Further, because the work is largely descriptive, and lacks direct measurement of cell generated forces, it does not truly take full advantage of the artificiality of the system.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The authors study the effect of confinement on the alignment of REF cells within circular micropatterned islands. They observed that the cells are aligned perpendicularly to the boundary after 48h, contrary to other elongated cells such as NIH-3T3. After testing several subclones of that cell line, they found that cell contractility and cell-cell adhesion affect the organization of the cells in the circular patterns. They confirmed this finding using drugs that affect contractility and disrupt cell adhesion. Then they compared their results to a continuum model and to a Voronoi model.

      Enthusiasm for the work is diminished by the limited experimental support for key assumptions of the conceptual and mathematical models (e.g. existence of stiffness gradient, assumption of uniform contractility, use of calcium chelator to show importance of adhesion). Further, integration of model and experiment could be improved, and some of the narrower assumptions of the models (e.g. omitting cell proliferation, remodeling of cell-cell contacts, and cell-substrate interactions, assuming uniform contractility) need better justification. Also, a clear correlation to specific events in development, physiology, or disease would highlight the broader impact of the work beyond a very specific event in a carefully engineered system. Finally, three similar papers from the Roux group have appeared on arXiv. They should be discussed in the manuscript and cited.

    1. Reviewer #2:

      While this paper develops some useful tools for targeting neurons expressing different isoforms of the FoxP transcription factor, the broad expression of FoxP (~1800 neurons throughout the brain and VNC) makes it challenging to interpret the general motor deficits that result from knocking out FoxP expression during development. The study lacks a structural or physiological link between the low-level genetic manipulations (elimination of FoxP expression) and high-level behavioral phenotypes (abnormal locomotion and landmark fixation).

    2. Reviewer #1:

      This is an elegant molecular manipulation of the FoxP gene, coupled with anatomical description of the neuronal distribution of isoform expression in the brain and ventral nerve cord of the fly.

      Isoform B functional knockouts show behavioral abnormalities in flies' ability to walk toward a dark vertical bar representing naturally attractive landscape features like plant stalks. FoxP isoform B manipulated animals walk slower and are less adept at targeting the dark bar. Knocking out all FoxP isoforms has similar behavioral effects as knocking out FoxP-iB alone.

      FoxP is expressed broadly throughout the peripheral and central brain and in the ventral nerve cord, throughout development. Expression within leg motorneurons and the protocerebral bridge of the central complex is required for normal walking visual fixation, which is entirely consistent with what we've been learning about the functional organization of this brain region for spatial navigation.

      The problem here is that the conceptual gap between molecular manipulation of the FoxP gene and the behavioral phenotype is wide. Absent any understanding of either the cell physiological mechanisms of action of FoxP, or the function of FoxP-positive neural circuitry involved in the behavior being explored, the advance remains preliminary.

      Even in the case of identified neurons that have recently been implicated in bar fixation by walking flies, which the authors demonstrate express at least some FoxP isoforms, broad FoxP knockout had no effect on the behavior. As the work is currently presented, there is not enough resolution between FoxP expression, cell circuit function, and behavior for the work to make a sufficiently compelling case.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers thought that the work was of quality and that the paper develops some useful tools for targeting neurons expressing different isoforms of FoxP. However, they also felt that there is a conceptual gap between the molecular manipulation of FoxP and the behavioral phenotype, with little understanding of the mechanisms of action of FoxP and of the function of FoxP in the neural circuitry involved in the behavior.

      The broad expression of FoxP in ~1800 neurons makes it challenging to interpret the motor deficits that result from knocking out its expression during development. Although neurons that express FoxP have recently been implicated in bar fixation, the behavioral phenotype of the FoxP knockout is difficult to interpret. Therefore, the integration of FoxP expression, the function of the circuit involving FoxP and the behavior is not sufficiently clear.

    1. Reviewer #3:

      The study by Pesoli et al. uses MEG acquisition in sleep deprived participants in order to explore the functional integration derived from MEG source reconstructed connectivity and its potential link to attentive functions. The study is well conducted with an appropriate size to explore global graph measures derived from MEG connectivity.

      1) My major concern is that the authors' main claim, that MEG connectivity is correlated with attentive function, has at best very weak support from the presented data. Although the authors state in the methods that all analyses were FDR corrected, the correlational analysis linking behavior to MEG connectivity reports uncorrected values. For example, the correlation with Alpha-MEG Degree of the Right superior Occipital gyrus is based on a statistical test across 90 ROIs x 5 frequency bands x 2 nodal metrics, which would result in a Bonferroni threshold of p=0.05/900; the reported p=0.009 is orders of magnitude larger than this threshold. This problem applies (to different degrees) to all correlations reported in Fig. 6. To limit the number of false positives, more stringent statistical thresholding would be needed to analyze the link between connectivity and behavior (a good starting point for addressing this issue is [Makin et al. 2019, eLife]). Related to this issue: the hypothesis 'such topological rearrangements would relate to cognitive performance' is highly underdetermined, and the authors could stress the strongly exploratory character of this study more in both abstract and introduction.

      2) The link to previous literature is unclear for the connectivity measure used (Phase Linearity Measurement): the authors should briefly address in a paragraph what we should expect, e.g. when comparing the measure to more frequently used connectivity measures such as amplitude envelope coupling or coherence (Colclough et al. 2016, NeuroImage). How does the measure used differ, and why did the authors choose it instead of a more frequently used measure?

      3) I was generally missing a consistent definition of the term integration: why did the authors choose the selected graph metrics to measure integration, and how do the graph metrics show that the brain loses integration (as they state in the title of the article)? The use of all graph measures should be clearly motivated: why did the authors choose these measures and what are they planning to measure to support their hypothesis?

      4) 'In particular, with regards to TS, median reaction times (in ms; median RT) to both repetition and switch trials, and angular transformations of the proportion of errors resulting from the two experimental sessions were submitted to two-factor repeated-measures ANOVA, instead, SC as well as all dependent variables obtained from LCT (number of hits and number of rows completed), were submitted to paired t-test.' This sentence is difficult to understand. It is not clear why in one case you use only post hoc t-tests and in the other case an ANOVA.

      5) Data availability: 'All data generated or analyzed during this study are included in the manuscript and supporting files.' The authors should include a more detailed description of where the interested reader can find data and code. Are they available on request or will they be provided in a repository?
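
      As a rough check of the arithmetic in point 1, the sketch below recomputes the Bonferroni threshold from the counts given there (90 ROIs, 5 frequency bands, 2 nodal metrics) and compares it with the reported p=0.009. The function name and the family-wise level of 0.05 are assumptions for illustration; this is not the authors' analysis code, and a less conservative correction such as FDR would give a different threshold.

```python
# Multiple-comparisons check for the nodal correlation discussed in point 1.
# Counts and the reported p-value come from the review text; alpha = 0.05 is
# the conventional family-wise level and is assumed here.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


n_rois, n_bands, n_metrics = 90, 5, 2
n_tests = n_rois * n_bands * n_metrics            # 900 tests in total
threshold = bonferroni_threshold(0.05, n_tests)   # roughly 5.6e-05

reported_p = 0.009
survives = reported_p < threshold                 # does the result pass correction?
ratio = reported_p / threshold                    # how far above the threshold it sits

print(f"tests={n_tests}, threshold={threshold:.2e}, "
      f"survives={survives}, ratio={ratio:.0f}")
```

      Under these assumptions the reported value sits roughly 160-fold above the corrected threshold, which is the gap the comment refers to.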

    2. Reviewer #2:

      This study employs the use of MEG to incorporate both spatial and temporal strengths of previous fMRI and EEG studies to uncover the effects of sleep deprivation on brain function. While the motivation is clear, there are some issues with methodology and the writing is difficult to understand in many places.

      Introduction:

      1) L32-33 This sentence is not clear - 'neuroimaging techniques allowing us to overcome the concept of specific control vs. a distributed property'. Can you use a term like 'distinguish' or 'clarify'?

      2) L56-68 It would be better to talk about overall function of neural oscillations (SWA and spindles) during sleep on executive function and memory consolidation (systems consolidation/synaptic downscaling theories), rather than 'increases', as your study does not augment SWA per se. In fact sleep deprivation does augment SWA in the subsequent recovery period as an indicator of sleep pressure/intensity but we wouldn't consider this as beneficial.

      3) L100 - Can you briefly explain here why these tasks were chosen - e.g. if they have been used in prior SD work with other imaging modalities.

      Results:

      1) L-173 - you're not really comparing between two groups... should read conditions

      2) L204-216 - correlation assumes independence of observations, but here you are pooling the T0 and T1 conditions in one plot. This is problematic; moreover, if you split these, some relationships look like they are going in opposite directions (e.g. Fig. 6b). Why not correlate change scores (brain/behavior) with each other?
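
      The change-score analysis suggested in point 2 could look like the following minimal sketch. It correlates within-participant differences (T1 minus T0) in a network metric with differences in a behavioral measure, so that each participant contributes a single observation. The variable names and numbers are hypothetical placeholders, not data from the manuscript.

```python
from math import sqrt


def change_scores(t0, t1):
    """Within-participant differences (T1 minus T0), one value per participant."""
    if len(t0) != len(t1):
        raise ValueError("t0 and t1 must have the same number of participants")
    return [after - before for before, after in zip(t0, t1)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five participants (illustration only).
degree_t0 = [10, 12, 9, 11, 13]        # nodal degree, rested session
degree_t1 = [8, 11, 9, 8, 12]          # nodal degree, after sleep deprivation
rt_t0 = [600, 640, 590, 620, 650]      # median RT in ms, rested session
rt_t1 = [580, 630, 585, 590, 640]      # median RT in ms, after sleep deprivation

delta_degree = change_scores(degree_t0, degree_t1)
delta_rt = change_scores(rt_t0, rt_t1)
r = pearson_r(delta_degree, delta_rt)
print(f"r(change in degree, change in RT) = {r:.3f}")
```

      Because the session order was fixed, a practice effect would still be contained in the behavioral change scores; this sketch addresses only the pooling of dependent observations.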

      Discussion:

      1) L277 - There is a lot of discussion about the loss of integration measures during SD, however, the leaf fraction which is supposed to indicate integration of the networks is not significant between conditions.

      2) L252 - Most of the manuscript is set up for the reader to expect that SD would primarily affect frontal lobes and top-down cognition. However, the findings here are somewhat opposite - occipital regions associated with processing of visual stimuli are the ones that show altered diameter and degree metrics - but the authors claim that bottom up processing does not suffer from the effects of SD (L294). These findings need to be reconciled, and also with prior work.

      3) L293 - even if task engagement were a factor, we would not typically expect that participants would perform better after SD (maintained performance might be possible). This could suggest a practice effect at play here - since the first session was always the well-rested session.

      Methods:

      1) L315 - Can you show in a table descriptives for the actigraphic assessments of sleep the night before the experiment?

      2) L378 - disjointed sentence

      3) L400 - what does 'on the letter a beamforming procedure was performed' mean?

      4) L436 - there appears to be no counterbalancing across conditions here as all participants completed T0 first before T1. This could lead to practice effects confounding some of the interpretations. There is a statement about reduction of learning effects using different parallel forms from the LCT (L330) but it is not clear what this means. Can you show within each session (rested/SD) whether or not you see improvements in performance as the task progressed?

    3. Reviewer #1:

      In this study, 34 participants underwent 24 hours of sleep deprivation. They performed two tasks (letter cancellation and task switching) before and after sleep deprivation. Graph metrics were computed based on resting-MEG data. The authors showed that participants performed worse in the letter cancellation task after sleep deprivation, but performed better in task switching after sleep deprivation. They showed that certain graph metrics were changed after sleep deprivation and some of these metrics were correlated with task performance changes in task switching, but not letter cancellation.

      1) I think it's quite worrisome that participants actually performed better at task switching after sleep deprivation. I wonder if there's a serious flaw in the experimental procedure. One possibility is practice effect since participants performed the task before they were sleep deprived and then performed the task again after sleep deprivation.

      2) While the minimal spanning tree (MST) has been used in some papers, it seems to me that the resulting tree might be sensitive to noise. Besides, such pruning does not seem biologically plausible. I would suggest the authors repeat their analyses using more standard approaches, while taking into account potential pitfalls ( https://www.sciencedirect.com/science/article/pii/S105381191730109X )

      3) False discovery rate was not reported.

      4) The sequence of the experimental procedure is unclear. Perhaps I missed it, but were the tasks performed before or after the MEG/MRI acquisition? I only knew the tasks were not performed during MEG because the authors mentioned in the discussion that "the brain measures are made at rest and not during the execution of the task." It seems important to mention this more prominently in the manuscript.

      5) The title states that "Loss of integration of brain networks after one night of sleep deprivation underlies worsening of attentive functions". However, the authors' results contradict the title, since network measures did not correlate with worse letter cancellation task (LCT) performance, but correlated with better task switching performance! The same issue is present in the abstract, where the authors state that "brain network changes due to SD selectively impaired attention", yet the authors reported that "LCT performance and NASA score were not correlated with topological data".

      6) It's hard to follow the results section without first reading the methods section. This would be fine if the methods section came before the results section. However, in this manuscript, the results section comes before the methods section. Therefore, the authors should provide more methodological overview in the results section. For example, graph theoretic terms like BC and Diameter in Alpha were used in the results section with no explanation.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      This study utilizes MEG to study the effects of sleep deprivation on functional network integration, attention and task-switching. The strength of this study is that this is perhaps the first MEG sleep deprivation dataset and thus, the community would benefit from this data. However, the reviewers felt that there were potentially serious issues with the study design and statistical analyses. More specifically, the improvement in task switching performance after sleep deprivation might simply be due to practice effects. Without counterbalancing T0 and T1, it is unclear how this issue could be resolved. Furthermore, there were concerns about the pooling of T0 and T1 conditions in the correlations with KSS and task performance, as well as issues with multiple comparisons correction.

    1. Reviewer #3:

      Holmgren et al describe a novel model of reversible mechanical damage to zebrafish neuromast hair cells. The authors demonstrate that when zebrafish are exposed to strong currents, neuromast morphology, hair cell number, innervation, and MET function suffer various types and degrees of damage, from which the NMs recover within 2 days. Additionally, they show macrophage recruitment to damaged neuromasts, where they may be phagocytosing synaptic debris. Based on various mechanistic and phenotypic commonalities (involvement of ROS, stereocilia and synapse phenotype), the authors argue that this model is a good approximation of noise-induced hair cell damage in mammals.

      Overall impact:

      This reviewer agrees that a "noise" damage model in the zebrafish would be a powerful tool to better understand the mechanisms underlying noise-induced hearing loss. However, due to various weaknesses of the data (detailed below), the main claims of the paper are not sufficiently supported. In addition, noise-induced hearing loss has been previously modeled in the zebrafish model. The present model, therefore, does not provide a significant methodological innovation.

      Major concerns:

      1) As the authors point out, zebrafish hair cells can be regenerated. With that in mind, and to make the relevance for mammalian hair cell repair clear, a clear distinction between mechanisms mediated by "repair" or "regeneration" needs to be made. The authors discuss that proliferative hair cell generation can be excluded based on the short time period, but suggest that transdifferentiation might be involved. Recovery of NM hair cell number occurs within the same 2 hour period in which NM morphology and hair cell function improved, making it difficult to determine the extent to which "regeneration" contributed to the recovery. The amount of transdifferentiation has to be shown experimentally (lineage tracing?).

      2) The classification of "normal" vs "disrupted" is vague and not quantitative. The examples shown in the paper seem to be quite clear-cut, but this reviewer doubts that was the case throughout all analyzed samples. Formulate clear benchmarks and criteria for the disrupted phenotype (even when blind analysis is performed).

      3) Sustained and periodic exposure: These two exposure protocols not only differ with respect to sustained vs periodic, they also differ in total exposure time (Fig 2B). This complicates the interpretation, especially considering the authors' own finding that a pre-exposure is protective.

      4) The data on mitochondrial ROS do not seem well integrated into the overall story.

      5) It is surprising that the hair bundle morphology was not assessed after recovery. This is crucial. Overall, it would be good to see some quantification of the SEM data, e.g. kinocilia length and number of splayed bundles.

      6) Behavioral recovery (measured as number of "fast start" responses) was also not assessed. This is essential for determining the functional relevance of the recovery.

      7) This reviewer is not yet convinced that this damage model displays enough commonalities to mammalian noise damage to justify the ubiquitous use of the term "noise" throughout the manuscript. It would be more prudent to use a more careful term along the lines of "mechanical overstimulation-induced damage".

      8) Overall, there was a lack of experimental and analysis detail in the results section. For example, how was afferent innervation quantified? Just counting GFP labeled contacts to hair cells? There was also inconsistency in the use of two variations of the mechanical damage protocol, the time points at which repair was assessed, and whether the damage was quantified in all neuromasts or in normal vs. disrupted neuromasts separately, making the data difficult to interpret.

    2. Reviewer #2:

      Holmgren et al. describe the development of a model for hair cell noise damage using the zebrafish lateral line system. Using an electrodynamic shaker, the authors induce quantifiable damage and death of hair cells after a two-hour treatment. They describe gross morphological changes of hair cells, changes in innervation and synapse distribution. In addition they describe disruption of stereocilia and kinocilia, as well as reduced mechanotransduction-dependent uptake of FM1-43 dye. Damage is no longer detectable several hours after insult, demonstrating recovery.

      1) While the findings are carefully measured and described, the effects of insult on hair cells are relatively minor, with a change in hair cell number, extent of innervation or synapses per hair cell (Figs 3 and 4) in the range of 10% reduction compared to control. One potential value of the model would be to use it to discover underlying pathways of damage or screen for potential therapeutics. However with these modest changes it is not clear that there will be enough power to determine effects of potential interventions.

      2) The most dramatic phenotype after shaking is a physical displacement of hair cells, described as disrupted morphology. However, it is not clear what the underlying cause of this change is. Are only posterior neuromasts damaged in this way? Is it a wounding response as animals are exposed to an air interface during shaking? It is also not clear to what extent this displacement reveals more general principles of the effects of noise on hair cells. Additional discussion of underlying causes would be welcome.

      3) Because afferent neurons innervate more than one neuromast and more than one hair cell per neuromast, measurements of innervation of neuromasts (Figure 3) or synapses per hair cell (Fig 4) cannot be assumed to be independent events. That is, changes in a single postsynaptic neuron may be reflected across multiple synapses, hair cells, and even neuromasts. This needs to be accounted for in experimental design for statistical analysis.

      4) The SEM analysis provides compelling snapshots of apical damage, but could be supplemented by quantitative analysis with antibody staining or transgenic lines where kinocilia are labeled. The amount of reduced FM1-43 labeling is one of the more dramatic effects of the shaking insult, suggesting widespread disruption to mechanotransduction that could be related to this apical damage. Further examination of the recovery of mechanotransduction would be interesting.

      5) A previous publication by Uribe et al. 2018 describes a somewhat similar shaking protocol with somewhat different results - more long-lasting changes in hair cell number, presynaptic changes in synapses, etc. It would be worth discussing potential differences across the two studies.
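
      One simple way to respect the non-independence raised in point 3 is to collapse repeated measurements to one value per independent unit before testing. The sketch below averages per-hair-cell values within each animal; the identifiers and numbers are hypothetical, and a mixed-effects model with animal (or afferent neuron) as a random effect would be an alternative that this sketch does not implement.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_unit(measurements):
    """Collapse (unit_id, value) pairs to a single mean value per unit.

    Each unit (here: one animal) then contributes one observation, so a
    downstream test does not treat correlated hair cells as independent.
    """
    grouped = defaultdict(list)
    for unit_id, value in measurements:
        grouped[unit_id].append(value)
    return {unit_id: mean(values) for unit_id, values in grouped.items()}


# Hypothetical synapse counts per hair cell, tagged by animal (illustration only).
synapses_per_hair_cell = [
    ("fish1", 3.0), ("fish1", 2.0), ("fish1", 4.0),
    ("fish2", 3.0), ("fish2", 5.0),
]

per_fish = aggregate_by_unit(synapses_per_hair_cell)
print(per_fish)  # one mean per animal instead of five pooled values
```

      Aggregating in this way lowers the nominal sample size, which is the intended consequence: the effective number of independent observations is the number of animals, not the number of hair cells.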

    3. Reviewer #1:

      In the manuscript titled "Mechanical overstimulation causes acute injury followed by fast recovery in lateral-line neuromasts of larval zebrafish" by Holmgren et al., the authors develop a method to overstimulate hair cells and determine some of the consequences of this overstimulation. The overarching goal of this work is to develop a model for noise-induced hair-cell damage in the zebrafish. The authors use the lateral line for their studies and stimulate hair cells using an electrodynamic shaker, which generates significant aqueous agitation. The authors demonstrate physical damage to hair cells of the lateral line that is dependent on the position of the neuromast. The damage includes alteration of afferent synapses, afferent neurite retraction, limited damage to hair bundles and a decrease in mechanotransduction. After damage, they show macrophage recruitment and quick recovery of hair cell neuromasts, which is surprising.

      The paper is interesting in that it brings a new capacity to the zebrafish animal model: mechanical overstimulation of the hair cell. Tempering this is a general feeling that the authors do not dig deep enough in the current form of the manuscript, but this could be remedied. More specifically, the authors are making a model in zebrafish for noise-induced damage, so they need to show that this model is similar to mammals in the way hair cells are damaged. This is done in the manuscript, but it is limited and should be expanded as suggested below.

      Major comments

      1) The authors use a vertically-oriented Brüel+Kjær LDS Vibrator to deliver a 60 Hz vibratory stimulus to damage lateral line hair cells. It is not made clear why this frequency was selected. Did the authors choose this frequency because they screened a number of frequencies and this is the one that did the most damage to hair cells or was it chosen for another reason? Or, do all frequencies do the same amount of damage? The authors should screen a number of frequencies and choose the stimulus that does the most damage to hair cells. This would set the field in the best direction, should members of the community attempt this new technique. It is not necessary to repeat all of the experiments, but the authors should show which frequencies are best for inducing damage.

      2) The SEM images of the hair bundle are beautiful and do show damage to the hair bundle, but historically speaking older studies in mammals have shown that the actin core of the stereocilia is damaged. It would be critical to know if this was the case. Showing damage to the kinocilium and stereocilia splaying is a start, but readers would need to know if the actin cores are damaged. So, TEM should be used to find damage to the actin cores of stereocilia.

      3) I think the use of "Noise-exposed lateral line" as a term for mechanically overstimulated lateral line hair cells is not correct and could be misleading. The lateral line senses water motion not sound as the word noise would imply. Calling the stimulus "noise" should be removed throughout.

      4) Decreases in mechanotransduction are shown by dye entry. These results should be strengthened using microphonic potentials to determine the extent of damage. This experiment is not necessary but would improve the quality of the document.

      5) In figure 2, PSD labeling is not clear.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Doris K Wu (NIDCD, NIH) served as the Reviewing Editor.

    1. Reviewer #3:

      The paper by Fair (Gilad) and colleagues examined the determinants of gene expression variation within human and chimpanzee populations. Studies focused on an analysis of left ventricle in 39 chimpanzees and 39 human samples. The authors first developed a strategy to measure "dispersion", or gene expression variance after regressing out the effects of mean expression. This metric of dispersion was correlated between human and chimpanzee in most genes, but there were substantial differences between species that could not be explained by changes in mean expression level. Highly dispersed genes were enriched for genes with a higher amino acid divergence, TATA boxes, and cellular composition. In fact, the authors found that changes in cellular composition between samples were highly correlated with expression dispersion, wherein genes that were markers of specific cell populations were highly dispersed. Analysis of eQTLs discovered that genes which are variable based on eQTLs in one species were enriched for eQTLs in the other.

      Overall, there are many good things about this paper. The data will be of broad utility to the comparative genomics community: the authors added RNA-seq data from the left ventricle of 21 chimpanzees and high coverage complete genomes from 39. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. As I have come to expect from these authors, all of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

      The main issue is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.

      Major comments/ suggestions:

      1) Did the authors test directly whether eQTLs were enriched in genes with a high dispersion? I could not find this going back through the paper. This seems almost trivially likely to be true. I may have missed this result? Or did the authors worry this is too likely to be confounded with cell type? Either way, this seems like a result that may be useful to show even if the authors did acknowledge that it was likely to be confounded.

      2) Did the authors consider looking for cell-type QTLs? They state several times in the paper the possibility that genetic factors may influence cell types. They have enough data - at least in humans - to obtain QTLs for specific cell types, as others have done (Marderstein et al. Nat Comms 2020; Donovan et al. Nat Comms 2020). If these cell type QTLs were enriched near genes with a high dispersion, this may bolster the authors' argument that genetic factors underlie dispersion by affecting cell type composition.

      3) The scRNA-seq reference used for estimating cell types in heart tissue was derived from mice. Could this lead the authors to underestimate the degree to which cell types drive dispersion in genes that are variable between human and chimp? Genes that are variable between human/ chimp may also be more likely to be variable between either species and mouse, and perhaps this variability has led to them becoming more/ less of a marker of a specific cell population (and hence their dispersion in primates does not correlate with cell type specificity in mouse).

      4) Have the authors tried estimating dispersion on top of what is expected based on differences in cell type? There are several strategies that might work for this: There are new strategies for estimating a posterior of cell type specific expression from a bulk sample, conditional on scRNA-seq data as prior information (Chu and Danko, bioRxiv, 2020). These cell type specific expression estimates could then be analyzed for dispersion. Alternatively, it may also work to regress the estimated proportion of each cell type out of the dispersion estimates. While there are certainly a lot of pitfalls with using these strategies, especially in the setting shown here (all of this would work better if there were species matched reference data), they might provide an avenue for depleting the contribution of cell type differences from dispersion estimates.

      5) Can the authors add a dotted line to show the shape of the distribution for genes with low dispersion, or where dispersion is shared in both human and chimpanzee, in figure 4b? Is this different from genes that are dispersed in either chimp or human?

      6) Typo, p. 20: "... in only in ..."
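
      One way to read comment 4 above (removing the contribution of cell type from dispersion) is sketched below. This is a minimal illustration on synthetic data, not the authors' method: it assumes log-scale expression, uses residual variance as a simple stand-in for dispersion, and the names `residual_variance`, `expr`, and `props` are hypothetical.

```python
import numpy as np

def residual_variance(expr, props):
    """Per-gene variance left after regressing out cell type proportions.

    expr:  genes x samples matrix of log-scale expression (assumed input).
    props: samples x cell_types matrix of estimated proportions (rows sum to 1).
    """
    # Drop one cell type: proportions sum to 1, so keeping all of them
    # would be collinear with the intercept column.
    design = np.column_stack([np.ones(props.shape[0]), props[:, :-1]])
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef
    dof = design.shape[0] - design.shape[1]
    return (resid ** 2).sum(axis=0) / dof

# Synthetic toy data (placeholders, not values from the study).
rng = np.random.default_rng(0)
n_samples = 39
props = rng.dirichlet([5.0, 3.0, 2.0], size=n_samples)
noise = rng.normal(scale=0.1, size=(2, n_samples))
gene_driven = 4.0 * props[:, 0] + noise[0]  # variance mostly from composition
gene_flat = noise[1]                        # variance unrelated to composition
expr = np.vstack([gene_driven, gene_flat])

raw_var = expr.var(axis=1, ddof=1)
adj_var = residual_variance(expr, props)
print(raw_var, adj_var)
```

      In this toy setting the composition-driven gene loses most of its variance after adjustment while the other gene is nearly unchanged. With real data the estimated proportions carry their own error, which is one of the pitfalls the comment mentions.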

    2. Reviewer #2:

      In this study, Fair et al. focused on assessing inter-individual variability in gene expression, which has been shown to be heritable and associated with disease susceptibility. More specifically, unlike many studies focused on mapping associations between genetic and gene regulatory variation, authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals in humans or any other primate. Using computational deconvolution, they found that cell-type heterogeneity determines expression variability in both species. They also found a significant overlap of orthologous genes associated with eQTLs in both species. They concluded that gene expression variability in humans and chimpanzees often evolves under similar evolutionary pressures. The manuscript, in general, is well prepared. For example, authors put supplementary figures within the main text whenever they are supposed to be, which is convenient. The authors collected data from 39 human vs. 39 chimp primary heart tissue samples. The sources of human samples include 11 (old study)+28 (GTEx) and chimp samples 18 (old)+21 (new). Twenty-one new specimens are generated specifically for this study. This study involves a large number of tests, but the main problem is the lack of a coherent central hypothesis.

      Major comments:

      1) The first test the authors conducted is to identify differentially variable (DV) genes. A total of 2658 DV genes were identified. The problem with this result is that almost equal numbers of up- and down-regulated DV genes are symmetrically distributed around DV=0. Often, this is an indication of a lack of biological signal in the data analysis. This might be due to the pooling of gene groups with diverse functionality. Therefore, this reviewer suggests that the authors break down genes into subgroups to detail the up- and down-regulatory patterns, with the hope that some of the gene groups give interpretable results.

      2) The second test is to correlate the higher coding sequence conservation with lower dispersion. Again, the positive result is not unexpected. There are many indirect and/or confounding factors that may explain the effect. This reviewer, however, understands it is impossible to control them all (also authors have attempted to address some of them in the next few tests). However, here it is better to add exploratory analyses for genes in different functional groups and also give examples of outlier genes that do not follow the rule.

      3) The third test is to examine the correlation between gene expression variability with single-cell type heterogeneity of samples. Authors first used Tabula Muris dataset to show dispersion is strongly correlated with cell-type specificity/diversity. If this is true, then the point that authors really wanted to demonstrate is, in fact, hampered. Authors might really want to show the "true" single-cell variability (see, for example, PMID: 31861624) is correlated with the level of group variance of gene expression.

      4) The fourth test the authors conducted is to show that dn/ds and pn/ps ratios of genes are correlated with gene expression variability (variance). However, because of the heterogeneity of cell-type composition in samples, any correlation observed may be utterly biased by this single uncontrollable confounding factor. Furthermore, heart tissues contain an over-abundant expression of genes encoded in the mitochondrial genome. The expression level of these mt-genes may vary substantially between samples and reflect the health status of primary sample donors. PEER normalization may have to take this into account as a covariate.

      5) Several other tests the authors performed concern eQTLs (eGene overlap and eSNP overlap) between the two species. These are typical tests evolutionary biologists usually try to do whenever data is available. However, the issue with these types of tests is their generally low power. More importantly, in order to be consistent with the previous tests, which all concern the explanation of gene expression variance, this part should address the overlap between expression vQTLs in humans and chimps.

    3. Reviewer #1:

      This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most studies like this one do.

      1) I would like to see more discussion about the inter-relatedness of the chimpanzees in the analysis of gene expression. Is that contributing to the power of the DE analysis, which has really high numbers of DE genes? That may certainly be due to the large sample size, but should be addressed. Related to that, the support that the gene-wise dispersion estimates are well correlated in humans and chimpanzees overall (Fig1C, and S4) seems qualitative. It looks like the chimpanzees might have less dispersion overall?

      2) What do the authors think these findings mean for study systems outside of humans and captive chimpanzees? Both on the technical level (e.g. sample size), and for how their approach could be helpful outside of these species. Generalizing this approach would broaden the impact and audience of the paper.

      3) Just a comment that I appreciated the thoughtfulness of the possible technical confounds in the results and discussion.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This is a solid study, with a large sample size, identifying quantitative trait loci (eQTLs) in humans and chimpanzees, using gene expression data from primary heart samples. The authors complemented the analysis of gene expression with a comparative eQTL mapping, as opposed to relying on mean expression levels, as most comparative studies like this one do. Also unlike many studies focused on mapping associations between genetic and gene regulatory variation, the authors paid attention to the group dispersion/variance of gene expression among samples as well as the evolutionary processes that shape the differences in gene regulation between individuals. The calculation of power for discovering differentially expressed genes as a function of sample size at the beginning of the paper is a thoughtful analysis that is useful to many in the community. All of the analyses are extremely thorough and well-executed. The statistical tests are appropriate and rigorous. Results are interpreted in a conservative fashion.

      The main limitation is that the authors are not able to conclusively disambiguate between different causes of dispersion. Genetics, cell type, and technical variation may all contribute to dispersion. The authors state this very clearly throughout the manuscript. In part, this may reflect the authors' underselling their results somewhat. But in part, this really does reflect reality: Cell type is a major confounder that may provide false signals in other analyses.

    1. Reviewer #2:

      General assessment:

      This work utilizes two Spiroplasma populations as the materials to study the substitution rates of symbiotic bacteria. A major finding is that these symbionts have rates that are ~2-3 orders of magnitude higher than other bacteria with similar ecological niches (i.e., insect symbionts), and these substitution rates are comparable to the highest rates reported for bacteria and the lowest rates reported for RNA viruses. Based on these findings, the authors discussed how this knowledge could be used to infer and to understand symbiont evolution. The biological materials used (i.e., symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years) are valuable, the technical aspects are challenging, and the answers obtained are certainly interesting. The key concern is the limited sampling of other bacteria for comparison to derive the conclusions.

      Major comments:

      1) The key concern regarding sampling involves several points. (a) The two populations represent the species Spiroplasma poulsonii. Is this species a good representative for the genus? Or is it an exception because it is a vertically inherited male-killer? Most of the characterized Spiroplasma species appear to be commensals and are not vertically inherited. (b) The other species with a comparable rate is Mycoplasma gallisepticum (i.e. a chicken pathogen that spreads both horizontally and vertically). Mycoplasma is a polyphyletic genus with three major clades. While closely related to Spiroplasma, their hosts and ecology are quite different. Do all three groups of Mycoplasma have such high rates? If so, are the high rates simply a shared trait of these Mollicutes that has nothing to do with the distinct biology of S. poulsonii? How about other Mollicutes (e.g., Acholeplasma and phytoplasmas)? (c) The group "human pathogens" in Fig. 2 shows rates spreading across four orders of magnitude. This is too vague. How many species are included in this group? Are their rates linked to their phylogenetic affiliations? (d) Did Fig. 2 provide comprehensive sampling of bacteria? How about DNA viruses? Michael Lynch has done extensive work on mutation rates (e.g., DOI: 10.1038/nrg.2016.104), some of which should be integrated and discussed.

      2) This study is based on two lab-maintained populations. How may the results differ from natural populations? I understand that no estimate may be available for natural populations and additional experiments may not be feasible, but at least a more in-depth discussion should be provided.

      3) The authors use adaptation as a key explanation for several of the findings. Stronger support and alternative explanations are needed. For example, why may genome degradation be used as a proxy for host adaptation (line 497)? If this explanation works only for sHy but not the other strain within the same species (i.e., sNeo), is this still a good explanation? Similarly, for the arguments made in lines 524-528, supporting evidence should be presented in the Results. For example, what is the rate distribution of all genes? Do those putative adaptation genes have statistically higher rates and/or signs of positive selection?

      4) The chromosome and plasmids have very different rates (lines 315-316). Since this study aims to compare across different bacteria, perhaps the analysis should be limited to chromosomes for all bacteria.

      5) Formal statistical tests should be performed to test the stated correlations (e.g., lines 360-361, genome size and the number of insertion sequences).

      6) Fig. 5. The differences in CDS length distribution should be investigated and discussed in more detail. The authors stated that they have re-annotated all genomes using the same pipeline, so this finding cannot be attributed to the bioinformatic tools. If these findings are true (rather than annotation artifacts), they are quite interesting. How can they be explained? Why is Sm KC3 so different from all others?

      7) Lines 467-479. That multiple lineages have purged the prophages is an interesting hypothesis and may be important in furthering our understanding of these bacteria. More detailed info (e.g., syntenic regions of prophage sites across different species) should be provided in the Results to support the claim. Perhaps the sampling should be expanded to include the Apis clade (i.e., the clade with the highest number of described species within the genus) to test if the prophage invasion occurred even earlier or independently in multiple lineages. Additionally, CRISPR/Cas systems are known to have variable presence across Spiroplasma species (DOI: 10.3389/fmicb.2019.02701). How does this correspond to prophage distribution/abundance?
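
      As an illustration of the kind of formal test requested in major comment 5 above, the sketch below computes a Spearman rank correlation with a permutation p-value. It is a hedged example only: the inputs are synthetic placeholders rather than genome sizes or insertion sequence counts from the manuscript, the function names are hypothetical, and the simple ranking assumes no tied values.

```python
import numpy as np

def ranks(values):
    """Ranks 0..n-1 of a 1-D array (assumes no ties)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def spearman_permutation_test(x, y, n_perm=2000, seed=0):
    """Spearman rho and a two-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    rx, ry = ranks(x), ranks(y)
    rho = np.corrcoef(rx, ry)[0, 1]
    # Null distribution: shuffle one variable to break any association.
    null = np.array([
        np.corrcoef(rx, rng.permutation(ry))[0, 1] for _ in range(n_perm)
    ])
    p = (1 + np.sum(np.abs(null) >= abs(rho))) / (1 + n_perm)
    return rho, p

# Synthetic placeholder data: a noisy increasing relationship.
rng = np.random.default_rng(1)
x = np.arange(12, dtype=float)
y = 2.0 * x + rng.normal(scale=1.0, size=x.size)
rho, p = spearman_permutation_test(x, y)
print(rho, p)
```

      A permutation-based p-value is used here because the number of genomes compared in such analyses is typically small, where asymptotic approximations can be unreliable.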

      Minor comments:

      1) Lines 32, 517, and possibly other parts: Using "increased" or "decreased" to describe the rate differences is inappropriate because these terms imply inferences of evolutionary events after divergence from the MRCA, which is clearly not the case. It would be more appropriate to use "higher" or "lower" to describe the difference.

      2) Lines 31-32. This is too vague. For the rates, the description should be more explicit (e.g., higher by X orders of magnitude). The term "symbiont" is also vague. Broadly speaking, all human pathogens (included in Fig. 2) or plant-associated bacteria could be considered as symbionts as well. Would be better to define this point more clearly.

      3) Fig. 1. The alignment is off. For example, June should be located near the middle between two tick marks.

      4) Line 207. This is confusing. There should not be 6 circular chromosomes.

      5) Line 211. Why is the hybrid assembly more fragmented?

      6) Methods and Results. More detailed information regarding the sequencing and assembly should be provided. For example, how many raw reads were generated for each library? What are the mapping rates? How much variation is observed in coverage across the genome?

      7) Lines 341-342. How to establish an expected level of synteny conservation?

      8) Line 487. I do not see how this statement could be supported by Fig. 5. Also "less pronounced" is vague.

    2. Reviewer #1:

      The paper has potential. It's not there yet.

      The paper presents a sequencing study describing the evolution of Spiroplasma over various years in lab cultures. Spiroplasma is a fascinating bacterium that induces some unique phenotypes including enhancing insect immunity or "protection" and male-killing. The premise for the study was that sometimes these phenotypes disappear in cultures and thus the bacterium is likely quickly evolving and subject to frequent mutation. The researchers sequence various cultures of Spiroplasma (sHy and sMel), assemble and annotate genomes, compare the genomes, quantify the rates of evolution and compare these rates to some other studies on viruses, human microbiota/pathogens, and Wolbachia. They find that Spiroplasma evolves real fast and speculate that the mechanism for this is a lack of various Mut repair enzymes. They look at fast evolving proteins of interest including RIP toxins which kill nematodes and spaid which is an inducer of male killing. So essentially the big result here is that Spiroplasma evolves real fast.

      In my opinion the paper is weak in a few senses. It doesn't reflect hypothesis driven science. It's mostly observational data and the researchers do not test any hypotheses. Now I don't think this is a deal breaker, but I do think it weakens the paper. Also, my comment should not imply that there isn't valuable data herein; and in fact I think the other big weakness is that the researchers do NOT exploit the true value of the data to derive and test novel hypotheses.

      For example: one aspect I was most excited about was to see how the researchers dissect and annotate evolutionary differences induced by axenic culture systems. The authors have the ability to compare and contrast genomes of Spiroplasma cultured in host insects AND Spiroplasma cultured without insects in axenic culture. Within these genome comparisons are likely novel insights that could shed light on mechanisms of maternal transmission and mechanisms of cell invasion etc... However, I was shocked to see that there is no in-depth analysis of specific proteins that are changing and evolving in these two diverse culture systems. I thought the analysis was entirely insufficient and didn't extract or present the real value of the datasets here. There are some brief mentions in the discussion of adhesin binding proteins, but that was essentially it. I think the researchers focused too much on the past (the RIP toxins and spaid) rather than pointing out new interesting genes and hypotheses about them.

      For example: Maternal transmission would no longer be required in axenic culture, what genes got mutated? This is perhaps the most interesting thing that is not even touched upon.

      So essentially my main criticism is that the added value of this paper, namely the potential ability to compare symbiont genomes in hosts to symbiont genomes in axenic culture, was NOT exploited. Given the novelty and impact of the axenic culture studies by Bruno, I would have hoped to see this upfront.

      Also there are some paragraphs comparing broad genomic differences between sHy and sMel, but I didn't think the differences in how these genomes evolved over time in comparison to their earlier selves was emphasized or explained in enough detail.

      Another example of not exploiting the value of the data: The plasmids are usually where much of the action is in microbes. There should be detailed annotations and figures of the plasmids. Tell me what is on them. Tell me which genes are evolving. Tell me if there are operons. Tell me what pathways are in the plasmids. I found the discussions of plasmid results wholly lacking. I also inherently felt that discussions of plasmids should be kept completely separate from discussions of chromosome evolution, regardless of similar rates of evolution or not... Plasmids are unique selfish entities and I imagine their evolution is wholly distinct from the evolution of chromosomes. They deserve their own sections and figures (in my opinion).

      The figure legends are completely insufficient and they ask me to read other papers to understand them, which is annoying.

      Other minor comments:

      What about presence/absence of recA?

      There are differences in DNA extraction prior to genome sequencing for each of the strains. I suspect this is because different individuals sequenced different genomes. But I worry that different protocols could produce different results and therefore a comparison might be tainted by DNA extraction and library prep specifics. Can you at least explain to the reader why this is not an issue, if it is not an issue?

      Examples:

      181 - why were heads removed? Why was this dna extraction protocol here different from the hemolymph extraction protocol? Might this have changed anything?

      195 - how much heterogeneity do you expect in any given fly. Do you have SNP data differences amongst good reads that could point out different alleles within a Spiroplasma population within an individual fly? It would be interesting to know which genes have a large amount of different alleles.

      199 - another DNA extraction protocol. There isn't consistency here. If the reads and coverage are good enough, it shouldn't be a problem. But if there were data issues or assembly issues, this would raise concern in my mind. Can the researchers discuss or alleviate concerns here? Some assemblies have 6 chromosomes, some have 3 chromosomes. I presume these were different strains of Spiroplasma and not the same one?

      Figure 1: were the samples that are 6 years apart (red) sequenced in exactly the same way with the same technology? Could this produce any relics? Also, why display information for sMel in a table and information for sHy in a figure? Can't you creatively standardize a visual means of showing this information and compile information to one item?

      I wonder what would happen if you took the same sample and did different DNA extraction protocols, different library prep protocols, and different illumina rounds of sequencing and independent algorithm assemblies... how much would they come out the same? Has anyone ever done this experiment? Is there any reference for this control that shows they would in fact come out the same? This is essentially what I am worried about here. This could be a minor issue, if the researchers could just confidently explain why this is NOT an issue.

      Line 30 - you introduce sHy and sMel without defining what they are yet? Clarify immediately that they are both S. poulsonii.

      line 247 - They found fragmented genes with orthofinder, if it was less than 60% length homology... why set an arbitrary cutoff of 60? Anything less than 100 is possibly a pseudogenization if the last amino acid is important, or the C-terminus is important, which it often is... What is the rationale here?

      To quantify an evolutionary rate, I read that they counted the number of changes in 3rd codon wobble positions/year. Why just wobble codons... why not all SNPs period? But then in the figure 2, it seemed like they are tallying a percentage of a total 100% = 570 "variants" or changes in the sequences (I wouldn't use the word variants, as this makes me think of strains; better to say "changes", no?). These changes include SNPs, insertions, deletions, and "complex"... no idea what complex is? The figure legends are completely insufficient. And I still don't know if you are tallying in some kind of number of recombinations and pseudogenizations into the mix (I assume these are included in the frame-shifts)? The quantification is murky to me.

      The adhesin proteins are evolving fast. But aren't Spiroplasma commonly intracellular... so why would it be binding an extracellular protein? ... can you discuss this? I presume invasion or something?

      There might be a correlation with genome size and speed of evolution. You mention this in the discussion, but briefly. Can you elaborate on this, especially because Spiroplasmas are close to mycoplasmas which are REALLY small genomes.

      Figure 3 is really confusing. I assume FS is frameshift, is IF induced fragmentation? After about 10 minutes I could decode it. Is this really the best way to think about these results? Perhaps? But perhaps not? ARP? I think it's adhesin stuff, but you don't say this until later.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Vaughn S Cooper (University of Pittsburgh) served as the Reviewing Editor.

      Summary:

      This work uses Spiroplasma to study the substitution rates of symbiotic bacteria, which are ~2-3 orders of magnitude higher than those of other insect symbionts, and approach rates reported for viruses. The use of symbionts maintained in fly hosts for 10 years and cultivated outside of the host for > 2 years is valuable, and the study is interesting. The key concern is the limited sampling of other bacteria as comparative taxa to derive the conclusions. This makes the report somewhat premature. Further analyses of existing data are also required. Equally important, the study needs to be better placed in the context of what's known about mutation rates varying as a function of effective population size, to better locate this study in the broader literature on the evolution of mutation rates.

    1. Reviewer #3:

      Bola and colleagues set out to test the hypothesis that vOT domain specific organization is due to the evolutionary pressure to couple visual representations and downstream computations (e.g., action programs). A prediction of such theory is that cross-modal activations (e.g., response in FFA to face-related sounds) should be detected as a function of the transparency of such coupling (e.g., sounds associated with facial expression > speech).

      To this end, the Authors compared brain activity of 20 congenitally blind and 22 sighted subjects undergoing fMRI while performing a semantic judgment task (i.e., is it produced by a human?) on sounds belonging to 5 different categories (emotional and non-emotional facial expressions, speech, object sounds and animal sounds). The results indicate preferential response to sounds associated with facial expressions (vs. speech or animal/objects sounds) in the fusiform gyrus of blind individuals regardless of the emotional content.

      The issue tackled is relevant and timely for the field, and the method chosen (i.e., clinical model + univariate and multivariate fMRI analyses) well suited to address it. The analyses performed are overall sound and the paper clear and exhaustive.

      1) While I overall understand why the Authors would choose a broader ROI for multivariate (vs. univariate) analyses, I believe it would be appropriate to show both analyses on both ROIs. In particular, the fact that the ROI used for the univariate analyses is right-hemisphere only, while the multivariate one is bilateral should be (at least) discussed.

      2) The significance of the multivariate results is established testing the cross-validated classification accuracy against chance-level with t-tests. Did these tests consider the hypothetical chance level based on class number? A permutation scheme assessing the null distribution would be advisable. In general, more details should be provided with respect to the multivariate analyses performed, for instance the confusion matrix in Figure 5B is never mentioned in the text.

      3) I wonder whether a representational similarity approach could be useful in better delineating similarity/differences in blind vs. sighted participants sounds representations in vOT. Such analysis could also help further exploring potential graded effects: i.e., sounds associated with facial expression (face related, with salient link to movement) > speech (face related, with less salient link with movement) > animals sounds (non-human face related) > object sounds (not face related at all). The above-mentioned confusion matrix could be the starting point of such investigation.
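
      The permutation scheme suggested in comment 2 above can be sketched as follows. This is a minimal illustration on synthetic data and not the authors' pipeline: the leave-one-out nearest-centroid classifier, the five-class toy patterns, and all function and variable names are assumptions made for the example.

```python
import numpy as np

def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    classes = np.unique(y)
    correct = 0
    for i in range(len(y)):
        train = np.ones(len(y), dtype=bool)
        train[i] = False
        centroids = np.array(
            [X[train & (y == c)].mean(axis=0) for c in classes]
        )
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += int(pred == y[i])
    return correct / len(y)

def permutation_test(X, y, n_perm=200, seed=0):
    """Compare observed accuracy with accuracies under shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = loo_nearest_centroid_accuracy(X, y)
    null = np.array([
        loo_nearest_centroid_accuracy(X, rng.permutation(y))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Synthetic placeholder data: 5 sound categories, 8 patterns each, 10 voxels.
rng = np.random.default_rng(42)
y = np.repeat(np.arange(5), 8)
X = 3.0 * y[:, None] + rng.normal(size=(y.size, 10))
observed, null, p_value = permutation_test(X, y)
print(observed, null.mean(), p_value)
```

      The empirical null distribution replaces the theoretical chance level (1/5 for five classes), which matters because cross-validated accuracies in small samples need not be centred exactly on that value. In a real analysis the labels would typically be permuted within subject or within run to respect the design.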

    2. Reviewer #2:

      The study by Bola and colleagues tested the specific hypothesis that visual shape representations can be reliably activated through different sensory modalities only when they systematically map onto action system computations. To this aim, the authors scanned a group of congenitally blind individuals and a group of sighted controls while subjects listened to multiple sound categories.

      While I find the study of general interest, I think that there are major methodological limitations, which prevent the data from supporting the general claim.

      Main concerns

      1) Auditory stimuli have been equalized to have the same RMS (-20 dB). In my opinion, this is not a sufficient control. As shown in Figure 3 - figure supplement 1, the different sound categories elicited extremely different patterns of response in A1. This is clearly linked to intrinsic sound properties. In my opinion without a precise characterization of sound properties across categories, it is not possible to conclude that the observed effects in face responsive regions (incidentally, as assessed using an atlas and not a localizer) are explained by the different category types. On the stimulus side, authors should at least provide (a) spectrograms and (b) envelope dynamics; in case sound properties would differ across categories all results might have a confound associated to stimuli selection.

      2) More on the same point: the authors use the activation of A1 as a further validation of the results in face selective areas. Page 16 line 304 "We observed activation pattern that was the same for the blind and the sighted subjects, and markedly different from the pattern that was observed in the fusiform gyrus in the blind group (see Fig. 1D). This suggests that the effects detected in this region in the blind subjects were not driven by the differences in acoustic characteristics of sounds, as such characteristics are likely to be captured by activation patterns of the primary auditory cortex." It is the opinion of this reader that this control, despite being important, does not support the claim. A1 is certainly a good region to show how basic sound properties are mapped. However, the same type of analysis should be performed in higher auditory areas, such as the STS. If the result patterns there were similar to those in the FFA region, I guess that the current interpretation of the results would not hold.

      3) Linked to the previous point. Given that the authors implemented an MVPA pipeline at the ROI level, it is important to perform the same analysis in both groups, but especially in the blind, in areas such as STS as well as in a control region, engaged by the task (with signal) to check the specificity of the FFA activation.

      4) I find the manuscript rather biased with regard to the literature. This is a topic which has been extensively investigated in the past. For instance, the manuscript does not include relevant references for the present context, such as:

      Plaza, P., Renier, L., De Volder, A., & Rauschecker, J. (2015). Seeing faces with your ears activates the left fusiform face area, especially when you're blind. Journal of vision, 15(12), 197-197.

      Kitada, R., Okamoto, Y., Sasaki, A. T., Kochiyama, T., Miyahara, M., Lederman, S. J., & Sadato, N. (2013). Early visual experience and the recognition of basic facial expressions: involvement of the middle temporal and inferior frontal gyri during haptic identification by the early blind. Frontiers in human neuroscience, 7, 7.

      Pietrini, P., Furey, M. L., Ricciardi, E., Gobbini, M. I., Wu, W. H. C., Cohen, L., ... & Haxby, J. V. (2004). Beyond sensory images: Object-based representation in the human ventral pathway. Proceedings of the National Academy of Sciences, 101(15), 5658-5663.

    3. Reviewer #1:

      Bola and colleagues asked whether the coupling in perception-action systems may be reflected in early representations of the face. The authors used fMRI to assess the responses of the human occipital temporal cortex (FFA in particular) to the presentation of emotional (laughing/crying), non-emotional (yawning/sneezing), speech (Chinese), object and animal sounds of congenitally blind and sighted participants. The authors present a detailed set of independent and direct univariate and multivariate contrasts, which highlight a striking difference of engagement to facial expressions in the OTC of the congenitally blind compared to the sighted participants. The specificity of facial expression sounds in OTC for the congenitally blind is well captured in the final MVPA analysis presented in Fig.5.

      -The use of "transparency of mapping" is rather metaphorical and hand-wavy for a non-expert audience. If the issue relates to the notion of compatibility of representational formats, then it should be expressed formally.

      -The theoretical stance of the authors does not clearly predict why blind individuals should show more precise emotional expressions in FFA as compared to sighted - as the authors start addressing in their Discussion. In the context of the action-perception loop, it is even more surprising considering that the sighted have direct training and visual access to the facial gestures of interlocutors, which they can internalize. Can the authors entertain alternative scenarios such as the need to rely on mental imagery for congenitally blind for instance?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      While the work addresses an interesting research question, several shortcomings have been raised by three independent reviewers. A first issue is the lack of theoretical clarity and linkage with prior work, as discussed by Reviewer 1 and Reviewer 2. A second critical set of concerns is raised by all reviewers with the need for several additional analyses to nail down the interpretations proposed by the authors. Reviewer 2 specifically raised concerns regarding the interpretability of activation in auditory cortices, while Reviewer 3 provides insights on the MVPA analysis and suggests the possible use of RSA to clarify the main findings.

    1. Reviewer #3:

      Many of the genes whose expression is induced by the integrated stress response (ISR) encode aminoacyl tRNA synthetases. Why is expression of so many synthetases enhanced in the ISR and what is the functional significance of this induction are important unresolved questions. This manuscript focuses on the tyrosyl tRNA synthetase, which is induced by the ISR in response to different stress conditions. The study suggests that induced expression of TyrRS in response to oxidative stress leads to nuclear localization of the enzyme where it then binds to DNA targets and recruits key transcription factors that control selected gene expression that ultimately controls protein synthesis levels late in the ISR. The TyrRS dampening of translation late in the ISR apparently occurs independent of the levels of eIF2 phosphorylation.

      These ideas are a potentially interesting mechanistic feature of the ISR that builds on prior reports from this lab. However, there are major reviewer concerns about the manuscript. The manuscript uses different HEK cell models that do not appear to be comparable in key ways. Hence one cannot readily integrate the results between the different models and there are important gaps in each. Additionally, key controls and assays are missing from each of the studied models. Because of these major concerns, the stated conclusions are not sufficiently supported by the experimental results. A portion of these concerns are highlighted below. These concerns diminished enthusiasm for the manuscript.

      Reviewer concerns:

      1) Figure 1: A major concern with the manuscript is that key controls and measurements are missing in experiments. The manuscript implies that prior publications have some of these measurements but this is problematic in many ways. Figure 1A should also measure TyrRS levels and compare these to endogenous TyrRS induced by oxidative stress. Determine the timing and duration of the anticipated induction of TyrRS expression for endogenous translation. Are the levels comparable with the rescued expression system (shown in this study) and is there induced expression of the engineered TyrRS by stress? If not, is this problematic with the proposed ISR induction model? Does this proposed translation dampening (Fig. 1B) involve continued reduction of translation initiation or elongation? Does the TyrRS +/- nuclear localization reduce global translation in the absence of eIF2 phosphorylation function?

      The H2O2 treatment involves an initial insult and presumably the H2O2 is quickly dissipated. Therefore, one is likely not measuring the length of H2O2 exposure but rather the time after a short duration of stress. Other stress treatment regimens, including those involving oxidative damage, can be continuous. In Fig. 1C and other figures, measure the synthetases, especially TyrRS, to show the level of overexpression.

      2) Figure 2 and supplement: The ChIP analyses appear to feature overexpression of TyrRS (tagged versions different than those used in Fig. 1?). Are immunoblot measurements of the versions of TyrRS in Fig 1A applicable to those in Fig 2? A key feature of this pathway is that TyrRS expression late in the ISR directs the nuclear localization of the enzyme. Test this model with versions of TyrRS whose expression levels and regulation are appropriate in the ISR. Do the mRNA measurements in Fig. 2B involve +/- oxidative stress? This is critical to the proposed model.

      3) Figure 3: Explain more clearly the mini-TyrRS and its utility. This point is also germane to earlier figures.

      4) Figure 4: Be clear about the expression levels of the tagged TyrRS for the MS studies. Be sure to provide statistical information and support documentation in the methods and supplemental tables. Would be helpful to include the nuclear exclusion mutant with the co-IP. The analysis of the E196K mutant of TyrRS needs fuller development (e.g. with the stress condition) and clarity.

      5) Figure 5: Regarding biological implications and cell survival, one finds it difficult to separate altered TyrRS charging of tRNA(Tyr) in this equation. Show that the different mutants and arrangements do not alter aminoacylation of tRNA(Tyr).

    2. Reviewer #2:

      This paper presents a very compelling story: TyrRS has an important moonlighting function in the nucleus involving regulated gene expression via the recruitment of transcriptional co-regulators that is subordinate to TyrRS' ability to sense changes in the cellular environment. If proven correct this notion stands to influence our thinking about cellular stress responses. Therefore, the task of the reviewers is simply to critically evaluate the evidence; the significance of the claims is not in question.

      According to the authors, by a mysterious process that is not expanded on here, under oxidative stress conditions (200 µM H2O2 treatment of HEK293 cells for extended periods) a small fraction of TyrRS finds its way to the nucleus, where it selectively represses genes involved in the ability of cells to synthesize new proteins. The consequence of this selective transcriptional repression is a sustained oxidative stress-induced repression of protein synthesis that is entirely dependent on this nuclear translocation event.

      The formative experiment supporting this chain of events is a comparison of cells in which the endogenous TyrRS has been inactivated by RNAi and rescued in trans, either by a wildtype TyrRS (i.e. one subject to this regulated nuclear translocation event) or a TyrRS bearing mutations in its nuclear localization signal (242KKKLKK247 to NNKLNK). Figure 1A shows that rescue with the NLS mutant TyrRS leads to superbasal (> complete) recovery of protein synthesis, whereas rescue with the wildtype TyrRS is associated with sustained stress-dependent decrease in protein synthesis.

      This foundational experiment is not described in any detail, nor are its key tenets confirmed experimentally; instead the reader is referred to two previous papers (Fu 2012, describing the NLS mutations, and Wei 2014, describing the implementation of this allele swap). Neither the extent of the inactivation of the wildtype allele nor the extent of the rescue are presented. Nor, for that matter, is there evidence that in the cells tested in Figure 1A the NLS mutation indeed abolishes the stress-dependent nuclear import of TyrRS. The WT-rescued cells are not even compared to the parental cells. These weaknesses are compounded by the inherent unreliability of any comparison of two clades of cells; as near as one can tell, the authors have compared here two preparations of cells to which they attribute diverse properties.

      Given how much is hanging off the phenotypic comparison of the WT and NLS mut TyrRS, it seems reasonable to impose a much higher standard on the experimental system. In 2020, this amounts to an allele replacement of the endogenous TyrRS with a silently-marked wildtype and NLS (and other) mutant coding sequences. Given the essentiality of TyrRS this should be a simple matter, using CRISPR/Cas9 to target the endogenous locus and offering a repair template to bring in the new alleles. Once implemented this method will produce numerous independent stable clones with the desired genotypes that can then serve in a comprehensive phenotypic analysis that traverses the problem of random clonal variation and phenotypic drift in clades of puro-resistant cells (that plagues the interpretation of the experiments shown here). It is uncertain if the above would be enough. The NLS of TyrRS is also involved in tRNA binding and potentially in other aspects of the charging reaction. Thus, mutations in that sequence, rather than purely interfering with the putative nuclear functions of TyrRS, may also compromise the protein's more conventional function, with important and unanticipated phenotypic consequences. Fu et al. 2012 have made an effort to address this issue by comparing the affinity of WT TyrRS and the NLS mutants for tyr-tRNA (Table 1 therein) and by measuring tRNA acylation (Figure 2B, therein). The upshot of these measurements is that mutations in NLS severely compromise tRNA binding and acylation and even the weakest mutation, used here, has a measurable defect. These findings call into question the sweeping conclusions regarding the functionality of the NLS mutation. Therefore, to convince the sceptic the authors need to provide parallel evidence that selectively compromising nuclear transport of TyrRS is at the heart of the phenotypes observed.

      In this vein it is notable that whereas in Wei 2014, study of the phenotypic consequences of the NLS mutation (on the cells' response to DNA damage) was buttressed by manipulation of angiogenin, an agent putatively implicated in the signal that sends TyrRS to the nucleus in stressed cells, no such attempt is made here; is angiogenin no longer believed to play a role? If not, it is incumbent on the authors to discover such trans-acting factors, and study the effect of their manipulation on the phenotype. This may be challenging, but the important claims for discovery made here must be matched by equally convincing experiments.

      And then there is the surprising fact that in Wei 2014 and here the same cells exposed to the same stress seem to have very different consequences to gene expression programmes - where was the nuclear TyrRS-induced downregulation of 'translation' genes in 2014? Were none included in the 718 genes on the SmartChip Real-Time PCR System (WaferGen Biosystems)? Furthermore in 2014 Wei et al were concerned about the confounding effects of the different TyrRS alleles on protein synthesis, as the basis for the effects on DNA damage response (in their words: 'Considering that a simple knockdown of TyrRS may affect global transcription through a general effect on translation...'), yet dismissed this concern only to return now with a new version of reality whereby translational effects are all important. These issues need to be discussed and accounted for.

      In summary, this is a paper presenting a very interesting but inadequately supported idea.

    3. Reviewer #1:

      Previous work has shown that the nuclear import of TyrRS is stimulated under stress and that nucleus-localized TyrRS functions through the transcriptional machinery to promote the expression of DNA damage response genes for cell protection. In this work, evidence is presented that nuclear TyrRS also inhibits bulk translation in a manner correlated with its association with several AARS-encoding genes and that for elongation factor eEF1A, and recruitment to these genes of HDACs. Mutation of the TyrRS NLS, whose function in nuclear localization provides for coupling between low tRNATyr binding and nuclear localization, was found to derepress bulk translation after prolonged oxidative stress by H2O2, without altering eIF2 phosphorylation levels or mTOR activation, and overexpression (o/e) of TyrRS can reduce protein synthesis, in a manner enhanced by the E196K mutation associated with Charcot-Marie-Tooth disease (CMT), shown previously to enhance TyrRS association with transcriptional co-repressors. ChIP-Seq of overexpressed V5-tagged TyrRS showed binding to only 17 sites, of which 15 are within gene coding sequences, among which four encode TyrRS, TrpRS, SerRS and GlyRS, and a fifth encodes elongation factor eEF1A. These results were confirmed by ChIP analysis of endogenous TyrRS, using the HisRS gene as negative control; and the occupancies were shown to increase on H2O2 treatment. The expression of these AARS/eEF1A gene transcripts was shown to be reduced by o/e of TyrRS, in a manner enhanced for at least some of them by the E196K CMT mutation; and the repression was shown to be eliminated by the NLS_mut for YARS expressed at native levels. Reductions in AARS/eEF1A protein expression were also observed on WT TyrRS o/e. 
      Sequence analysis of the genes showing TyrRS binding by ChIP-seq led to identification of a motif that was shown to be required for binding to TyrRS in vitro in EMSA assays with either purified TyrRS or in extracts from cells overexpressing it, in a manner requiring the full-length TyrRS and not only the catalytic core of the enzyme. It was not shown however that eliminating this motif from any of the target genes attenuated their repression by nuclear-localized TyrRS. Mass spec analysis of affinity-purified, overexpressed TyrRS identified interacting proteins, several of which were shown to be coimmunoprecipitated with endogenous TyrRS in non-stressed cells, including the transcription cofactors TRIM28, HDAC1, and subunits of the NURD co-repressor/histone deacetylase complex. ChIP assays showed that overexpression of TyrRS led to decreased levels of H3K27Ac, a histone mark of active transcription, and elevated occupancies of HDAC1, TRIM28, or NURD subunit CHD4 in non-stressed cells at the AARS/eEF1A genes, with either TRIM28/HDAC1 or CHD4 being observed for all of the genes except the TyrRS gene that shows all three cofactors present. Based on these results, the authors conclude that increased nuclear localization of TyrRS on oxidative stress leads to increased binding of TyrRS to the AARS/eEF1A genes with attendant direct recruitment of either TRIM28/HDAC1 or NURD, leading to transcriptional repression of these genes, which is responsible for the reduction in bulk protein synthesis observed after prolonged H2O2 treatment. They go on to provide evidence that cell survival in H2O2 is enhanced by nuclear association of TyrRS (dependent on the NLS), and that in its absence, conferred by the NLS_mut, apoptosis is increased. They also show that ROS increases by preventing TyrRS nuclear localization by the NLS_mut, and that this effect as well as decreased cell survival for this mutant in H2O2 can be rescued by the translation elongation inhibitor harringtonine.

      The results presented in this report provide some support for the main conclusions of the paper and the overall model presented in Fig. 4F. However, as detailed below, many of the main conclusions of the paper are based on correlations and lack direct experimental support, and a number of the experiments are not comprehensive enough with sufficient conditions and controls to establish that the effects observed can be attributed to enhanced nuclear localization of TyrRS in response to H2O2. Considering the statements in the abstract, the evidence is reasonably strong that nuclear localization of TyrRS leads to inhibition of global translation at a stage later than that of eIF2α/ATF4 and mTOR responses, and that excluding TyrRS from the nucleus increases apoptosis under prolonged oxidative stress (although even this last point requires better documentation). However, the evidence is inadequate in several respects to claim that TyrRS directly represses the transcription of translation-related genes by recruiting TRIM28 or NURD complex, and as claimed on p. 13 of the Discussion, that the repression of the four AARS genes and the gene for eEF1A accounts for the reduction in bulk protein synthesis on H2O2 treatment.

      Major issues:

      -Evidence is lacking that the binding of TyrRS to the AARS/eEF1A genes is functionally important for the repression of any of the 6 putative target genes upon increased nuclear localization of TyrRS conferred by the NLS_mut or in response to H2O2. This would require ChIP analysis of TyrRS binding to the target genes for WT vs. NLS_mut TyrRS in H2O2-treated cells; and CRISPR mutagenesis of the putative TyrRS binding site in the genome and analysis of transcription in the presence and absence of H2O2 for at least one of the putative TyrRS target genes.

      -Evidence from ChIP analysis is lacking that TRIM28, HDAC1, or the NURD complex are recruited to the AARS/eEF1A genes at native levels of TyrRS in a manner dependent on the NLS and stimulated by H2O2, as the ChIP experiments involved only overexpressed WT TyrRS in non-stressed cells. It is also unclear whether H3K27Ac levels at the putative target genes decline at endogenous levels of TyrRS on treatment with H2O2. Similarly, evidence is lacking that the physical association of TyrRS with these co-repressors is dependent on the NLS and stimulated by H2O2, as the co-IP analysis was limited to endogenous WT TyrRS in non-stressed cells.

      -Evidence is lacking that the cofactors TRIM28, HDAC1, or CHD4 are required for the down-regulation of target gene transcription on H2O2 treatment, which would require knock-down or elimination of these factors by CRISPR accompanied by analysis of target gene transcription +/- H2O2.

      -Direct evidence is lacking from ChIP analysis of RNA Pol II that the transcription of the AARS/eEF1A genes is reduced on H2O2.

      -Evidence is lacking that the repression of bulk protein synthesis is actually mediated by the reduced expression of the 4 AARSs and eEF1A. The fact that the TyrRS-E196K mutation enhances repression of bulk translation and also repression of 3 of the 5 target genes does support the idea that the repression of the target genes is instrumental in reducing protein synthesis, but again, this is still a correlation. There is no evidence that the reduced expression of the AARSs is sufficient to reduce charging of the cognate tRNAs, or that the reduced expression of eEF1A decreases the rate of translation elongation in cells or cell extracts.

      -Important information needed to evaluate the quality and significance of the ChIP-seq analysis of TyrRS binding to DNA is lacking. No details are provided concerning the ChIP-seq analysis of V5-tagged TyrRS to indicate how the TyrRS occupancy peaks were identified and distinguished above background signal from the cells expressing V5 tag alone, whether replicates were examined to provide statistical significance for the identified occupancy peaks, and the sequencing library depths. No genome browser views were provided to show the signals from the cells expressing V5-TyrRS vs V5 alone to demonstrate the quality and reproducibility of data from replicates. The supplementary table S1 describing these data was even omitted from the submission, and it's unclear whether these data are being deposited in GEO.

      -Important information needed to evaluate the quality and significance of the mass-spec analysis of TyrRS interacting proteins is lacking. No details are provided about the statistical significance of the protein interactions identified by mass-spec analysis of the affinity-purified TyrRS; and a negative control for non-specific association seems not to have been included in the analysis. The supplementary table describing these data was even omitted from the submission.

      -It's unclear whether the motif described in Fig. 3A was found under the peaks of TyrRS occupancy in the various genes showing TyrRS binding in the ChIP-seq experiments, nor whether its occurrence is statistically significant. It was not indicated that the motif coincides with the peak ChIP-seq occupancies for TyrRS, and if not, how this could be explained.

      -Evidence is lacking that harringtonine treatment reduced bulk protein synthesis under the conditions where it suppressed the effects of the TyrRS NLS mutation in elevating ROS and decreasing cell survival.

      -In general, the figure legends are poorly written in lacking important details about the nature of the TyrRS being examined in the experiment (tagged vs endogenous; overexpressed vs. native levels), and also whether oxidative stress was imposed in the experiment, and if so, the exact conditions for the treatment. Figure legends should contain all of the critical details needed to understand and evaluate the significance of the experimental results without having to search elsewhere in the paper for them.

      -It needs to be clarified whether the mini-TyrRS construct lacks the NLS, and the significance of its behavior as a negative control for the effects of overexpressing WT TyrRS.

      -For the experiment in Fig. 5B, quantification of the fraction of caspase-3 or PARP cleaved from biological replicates is required.

      -The experiment in Supp. Fig. S4 lacks the results from cells untreated with H2O2 to ensure that these proteins were being induced by H2O2 in their hands.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Alan G Hinnebusch (Eunice Kennedy Shriver National Institute of Child Health and Human Development) served as the Reviewing Editor.

    1. Reviewer #2:

      In this manuscript, the authors describe an interaction of EGFR and Gal7 and that Gal7 binding downregulates EGFR activity. They show that Gal7 null mice exhibit thickening of the epidermis. In the absence of Gal7, EGFR is more active, which is supported by increased EGFR phosphorylation and phosphorylation of downstream molecules. Although a related protein, Gal3, has been shown to upregulate EGFR activity that may be functionally relevant in colorectal cancer, the authors' description of EGFR-Gal7 interaction is new. However, a number of claims made are not supported by the data presented. For example, in the abstract, the authors state that Gal7 is a direct binder of E-cadherin but it is not demonstrated experimentally.

      Additional comments:

      1) In Figure 3A graphs, authors show that both baseline (Fig. 3A) and ligand-induced (Fig. 3B) EGFR phosphorylation is higher in Gal7 knockdown cells. This reviewer is left to assume that Figure 3A graphs are derived from WB data from Figure 3B and in those WBs the increase in pEGFR, pERK, pAKT levels after Gal7 in absence of EGFR are not convincing. Also, Fig. 3B has two panels and they are not clearly explained in the figure legend.

      2) Figure 4A, lower panels would be more convincing if HaCaT and shGal7 were run on same gel, just like upper panels.

      3) Figure 4B, on top of WB panels, labels are not aligned properly and the reviewer is left to assume that the loading conditions are 0, 0.5, 1, 2, 4, 8, and 24 h, first for HaCaT, followed by same time points for shGal7. Also, the results from time course in Figure 4A and 4B are not consistent; total EGFR levels are downregulated as early as 2 min in Fig. 4A, whereas loss of EGFR is more gradual (over hours) in Figure 4B.

      4) In Figure 4B legend, cycloheximide treatment is mentioned but in the figure it is not indicated which samples are treated with cycloheximide.

      5) In Figure 7A, the +EGF+rGal7 condition should be included for shGal7 cells.

      6) Figure 7F experiment needs to be on the same blot. Also, independent binding of Gal7 with E-cadherin is not shown in Fig. 7F or a similar experiment. This might indicate that both EGFR and Gal7 cooperate to stabilize interaction with E-cadherin as E-cadherin is unable to bind to either individually.

      7) Figure 7 is referred to as Figure 8 in the text.

      8) The manuscript is not well-written and needs to be thoroughly edited. For example, page 8, last line: “Colocalization assays of Gal7 and LAMP-1 gave no results”.

    2. Reviewer #1:

      In this paper the authors provide evidence that Galectin-7 binds the extracellular domain of EGFR regulating its signaling.

      Although the in vitro study is for the most part nicely done, the major problem of this paper is the overall novelty. To this end several publications clearly show that 1) members of the galectin family (e.g. 3) regulate EGF receptor signaling; 2) galectins (e.g. 8) regulate the early trafficking of EGFR; 3) galectins (e.g. 3) bind and regulate RTKs, including EGFR; 4) galectin-7, the topic of this paper, regulates E-cadherin expression and dynamics. Thus it is felt that the fact that galectin-7 binds to and regulates EGFR signaling is not sufficiently novel.

      In addition, it is felt that some experiments are not sufficiently quantified (e.g. intracellular signaling) and some data are of a descriptive nature (e.g. the characterization of the gal-7 null mice and the in vivo evidence that gal-7 interacts with EGFR are somewhat superficial).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      As you can see from the reviews included, the reviewers have identified major shortcomings with this study that overall dampen the enthusiasm for the results reported. One of the major pitfalls identified is the overall novelty of the paper. As you can see from the detailed comments by the reviewers, other Gal family members have been shown to regulate EGF activation and trafficking, and to bind RTKs. Thus the identification of Gal 7 as a novel regulator of EGF receptors does not provide a clear advance. In addition, the claim that Gal7 is a direct binder of E-cadherin is not demonstrated experimentally. Some experiments should be shown on the same blots, and it is felt that they lack solid quantification and in some cases are of a descriptive nature. Finally, it is felt that the manuscript is not well written and editing is recommended.

    1. Reviewer #2:

      CUT&RUN, a recently developed method, is a convenient alternative to ChIP-seq. Because it generates a footprint of DNA protected from MNase digestion, it can potentially also provide more nuanced information than ChIP-seq. In this paper, CUT&RUN is applied to the mapping of RNA polymerase II (Pol II) binding sites in the genome of a human lung carcinoma cell line. A technical innovation in the current paper is that the authors bypass the attachment of cells to concanavalin A-magnetic beads for all steps from cell permeabilization on, and exploit the fact that the cells they use naturally adhere sufficiently well to the bottoms of multi-well plates that these steps can all be performed on the cell culture plates themselves.

      In the original CUT&RUN paper, it was already pointed out that different size classes of protected fragments might reveal different aspects of the biology of DNA bound factors. The authors of the current work extend this observation, and report two size classes of fragments that are produced by CUT&RUN applied to RNA polymerase II. They interpret the shorter fragments as marking Pol II sitting in a poised, compact state directly at the transcription start site (TSS), and the longer fragments downstream of the TSS as reflecting a less compact or larger, stalled Pol II complex after transcription has been initiated. This is consistent with what we know about regulation of nascent RNA elongation by Pol II shortly after transcription initiation, a phenomenon that has been known for individual genes since the 1980s, and that has first been documented genome-wide well over a decade ago.

      In addition, the authors suggest that a substantial fraction of Pol II is also found in a paused/stalled/poised state upstream of the TSS. Unfortunately, it is unclear what the upstream signal reflects. E.g., is this pausing because of bi-directional transcription, or because of a separate pre-initiation complex or conformation? Without such insight, the observation does not add to our understanding of transcription initiation and elongation.

      In aggregate, the authors present a simplification over conventional CUT&RUN for cell cultures, and they provide additional details for Pol II positioning near TSSs. While the work is technically well done, the technical improvements are relatively minor, and there are no principally new biological insights.

    2. Reviewer #1:

      The technical advance, which involves CUT&RUN on plates and doing paired end reads is modest. The main result of interest is the detection of a minor Pol II ChIP peak that maps around the transcriptional start site (TSS) as opposed to the major peak that corresponds to paused Pol II downstream from the TSS. The existence of the Pol II peak near the TSS is hardly surprising on first principles, and it is unknown what this peak corresponds to in terms of mechanism. The authors refer to this as "pre-initiation" and "poised", but there is no evidence for this. It is entirely possible (in my opinion more likely) that this peak corresponds to abortive initiation, a well-known step in the transcription cycle where Pol II makes short abortive transcripts that only occasionally get extended to longer products. It wasn't clear what the CTD phosphorylation status of this TSS-linked Pol II is, but it seems like it was phosphorylated at serine 5 residues. If so, this would indicate that TFIIH had already mediated the phosphorylation, which would release Mediator and allow promoter escape. Whatever the explanation, the existence of the peak doesn't indicate anything about mechanism. Lastly, this TSS-linked peak has been seen by Erickson (2018) so the result per se isn't novel. The approach here is more physiological than Erickson, but this isn't a significant advance, especially since there is no mechanistic information.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Reviewer #3:

      The method described, Back-it-up (BIU), builds upon the recently published Shake-it-off (SIO) system for EM grid preparation by eliminating the requirement for self-wicking, nano-wire grids (along with their inherent limitations including grid-to-grid variability and limited wicking capacity) by back-blotting standard copper-faced EM grids with highly absorbent glass fiber filter paper. Additional modifications to the SIO unit are reported that enable grid preparation (sample application-to-vitrification) times on the order of ~100ms. Although the achievement of this time constant has been reported for the Spotiton and chameleon automated grid preparation robots, these systems are technically complex and expensive to build or buy. As reported here, BIU represents for labs of modest financial resources a robust, reproducible high speed cryo-EM grid preparation device for around $1000 that uses a fraction of the sample volume required by typical automatic plunge freezer and can achieve sub-second plunge times that reduce the negative effects (denaturation, preferred orientation) of the air-water interface on the protein sample.

      This study is well organized. First, it clearly demonstrates and provides visual supporting evidence of the absorptive capacity of the glass fiber filters. Next they validate the filters on a commonly used grid prep device using back-blotting. Finally, the authors use multiple samples and plunge speeds to demonstrate the utility and effectiveness of combining the glass fiber filters and a modified SIO device to prepare grids that yielded high resolution EM structural data.

    2. Reviewer #2:

      General Assessment:

      The paper is well crafted and a clever improvement on current methods by combining the shake-it-off system with a Leica GP3 back blotter switching out the filter paper for a glass fiber pad. This improvement has likewise shown impressive results, and this information should be disseminated to help the field move forward. However there are a couple of issues, with borderline tangential material, that must be dealt with.

      Substantive Concerns:

      There are two major substantive concerns. The first revolves around the use of the influenza A hemagglutinin trimer in a direct apples to apples comparison with the work of Noble et al. In their paper using Spotiton they showed that dropping from 500ms to 100ms not only reduced the preferred orientation dramatically, but it also changed the thickness distribution of the ice in the holes. Thus the paper left the reader with a bit of an open question about whether it was a thickness effect or a temporal effect that resulted in the reduction of the preferred orientation problem. This is especially pertinent given their tomography work showing that the influenza A hemagglutinin trimer displays extreme sensitivity to the thickness of ice. For example, when the ice is too thin the trimer is completely excluded, then when the ice is just barely thick enough there is a region where only the top view orientation is possible, and finally only in the thicker ice (100-150nm) are side views possible. Thus, when attempting to compare the results from the BIU to the results from Noble et al., the ice thickness becomes a confounding factor in attributing the improved distribution to reduced time between blotting and vitrification. It is quite likely that the BIU's enhanced results are not a product of the reduced time between deposition and vitrification but rather due to the BIU producing thicker ice in the middle of the holes due to the different thinning method, thus allowing for more side views as shown in Noble et al. Therefore lines 265-271 seem, to this reviewer, to be much too strong a conclusion; however, given the importance of the observation this reviewer suggests that the authors simply remove lines 269-271 and leave the important observation as an important observation.

      The sentence starting on line 169 should be removed. A biosafety cabinet alone is insufficient to allow this invention to be compatible with BSL3/4 safety protocols, as the aerosol generated not only contaminates everything in the biosafety cabinet, but also will stay in the air for quite some time afterwards, long enough that a researcher might accidentally make the mistake of releasing whatever pathogen they are working with.

    3. Reviewer #1:

      I found no faults with this study and believe it is a timely contribution to the subfield of cryoEM sample preparation. Given the lower costs associated with this technology than the alternatives, it is possible that through-grid wicking with glass fiber will be widely adopted.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Adam Frost (University of California) served as the Reviewing Editor.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Molenaars et al., describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      We really appreciate the positive words from the reviewer.

      To help assess the value of the method relative to other approaches, several controls are suggested below:

      1. Fig. 1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction)

      We acknowledge the apparent asymmetry in the text; comparing our two-phase method to a single phase lipidomics method indeed suggests a similar comparison for metabolomics. However, our established polar metabolomics method has always been based on this exact two-phase extraction. The current method exclusively asks whether it is possible to integrate our dedicated lipidomics platform into our established two-phase polar metabolomics method, by utilizing the apolar phase that is usually discarded. This way, the method enables comprehensive metabolomics/lipidomics screening while limiting the need of culturing twice the amount of material.

      Our manuscript does not ask the more fundamental question of whether a one-phase or a two-phase extraction is better for polar metabolites. One-phase and two-phase metabolomics methods have been compared previously, and the authors of that study showed that the two-phase method achieved broader metabolite coverage, satisfactory extraction reproducibility, acceptable recovery and safety (DOI: 10.1038/srep38885). This is most probably because the cHILIC column is sensitive to contamination, so excluding lipids from the samples is beneficial for measuring polar metabolites. We hence believe that developing a single-phase polar method would be superfluous for the purpose of this study.

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      This is an interesting question that mostly relates to the lyso-lipids that are not detected in the lipid phase of our two-phase extraction. The first point to make is that sample solvents that are used at the final stage of extraction are not compatible between methods. In other words, the solvent we normally use for the lipids phase (xxx) cannot be injected on the cHILIC column. So, in a practical sense, we would not be able to measure these compounds, even if they would technically be dissolved in the other layer. However, we tried a few different alternative approaches to get more information on this point:

      1. We have attempted to integrate the lyso-lipids in the cHILIC measurements, in the polar layer, using the polar sample solvents. This was unsuccessful; no reproducible peaks, not even the internal standards, were measured. We will include a note on these results in our manuscript.
      2. We have, albeit for a different sample matrix, attempted to dissolve both layers of the two-phase extraction in the cHILIC sample solvents. While we cannot guarantee this for all metabolites, it appears that most polar metabolites are exclusively found in the polar layer. We were not able to integrate even a single peak from any of the sugars, amino acids, nucleotides, etc. in the apolar layer dissolved in polar solvents.
      3. We have reconstituted both the polar and apolar layer of our two-phase extraction in 50:50 methanol:chloroform and analyzed them on the lipidomics platform. We did find that some of the lipid internal standards partition to the polar phase, especially LPG (and to a lesser extent LPE and LPA), compared to for instance PE, SM, PG and PC, which all end up in the apolar phase.

      We will include these data in the revised manuscript as a supplemental figure, as they demonstrate that the lyso-lipids are poorly measured in the two-phase extraction. This is also why in the text we advise using the dedicated one-phase extraction when interested primarily in these species.

      3. Fig. 3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      We agree with this point. We will amend the text to not overstate these advantages.

      Reviewer #1 (Significance (Required)):

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

      We agree; our goal was indeed to create and share a method, and we will make sure to emphasize this in our cover letter.

      While the extraction method we use is not novel per se and based on classical extraction procedures, it is important to underscore that we are only now able to use these extractions in combination with high-resolution mass spectrometry. This opens new opportunities for basic discovery. The efficiency we achieve by using both phases of the two-phase procedure makes our method highly attractive for hypothesis generation, especially in sample sets where limited amounts of material are available.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors provide a detailed description of a method to analyse both polar as well as lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover, by using internal standards they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worm community and beyond.

      We are very grateful to receive this positive response from the reviewer and for highlighting the advantages of our described method also beyond the worm community.

      **Major comments:**

      none

      **Minor comments:**

      The correction process using internal standards could be described in a bit more detail.

      In our revised manuscript, we will describe the internal standard use and corrections in more detail in the text. In summary: internal standards are selected for specific metabolites based on their Pearson correlation and %CV. Subsequently, metabolite peak areas were divided by the area of the appropriate internal standard. This corrects for any loss of sample during sample prep, for instance during the isolation of the two layers.
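
      The correction summarized above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the selection rule (highest Pearson correlation between raw metabolite and standard peak areas across replicate injections), the function names, and the peak areas are all assumptions made for the example.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def pick_internal_standard(metabolite_areas, standards):
    """Assumed rule: choose the standard whose areas correlate best
    with the metabolite across the replicate injections."""
    return max(standards, key=lambda name: pearson(metabolite_areas, standards[name]))


def normalize(metabolite_areas, standard_areas):
    """Divide each metabolite peak area by the matching standard area."""
    return [m / s for m, s in zip(metabolite_areas, standard_areas)]


# Hypothetical peak areas from four replicate injections.
metabolite = [100.0, 110.0, 90.0, 105.0]
standards = {
    "IS_A": [50.0, 55.0, 45.0, 52.5],  # tracks the metabolite (same sample loss)
    "IS_B": [50.0, 48.0, 53.0, 49.0],  # does not track the metabolite
}

chosen = pick_internal_standard(metabolite, standards)
normalized = normalize(metabolite, standards[chosen])
print(chosen, normalized, percent_cv(normalized))
```

      In this toy case the chosen standard shares the sample-to-sample loss of the metabolite, so the ratio is constant and the %CV after correction drops to zero; real data would only approach this.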

      Jenni Watts has written a nice WormBook chapter on lipids which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript.

      We will include a reference to this WormBook chapter reviewing fat regulation in C. elegans in our paper, thank you for the suggestion.

      Reviewer #2 (Significance (Required)):

      see above

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript is well written and well considered. However, there is room for further improvement:

      We thank the reviewer for the positive response and for the suggestions raised.

      1) The authors need to state exactly how many metabolites were measured rather than just using ">" ("semi-quantitative analysis of >100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans"), as they did for the other papers in Table 1.

      We understand that this might appear vague. The notation was a compromise, based on the following considerations:

      1. The maximum number of reported metabolites can differ from the number of metabolites analyzed in a specific experiment or even a specific sample. For instance, our method is perfectly capable of measuring creatine metabolism (we have standards for these metabolites and they can be reliably measured); however, we have not yet been able to detect these metabolites in C. elegans. Some mutants also lose abundance of a certain metabolite to the point where it is no longer reliably measurable, which means it is filtered out in the bioinformatics.
      2. Since the initial draft of our manuscript we have been able, and will continue to be able, to add new metabolites to our analysis, as we perform a full scan over the range of m/z 50-1200. Because of this, we felt it more accurate to state that we can measure >100 metabolites, instead of a specific number.

      2) The authors also need to state the number of samples in the results section when describing the statistical analysis.

      We understand this point raised by the reviewer and will specify not only the number of samples, but also that they are indeed biological replicates. This will be included in the figure legends.

      Reviewer #3 (Significance (Required)):

      This might be an interesting paper for the research community working with C. elegans (on metabolism or in general).

      Thank you. We are in fact utilizing this double extraction for other, non-worm samples such as mouse and human tissues, and we believe this could also benefit the research community beyond the model organism C. elegans.

      The authors must deposit the raw data and make it available for the public, so they could also benefit from this good work.

      It is our full intention to share our data in a convenient and standardized way through for instance the MetaboLights database (https://www.ebi.ac.uk/metabolights/). We agree and changes will be implemented as suggested.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      **Summary:** The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the polar and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      We thank the reviewer for their comments and suggestions.

      **Major comments:**

      *Are the key conclusions convincing?*

      As a whole the conclusions are convincing and valid.

      We appreciate that the reviewer considers our work convincing and valid.

      *Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?*

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed and it is not clear how replication was carried out.

      We have in fact performed analysis on both biological replicates and repeated injections of pooled samples to determine robustness. We will clarify the biological replicates in the text and will place the pooled QC samples in the main text with additional explanation and relevant statistics such as % coefficient of variance (%CV) between them. For clarity, we plotted %CV of all polar as well as apolar metabolites. For polar metabolites 97% of the metabolites had a %CV lower than 30. For apolar metabolites 86% of the metabolites had a %CV lower than 30.
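
      The %CV summary described above can be computed along these lines. This is a minimal sketch under stated assumptions: the table layout (one list of peak areas per metabolite across repeated injections of a pooled QC sample), the function names, and the numbers are hypothetical; only the threshold of 30 comes from the text.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100.0 * stdev(values) / mean(values)


def cv_summary(qc_table, threshold=30.0):
    """Return per-metabolite %CV and the fraction of metabolites below threshold."""
    cvs = {name: percent_cv(areas) for name, areas in qc_table.items()}
    n_pass = sum(1 for cv in cvs.values() if cv < threshold)
    return cvs, n_pass / len(cvs)


# Hypothetical peak areas from three repeated injections of a pooled QC sample.
qc_table = {
    "metabolite_1": [100.0, 102.0, 98.0],  # tight: SD 2 on a mean of 100
    "metabolite_2": [10.0, 20.0, 30.0],    # noisy: SD 10 on a mean of 20
}

cvs, fraction_ok = cv_summary(qc_table)
print(cvs, fraction_ok)
```

      Applied to the full polar and apolar tables, `fraction_ok` would correspond to the kind of percentage quoted above (metabolites with a %CV lower than 30).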

      *Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.*

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity, as well as coefficients of variation (included in the sup material but not clear in the body of the manuscript) would characterize the methods best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) sources is critical.

      We have these data and will elaborate on this in our revised manuscript. We will discuss the quality control samples more prominently in the main body of the manuscript, and show one or more figures that specifically address both analytical and biological variance (see rebuttal figure 2). In summary, we assessed this variance using (a) a repeated injection of a pooled QC sample, and (b) biological replicates prepared individually. Especially the latter condition, in which we assess biological variance is representative for the actual method application. The %CV under these conditions is ≤20% for the majority of metabolites, which is why we consider our method robust.
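
      For the agreement analysis the reviewer mentions, the quantities behind a Bland-Altman plot can be sketched as below. This is an illustration only: the paired replicate values are invented, the function name is hypothetical, and the 1.96 × SD limits of agreement are the conventional choice rather than anything specified in the manuscript.

```python
from statistics import mean, stdev


def bland_altman(rep_a, rep_b):
    """Per-pair means and differences, the mean difference (bias),
    and the conventional 95% limits of agreement (bias +/- 1.96 * SD)."""
    diffs = [a - b for a, b in zip(rep_a, rep_b)]
    means = [(a + b) / 2 for a, b in zip(rep_a, rep_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "means": means,   # x-axis of the plot
        "diffs": diffs,   # y-axis of the plot
        "bias": bias,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical abundances of four metabolites in two replicate extractions.
rep_a = [10.0, 12.0, 14.0, 16.0]
rep_b = [9.0, 13.0, 13.0, 17.0]

result = bland_altman(rep_a, rep_b)
print(result["bias"], result["limits"])
```

      Mass spectrometry intensities often span orders of magnitude, so in practice the same calculation is commonly run on log-transformed values.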

      *Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.*

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs or reference materials were used, nor how many replicates there were per experimental design.

      All the data are available. These analyses will be included in the revision.

      *Are the data and the methods presented in such a way that they can be reproduced?*

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      We performed bona fide biological replicates. We will explicitly mention this in the paper together with the other descriptions of our validation protocols.

      *Are the experiments adequately replicated and statistical analysis adequate?*

      As per my above comments.

      **Minor comments:**

      *Specific experimental issues that are easily addressable.*

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc.). See the widely accepted guidelines: Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      We will provide details on the analysis itself in a table. In summary: Samples were measured in a random order, with blanks and QC samples throughout the run.
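
      A run order of the kind summarized above could be generated as in the sketch below. Everything specific here is an assumption for illustration: the block size of five samples between blank/QC pairs, the fixed random seed, and the sample names are not taken from the manuscript.

```python
import random


def build_run_order(sample_ids, block_size=5, seed=0):
    """Shuffle the samples and bracket every block with a blank and a pooled QC."""
    rng = random.Random(seed)  # fixed seed so the order can be reported
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)

    order = ["blank", "QC"]  # open the run with a blank and a QC injection
    for start in range(0, len(shuffled), block_size):
        order.extend(shuffled[start:start + block_size])
        order.extend(["blank", "QC"])  # repeat after each block of samples
    return order


samples = [f"S{i}" for i in range(1, 13)]  # twelve hypothetical samples
run_order = build_run_order(samples)
print(run_order)
```

      Recording the seed alongside the resulting order makes the randomization itself reportable in a methods table.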

      *Are prior studies referenced appropriately?*

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      We will include this paper in our references. We would like to note though that this method requires not just an LC system to analyze lipids, but also GC with additional derivatization steps. Our method achieves comprehensive lipidomics using a single technique and no additional derivatization.

      Further a recent publication that goes beyond the work described by the authors using similar approach: MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses. Ernesto S. Nakayasu, Carrie D. Nicora, Amy C. Sims, Kristin E. Burnum-Johnson, Young-Mo Kim, Jennifer E. Kyle, Melissa M. Matzke, Anil K. Shukla, Rosalie K. Chu, Athena A. Schepmoes, Jon M. Jacobs, Ralph S. Baric, Bobbie-Jo Webb-Robertson, Richard D. Smith, Thomas O. Metz mSystems May 2016, 1 (3) e00043-16; DOI: 10.1128/mSystems.00043-16

      We will also include this paper, reporting 51 polar metabolites and 84 lipid species, in our references. While we recognize that they also make use of both phases and the protein pellet, we think our method is much more practical in several key ways:

      Our metabolomics platform provides twice as many species and our lipids platform exceeds their analytical capabilities 10 fold. This means a far better coverage of differences within metabolite and lipid classes, allowing for far more intricate patterns to be detected. We show this for instance in our plots comparing carbon chain length to degree of saturation (Fig 4 and S2 in original manuscript); a comparison that is only possible with the data density that our method offers. The MPLEx metabolomics method also requires the use of a GC system and derivatization steps, while our method does not, making it much more user friendly and requiring only a single analytical system.

      *Are the text and figures clear and accurate?*

      Yes

      *Do you have suggestions that would help the authors improve the presentation of their data and conclusions?*

      The figures, overall are of exceptional quality.

      As per current scientific consensus, Box plots should also be overlaid with the actual datapoints (which was aptly done for the bar charts and other plots).

      The supplementary data even though comprehensive is hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      We agree that a readme file would make the supplemental data more understandable. We will provide such a file. For the box plots we will show the actual data points in our revised manuscript.

      Reviewer #4 (Significance (Required)):

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, it's highly significant to set an extraction method standard for the field of C. elegans metabolomics (including myself doing metabolomics and natural products chemistry with LCMS and NMR). However, this manuscript does not cover the technical aspects of the method with sufficient depth to hallmark this method as the standard for the field. Further information is needed to fill the missing gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents). The second aspect of the manuscript, which initially was a welcoming idea (and important), became >50% of the manuscript creating a disconnect between the information set by the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data to a metabolomics repository (e.g. MetaboLights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      We are aware that the extraction itself is an analytical chemistry staple. However, it is precisely in this fact that we find novelty. It should be noted that both of the other papers mentioned by the reviewers that have attempted to integrate lipidomics and metabolomics have had to resort to labor intensive (as well as possibly expensive and destructive) derivatization steps and a separate analysis on GC. Our method does not have these requirements. It is indeed a single and very common extraction, after which each dried phase is reconstituted and immediately injected. But this simplicity is not a concession, as our metabolome coverage is easily more comprehensive than the other mentioned methods. We therefore feel that this simplicity should not discount our currently presented method, but be considered an additional advantage.

      Sequential extractions may be an option to consider. However, we feel like they are less user friendly and unneeded. Because we use internal standards, it is never an issue to pipet slightly more or less of any particular sample; making it easy to avoid the line between solvents.

      We will explicitly clarify where we already comply with the standards (such as the analysis of biological replicates and repeated injection of a QC sample) and are confident we can add figures and further information such as deposition of our data to comply with the rest.

      REFEREES CROSS-COMMENTING

      I completely agree with reviewer #1's comments; they are on point and I had missed them entirely. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging; the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely: both call for further description of samples/replication and deposition of the data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, which would put its publication potential in question.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      The authors present a method for extraction of both lipid and polar metabolites from the model organism C. elegans. This extraction method is based on the well-established Bligh and Dyer method, with a slight modification to retain and utilize both the polar and non-polar fractions for LCMS analysis. They applied and tested this method against a monophasic extraction utilizing the same solvent system. They report that there is a loss of metabolites in the non-polar fraction to the polar fraction (of more polar metabolites) and small differences between the monophasic and biphasic extractions. They also expanded on the linearity of the extraction efficiency by increasing the number of worms. Further they applied the single extraction method to both knockdown mutants of C. elegans and Recombinant Inbred Lines derived from N2 and the natural isolate CB4856 to determine whether this method would still be able to differentiate the metabolome between the genetically different C. elegans populations.

      Major comments:

      Are the key conclusions convincing?

      As a whole the conclusions are convincing and valid.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      The use of the adjective "robust" is, to an extent, erroneous. As defined, a robust method implies that the method is capable of withstanding small (deliberate or not) changes or variations. In this case the robustness of the method was not assessed and it is not clear how replication was carried out.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      Reproducibility would need to be assessed/quantified to establish how robust the method is. Even though linearity with an increase in the number of worms is a good indication, it does not satisfactorily establish the robustness of the method. The use of replicates to assess the agreement between measurements (e.g. Bland-Altman plots), linearity, as well as coefficients of variation (included in the sup material but not clear in the body of the manuscript) would characterize the methods best. The isolation of each variance originating from instrumental (pooled quality controls), biological (biological replication) and sample preparation (multiple extractions from the same biological source) sources is critical.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      The suggested experiments are in fact just further analysis of the already collected data. There would be no need for further experiments; however, it is not clear whether pooled QCs or reference materials were used, nor how many replicates there were per experimental design.

      Are the data and the methods presented in such a way that they can be reproduced?

      The methods are very well described. My only comment is to address how the replicates were grown/created and how many per strain/group. If the replicate measurements were done on the same samples (repeated injections), I believe that would weaken the findings (if not invalidate them altogether), however if these were biological replicates from independent starting populations the findings are valid and convincing.

      Are the experiments adequately replicated and statistical analysis adequate?

      As per my above comments.

      Minor comments:

      Specific experimental issues that are easily addressable.

      It is not clear how the sample preparation process was carried out (randomization, run order, QCs etc). As per the guidelines widely accepted from -

      Broadhurst, D., Goodacre, R., Reinke, S.N. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018). https://doi.org/10.1007/s11306-018-1367-3.

      Are prior studies referenced appropriately?

      A major reference that has applied this extraction method before in the same model organism is missing:

      Castro, C., Sar, F., Shaw, W.R. et al. A metabolomic strategy defines the regulation of lipid content and global metabolism by Δ9 desaturases in Caenorhabditis elegans. BMC Genomics 13, 36 (2012). https://doi.org/10.1186/1471-2164-13-36

      Further, a recent publication goes beyond the work described by the authors using a similar approach:

      Nakayasu, E.S., Nicora, C.D., Sims, A.C. et al. MPLEx: a Robust and Universal Protocol for Single-Sample Integrative Proteomic, Metabolomic, and Lipidomic Analyses. mSystems 1 (3), e00043-16 (2016). https://doi.org/10.1128/mSystems.00043-16

      Are the text and figures clear and accurate?

      Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      The figures are, overall, of exceptional quality. As per current scientific consensus, box plots should also be overlaid with the actual data points (which was aptly done for the bar charts and other plots). The supplementary data, even though comprehensive, are hard to understand. A "readme" file detailing what data each file contains would improve readability and comply with FAIR principles.

      Significance

      Even though the approach is not novel and has long been used in Natural Products Chemistry and in other organisms, setting an extraction method standard is highly significant for the field of C. elegans metabolomics (including for me, as I do metabolomics and natural products chemistry with LC-MS and NMR). However, this manuscript does not cover the technical aspects of the method in sufficient depth to establish it as the standard for the field. Further information is needed to fill the gaps (as highlighted by the authors). Ratios between solvent and biological material amounts, reproducibility, recovery rates (even though buried in the supplementary files) and metabolite coverage are still missing.

      As a side note, the disparity between the monophasic and biphasic extractions could be overcome by a sequential extraction of the same sample, with no incurred cost on performance (and removing the much-dreaded pipetting uncertainty near the line between solvents).

      The second aspect of the manuscript, which initially was a welcome (and important) idea, became >50% of the manuscript, creating a disconnect between the information set out in the abstract and introduction and the results/conclusion. The work is extremely relevant in both sections of the manuscript, but the technical aspect is still lacking details and/or analysis.

      Strongly suggested: explicit compliance with the minimum reporting standards as per the Metabolomics Standards Initiative (MSI) and deposition of the data in a metabolomics repository (e.g. MetaboLights or Metabolomics Workbench). These are internationally accepted requirements for metabolomics publications.

      REFEREES CROSS-COMMENTING

      I completely agree with Reviewer #1's comments; they are on point and I completely missed these issues. They are relevant and should be addressed.

      Reviewer #2 points out work worth acknowledging: the internal standard work was quite thorough and well designed.

      Reviewer #3's comments and mine overlap nicely on the need for further description of samples/replication and for deposition of the data in a metabolomics repository.

      Further work is required to make this a good publication and a standard for the field. Without this extra work addressing the reviewers' comments, I feel this work could be to a certain degree misleading and/or incomplete, which would call its publication potential into question.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The manuscript is well written and considered. However, there is room for further improvement:

      1) The authors need to state exactly how many metabolites were measured, not just a lower bound (">100 polar (metabolomics) and >1000 apolar (lipidomics) metabolites in C. elegans"), as they did for the other papers in Table 1.

      2) The authors also need to state the number of samples in the Results section when describing the statistical analysis.

      Significance

      This might be an interesting paper for the research community working with C. elegans (on metabolism or in general).

      The authors must deposit the raw data and make them publicly available, so that others can also benefit from this good work.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The authors provide a detailed description of a method to analyse both polar and lipophilic metabolites from the same nematode sample. This provides significant advantages over methods using individual samples. Moreover, by using internal standards, they establish an extremely good correlation of individual metabolites. This paper is of immediate importance for the worm community and beyond.

      Major comments: none

      Minor comments:

      The correction process using internal standards could be described in a bit more detail.

      Jenni Watts has written a nice WormBook chapter on lipids, which may be cited in addition to reference 17, since it covers many of the metabolites and related enzymes contained in this manuscript.

      Significance

      see above

    5. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Molenaars et al. describe a protocol to extract and quantify a wide range of polar and apolar metabolites from the same C. elegans sample using methanol-chloroform based phase separation. The authors assess the method across different input amounts, in comparison to a 1-phase extraction method and through metabolic perturbations using RNAi against several metabolic enzymes. Finally, they provide a metabolomics analysis of metabolite variation across several C. elegans strains. The data are of overall high quality and presented in a clearly written manuscript.

      To help assess the value of the method relative to other approaches, several controls are suggested below:

      1. Fig. 1: Metabolite abundance in the polar phase should be compared to 1-phase extraction methods (analogous to Fig. 2I, which compares metabolites in the apolar phase to 1-phase extraction).

      2. Are polar metabolites also detected in the apolar phase? Can the less hydrophobic lipids missing from the apolar phase be detected in the polar phase?

      3. Fig. 3l-n: The authors claim that extracting metabolites from the polar and apolar phases of the same sample leads to better cross-correlation than if metabolites are extracted from different samples using methods optimized for the respective metabolite classes. To provide experimental evidence, metabolite abundance should be compared directly when metabolites are extracted from the same or from different samples using suitable methods.

      Significance

      The methodological and conceptual advancement of the present study is rather incremental. The authors essentially use the classical chloroform/methanol/water phase separation protocols developed by Bligh & Dyer and Folch, which have been used extensively for lipid extraction for many decades now. However, the effort to carefully measure the metabolites contained in the aqueous phase is laudable. For method validation, the authors use well-understood perturbations that yield predictable results. Overall, I consider the study more appropriate for a publication as a methods protocol, which could be of interest to the metabolomics community, rather than as a research paper.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their feedback and encouragement. We have now fully revised the manuscript to address all comments. Our specific responses are provided below and we have highlighted changes in the text. The major additions are:

      • analysis of simulated time-courses with lower temporal resolution
      • analysis of ex vivo PER2::LUCIFERASE SCN recordings
      • analysis of simulated time-courses with Poisson distributions of noise
      • plotted summary statistics for several figures
      • mathematical formula and explanation in the Methods

      Overall, these revisions have strengthened our findings and improved the manuscript, particularly in demonstrating that the issues with the chi-square periodogram are not specific to sampling interval or data type.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      **Comments:**

      In Figs 2+3 the authors show that the discontinuity in the periodogram coincides with a change in the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differs?

      We have taken a closer look at how each component of the chi-square statistic calculation changes at points where K decreases, and have found that discontinuities do always occur at these points. In addition to the obvious effect of the K * N term on the sudden decreases, we found that the sum of squares of the column means alone (the primary component of the numerator) also changes abruptly at each transition point of K. As a result, the discontinuity magnitude is likely roughly proportional to the amplitude of the chi-square statistic at that point.
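The dependence on K described above can be illustrated with a minimal Python sketch. This is not the code used in the manuscript: the function name `chi_square_statistic`, the noise-free cosine input, and the choice to take the overall mean and total sum of squares over all N samples are assumptions made for illustration only.

```python
import math

def chi_square_statistic(x, p):
    """Return (K, Q_P) for a test period of p samples (illustrative sketch)."""
    n = len(x)
    k = n // p  # number of complete cycles of length p that fit in the series
    grand_mean = sum(x) / n
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    # Fold the first k * p samples into k rows of p columns and average each column.
    col_means = [sum(x[row * p + h] for row in range(k)) / k for h in range(p)]
    between_ss = sum((m - grand_mean) ** 2 for m in col_means)
    return k, k * n * between_ss / total_ss

# Hypothetical input: a noise-free 24-h cosine, 3 days sampled every 6 min
# (240 samples per day, 720 samples in total).
samples_per_day = 240
x = [math.cos(2 * math.pi * i / samples_per_day) for i in range(3 * samples_per_day)]

for p in (239, 240, 241, 242):
    k, q = chi_square_statistic(x, p)
    print(f"P = {p * 6 / 60:.1f} h  K = {k}  Q_P = {q:.1f}")
```

With 720 samples, K = floor(720 / P) falls from 3 to 2 as the test period moves from 240 to 241 samples (24.0 h to 24.1 h), and the statistic drops abruptly at the same point.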

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      In Supplemental Fig. 1C-D, the critical information is the shape of the periodogram and the presence of a discontinuity, so we believe the plot sizes are appropriate.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      We have created a new supplemental figure (Supplemental Fig. 8) by applying the strategy and visualization used in Fig. 6 to SCN PER2::LUC recordings instead of wheel-running data, and have updated the text accordingly.

      Reviewer #1 (Significance (Required)):

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted, however, that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases was lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that, in some circumstances (e.g. where there are discontinuities), CSP could be improved by changing the algorithm. They propose different steps to do this (e.g. using their greedy CSP code) and/or by using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on actual real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would aid the utility of the paper's findings for the field.

      Fig. 6 is entirely based on real-world circadian data (mouse wheel-running activity), as is the newly added Supplemental Fig. 8.

      Reviewer #2 (Significance (Required)):

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      **Summary:**

      The authors identify a serious flaw in a popular method called Chi-squared periodogram (CSP) for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as Lomb-Scargle Periodogram work well without this discontinuity (bias) issue.

      **Major Comments:**

      1. One thing the authors do not include is time series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      Figs. 3A and 6 and the newly added Supplemental Fig. 8 use time series whose lengths are non-integer numbers of (24-h) days.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worthwhile, in the interest of completeness, to also consider a lower resolution. There is nothing in this study that ties it to the specific application, is there?

      Although a sampling resolution of 6 minutes is not specific to wheel-running activity, we have added an analysis identical to that of Fig. 5 but with a resolution of 20 minutes (Supplemental Fig. 5). Additionally, the PER2::LUC SCN recordings analyzed in Supplemental Fig. 8 have a sampling resolution of 20 minutes.

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      We discuss both the error (references to Fig. 5A) and absolute error (references to Fig. 5B) in the text. We feel the interpretation suggested by the reviewer may be too reliant on the results of 3-day simulations, as the apparent underestimation by greedy CSP appears far less substantial in simulations of 6 and 12 days.

      **Minor Comments:**

      1. I would like to see the formulae for the ratio of variances and p-values to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      We have added the formula for the standard chi-square periodogram to the Methods section.
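For readers of this response, the standard chi-square periodogram statistic (Sokolove and Bushell, 1978) is commonly written as below. The notation here is an assumption and may differ from that used in the revised Methods: N is the number of observations, K the number of complete cycles of test period P (in samples), X̄_h the mean of the h-th column after folding the series into K rows of P columns, and X̄ the overall mean.

```latex
Q_P = \frac{K N \sum_{h=1}^{P} \left( \bar{X}_h - \bar{X} \right)^2}
           {\sum_{i=1}^{N} \left( X_i - \bar{X} \right)^2}
```

Under the null hypothesis of no rhythm at period P, Q_P is treated as approximately chi-square distributed with P − 1 degrees of freedom, which is how the p-value is obtained.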

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev.) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      We have overlaid a black circle representing the median and a vertical black line representing the 5th-95th percentile range onto Fig. 5 and Supplemental Figs. 3-7.
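The summary described above (median plus 5th-95th percentile range) can be computed as in the following Python sketch. The function name `summarize` and the example error values are hypothetical, and the "inclusive" interpolation method is an assumption; the figures may have used a different percentile definition.

```python
import statistics

def summarize(errors):
    """Return (median, 5th percentile, 95th percentile) of a list of errors."""
    # quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(errors, n=20, method="inclusive")
    return statistics.median(errors), cuts[0], cuts[-1]

# Hypothetical period-estimation errors in hours (not data from the study).
example_errors = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.6]
median, p5, p95 = summarize(example_errors)
print(f"median = {median:.2f} h, 5th-95th percentile = [{p5:.2f}, {p95:.2f}] h")
```

A median and percentile range, unlike a mean and standard deviation, are not pulled strongly by a few extreme period estimates, which is one reason to prefer them for skewed error distributions.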

      Reviewer #3 (Significance (Required)):

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      There has been previous work that has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter, as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors identify a serious flaw in a popular method called Chi-squared periodogram (CSP) for period estimation in circadian rhythms. They systematically get to the source of the problem -- a discontinuity in the test statistic. This flaw leads to a bias in the period estimate. They present two modifications to the CSP, one of which they prefer. Nevertheless, they show that other more flexible methods such as Lomb-Scargle Periodogram work well without this discontinuity (bias) issue.

      Major Comments:

      1. One thing the authors do not include is time series lengths of non-integer days. Would it not be an interesting suggestion to choose a non-integer length time course, which is not a multiple of the periods of interest, and still continue using CSP as is? This is also rather counter-intuitive.

      2. I suppose the authors use a sampling resolution of 6 min with wheel-running activity in mind. But it would be worthwhile, in the interest of completeness, to also consider a lower resolution. There is nothing in this study that ties it to the specific application, is there?

      3. The authors discuss only the mean absolute error in the text, but isn't the direction (sign) of the error also of interest? As far as I can see in Fig 5, conservative CSP overestimates and greedy CSP generally underestimates periods.

      Minor Comments:

      1. I would like to see the formulae for the ratio of variances and p-values to be clear about how the authors computed the CSP. They describe it in words already, but I think some mathematics is warranted here.

      2. It is nice to see the raw data in the plots. But I would like to see a plot of the summary statistics (mean and variance/st. dev.) for each of the scatter plots to judge the size of the bias. It is not easy to do this with the Excel sheet.

      Significance

      The authors present a sobering perspective on the chi-squared periodogram, which is still very popular among empirical biologists. They plainly show using artificial data that it is better to avoid the CSP when possible, although they suggest improvements to the CSP. The authors provide an R package to perform the analysis.

      There has been previous work that has highlighted other limitations of the CSP. This might be considered one more nail in the coffin of the CSP.

      I think this paper would be of interest to both computational biologists and wet-lab biologists, but I think it ought to have a greater influence on the latter, as the former already resort to more sophisticated approaches.

      My expertise is in Computational and Theoretical biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Chi-squared periodograms (CSP) are routinely used in circadian biology. In particular, this test has been used to determine circadian period in behavioral data (e.g. actigraphy) in mammals, flies and other species. This paper suggests that, in some circumstances (e.g. where there are discontinuities), CSP could be improved by changing the algorithm. They propose different steps to do this (e.g. using their greedy CSP code) and/or by using alternative tests such as Lomb-Scargle.

      The authors use simulated data to demonstrate their findings, and whilst I can see the benefits of this, it would be useful to benchmark the algorithms on actual real-world circadian data (e.g. actograms from mouse or fly experiments). Although these types of data may not be publicly available, they are highly likely to be available from multiple labs in the circadian field. In particular, fly datasets will be abundant in many clock labs. This would aid the utility of the paper's findings for the field.

      Significance

      The paper is helpful for the circadian field when dealing with datasets that may contain discontinuities.

      It appears that the paper will be primarily useful for behavioral data, rather than, for example, transcriptomic time courses, since these tend to be much shorter and less sample intensive. Thus, it would be useful for circadian (and other) researchers analysing activity data in particular.

      My expertise is in circadian rhythms, both behavioural and molecular (e.g. sequencing) level analyses. Thus, I would be a possible end-user for the algorithms in this paper.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Tackenberg & Hughey investigate the reliability of a popular period estimation algorithm, the chi-square periodogram. They find a bias in the estimation, and through careful investigation identify the cause. This is a well executed and well presented study.

      Comments:

      In Figs 2+3 the authors show that the discontinuity in the periodogram coincides with a change in the number of complete cycles, K. However, in Fig 2C there are several other positions where K abruptly changes, but little effect on the chi-squared statistic is observed. Can the authors offer an explanation as to why the magnitude of the discontinuities differs?

      An important claim is that the discontinuity is observed in multiple software implementations. However, the plots of Supplementary Fig 1C,D are presented too small to evaluate this claim.

      It may be of interest to apply the algorithms to a single-cell experimental data set which are qualitatively different (e.g., oscillation shape, damping).

      Significance

      It has been previously shown that the chi-square periodogram algorithm has performance shortcomings for the analysis of circadian data (e.g. Zielinski et al., 2004). However, this study demonstrates exactly why, giving more conclusive evidence to support the conclusion that it should be avoided. This will be useful to many in the mammalian circadian community. It should be noted, however, that other algorithms are already favoured by other clock communities (e.g. plant), even if a rigorous understanding of the biases was lacking.

      The methods developed here will be valuable for future comparisons of circadian algorithms. Of particular importance will be comparing algorithms for analysis of single-cell rhythms or non-stationary rhythms.